Question about Backpropagation Algorithm with Artificial Neural Networks — Order of updating

Hey everyone, I've been trying to get an ANN I coded to train with the backpropagation algorithm. I've read several papers on it, but I'm noticing a few discrepancies between them.

Here seems to be the super general format of the algorithm:

  1. Give input
  2. Get output
  3. Calculate error
  4. Calculate change in weights
  5. Repeat steps 3 and 4 until we reach the input layer
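Just so it's concrete, here's a minimal sketch of steps 1 through 3 for a tiny hypothetical 2-2-1 sigmoid network (the weight values are made up for illustration):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical 2-2-1 network with made-up example weights
x = [1.0, 0.0]                 # step 1: give input
W1 = [[0.1, 0.4], [0.8, 0.6]]  # input -> hidden weights (illustrative values)
W2 = [0.3, 0.9]                # hidden -> output weights (illustrative values)

# Forward pass through the hidden layer, then the output unit
h = [sigmoid(sum(W1[j][i] * x[i] for i in range(2))) for j in range(2)]
y = sigmoid(sum(W2[j] * h[j] for j in range(2)))  # step 2: get output

target = 1.0
error = 0.5 * (target - y) ** 2  # step 3: squared error at the output
```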

But here's the problem: the weights need to be updated at some point, obviously. However, because we're propagating backwards, calculating the error for a layer closer to the input requires the weights of the layer above it (the one closer to the output, I mean). By that point, we've already calculated the weight changes for that upper layer. So when we use its weights to calculate the error for the layers closer to the input, do we use their old values or their updated values?

In other words, if we were to put the step of updating the weights into my general algorithm, would it be:

(Updating the weights immediately)

  1. Give input
  2. Get output
  3. Calculate error
  4. Calculate change in weights
  5. Update these weights
  6. Repeat steps 3, 4, and 5 until we reach the input layer

OR

(Using the "old" values of the weights)

  1. Give input
  2. Get output
  3. Calculate error
  4. Calculate change in weights
  5. Store these changes in a matrix, but don't apply them yet
  6. Repeat steps 3, 4, and 5 until we reach the input layer
  7. Update all the weights at once using our stored values
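For what it's worth, here's a sketch of how I understand the second ("old values") variant, using a hypothetical tiny 2-2-1 sigmoid network trained on one sample with squared error; all the deltas are computed first using the old weights, and only then are the updates applied:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical tiny network: 2 inputs -> 2 hidden -> 1 output, all sigmoid
random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]  # hidden weights
W2 = [random.uniform(-1, 1) for _ in range(2)]                      # output weights
lr = 0.5  # learning rate (arbitrary choice)

def train_step(x, target):
    # Forward pass
    h = [sigmoid(sum(W1[j][i] * x[i] for i in range(2))) for j in range(2)]
    y = sigmoid(sum(W2[j] * h[j] for j in range(2)))

    # Backward pass: compute ALL the deltas first, using the OLD weights.
    # delta_hid reads W2 before W2 has been touched.
    delta_out = (y - target) * y * (1 - y)
    delta_hid = [delta_out * W2[j] * h[j] * (1 - h[j]) for j in range(2)]

    # Only now apply the stored updates, all at once
    for j in range(2):
        W2[j] -= lr * delta_out * h[j]
        for i in range(2):
            W1[j][i] -= lr * delta_hid[j] * x[i]
    return y
```

Because the hidden-layer deltas are computed before any weight is modified, it doesn't matter whether the actual assignments happen layer by layer or all at the end; the key is that the error terms are based on the old values.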

In this paper I read, both abstract examples (the ones based on figures 3.3 and 3.4) say to use the old values, not to update them immediately. However, in their "worked example 3.1", they use the new values (even though they claim to be using the old values) when calculating the error of the hidden layer.

Also, in my book "Introduction to Machine Learning by Ethem Alpaydin", though there is a lot of abstract stuff I don't yet understand, he says "Note that the change in the first-layer weight delta-w_hj, makes use of the second layer weight v_h. Therefore, we should calculate the changes in both layers and update the first-layer weights, making use of the old value of the second-layer weights, then update the second-layer weights."

To be honest, it really seems like they just made a mistake and all the weights are updated simultaneously at the end, but I want to be sure. My ANN is giving me strange results, and I want to be positive that this isn't the cause.

Anyone know?

Thanks!

asked by Donal Fellows, 16 May 2011 at 06:41