100 thoughts on “Backpropagation calculus | Deep learning, chapter 4”

  1. Two things worth adding here:
    1) In other resources and in implementations, you'd typically see these formulas in some more compact vectorized form, which carries with it the extra mental burden to parse the Hadamard product and to think through why the transpose of the weight matrix is used, but the underlying substance is all the same.

    2) Backpropagation is really one instance of a more general technique called "reverse mode differentiation" to compute derivatives of functions represented in some kind of directed graph form.
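    For readers curious what that vectorized form looks like, here is a minimal sketch assuming the quadratic cost and sigmoid activations used in the video (names like Ws, a_list and z_list are assumptions, not notation from the video); the Hadamard product is NumPy's elementwise *, and the transposed weight matrix shows up as W.T:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_prime(z):
        s = sigmoid(z)
        return s * (1.0 - s)

    # One backward pass for a single training example.
    # Ws, bs: weight matrices and bias vectors, one per layer.
    # a_list, z_list: activations and weighted sums saved during the forward
    # pass (a_list[0] is the input).  y: the desired output vector.
    def backprop(Ws, bs, a_list, z_list, y):
        grads_w = [None] * len(Ws)
        grads_b = [None] * len(Ws)
        # Output layer: dC/da, elementwise (Hadamard) product with sigma'(z).
        delta = 2.0 * (a_list[-1] - y) * sigmoid_prime(z_list[-1])
        grads_w[-1] = np.outer(delta, a_list[-2])
        grads_b[-1] = delta
        # Walk backwards; the transpose of the next layer's weights appears here.
        for l in range(2, len(Ws) + 1):
            delta = (Ws[-l + 1].T @ delta) * sigmoid_prime(z_list[-l])
            grads_w[-l] = np.outer(delta, a_list[-l - 1])
            grads_b[-l] = delta
        return grads_w, grads_b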

  2. When propagating backwards, do you just take the activations of the next layer as if they were y? Or do you have to add the nudges, or what?

  3. 4:09 -> 4:11
    (a^L – y)^2 = 2(a^L – y) this is 'a huge' ;D mistake :[
    (a^L – y)^2 = (a^L – y)(a^L – y) it will work

  4. I don't recall hearing this video say anything about by what value we should modify any one weight or bias. At least one single example using actual values would have been nice.

  5. Thank you very much for your amazing explanations. I'd been struggling a lot to understand backpropagation. While all the other explanations I found failed to make the topic clear to me, yours just worked wonders! Thanks again for such nice videos!

  6. I attempted to create my own neural network from scratch based on this video series… However, this last video still skips over a few bits, which likely proved to be my downfall… since the neural network really just does not work, unfortunately.

  7. What an amazing video series. Everything was so well explained that I, having just learned this math, was able to follow along. I wish I could do more to support this channel.

  8. I'd trained dozens of CNN models, but only now do I understand what I was actually doing. Awesome content with such a beautiful soundtrack. 😊

  9. Nice, but none of it makes it any clearer how I would write the code to do this. It did not even cover the adjustment of the weights.
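    For what it's worth, once backpropagation has produced the gradient, the adjustment of the weights is just a gradient-descent step, as in chapter 2 of the series. A minimal sketch, reusing the backprop helper sketched under comment 1 (the learning rate value is an arbitrary assumption):

    # Nudge every weight and bias a small step against its gradient.
    learning_rate = 0.1  # assumed hyperparameter
    grads_w, grads_b = backprop(Ws, bs, a_list, z_list, y)
    Ws = [W - learning_rate * gW for W, gW in zip(Ws, grads_w)]
    bs = [b - learning_rate * gb for b, gb in zip(bs, grads_b)]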

  10. At 4:44 I still have a problem calculating the weight. I don't understand how to calculate all the deltas for the sum over z(L). Please help.

  11. You are some sort of sorcerer. The chain rule was finally elucidated for me in about 20 seconds.
    So concise and intuitive indeed.
    That was one thing, for whatever reason, I couldn't get my head around as a younger student.
    Big props man. Love this channel.

  12. I admire you!
    It would help a lot if you told us how you manage to gain such an understanding from books. By that I mean, what does it take to be able to make these videos? Does everything you know come from books? Did you have a good professor? Do you experiment with visualization tools to arrive at the geometric interpretation? Or are you simply gifted with this kind of understanding?

  13. Please make videos on deep reinforcement learning, Q-learning, and DQN. This is the only channel that explains all the maths behind BP and basic DL with the greatest visuals.

  14. That's one of the worst ways to learn backprop; the idea comes from the chain rule and the rest is linear algebra. You're better off with the book in the description.

  15. Holy mother of God!!!
    How did I intuitively understand such a complex thing!!!!??
    Grant, sir!!! You are a god to me.

  16. Hello. I have a question. I watched a lot of videos and I can't figure out a thing about neural networks.

    Is the bias common to all the neurons of a layer, or does every neuron have its own bias?

    In some schematics the bias looks like a neuron with activation (1), and it has a different weight connecting it to every neuron of the layer. In other schematics, the bias has one value as its activation and there are no weights (so the bias is equal for all the neurons of that layer).
    Thank you very much!

  17. How about a simple version without derivatives? Just show the values and how they are subtracted and such.

  18. It has taken me about 3–4 days' worth of time to understand all of these 4 lectures, lectures which in total are no longer than 1 hour and 30 minutes.

    And I feel proud.

  19. So, I get that the desired output (y) for the neurons in the output layer can either be a 0 or a 1. But what is the desired output (y) when calculating the gradients for the second-to-last layer of neurons? What activation do we actually desire for the layer behind the output layer?

  20. One of the best lectures I have ever heard. Great explanation of NN, cost functions, activation functions etc. Now I understand NN far far better…(P.S. I saw previous videos Part 1, 2,3 as well)

  21. Without this I never would've been able to make my first neural network, even though all it did was learn how to respond to Rock Paper Scissors when it already knows which one you're going to play (basically, overfitting is the goal).

  22. This is math at its best, and art at its highest, too. The grace of the animation, the subtle music, the perfectly paced narration and the wonderful colour scheme! Math and art, or let's say: math is art!

  23. Wow, so clear, thanks 😀
    I'm not sure why the derivative of z(L) with respect to the weight is a(L-1), though :/
    Could anyone explain it to me? 🙂

  24. It's great that you started it backwards but when I started to program it I realized something. What is a^(L-1) when computing the first layer? Is it simply the value from the inputs?

    z^(L) = w^(L) a^(L-1) + b^(L)

    a^(L) = f(z^(L))
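    A minimal forward-pass sketch in the same notation; the input vector itself plays the role of a^(0), so the "previous activation" for the very first layer is simply the raw input (variable names are assumptions):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Forward pass: x serves as a^(0), the "previous layer's activation"
    # for the first layer.
    def feedforward(Ws, bs, x):
        a = x                      # a^(0) is just the input vector
        a_list, z_list = [a], []
        for W, b in zip(Ws, bs):
            z = W @ a + b          # z^(L) = w^(L) a^(L-1) + b^(L)
            a = sigmoid(z)         # a^(L) = f(z^(L))
            a_list.append(a)
            z_list.append(z)
        return a_list, z_list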

  25. Thank you so so so much for making this series. Within an hour, I feel that I have learned a good deal about Neural Networks. You are amazing!

  26. Is there any learning material available on the internet for a simple neural net which goes step by step, computing actual values (cost functions, derivatives) for all weights, biases and iterations? That would be very practical.
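    Not a full reference, but here is a tiny worked example along those lines, for a one-neuron-per-layer network with a sigmoid activation and squared error; every number is made up purely for illustration:

    import numpy as np

    a_prev, w, b, y = 0.6, 1.0, -0.1, 1.0   # made-up values

    z = w * a_prev + b             # z = 0.5
    a = 1 / (1 + np.exp(-z))       # a = sigmoid(0.5) ≈ 0.6225
    cost = (a - y) ** 2            # ≈ 0.1425

    dC_da = 2 * (a - y)            # ≈ -0.7551
    da_dz = a * (1 - a)            # sigmoid'(z) ≈ 0.2350
    dz_dw = a_prev                 # = 0.6

    dC_dw = dC_da * da_dz * dz_dw  # ≈ -0.1065
    w_new = w - 0.5 * dC_dw        # one gradient step with learning rate 0.5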

  27. You are not a good teacher. You failed to extrapolate from the simple example to the general one. You should give an example with a 2-layer network, with the cost function and activation function, and then show, explicitly by writing it down, how the derivatives go all the way back.

  28. This video doesn't actually suggest how one chooses a value to add to the weights, and the propagation seems to reach only one layer back from the output – how are alterations added to the second and third layers?

  29. At around 8:25, why is C0 a sum going from j = 0 to nL−1, described as the number of neurons of the last hidden layer, while j is then used in the output's subscript (a_j)? Is that a mistake or am I missing something?

  30. So with stochastic gradient descent, would you only change some of the weight values each iteration in the training phase?
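    To make the distinction concrete: with (mini-batch) stochastic gradient descent every weight and bias is still nudged on each iteration; what changes is that the gradient is averaged over a small random batch of examples rather than the whole training set. A minimal sketch, reusing the feedforward and backprop helpers sketched above (all names are assumptions):

    import random

    def sgd_step(Ws, bs, training_data, batch_size, learning_rate):
        # training_data: list of (x, y) pairs.
        batch = random.sample(training_data, batch_size)
        sum_w = [W * 0.0 for W in Ws]
        sum_b = [b * 0.0 for b in bs]
        for x, y in batch:
            a_list, z_list = feedforward(Ws, bs, x)
            grads_w, grads_b = backprop(Ws, bs, a_list, z_list, y)
            sum_w = [sW + gW for sW, gW in zip(sum_w, grads_w)]
            sum_b = [sb + gb for sb, gb in zip(sum_b, grads_b)]
        # Every weight and bias gets updated, using the batch-averaged gradient.
        Ws = [W - (learning_rate / batch_size) * sW for W, sW in zip(Ws, sum_w)]
        bs = [b - (learning_rate / batch_size) * sb for b, sb in zip(bs, sum_b)]
        return Ws, bs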

  31. Wow, this took a long time to get my head around fully, but I was finally able to understand it enough to implement my own version of backpropagation from scratch thanks to this video! Neural networks are something I've wanted to get into for a while and I'm really grateful for these wonderful in-depth explanations!

  32. That unexplained little formula added at the end, around 9:30, showing what the partial derivative of the cost function with respect to the current node and layer is when the current layer is not the output layer, really messed with me. In typical notation, that's a lowercase delta. Correct?

  33. I have a problem with my neural net, which I built from scratch following these videos. Most of the time my net gives pretty confident answers like [0.999987, 0.000323] (so far I've only tested with self-created data, e.g. input [1,2,3,4] should give [1,0] as an output), but sometimes, with different initializations of the weights and biases, the training ends with the feedforward giving strange answers like [0.004, 0.000003]. There's still a clear distinction between the probability of the right answer and the wrong answer, but it is nowhere near the confident answer [1,0]. What is going on here? Is it that my gradient descent finds a local minimum which gives the said [0.004, 0.000003] as an output and gets stuck there? Is this a common problem with neural nets? Should I try to find initial configurations for which the local minimum gives the least error?

  34. Hope to soon see one more episode on the same topic in matrix notation, which would make it easier to relate to an actual implementation.

  35. Many guys claim to know. Some guys actually know. But only one guy actually knows and can also explain it to his grandma, with very beautiful animations. You are that ONE!!!

  36. I wish I saw this video much earlier since I'm good at chain rule and also optimization problems. I attended a lecture on neural networks in 1984. I didn't really understand how one can determine weights without fitting. Looked to me like back propagation was a swindle until I saw this video. Now I can show my friends using a couple of lines on the whiteboard.

  37. I have been trying to understand how to host a 'hello_world' Python server for about a week and I still don't understand it. I have watched your 4 videos a few times and made my own neural network that can understand the world. Man, I wish people who teach were at least 10% as good at teaching as you.

  38. Just about an hour ago, I was a total alien to the AI guys, especially when they said a "machine learns". Hats off to your selflessness for making even a medico able to understand how a machine actually learns; it was relatively easy once I compared it to the natural neural networks in our nervous system. Mark my words: with AI raging in healthcare, your videos will be the connecting link for someone outside AI to learn how AI works.

  39. Hey for all of you getting discouraged because you don’t understand this – that was me last year. I went and taught myself derivatives and came back to try again and suddenly I understand everything. It’s such an amazing feeling to see that kind of work pay off. Don’t give up kiddos

  40. Congratulations, fellow learner, on making it this far. You are/are going to be a good machine learning engineer. (I am just telling myself that.)

  41. I understand what happens at the first level, but then what? We have the derivatives for the a(L-1)'s, and then do we recalculate another cost function, but now with (dC/da(L-1))^2 instead of (a(L)-y)^2?

  42. I'm currently in 11th grade learning calculus, and this video gave me an awesome "WTF, it makes sense" moment; I had to think about it for some time, though. Awesome video!

  43. Can someone explain to me how to continue updating the weights? To update the weights feeding the output layer you do ErrorFunc' * ActivationFunc' * a(L-1) = nudges to W(L).

    From the video:
    ∂C/∂a(L) * ∂a(L)/∂z(L) * ∂z(L)/∂w(L)

    How do you extend the chain? Do you multiply that chain by W(L-1), OR by W(L-1) and a(L-2) as well, since the previous update ended in multiplying by a(L-1)?

    So:
    ErrorFunc' * ActivationFunc' * a(L-1) * W(L-1) * a(L-2) = nudges to W(L-1)
    In notation:
    ∂C/∂a(L) * ∂a(L)/∂z(L) * ∂z(L)/∂w(L) *  ∂a(L-2)/∂z(L-1) * ∂z/∂w(L-1)
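    A minimal sketch of how the chain extends, for the one-neuron-per-layer case the video uses (all values are made up): the extra links are ∂z(L)/∂a(L-1) = w(L), i.e. the weight of the later layer, followed by ∂a(L-1)/∂z(L-1) and ∂z(L-1)/∂w(L-1) = a(L-2):

    import numpy as np

    def sig(z):
        return 1 / (1 + np.exp(-z))

    def sig_prime(z):
        s = sig(z)
        return s * (1 - s)

    # A chain of one-neuron layers: a0 -> a1 -> a2, with target y.
    a0, w1, b1, w2, b2, y = 0.5, 0.8, 0.1, -1.2, 0.3, 1.0

    z1 = w1 * a0 + b1; a1 = sig(z1)
    z2 = w2 * a1 + b2; a2 = sig(z2)

    # Nudge direction for the last weight (exactly the chain from the video):
    # ∂C/∂w(L) = ∂z(L)/∂w(L) * ∂a(L)/∂z(L) * ∂C/∂a(L)
    dC_dw2 = a1 * sig_prime(z2) * 2 * (a2 - y)

    # One layer back: multiply by w2 (∂z(L)/∂a(L-1)), then by sigma'(z1)
    # (∂a(L-1)/∂z(L-1)), and end with a0 (∂z(L-1)/∂w(L-1)).
    dC_dw1 = a0 * sig_prime(z1) * w2 * sig_prime(z2) * 2 * (a2 - y)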

  44. This is easily my favorite YouTube channel! Why not continue the series with something like convolutional neural networks?
