# Backpropagation calculus | Deep learning, chapter 4

## 100 thoughts on “Backpropagation calculus | Deep learning, chapter 4”



Two things worth adding here:

1) In other resources and in implementations, you'd typically see these formulas in some more compact vectorized form, which carries with it the extra mental burden to parse the Hadamard product and to think through why the transpose of the weight matrix is used, but the underlying substance is all the same.

2) Backpropagation is really one instance of a more general technique called "reverse mode differentiation" to compute derivatives of functions represented in some kind of directed graph form.
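For anyone curious what the vectorized form mentioned in (1) looks like in practice, here is a minimal NumPy sketch; the layer sizes, the sigmoid activation and the squared-error cost are illustrative assumptions, not anything fixed by the video:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1 - s)

rng = np.random.default_rng(0)

# Tiny 2-layer network: 3 inputs -> 4 hidden -> 2 outputs (sizes are arbitrary).
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

x = rng.normal(size=3)
y = np.array([1.0, 0.0])

# Forward pass, keeping the z's for the backward pass.
z1 = W1 @ x + b1; a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; a2 = sigmoid(z2)

# Backward pass with cost C = sum((a2 - y)^2).
delta2 = 2 * (a2 - y) * sigmoid_prime(z2)      # elementwise (Hadamard) product
delta1 = (W2.T @ delta2) * sigmoid_prime(z1)   # the transpose carries error backwards

grad_W2 = np.outer(delta2, a1)   # dC/dW2, one entry per weight
grad_W1 = np.outer(delta1, x)    # dC/dW1
```

The transpose appears because each hidden activation feeds several output neurons, so its sensitivity is a weighted sum over those neurons; that is exactly the sum the video writes out index by index.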

When propagating backwards, do you just take the activations of the next layer as if they were y? Or do you have to add the nudges, or what?

4:09 -> 4:11

(a^L – y)^2 = 2(a^L – y) would be 'a huge' ;D mistake :[ (the right-hand side is the derivative with respect to a^L, not the square itself)

(a^L – y)^2 = (a^L – y)(a^L – y) is what will work

That Backup diagram does help in calculating backprop

I don't recall hearing this video say anything about what value we should modify any one weight and bias by. At least one single example using values would have been nice.

Thank you very much for your amazing explanations. I'd been struggling a lot to understand backpropagation. While all the other explanations I found failed to make me clearly understand the topic, yours just worked wonders! Thanks again for such nice videos!

I attempted to create my own neural network from scratch based on this video series… However, this last video still jumps over a few bits, which likely proved to be my downfall, since the neural network really just does not work, unfortunately.

What an amazing video series. Everything was so well explained that I, having just learned this math, was able to follow along. I wish I could do more to support this channel.

I'd trained dozens of CNN models, but now I understand what I was doing there. Awesome content with such a beautiful soundtrack. 😊

This all aligns so well with Andrew Ng's ML course!

Nice, but none of it makes it any clearer how I would write the code to do this. It did not even cover the adjustment of the weights.
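On the weight-adjustment point: once backprop has produced the gradient, the update itself is a single step downhill per weight. A minimal sketch for a one-neuron "network"; the learning rate, starting values and iteration count are made-up illustration values, not from the video:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One weight, one bias: a = sigmoid(w*x + b), cost C = (a - y)^2.
w, b = 0.5, 0.0          # arbitrary starting values
x, y = 1.5, 1.0          # a single training example
lr = 0.1                 # learning rate: how big each "nudge" is

for _ in range(500):
    a = sigmoid(w * x + b)
    # Chain rule: dC/dw = dC/da * da/dz * dz/dw (and dz/db = 1).
    dC_da = 2 * (a - y)
    da_dz = a * (1 - a)
    w -= lr * dC_da * da_dz * x   # step against the gradient
    b -= lr * dC_da * da_dz * 1.0
```

After enough steps the output sigmoid(w*x + b) drifts toward the target y; with many weights the same update is applied to each one, using its own partial derivative.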

What's up with the number of layers?

Please, may I have some more?

Deep Learning? Convolutional Neural Networks?

How do you determine the desired value of a neuron in the hidden layer?

Brilliant, thanks for this work!

You are the best to learn from with no weights and bias attached!!!

At 4:44 I still have a problem calculating the weights. I don't understand how to calculate all the deltas for the sum over z(L). Please help.

One hundred percent subscribed. Amazing stuff, thanks for putting in the effort.

You are some sort of sorcerer. The chain rule was finally elucidated for me in about 20 seconds.

So concise and intuitive indeed.

That was one thing, for whatever reason, I couldn't get my head around as a younger student.

Big props man. Love this channel.

I admire you!

It would help a lot if you told us how you manage to gain such an understanding from books. By that I mean, what does it take to be able to make these videos? Does everything you know come from books? Did you have a good professor? Do you experiment with visualization tools to achieve the geometric interpretation? Or are you simply gifted at gaining this kind of understanding?

Please make videos on deep reinforcement learning, Q-learning, DQN. This is the only channel that explains all the maths behind BP and basic DL with the greatest visuals.

That's one of the worst ways to learn backprop; the idea comes from the chain rule and the rest is linear algebra. Better to go with the book in the description.

Nicely put. I now have a very good feeling for how nets are trained.

Is there a paper I can use?

I should put your name in the acknowledgements section of my thesis.

Great tutorial thanks for this series

This guy is amazing! Can't stop watching! Thank you again!

So good! Love your video!

That gradient, the nudge, and the little number line are just genius.

Holy mother of God!!!

How did I intuitively understand such a complex thing!!!!??

Grant, sir!!! You are a god to me.

Hello. I have a question. I watched a lot of videos and I can't figure out a thing about neural networks.

Is the bias common to all the neurons of a layer, or does every neuron have its own bias?

In some schematics the bias is like a neuron with activation 1, and it has a different weight connecting it to every neuron of the layer. In other schematics, the bias has one value as its activation and there are no weights (so the bias is equal for all the neurons of that layer).

Thank you very much!

How about keeping it simple, without derivatives? Just show the values and how they are subtracted and such.

It has taken me about 3-4 days' worth of time to understand all four of these lectures, lectures which are, in total, no longer than an hour and 30 minutes.

And I feel proud.

So, I get that the desired output (y) for the neurons in the output layer can either be a 0 or a 1. But what is the desired output (y) when calculating the gradients for the second-to-last layer of neurons? What activation do we actually desire for the layer behind the output layer?
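One way to see the answer to the question above: hidden neurons have no desired value y of their own. The role of "desired change" is played by ∂C/∂a^(L−1), which is a sum over all the output neurons that the hidden activation feeds. A small sketch with made-up numbers (sigmoid activation and squared-error cost assumed):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One hidden activation a_prev feeding two output neurons (all values made up).
a_prev = 0.6
weights = [1.2, -0.8]      # w_j: connection from a_prev to output neuron j
biases = [0.1, -0.2]
targets = [1.0, 0.0]       # desired outputs y_j exist only for the LAST layer

# Forward through the last layer.
zs = [w * a_prev + b for w, b in zip(weights, biases)]
outs = [sigmoid(z) for z in zs]

# No target for a_prev; instead, sum each output's sensitivity back through its weight:
# dC/da_prev = sum_j dC/da_j * da_j/dz_j * dz_j/da_prev
dC_da_prev = sum(
    2 * (a - y) * a * (1 - a) * w
    for a, y, w in zip(outs, targets, weights)
)
```

That summed sensitivity then takes the place of 2(a − y) when you repeat the same chain-rule step one layer further back.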

Simply fantastic!!!!

The amount of help you are providing is nothing short of amazing.

This 4-video series was very helpful and your explanations are awesome!

Thank you!

One of the best lectures I have ever heard. Great explanation of NNs, cost functions, activation functions etc. Now I understand NNs far, far better… (P.S. I saw the previous videos, Parts 1, 2 and 3, as well.)

is the cost function here the loss function or the averaged loss function?

Without this I never would've been able to make my first neural network, even though all it did was learn how to respond to Rock Paper Scissors when it already knows which one you're going to play (basically overfitting is the goal).

This is math at its best, and art at its highest. The grace of the animation, the subtle music, the perfectly paced narration and the wonderful colour scheme! Math and art; or let's say, math is art!

Thanks for the wonderful lectures. Expecting more lectures in this field..

Great video series… Thanks a lot for explaining something very complex so nicely…

Wow, so clear, thanks 😀

I'm not sure why the derivative of z(L) with respect to w(L) is a(L-1) though :/

Could anyone explain it to me? 🙂

It's great that you started it backwards, but when I started to program it I realized something: what is a^(L-1) when computing the first layer? Is it simply the value from the inputs?

z^(L) = w^(L) a^(L-1) + b^(L)

a^(L) = f(z^(L))
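The first relation above also answers the "why is the derivative a(L-1)?" question a few comments up: z^(L) is linear in w^(L), so nudging w^(L) by ε changes z^(L) by ε·a^(L-1), and at the very first layer a^(L-1) is indeed simply the raw input. A tiny numeric check (all values made up):

```python
def z(w, a_prev, b):
    return w * a_prev + b

# a^(L-1) and b^(L); at the first layer, a_prev is just the raw input value.
a_prev, b = 0.7, 0.3
w, eps = 1.5, 1e-6

# Nudge w and watch z change at exactly the rate a_prev.
dz_dw = (z(w + eps, a_prev, b) - z(w, a_prev, b)) / eps
# dz_dw comes out (numerically) equal to a_prev
```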

Thank you so so so much for making this series. Within an hour, I feel that I have learned a good deal about Neural Networks. You are amazing!

This video is art

Thank you. Love it.

1:55 BL?

Is there any learning material available on the internet for a simple neural net which goes step by step, computing actual values (cost functions, derivations) for all weights, biases and iterations? That would be very practical.

You are not a good teacher. You failed to extrapolate from the simple example to the general one. You should give an example with a 2-layer network, with the cost function and activation function, and then show, explicitly by writing it down, how the derivatives go all the way back.

Now I know the basics. I will let it sink in, then watch how to code it using Python.

4:10 haha i laughed for no reason at the thoo

Hey, can you make a video on RNNs? It would be of great help.

An excellent explanation.

This is SO MUCH computation. But amazing explanation, can't wait to implement.

This video doesn't actually suggest how one chooses a value to add to the weights, and the propagation seems to reach the first layer only – how are alterations added to the second and third layers?

You easily won one more subscriber.

Maybe one day I will be smart enough to understand it all.

Amazing stuff. Just… GREAT. I Cannot thank you enough!

At like 8:25, why is C0 a sum going from j = 0 to nL−1, nL being the number of neurons of the last hidden layer, with j then used in the output's subscript (aj)? Is that a mistake or am I missing something?

So with stochastic gradient descent, would you only change some of the weight values each iteration in the training phase?

Wow, this took a long time to get my head around fully, but I was finally able to understand it enough to implement my own version of backpropagation from scratch thanks to this video! Neural networks are something I've wanted to get into for a while and I'm really grateful for these wonderful in-depth explanations!

This is epic!

@3Blue1Brown, @9:25, I think wl+1 should be just wl. Please confirm

Are there any more coming in the series? I found this very helpful.

That unexplained little formula addition at the end (9:30), showing the partial derivative of the cost function with respect to the current node and layer when the current layer is not the output layer, really messed with me. In typical notation, that's a lowercase delta. Correct?

Isn't this just Ordinary Least Squares with data reduction?

You are genius.

I have a problem with my neural net, which I built from scratch following these videos. Most of the time my net gives pretty confident answers like [0.999987, 0.000323] (until now I've only tested with self-created data, like input [1,2,3,4] should give [1,0] as an output), but sometimes, with different initializations of the weights and biases, training ends with the feedforward giving some strange answers like [0.004, 0.000003]. There's still a clear distinction between the probability of the right answer and the wrong answer, but it is nowhere near the optimal confident answer [1,0]. What is going on here? Is it that my gradient descent finds a local minimum which gives the said [0.004, 0.000003] as an output and gets stuck there? Is this a common problem with neural nets? Should I try to find initial configurations for which the local minimum gives the least error?

This is the best explanation of the chain rule I've ever heard!!

Hoping to soon see one more episode on the same thing in matrix notation, which would make it easier to relate to an actual implementation.

Many guys claim to know. Some guys actually know. But only one guy actually knows and can explain to his grandma as well with very beautiful animations. You are that ONE !!!

I wish I saw this video much earlier since I'm good at chain rule and also optimization problems. I attended a lecture on neural networks in 1984. I didn't really understand how one can determine weights without fitting. Looked to me like back propagation was a swindle until I saw this video. Now I can show my friends using a couple of lines on the whiteboard.

This is so beautiful

I have been trying to understand how to host a 'hello_world' Python server for about a week and I still don't understand. I have watched your 4 videos a few times and made my own neural network that can understand the world. Man, I wish people who teach were at least 10% as good at teaching as you.

Thank you for such a great video!

I watched, I learned, I became a patron.

*meme of Fry saying "have my money"*

Just about an hour ago, I was totally alien to AI, especially when people said "machines learn". Hats off to your selflessness, making even a medico able to understand how a machine actually learns; it came relatively easily when I compared it to the natural neural networks in our nervous system. Mark my words: with AI raging in healthcare, your videos will be a connecting link for someone away from AI to learn how AI works.

Hey for all of you getting discouraged because you don’t understand this – that was me last year. I went and taught myself derivatives and came back to try again and suddenly I understand everything. It’s such an amazing feeling to see that kind of work pay off. Don’t give up kiddos

going through your video is like meditation… a blissful experience.. thank you so much!

brilliantly clear! love it! It really helps!

Helped a lot when I was doing my ML homework. Thanks <3

Need more videos on AI 😃

best series!! Thanks for this material!

No words can express my admiration for your work.

Congratulations, fellow learner, on making it this far. You are/are going to be a good Machine Learning Engineer. (I am just telling myself that.)

I understand what happens at the first level, but then what? We have the derivatives for the a(l-1)'s, and do we then recalculate another cost function, but now with (dC/da(l-1))^2 instead of (a(l)-y)^2?

I would be super interested in a video of you explaining how you make your videos 🙂

Commenting to help you with the YouTube algorithm, because these videos are great.

I love this channel. Thank you

this was really good

Awesome! This channel is unbelievably good. Thanks a lot, man.

Woww, now I understand

I'm currently in 11th grade learning calculus, and this video gave me an awesome "WTF, it makes sense" moment; I had to think about it for some time though. Awesome video!

Can someone explain to me how to continue updating the weights? To update the weights before the output layer, you do ErrorFunc' * ActivationFunc' * a(L-1) = nudges to W(L).

From the video:

∂C/∂a(L) * ∂a(L)/∂z(L) * ∂z(L)/∂w(L)

How do you extend the chain? Do you multiply that chain by W(L-1), OR by W(L-1) and a(L-2) as well, since the previous update ended in multiplying by a(L-1)?

So:

ErrorFunc' * ActivationFunc' * a(L-1) * W(L-1) * a(L-2) = nudges to W(L-1)

In notation (my attempt):

∂C/∂a(L) * ∂a(L)/∂z(L) * ∂z(L)/∂a(L-1) * ∂a(L-1)/∂z(L-1) * ∂z(L-1)/∂w(L-1)
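One possible resolution of the question above, sketched with scalar layers so every quantity is a single number (the input, weights and target are made up): the chain to w(L-1) goes through a(L-1), with ∂z(L)/∂a(L-1) = w(L), ∂a(L-1)/∂z(L-1) = σ'(z(L-1)), and ∂z(L-1)/∂w(L-1) = a(L-2). Checked against a finite difference:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w1, w2, x, b1=0.0, b2=0.0):
    z1 = w1 * x + b1;  a1 = sigmoid(z1)   # layer L-1
    z2 = w2 * a1 + b2; a2 = sigmoid(z2)   # layer L
    return z1, a1, z2, a2

x, y = 0.5, 1.0        # input a(L-2) and target (made up)
w1, w2 = 0.8, -1.1     # w(L-1) and w(L) (made up)

z1, a1, z2, a2 = forward(w1, w2, x)

# Full chain for the deeper weight w1 = w(L-1):
dC_da2 = 2 * (a2 - y)
da2_dz2 = a2 * (1 - a2)
dz2_da1 = w2            # z2 is linear in a1, so this term is just the weight
da1_dz1 = a1 * (1 - a1)
dz1_dw1 = x             # the activation one more layer back, a(L-2)
grad_w1 = dC_da2 * da2_dz2 * dz2_da1 * da1_dz1 * dz1_dw1
```

So the answer to "W(L-1), or W(L-1) and a(L-2)?" is: neither exactly; you multiply by w(L) (the weight one layer ahead), then σ'(z(L-1)), then a(L-2).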

This is easily my favorite youtube channel! Why not continue the series on something like convolutional neural networks?

Amazing Explanation.

Number of times I have watched this video before understanding

↓

I just learned chain rule in high school last week and I'm glad it has a real life application.