This video is brought to you in thanks to Squarespace. Whether you need a domain, website or online store, make it with Squarespace! In the last video in this series, we discussed the biologically inspired structure of deep learning neural networks and built up an abstracted model based on that. We then went through the basics of how this model is able to form representations from input data. This video continues right where the last one left off, as we delve deeper into the structure and mathematics of neural nets to see how they form their pattern recognition capabilities!
In my opinion, the best way to understand any complex topic is to work through an intuitive example. To understand how deep learning systems build up layers of representation for pattern recognition, the example we will focus on is an image, as it is the easiest to represent visually, with the goal of detecting higher-level structures in the image. Even this simple problem statement carries a lot of ambiguity, so let’s start by defining the input layer. The input to the system is obviously the image we want to analyze, but for a computer we’ll have to be more precise than that: the input is the set of pixels that comprise the image. In our case we’ll use a 3 by 3 image, so 9 pixels as the input to the system, each represented by a node.
Now that we have our input layer, let’s assign values to what this input information means. For this example, we won’t care about the RGB color value of a pixel, just the Y component, the luminance value of a pixel. We will define a bright pixel, in other words a pixel with white luminance, as having a value of +1; a dark or transparent pixel, black, as -1; and a pixel of ambiguous luminance, grey, as 0. So, now that we have our input layer well-defined, we can begin to define the output we want to see.
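As a rough sketch of the input encoding just described, the snippet below flattens a 3 by 3 image into 9 input values. The video only fixes three reference points (white is +1, grey is 0, black is -1); the linear mapping from 8-bit luminance and the names `encode_luminance` and `encode_image` are assumptions made here for illustration.

```python
def encode_luminance(luma):
    """Map an 8-bit luminance (0..255) linearly onto [-1.0, +1.0].

    Assumption: a linear scale, so 0 (black) -> -1, 127.5 (mid grey) -> 0,
    and 255 (white) -> +1. The video does not specify the exact mapping.
    """
    if not 0 <= luma <= 255:
        raise ValueError("luminance must be in the range 0..255")
    return luma / 127.5 - 1.0


def encode_image(rows):
    """Flatten a 3x3 grid of luminance values into 9 input-node values,
    row by row (pixel 1 is top-left, pixel 9 is bottom-right)."""
    return [encode_luminance(p) for row in rows for p in row]


# A white plus sign on a black background, as raw luminance values.
image = [
    [0, 255, 0],
    [255, 255, 255],
    [0, 255, 0],
]
inputs = encode_image(image)
print(inputs)
```

Each entry of `inputs` corresponds to one node of the input layer.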
As stated earlier, we want to detect higher-level structures in the image, but just as with the input layer, this definition comes loaded with ambiguity. For now, we’ll just set arbitrary nodes for 4 different types of structures we want to look for in the image: a square, a V, an X and a cross. With these output nodes now set, let’s go back to the start of our network and begin filling it in towards the output, to see representation being built layer by layer.
So, coming back to the input layer, with 9 nodes representing the pixels in our image, let’s start by considering the receptive field of each node. A receptive field is simply the input value that maximizes the current node’s activation. The checkered areas show the part of the input the node doesn’t care about; in other words, a change in those values won’t affect that particular node. In the case of our example, the activation of each input node is at its maximum when its pixel has a white value, +1.
Now to build a neural network, let’s begin to add additional layers, referred to as hidden layers, as they are not exposed to the user for interaction. The number of hidden layers is where the ‘deep’ in deep learning comes from, with any network with more than 1 hidden layer being referred to as a deep network, and otherwise as a shallow network. For now, we’ll add 1 hidden layer with 1 node in that layer, officially making our system a shallow network! As we mentioned in the last video in this series, a node in a neural network is simply the sum of all the inputs into it, with these inputs being all of the previous layer’s output values. As we randomize our image and pass along the pixel value of each node, you can see the input value to our hidden layer node changing. Now in a typical network we don’t just pass the raw value from the input node; instead we take this value and multiply it by another value, referred to as a connection weight.
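The idea of multiplying each incoming value by a connection weight before summing can be sketched in a few lines. The function name `weighted_sum` and the example weights are hypothetical and chosen only for illustration; the video does not give numeric weights.

```python
def weighted_sum(inputs, weights):
    """Return the sum of each input multiplied by its connection weight."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    return sum(w * x for w, x in zip(weights, inputs))


# Three pixels feeding one hidden node (illustrative values only).
pixels = [1.0, -1.0, 0.0]    # white, black, grey
weights = [0.5, 0.5, 2.0]    # hypothetical connection weights
print(weighted_sum(pixels, weights))

# With all weights equal to 1 this reduces to the plain sum of the inputs.
print(weighted_sum([1.0] * 9, [1.0] * 9))
```

Note that nothing bounds the result: with more inputs or larger weights the sum can grow arbitrarily large in either direction.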
So then, a node in a non-input layer is actually the sum of all the weighted inputs into it. This weight value, as briefly mentioned in the previous video, is how strongly a particular node connects to output nodes, essentially making an output node the sum of the inputs that strongly activate it. While low-weighted nodes will still have some minute effect, the larger weights are really what impact the output node. The reason for needing a weighted value, and not just passing the raw value of a node, is so that the actual value of the node isn’t changed during error correction methods such as gradient descent and backpropagation, just the importance of that node. This will make more intuitive sense in the next video in this series, where we will cover these methods. Coming back to our example, for the sake of visual representation, black lines will represent negative weights, white lines positive weights, and the thickness of a line the connection strength. And now, with weights incorporated in our example, we can see how the input to our hidden layer node is affected by random input images.
Now with the input done, what about the output of our hidden layer? Rather than just passing the input value off to the next layer, in hidden layers activation functions are often applied to the node to remap the input value. Activation functions are necessary in neural networks to control the flow of values between hidden layers. For the input to a neural network we can set the values ourselves, like we did by mapping pixel luminance to a value between -1 and +1. With hidden layers, in theory, the weighted sum of a node can range from negative to positive infinity, especially for larger systems with hundreds to thousands of nodes in a layer. If nodes in hidden layers were just to pass along this raw value, then no meaningful output would ever be produced.
Since the node values aren’t bounded, nodes in the same layer could have vastly different values with no coherence at all. This is where activation functions come in, to add boundaries to our raw node values and get them to activate accordingly. In other words, they convert the input signal into a node to an output signal which can be used and understood by the following layers in the network in a predictable way. Activation functions also add non-linearity to neural networks; for the sake of time I will not delve into this concept in this video, however in the next video in this series we will cover it much more in-depth.
Now, the activation function we will use for this layer of our example will be the hyperbolic tangent function. This function maps all the values on the x-axis to a y-value between -1 and +1, with a smooth transition in between and approaching asymptotes on either side as x heads to infinity. All this really means is that really negative values get assigned to roughly -1, really positive values to roughly +1, and values close to 0 essentially keep their value. As a side note, this activation function is essentially a rescaled version of one of the most well-known activation functions, the sigmoid, or as 3Blue1Brown would call it, the squishification function, which maps all values on the real axis of x to a y-value between 0 and 1. Additionally, as a side-side note, these functions are often preferred over a step function as they transition more smoothly between different input values. We will discuss the effects of different activation functions more in depth in the next videos in this series.
Coming back on topic, you can now see how the activation function maps our input value for various input images. Now for the sake of visual representation, we will compress all this back into a single node with the activation function drawn on it, and with this we have successfully added and implemented our first node to our first hidden layer.
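To make the hidden node concrete, here is a minimal sketch that applies the hyperbolic tangent to a weighted sum, and also checks the side note that tanh is a rescaled sigmoid, using the identity tanh(x) = 2·sigmoid(2x) − 1. The names `hidden_node` and `tanh_from_sigmoid` and the weight values are assumptions for illustration.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: maps any real x to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_from_sigmoid(x):
    """tanh written as a rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


def hidden_node(inputs, weights):
    """Weighted sum of the inputs, squashed into [-1, +1] by tanh."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return math.tanh(z)


# The two formulations of tanh agree up to floating-point rounding.
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(x, math.tanh(x), tanh_from_sigmoid(x))

# Even a large raw sum (9 inputs * weight 5 = 45) stays within the bounds.
print(hidden_node([1.0] * 9, [5.0] * 9))
```

The loop also shows the behavior described above: inputs near 0 come out almost unchanged, while inputs far from 0 are pushed toward -1 or +1.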
Let’s now add a few more nodes, while keeping in mind that for the sake of simplicity and visual representation we are only showing a subset of all the nodes in this hidden layer. In actuality there could be tens to hundreds of nodes in this layer; the choice of the number of nodes is known as a hyperparameter and will be covered later on in the deep learning series. Additionally, while we are building up these hidden layers, once again for the sake of simplicity and visual representation we will only show the connections of nodes that strongly activate each other. So, nodes that have low weights will not be drawn, to keep this diagram clutter free.
Ok, so now with the housekeeping out of the way, as you can see the receptive fields of the nodes in our hidden layer have gotten much more complicated, and are now combining 3 input nodes to make vertical lines. In the case of our example, input nodes 1, 4 & 7 create a vertical line looking at the first column of the image; 2, 5 & 8 the second column; and 3, 6 & 9 the third and final column. In addition to these triplets of pixels, you can also see that they’re comprised of different color combinations depending on the weights. For example, the first node in our hidden layer, which produces a black vertical line in column 1, has all negative weights attached to it; or take the fifth node in our hidden layer, which has a negative weight for one pixel, creating a white-black-white vertical line.
Now, because we took our time with the first layer, making sure our input and output values were mapped with an activation function, we can simply add another layer by repeating the process we used for the first hidden layer. This process can be repeated ten, a hundred or even a thousand times for additional layers in more complex problems. Notice now how the receptive fields have become even more complex. We went from individual pixels, to vertical lines of pixels, and now we are combining these vertical lines, thereby covering all the pixels in the image, and we are finally beginning to see shapes! As a side point, with more than 1 hidden layer, we can now call this system a deep neural network! Coming back on topic, once again you can see that the weights are altered to produce different color combinations in the receptive fields of our nodes. For example, node 3 has negative weights on the leftmost and rightmost vertical lines to produce a black-white-black pattern, and a positive weight on the center vertical line to keep the same white-black-white pattern. When combined, a black X is formed.
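The two-layer build-up described here, column line detectors feeding a shape detector, might look roughly like the sketch below. All weight values (±1), the helper names, and the choice of three white-black-white detectors are assumptions for illustration; the video only describes the signs of the weights.

```python
import math


def node(inputs, weights):
    """One hidden node: weighted sum followed by tanh."""
    return math.tanh(sum(w * x for w, x in zip(weights, inputs)))


def column(image, k):
    """Pixels are numbered 1..9 row by row, so column k (1-based) holds
    pixels k, k+3 and k+6. Returns those three values from a flat image."""
    return [image[k - 1], image[k + 2], image[k + 5]]


# Hypothetical weights for a white-black-white vertical line detector.
WBW = [1.0, -1.0, 1.0]


def black_x_score(image):
    """Second-layer node: negative weights on the left and right line
    detectors (preferring black-white-black there) and a positive weight
    on the center one (preferring white-black-white)."""
    lines = [node(column(image, k), WBW) for k in (1, 2, 3)]
    return node(lines, [-1.0, 1.0, -1.0])


BLACK_X = [-1, 1, -1,
            1, -1, 1,
           -1, 1, -1]
WHITE_X = [-p for p in BLACK_X]

print(black_x_score(BLACK_X))   # close to +1: matches the receptive field
print(black_x_score(WHITE_X))   # close to -1: the exact opposite image
```

The same `node` function is reused for both layers, which mirrors the point that adding a layer is just repeating the process used for the first one.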
Now one might think we can call it a day here, since we are now producing the shapes we intended to see; however, in our example we will add one final hidden layer before the output, a rectifying layer. The rectified linear unit, ReLU for short, is a type of node that incorporates a rectified linear activation function. Simply put, for negative values a 0 is output, and positive values keep their value, due to the y=x linear relationship of the function. In the case of our example, because we have properly tuned our input values and previous activation functions, we will only ever get values between -1 and +1. So, this ReLU for our system will simply turn any value from -1 to 0 into a 0, and keep any positive value up to +1 the same. You will see why this layer was added shortly when we run through some test images.
Coming back to our example, let’s bring back the output layer we set at the start of this video and as you can see, we now have a completed neural network! In this example’s case we have 3 hidden layers that are used to classify patterns in an input image. So, now that we have this network, let’s put it to the test and see how it will behave for an input image, in this case an inverted cross. When this image is propagated through the network, the first layer it encounters is the input layer, where the receptive fields are the individual pixels of the image. As you can see, these input nodes take the raw brightness value from the pixels of the image. Moving on to our first hidden layer is where things start to get interesting. Based on the weights we assigned to our network when we were first setting it up, our hidden layer nodes will get a corresponding value. For node 1 you can see that we have 3 non-zero values being input into it: a negative value with a negative weight, a positive value with a negative weight and another negative value with a negative weight.
Summing the inputs and passing the result through the activation function yields an output of around 0.66, and this makes sense since 2 out of the 3 pixels in our receptive field line up with our input. Now, using this methodology, we can also write out the values for the other nodes in this layer. For nodes 2, 4 and 8 we can see that the input value would be -1. Now that we have one hidden layer understood, the same general pattern is followed for the next layer as well. For instance, node 4’s value ends up being -1. This intuitively makes sense, as our input image is the exact opposite of what this node’s receptive field is. In more technical terms, we have 3 sets of negative values and 3 positive weights, yielding -1 after the activation function is applied. Finally, going to our last hidden layer, for the seventh node we feed a negative value with a positive weight as the input to the rectified linear unit. Based on the rectified linear activation function this negative value gets set to 0, while for the eighth node, a negative value with a negative weight produces a +1 value.
Now as you can see, our network has successfully identified the cross output node, and it was also close to activating the X output node. This makes intuitive sense, as columns 1 & 3 of the inverse cross and X images both follow the same black-white-black pattern, and the images differ by only 1 node, the center pixel in column 2, which is black for the X and white for the inverse cross. And with this, we have successfully built a neural net that classifies patterns in an input image! Running through some more test images, you can see our network holds up and classifies other outputs as well!
Now neural networks typically aren’t this perfect. We essentially hand-built this network and cherry-picked only the nodes that would produce the results we are looking for and that had non-ambiguous values in their receptive fields, which therefore would yield non-ambiguous weighted values.
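The walk-through of node 1 and of the rectifying layer can be replayed in code. The inverted cross image below is reconstructed from the description in the video (columns 1 and 3 are black-white-black, column 2 is all white), and the weights of -1 are an assumption, since the video does not state their magnitudes. With these unit weights the node comes out a little higher than the roughly 0.66 quoted above; the exact number depends on the actual weights.

```python
import math


def relu(x):
    """Rectified linear unit: negatives become 0, positives pass through."""
    return max(0.0, x)


# Inverted cross, reconstructed from the description (assumption):
# black corners, white everywhere else. Pixels are listed row by row.
INVERTED_CROSS = [-1, 1, -1,
                   1, 1, 1,
                  -1, 1, -1]


def node1_output(image, weight=-1.0):
    """Node 1 looks at column 1 (pixels 1, 4, 7) with all-negative weights,
    i.e. its receptive field is a black vertical line in that column."""
    col1 = [image[0], image[3], image[6]]
    z = sum(weight * p for p in col1)   # (+1) + (-1) + (+1) for this image
    return math.tanh(z)


print(node1_output(INVERTED_CROSS))   # positive: 2 of 3 pixels match

# The final hidden layer: a negative value times a positive weight is
# clipped to 0, while a negative value times a negative weight survives.
print(relu(-1.0 * 1.0))    # seventh node in the walk-through
print(relu(-1.0 * -1.0))   # eighth node in the walk-through
```

This shows why the rectifying layer was added: strongly mismatched shapes are zeroed out instead of being passed on as negative evidence.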
As stated numerous times throughout this video, this was mainly for the sake of explanation, as the whole purpose of this video is to go layer by layer, showing the internals of a neural network and how the values and structure inside affect the representation it builds. In actuality, and as is the goal of deep learning which we discussed in part 1 of this series, this network structure is supposed to build representation on its own, with much less human engineering. How it will do so is through gradient descent and backpropagation, the methods that put the ‘learning’ in deep learning. This will be covered in the next video in this series!
Now as you’ve seen in this video, it takes a lot of work to build deep learning systems, but this doesn’t have to translate to building websites! Squarespace is the sponsor of this video and an online platform that makes building a website easy, with no need to worry about installing, patching or upgrading anything ever! In fact, our website,, was built through Squarespace! In today’s increasingly connected world, a digital presence with a website is a must-have, and whether you want a simple portfolio site, business site, online store or more, Squarespace has flexible templates for them that are optimized for both desktop and mobile use. Along with these templates, the sites come with a vast array of features, from mailing lists, podcast support and integrated analytics to much more! My personal favorite feature is the ability to customize the CSS on my site, making it much more personalized and unique! To support Singularity Prosperity and learn more about Squarespace, after your free trial go to and save 10% off your first purchase of a website or domain!
At this point the video has come to a conclusion, and I’d like to thank you for taking the time to watch it! If you enjoyed it, consider supporting me on Patreon or our YouTube membership to keep this channel growing!
Check out our website for more information, consider subscribing for more content and like our Facebook page for bite sized chunks of content! This has been Ankur, you’ve been watching Singularity Prosperity and I’ll see you again soon! [Music]

7 thoughts on “Building An AI (Neural Networks | What Is Deep Learning | Deep Learning Basics)”

  1. Become a YouTube member for many exclusive perks from early previews, bonus content, shoutouts and more! – AND – Join our Discord server for much better community discussions! – ALSO – This video was made possible by Squarespace. Sign up with this link and get 10% off your purchase of a website or domain after your free trial!

  2. Another incredible video as always, frankly this is becoming the best resource for conceptually understanding how neural networks work. 3 Blue 1 Brown's videos on it are also great, but I think this series breaks it down even better.

  3. Amazing, but a lot of work to setup in order to get the result. I can’t wait for next video to show how it can be done autonomously. Scary when think about it!

  4. I got confused on the function part, I am trying to understand how neural networks work, so I could maybe make my own basic neural network on guessing the number drawn.
