Neural Networks - Model Representation I

Abstract: This article is the video transcript of Lesson 64, "Model Representation I", in Chapter 9 ("Neural Network Learning") of Andrew Ng's Machine Learning course. I wrote it down while studying the videos and lightly edited it to make it more concise and easier to read, so that I can refer back to it later, and I am now sharing it with everyone. If there are any mistakes, corrections are very welcome, and I sincerely thank you in advance. I also hope it is helpful for your studies.
 

In this video, I want to start telling you about how we represent neural networks; in other words, how we represent our hypotheses, or how we represent our model, when using a neural network.

Neural networks were developed as a way of simulating neurons, or networks of neurons, in the brain. So, to explain the hypothesis representation, let's start by looking at what a single neuron in the brain looks like. Your brain and mine are jam-packed full of neurons like these. Neurons are cells in the brain, and there are two things to draw attention to. First, the neuron has a cell body, like so (blue circle). Moreover, the neuron has a number of input wires, called dendrites; think of them as input wires that receive inputs from other locations. The neuron also has an output wire called the axon, and this output wire is what it uses to send signals, or messages, to other neurons. So, at a simplistic level, a neuron is a computational unit that gets a number of inputs through its input wires, does some computation, and then sends the output via its axon to other nodes or other neurons in the brain.

Here's an illustration of a group of neurons. The way that neurons communicate with each other is with little pulses of electricity; these are also called spikes, but they're just little pulses of electricity. So, here's one neuron, and if it wants to send a message, what it does is send a little pulse of electricity via its axon to some different neuron. Here, this axon, this output wire, connects to the input wire, or dendrite, of this second neuron over here, which then accepts the incoming message, does some computation, and may in turn decide to send out its own messages on its axon to other neurons. This is the process by which all human thought happens: these neurons doing computations and passing messages to other neurons as a result of the inputs they've received. And by the way, this is how our senses and our muscles work as well. If you want to move one of your muscles, the way that works is that a neuron may send these pulses of electricity to your muscle, and that causes your muscle to contract; and if a sensor like your eye wants to send a message to your brain, what it does is send pulses of electricity to a neuron in your brain, like so.

In a neural network, or rather in an artificial neural network that we implement on a computer, we're going to use a very simple model of what a neuron does: we're going to model a neuron as just a logistic unit. So, when I draw a yellow circle like that, you should think of it as playing a role analogous to the body of a neuron. We then feed the neuron a few inputs via its dendrites, or input wires, and the neuron does some computation and outputs some value on its output wire, which in a biological neuron is the axon. Whenever I draw a diagram like this, it represents a computation of h_{\theta}(x)=\frac{1}{1+e^{-\theta^{T}x}}, where, as usual, x and \theta are our parameter vectors, like so. So, this is a very simple, maybe vastly oversimplified, model of the computation that the neuron does, where it gets a number of inputs x_{1}, x_{2}, x_{3} and outputs some value computed like so. When I draw a neural network, usually I draw only the input nodes x_{1}, x_{2}, x_{3}; sometimes, when it's useful to do so, I draw an extra node for x_{0}. This x_{0} node is sometimes called the bias unit or the bias neuron, but because x_{0} is always equal to 1, sometimes I draw it and sometimes I won't, depending on which is more notationally convenient for that example. Finally, one last bit of terminology: when we talk about neural networks, sometimes we'll say that this is an artificial neuron with a sigmoid, or logistic, activation function. In neural network terminology, the activation function is just another term for that non-linearity g(z)=\frac{1}{1+e^{-z}}. And whereas so far I've been calling \theta the parameters of the model, and I'll mostly continue to use that terminology, in the neural networks literature you might sometimes hear people talk about the weights of a model; weights means exactly the same thing as the parameters of the model. I'm mostly going to use the terminology "parameters" in these videos, but sometimes you may hear others use the "weights" terminology. So, this little diagram represents a single neuron.
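As a concrete illustration, here is a minimal NumPy sketch of the single logistic unit described above. The feature values and the parameter vector are made-up placeholders for illustration, not learned values.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(x, theta):
    """Single artificial neuron: h_theta(x) = g(theta^T x).

    x     -- input vector with the bias term x_0 = 1 prepended
    theta -- parameter (weight) vector of the same length as x
    """
    return sigmoid(theta @ x)

# Example: three features x_1, x_2, x_3 plus the bias unit x_0 = 1.
x = np.array([1.0, 0.5, -1.2, 3.0])       # [x_0, x_1, x_2, x_3]
theta = np.array([0.1, 0.4, -0.3, 0.2])   # illustrative values, not learned
print(logistic_unit(x, theta))            # a value in (0, 1)
```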

A neural network is just a group of these different neurons strung together. Concretely, here we have input units x_{1}, x_{2}, x_{3}, and once again, sometimes we draw this extra node x_{0} and sometimes not; I've drawn it in here. And here we have three neurons, which I've written a^{(2)}_{1}, a^{(2)}_{2}, a^{(2)}_{3} (I'll say more about these superscript and subscript indices later). And once again, if we want to, we can add an extra bias unit a^{(2)}_{0} here; it always outputs the value 1. Then finally we have this third node in the final layer, and it's this third node that outputs the value that the hypothesis h_{\theta}(x) computes. To introduce a bit more terminology, in a neural network the first layer is also called the input layer, because this is where we input our features x_{1}, x_{2}, x_{3}. The final layer is also called the output layer, because that layer has the neuron, this one over here, that outputs the final value computed by the hypothesis. And layer two, in between, is called the hidden layer. The term hidden layer isn't great terminology, but the intuition is that in supervised learning you get to see the inputs and you get to see the correct outputs, whereas the hidden layer's values are ones you don't get to observe in the training set; they're not x and they're not y, and so we call them hidden. Later on we'll see neural networks with more than one hidden layer, but in this example we have one input layer, layer 1; one hidden layer, layer 2; and one output layer, layer 3. Basically, anything that isn't an input layer and isn't an output layer is called a hidden layer. So, I want to be really clear about what this neural network is doing; a small code sketch follows below, and then let's step through the computational steps that are represented by this diagram.
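To make "a group of neurons strung together" concrete, here is a small sketch that wires three of the logistic units above into the three-unit hidden layer and single output unit of this example network. All weight values are placeholders chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unit(inputs, weights):
    """One logistic unit: sigmoid of a linear combination of its inputs."""
    return sigmoid(weights @ inputs)

# Input layer (layer 1): features x_1, x_2, x_3 with the bias unit x_0 = 1.
x = np.array([1.0, 0.5, -1.2, 3.0])

# Hidden layer (layer 2): three units, one weight vector per unit.
# These numbers are placeholders, not learned parameters.
w_hidden = [np.array([0.1, 0.4, -0.3, 0.2]),
            np.array([0.3, -0.1, 0.5, 0.0]),
            np.array([-0.2, 0.2, 0.1, 0.4])]
a2 = np.array([unit(x, w) for w in w_hidden])    # a^(2)_1, a^(2)_2, a^(2)_3
a2 = np.concatenate(([1.0], a2))                 # add bias unit a^(2)_0 = 1

# Output layer (layer 3): one unit whose output is h_theta(x).
w_out = np.array([0.05, 0.6, -0.4, 0.3])
print(unit(a2, w_out))
```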

To explain the specific computations represented by a neural network, here's a little bit more notation. I'm going to use a^{(j)}_{i} to denote the activation of neuron i, or unit i, in layer j. So, concretely, a^{(2)}_{1} is the activation of the first unit in layer 2, our hidden layer. By activation, I just mean the value that is computed by and output by a specific unit. In addition, our neural network is parameterized by these matrices \Theta^{(j)}, where \Theta^{(j)} is a matrix of weights controlling the function mapping from one layer to the next, say from the first layer to the second layer, or from the second layer to the third layer. So, here are the computations represented by this diagram. The first hidden unit has its value computed as follows: a^{(2)}_{1} is equal to the sigmoid function, or sigmoid activation function (also called the logistic activation function), applied to a linear combination of its inputs, a^{(2)}_{1}=g(\Theta^{(1)}_{10}x_{0}+\Theta^{(1)}_{11}x_{1}+\Theta^{(1)}_{12}x_{2}+\Theta^{(1)}_{13}x_{3}). The second hidden unit has its activation value computed similarly, a^{(2)}_{2}=g(\Theta^{(1)}_{20}x_{0}+\Theta^{(1)}_{21}x_{1}+\Theta^{(1)}_{22}x_{2}+\Theta^{(1)}_{23}x_{3}), and likewise for the third hidden unit, a^{(2)}_{3}=g(\Theta^{(1)}_{30}x_{0}+\Theta^{(1)}_{31}x_{1}+\Theta^{(1)}_{32}x_{2}+\Theta^{(1)}_{33}x_{3}). So here we have three input units and three hidden units, and so the dimension of \Theta^{(1)}, the matrix of parameters governing our mapping from the three input units to the three hidden units, is going to be 3×4. More generally, if a network has s_{j} units in layer j and s_{j+1} units in layer j+1, then the matrix \Theta^{(j)}, which governs the function mapping from layer j to layer j+1, will have dimension s_{j+1}\times (s_{j}+1). Just to be clear about this notation: the number of rows is s_{j+1}, and the number of columns is s_{j} plus 1. So, we've talked about how the three hidden units compute their values. Finally, in the output layer we have one more unit, which computes h_{\Theta}(x); that can also be written as a^{(3)}_{1}, and it's equal to g(\Theta^{(2)}_{10}a^{(2)}_{0}+\Theta^{(2)}_{11}a^{(2)}_{1}+\Theta^{(2)}_{12}a^{(2)}_{2}+\Theta^{(2)}_{13}a^{(2)}_{3}). You'll notice that I've written this with a superscript (2), because \Theta^{(2)} is the matrix of parameters, or the matrix of weights, that controls the function mapping from the hidden units, that is, the layer 2 units, to the one layer 3 unit, that is, the output unit. To summarize, what we've done is show how a picture like this one over here defines an artificial neural network, which defines a function h that maps from the input values x to, hopefully, some predictions y. These hypotheses are parameterized by parameters that I'm denoting with a capital \Theta, so that as we vary \Theta we get different hypotheses, that is, different functions from x to y. So, this gives us a mathematical definition of how to represent the hypothesis in a neural network. In the next few videos, what I'd like to do is give you more intuition about what these hypothesis representations do, as well as go through a few examples and talk about how to compute them efficiently.
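Putting the pieces together, here is a minimal sketch of these computations for the 3-3-1 network in this example, written with one weight matrix per layer whose shape follows the s_{j+1} × (s_j + 1) rule above. The random weight values are placeholders, not learned parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, thetas):
    """Forward computation for a fully connected network of logistic units.

    x      -- input feature vector (without the bias term)
    thetas -- list of weight matrices; thetas[j] maps layer j+1 to layer j+2
              and has shape (s_{j+1}, s_j + 1), the "+1" accounting for bias
    """
    a = x
    for theta in thetas:
        a = np.concatenate(([1.0], a))   # prepend the bias unit (value 1)
        a = sigmoid(theta @ a)           # activations of the next layer
    return a

# The network from this video: s_1 = 3 inputs, s_2 = 3 hidden units,
# s_3 = 1 output unit. Weight values are random placeholders, not learned.
rng = np.random.default_rng(0)
theta1 = rng.standard_normal((3, 4))     # s_2 x (s_1 + 1)
theta2 = rng.standard_normal((1, 4))     # s_3 x (s_2 + 1)

x = np.array([0.5, -1.2, 3.0])           # x_1, x_2, x_3
h = forward(x, [theta1, theta2])         # h_Theta(x) = a^(3)_1
print(h)
```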

<end>
