[Deep Learning Primer] NNDL Study Notes (Part 1)

Preface

http://neuralnetworksanddeeplearning.com

These are my study notes for this e-book. They are now largely complete; I will add notes on Softmax and some of the exercises when I find time.

Chapter 1: Using Neural Nets to Recognize Handwritten Digits

A neural network uses training examples to automatically infer rules for recognizing handwritten digits.

Two important types of artificial neuron: the perceptron and the sigmoid neuron.

The standard learning algorithm for neural networks: stochastic gradient descent (SGD).

Perceptrons

1. A perceptron is a method for weighing evidence to make decisions, and can compute the elementary logical functions.

A perceptron takes several binary inputs, x_1, x_2, \dots, and produces a single binary output:

The neuron's output, 0 or 1, is determined by whether the weighted sum w\cdot x\equiv \sum_j w_jx_j is less than or greater than some threshold value. The threshold is a real number which is a parameter of the neuron.

 

output=\begin{cases} 0 & \text{if } w\cdot x+b\leq 0\\ 1 & \text{if } w\cdot x+b>0 \end{cases}

Perceptrons are also universal for computation: a perceptron can implement a NAND gate, and NAND gates are universal.
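
To make the decision rule concrete, here is a minimal sketch (the function and variable names are mine; the NAND weights and bias are the example used in the book):

```python
import numpy as np

def perceptron(x, w, b):
    """Output 1 if w . x + b > 0, else 0 -- the decision rule above."""
    return 1 if np.dot(w, x) + b > 0 else 0

# The book's NAND example: weights (-2, -2) and bias 3.
w, b = np.array([-2, -2]), 3
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x), w, b))  # prints 1, 1, 1, 0 -- the NAND truth table
```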

Sigmoid Neurons

1. The crucial fact that makes learning work: a small change in a weight (or bias) causes only a small change in the output.

Activation function: f(w\cdot x+b); for the sigmoid neuron, f is the sigmoid function \sigma.

output=\frac{1}{1+\exp(-\sum_j w_j x_j-b)}=\frac{1}{1+\exp(-w\cdot x-b)}
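
A one-line NumPy version of this formula (a sketch; since np.exp works elementwise, the same definition later serves whole layers at once):

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z)); works for scalars and numpy arrays alike."""
    return 1.0 / (1.0 + np.exp(-z))

# The sigmoid neuron's output for input x, weights w, bias b:
# output = sigmoid(np.dot(w, x) + b)
```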

\Delta output is approximately a linear function of the changes \Delta w_j and \Delta b: \Delta output \approx \sum_j \frac{\partial\, output}{\partial w_j}\Delta w_j + \frac{\partial\, output}{\partial b}\Delta b

Exercises

1. Suppose we take all the weights and biases in a network of perceptrons and multiply them by a positive constant c>0. The behavior of the network doesn't change, because the sign of c(w\cdot x+b) is the same as the sign of w\cdot x+b.

2. Scaling a sigmoid network's weights and biases by c\to\infty reproduces the perceptron network, except on inputs where w\cdot x+b=0: there \sigma(c(w\cdot x+b))=\sigma(0)=\frac{1}{2} for every c, whereas a perceptron would output 0.

The Architecture of a Neural Network

1. MLPs = multilayer perceptrons

2. Feedforward NNs vs. recurrent NNs (in a recurrent network, a neuron's output affects its own input only at some later time).

A Simple Network to Classify Handwritten Digits

1. Learning with gradient descent

What we'd like is an algorithm which lets us find weights and biases so that the output from the network approximates y(x) for all training inputs x. To quantify how well we're achieving this goal, we define a cost function (sometimes referred to as a loss or objective function).

Quadratic cost function / mean squared error (MSE): C(w,b)\equiv \frac{1}{2n} \sum_x \|y(x)-a\|^2, where n is the number of training inputs and a is the network's output for input x.
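
Transcribed directly into NumPy (a sketch; the names `outputs`/`targets` are mine):

```python
import numpy as np

def quadratic_cost(outputs, targets):
    """C(w, b) = (1 / 2n) * sum over x of ||y(x) - a||^2.

    outputs: list of network outputs a (numpy vectors), one per training input.
    targets: list of desired outputs y(x), in the same order.
    """
    n = len(outputs)
    return sum(np.linalg.norm(y - a) ** 2 for a, y in zip(outputs, targets)) / (2.0 * n)
```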

 

Suppose in particular that C is a function of m variables v_1,\dots,v_m. Then \Delta C \approx \nabla C \cdot \Delta v.

Choose \Delta v = -\eta\nabla C, which gives the update rule v' = v - \eta\nabla C.
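
The update rule as a loop (a sketch with an assumed toy gradient, not the book's code):

```python
import numpy as np

def gradient_descent(grad, v, eta=0.1, steps=200):
    """Repeatedly apply the update rule v' = v - eta * grad(v)."""
    for _ in range(steps):
        v = v - eta * grad(v)
    return v

# Toy example: C(v) = ||v||^2 has gradient 2v, so descent should reach the origin.
v_min = gradient_descent(lambda v: 2 * v, np.array([1.0, -2.0]))
print(v_min)  # close to [0, 0]
```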

One problem: to compute the gradient \nabla C we need to compute the gradients \nabla C_x separately for each training input x, which is slow when the number of inputs is large.

Solution: stochastic gradient descent.

Estimate \nabla C from a small random sample of training inputs, a mini-batch of size m; this is a commonly used and powerful technique.

 \frac{\sum_{j=1}^{m} \nabla C_{X_j}}{m} \approx \frac{\sum_{x} \nabla C_x}{n} = \nabla C
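
One epoch of SGD then looks roughly like this (a sketch mirroring the structure of the book's Network.SGD; `update_mini_batch` is assumed to apply one gradient step estimated from the batch):

```python
import random

def sgd_epoch(training_data, m, update_mini_batch):
    """Shuffle the data, split it into mini-batches of size m, and take
    one gradient step per mini-batch."""
    random.shuffle(training_data)
    for k in range(0, len(training_data), m):
        update_mini_batch(training_data[k:k + m])
```

Setting m = 1 gives the online learning discussed in the exercise below.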

2. Ball-mimicking variations of gradient descent

These have advantages, but also a major disadvantage: it turns out to be necessary to compute the second partial derivatives of C (the Hessian), and this can be quite costly.

Exercises

An extreme version of gradient descent is to use a mini-batch size of just 1. This procedure is known as online, on-line, or incremental learning. In online learning, a neural network learns from just one training input at a time (just as human beings do).

One advantage: the weights and biases update after every single example, so learning proceeds faster.

One disadvantage: a single example may not be representative of the whole training set, so each gradient estimate is noisy, and the result depends heavily on the order in which examples are presented.

Implementing the Network to Classify Digits

With Python 2.7 and NumPy.

1. Network class

If w is the weight matrix connecting the 2nd layer to the 3rd layer, and a is the vector of activations of the 2nd layer, then the activations of the 3rd layer are: a'=\sigma(w a+b)

Vectorizing: apply the function \sigma elementwise to every entry in the vector w a + b.
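
Putting the last two points together, a feedforward pass in the spirit of the book's Network.feedforward (written here as a standalone function; in the book it is a method using self.weights and self.biases):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(a, weights, biases):
    """Apply a' = sigmoid(w a + b) layer by layer.

    weights[l] is the matrix connecting layer l to layer l+1;
    biases[l] is the bias (column) vector of layer l+1;
    a starts as the input activation and ends as the output activation.
    """
    for w, b in zip(weights, biases):
        a = sigmoid(np.dot(w, a) + b)  # np.exp vectorizes sigma elementwise
    return a
```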

2. Hyper-parameters

The number of epochs of training, the mini-batch size, and the learning rate \eta.
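
For example, training the book's three-layer network goes roughly like this (from memory of the book's code repository; `mnist_loader` and `network` are the book's helper modules, and the hyper-parameter values are the ones used in the chapter):

```python
import mnist_loader
import network

training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
net = network.Network([784, 30, 10])   # 784 input pixels, 30 hidden neurons, 10 outputs
# Hyper-parameters: 30 epochs, mini-batch size 10, learning rate eta = 3.0.
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)
```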

3. SVM (support vector machine)

Python library: scikit-learn, which provides a simple Python interface to a fast C-based library for SVMs known as LIBSVM.
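
As a rough illustration of the SVM baseline (a sketch using scikit-learn's small built-in digits dataset rather than the full MNIST set):

```python
from sklearn import datasets, svm

digits = datasets.load_digits()        # 8x8 digit images, not full MNIST
n = len(digits.data)
clf = svm.SVC()                        # backed by the C library LIBSVM
clf.fit(digits.data[:n // 2], digits.target[:n // 2])           # train on the first half
print(clf.score(digits.data[n // 2:], digits.target[n // 2:]))  # accuracy on the rest
```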

sophisticated algorithm ≤ simple learning algorithm + good training data.

Toward Deep Learning

Networks with this kind of many-layer structure (two or more hidden layers) are called deep neural networks.
