The first programming assignment of Andrew Ng's machine learning course. It's fairly easy, but I'm writing this post to keep a record.
We use the simplest possible neural network, logistic regression, to recognize whether an image is a "cat" or "non-cat".
Frameworks like TensorFlow are convenient, but building a network by hand is still a good way to deepen your understanding.
Without further ado, here's the code.
1. Pre-processing
import numpy as np
import matplotlib.pyplot as plt
import h5py
from lr_utils import load_dataset

This section is just the usual imports. Next, load the data:
# Loading the data (cat/non-cat)
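# load_dataset is the helper from lr_utils; the fifth return value (class names) isn't used below
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()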
Loading the data gives us:
- train_set_x_orig: training set, shape = (209, 64, 64, 3)
- train_set_y: training labels, shape = (1, 209), where 1 and 0 mean cat and non-cat
- test_set_x_orig: test set, shape = (50, 64, 64, 3)
- test_set_y: test labels, shape = (1, 50)
Each image is 64 × 64 pixels with 3 color channels (RGB). A quick check of the dataset's dimensions:
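m_train = train_set_x_orig.shape[0]   # number of training examples
m_test = test_set_x_orig.shape[0]     # number of test examples
num_px = train_set_x_orig.shape[1]    # height/width of each image

print("Number of training examples: m_train = " + str(m_train))
print("Number of testing examples: m_test = " + str(m_test))
print("Height/Width of each image: num_px = " + str(num_px))
print("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print("train_set_x shape: " + str(train_set_x_orig.shape))
print("train_set_y shape: " + str(train_set_y.shape))
print("test_set_x shape: " + str(test_set_x_orig.shape))
print("test_set_y shape: " + str(test_set_y.shape))

Output: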
Number of training examples: m_train = 209
Number of testing examples: m_test = 50
Height/Width of each image: num_px = 64
Each image is of size: (64, 64, 3)
train_set_x shape: (209, 64, 64, 3)
train_set_y shape: (1, 209)
test_set_x shape: (50, 64, 64, 3)
test_set_y shape: (1, 50)
Next we flatten each image, i.e. reshape each (num_px, num_px, 3) array into a single column vector of shape (num_px * num_px * 3, 1).
The trick used here is:
X_flatten = X.reshape(X.shape[0], -1).T # X.T is the transpose of X
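reshape(X.shape[0], -1) keeps the first axis (one row per example) and collapses all remaining axes into one, and the .T turns those rows into columns. A tiny sanity check with made-up shapes:

X = np.zeros((2, 4, 4, 3))                # 2 tiny 4x4 RGB "images"
print(X.reshape(X.shape[0], -1).T.shape)  # (48, 2): one flattened column per image

Applying it to our data: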
# Reshape the training and test examples
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
Output:
train_set_x_flatten shape: (12288, 209)
train_set_y shape: (1, 209)
test_set_x_flatten shape: (12288, 50)
test_set_y shape: (1, 50)
sanity check after reshaping: [17 31 56 22 33]
Next comes standardization: after dividing by 255, every RGB value lies between 0 and 1.

train_set_x = train_set_x_flatten / 255.
test_set_x = test_set_x_flatten / 255.
2. Network Structure and Learning Algorithm
There isn't much structure to speak of: just an input layer and an output layer. The algorithm is logistic regression.
There are m images in total. For the i-th image:

$$z^{(i)} = w^T x^{(i)} + b$$

$$\hat{y}^{(i)} = a^{(i)} = \sigma(z^{(i)})$$

The loss function is the cross entropy:

$$\mathcal{L}(a^{(i)}, y^{(i)}) = -y^{(i)} \log(a^{(i)}) - (1 - y^{(i)}) \log(1 - a^{(i)})$$

The final cost is the average of the losses over all training examples:

$$J = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(a^{(i)}, y^{(i)})$$
The main steps:
- Initialize the parameters of the model
- Learn the parameters for the model by minimizing the cost
- Use the learned parameters to make predictions (on the test set)
- Analyse the results and conclude
3. Building Blocks of the Algorithm
3.1 Sigmoid Function
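As a reminder, the sigmoid squashes any real number into the interval (0, 1), which lets us read the output as a probability:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$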
The corresponding code:
# GRADED FUNCTION: sigmoid
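def sigmoid(z):
    # z can be a scalar or a numpy array of any size
    s = 1 / (1 + np.exp(-z))
    return s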
3.2 Parameter Initialization
The parameters here are w and b, the weights and the bias. The weights are initialized to a vector of zeros (for logistic regression, unlike deeper networks, zero initialization works fine since there is no symmetry to break).
# GRADED FUNCTION: initialize_with_zeros
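def initialize_with_zeros(dim):
    # dim: number of input features (here 64 * 64 * 3 = 12288)
    w = np.zeros((dim, 1))  # weight vector, one entry per pixel value
    b = 0                   # bias, a scalar
    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))
    return w, b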
3.3 Forward and Backward propagation
Compute the gradients and the cost. Stacking all examples as columns of a matrix $X$ of shape $(12288, m)$, with $A = \sigma(w^T X + b)$ the row vector of activations, the gradients are:

$$\frac{\partial J}{\partial w} = \frac{1}{m} X (A - Y)^T$$

$$\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (a^{(i)} - y^{(i)})$$
# GRADED FUNCTION: propagate
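def propagate(w, b, X, Y):
    # X: data of shape (num_px * num_px * 3, m); Y: labels of shape (1, m)
    m = X.shape[1]
    # forward propagation: activations and cross-entropy cost
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    # backward propagation: gradients of the cost w.r.t. w and b
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m
    assert(dw.shape == w.shape)
    cost = np.squeeze(cost)
    grads = {"dw": dw, "db": db}
    return grads, cost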
This basically translates the formulas above into code.
About assert: in Python 3, assert evaluates an expression and raises an AssertionError when the expression is false.
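For example:

assert 1 + 1 == 2                    # passes silently
assert 1 + 1 == 3, "math is broken"  # raises AssertionError: math is broken

Here the asserts double-check that the shapes coming out of each function are what we expect.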
3.4 Optimization
Once we have the gradients, we can optimize the parameters, namely w and b, with gradient descent. For a parameter $\theta$, the update rule is

$$\theta := \theta - \alpha \, d\theta$$

where $\alpha$ is the learning rate.
# GRADED FUNCTION: optimize
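def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False):
    costs = []
    for i in range(num_iterations):
        # compute cost and gradients for the current parameters
        grads, cost = propagate(w, b, X, Y)
        dw = grads["dw"]
        db = grads["db"]
        # gradient descent update
        w = w - learning_rate * dw
        b = b - learning_rate * db
        # record the cost every 100 iterations
        if i % 100 == 0:
            costs.append(cost)
            if print_cost:
                print("Cost after iteration %i: %f" % (i, cost))
    params = {"w": w, "b": b}
    grads = {"dw": dw, "db": db}
    return params, grads, costs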
The comments in the code say it all; not much to add.
Next we need a function to predict whether an image in the test set is a cat.
At this point w and b are already trained, so we just take the test-set images, run them through the model, and call the result a cat if the output is greater than 0.5, and not a cat otherwise.
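A minimal sketch of such a predict function, following the conventions of the code above:

def predict(w, b, X):
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    A = sigmoid(np.dot(w.T, X) + b)       # predicted probabilities, shape (1, m)
    Y_prediction[0, :] = (A[0, :] > 0.5)  # threshold at 0.5: 1 = cat, 0 = non-cat
    return Y_prediction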
3.5 Putting It All Together
Combine all the functions above and train:
# GRADED FUNCTION: model
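def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
    # initialize parameters, run gradient descent, then predict on both sets
    w, b = initialize_with_zeros(X_train.shape[0])
    params, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    w, b = params["w"], params["b"]
    Y_prediction_train = predict(w, b, X_train)
    Y_prediction_test = predict(w, b, X_test)
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))
    # pack everything the later cells need (costs for plotting, etc.)
    d = {"costs": costs, "w": w, "b": b,
         "learning_rate": learning_rate, "num_iterations": num_iterations}
    return d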
And finally, let's run what we've built:

d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations=2000, learning_rate=0.005, print_cost=True)

The output:
Cost after iteration 0: 0.693147
Cost after iteration 100: 0.584508
Cost after iteration 200: 0.466949
Cost after iteration 300: 0.376007
Cost after iteration 400: 0.331463
Cost after iteration 500: 0.303273
Cost after iteration 600: 0.279880
Cost after iteration 700: 0.260042
Cost after iteration 800: 0.242941
Cost after iteration 900: 0.228004
Cost after iteration 1000: 0.214820
Cost after iteration 1100: 0.203078
Cost after iteration 1200: 0.192544
Cost after iteration 1300: 0.183033
Cost after iteration 1400: 0.174399
Cost after iteration 1500: 0.166521
Cost after iteration 1600: 0.159305
Cost after iteration 1700: 0.152667
Cost after iteration 1800: 0.146542
Cost after iteration 1900: 0.140872
train accuracy: 99.04306220095694 %
test accuracy: 70.0 %
For such a simple model, 70% test accuracy is pretty good.
Now let's look at how the cost evolves:
# Plot learning curve (with costs)
costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()
3.6 Reflection: How the Learning Rate Affects Training
Let's train the same model with three different learning rates and compare:

learning_rates = [0.01, 0.001, 0.0001]
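# comparison loop; num_iterations=1500 is an assumption consistent with the accuracies below
models = {}
for i in learning_rates:
    print("learning rate is: " + str(i))
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y,
                           num_iterations=1500, learning_rate=i, print_cost=False)
    print('\n' + "-------------------------------------------------------" + '\n')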
Output:
learning rate is: 0.01
train accuracy: 99.52153110047847 %
test accuracy: 68.0 %
-------------------------------------------------------
learning rate is: 0.001
train accuracy: 88.99521531100478 %
test accuracy: 64.0 %
-------------------------------------------------------
learning rate is: 0.0001
train accuracy: 68.42105263157895 %
test accuracy: 36.0 %
-------------------------------------------------------
- Different learning rates give different costs and thus different prediction results.
- If the learning rate is too large (0.01), the cost may oscillate up and down. It may even diverge (though in this example, using 0.01 still eventually ends up at a good value for the cost).
- A lower cost doesn't mean a better model. You have to check if there is possibly overfitting. It happens when the training accuracy is a lot higher than the test accuracy.
Honestly, I'm a little embarrassed to still be writing about something this basic, but it's one way of keeping myself in working shape. I'll probably keep it up for a while.
P.S. The cover photo is the cable car at Luofu Mountain (罗浮山).