My Machine Learning Journey --- Day One

楚江空晚

Last modified 2022-05-22 15:15:05


Column: Deep Learning Diary. Tags: machine learning, tensorflow, artificial intelligence

First published 2022-05-21 22:53:42

Original link: https://blog.csdn.net/Zxs021310/article/details/124903433


How It Started

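Around March 2022 I borrowed 《TensorFlow实战》 (roughly, "TensorFlow in Practice") from the Hangzhou Library and started studying it. I remember first hearing about this field when I read on a blog that Microsoft had made a breakthrough in data classification; at that time Baidu already had its Institute of Deep Learning (IDL). A few years later AlphaGo defeated Lee Sedol, and deep learning made a great name for itself. I had bought a copy of 《机器学习》 ("Machine Learning") from China Machine Press as early as 2014, but unfortunately it was too dry and formal, and I never got through it. Better late than never!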

2022/3-2022/5

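From March, when I got the book, to May, work kept me very busy and I had almost no time to study. I read five chapters on and off, forgetting what I had read and then re-reading it. It was truly inefficient, like a headless fly bumping around. I need to change the way I learn to question-driven learning: read with questions. I pose questions and set goals, then answer the questions and reach the goals. In other words, my goal is to use a convolutional neural network to recognize certain specific icons and extract information from them. That is the goal. Below are the questions, related and unrelated, that came up while working toward it.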

Questions

1. What are the common optimizers? What are the common loss functions?
(1) Optimizers:

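sgd: stochastic gradient descent, a gradient descent algorithm. It feels a bit like hill climbing in reverse: instead of climbing toward a peak, it steps downhill, against the gradient, toward a minimum of the loss.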
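Adam: the name Adam is derived from adaptive moment estimation. See this article for its background.
Adam is an extension of SGD. How it differs from SGD:
Stochastic gradient descent maintains a single learning rate (termed alpha) for all weight updates and the learning rate does not change during training.
The advantage of Adam:
The method computes individual adaptive learning rates for different parameters from estimates of first and second moments of the gradients. This corresponds directly to the learning-rate settings discussed under optimization methods in 《TensorFlow实战》. Adam combines the advantages of the following two SGD extensions. The article linked above points to an implementation; I'll dig deeper later and leave a pointer here for now.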
The authors describe Adam as combining the advantages of two other extensions of stochastic gradient descent. Specifically:
- Adaptive Gradient Algorithm (AdaGrad) that maintains a per-parameter learning rate that improves performance on problems with sparse gradients (e.g. natural language and computer vision problems).
- Root Mean Square Propagation (RMSProp) that also maintains per-parameter learning rates that are adapted based on the average of recent magnitudes of the gradients for the weight (e.g. how quickly it is changing). This means the algorithm does well on online and non-stationary problems (e.g. noisy).
Adam realizes the benefits of both AdaGrad and RMSProp.
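To see how the two update rules differ, here is a minimal pure-Python sketch that minimizes the toy function f(w) = (w - 3)^2 (an example chosen for illustration, not from the book) with plain gradient descent and with the Adam update (moving averages of the gradient and squared gradient, plus bias correction). The gradient descent here is deterministic because the toy objective has no data to sample. beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 are the commonly cited defaults; the learning rate 0.1 is an assumption, larger than the usual default of 0.001, so the toy run finishes in a few hundred steps.

```python
import math

def grad(w):
    # Gradient of the toy objective f(w) = (w - 3)**2, whose minimum is at w = 3
    return 2.0 * (w - 3.0)

def sgd(w, lr=0.1, steps=100):
    # Plain gradient descent: the same fixed learning rate for every update
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m, v = 0.0, 0.0  # running estimates of the first and second moments of the gradient
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g      # first moment: moving average of gradients
        v = beta2 * v + (1 - beta2) * g * g  # second moment: moving average of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias correction, since m and v start at 0
        v_hat = v / (1 - beta2 ** t)
        # The step is scaled by 1 / sqrt(v_hat), so the effective learning rate adapts
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

print(sgd(0.0))   # ends very close to 3.0
print(adam(0.0))  # also ends close to 3.0
```

With many parameters, m and v are kept separately for each parameter (elementwise), which is where Adam's per-parameter learning rates come from.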

RMSProp: I came across a good article on it recently.
- Momentum, NAG, Adagrad, Adadelta, RMSprop, Adam (link)
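AdaGrad and RMSProp both divide each step by the square root of accumulated squared gradients; they differ in how the squares are accumulated. A minimal pure-Python sketch on the toy objective f(w) = (w - 3)^2 (an illustrative choice); the learning rates and step counts are assumptions picked for this toy problem, and rho = 0.9 is a commonly used default for RMSProp.

```python
import math

def grad(w):
    # Gradient of the toy objective f(w) = (w - 3)**2
    return 2.0 * (w - 3.0)

def adagrad(w, lr=0.5, eps=1e-8, steps=200):
    cache = 0.0  # sum of ALL past squared gradients, so it only ever grows
    for _ in range(steps):
        g = grad(w)
        cache += g * g
        w -= lr * g / (math.sqrt(cache) + eps)
    return w

def rmsprop(w, lr=0.01, rho=0.9, eps=1e-8, steps=1000):
    cache = 0.0  # exponential moving average of recent squared gradients
    for _ in range(steps):
        g = grad(w)
        cache = rho * cache + (1 - rho) * g * g
        w -= lr * g / (math.sqrt(cache) + eps)
    return w

print(adagrad(0.0))  # close to 3.0
print(rmsprop(0.0))  # hovers close to 3.0
```

Because AdaGrad's cache only grows, its effective learning rate keeps shrinking, which suits sparse gradients but can make long runs stall. RMSProp's moving average forgets old gradients, so the step size tracks recent behavior, which matches the "online and non-stationary" point in the quote above.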
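(2) Loss functions:
cross_entropy: cross entropy, which measures how far a predicted distribution q is from the true distribution p:
$$H(p, q) = -\sum_x p(x)\log q(x)$$
For example,
$$H\big((1,0,0),(0.5,0.4,0.1)\big) = -(1\cdot\log 0.5 + 0\cdot\log 0.4 + 0\cdot\log 0.1) \approx 0.3$$
where the value 0.3 uses a base-10 logarithm; with the natural logarithm it is about 0.69.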
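# y_: true labels (one-hot), y: predicted probabilities; clipping y avoids log(0).
# Note the leading minus sign; tf.math.log is the TensorFlow 2.x name for tf.log.
# reduce_mean averages over every element (batch x classes), so the result is the
# average per-example cross entropy divided by the number of classes.
cross_entropy = -tf.reduce_mean(y_ * tf.math.log(tf.clip_by_value(y, 1e-10, 1.0)))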
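A quick check of the worked example in plain Python (the helper name cross_entropy is just for illustration). It shows that the 0.3 in the example comes from a base-10 logarithm, while the natural logarithm, which tf.math.log uses, gives about 0.69.

```python
import math

def cross_entropy(p, q, log=math.log):
    # H(p, q) = -sum_x p(x) * log q(x); p is the true distribution, q the prediction.
    # Terms with p(x) == 0 contribute nothing, so they are skipped (this also avoids log(0)).
    return -sum(pi * log(qi) for pi, qi in zip(p, q) if pi > 0)

p = (1.0, 0.0, 0.0)  # one-hot label: the true class is class 0
q = (0.5, 0.4, 0.1)  # predicted probabilities
print(cross_entropy(p, q, log=math.log10))  # about 0.301 (base-10 log)
print(cross_entropy(p, q))                  # about 0.693 (natural log)
```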

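mean_squared_error (MSE): the mean squared error, i.e. the average of the squared differences between the predicted values and the actual values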

mse = tf.reduce_mean(tf.square(y - y_))  # y: predictions, y_: actual values
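The same computation written out in plain Python, with made-up numbers, to show what tf.reduce_mean(tf.square(y - y_)) averages:

```python
def mse(y, y_):
    # Mean of the squared differences between predictions y and actual values y_
    return sum((a - b) ** 2 for a, b in zip(y, y_)) / len(y)

# Squared errors are 0.25, 0.25 and 0.0, so the mean is 0.5 / 3
print(mse([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))
```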

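sparse_categorical_crossentropy: cross entropy for labels given as integer class indices (for example 0-9) rather than one-hot vectors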
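In Keras, sparse_categorical_crossentropy and categorical_crossentropy differ in the label format, not the math. A plain-Python sketch with made-up probabilities; the two helper functions are illustrative and are not the Keras implementations (which also clip probabilities, for example).

```python
import math

def categorical_crossentropy(one_hot, probs):
    # Label given as a one-hot vector, e.g. [0, 0, 1]
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)

def sparse_categorical_crossentropy(label, probs):
    # Label given as an integer class index: just -log of that class's probability
    return -math.log(probs[label])

probs = [0.1, 0.2, 0.7]  # predicted distribution over 3 classes (made-up numbers)
print(sparse_categorical_crossentropy(2, probs))   # -ln(0.7), about 0.357
print(categorical_crossentropy([0, 0, 1], probs))  # the same value
```

Integer labels such as Fashion MNIST's 0-9 class indices therefore go with the sparse version.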

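2. What is the general structure of a deep learning program?
1. Define the structure of the neural network and the output of forward propagation.
2. Define the loss function and choose the optimization algorithm used for backpropagation.
3. Create a session and repeatedly run the backpropagation optimization algorithm on the training set.
This is a very plain-language description of the structure; I'll refine this answer when I gain new insights.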
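The three steps do not depend on TensorFlow. Here is a minimal pure-Python sketch under simplifying assumptions of my own: the "network" is a single linear neuron y = w*x + b, the loss is mean squared error, and the optimizer is plain gradient descent with an illustrative learning rate of 0.05. The data points are the ones used in the Keras example in this post. In TensorFlow 2.x there is no session to create; the explicit loop in step 3 plays roughly the role that session.run played in TensorFlow 1.x and that model.fit plays in Keras.

```python
xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0]  # these points follow y = 2x - 1

def forward(w, b, x):
    # Step 1: forward propagation of a single neuron with no activation
    return w * x + b

def loss_and_grads(w, b, xs, ys):
    # Step 2a: mean squared error loss and its gradients with respect to w and b
    n = len(xs)
    errs = [forward(w, b, x) - y for x, y in zip(xs, ys)]
    loss = sum(e * e for e in errs) / n
    dw = 2.0 * sum(e * x for e, x in zip(errs, xs)) / n
    db = 2.0 * sum(errs) / n
    return loss, dw, db

def train(xs, ys, lr=0.05, epochs=2000):
    w, b = 0.0, 0.0
    loss = None
    # Step 3: repeatedly apply the optimizer (step 2b: gradient descent) over the training set
    for _ in range(epochs):
        loss, dw, db = loss_and_grads(w, b, xs, ys)
        w -= lr * dw
        b -= lr * db
    return w, b, loss

w, b, loss = train(xs, ys)
print(w, b)                  # close to 2.0 and -1.0
print(forward(w, b, 10.0))   # close to 19.0
```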

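3. Advantages of Keras: it is concise and has lots of syntactic sugar. My impression so far is that Keras's structure matches the way I think much better than the verbose code in the original book.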

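import numpy as np
from tensorflow import keras

# A single neuron (one Dense unit) that takes one input value
model = keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])

# Stochastic gradient descent with mean squared error loss
model.compile(optimizer='sgd', loss='mean_squared_error')

# Training data that follows y = 2x - 1
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)

model.fit(xs, ys, epochs=500)

# predict() expects a batch: one sample with one feature, shape (1, 1).
# The result should be close to, but not exactly, 19 (= 2 * 10 - 1).
print(model.predict(np.array([[10.0]])))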

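4. What does MNIST, a dataset commonly used in deep learning, stand for?
MNIST: Modified National Institute of Standards and Technology database
By default it refers to the handwritten digit dataset; many variants are also in common use, such as Fashion MNIST (clothing, shoes, bags and similar items).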

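import numpy as np
import tensorflow as tf
from tensorflow import keras

fashion_mnist = keras.datasets.fashion_mnist
# 60,000 training images and 10,000 test images, each a 28x28 grayscale image
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

print(test_images[0].shape)  # (28, 28)
print(test_labels[0])        # 9

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),       # flatten each 28x28 image into 784 values
    keras.layers.Dense(128, activation=tf.nn.relu),   # hidden layer with 128 units
    keras.layers.Dense(10, activation=tf.nn.softmax)  # output layer: probabilities for the 10 classes
])

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',  # labels are integers 0-9, not one-hot
              metrics=['accuracy'])

model.fit(train_images, train_labels, epochs=5)

# Returns [test loss, test accuracy]
results = model.evaluate(test_images, test_labels)
print(results)

# predict() expects a batch, so pass a batch holding only the first test image: shape (1, 28, 28)
predictions = model.predict(test_images[:1])
# The output layer already applies softmax, so these are already class probabilities
print(predictions)
# Sample output from one run (exact values vary between runs):
# [5.1525501e-10 6.2194284e-11 3.0880638e-17 2.3430925e-08 1.8686040e-15 3.4536224e-02 5.1994795e-19 3.8873106e-02 4.8555979e-08 9.2659068e-01]

# Take the most likely class instead of applying softmax a second time
predicted_label = np.argmax(predictions, axis=1)
print(predicted_label)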
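Two points about reading the prediction: the output layer already applies softmax, so applying tf.nn.softmax to the predictions a second time squashes the probabilities toward uniform; and turning probabilities into a label is an argmax. The pure-Python sketch below illustrates both, using the rounded probabilities from the sample output in this post and the ten Fashion MNIST class names (labels 0-9). The helper functions are illustrative, not TensorFlow APIs.

```python
import math

# Fashion MNIST class names for labels 0-9
CLASS_NAMES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

def softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    # Index of the largest value
    return max(range(len(values)), key=lambda i: values[i])

# Rounded probabilities from the sample output in the post
probs = [5.15e-10, 6.22e-11, 3.09e-17, 2.34e-08, 1.87e-15,
         3.45e-02, 5.20e-19, 3.89e-02, 4.86e-08, 9.27e-01]
print(CLASS_NAMES[argmax(probs)])  # Ankle boot

twice = softmax(probs)             # softmax applied a second time
print(max(twice))                  # roughly 0.2, far flatter than 0.927
print(CLASS_NAMES[argmax(twice)])  # still Ankle boot: the order is kept, the confidence is lost
```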