This article was written by 陈雨, a brilliant senior of mine; with his permission I have reposted it on my blog. Many thanks to him!
The code is on his GitHub:
https://github.com/unnamed2/MDL
Go take a look!
Machine Learning from Scratch (3):
3.0 Writing the Code
3.0.1 There is not much code; leaving out the matrix operations and the data loading, it comes to under 150 lines, all of which are shown here:
#include <cstdio>
#include <cstdlib>
#include <cstdint>
#include <cmath>
#include <vector>
#include <utility>
#include <random>
#include <algorithm>
#include <chrono>
#include "matrix.h" //the Matrix class lives in the repo; the header name here is a guess
template<size_t Row, size_t Col>
void misnt_data_reader(std::vector<std::pair<Matrix, Matrix>>& outputs){/*...*/}//Reads the MNIST dataset into outputs: each image becomes a Row * Col matrix stored in the pair's first, and second holds the label as a one-hot vector (the probability of the image being each digit 0-9, either 1 or 0). See GitHub for the full code.
//MNIST contains 60000 handwritten-digit images, each a 28 * 28 grayscale image. We use the first 50000 as the training set and the last 10000 as the test set, to evaluate how well the model has learned.
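//Aside (mine, not the author's): a minimal sketch of what a reader for the raw MNIST IDX
//files could look like, assuming the standard train-images-idx3-ubyte / train-labels-idx1-ubyte
//files and a writable Matrix::operator()(row, col); the repo's real reader may differ.
static uint32_t read_be32(FILE* f) {//IDX headers store 32-bit integers big-endian
unsigned char b[4];
fread(b, 1, 4, f);
return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) | ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
template<size_t Row, size_t Col>
void mnist_reader_sketch(std::vector<std::pair<Matrix, Matrix>>& outputs) {
FILE* img = fopen("train-images-idx3-ubyte", "rb");
FILE* lbl = fopen("train-labels-idx1-ubyte", "rb");
read_be32(img);//magic number (2051 for images)
uint32_t count = read_be32(img);
uint32_t rows = read_be32(img), cols = read_be32(img);//28 and 28
read_be32(lbl); read_be32(lbl);//magic number (2049 for labels) and count
for (uint32_t n = 0; n < count; n++) {
Matrix image(Row, Col, []() { return 0.0f; });
for (size_t p = 0; p < (size_t)rows * cols; p++)
image(p / Col, p % Col) = (float)fgetc(img) / 255.0f;//scale pixels to [0,1]
Matrix label(10, 1, []() { return 0.0f; });
label(fgetc(lbl), 0) = 1.0f;//one-hot encode the digit
outputs.emplace_back(std::move(image), std::move(label));
}
fclose(img); fclose(lbl);
}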
//Random numbers in [0, 0.001]. Initializing this way is poor; a better way to initialize the parameters comes later.
float Random_0_1() {
return (float)rand() / (float)RAND_MAX * 0.001f;
}
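//Aside (mine, not the author's): the "better way" hinted at above typically scales the random
//weights by each layer's fan-in. A rough He-style sketch, where Random_He is illustrative:
float Random_He(size_t fan_in) {
static std::mt19937 gen{ std::random_device{}() };
std::normal_distribution<float> dist(0.0f, sqrtf(2.0f / (float)fan_in));
return dist(gen);
}
//usage would be e.g. Matrix W0(hiddens, 28 * 28, []() { return Random_He(28 * 28); });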
//The sigmoid activation function and its derivative
float Sigmoid(float x)
{
return 1.0f / (expf(-x) + 1.0f);
}
float Sigmoid_P(float x)
{
float s = Sigmoid(x);
return s * (1.0f - s);
}
//The leaky ReLU activation function and its derivative
float LeaklyReLU(float x) {
if (x > 0.0f)return x;
return 0.1f * x;
}
float LeaklyReLU_P(float x) {
if (x > 0)return 1.0f;
return 0.1f;
}
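//Aside (mine, not the author's): a quick way to convince yourself the hand-written derivatives
//are right is to compare them against a central finite difference; check_derivative is illustrative.
bool check_derivative(float (*f)(float), float (*fp)(float), float x) {
const float h = 1e-3f;
float numeric = (f(x + h) - f(x - h)) / (2.0f * h);
return fabsf(numeric - fp(x)) < 1e-2f;
}
//check_derivative(Sigmoid, Sigmoid_P, 0.5f) and check_derivative(LeaklyReLU, LeaklyReLU_P, 0.5f)
//should both return true (avoid x = 0, where LeaklyReLU has a kink).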
//Here we go
int main()
{
//load the data
std::vector<std::pair<Matrix, Matrix>> datas;
printf("Initializing ...");
misnt_data_reader<28 * 28, 1>(datas);
printf("done.\ttraining...\n");
//Training hyperparameters:
//update the parameters once every 25 images
const int sample_per_minibatch = 25;
//dimension of the hidden layer's output vector
const int hiddens = 100;
//learning rate: 0.01, divided by the minibatch size so the gradients accumulated below are averaged over the batch
const float learning_rate = 0.01f / sample_per_minibatch;
//the trainable parameters, initialized with random numbers
Matrix W0(hiddens, 28 * 28, Random_0_1);
Matrix B0(hiddens, 1, Random_0_1);
Matrix W1(10, hiddens, Random_0_1);
Matrix B1(10,1, Random_0_1);
std::random_device rd;
std::mt19937 g(rd());
//matrices that accumulate the sums of the partial derivatives, initialized to all zeros
Matrix partlW0(hiddens, 28 * 28, []() {return 0.0f; });
Matrix partlB0(hiddens, 1, []() {return 0.0f; });
Matrix partlW1(10, hiddens, []() {return 0.0f; });
Matrix partlB1(10, 1, []() {return 0.0f; });
//start training
//each image is trained on 20 times (20 passes over the training set)
for (int i = 0; i < 20; i++) {
auto tp = std::chrono::system_clock::now();//start-of-pass timestamp
std::shuffle(datas.begin(), datas.begin() + 50000, g);//shuffle the training set first (std::shuffle with the mt19937 above; std::random_shuffle is deprecated)
int DataOffset = 0;
int ErrC = 0;
for (int j = 0; j < 50000 / sample_per_minibatch; j++) {
//zero out the accumulated partial derivatives
partlB0.SetValue(0.0f);
partlB1.SetValue(0.0f);
partlW0.SetValue(0.0f);
partlW1.SetValue(0.0f);
//for each image in the minibatch
for (int k = 0; k < sample_per_minibatch; k++) {
auto & samp = datas[DataOffset + k];
//forward pass: compute the output and the error
Matrix A0 = W0 * samp.first + B0;
Matrix Z0 = A0.Apply(Sigmoid);//apply Sigmoid to every element of A0
Matrix A1 = W1 * Z0 + B1;
Matrix Y = A1.Apply(LeaklyReLU);
int idx = 0;//find which of the 10 outputs is largest; that digit is the network's best guess
for (int c = 0; c < 10; c++) {
if (Y(c, 0) > Y(idx, 0))idx = c;
}
if (samp.second(idx, 0) < 0.9f)ErrC++;//if our guess disagrees with the given label, count how many we got wrong
//backward pass
Matrix Loss = Y-samp.second;//the partial derivative of err with respect to Y
//ElementMultiplyWith is element-wise multiplication
Loss.ElementMultiplyWith(A1.Apply(LeaklyReLU_P));//the partial derivative of err with respect to A1
partlB1 += Loss;//the partial derivative w.r.t. B1 equals the one w.r.t. A1
partlW1 += Loss * ~Z0;//the partial derivative w.r.t. W1 is the one w.r.t. A1 times the transpose of Z0 (~ is transpose)
Loss = (~W1 * Loss); //the partial derivative w.r.t. Z0
Loss.ElementMultiplyWith(A0.Apply(Sigmoid_P));//the partial derivative w.r.t. A0
partlB0 += Loss;//the partial derivative w.r.t. B0
partlW0 += Loss * ~samp.first;//the partial derivative w.r.t. W0
}
// update the parameters with the accumulated gradients
W0 -= partlW0 * learning_rate;
B0 -= partlB0 * learning_rate;
W1 -= partlW1 * learning_rate;
B1 -= partlB1 * learning_rate;
DataOffset += sample_per_minibatch;//move on to the next minibatch
}
auto ed = std::chrono::system_clock::now();//end-of-pass timestamp
auto elapsed = ed - tp;//how long this pass took
//measure accuracy on the test set
int errCount = 0;
for (int j = 0; j < 10000; j++) {
auto & samp = datas[50000 + j];
//brute force is fine here: run the forward pass to get the output Y
Matrix A0 = W0 * samp.first + B0;
Matrix Z0 = A0.Apply(Sigmoid);
Matrix A1 = W1 * Z0 + B1;
Matrix Y = A1.Apply(LeaklyReLU);
int idx = 0;
for (int c = 0; c < 10; c++) {
if (Y(c, 0) > Y(idx, 0))idx = c;
}
if (samp.second(idx, 0) < 0.9f)errCount++;//another miss; count it
}
printf("Training %d / 20,loss %f%% on training set,loss %f%% on trst set,training cost %d ms\n", i+1,ErrC / 500.0f, errCount / 100.0f,std::chrono::duration_cast<std::chrono::duration<int,std::milli>>(g).count());
}
}
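Before looking at the numbers, it helps to write down the gradients the backward pass above computes. Assuming the squared-error loss E = ½‖Y − t‖² (the ½ is my convention here, chosen so that ∂E/∂Y = Y − t matches the Loss matrix in the code), where t is the one-hot label, ⊙ is element-wise multiplication, and ~ in the code denotes transpose:
\[
\begin{aligned}
A_0 &= W_0 x + B_0, & Z_0 &= \sigma(A_0), & A_1 &= W_1 Z_0 + B_1, & Y &= \mathrm{LeakyReLU}(A_1),\\
\frac{\partial E}{\partial A_1} &= (Y - t)\odot \mathrm{LeakyReLU}'(A_1), &
\frac{\partial E}{\partial B_1} &= \frac{\partial E}{\partial A_1}, &
\frac{\partial E}{\partial W_1} &= \frac{\partial E}{\partial A_1}\,Z_0^{\mathsf T},\\
\frac{\partial E}{\partial A_0} &= \Big(W_1^{\mathsf T}\,\frac{\partial E}{\partial A_1}\Big)\odot \sigma'(A_0), &
\frac{\partial E}{\partial B_0} &= \frac{\partial E}{\partial A_0}, &
\frac{\partial E}{\partial W_0} &= \frac{\partial E}{\partial A_0}\,x^{\mathsf T}.
\end{aligned}
\]
Each minibatch sums these per-image gradients into the partl matrices; the update step then applies them with step size learning_rate, whose built-in division by sample_per_minibatch turns the sums into averages.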
3.0.2 Results
Training 1 / 20,error 75.921997% on training set,error 87.839996% on test set,training cost 5986 ms
Training 2 / 20,error 58.952000% on training set,error 61.270000% on test set,training cost 5982 ms
Training 3 / 20,error 30.878000% on training set,error 50.520000% on test set,training cost 5961 ms
Training 4 / 20,error 12.102000% on training set,error 38.549999% on test set,training cost 5979 ms
Training 5 / 20,error 4.478000% on training set,error 43.619999% on test set,training cost 5801 ms
Training 6 / 20,error 6.040000% on training set,error 37.169998% on test set,training cost 5911 ms
Training 7 / 20,error 9.270000% on training set,error 33.570000% on test set,training cost 5886 ms
Training 8 / 20,error 4.076000% on training set,error 32.689999% on test set,training cost 5920 ms
Training 9 / 20,error 5.840000% on training set,error 28.670000% on test set,training cost 6021 ms
Training 10 / 20,error 9.454000% on training set,error 29.030001% on test set,training cost 6032 ms
Training 11 / 20,error 1.994000% on training set,error 27.410000% on test set,training cost 5956 ms
Training 12 / 20,error 7.128000% on training set,error 28.510000% on test set,training cost 5881 ms
Training 13 / 20,error 6.376000% on training set,error 25.799999% on test set,training cost 5828 ms
Training 14 / 20,error 4.098000% on training set,error 28.410000% on test set,training cost 5866 ms
Training 15 / 20,error 8.378000% on training set,error 28.200001% on test set,training cost 5681 ms
Training 16 / 20,error 3.168000% on training set,error 25.540001% on test set,training cost 5814 ms
Training 17 / 20,error 4.834000% on training set,error 26.549999% on test set,training cost 5744 ms
Training 18 / 20,error 0.086000% on training set,error 24.250000% on test set,training cost 5901 ms
Training 19 / 20,error 8.382000% on training set,error 25.600000% on test set,training cost 5863 ms
Training 20 / 20,error 4.632000% on training set,error 31.549999% on test set,training cost 5776 ms
3.1 Overfitting
From the results above it is obvious that the error on the training set is much lower than on the test set: by the 18th pass, accuracy on the training set is already above 99.9%, yet on the test set it is still only about 75%. In other words, the network has started "memorizing the answers", and its grasp of material it was never taught is clearly much weaker.
Machine learning offers many techniques for avoiding overfitting; later on I will show you how to genuinely train a model whose accuracy is above 99.9% no matter where you measure it.
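To give a first taste of one such technique (this snippet is mine, not the author's): L2 regularization, also called weight decay, penalizes large weights, which in practice just means shrinking every weight slightly at each update. Sketched as a change to the update step in the listing above, assuming the Matrix class also provides a binary operator+ (the listing itself only shows += and scalar multiplication); weight_decay and its value are illustrative:

const float weight_decay = 0.0005f;//illustrative hyperparameter, not from the original code
//inside the minibatch loop, instead of the plain updates:
W0 -= (partlW0 + W0 * (weight_decay * sample_per_minibatch)) * learning_rate;
W1 -= (partlW1 + W1 * (weight_decay * sample_per_minibatch)) * learning_rate;
B0 -= partlB0 * learning_rate;//biases are customarily left undecayed
B1 -= partlB1 * learning_rate;

Multiplying by sample_per_minibatch cancels the division built into learning_rate, so each weight effectively shrinks by 0.01 * weight_decay of its own value per update.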