Machine Learning Notes, Part 3

This article was written by my very talented senior classmate Chen Yu. I am reposting it to my blog with his permission; many thanks to him!

The code is on his GitHub:
https://github.com/unnamed2/MDL
Feel free to check it out.

Machine Learning from Scratch (3):

3.0 Writing the Code

3.0.1 There is not much code: excluding the matrix operations and data loading, it is under 150 lines, reproduced in full here:
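The listing relies on a small `Matrix` class whose real implementation lives in the GitHub repository linked above. As a rough, hypothetical sketch of the interface the listing assumes (the method names `Apply`, `SetValue`, `ElementMultiplyWith`, and the `~` transpose operator appear in the listing; the row-major storage and everything else here is guessed):

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sketch of the Matrix interface the listing below assumes;
// the real implementation is in the author's repository.
class Matrix {
public:
    // Construct a rows x cols matrix, filling every entry from init().
    Matrix(size_t rows, size_t cols, std::function<float()> init)
        : r(rows), c(cols), data(rows * cols) {
        for (auto& v : data) v = init();
    }
    float& operator()(size_t i, size_t j) { return data[i * c + j]; }
    float operator()(size_t i, size_t j) const { return data[i * c + j]; }
    void SetValue(float v) { for (auto& x : data) x = v; }
    // Element-wise map: returns a new matrix with f applied to every entry.
    Matrix Apply(float (*f)(float)) const {
        Matrix out = *this;
        for (auto& v : out.data) v = f(v);
        return out;
    }
    // Element-wise (Hadamard) product, in place.
    void ElementMultiplyWith(const Matrix& o) {
        for (size_t i = 0; i < data.size(); i++) data[i] *= o.data[i];
    }
    // Transpose, spelled ~M in the listing.
    Matrix operator~() const {
        Matrix out(c, r, [] { return 0.0f; });
        for (size_t i = 0; i < r; i++)
            for (size_t j = 0; j < c; j++) out(j, i) = (*this)(i, j);
        return out;
    }
    // Matrix product.
    Matrix operator*(const Matrix& o) const {
        Matrix out(r, o.c, [] { return 0.0f; });
        for (size_t i = 0; i < r; i++)
            for (size_t k = 0; k < c; k++)
                for (size_t j = 0; j < o.c; j++)
                    out(i, j) += (*this)(i, k) * o(k, j);
        return out;
    }
    // Scalar product.
    Matrix operator*(float s) const {
        Matrix out = *this;
        for (auto& v : out.data) v *= s;
        return out;
    }
    Matrix operator+(const Matrix& o) const {
        Matrix out = *this;
        out += o;
        return out;
    }
    Matrix operator-(const Matrix& o) const {
        Matrix out = *this;
        out -= o;
        return out;
    }
    Matrix& operator+=(const Matrix& o) {
        for (size_t i = 0; i < data.size(); i++) data[i] += o.data[i];
        return *this;
    }
    Matrix& operator-=(const Matrix& o) {
        for (size_t i = 0; i < data.size(); i++) data[i] -= o.data[i];
        return *this;
    }
private:
    size_t r, c;
    std::vector<float> data;
};
```

This is only meant to make the operations in the listing unambiguous (for example that `~` means transpose, not inversion); see the repository for the actual class.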
template<size_t Row, size_t Col>
void misnt_data_reader(std::vector<std::pair<Matrix, Matrix>>& outputs){/*...*/}
// Reads the MNIST data set into outputs. Each image becomes a Row * Col matrix
// stored in the pair's first; the pair's second holds the label as a one-hot
// vector (for each digit 0-9, the probability that the image shows it: 1 or 0).
// See GitHub for the full code.
// MNIST contains 60000 images of handwritten digits, each a 28 * 28 grayscale
// image. We use the first 50000 as the training set to fit the model, and the
// last 10000 as the test set to evaluate how well it has learned.
// Random numbers between 0 and 0.001. This is a poor initialization scheme;
// better ways to initialize the parameters come later.
float Random_0_1() {
    return (float)rand() / (float)RAND_MAX * 0.001f;
}
// The sigmoid activation function and its derivative
float Sigmoid(float x)
{
    return 1.0f / (expf(-x) + 1.0f);
}
float Sigmoid_P(float x)
{
    float s = Sigmoid(x);
    return s * (1.0f - s);
}
// The Leaky ReLU activation function and its derivative
float LeaklyReLU(float x) {
    if (x > 0.0f) return x;
    return 0.1f * x;
}
float LeaklyReLU_P(float x) {
    if (x > 0.0f) return 1.0f;
    return 0.1f;
}

// Here we go
int main()
{

  // Load the data
    std::vector<std::pair<Matrix, Matrix>> datas;
    printf("Initializing ...");
    misnt_data_reader<28 * 28, 1>(datas);
    printf("done.\ttraining...\n");

  // Training hyperparameters:

  // Update the parameters once every 25 images
    const int sample_per_minibatch = 25;
  // Dimension of the hidden-layer output vector
    const int hiddens = 100;
  // Learning rate, 0.01 here, divided by the minibatch size so that the
  // summed gradients are averaged
    const float learning_rate = 0.01f / sample_per_minibatch;

  // The trainable parameters, initialized with random numbers
    Matrix W0(hiddens, 28 * 28, Random_0_1);
    Matrix B0(hiddens, 1, Random_0_1);
    Matrix W1(10, hiddens, Random_0_1);
    Matrix B1(10, 1, Random_0_1);


    std::random_device rd;
    std::mt19937 g(rd());// RNG used to shuffle the training set

  // Matrices that accumulate the partial derivatives, all initialized to 0
    Matrix partlW0(hiddens, 28 * 28, []() {return 0.0f; });
    Matrix partlB0(hiddens, 1, []() {return 0.0f; });
    Matrix partlW1(10, hiddens, []() {return 0.0f; });
    Matrix partlB1(10, 1, []() {return 0.0f; });

    // Start training
    // Each image is trained on 20 times (20 epochs)
    for (int i = 0; i < 20; i++) {
        auto tp = std::chrono::system_clock::now();// start time
        std::shuffle(datas.begin(), datas.begin() + 50000, g);// shuffle the training set first
        int DataOffset = 0;
        int ErrC = 0;
        for (int j = 0; j < 50000 / sample_per_minibatch; j++) {
            // Zero out the accumulated partial derivatives
            partlB0.SetValue(0.0f);
            partlB1.SetValue(0.0f);
            partlW0.SetValue(0.0f);
            partlW1.SetValue(0.0f);
            // For each image in the minibatch
            for (int k = 0; k < sample_per_minibatch; k++) {
                auto & samp = datas[DataOffset + k];

                // Forward pass: compute the output and the error
                Matrix A0 = W0 * samp.first + B0;
                Matrix Z0 = A0.Apply(Sigmoid);// applies Sigmoid to every element of A0; too lazy to write an overload
                Matrix A1 = W1 * Z0 + B1;
                Matrix Y = A1.Apply(LeaklyReLU);

                int idx = 0;// find which of the 10 outputs is largest; that digit is the most likely prediction
                for (int c = 0; c < 10; c++) {
                    if (Y(c, 0) > Y(idx, 0)) idx = c;
                }
                if (samp.second(idx, 0) < 0.9f) ErrC++;// if the prediction disagrees with the label, count the miss

                // Backward pass
                Matrix Loss = Y - samp.second;// partial derivative of the error with respect to Y
              // ElementMultiplyWith is element-wise multiplication
                Loss.ElementMultiplyWith(A1.Apply(LeaklyReLU_P));// partial derivative of the error with respect to A1

                partlB1 += Loss;// the partial derivative w.r.t. B1 equals the one w.r.t. A1
                partlW1 += Loss * ~Z0;// the partial derivative w.r.t. W1 is the one w.r.t. A1 times the transpose of Z0

                Loss = (~W1 * Loss); // partial derivative w.r.t. Z0
                Loss.ElementMultiplyWith(A0.Apply(Sigmoid_P));// partial derivative w.r.t. A0
                partlB0 += Loss;// partial derivative w.r.t. B0
                partlW0 += Loss * ~samp.first;// partial derivative w.r.t. W0 is the one w.r.t. A0 times the transpose of the input
            }
            // Update the parameters
            W0 -= partlW0 * learning_rate;
            B0 -= partlB0 * learning_rate;
            W1 -= partlW1 * learning_rate;
            B1 -= partlB1 * learning_rate;
            // Advance to the next minibatch
            DataOffset += sample_per_minibatch;
        }
        auto ed = std::chrono::system_clock::now();// end time
        auto elapsed = ed - tp;
        // Measure the error rate on the test set
        int errCount = 0;
        for (int j = 0; j < 10000; j++) {
            auto & samp = datas[50000 + j];

            // A plain forward pass is all we need to get the output Y
            Matrix A0 = W0 * samp.first + B0;
            Matrix Z0 = A0.Apply(Sigmoid);

            Matrix A1 = W1 * Z0 + B1;
            Matrix Y = A1.Apply(LeaklyReLU);

            int idx = 0;
            for (int c = 0; c < 10; c++) {
                if (Y(c, 0) > Y(idx, 0)) idx = c;
            }
            if (samp.second(idx, 0) < 0.9f) errCount++;// count the misses
        }
        printf("Training %d / 20, loss %f%% on training set, loss %f%% on test set, training cost %d ms\n", i + 1, ErrC / 500.0f, errCount / 100.0f, std::chrono::duration_cast<std::chrono::duration<int, std::milli>>(elapsed).count());
    }


}
3.0.2 Results
Training 1 / 20, loss 75.921997% on training set, loss 87.839996% on test set, training cost 5986 ms
Training 2 / 20, loss 58.952000% on training set, loss 61.270000% on test set, training cost 5982 ms
Training 3 / 20, loss 30.878000% on training set, loss 50.520000% on test set, training cost 5961 ms
Training 4 / 20, loss 12.102000% on training set, loss 38.549999% on test set, training cost 5979 ms
Training 5 / 20, loss 4.478000% on training set, loss 43.619999% on test set, training cost 5801 ms
Training 6 / 20, loss 6.040000% on training set, loss 37.169998% on test set, training cost 5911 ms
Training 7 / 20, loss 9.270000% on training set, loss 33.570000% on test set, training cost 5886 ms
Training 8 / 20, loss 4.076000% on training set, loss 32.689999% on test set, training cost 5920 ms
Training 9 / 20, loss 5.840000% on training set, loss 28.670000% on test set, training cost 6021 ms
Training 10 / 20, loss 9.454000% on training set, loss 29.030001% on test set, training cost 6032 ms
Training 11 / 20, loss 1.994000% on training set, loss 27.410000% on test set, training cost 5956 ms
Training 12 / 20, loss 7.128000% on training set, loss 28.510000% on test set, training cost 5881 ms
Training 13 / 20, loss 6.376000% on training set, loss 25.799999% on test set, training cost 5828 ms
Training 14 / 20, loss 4.098000% on training set, loss 28.410000% on test set, training cost 5866 ms
Training 15 / 20, loss 8.378000% on training set, loss 28.200001% on test set, training cost 5681 ms
Training 16 / 20, loss 3.168000% on training set, loss 25.540001% on test set, training cost 5814 ms
Training 17 / 20, loss 4.834000% on training set, loss 26.549999% on test set, training cost 5744 ms
Training 18 / 20, loss 0.086000% on training set, loss 24.250000% on test set, training cost 5901 ms
Training 19 / 20, loss 8.382000% on training set, loss 25.600000% on test set, training cost 5863 ms
Training 20 / 20, loss 4.632000% on training set, loss 31.549999% on test set, training cost 5776 ms

3.1 Overfitting

The results above show clearly that the error on the training set is much lower than on the test set. By the 18th pass, training-set accuracy has climbed above 99.9%, yet test-set accuracy is still only around 75%. In other words, the network has started to "memorize the answers": it does markedly worse on material it was never taught.

Machine learning offers many techniques for avoiding overfitting. Later posts will show how to genuinely train a model whose accuracy is above 99.9% no matter where it is evaluated.
