Implementing Logistic Regression by Hand (C++): Testing

Following up on the previous post: with the interface defined and the implementation finished, it is time to test it on some data. The whole process was done on Windows with VS2008.

A sample of the data looks like this:

1 0 1 2 5 8
1 1 3 4 7 8
1 0
1 0 1 4 5 8 9
1 0 1 2 4 5 9
0 4 5 7 8
1 0 2
1 2 3 5 6 7 8
1 0 3 4 5 6 7
1 0 1 2 4 5 7 8 9

The data comes from an LR toolkit found on http://komarix.org/, which ships with test data; I only adjusted the format slightly. Each line now reads: class label, feature index 1, feature index 2, ...
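For illustration, here is a minimal sketch of how a line in this format could be parsed. It is not the loader actually used in the class; Sample and ParseSampleLine are names made up for this example. The first token is the class label, and the remaining tokens are the indices of features whose value is 1.

#include <sstream>
#include <string>
#include <vector>

struct Sample
{
	int label;                    // class label: 0 or 1
	std::vector<int> featureIds;  // indices of the features present in this sample
};

Sample ParseSampleLine (const std::string & line)
{
	Sample sample;
	std::istringstream iss (line);
	iss >> sample.label;          // first token: class label
	int featureId;
	while (iss >> featureId)      // remaining tokens: feature indices
		sample.featureIds.push_back (featureId);
	return sample;
}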

The test code, in the Test function, is as follows:

void LogisticRegression::Test (void)
{
	// The arguments to TrainSGDOnSampleFile are presumably: training file, feature
	// count (10), learning rate (0.01), maximum number of iterations, and the
	// minimum relative cost decrease used as the stopping threshold (0.05).

	// Model trained with up to 100 iterations: train and save, then load and predict.
	/*TrainSGDOnSampleFile ("..\\Data\\SamplesTrain.txt", 10, 0.01, 100, 0.05);
	SaveLRModelTxt ("Model\\Mod_001_100_005.txt");*/
	/*LoadLRModelTxt ("Model\\Mod_001_100_005.txt");
	PredictOnSampleFile ("..\\Data\\SamplesTest.txt", "Model\\Rslt_001_100_005.txt", "Model\\Log_001_100_005.txt");*/

	// Model trained with at most 1 iteration: only load and predict are active here.
	/*TrainSGDOnSampleFile ("..\\Data\\SamplesTrain.txt", 10, 0.01, 1, 0.05);
	SaveLRModelTxt ("Model\\Mod_001_1_005.txt");*/
	LoadLRModelTxt ("Model\\Mod_001_1_005.txt");
	PredictOnSampleFile ("..\\Data\\SamplesTest.txt", "Model\\Rslt_001_1_005.txt", "Model\\Log_001_1_005.txt");
}

The process follows "train, test, retrain with different parameters, test again". Test is called from main to run everything:

#include "LogisticRegression.h"

#include <iostream>

using namespace std;

int main (void)
{
	cout << "Hello world for Logistic Regression" << endl;

	LogisticRegression toDo;
	toDo.Test ();

	return 0;
}

The data set contains 393 samples: the first 300 are used for training and the last 93 for testing, with 10 features in total. With only 1 training iteration, the learned parameter list is:

10
0.492528
0.166805
0.126543
0.475935
0.137543
0.110732
0.0844769
0.159048
0.118025
0.0599141

(The first line is the number of parameters.) The test result is 81.72%:

The total number of sample is : 93
The correct prediction number is : 76
Precision : 0.817204
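For reference, predicting with a model like this amounts to summing the weights of the features present in a sample and passing the sum through the sigmoid. Below is a simplified sketch rather than the actual PredictOnSampleFile code; PredictLabel is just an illustrative name, and it assumes there is no separate bias term, consistent with the 10 values in the model file above.

#include <cmath>
#include <vector>

// weights[k] is the k-th value in the model file (after the count line);
// featureIds are the indices read from one test sample.
int PredictLabel (const std::vector<double> & weights,
                  const std::vector<int> & featureIds)
{
	double score = 0.0;
	for (size_t i = 0; i < featureIds.size (); ++i)
		score += weights[featureIds[i]];               // feature value is implicitly 1
	double probability = 1.0 / (1.0 + std::exp (-score));  // sigmoid
	return probability >= 0.5 ? 1 : 0;                 // threshold at 0.5
}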

With the maximum number of iterations raised to 100, the training log is:

Hello world for Logistic Regression
In loop 0: current cost (0.541574) previous cost (1) ratio (0.458426)
In loop 1: current cost (0.433781) previous cost (0.541574) ratio (0.199035)
In loop 2: current cost (0.380875) previous cost (0.433781) ratio (0.121966)
In loop 3: current cost (0.340621) previous cost (0.380875) ratio (0.105687)
In loop 4: current cost (0.308502) previous cost (0.340621) ratio (0.0942953)
In loop 5: current cost (0.282294) previous cost (0.308502) ratio (0.0849548)
In loop 6: current cost (0.260501) previous cost (0.282294) ratio (0.0771983)
In loop 7: current cost (0.242082) previous cost (0.260501) ratio (0.0707073)
In loop 8: current cost (0.226293) previous cost (0.242082) ratio (0.0652207)
In loop 9: current cost (0.212595) previous cost (0.226293) ratio (0.0605327)
In loop 10: current cost (0.200586) previous cost (0.212595) ratio (0.0564851)
In loop 11: current cost (0.189964) previous cost (0.200586) ratio (0.0529563)
In loop 12: current cost (0.180494) previous cost (0.189964) ratio (0.0498525)
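The stopping rule reflected in this log can be sketched as follows. This is a simplified illustration rather than the actual body of TrainSGDOnSampleFile; the onePass callback is a placeholder for one SGD sweep over the training samples that returns the new cost.

#include <cstdio>

// Run SGD passes until the relative cost decrease drops below stopRatio
// (0.05 here) or maxLoop passes have been made.
void TrainUntilConverged (double (*onePass) (void), int maxLoop, double stopRatio)
{
	double previousCost = 1.0;  // the log above starts from a previous cost of 1
	for (int loop = 0; loop < maxLoop; ++loop)
	{
		double currentCost = onePass ();
		double ratio = (previousCost - currentCost) / previousCost;
		printf ("In loop %d: current cost (%g) previous cost (%g) ratio (%g)\n",
		        loop, currentCost, previousCost, ratio);
		if (ratio < stopRatio)  // relative improvement below threshold: stop
			break;
		previousCost = currentCost;
	}
}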

Training ran for 13 iterations in total, stopping once the cost decreased by less than 5% in an iteration. The learned parameters are:

10
2.5668
0.12814
-0.135231
2.55564
-0.0892993
-0.256378
-0.279498
-0.100414
-0.185509
-0.447792

Compared with the parameters above, the relative ordering of the weights is unchanged, but the gaps between their values have widened. Testing again with this model gives 100%:

The total number of sample is : 93
The correct prediction number is : 93
Precision : 1

Done.


Please credit the source when reposting: http://blog.csdn.net/xceman1997/article/details/17882981

