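After reading a blog post today, I tried out Mahout's Logistic Regression (SGD). For a detailed explanation, see the Mahout documentation.
First, change into MAHOUT_HOME.
1. Train the model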
$ bin/mahout trainlogistic --passes 1 --rate 1 --lambda 0.5 --input donut.csv --features 21 --output donut.model --target color --categories 2 --predictors x y xx xy yy a b c --types n n
21
color ~ -0.016*Intercept Term + -0.016*xy + -0.016*yy
Intercept Term -0.01559
xy -0.01559
yy -0.01559
0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 -0.015590929 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
13/06/04 12:40:43 INFO driver.MahoutDriver: Program took 2013 ms (Minutes: 0.03355)
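To make the printed formula easier to read, here is a minimal sketch of how such coefficients turn into a probability. It assumes the standard logistic (sigmoid) link and deliberately ignores Mahout's feature hashing and any internal encoding of the inputs. The names `sigmoid`, `weights` and `predict` are hypothetical helpers for illustration, not part of Mahout.

```python
import math


def sigmoid(z: float) -> float:
    """Standard logistic function (assumed to be the link used by the model)."""
    return 1.0 / (1.0 + math.exp(-z))


# Coefficients copied from the training output above.
weights = {"Intercept Term": -0.01559, "xy": -0.01559, "yy": -0.01559}


def predict(xy: float, yy: float) -> float:
    """Probability of the positive class for given xy and yy values (sketch only)."""
    z = weights["Intercept Term"] + weights["xy"] * xy + weights["yy"] * yy
    return sigmoid(z)


if __name__ == "__main__":
    # With both predictors at zero only the intercept contributes.
    print(round(predict(0.0, 0.0), 3))
```

With coefficients this small the probability stays just under 0.5 for small non-negative inputs, so a model trained with these settings can barely separate the two classes.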
2. Test the model
$ bin/mahout runlogistic --input donut.csv --model donut.model --auc --scores --confusion
Output:
"target","model-output","log-likelihood"
0,0.496,-0.685284
0,0.490,-0.674055
0,0.491,-0.675162
1,0.495,-0.703361
1,0.493,-0.706289
0,0.495,-0.683275
0,0.496,-0.685282
0,0.492,-0.677191
1,0.494,-0.704222
0,0.495,-0.684107
0,0.496,-0.684765
1,0.494,-0.705209
0,0.491,-0.675272
1,0.495,-0.703438
0,0.496,-0.685121
0,0.496,-0.684886
0,0.490,-0.672500
0,0.495,-0.682445
0,0.496,-0.684872
1,0.495,-0.703070
0,0.490,-0.672511
0,0.495,-0.683643
0,0.492,-0.677610
1,0.492,-0.708915
0,0.496,-0.684744
1,0.494,-0.704766
0,0.492,-0.677496
1,0.492,-0.708679
0,0.496,-0.685222
1,0.495,-0.703604
0,0.492,-0.677846
0,0.490,-0.672702
0,0.492,-0.676980
0,0.494,-0.681450
1,0.495,-0.702845
0,0.493,-0.679049
0,0.496,-0.684262
1,0.493,-0.706564
1,0.495,-0.704016
0,0.490,-0.672624
AUC = 0.52
confusion: [[27.0, 13.0], [0.0, 0.0]]
entropy: [[-0.7, -0.4], [-0.7, -0.5]]
13/06/04 12:45:04 INFO driver.MahoutDriver: Program took 203 ms (Minutes: 0.0033833333333333332)
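The columns of the score output can be cross-checked by hand. The sketch below recomputes the log-likelihood for the first five rows and builds a confusion matrix for them. It rests on assumptions: the log-likelihood is taken to be log(p) for target 1 and log(1 - p) for target 0, the decision threshold is 0.5, and the matrix is laid out as rows = predicted class, columns = actual class. The helper names are hypothetical. Because the printed scores are rounded to three decimals, the recomputed values only match the reported ones approximately.

```python
import math

# First five rows of the score output above:
# (target, model-output, reported log-likelihood)
rows = [
    (0, 0.496, -0.685284),
    (0, 0.490, -0.674055),
    (0, 0.491, -0.675162),
    (1, 0.495, -0.703361),
    (1, 0.493, -0.706289),
]


def log_likelihood(target: int, p: float) -> float:
    """Log-likelihood of one row, assuming p is the probability of class 1."""
    return math.log(p) if target == 1 else math.log(1.0 - p)


def confusion(data, threshold: float = 0.5):
    """2x2 matrix, assumed layout: m[predicted][actual]."""
    m = [[0, 0], [0, 0]]
    for target, p, _ in data:
        predicted = 1 if p > threshold else 0
        m[predicted][target] += 1
    return m


if __name__ == "__main__":
    for target, p, reported in rows:
        print(target, p, round(log_likelihood(target, p), 6), reported)
    print(confusion(rows))
```

Under that reading of the layout, the reported matrix [[27.0, 13.0], [0.0, 0.0]] says that all 40 rows (27 with target 0, 13 with target 1) were predicted as class 0, which fits scores that never exceed 0.5 and an AUC close to 0.5.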
References: Mahout's Logistic Regression (SGD): https://cwiki.apache.org/MAHOUT/logistic-regression.html
http://jayatiatblogs.blogspot.hk/2013/05/running-mahouts-logistic-regression.html