Set up the required environment:
install.packages("h2o")
library(h2o)
Sys.setenv(JAVA_HOME = "E:/java/JAVA(1)")  # set the JAVA_HOME environment variable
h2o.init()  # connect to the H2O platform
Download the data:
Training set: http://www.pjreddie.com/media/files/mnist_train.csv
Test set: http://www.pjreddie.com/media/files/mnist_test.csv
train_h2o <- h2o.importFile(path = "D:/mnist_train.csv")
test_h2o <- h2o.importFile(path = "D:/mnist_test.csv")
y_train <- as.factor(as.matrix(train_h2o[, 1]))
y_test <- as.factor(as.matrix(test_h2o[, 1]))
Train the model:
model <- h2o.deeplearning(x = 2:785,                 # column numbers of the predictors
                          y = 1,                     # column number of the label
                          training_frame = train_h2o,    # training set
                          activation = "Tanh",       # activation function
                          # balance_classes = TRUE,  # balance the classes in the training set
                          hidden = c(100, 100, 100), # three hidden layers
                          epochs = 100)              # train for 100 epochs
Because the data set is fairly large (60,000 rows by 785 columns), your machine will slow to a crawl during training and CPU usage will stay above 95%; on my machine it took about 40 minutes to finish training the model.
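If training is monopolizing the machine, the H2O cluster's resources can be capped when it is started. The sketch below uses the real `nthreads` and `max_mem_size` parameters of `h2o.init()`, but the values shown (2 threads, 4 GB heap) are only illustrative assumptions; pick limits that fit your hardware.

```r
library(h2o)

# Start the local H2O cluster with explicit resource limits
# (values here are illustrative, not recommendations).
h2o.init(nthreads = 2,         # use at most 2 CPU threads
         max_mem_size = "4g")  # cap the JVM heap at 4 GB
```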
Next, print the model to see how well it fits the training set:
model
==============
H2ORegressionModel: deeplearning
Model ID: DeepLearning_model_R_1500974326986_4
Status of Neuron Layers: predicting C1, regression, gaussian distribution, Quadratic loss, 92,101 weights/biases, 1.1 MB, 862,830 training samples, mini-batch size 1
layer units type dropout l1 l2 mean_rate rate_rms momentum
1 1 717 Input 0.00 %
2 2 100 Tanh 0.00 % 0.000000 0.000000 0.352263 0.377816 0.000000
3 3 100 Tanh 0.00 % 0.000000 0.000000 0.050956 0.026576 0.000000
4 4 100 Tanh 0.00 % 0.000000 0.000000 0.233008 0.247813 0.000000
5 5 1 Linear 0.000000 0.000000 0.001606 0.001025 0.000000
mean_weight weight_rms mean_bias bias_rms
1
2 -0.002465 0.110346 0.016357 0.192539
3 0.001666 0.177409 0.002860 0.447464
4 -0.002143 0.154353 -0.017609 0.236047
5 -0.012989 0.069333 -0.056454 0.000000
H2ORegressionMetrics: deeplearning
** Reported on training data. **
** Metrics reported on temporary training frame with 10092 samples **
MSE: 0.1165795
RMSE: 0.3414374
MAE: 0.1600576
RMSLE: 0.09332472
Mean Residual Deviance : 0.1165795
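Note that the output above says H2ORegressionModel with a Gaussian distribution: because the label column C1 is numeric in the H2O frame, H2O fit a regression to the digit values, which is why the predictions later have to be rounded. To get a true 10-class classifier instead, the label column can be converted to a factor inside the H2O frame before training. A minimal sketch, assuming the frames are loaded as in the text:

```r
# Converting the label column of the H2O frames to a factor makes
# h2o.deeplearning build a multinomial classifier rather than a
# Gaussian regression on the digit values.
train_h2o[, 1] <- as.factor(train_h2o[, 1])
test_h2o[, 1]  <- as.factor(test_h2o[, 1])

model <- h2o.deeplearning(x = 2:785, y = 1,
                          training_frame = train_h2o,
                          activation = "Tanh",
                          hidden = c(100, 100, 100),
                          epochs = 100)
```

With a factor label, `h2o.predict` returns discrete class labels plus per-class probabilities, so no rounding step is needed.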
Now let's see how the classification does on the test set; we use the trained model to predict it:
yhat_train <- h2o.predict(model, train_h2o)$predict
yhat_train <- as.factor(as.matrix(yhat_train))
yhat_test <- h2o.predict(model, test_h2o)$predict
yhat_test <- as.factor(as.matrix(yhat_test))
yt <- as.numeric(as.character(y_test))       # convert the factor to character, then to numeric
yhat <- as.numeric(as.character(yhat_test))
Running the following code prints the number of correctly classified samples:
s <- 0
for (i in 1:10000) {
  if (yt[i] == round(yhat[i]))
    s <- s + 1
}
s
[1] 8964
8,964 of the 10,000 test digits were predicted correctly, an accuracy of 89.64%, which is a reasonably good result.
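As an aside, the counting loop above can be collapsed into a single vectorized comparison, since `TRUE`/`FALSE` coerce to 1/0 under `sum`. A minimal sketch with small stand-in vectors (the real `yt` and `yhat` are the 10,000-element vectors built from the predictions above):

```r
# Toy stand-ins for the label vector and the rounded regression predictions.
yt   <- c(3, 7, 2, 1, 0)
yhat <- c(2.8, 7.1, 1.4, 0.6, 0.2)

# Vectorized equivalent of the counting loop.
s <- sum(yt == round(yhat))     # number of correct predictions: 4
accuracy <- s / length(yt)      # 0.8
```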