Follow the WeChat official account: 小程在线
Follow the CSDN blog: 程志伟的博客
For an explanation of the algorithms themselves, see the earlier posts written in Python or R.
I. Decision tree
1. Load the packages
julia> using RDatasets
2. Load the dataset and split it into features and labels
julia> iris = dataset("datasets","iris")
150×5 DataFrame
│ Row │ SepalLength │ SepalWidth │ PetalLength │ PetalWidth │ Species │
│ │ Float64 │ Float64 │ Float64 │ Float64 │ Cat… │
├─────┼─────────────┼────────────┼─────────────┼────────────┼───────────┤
│ 1 │ 5.1 │ 3.5 │ 1.4 │ 0.2 │ setosa │
│ 2 │ 4.9 │ 3.0 │ 1.4 │ 0.2 │ setosa │
│ 3 │ 4.7 │ 3.2 │ 1.3 │ 0.2 │ setosa │
│ 4 │ 4.6 │ 3.1 │ 1.5 │ 0.2 │ setosa │
│ 5 │ 5.0 │ 3.6 │ 1.4 │ 0.2 │ setosa │
│ 6 │ 5.4 │ 3.9 │ 1.7 │ 0.4 │ setosa │
│ 7 │ 4.6 │ 3.4 │ 1.4 │ 0.3 │ setosa │
│ 8 │ 5.0 │ 3.4 │ 1.5 │ 0.2 │ setosa │
│ 9 │ 4.4 │ 2.9 │ 1.4 │ 0.2 │ setosa │
│ 10 │ 4.9 │ 3.1 │ 1.5 │ 0.1 │ setosa │
⋮
│ 140 │ 6.9 │ 3.1 │ 5.4 │ 2.1 │ virginica │
│ 141 │ 6.7 │ 3.1 │ 5.6 │ 2.4 │ virginica │
│ 142 │ 6.9 │ 3.1 │ 5.1 │ 2.3 │ virginica │
│ 143 │ 5.8 │ 2.7 │ 5.1 │ 1.9 │ virginica │
│ 144 │ 6.8 │ 3.2 │ 5.9 │ 2.3 │ virginica │
│ 145 │ 6.7 │ 3.3 │ 5.7 │ 2.5 │ virginica │
│ 146 │ 6.7 │ 3.0 │ 5.2 │ 2.3 │ virginica │
│ 147 │ 6.3 │ 2.5 │ 5.0 │ 1.9 │ virginica │
│ 148 │ 6.5 │ 3.0 │ 5.2 │ 2.0 │ virginica │
│ 149 │ 6.2 │ 3.4 │ 5.4 │ 2.3 │ virginica │
│ 150 │ 5.9 │ 3.0 │ 5.1 │ 1.8 │ virginica │
julia> features = Matrix(iris[:,1:4])
150×4 Array{Float64,2}:
5.1 3.5 1.4 0.2
4.9 3.0 1.4 0.2
4.7 3.2 1.3 0.2
4.6 3.1 1.5 0.2
5.0 3.6 1.4 0.2
5.4 3.9 1.7 0.4
4.6 3.4 1.4 0.3
5.0 3.4 1.5 0.2
4.4 2.9 1.4 0.2
4.9 3.1 1.5 0.1
5.4 3.7 1.5 0.2
4.8 3.4 1.6 0.2
4.8 3.0 1.4 0.1
⋮
6.0 3.0 4.8 1.8
6.9 3.1 5.4 2.1
6.7 3.1 5.6 2.4
6.9 3.1 5.1 2.3
5.8 2.7 5.1 1.9
6.8 3.2 5.9 2.3
6.7 3.3 5.7 2.5
6.7 3.0 5.2 2.3
6.3 2.5 5.0 1.9
6.5 3.0 5.2 2.0
6.2 3.4 5.4 2.3
5.9 3.0 5.1 1.8
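This step only separates the measurement columns from the species column. No rows are held out here, and the cross-validation calls later in the post do their own splitting. If an explicit hold-out split is wanted, a minimal sketch using only the Random standard library could look like the following. The helper name holdout_split, the 70/30 ratio and the seed are illustrative assumptions and are not part of DecisionTree.jl or of the original transcript.

```julia
using Random

# Hypothetical helper: shuffle the row indices 1:n and cut them into a
# training part and a test part.
function holdout_split(n::Integer; train_frac::Float64 = 0.7, seed::Integer = 42)
    idx = shuffle(MersenneTwister(seed), 1:n)   # random permutation of 1:n
    n_train = round(Int, train_frac * n)
    return idx[1:n_train], idx[n_train+1:end]
end

train_idx, test_idx = holdout_split(150)
println(length(train_idx), " training rows, ", length(test_idx), " test rows")

# With the variables from the transcript, the rows would then be selected as
#   features[train_idx, :], labels[train_idx]
#   features[test_idx, :],  labels[test_idx]
```

Shuffling matters for iris because the rows are sorted by species, so taking the first 105 rows would leave almost no virginica samples for training.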
julia> labels = string.(iris[:,5])
150-element Array{String,1}:
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
⋮
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
3. Load the DecisionTree package
julia> using DecisionTree
[ Info: Precompiling DecisionTree [7806a523-6efd-50cb-b5f6-3fa6f1930dbb]
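4. Build the model, then merge leaves whose combined purity is at least 90%. Note that prune_tree returns a new tree and does not modify model, so the steps below still use the unpruned tree with 9 leaves.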
julia> model = build_tree(labels,features)
Decision Tree
Leaves: 9
Depth: 5
julia> prune_tree(model,0.9)
Decision Tree
Leaves: 8
Depth: 5
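To make the 0.9 threshold concrete, here is a small sketch of the purity check as I understand it: two sibling leaves are merged when the majority class accounts for at least 90% of their combined samples. The function merged_purity is a hypothetical helper written for this illustration, not the library's code, and the leaf counts are taken from the tree printed in step 5.

```julia
# Fraction of the combined samples of two sibling leaves that belong to the
# most common class (hypothetical helper, not part of DecisionTree.jl).
function merged_purity(left::Vector{String}, right::Vector{String})
    all_labels = vcat(left, right)
    counts = Dict{String,Int}()
    for l in all_labels
        counts[l] = get(counts, l, 0) + 1
    end
    return maximum(values(counts)) / length(all_labels)
end

# Sibling leaves "versicolor : 47/47" and "virginica : 1/1"
p1 = merged_purity(fill("versicolor", 47), fill("virginica", 1))
# Sibling leaves "versicolor : 2/2" and "virginica : 1/1"
p2 = merged_purity(fill("versicolor", 2), fill("virginica", 1))

println("47+1 pair: purity ", p1, ", merge = ", p1 >= 0.9)
println("2+1 pair:  purity ", p2, ", merge = ", p2 >= 0.9)
```

Under this reading only the 47/1 pair passes the threshold, which is consistent with the leaf count dropping from 9 to 8 in the output above.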
5. Based on the result above, print the tree down to a depth of 5
julia> print_tree(model,5)
Feature 3, Threshold 2.45
L-> setosa : 50/50
R-> Feature 4, Threshold 1.75
    L-> Feature 3, Threshold 4.95
        L-> Feature 4, Threshold 1.65
            L-> versicolor : 47/47
            R-> virginica : 1/1
        R-> Feature 4, Threshold 1.55
            L-> virginica : 3/3
            R-> Feature 1, Threshold 6.95
                L-> versicolor : 2/2
                R-> virginica : 1/1
    R-> Feature 3, Threshold 4.85
        L-> Feature 2, Threshold 3.1
            L-> virginica : 2/2
            R-> versicolor : 1/1
        R-> virginica : 43/43
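The printed tree can be read as a chain of nested comparisons. The sketch below transcribes it by hand into plain Julia, assuming the L branch is taken when the feature value is strictly less than the threshold (my understanding of how DecisionTree.jl applies a split). The function predict_by_hand is a hypothetical name used only here. The two inputs are the same ones passed to apply_tree in step 7.

```julia
# Hand transcription of the printed tree. Assumption: go left (L) when
# x[feature] < threshold, otherwise go right (R).
function predict_by_hand(x::AbstractVector{<:Real})
    x[3] < 2.45 && return "setosa"
    if x[4] < 1.75
        if x[3] < 4.95
            return x[4] < 1.65 ? "versicolor" : "virginica"
        else
            x[4] < 1.55 && return "virginica"
            return x[1] < 6.95 ? "versicolor" : "virginica"
        end
    else
        if x[3] < 4.85
            return x[2] < 3.1 ? "virginica" : "versicolor"
        else
            return "virginica"
        end
    end
end

println(predict_by_hand([5.9, 3.0, 5.1, 1.9]))
println(predict_by_hand([1.9, 2.0, 2.1, 1.9]))
```

For the first input the path is Feature 3 >= 2.45, Feature 4 >= 1.75, Feature 3 >= 4.85, which ends in the virginica leaf. The second input stops at the root because its petal length 2.1 is below 2.45.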
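6. Print only the first 3 levels of the tree. This looks like pruning, but it only truncates the printout; the model itself is unchanged.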
julia> print_tree(model,3)
Feature 3, Threshold 2.45
L-> setosa : 50/50
R-> Feature 4, Threshold 1.75
    L-> Feature 3, Threshold 4.95
        L->
        R->
    R-> Feature 3, Threshold 4.85
        L->
        R-> virginica : 43/43
7. Use the trained model to make predictions
julia> apply_tree(model,[5.9,3.0,5.1,1.9])
"virginica"
julia> apply_tree(model,[1.9,2.0,2.1,1.9])
"setosa"
8. Use cross-validation to assess the model's accuracy
julia> n_folds=5
5
julia> accuracy = nfoldCV_tree(labels,features,n_folds,0.9)
Fold 1
Classes: ["setosa", "versicolor", "virginica"]
Matrix: 3×3 Array{Int64,2}:
9 0 0
0 15 0
0 0 6
Accuracy: 1.0
Kappa: 1.0
Fold 2
Classes: ["setosa", "versicolor", "virginica"]
Matrix: 3×3 Array{Int64,2}:
7 0 0
0 14 0
0 1 8
Accuracy: 0.9666666666666667
Kappa: 0.9472759226713533
Fold 3
Classes: ["setosa", "versicolor", "virginica"]
Matrix: 3×3 Array{Int64,2}:
12 0 0
0 7 0
0 0 11
Accuracy: 1.0
Kappa: 1.0
Fold 4
Classes: ["setosa", "versicolor", "virginica"]
Matrix: 3×3 Array{Int64,2}:
11 0 0
0 7 0
0 0 12
Accuracy: 1.0
Kappa: 1.0
Fold 5
Classes: ["setosa", "versicolor", "virginica"]
Matrix: 3×3 Array{Int64,2}:
11 0 0
0 7 0
0 0 12
Accuracy: 1.0
Kappa: 1.0
Mean Accuracy: 0.9933333333333334
5-element Array{Float64,1}:
1.0
0.9666666666666667
1.0
1.0
1.0
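The Accuracy and Kappa lines printed for each fold can be recomputed from the confusion matrix alone. The sketch below assumes that Kappa means Cohen's kappa, which is consistent with the fold 2 numbers above. The function name accuracy_and_kappa is my own and is not a DecisionTree.jl function.

```julia
using LinearAlgebra  # tr and dot

# Accuracy and Cohen's kappa from a square confusion matrix of counts.
function accuracy_and_kappa(cm::Matrix{Int})
    n = sum(cm)
    po = tr(cm) / n                          # observed agreement (accuracy)
    row_totals = vec(sum(cm, dims = 2))
    col_totals = vec(sum(cm, dims = 1))
    pe = dot(row_totals, col_totals) / n^2   # agreement expected by chance
    kappa = (po - pe) / (1 - pe)
    return po, kappa
end

# Confusion matrix printed for fold 2 of the decision tree run
fold2 = [7 0 0; 0 14 0; 0 1 8]
acc, kappa = accuracy_and_kappa(fold2)
println("Accuracy: ", acc, "  Kappa: ", kappa)
```

For fold 2 this works out to 29/30 for accuracy and 539/569 for kappa, which agrees with the printed values up to floating-point rounding.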
II. AdaBoost
1. Build the model (10 boosting iterations, one decision stump per iteration)
julia> model,coeffs = build_adaboost_stumps(labels,features,10)
(Ensemble of Decision Trees
Trees: 10
Avg Leaves: 2.0
Avg Depth: 1.0, [0.3465735902799726, 0.25541281188299536, 0.2645768713851454, 0.3357494610262844, 0.28884625587396767, 0.2582865965088567, 0.35688249756171214, 0.29876564303107633, 0.24248457726633457, 0.3499761902694112])
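The second element of the returned tuple holds one coefficient per stump. As I understand it, prediction is a weighted vote: each stump's predicted class receives that stump's coefficient, and the class with the largest total wins. The sketch below illustrates this in plain Julia. The function weighted_vote and the three example votes and weights are hypothetical and do not come from the model above.

```julia
# Weighted vote over per-stump predictions (hypothetical helper).
function weighted_vote(predictions::Vector{String}, coeffs::Vector{Float64})
    score = Dict{String,Float64}()
    for (p, c) in zip(predictions, coeffs)
        score[p] = get(score, p, 0.0) + c    # accumulate weight per class
    end
    best_label, best_score = "", -Inf
    for (label, s) in score
        if s > best_score
            best_label, best_score = label, s
        end
    end
    return best_label
end

# Made-up example: three stumps, two of them voting for virginica
votes   = ["versicolor", "virginica", "virginica"]
weights = [0.35, 0.26, 0.26]
println(weighted_vote(votes, weights))
```

In this example virginica collects 0.52 against 0.35 for versicolor, so the two weaker stumps together outvote the single stronger one.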
2. Use the model to make a prediction
julia> apply_adaboost_stumps(model,coeffs,[5.9,3.0,5.1,1.9])
"virginica"
3. Run cross-validation (5 folds, 8 boosting iterations)
julia> accuracy = nfoldCV_stumps(labels,features,5,8)
Fold 1
Classes: ["setosa", "versicolor", "virginica"]
Matrix: 3×3 Array{Int64,2}:
9 0 0
0 9 0
0 3 9
Accuracy: 0.9
Kappa: 0.8507462686567167
Fold 2
Classes: ["setosa", "versicolor", "virginica"]
Matrix: 3×3 Array{Int64,2}:
10 0 0
0 6 2
0 1 11
Accuracy: 0.9
Kappa: 0.8469387755102041
Fold 3
Classes: ["setosa", "versicolor", "virginica"]
Matrix: 3×3 Array{Int64,2}:
8 0 0
0 10 1
0 0 11
Accuracy: 0.9666666666666667
Kappa: 0.9494949494949496
Fold 4
Classes: ["setosa", "versicolor", "virginica"]
Matrix: 3×3 Array{Int64,2}:
13 0 0
0 12 0
0 1 4
Accuracy: 0.9666666666666667
Kappa: 0.9459459459459458
Fold 5
Classes: ["setosa", "versicolor", "virginica"]
Matrix: 3×3 Array{Int64,2}:
10 0 0
0 10 0
0 1 9
Accuracy: 0.9666666666666667
Kappa: 0.9499999999999998
Mean Accuracy: 0.9400000000000001
5-element Array{Float64,1}:
0.9
0.9
0.9666666666666667
0.9666666666666667
0.9666666666666667
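Each fold above tests on 30 of the 150 rows and trains on the other 120. A minimal sketch of how such folds can be formed with the Random standard library follows. The helper kfold_indices and the seed are assumptions made for illustration. The library's own fold assignment is random as well, so its folds will not match these.

```julia
using Random

# Hypothetical helper: shuffle 1:n and cut it into k equally sized test folds.
function kfold_indices(n::Integer, k::Integer; seed::Integer = 1)
    perm = randperm(MersenneTwister(seed), n)
    fold_size = div(n, k)     # if k does not divide n, leftover rows are never tested
    return [perm[(i-1)*fold_size+1:i*fold_size] for i in 1:k]
end

folds = kfold_indices(150, 5)
for (i, test_idx) in enumerate(folds)
    train_idx = setdiff(1:150, test_idx)     # everything not in the test fold
    println("fold ", i, ": ", length(train_idx), " train / ", length(test_idx), " test")
end
```

Because the rows are shuffled first, the class counts differ from fold to fold, which is why the diagonal entries of the confusion matrices above are not all equal.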
III. Random forest
1. Build the model (2 random features per split, 10 trees, 50% of the samples per tree)
julia> model = build_forest(labels,features,2,10,0.5)
Ensemble of Decision Trees
Trees: 10
Avg Leaves: 6.5
Avg Depth: 4.7
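To illustrate what the arguments 2 and 0.5 control, the sketch below draws the random ingredients for a single tree by hand: a sample of rows and a subset of candidate features. It rests on my assumption that rows are drawn with replacement and that the feature subset is redrawn for each split. It uses only the Random standard library and does not call DecisionTree.jl.

```julia
using Random

rng = MersenneTwister(1)            # seed chosen arbitrarily for this sketch
n_samples, n_features = 150, 4      # shape of the iris feature matrix
n_subfeatures = 2                   # third positional argument of build_forest above
partial_sampling = 0.5              # fifth positional argument of build_forest above

# Rows for one tree: draw half as many rows as the data has, with replacement.
n_draw = floor(Int, partial_sampling * n_samples)
row_idx = rand(rng, 1:n_samples, n_draw)

# Candidate features for one split: 2 distinct columns out of 4.
feat_idx = randperm(rng, n_features)[1:n_subfeatures]

println(n_draw, " rows drawn, candidate features: ", feat_idx)
```

Each of the 10 trees sees a different random sample, which explains why the ensemble reports fractional averages such as 6.5 leaves and depth 4.7.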
2. Make a prediction
julia> apply_forest(model,[5.9,3.0,5.1,1.9])
"virginica"
3. Cross-validation (3 folds, 2 random features per split)
julia> accuracy = nfoldCV_forest(labels,features,3,2)
Fold 1
Classes: ["setosa", "versicolor", "virginica"]
Matrix: 3×3 Array{Int64,2}:
16 0 0
0 16 0
0 0 18
Accuracy: 1.0
Kappa: 1.0
Fold 2
Classes: ["setosa", "versicolor", "virginica"]
Matrix: 3×3 Array{Int64,2}:
15 0 0
0 15 2
0 0 18
Accuracy: 0.96
Kappa: 0.9397590361445782
Fold 3
Classes: ["setosa", "versicolor", "virginica"]
Matrix: 3×3 Array{Int64,2}:
19 0 0
0 16 1
0 0 14
Accuracy: 0.98
Kappa: 0.9698249849124925
Mean Accuracy: 0.98
3-element Array{Float64,1}:
1.0
0.96
0.98