1. Load the packages
julia> using MultivariateStats, RDatasets
2. Import the dataset
julia> iris = dataset("datasets", "iris")
150×5 DataFrame
│ Row │ SepalLength │ SepalWidth │ PetalLength │ PetalWidth │ Species │
│ │ Float64 │ Float64 │ Float64 │ Float64 │ Cat… │
├─────┼─────────────┼────────────┼─────────────┼────────────┼───────────┤
│ 1 │ 5.1 │ 3.5 │ 1.4 │ 0.2 │ setosa │
│ 2 │ 4.9 │ 3.0 │ 1.4 │ 0.2 │ setosa │
│ 3 │ 4.7 │ 3.2 │ 1.3 │ 0.2 │ setosa │
│ 4 │ 4.6 │ 3.1 │ 1.5 │ 0.2 │ setosa │
│ 5 │ 5.0 │ 3.6 │ 1.4 │ 0.2 │ setosa │
│ 6 │ 5.4 │ 3.9 │ 1.7 │ 0.4 │ setosa │
│ 7 │ 4.6 │ 3.4 │ 1.4 │ 0.3 │ setosa │
│ 8 │ 5.0 │ 3.4 │ 1.5 │ 0.2 │ setosa │
│ 9 │ 4.4 │ 2.9 │ 1.4 │ 0.2 │ setosa │
│ 10 │ 4.9 │ 3.1 │ 1.5 │ 0.1 │ setosa │
⋮
│ 140 │ 6.9 │ 3.1 │ 5.4 │ 2.1 │ virginica │
│ 141 │ 6.7 │ 3.1 │ 5.6 │ 2.4 │ virginica │
│ 142 │ 6.9 │ 3.1 │ 5.1 │ 2.3 │ virginica │
│ 143 │ 5.8 │ 2.7 │ 5.1 │ 1.9 │ virginica │
│ 144 │ 6.8 │ 3.2 │ 5.9 │ 2.3 │ virginica │
│ 145 │ 6.7 │ 3.3 │ 5.7 │ 2.5 │ virginica │
│ 146 │ 6.7 │ 3.0 │ 5.2 │ 2.3 │ virginica │
│ 147 │ 6.3 │ 2.5 │ 5.0 │ 1.9 │ virginica │
│ 148 │ 6.5 │ 3.0 │ 5.2 │ 2.0 │ virginica │
│ 149 │ 6.2 │ 3.4 │ 5.4 │ 2.3 │ virginica │
│ 150 │ 5.9 │ 3.0 │ 5.1 │ 1.8 │ virginica │
3. Create the training set (odd-numbered rows; each column is one observation)
julia> Xtr = convert(Array,(iris[1:2:end,1:4]))'
4×75 LinearAlgebra.Adjoint{Float64,Array{Float64,2}}:
5.1 4.7 5.0 4.6 4.4 5.4 4.8 5.8 5.4 5.7 5.4 4.6 … 6.4 7.4 6.4 6.1 6.3 6.0 6.7 5.8 6.7 6.3 6.2
3.5 3.2 3.6 3.4 2.9 3.7 3.0 4.0 3.9 3.8 3.4 3.6 2.8 2.8 2.8 2.6 3.4 3.0 3.1 2.7 3.3 2.5 3.4
1.4 1.3 1.4 1.4 1.4 1.5 1.4 1.2 1.3 1.7 1.7 1.0 5.6 6.1 5.6 5.6 5.6 4.8 5.6 5.1 5.7 5.0 5.4
0.2 0.2 0.2 0.3 0.2 0.2 0.1 0.2 0.4 0.3 0.2 0.2 2.1 1.9 2.2 1.4 2.4 1.8 2.4 1.9 2.5 1.9 2.3
julia> Xtr_labels = convert(Array,(iris[1:2:end,5]))
75-element Array{String,1}:
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
⋮
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
4. Create the test set (even-numbered rows)
julia> Xte = convert(Array,(iris[2:2:end,1:4]))'
4×75 LinearAlgebra.Adjoint{Float64,Array{Float64,2}}:
4.9 4.6 5.4 5.0 4.9 4.8 4.3 5.7 5.1 5.1 5.1 5.1 … 7.2 7.9 6.3 7.7 6.4 6.9 6.9 6.8 6.7 6.5 5.9
3.0 3.1 3.9 3.4 3.1 3.4 3.0 4.4 3.5 3.8 3.7 3.3 3.0 3.8 2.8 3.0 3.1 3.1 3.1 3.2 3.0 3.0 3.0
1.4 1.5 1.7 1.5 1.5 1.6 1.1 1.5 1.4 1.5 1.5 1.7 5.8 6.4 5.1 6.1 5.5 5.4 5.1 5.9 5.2 5.2 5.1
0.2 0.2 0.4 0.2 0.1 0.2 0.1 0.4 0.3 0.3 0.4 0.5 1.6 2.0 1.5 2.3 1.8 2.1 2.3 2.3 2.3 2.0 1.8
julia> Xte_labels = convert(Array,(iris[2:2:end,5]))
75-element Array{String,1}:
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
"setosa"
⋮
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
"virginica"
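The split above takes odd-numbered rows for training and even-numbered rows for testing, then transposes so that each column holds one observation, which is the layout MultivariateStats expects. A minimal sketch of the same idea on toy data; the names `data`, `labels`, `Xtr_toy`, and so on are made up for illustration and do not appear in the original session:

```julia
# Toy data: 6 observations (rows) with 2 features (columns), plus one label per row.
data = [1.0 10.0; 2.0 20.0; 3.0 30.0; 4.0 40.0; 5.0 50.0; 6.0 60.0]
labels = ["a", "a", "b", "b", "c", "c"]

# Odd rows go to the training set, even rows to the test set.
# permutedims turns the result into features × observations (one column per sample).
Xtr_toy = permutedims(data[1:2:end, :])
Xte_toy = permutedims(data[2:2:end, :])
ytr_toy = labels[1:2:end]
yte_toy = labels[2:2:end]

println(size(Xtr_toy))   # 2 features × 3 observations
println(yte_toy)
```

Because the iris rows are sorted by species, this alternating split puts 25 samples of each species into each half.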
5. Fit the PCA model, keeping 3 principal components
julia> M = fit(PCA, Xtr; maxoutdim=3)
PCA(indim = 4, outdim = 3, principalratio = 0.9957325846529407)
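To make the reported `principalratio` less of a black box, here is a rough sketch of a covariance-based PCA using only the standard library. The helper `pca_sketch` is hypothetical and is not how MultivariateStats is necessarily implemented; it only illustrates the usual recipe (center, compute the covariance, take the leading eigenvectors) and the share of variance those eigenvectors retain.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper, not part of MultivariateStats.
# X holds one observation per column (d × n); k is the number of components to keep.
function pca_sketch(X::AbstractMatrix{<:Real}, k::Integer)
    mu = vec(mean(X, dims=2))                     # per-feature mean
    Z = X .- mu                                   # centered data
    C = Symmetric(Z * Z' / (size(X, 2) - 1))      # d × d sample covariance
    F = eigen(C)                                  # eigenvalues come back in ascending order
    order = sortperm(F.values, rev=true)[1:k]     # indices of the k largest eigenvalues
    P = F.vectors[:, order]                       # d × k projection matrix
    ratio = sum(F.values[order]) / sum(F.values)  # share of total variance retained
    return mu, P, ratio
end

# Toy data: 3 features, 5 observations.
X = [2.0 0.0 -2.0  0.0 1.0;
     1.0 0.0 -1.0  0.0 0.5;
     0.0 0.1  0.0 -0.1 0.0]
mu, P, ratio = pca_sketch(X, 2)
println(size(P))
println(ratio)
```

In the session above, a ratio of about 0.9957 means the three retained components account for roughly 99.6% of the variance in the training data.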
6. Apply the model to the test set
julia> Yte = transform(M, Xte)
WARNING: both RDatasets and MultivariateStats export "transform"; uses of it in module Main must be qualified
ERROR: UndefVarError: transform not defined
Stacktrace:
[1] top-level scope at REPL[8]:1
Both RDatasets and MultivariateStats export `transform`, so the unqualified call fails; qualify it with the module name:
julia> Yte = MultivariateStats.transform(M, Xte)
3×75 Array{Float64,2}:
2.72714 2.75491 2.32396 2.65105 2.68917 … -1.89212 -2.53304 -1.92047 -1.74161 -1.37706
-0.230916 -0.406149 0.646374 0.0828144 -0.17411 0.476081 0.329041 0.246554 0.127625 -0.280295
0.253119 0.0271266 -0.230469 0.0252853 0.231507 -0.1407 -0.309265 -0.180044 -0.123165 -0.314992
julia> Xr = reconstruct(M, Yte)
4×75 Array{Float64,2}:
4.86449 4.61087 5.40782 5.00775 4.90864 4.83689 … 6.85145 6.76852 6.79346 6.58825 6.46774 5.94384
3.04262 3.08695 3.89061 3.39069 3.08963 3.35572 3.15828 3.25784 3.20785 3.13416 3.03873 2.94737
1.46099 1.48132 1.68656 1.48668 1.48516 1.53664 5.4834 5.32585 5.91124 5.39197 5.25542 5.02469
0.10362 0.229519 0.421233 0.221041 0.123445 0.300129 1.96821 1.9431 2.28224 1.99665 1.91243 1.91901
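`Xr` is close to `Xte` but not identical, because one of the four dimensions was dropped. A minimal sketch of the projection and reconstruction steps, assuming a model that consists of a mean vector `mu` and an orthonormal basis `P`; both values are invented here, and `project` and `rebuild` are hypothetical stand-ins for `transform` and `reconstruct`:

```julia
using LinearAlgebra

# Assumed model pieces (made-up values): a mean vector and an orthonormal basis.
mu = [1.0, 2.0, 3.0]
P = [1.0 0.0;
     0.0 1.0;
     0.0 0.0]                    # keeps the first two coordinate axes

project(x) = P' * (x .- mu)      # center, then project onto the basis
rebuild(y) = P * y .+ mu         # map back to the original space and add the mean

x = [2.0, 4.0, 3.5]
y = project(x)                   # 2-element low-dimensional representation
xr = rebuild(y)                  # 3-element approximation of x
err = norm(x - xr)               # information lost along the dropped direction

println(y)
println(xr)
println(err)
```

The third coordinate of `x` differs from the mean by 0.5, and that is exactly the part the two-component model cannot recover.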
7. Group the projected points by label (one group per color)
julia> setosa = Yte[:,Xte_labels.=="setosa"]
3×25 Array{Float64,2}:
2.72714 2.75491 2.32396 2.65105 2.68917 … 2.83607 2.42896 2.7245 2.85164 2.72585
-0.230916 -0.406149 0.646374 0.0828144 -0.17411 -0.969268 0.0908506 -0.314765 -0.324812 0.0357625
0.253119 0.0271266 -0.230469 0.0252853 0.231507 0.48069 -0.259688 0.149877 -0.0346684 0.0995694
julia> versicolor = Yte[:,Xte_labels.=="versicolor"]
3×25 Array{Float64,2}:
-0.901049 -0.189593 -0.631938 0.737306 0.00578031 … 0.692149 -0.315901 -0.620402 -0.288899
0.350685 -0.7887 -0.40525 -1.01982 -0.74682 -1.00992 -0.215681 0.0637046 -0.336681
-0.0027406 0.291058 0.0207768 0.121331 -0.179235 0.23994 -0.0438964 0.218073 0.0457549
julia> virginica = Yte[:,Xte_labels.=="virginica"]
3×25 Array{Float64,2}:
-1.4126 -1.95359 -3.35517 -2.89581 -2.8711 … -1.89212 -2.53304 -1.92047 -1.74161 -1.37706
-0.556727 -0.133821 0.692925 0.487134 0.828406 0.476081 0.329041 0.246554 0.127625 -0.280295
-0.214115 -0.075898 0.293002 0.386084 -0.496979 -0.1407 -0.309265 -0.180044 -0.123165 -0.314992
julia> size(Xte)
(4, 75)
julia> size(Yte)
(3, 75)
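The grouping above relies on broadcasting `.==` over the label vector to build a Boolean mask, which then selects the matching columns. A small sketch on toy data; `Y`, `labels`, and `groups` are illustrative names only:

```julia
# Toy projected data: 2 components × 4 observations, one label per column.
Y = [1.0 2.0 3.0 4.0;
     5.0 6.0 7.0 8.0]
labels = ["a", "b", "a", "b"]

mask = labels .== "a"        # element-wise comparison gives a Bool mask
group_a = Y[:, mask]         # keeps only the columns where the mask is true

# The same idea for every class at once, keyed by label.
groups = Dict(l => Y[:, labels .== l] for l in unique(labels))

println(group_a)
println(keys(groups))
```

A dictionary like `groups` avoids writing one line per species when the number of classes grows.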
8. Visualize the results
julia> using Plots
julia> p = scatter(setosa[1,:],setosa[2,:],setosa[3,:],marker=:circle,linewidth=0)
julia> scatter!(versicolor[1,:],versicolor[2,:],versicolor[3,:],marker=:circle,linewidth=0)
julia> scatter!(virginica[1,:],virginica[2,:],virginica[3,:],marker=:circle,linewidth=0)