MLlib: Multilayer Perceptron (MLP) Algorithm Principles and Spark MLlib Usage Examples (Scala/Java/Python)

Latest recommended article published on 2024-01-12 17:04:51

小丁丁_ddxdd



Source: http://blog.csdn.net/liulingyuan6/article/details/53432429

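Multilayer Perceptron

Algorithm overview:

The multilayer perceptron classifier is based on the feedforward artificial neural network. It consists of multiple layers of nodes, and each layer is fully connected to the next layer in the network. Nodes in the input layer represent the input data. Nodes in every other layer produce that layer's output by taking a linear combination of their inputs with the node weights w and bias b and then applying an activation function. The model is learned by backpropagation, using the logistic loss function and L-BFGS as the optimization routine. A multilayer perceptron classifier with K+1 layers can be written in matrix form as follows: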

$y(x) = f_K(\ldots f_2(w_2^T f_1(w_1^T x + b_1) + b_2) \ldots + b_K)$

Nodes in the intermediate layers use the sigmoid (logistic) function:

$f({z_i}) = \frac{1}{{1 + {e^{ - {z_i}}}}}$
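As a small illustration of the sigmoid formula above, here is a sketch in plain Python. The function name `sigmoid` and the branch on the sign of `z` are choices made for this example (the branch only avoids floating-point overflow for large negative inputs); this is not how Spark implements it internally.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-z)), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the algebraically equivalent form exp(z) / (1 + exp(z)),
    # so that exp() is never called with a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)


# The output always lies in [0, 1] and equals 0.5 at z = 0.
values = [sigmoid(z) for z in (-1000.0, 0.0, 1000.0)]
```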

Nodes in the output layer use the softmax function:

$f({z_i}) = \frac{{{e^{{z_i}}}}}{{\sum\limits_{k = 1}^N {{e^{{z_k}}}} }}$

Here N, the number of nodes in the output layer, is the number of classes.
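To make the matrix form concrete, the following is a minimal sketch of the forward pass in plain Python: sigmoid on every intermediate layer and softmax on the output layer. The helper names (`affine`, `forward`) and the toy weights are made up for illustration; a trained Spark model would supply its own weights, and Spark's internal implementation works on blocks of data rather than single vectors.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def softmax(zs):
    """Softmax over a list of scores; subtracting the max keeps exp() stable."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def affine(w, b, x):
    """Compute w^T x + b, where w[i][j] connects input i to output node j."""
    return [sum(w[i][j] * x[i] for i in range(len(x))) + b[j]
            for j in range(len(b))]


def forward(params, x):
    """params is a list of (w, b) pairs, one pair per layer after the input."""
    a = x
    for w, b in params[:-1]:          # intermediate layers: sigmoid
        a = [sigmoid(z) for z in affine(w, b, a)]
    w, b = params[-1]                 # output layer: softmax
    return softmax(affine(w, b, a))


# Hypothetical toy network: 2 inputs -> 2 hidden nodes -> 3 classes.
w1 = [[0.5, -0.5], [0.25, 0.75]]
b1 = [0.0, 0.1]
w2 = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
b2 = [0.0, 0.0, 0.0]

probs = forward([(w1, b1), (w2, b2)], [1.0, 2.0])
# The predicted class is the index with the largest probability.
predicted = max(range(len(probs)), key=probs.__getitem__)
```

The softmax output is a probability distribution over the N classes, so its entries are positive and sum to 1.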

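Parameters:

featuresCol:

Type: string.

Meaning: name of the features column.

labelCol:

Type: string.

Meaning: name of the label column.

layers:

Type: integer array.

Meaning: sizes of the layers, from the input layer to the output layer.

maxIter:

Type: integer.

Meaning: maximum number of iterations (>= 0).

predictionCol:

Type: string.

Meaning: name of the prediction column.

seed:

Type: long.

Meaning: random seed.

stepSize:

Type: double.

Meaning: step size used for each iteration of optimization.

tol:

Type: double.

Meaning: convergence tolerance for the iterative algorithm.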
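The `layers` parameter fixes the shape of the network, and therefore how many weights are trained. As a rough sketch, assuming each pair of consecutive layers is fully connected with one bias per output node (as in the description above), the parameter count can be worked out as follows. The helper name `count_weights` is hypothetical and exists only for this example.

```python
def count_weights(layers):
    """Count weights and biases of a fully connected network.

    Each pair of consecutive layers (n_in, n_out) contributes
    n_in * n_out weights plus n_out biases.
    """
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layers, layers[1:]))


# Layer sizes used in the examples in this article:
# 4 input features, intermediate layers of 5 and 4 nodes, 3 classes.
total = count_weights([4, 5, 4, 3])  # (4+1)*5 + (5+1)*4 + (4+1)*3 = 64
```

The first entry of `layers` should match the number of features and the last entry the number of classes.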

Examples:

Scala:

 import org.apache.spark.ml.classification.MultilayerPerceptronClassifier  
 import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator  
   
 // Load the data stored in LIBSVM format as a DataFrame.  
 val data = spark.read.format("libsvm")  
   .load("data/mllib/sample_multiclass_classification_data.txt")  
 // Split the data into train and test  
 val splits = data.randomSplit(Array(0.6, 0.4), seed = 1234L)  
 val train = splits(0)  
 val test = splits(1)  
 // specify layers for the neural network:  
 // input layer of size 4 (features), two intermediate of size 5 and 4  
 // and output of size 3 (classes)  
 val layers = Array[Int](4, 5, 4, 3)  
 // create the trainer and set its parameters  
 val trainer = new MultilayerPerceptronClassifier()  
   .setLayers(layers)  
   .setBlockSize(128)  
   .setSeed(1234L)  
   .setMaxIter(100)  
 // train the model  
 val model = trainer.fit(train)  
 // compute accuracy on the test set  
 val result = model.transform(test)  
 val predictionAndLabels = result.select("prediction", "label")  
 val evaluator = new MulticlassClassificationEvaluator()  
   .setMetricName("accuracy")  
 println("Accuracy: " + evaluator.evaluate(predictionAndLabels))  

Java:

 import org.apache.spark.sql.Dataset;  
 import org.apache.spark.sql.Row;  
 import org.apache.spark.sql.SparkSession;  
 import org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel;  
 import org.apache.spark.ml.classification.MultilayerPerceptronClassifier;  
 import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator;  
   
 // Load training data  
 String path = "data/mllib/sample_multiclass_classification_data.txt";  
 Dataset<Row> dataFrame = spark.read().format("libsvm").load(path);  
 // Split the data into train and test  
 Dataset<Row>[] splits = dataFrame.randomSplit(new double[]{0.6, 0.4}, 1234L);  
 Dataset<Row> train = splits[0];  
 Dataset<Row> test = splits[1];  
 // specify layers for the neural network:  
 // input layer of size 4 (features), two intermediate of size 5 and 4  
 // and output of size 3 (classes)  
 int[] layers = new int[] {4, 5, 4, 3};  
 // create the trainer and set its parameters  
 MultilayerPerceptronClassifier trainer = new MultilayerPerceptronClassifier()  
   .setLayers(layers)  
   .setBlockSize(128)  
   .setSeed(1234L)  
   .setMaxIter(100);  
 // train the model  
 MultilayerPerceptronClassificationModel model = trainer.fit(train);  
 // compute accuracy on the test set  
 Dataset<Row> result = model.transform(test);  
 Dataset<Row> predictionAndLabels = result.select("prediction", "label");  
 MulticlassClassificationEvaluator evaluator = new MulticlassClassificationEvaluator()  
   .setMetricName("accuracy");  
 System.out.println("Accuracy = " + evaluator.evaluate(predictionAndLabels));  

Python:

from pyspark.sql import SparkSession
from pyspark.ml.classification import MultilayerPerceptronClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# Reuse the active SparkSession (e.g. in the pyspark shell) or create one
spark = SparkSession.builder.appName("MultilayerPerceptronClassifierExample").getOrCreate()

# Load the data stored in LIBSVM format as a DataFrame
data = (spark.read.format("libsvm")
        .load("data/mllib/sample_multiclass_classification_data.txt"))
# Split the data into train and test
splits = data.randomSplit([0.6, 0.4], 1234)
train = splits[0]
test = splits[1]
# specify layers for the neural network:
# input layer of size 4 (features), two intermediate of size 5 and 4
# and output of size 3 (classes)
layers = [4, 5, 4, 3]
# create the trainer and set its parameters
trainer = MultilayerPerceptronClassifier(maxIter=100, layers=layers, blockSize=128, seed=1234)
# train the model
model = trainer.fit(train)
# compute accuracy on the test set
result = model.transform(test)
predictionAndLabels = result.select("prediction", "label")
evaluator = MulticlassClassificationEvaluator(metricName="accuracy")
print("Accuracy: " + str(evaluator.evaluate(predictionAndLabels)))
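All three examples end by computing the "accuracy" metric over the (prediction, label) pairs. As a plain-Python sketch of what that number means, accuracy is the fraction of rows whose prediction equals the label. The function name `accuracy` and the sample rows below are made up for illustration and are not output from the Spark example.

```python
def accuracy(pairs):
    """Fraction of (prediction, label) pairs in which the two values agree."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("accuracy is undefined for an empty set of pairs")
    correct = sum(1 for prediction, label in pairs if prediction == label)
    return correct / len(pairs)


# Hypothetical (prediction, label) rows; three of the four agree.
rows = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0), (2.0, 2.0)]
acc = accuracy(rows)  # 3 / 4 = 0.75
```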