Source: http://blog.csdn.net/liulingyuan6/article/details/53432429
Multilayer Perceptron
Algorithm overview:
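The multilayer perceptron classifier is based on a feedforward artificial neural network. It consists of multiple layers of nodes, and each layer is fully connected to the next layer in the network. Nodes in the input layer represent the input data. Nodes in every other layer produce their output by taking a linear combination of their inputs with the node's weights w and bias b and then applying an activation function. The model is trained with backpropagation, using the logistic loss function and L-BFGS as the optimization routine. A multilayer perceptron classifier with K+1 layers can be written in matrix form as follows:
$$\mathrm{y}(\mathbf{x}) = f_K(\ldots f_2(\mathbf{w}_2^T f_1(\mathbf{w}_1^T \mathbf{x} + b_1) + b_2) \ldots + b_K)$$
Nodes in the intermediate layers use the sigmoid (logistic) function:
$$f(z_i) = \frac{1}{1 + e^{-z_i}}$$
Nodes in the output layer use the softmax function:
$$f(z_i) = \frac{e^{z_i}}{\sum_{k=1}^{N} e^{z_k}}$$
The number of nodes N in the output layer corresponds to the number of classes.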
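To make the forward pass concrete, here is a minimal pure-Python sketch of the computation described above: sigmoid on the intermediate layers and softmax on the output layer. It is not Spark's implementation. The helper names (`affine`, `forward`) and the weight values for the tiny 2-3-2 network are made up for illustration.

```python
import math

def sigmoid(z):
    """Element-wise logistic sigmoid for a list of floats."""
    return [1.0 / (1.0 + math.exp(-v)) for v in z]

def softmax(z):
    """Softmax over a list of floats; subtracting the max avoids overflow."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def affine(weights, bias, x):
    """Compute W^T x + b, where weights[i][j] connects input i to output j."""
    return [
        sum(weights[i][j] * x[i] for i in range(len(x))) + bias[j]
        for j in range(len(bias))
    ]

def forward(params, x):
    """Apply sigmoid on every intermediate layer and softmax on the last one."""
    *hidden, (w_out, b_out) = params
    for w, b in hidden:
        x = sigmoid(affine(w, b, x))
    return softmax(affine(w_out, b_out, x))

# Hypothetical weights for a tiny 2-3-2 network (illustrative values only).
params = [
    ([[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]], [0.0, 0.1, -0.1]),  # 2 inputs -> 3 hidden
    ([[0.7, -0.8], [0.9, 1.0], [-1.1, 1.2]], [0.05, -0.05]),   # 3 hidden -> 2 classes
]

probs = forward(params, [1.0, 2.0])
# The predicted class is the index of the largest output probability.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_class)
```

The softmax output always sums to 1, so it can be read as a probability for each of the N classes.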
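Parameters:
featuresCol:
Type: String.
Meaning: name of the features column.
labelCol:
Type: String.
Meaning: name of the label column.
layers:
Type: Array of Int.
Meaning: sizes of the layers, from the input layer to the output layer.
maxIter:
Type: Int.
Meaning: maximum number of iterations (>= 0).
predictionCol:
Type: String.
Meaning: name of the prediction column.
seed:
Type: Long.
Meaning: random seed.
stepSize:
Type: Double.
Meaning: step size used for each optimization iteration.
tol:
Type: Double.
Meaning: convergence tolerance of the iterative algorithm.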
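The `layers` parameter fixes the size of the model. As a rough sketch, assuming each layer is fully connected to the next and each non-input node has one bias term (as the overview describes), the number of trainable parameters follows directly from the layer sizes. The function name `count_parameters` is hypothetical and is not part of the Spark API.

```python
def count_parameters(layers):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    # Each pair of adjacent layers contributes n_in * n_out weights and n_out biases.
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layers, layers[1:]))

layers = [4, 5, 4, 3]  # the layer sizes used in the examples in this article
total = count_parameters(layers)
print(total)  # 64 = (4+1)*5 + (5+1)*4 + (4+1)*3
```

Each extra or wider intermediate layer increases this count, and with it the work done in every training iteration.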
Examples:
Scala:
    import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
    import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator

    // `spark` is an existing SparkSession (predefined in spark-shell).
    // Load the data stored in LIBSVM format as a DataFrame.
    val data = spark.read.format("libsvm")
      .load("data/mllib/sample_multiclass_classification_data.txt")
    // Split the data into train and test sets.
    val splits = data.randomSplit(Array(0.6, 0.4), seed = 1234L)
    val train = splits(0)
    val test = splits(1)
    // Specify the layers of the neural network:
    // an input layer of size 4 (features), two intermediate layers of size 5 and 4,
    // and an output layer of size 3 (classes).
    val layers = Array[Int](4, 5, 4, 3)
    // Create the trainer and set its parameters.
    val trainer = new MultilayerPerceptronClassifier()
      .setLayers(layers)
      .setBlockSize(128)
      .setSeed(1234L)
      .setMaxIter(100)
    // Train the model.
    val model = trainer.fit(train)
    // Compute accuracy on the test set.
    val result = model.transform(test)
    val predictionAndLabels = result.select("prediction", "label")
    val evaluator = new MulticlassClassificationEvaluator()
      .setMetricName("accuracy")
    println("Accuracy: " + evaluator.evaluate(predictionAndLabels))
Java:
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel;
    import org.apache.spark.ml.classification.MultilayerPerceptronClassifier;
    import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator;

    // The statements below belong inside a method such as main().
    SparkSession spark = SparkSession.builder()
      .appName("JavaMultilayerPerceptronClassifierExample")
      .getOrCreate();

    // Load the data stored in LIBSVM format as a DataFrame.
    String path = "data/mllib/sample_multiclass_classification_data.txt";
    Dataset<Row> dataFrame = spark.read().format("libsvm").load(path);

    // Split the data into train and test sets.
    Dataset<Row>[] splits = dataFrame.randomSplit(new double[]{0.6, 0.4}, 1234L);
    Dataset<Row> train = splits[0];
    Dataset<Row> test = splits[1];

    // Specify the layers of the neural network:
    // an input layer of size 4 (features), two intermediate layers of size 5 and 4,
    // and an output layer of size 3 (classes).
    int[] layers = new int[] {4, 5, 4, 3};

    // Create the trainer and set its parameters.
    MultilayerPerceptronClassifier trainer = new MultilayerPerceptronClassifier()
      .setLayers(layers)
      .setBlockSize(128)
      .setSeed(1234L)
      .setMaxIter(100);

    // Train the model.
    MultilayerPerceptronClassificationModel model = trainer.fit(train);

    // Compute accuracy on the test set.
    Dataset<Row> result = model.transform(test);
    Dataset<Row> predictionAndLabels = result.select("prediction", "label");
    MulticlassClassificationEvaluator evaluator = new MulticlassClassificationEvaluator()
      .setMetricName("accuracy");
    System.out.println("Accuracy = " + evaluator.evaluate(predictionAndLabels));
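Python:
    from pyspark.ml.classification import MultilayerPerceptronClassifier
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator

    # `spark` is an existing SparkSession (predefined in the pyspark shell).
    # Load the data stored in LIBSVM format as a DataFrame.
    data = spark.read.format("libsvm")\
        .load("data/mllib/sample_multiclass_classification_data.txt")

    # Split the data into train and test sets.
    splits = data.randomSplit([0.6, 0.4], 1234)
    train = splits[0]
    test = splits[1]

    # Specify the layers of the neural network:
    # an input layer of size 4 (features), two intermediate layers of size 5 and 4,
    # and an output layer of size 3 (classes).
    layers = [4, 5, 4, 3]

    # Create the trainer and set its parameters.
    trainer = MultilayerPerceptronClassifier(maxIter=100, layers=layers, blockSize=128, seed=1234)

    # Train the model.
    model = trainer.fit(train)

    # Compute accuracy on the test set.
    result = model.transform(test)
    predictionAndLabels = result.select("prediction", "label")
    evaluator = MulticlassClassificationEvaluator(metricName="accuracy")
    print("Accuracy: " + str(evaluator.evaluate(predictionAndLabels)))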
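The "accuracy" metric requested in the examples above is the fraction of rows whose prediction equals the label. The following pure-Python sketch shows that calculation on a handful of made-up (prediction, label) pairs. The function name `accuracy` and the sample rows are illustrative assumptions, not Spark code or real results.

```python
def accuracy(pairs):
    """Fraction of (prediction, label) pairs in which the two values match."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("accuracy is undefined for an empty set of rows")
    correct = sum(1 for prediction, label in pairs if prediction == label)
    return correct / len(pairs)

# Hypothetical (prediction, label) rows; class indices are doubles, as in Spark.
rows = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0), (2.0, 2.0)]
print(accuracy(rows))  # 0.75 (3 of 4 rows match)
```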