Run the example directly from the command line:
[root@master spark-2.2.0-bin-hadoop2.7]# bin/spark-submit examples/src/main/python/ml/kmeans_example.py
Alternatively, you can paste the code into the pyspark shell and run it there, as shown below:
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/
Using Python version 2.7.5 (default, Nov 6 2016 00:28:07)
SparkSession available as 'spark'.
>>> from pyspark.ml.clustering import KMeans
>>> from pyspark.sql import SparkSession
>>> dataset = spark.read.format("libsvm").load("/home/spark/spark-2.2.0-bin-hadoop2.7/data/mllib/sample_kmeans_data.txt")
>>> kmeans = KMeans().setK(2).setSeed(1)
>>> model = kmeans.fit(dataset)
18/10/24 04:23:19 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
18/10/24 04:23:19 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
>>> wssse = model.computeCost(dataset)
>>> print("Within Set Sum of Squared Errors = " + str(wssse))
Within Set Sum of Squared Errors = 0.12
>>>
>>> centers = model.clusterCenters()
>>> for center in centers:print(center)
...
[0.1 0.1 0.1]
[9.1 9.1 9.1]
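
To make the reported cost concrete, here is a minimal pure-Python sketch of what the Within Set Sum of Squared Errors measures: for every point, the squared Euclidean distance to its nearest cluster center, summed over all points. It does not use Spark. The `points` list is an assumption about the contents of sample_kmeans_data.txt (six 3-D points in two tight groups), and `squared_distance` and `wssse` are hypothetical helper names introduced only for this illustration.

```python
# Assumed contents of sample_kmeans_data.txt: six 3-D points in two groups.
points = [
    (0.0, 0.0, 0.0), (0.1, 0.1, 0.1), (0.2, 0.2, 0.2),
    (9.0, 9.0, 9.0), (9.1, 9.1, 9.1), (9.2, 9.2, 9.2),
]

# Cluster centers as printed by the pyspark session above.
centers = [(0.1, 0.1, 0.1), (9.1, 9.1, 9.1)]


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def wssse(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


cost = wssse(points, centers)
print("Within Set Sum of Squared Errors = " + str(round(cost, 2)))
```

Under these assumptions, each cluster contributes two off-center points at a squared distance of about 0.03, so the total comes to about 0.12, in line with the value printed by `model.computeCost(dataset)` in the session.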