Scala
大数据-酷峰中行
Big data analysis and mining
Spark2 feature Bucketizer: discretizing continuous data into specified ranges
import org.apache.spark.ml.feature.Bucketizer // Double.NegativeInfinity: negative infinity; Double.PositiveInfinity: positive infinity // Split into 10 buckets: [-inf, -5), [-5, -4), [-4, -3.5), [-3.5, -0.5), [-0.5, 0), [0, 0.5), [0.5, 2), [2, 3.5), [3.5, 4), [4, +inf…
Original · 2016-11-30 21:29:11 · 5384 views · 0 comments
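A minimal sketch of the bucketing described above, assuming Spark 2.x (`spark-sql` and `spark-mllib`) on the classpath and a local `SparkSession`. It reuses the 11 split points from the excerpt, which define 10 buckets; the object name `BucketizerExample`, the method `bucketize` and the sample values are hypothetical, not taken from the original article.

```scala
import org.apache.spark.ml.feature.Bucketizer
import org.apache.spark.sql.SparkSession

object BucketizerExample {
  // 11 split points -> 10 buckets:
  // [-inf,-5), [-5,-4), [-4,-3.5), [-3.5,-0.5), [-0.5,0),
  // [0,0.5), [0.5,2), [2,3.5), [3.5,4), [4,+inf]
  val splits: Array[Double] = Array(
    Double.NegativeInfinity, -5.0, -4.0, -3.5, -0.5, 0.0,
    0.5, 2.0, 3.5, 4.0, Double.PositiveInfinity)

  // Returns the bucket index (0.0 to 9.0) that Bucketizer assigns to each value.
  def bucketize(spark: SparkSession, values: Seq[Double]): Seq[Double] = {
    import spark.implicits._
    val df = values.toDF("value")
    val bucketizer = new Bucketizer()
      .setInputCol("value")
      .setOutputCol("bucket")
      .setSplits(splits)
    bucketizer.transform(df).select("bucket").as[Double].collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("BucketizerExample")
      .getOrCreate()
    bucketize(spark, Seq(-10.0, -4.5, -1.0, 0.0, 3.7, 100.0)).foreach(println)
    spark.stop()
  }
}
```

Each bucket is closed on the left and open on the right, so a value equal to a split (such as 0.0) falls into the bucket that starts at that split; the infinite end points make sure every finite value lands in some bucket.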
Spark2 Machine Learning: Decision Tree Classifier
Classification decision tree code: import org.apache.spark.sql.SparkSession import org.apache.spark.sql.Dataset import org.apache.spark.sql.Row import org.apache.spark.sql.DataFrame import org.apache.spark.sql.Column import org.a
Original · 2016-11-30 19:50:58 · 4525 views · 0 comments
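A minimal sketch of a decision tree classifier with the `spark.ml` Pipeline API, assuming Spark 2.x on the classpath. The toy data, the column names `x1`, `x2`, `label` and the object name `DecisionTreeExample` are hypothetical; the original article's dataset and code are not reproduced here. `StringIndexer` turns the string label into an indexed label, `VectorAssembler` packs the numeric columns into the `features` vector the classifier expects, and `MulticlassClassificationEvaluator` reports accuracy.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

object DecisionTreeExample {
  // Fits a tree on tiny synthetic data and returns its accuracy on that same data.
  def trainAccuracy(spark: SparkSession): Double = {
    import spark.implicits._
    // Hypothetical data: label is "high" when x1 > 5, otherwise "low";
    // x2 is an extra column that does not separate the classes on its own.
    val data = (1 to 10).map { i =>
      (i.toDouble, (i * 3 % 7).toDouble, if (i > 5) "high" else "low")
    }.toDF("x1", "x2", "label")

    val labelIndexer = new StringIndexer()
      .setInputCol("label")
      .setOutputCol("indexedLabel")
    val assembler = new VectorAssembler()
      .setInputCols(Array("x1", "x2"))
      .setOutputCol("features")
    val dt = new DecisionTreeClassifier()
      .setLabelCol("indexedLabel")
      .setFeaturesCol("features")
      .setMaxDepth(3)

    val model = new Pipeline().setStages(Array(labelIndexer, assembler, dt)).fit(data)
    val predictions = model.transform(data)

    new MulticlassClassificationEvaluator()
      .setLabelCol("indexedLabel")
      .setPredictionCol("prediction")
      .setMetricName("accuracy")
      .evaluate(predictions)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("DecisionTreeExample").getOrCreate()
    println(s"Training accuracy: ${trainAccuracy(spark)}")
    spark.stop()
  }
}
```

Evaluating on the training data is only a smoke test; a real workflow would hold out data, for example with `data.randomSplit(Array(0.7, 0.3))`.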
Spark2: loading and saving files, and converting data files into a DataFrame
hadoop fs -put /home/wangxiao/data/ml/Affairs.csv /datafile/wangxiao/ hadoop fs -ls -R /datafile drwxr-xr-x - wangxiao supergroup 0 2016-10-15 10:46 /datafile/wangxiao -rw-r--r-- 3 wangx
Original · 2016-12-01 09:57:26 · 1229 views · 0 comments
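A minimal sketch of loading a CSV file into a DataFrame and saving it again with the built-in Spark 2.x CSV and Parquet sources, assuming a local `SparkSession`. To stay self-contained it writes a tiny hypothetical CSV (columns `id`, `age`, `rating`) to a temporary directory instead of reading the article's `Affairs.csv`; on a cluster one would pass an HDFS path such as the one uploaded with `hadoop fs -put` above. The object and method names are hypothetical.

```scala
import java.nio.file.{Files, Path}
import org.apache.spark.sql.{DataFrame, SparkSession}

object CsvLoadExample {
  // Reads a CSV whose first row is a header, letting Spark infer column types.
  def loadCsv(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(path)

  // Writes a small hypothetical CSV, loads it, saves it as Parquet,
  // and returns the DataFrame read back from the Parquet output.
  def roundTrip(spark: SparkSession): DataFrame = {
    val dir: Path = Files.createTempDirectory("csv-demo")
    val csvFile = dir.resolve("sample.csv")
    Files.write(csvFile, "id,age,rating\n1,37,4\n2,27,4\n3,32,5\n".getBytes("UTF-8"))

    val df = loadCsv(spark, csvFile.toString)
    df.printSchema()

    val parquetPath = dir.resolve("sample.parquet").toString
    df.write.mode("overwrite").parquet(parquetPath)
    spark.read.parquet(parquetPath)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("CsvLoadExample").getOrCreate()
    roundTrip(spark).show()
    spark.stop()
  }
}
```

Without `inferSchema`, every CSV column is read as a string; Parquet keeps the inferred types when the data is written and read back.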
Spark2 DataFrameStatFunctions: exploratory statistical data analysis
Correlation coefficient: val df = Range(0,10,step=1).toDF("id").withColumn("rand1", rand(seed=10)).withColumn("rand2", rand(seed=27)) df: org.apache.spark.sql.DataFrame = [id: int, rand1: double ... 1 more field]df.stat.co
Original · 2016-12-01 16:47:01 · 1238 views · 0 comments
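A minimal sketch of `DataFrameStatFunctions` (reached through `df.stat`), assuming Spark 2.x and a local `SparkSession`. It builds the same `id`/`rand1`/`rand2` frame as the excerpt and adds a hypothetical `linear` column (`2 * id + 1`) so one result is known in advance: its Pearson correlation with `id` is 1. `corr` computes the Pearson correlation, and `cov` computes the sample covariance. The object name and the `linear` column are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, rand}

object StatFunctionsExample {
  // Returns (corr(id, linear), cov(id, linear), corr(rand1, rand2)).
  def stats(spark: SparkSession): (Double, Double, Double) = {
    import spark.implicits._
    val df = (0 until 10).toDF("id")            // same rows as Range(0, 10, step = 1)
      .withColumn("rand1", rand(seed = 10))
      .withColumn("rand2", rand(seed = 27))
      .withColumn("linear", col("id") * 2 + 1)  // exactly linear in id

    val corrLinear = df.stat.corr("id", "linear") // Pearson, about 1.0
    val covLinear = df.stat.cov("id", "linear")   // sample covariance
    val corrRandom = df.stat.corr("rand1", "rand2") // depends on the seeds
    (corrLinear, covLinear, corrRandom)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("StatFunctionsExample").getOrCreate()
    val (c1, cv, c2) = stats(spark)
    println(s"corr(id, linear) = $c1, cov(id, linear) = $cv, corr(rand1, rand2) = $c2")
    spark.stop()
  }
}
```

For `id` = 0..9 the sample variance is 82.5 / 9, so the covariance with `2 * id + 1` is twice that, 165 / 9 ≈ 18.33.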