Spark DataFrame Common Operations
勾勾黄
Time is limited; don't waste it!
Building Statistical Features with Spark DataFrame (Scala)
Row missing count/rate statistics:
val DataDF = Seq(("Ram",null,"MCA","Bangalore"),(null,"25",null,null),(null,"26","BE",null),("Raju","21","Btech","Chennai")).toDF("name","age","degree","Place")
// column names
val columns = DataDF.columns
val cnt = DataDF.count()
// count the missing records in each column
val missing_cn…
Original · 2021-03-11 20:28:59 · 548 views · 0 comments
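The excerpt stops before the row-level statistic is computed. Below is a minimal sketch of one way to do it, not the post's own code. It assumes `spark-sql` (Spark 3.x) is on the classpath and uses a local master. The object name `RowMissingStats`, the helper names, and the output columns `row_missing_cnt`/`row_missing_rate` are all hypothetical. For each row, it turns every cell into 1 if null and 0 otherwise, sums those flags across the columns, and divides the sum by the column count.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

// Hypothetical helper object; names are illustrative, not from the original post.
object RowMissingStats {
  // Local session for a standalone run (assumption: local[*] master).
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("row-missing-stats")
    .master("local[*]")
    .getOrCreate()

  /** Appends the number and fraction of null cells in each row. */
  def withRowMissing(df: DataFrame): DataFrame = {
    val cols = df.columns
    // One 0/1 flag per column, added together into a per-row null count.
    val missingCnt = cols.map(c => when(col(c).isNull, 1).otherwise(0)).reduce(_ + _)
    df.withColumn("row_missing_cnt", missingCnt)
      // Spark's `/` always yields a double, so this is a fractional rate.
      .withColumn("row_missing_rate", col("row_missing_cnt") / lit(cols.length))
  }

  /** The sample data shown in the post's excerpt. */
  def sampleData(): DataFrame = {
    import spark.implicits._
    Seq[(String, String, String, String)](
      ("Ram", null, "MCA", "Bangalore"),
      (null, "25", null, null),
      (null, "26", "BE", null),
      ("Raju", "21", "Btech", "Chennai")
    ).toDF("name", "age", "degree", "Place")
  }

  def main(args: Array[String]): Unit = {
    withRowMissing(sampleData()).show()
    spark.stop()
  }
}
```

On the four sample rows this gives missing counts of 1, 3, 2 and 0, which are rates of 0.25, 0.75, 0.5 and 0.0. Everything is built from column expressions inside a single projection, so Spark computes it without a shuffle.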
Computing Row/Column Missing Rates of a Spark DataFrame (Scala)
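Row missing count/rate statistics. After feature engineering has produced a set of features, how do you compute the row and column missing rates of features stored in a DataFrame?
val DataDF = Seq(("Ram",null,"MCA","Bangalore"),(null,"25",null,null),(null,"26","BE",null),("Raju","21","Btech","Chennai")).toDF("name","age","degree","Place")
// column names
val columns = DataDF.columns
val cnt = DataDF…
Original · 2021-03-11 20:27:30 · 1976 views · 1 comment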
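For the column side of the question, the sketch below computes every column's missing rate in a single aggregation pass. This is an illustrative technique and may not match what the post does after its excerpt ends. It assumes `spark-sql` (Spark 3.x) and a local master. `ColumnMissingStats` and `columnMissingRates` are hypothetical names. The trick is that `when` without `otherwise` yields null, and `count` skips nulls, so `count(when(isNull, 1))` counts exactly the null cells.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

// Hypothetical helper object; names are illustrative, not from the original post.
object ColumnMissingStats {
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("column-missing-stats")
    .master("local[*]")
    .getOrCreate()

  /** Returns (columnName, missingRate) for every column of `df`. */
  def columnMissingRates(df: DataFrame): Seq[(String, Double)] = {
    val total = df.count()
    // count() ignores nulls, and when() without otherwise() is null when false,
    // so each expression counts only the null cells of its column.
    val aggExprs = df.columns.toSeq.map(c => count(when(col(c).isNull, 1)).alias(c))
    val row = df.agg(aggExprs.head, aggExprs.tail: _*).head()
    df.columns.toSeq.map { c =>
      val missing = row.getAs[Long](c)
      (c, if (total == 0) 0.0 else missing.toDouble / total)
    }
  }

  def sampleData(): DataFrame = {
    import spark.implicits._
    Seq[(String, String, String, String)](
      ("Ram", null, "MCA", "Bangalore"),
      (null, "25", null, null),
      (null, "26", "BE", null),
      ("Raju", "21", "Btech", "Chennai")
    ).toDF("name", "age", "degree", "Place")
  }

  def main(args: Array[String]): Unit = {
    columnMissingRates(sampleData()).foreach { case (c, r) => println(s"$c\t$r") }
    spark.stop()
  }
}
```

On the sample data, `name` and `Place` come out at 0.5, and `age` and `degree` at 0.25. Building one `agg` over all columns avoids running a separate `filter(...).count()` job for each column.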
Common Spark DataFrame Operations: filter/groupBy/agg/pivot (Scala)
First, construct a sample dataset:
val dataDF = List(
  ("id1", "click", "0108", 1, 1.0),
  ("id1", "view", "0101", 2, 1.0),
  ("id2", "buy", "0105", 3, 7.0),
  ("id2", "click", "0104", 4, 9.0),
  ("id2", "click", "0105", 5, 1.0),
  ("id3", "buy", "0106", …
Original · 2021-03-11 20:09:50 · 2889 views · 0 comments
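The sketch below shows the four operations named in the title on the first five rows of the excerpt's data. The sixth row is cut off in the excerpt, so it is left out. The excerpt does not show the column names, so `id`, `action`, `date`, `seq` and `score` are hypothetical, and so are the object and method names. It assumes `spark-sql` (Spark 3.x) and a local master.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

// Hypothetical helper object; column and method names are illustrative.
object FilterGroupByPivot {
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("filter-groupby-pivot")
    .master("local[*]")
    .getOrCreate()

  /** First five rows from the excerpt; column names are assumed. */
  def sampleData(): DataFrame = {
    import spark.implicits._
    Seq(
      ("id1", "click", "0108", 1, 1.0),
      ("id1", "view", "0101", 2, 1.0),
      ("id2", "buy", "0105", 3, 7.0),
      ("id2", "click", "0104", 4, 9.0),
      ("id2", "click", "0105", 5, 1.0)
    ).toDF("id", "action", "date", "seq", "score")
  }

  // filter: keep only click events.
  def clicks(df: DataFrame): DataFrame = df.filter(col("action") === "click")

  // groupBy + agg: number of events and total score per id.
  def perIdStats(df: DataFrame): DataFrame =
    df.groupBy("id").agg(count(lit(1)).alias("events"), sum("score").alias("score_sum"))

  // pivot: one column per action holding the summed score; listing the values
  // explicitly spares Spark an extra job to discover the distinct actions.
  def scoreByAction(df: DataFrame): DataFrame =
    df.groupBy("id").pivot("action", Seq("buy", "click", "view")).agg(sum("score"))

  def main(args: Array[String]): Unit = {
    val df = sampleData()
    clicks(df).show()
    perIdStats(df).show()
    scoreByAction(df).show()
    spark.stop()
  }
}
```

In the pivoted result, any (id, action) pair with no rows holds null. For example, `id1` has no `buy` event. Chain `.na.fill(0.0)` if zeros suit the downstream features better.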