Spark DataFrame Common Operations
Author: 勾勾黄
Building Statistical Features with Spark DataFrame (Scala)
Counting missing values per row/column:

val DataDF = Seq(
  ("Ram", null, "MCA", "Bangalore"),
  (null, "25", null, null),
  (null, "26", "BE", null),
  ("Raju", "21", "Btech", "Chennai")
).toDF("name", "age", "degree", "Place")
// column names
val columns = DataDF.columns
val cnt = DataDF.count()
// count the missing records in each column
val missing_cn…

Original post · 2021-03-11 20:28:59 · 537 reads · 0 comments
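The preview above is cut off at the per-column missing count. A minimal sketch of how that computation can be completed, in spark-shell style (the local SparkSession setup and the names missingCounts / missingRates are my additions, not from the original post):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// in spark-shell, `spark` and the implicits already exist; created here for self-containment
val spark = SparkSession.builder().master("local[*]").appName("column-missing").getOrCreate()
import spark.implicits._

val dataDF = Seq(
  ("Ram", null, "MCA", "Bangalore"),
  (null, "25", null, null),
  (null, "26", "BE", null),
  ("Raju", "21", "Btech", "Chennai")
).toDF("name", "age", "degree", "Place")

val cnt = dataDF.count()  // total rows: 4

// number of null records in each column
val missingCounts: Map[String, Long] = dataDF.columns.map { c =>
  c -> dataDF.filter(col(c).isNull).count()
}.toMap

// missing rate per column
val missingRates: Map[String, Double] =
  missingCounts.map { case (c, m) => c -> m.toDouble / cnt }
```

Driving the count through a plain `filter(col(c).isNull).count()` per column is the simplest form; for wide tables a single `select` with `sum(when(...))` aggregates avoids one job per column.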
Spark DataFrame: Computing Row/Column Missing Rates (Scala)
Counting row/column missing counts and rates. After feature engineering produces a DataFrame of features, how do you compute the missing rate of its rows and columns?

val DataDF = Seq(
  ("Ram", null, "MCA", "Bangalore"),
  (null, "25", null, null),
  (null, "26", "BE", null),
  ("Raju", "21", "Btech", "Chennai")
).toDF("name", "age", "degree", "Place")
// column names
val columns = DataDF.columns
val cnt = DataDF…

Original post · 2021-03-11 20:27:30 · 1915 reads · 1 comment
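The preview cuts off before the row-level statistic. One way to sketch it (the column names missing_cnt / missing_rate are assumptions; the original post may use different names): build a per-column null indicator with when/otherwise and sum the indicators across columns.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, when}

// local session for self-containment; in spark-shell this already exists
val spark = SparkSession.builder().master("local[*]").appName("row-missing").getOrCreate()
import spark.implicits._

val dataDF = Seq(
  ("Ram", null, "MCA", "Bangalore"),
  (null, "25", null, null),
  (null, "26", "BE", null),
  ("Raju", "21", "Btech", "Chennai")
).toDF("name", "age", "degree", "Place")

val nCols = dataDF.columns.length.toDouble

// per row: 1 for each null field, summed over all columns
val missingExpr = dataDF.columns
  .map(c => when(col(c).isNull, 1).otherwise(0))
  .reduce(_ + _)

val withRowStats = dataDF
  .withColumn("missing_cnt", missingExpr)
  .withColumn("missing_rate", col("missing_cnt") / lit(nCols))

withRowStats.show()
```

The same reduce-over-columns trick generalizes to other row-wise statistics (e.g. counting empty strings) by changing the condition inside `when`.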
Spark DataFrame Common Operations: Filter/groupBy/agg/pivot (Scala)
Spark DataFrame common operations: the Filter/groupBy/agg/pivot methods. First, construct some sample data:

val dataDF = List(
  ("id1", "click", "0108", 1, 1.0),
  ("id1", "view", "0101", 2, 1.0),
  ("id2", "buy", "0105", 3, 7.0),
  ("id2", "click", "0104", 4, 9.0),
  ("id2", "click", "0105", 5, 1.0),
  ("id3", "buy", "0106", …

Original post · 2021-03-11 20:09:50 · 2772 reads · 0 comments
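The preview truncates before the column names and the last row, so the schema ("id", "action", "date", "num", "score") and the final tuple's values below are hypothetical completions, chosen only to make the sketch self-contained. With that caveat, the four operations in the title can be demonstrated as:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, sum}

val spark = SparkSession.builder().master("local[*]").appName("df-ops").getOrCreate()
import spark.implicits._

// column names and the final row are assumptions; the original preview is truncated
val dataDF = List(
  ("id1", "click", "0108", 1, 1.0),
  ("id1", "view",  "0101", 2, 1.0),
  ("id2", "buy",   "0105", 3, 7.0),
  ("id2", "click", "0104", 4, 9.0),
  ("id2", "click", "0105", 5, 1.0),
  ("id3", "buy",   "0106", 6, 2.0)
).toDF("id", "action", "date", "num", "score")

// filter: keep only click events
val clicks = dataDF.filter($"action" === "click")

// groupBy + agg: per-id click count and score sum
val clickStats = clicks.groupBy("id")
  .agg(count("*").as("click_cnt"), sum("score").as("score_sum"))

// pivot: one column per action value, aggregating num;
// passing the value list explicitly avoids an extra pass to discover it
val pivoted = dataDF.groupBy("id")
  .pivot("action", Seq("click", "view", "buy"))
  .agg(sum("num"))

clickStats.show()
pivoted.show()
```

Note that `pivot` without an explicit value list triggers an extra distinct scan of the pivot column, which matters on large inputs.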