row_number()
row_number() 开窗函数是按照某个字段分组,然后取另一字段的前几个的值,相当于 分组取topN
开窗函数格式:
row_number() over (partitin by xxx order by xxx )
java代码示例:
SparkSession sparkSession = SparkSession
.builder()
.appName("window")
.master("local")
//开启hive的支持,接下来就可以操作hive表了
// 前提需要是需要开启hive metastore 服务
.enableHiveSupport()
.getOrCreate();
sparkSession.sql("use spark");
sparkSession.sql("drop table if exists sales");
sparkSession.sql("create table if not exists sales (riqi string,leibie string,jine Int) "
+ "row format delimited fields terminated by '\t'");
sparkSession.sql("load data local inpath '/root/test/sales' into table sales");
/**
* 开窗函数格式:
* 【 rou_number() over (partitin by XXX order by XXX) 】
*/
Dataset<Row> result = sparkSession.sql("select riqi,leibie,jine "
+ "from ("
+ "select riqi,leibie,jine,"
+ "row_number() over (partition by leibie order by jine desc) rank "
+ "from sales) t "
+ "where t.rank<=3");
result.show();
sparkSession.stop();