目录
0 需求分析
需求:表如下
user_id | good_name | goods_type | rk |
1 | hadoop | 10 | 1 |
1 | hive | 12 | 2 |
1 | sqoop | 26 | 3 |
1 | hbase | 10 | 4 |
1 | spark | 13 | 5 |
1 | flink | 26 | 6 |
1 | kafka | 14 | 7 |
1 | oozie | 10 | 8 |
以上数据中,goods_type列,假设26代表是广告,现在有个需求,想获取每个用户每次搜索下非广告类型的商品位置自然排序,如果下效果:
user_id | good_name | goods_type | rk | naturl_rk |
1 | hadoop | 10 | 1 | 1 |
1 | hive | 12 | 2 | 2 |
1 | sqoop | 26 | 3 | null |
1 | hbase | 10 | 4 | 3 |
1 | spark | 13 | 5 | 4 |
1 | flink | 26 | 6 | null |
1 | kafka | 14 | 7 | 5 |
1 | oozie | 10 | 8 | 6 |
1 数据准备
(1)建表
create table window_goods_test (
user_id int, --用户id
goods_name string, --商品名称
goods_type int, --标识每个商品的类型,比如广告,非广告
rk int --这次搜索下商品的位置,比如第一个广告商品就是1,后面的依次2,3,4...
)ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
(2)数据
vim window_goods_test
1 hadoop 10 1
1 hive 12 2
1 sqoop 26 3
1 hbase 10 4
1 spark 13 5
1 flink 26 6
1 kafka 14 7
1 oozie 10 8
(3)加载数据
load data local inpath "/home/centos/dan_test/window_goods_test.txt" into table window_goods_test;
(4)查询数据
21/06/25 11:35:33 INFO DAGScheduler: Job 2 finished: processCmd at CliDriver.java:376, took 0.209632 s
1 hadoop 10 1
1 hive 12 2
1 sqoop 26 3
1 hbase 10 4
1 spark 13 5
1 flink 26 6
1 kafka 14 7
1 oozie 10 8
Time taken: 0.818 seconds, Fetched 8 row(s)
21/06/25 11:35:33 INFO CliDriver: Time taken: 0.818 seconds, F