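Requirements Analysis
The following data is given:
A circle can be thought of as a WeChat Official Account, and a user as a follower of that account.
tb_habit (circle table): nearly ten million rows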
master_id (circle owner's user ID)    habit_id (circle created by the owner)
open_id1                              habit_id1
open_id1                              habit_id2
open_id1                              habit_id3
open_id2                              habit_id4
open_id2                              habit_id5
open_id3                              habit_id6
open_id3                              habit_id7
..........                            ...............
user_habit_relation (user-to-circle relation table): on the order of 100 million rows
habit_id (circle)    user_id (user who joined the circle)
habit_id1            user_id1
habit_id1            user_id3
habit_id1            user_id4
habit_id3            user_id2
habit_id3            user_id1
habit_id2            user_id5
habit_id2            user_id1
habit_id2            user_id7
habit_id4            user_id11
habit_id4            user_id12
habit_id4            user_id1
habit_id6            user_id17
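To experiment with the sample rows locally, the two tables could be declared roughly as follows. This is only a sketch: the STRING column types and the absence of partitioning or a storage format are assumptions, since the source gives only the column names and approximate row counts.

```sql
-- Sketch only: column types are assumed, not taken from the source.
CREATE TABLE IF NOT EXISTS tb_habit (
  master_id STRING,  -- circle owner's user ID
  habit_id  STRING   -- circle created by that owner
);

CREATE TABLE IF NOT EXISTS user_habit_relation (
  habit_id STRING,   -- circle
  user_id  STRING    -- user who joined the circle
);

-- Load the sample rows listed above (INSERT ... VALUES needs Hive 0.14 or later).
INSERT INTO TABLE tb_habit VALUES
  ('open_id1', 'habit_id1'), ('open_id1', 'habit_id2'), ('open_id1', 'habit_id3'),
  ('open_id2', 'habit_id4'), ('open_id2', 'habit_id5'),
  ('open_id3', 'habit_id6'), ('open_id3', 'habit_id7');

INSERT INTO TABLE user_habit_relation VALUES
  ('habit_id1', 'user_id1'),  ('habit_id1', 'user_id3'),  ('habit_id1', 'user_id4'),
  ('habit_id3', 'user_id2'),  ('habit_id3', 'user_id1'),
  ('habit_id2', 'user_id5'),  ('habit_id2', 'user_id1'),  ('habit_id2', 'user_id7'),
  ('habit_id4', 'user_id11'), ('habit_id4', 'user_id12'), ('habit_id4', 'user_id1'),
  ('habit_id6', 'user_id17');
```

At the real data volumes (tens of millions and hundreds of millions of rows) the tables would more likely be partitioned and stored in a columnar format, but that is outside what the source specifies.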
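Requirement: use Hive SQL to compute the following result (under the same circle owner, a user who has joined several of that owner's circles is counted only once):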