Hive计算玩家连胜&连败

CREATE TABLE t1(
id bigint,
player_id int,
game_result int,
season int,
game_type int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

id:表示比赛id

player_id:用户id

game_result:比赛结果，0代表输，1代表赢

 id player_id game_result 1 123 1 2 123 1 3 321 0

python文件名：

win_streak_reduce.py

#! /usr/bin/python

import sys

for line in sys.stdin:
line = line.split('\t')
player_id = line[0]
games = [int(i) for i in line[1].split(',')]
win_streak_max = 0
lost_streak_max = 0
win_tmp = 0
lost_tmp = 0
for i in games:
if i == 1:
lost_tmp = 0
win_tmp += 1
else:
lost_tmp += 1
win_tmp = 0
if win_tmp > win_streak_max:
win_streak_max = win_tmp
if lost_tmp > lost_streak_max:
lost_streak_max = lost_tmp
print "{player_id}\t{win_streak_max}\t{lost_streak_max}".format(player_id = player_id, win_streak_max = win_streak_max, lost_streak_max = lost_streak_max)

add file /win_streak_reduce.py;
select transform(player_id,games) using 'python win_streak_reduce.py' as player_id,win_streak,lost_streak from (select player_id,concat_ws(',',collect_list(cast(game_result as string))) games from (select player_id,id, game_result from t1 cluster by player_id,id) t  group by player_id) t;

1. 将比赛记录按照player_id,id排序，HQL中的cluster by player_id,id，会使同一个player_id的数据被分发到一起处理。
2. concat_ws(',',collect_list(cast(game_result as string)))会把玩家的比赛结果按照id从小打到排成数组，最后转成字符串，用,分隔
3. 使用udf对比赛结果进行计算，找出连胜和连败最大值。

NBA历史上最长的连胜和连败纪录!

2008-03-02 15:26:00

程序员专属谜语 你能猜中几个？

2015-05-28 09:40:54

游戏玩家的留存率统计实现

2013-11-14 16:50:03

hive--基本数据计算

2016-09-27 13:05:12

hive -- stddev , stddev_pop , stddev_samp , var_pop , var_samp（计算方差标准差等）

2017-05-16 15:26:31

Hive计算日期差

2017-10-10 22:57:02

根据身份证号得出年龄

2013-08-21 21:24:59

Hive留存率统计

2017-08-30 11:19:02

Hive 累积和的计算

2016-03-01 19:59:55

根据出生日期计算年龄

2014-01-25 08:09:34