关闭

Hive计算玩家连胜&连败

标签: hive连胜udf
396人阅读 评论(0) 收藏 举报

前几天接到以需求,要求计算玩家的连胜&连败纪录,数据sample如下:


源数据的表结构如下:

CREATE TABLE `t1`(
  `id` bigint, 
  `player_id` int, 
  `game_result` int, 
  `season` int, 
  `game_type` int)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY '\t';

数据文件如下:

id:表示比赛id

player_id:用户id

game_result:比赛结果,0代表输,1代表赢

id player_id game_result
1 123 1
2 123 1
3 321 0

使用python编写UDF如下:

python文件名:

win_streak_reduce.py


#! /usr/bin/python

import sys

for line in sys.stdin:
        line = line.split('\t')
        player_id = line[0]
        games = [int(i) for i in line[1].split(',')]
        win_streak_max = 0
        lost_streak_max = 0
        win_tmp = 0
        lost_tmp = 0
        for i in games:
                if i == 1:
                        lost_tmp = 0
                        win_tmp += 1
                else:
                        lost_tmp += 1
                        win_tmp = 0
                if win_tmp > win_streak_max:
                        win_streak_max = win_tmp
                if lost_tmp > lost_streak_max:
                        lost_streak_max = lost_tmp
        print "{player_id}\t{win_streak_max}\t{lost_streak_max}".format(player_id = player_id, win_streak_max = win_streak_max, lost_streak_max = lost_streak_max)


最后查询:

add file /win_streak_reduce.py;
select transform(player_id,games) using 'python win_streak_reduce.py' as player_id,win_streak,lost_streak from (select player_id,concat_ws(',',collect_list(cast(game_result as string))) games from (select player_id,id, game_result from t1 cluster by player_id,id) t  group by player_id) t;

整体思路是:
1. 将比赛记录按照player_id,id排序,HQL中的cluster by player_id,id,会使同一个player_id的数据被分发到一起处理。
2. concat_ws(',',collect_list(cast(game_result as string)))会把玩家的比赛结果按照id从小打到排成数组,最后转成字符串,用,分隔
3. 使用udf对比赛结果进行计算,找出连胜和连败最大值。


使用hive不久,这样的处理方式虽然能解决问题,但相信这样的处理方式显得很不专业。请各位大牛批评指正。

1
0

查看评论
* 以上用户言论只代表其个人观点,不代表CSDN网站的观点或立场
    个人资料
    • 访问:983次
    • 积分:31
    • 等级:
    • 排名:千里之外
    • 原创:2篇
    • 转载:0篇
    • 译文:0篇
    • 评论:0条
    文章分类
    文章存档