Hive学习六:HIVE日志分析(用户画像)
标签(空格分隔): Hive
案例分析思路
根据原始数据表里面的信息提取用户画像信息,一方面实现难度较大,另一方面由于数据量较大,从而导致实现的性能较差。由于以上2点,所以考虑从原始表中提取用户的会话信息放到临时中间表中,最终通过关联多个临时中间表获取需要的用户信息,可以大大提高查询的性能和降低提取的难度。
原始表结构:
drop table if exists db_track.track_log ;
create table db_track.track_log(
id string,
url string,
referer string,
keyword string,
type string,
guid string,
pageId string,
moduleId string,
linkId string,
attachedInfo string,
sessionId string,
trackerU string,
trackerType string,
ip string,
trackerSrc string,
cookie string,
orderCode string,
trackTime string,
endUserId string,
firstLink string,
sessionViewNo string,
productId string,
curMerchantId string,
provinceId string,
cityId string,
fee string,
edmActivity string,
edmEmail string,
edmJobId string,
ieVersion string,
platform string,
internalKeyword string,
resultSum string,
currentPage string,
linkPosition string,
buttonPosition