2. Project notes: importing log data from HDFS into the Hive ODS layer

Importing data from HDFS into Hive

Work scenario

The company's log records carry 20-30 fields each, and the kind of log produced varies with the event type.

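Solution design

Logs are split by origin into a few data sources, such as WEB_EVENT, APP_EVENT, and WXAPP_EVENT. Every record within one data source is guaranteed to have the same structure.
For example: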

{"account":"","appId":"cn.xxx","appVersion":"2.0","carrier":"小米移动","deviceId":"ZvRWCBGAuSaK","deviceType":"REDMI-6","eventId":"share","ip":"218.23.97.57","latitude":36.17641631906538,"longitude":120.39343589187808,"netType":"WIFI","osName":"android","osVersion":"7.5","properties":{"pageId":"165","productId":"419","shareMethod":"qq空间","title":"lWF eRR jFJ","url":"uvM/JyH"},"releaseChannel":"木蚂蚁安卓应用市场","resolution":"1024*768","sessionId":"bCjgwViU9vd","timeStamp":1598861486317}
{"account":"","appId":"cn.xxx","appVersion":"4.0","carrier":"中国联通","deviceId":"nNge28DXXwNC","deviceType":"LEPHONE-6","eventId":"share","ip":"107.249.206.150","latitude":34.253410875603905,"longitude":119.15852793581637,"netType":"3G","osName":"android","osVersion":"8.5","properties":{"pageId":"513","productId":"238","shareMethod":"微信朋友圈","title":"roY rVW Pur","url":"yMi/gpb"},"releaseChannel":"手机乐园","resolution":"2048*1024","sessionId":"X1gk6w5NHdB","timeStamp":1598861486697}
{"account":"","appId":"cn.xxx","appVersion":"2.0","carrier":"小米移动","deviceId":"szqwdecx78RA","deviceType":"MI-7","eventId":"adClick","ip":"27.214.63.168","latitude":30.343773977338717,"longitude":114.29445473137073,"netType":"3G","osName":"android","osVersion":"7.2","properties":{"adCampain":"18","adId":"16","adLocation":"9","pageId":"479"},"releaseChannel":"奇珀市场","resolution":"1024*768","sessionId":"sqO1197eqlu","timeStamp":1598861486815}
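The "same structure per source" rule can be checked mechanically before loading. Below is a minimal Python sketch, not the check used in the project: the helper names `top_level_keys` and `same_structure` are made up for illustration, and the two records are abbreviated versions of the samples above. It compares only top-level field names, since the nested `properties` object legitimately differs by event type.

```python
import json

def top_level_keys(line: str) -> frozenset:
    """Return the set of top-level field names in one JSON log line."""
    return frozenset(json.loads(line).keys())

def same_structure(lines) -> bool:
    """True if every non-empty line has the same top-level field names."""
    key_sets = {top_level_keys(line) for line in lines if line.strip()}
    return len(key_sets) <= 1

# Abbreviated records modeled on the samples above (most fields omitted).
sample = [
    '{"eventId": "share", "deviceId": "ZvRWCBGAuSaK", '
    '"properties": {"pageId": "165", "productId": "419"}}',
    '{"eventId": "adClick", "deviceId": "szqwdecx78RA", '
    '"properties": {"adId": "16", "pageId": "479"}}',
]

print(same_structure(sample))
```

Records whose top-level fields differ would make `same_structure` return `False`, which is a hint that they belong to a different data source.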

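JSON data like this is parsed at load time by an open-source plugin from GitHub, which maps each JSON record onto typed columns. This removes a lot of downstream data processing work.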
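To make the JSON-to-columns mapping concrete, here is a rough Python sketch of what such a parser does per record. It is not the plugin itself (these notes do not name it), and the function name `to_row` is hypothetical. It lowercases field names, because Hive shows column names in lowercase (for example `deviceid` in the query output), and it keeps the nested `properties` object as a string-to-string map, which is an assumption about the target column type.

```python
import json

def to_row(line: str) -> dict:
    """Turn one JSON log line into a flat row keyed by lowercase column name."""
    record = json.loads(line)
    row = {}
    for key, value in record.items():
        column = key.lower()
        if isinstance(value, dict):
            # Keep nested objects as a map of string keys to string values.
            row[column] = {str(k): str(v) for k, v in value.items()}
        else:
            row[column] = value
    return row

# Abbreviated record modeled on the third sample above.
line = (
    '{"deviceId": "szqwdecx78RA", "eventId": "adClick", '
    '"properties": {"adId": "16", "pageId": "479"}, '
    '"timeStamp": 1598861486815}'
)
row = to_row(line)
print(row["deviceid"], row["eventid"], row["properties"]["pageId"])
```

With the parsing done at load time, queries in the ODS layer can address `deviceid` or `eventid` directly instead of extracting them from a raw string on every read.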

Result (anonymized)

+--------------------------+--------------------------+---------------------------+-----------------------------+--------------------------+-----+----------------------------+
| event_wxapp_log.account  | event_wxapp_log.carrier  | event_wxapp_log.deviceid  | event_wxapp_log.devicetype  | event_wxapp_log.eventid  | ... | event_wxapp_log.longitude  |
+--------------------------+--------------------------+---------------------------+-----------------------------+--------------------------+-----+----------------------------+
| xxxxxxxxxxxxxxxx         | 中国xx                  | ooooooooooo               | IPHONE-6                    | share                    | ... | ….53805466900734           |
| xxxxxxxxxxxxxxxx         | 中国xx                  | ooooooooooo               | MI-6                        | adShow                   | ... | ….86291585848379           |
| xxxxxxxxxxxxxxxx         | 中国xx                  | ooooooooooo               | IPHONE-6                    | thumbup                  | ... | ….53805466900734           |
| xxxxxxxxxxxxxxxx         | 中国xx                  | ooooooooooo               | MATE-X                      | submitOrder              | ... | ….43712824596625           |
| xxxxxxxxxxxxxxxx         | 中国xx                  | ooooooooooo               | MEIZU-ML7                   | submitOrder              | ... | ….03285465597456           |
| xxxxxxxxxxxxxxxx         | 中国xx                  | ooooooooooo               | IPHONE-6                    | adClick                  | ... | ….53805466900734           |
| xxxxxxxxxxxxxxxx         | 中国xx                  | ooooooooooo               | MI-6                        | share                    | ... | ….86291585848379           |
| xxxxxxxxxxxxxxxx         | 中国xx                  | ooooooooooo               | MEIZU-ML7                   | submitOrder              | ... | ….03285465597456           |
| xxxxxxxxxxxxxxxx         | 腾讯xxx                  | ooooooooooo               | IPHONE-9                    | login                    | ... | ….55698566076829           |
|                          | 中国xx                  | ooooooooooo               | MATE-X                      | adClick                  | ... | ….43712824596625           |
+--------------------------+--------------------------+---------------------------+-----------------------------+--------------------------+-----+----------------------------+

The columns include array types, which makes later processing in the DWD layer easier.

To do

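Behavior data needs to be joined on a unique user identifier. The current problem is that each source identifies users differently:
APP uses mac + IMEI + system number + APP code
WEB uses CookieID
The WeChat mini program uses OPENID
Guaranteeing a strong link between logs and users is the problem that still has to be solved.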
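As a starting point, the per-source rules above can be written down as one function that produces a namespaced key. This is only a sketch: the function name `user_key` and the field names (`mac`, `imei`, `osId`, `appCode`, `cookieId`, `openId`) are hypothetical and do not appear in the sample logs. It does not solve the open problem of linking the same person across sources. It only makes each source's identifier explicit and collision-free.

```python
def user_key(source: str, record: dict) -> str:
    """Build a source-prefixed user identifier from one log record.

    Field names here are illustrative assumptions, not the real log schema.
    """
    if source == "APP_EVENT":
        # APP: mac + IMEI + system number + APP code
        raw = "+".join(
            [record["mac"], record["imei"], record["osId"], record["appCode"]]
        )
    elif source == "WEB_EVENT":
        raw = record["cookieId"]
    elif source == "WXAPP_EVENT":
        raw = record["openId"]
    else:
        raise ValueError(f"unknown data source: {source}")
    # Prefix with the source so keys from different sources never collide.
    return f"{source}:{raw}"

print(user_key("WEB_EVENT", {"cookieId": "c-123"}))
print(user_key("WXAPP_EVENT", {"openId": "o-456"}))
```

A mapping table from these keys to a single internal user ID would then be the place where cross-source association is maintained, once a reliable linking signal (for example a logged-in `account`) is available.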
