[Hive] Hive Weibo Case Study

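Data Preparation and Description

Data Description

Historical user data up to 2013-12-15: 221 MB compressed, 878 MB uncompressed, 1206 small files in total, all in JSON format.
Data download link

Sample Data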

[{
  "beCommentWeiboId":"","beForwardWeiboId":"","catchTime":"1387165034","commentCount":"6","content":"Raresmileyportrait(1977)","createTime":"1387130972","info1":"","info2":"","info3":"","mlevel":"","musicurl":[],"pic_list":["http://ww2.sinaimg.cn/thumbnail/69d3e27djw1ebkxp7rtczj20mo0mogmy.jpg"],"praiseCount":"5","reportCount":"70","source":"","userId":"1775493757","videourl":[],"weiboId":"3655954636173507","weiboUrl":"http://weibo.com/1775493757/AntDppU0H"}]
[{
  "beCommentWeiboId":"","beForwardWeiboId":"3655954636173507","catchTime":"1387165034","commentCount":"29","content":"玲笑容!","createTime":"1387139090","info1":"","info2":"","info3":"","mlevel":"","musicurl":[],"pic_list":[],"praiseCount":"72","reportCount":"61","source":"新浪微博","userId":"1719481457","videourl":[],"weiboId":"3655988685551869","weiboUrl":"http://weibo.com/1719481457/Anuwkniih"}]
[{
  "beCommentWeiboId":"","beForwardWeiboId":"","catchTime":"1387165034","commentCount":"4","content":"lifeisbeautifulandallisaboutconfident&trust&friends&LOVE,thanksto@黄伟文,youmakemefeellikehongkongismagic&happiness.","createTime":"1387053188","info1":"","info2":"","info3":"","mlevel":"","musicurl":[],"pic_list":[],"praiseCount":"8","reportCount":"8","source":"","userId":"1733190683","videourl":[],"weiboId":"3655628385727081","weiboUrl":"http://weibo.com/1733190683/Anl9co1Sh"}]

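Field Descriptions

19 fields in total:

beCommentWeiboId  whether the post is a comment (empty if not)
beForwardWeiboId  whether the post is a forward: holds the forwarded weibo's ID, empty if not
catchTime         crawl time
commentCount      number of comments
content           content
createTime        creation time
info1             info field 1
info2             info field 2
info3             info field 3
mlevel            meaning unclear
musicurl          music links
pic_list          picture list (may contain several)
praiseCount       number of likes
reportCount       number of forwards
source            data source
userId            user ID
videourl          video links
weiboId           weibo ID
weiboUrl          weibo URL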

Data Storage

hdfs://hdp01:9000/data/weibo
When creating the table, create it as an external table.

[hdp01@hdp01 weibo]$ hdfs dfs -ls /data/weibo
Found 2 items
-rw-r--r--   2 hdp01 supergroup    1004992 2020-01-11 16:17 /data/weibo/1387159770_1087770692_20100101000000_VCSvoMgPvrSTKhCkkIA7uMV9Hn10877706927159770ouss.json
-rw-r--r--   2 hdp01 supergroup     680641 2020-01-11 16:17 /data/weibo/1387159770_1180721740_20100101000000_tBx94gQvEoOWTiB4n3gORSmS11807217407159771ouss.json

Getting Started

hive> set hive.exec.mode.local.auto=true;
--hive> set hive.cli.print.header=true;
hive> create database weibo;
hive> use weibo;

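Functional Requirements

1. Data processing: propose a solution for each problem in the data (15 points)

Too many data files: they must be merged; propose a solution.
Answer: merge them with a MapReduce job.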
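
Besides a hand-written MapReduce job, Hive itself can perform the merge. The following is a minimal sketch, not the author's solution. It assumes the MapReduce execution engine and that the single-column weibo_json external table from task 2 already exists; weibo_json_merged is a hypothetical table name, and the size thresholds are illustrative values only.

```sql
-- Combine many small input files into fewer map splits when reading.
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Ask Hive to merge small output files at the end of map-only and map-reduce jobs.
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;

-- Target size of each merged file (about 256 MB) and the average output file
-- size (about 16 MB) below which the extra merge step is triggered.
-- Both numbers are illustrative assumptions.
set hive.merge.size.per.task=256000000;
set hive.merge.smallfiles.avgsize=16000000;

-- weibo_json_merged is a hypothetical name: a managed copy of the raw lines,
-- written out as a small number of larger files.
create table weibo_json_merged as
select json from weibo_json;
```

The copy leaves the original files under /data/weibo untouched, so the external table keeps working while the merged table is checked.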

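2. Organizing the data (10 points)

(Create the Hive table weibo_json(json string) with a single column, load all the data, and verify by querying the first 5 rows)
(Parse the JSON data in weibo_json into a weibo table with 19 fields; write the necessary SQL statements)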
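
For the second item, once weibo_json has been created as shown in the next subsection, the parsing step might look like the sketch below. It is a hedged illustration rather than the original solution. It assumes that every row of weibo_json holds one JSON array containing a single object, as in the samples above, and that the Hive version in use lets get_json_object subscript a root-level array with $[0]. Keeping every column as string is also an assumption; musicurl, pic_list and videourl end up as the raw JSON array text.

```sql
-- Target table with the 19 fields listed above; all columns are strings (assumption).
create table if not exists weibo(
  beCommentWeiboId string,
  beForwardWeiboId string,
  catchTime        string,
  commentCount     string,
  content          string,
  createTime       string,
  info1            string,
  info2            string,
  info3            string,
  mlevel           string,
  musicurl         string,
  pic_list         string,
  praiseCount      string,
  reportCount      string,
  source           string,
  userId           string,
  videourl         string,
  weiboId          string,
  weiboUrl         string
);

-- Extract each field from the first (and only) object of the JSON array in every row.
insert overwrite table weibo
select
  get_json_object(json, '$[0].beCommentWeiboId'),
  get_json_object(json, '$[0].beForwardWeiboId'),
  get_json_object(json, '$[0].catchTime'),
  get_json_object(json, '$[0].commentCount'),
  get_json_object(json, '$[0].content'),
  get_json_object(json, '$[0].createTime'),
  get_json_object(json, '$[0].info1'),
  get_json_object(json, '$[0].info2'),
  get_json_object(json, '$[0].info3'),
  get_json_object(json, '$[0].mlevel'),
  get_json_object(json, '$[0].musicurl'),
  get_json_object(json, '$[0].pic_list'),
  get_json_object(json, '$[0].praiseCount'),
  get_json_object(json, '$[0].reportCount'),
  get_json_object(json, '$[0].source'),
  get_json_object(json, '$[0].userId'),
  get_json_object(json, '$[0].videourl'),
  get_json_object(json, '$[0].weiboId'),
  get_json_object(json, '$[0].weiboUrl')
from weibo_json;

-- Spot-check a few parsed rows.
select weiboId, userId, createTime from weibo limit 5;
```

If the spot-check returns only NULLs, the one-array-per-row assumption probably does not hold for the files as stored, and the raw lines should be inspected first.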

Creating the weibo_json table

hive> create external table if not exists weibo_json(
    > json string)
    > location "/data/weibo";
   
-- Because this is an external table whose location points to /data/weibo, the data can be read as soon as the table is created
hive> select * from weibo_json limit 2;
OK
[{
  "beCommentWeiboId":"","beForwardWeiboId":"","catchTime":"1387159495","commentCount":"1419","content":"分享图片","createTime":"1386981067","info1":"","info2":"","info3":"","mlevel":"","musicurl":[],"pic_list":["http://ww3.sinaimg.cn/thumbnail/40d61044jw1ebixhnsiknj20qo0qognx.jpg"],"praiseCount":"5265","reportCount":"1285","source":"iPad客户端","userId":