Hive in Action: A Comprehensive Case Study

This article details how Hive is used in big data processing, covering preparation of the data source, creation and management of internal (managed) and external tables, and the concepts and operations behind partitioned and bucketed tables. Worked examples show how to import data and inspect HDFS space usage, and compare internal and external tables with respect to data management and delete behavior.

1. Preparing the Data Source

First, download the sample data from http://grouplens.org/; we will then import it into Hive for the experiments that follow.
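The u.data file used below is the ratings file from the MovieLens 100k dataset. A minimal fetch sketch, assuming the archive is still published under files.grouplens.org (check the site for the current link):

    # Hypothetical download path -- verify against http://grouplens.org/
    wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
    unzip ml-100k.zip
    # u.data is tab-separated: userid, movieid, rating, unixtime
    cp ml-100k/u.data /home/hadoop/u.data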

2. Internal Tables

Create an internal (managed) table and load data into it:

    hadoop@hadoopmaster:~$ beeline -u jdbc:hive2://hadoopmaster:10000/
    Beeline version 2.1.0 by Apache Hive
    0: jdbc:hive2://hadoopmaster:10000/> show databases;
    OK
    +----------------+--+
    | database_name  |
    +----------------+--+
    | default        |
    | fincials       |
    +----------------+--+
    2 rows selected (1.038 seconds)
    0: jdbc:hive2://hadoopmaster:10000/> use default;
    OK
    No rows affected (0.034 seconds)
    0: jdbc:hive2://hadoopmaster:10000/> create table u_data (userid INT, movieid INT, rating INT, unixtime STRING) row format delimited fields terminated by '\t' lines terminated by '\n';
    OK
    No rows affected (0.242 seconds)
    0: jdbc:hive2://hadoopmaster:10000/> LOAD DATA LOCAL INPATH '/home/hadoop/u.data' OVERWRITE INTO TABLE u_data;
    Loading data to table default.u_data
    OK
    No rows affected (0.351 seconds)
    0: jdbc:hive2://hadoopmaster:10000/> select * from u_data;
    OK
    +----------------+-----------------+----------------+------------------+--+
    | u_data.userid  | u_data.movieid  | u_data.rating  | u_data.unixtime  |
    +----------------+-----------------+----------------+------------------+--+
    | 196            | 242             | 3              | 881250949        |
    | 186            | 302             | 3              | 891717742        |
    | 22             | 377             | 1              | 878887116        |
    | 244            | 51              | 2              | 880606923        |
    | 166            | 346             | 1              | 886397596        |
    | 298            | 474             | 4              | 884182806        |
    | 115            | 265             | 2              | 881171488        |
    | 253            | 465             | 5              | 891628467        |
    | 305            | 451             | 3              | 886324817        |
    | 6              | 86              | 3              | 883603013        |
    | 62             | 257             | 2              | 879372434        |
    | 286            | 1014            | 5              | 879781125        |

Check the HDFS space the table occupies. Because u_data is a managed table, LOAD DATA LOCAL copied the file into the table's directory under the Hive warehouse:

    hadoop@hadoopmaster:~$ hdfs dfs -ls /user/hive/warehouse/u_data
    Found 1 items
    -rwxrwxr-x   2 hadoop supergroup    1979173 2016-07-22 10:19 /user/hive/warehouse/u_data/u.data
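As an aside, hdfs dfs -du can summarize the directory's total size in one line, which is handy for comparing before and after the repeated loads below:

    hadoop@hadoopmaster:~$ hdfs dfs -du -s -h /user/hive/warehouse/u_data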

Next, write a script that re-imports the data 100 times; a sketch follows.
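The original script is not reproduced in the source; below is a minimal sketch of what it could look like, assuming beeline's -e option is used to run one statement per invocation. Note the LOAD omits OVERWRITE, so each pass appends another copy of the file instead of replacing the table contents.

    #!/bin/bash
    # Hypothetical reconstruction -- not the author's original script.
    # Appends u.data to the managed table 100 times; each LOAD copies the
    # file into /user/hive/warehouse/u_data again.
    for i in $(seq 1 100); do
        beeline -u jdbc:hive2://hadoopmaster:10000/ \
                -e "LOAD DATA LOCAL INPATH '/home/hadoop/u.data' INTO TABLE u_data;"
    done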

Before running it, check how many rows the table currently holds:

    0: jdbc:hive2://hadoopmaster:10000/> select count(*) from u_data;
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
    Query ID = hadoop_20160722102853_77aa1bc6-79c2-4916-9b07-a763d112ef41
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1468978056881_0003, Tracking URL = http://hadoopmaster:8088/proxy/application_1468978056881_0003/
    Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1468978056881_0003
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2016-07-22 10:28:58,786 Stage-1 map = 0%, reduce = 0%
    2016-07-22 10:29:03,890 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.89 sec
    2016-07-22 10:29:10,005 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.71 sec
    MapReduce Total cumulative CPU time: 1 seconds 710 msec
    Ended Job = job_1468978056881_0003
    MapReduce Jobs Launched:
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 1.71 sec   HDFS Read: 198
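After the script finishes, re-running the same query should report roughly 101 times the baseline count (the original load plus 100 appended copies), and the warehouse directory should show the extra files; Hive typically renames same-named loaded files with a _copy_N suffix (u.data_copy_1, u.data_copy_2, and so on). A quick verification, assuming the sketch above was used:

    0: jdbc:hive2://hadoopmaster:10000/> select count(*) from u_data;
    hadoop@hadoopmaster:~$ hdfs dfs -ls /user/hive/warehouse/u_data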
