1. Create a database
hive> create database wordcount;
OK
Time taken: 0.389 seconds
hive> show databases;
OK
default
wordcount
Time taken: 0.043 seconds, Fetched: 3 row(s)
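The load output further down reports the table as wordcount.file_data, which implies the session switched to the new database before the table was created. That step is not shown in the transcript, but it would presumably be:

hive> use wordcount;

Without it, the tables below would land in the default database instead.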
2. Create a table
hive> create table file_data(context string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\n';
OK
Time taken: 1.227 seconds
hive> show tables;
OK
file_data
Time taken: 0.195 seconds, Fetched: 1 row(s)
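Since '\n' is also Hive's line terminator, no field splitting happens at load time: each input line is stored whole in the single context column. To double-check the table layout you could run the following (a sketch; the exact output format varies by Hive version):

hive> desc file_data;
context    string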
3. Prepare the data
[hadoop@wjxhadoop001 ~]$ vi wordcount.txt
hello world
hello hadoop
hello java
hello mysql
c
c++
lisi zhangsan wangwu
4. Load the data into the file_data table
Load the prepared data file (/home/hadoop/wordcount.txt) into the file_data table.
hive> load data local inpath '/home/hadoop/wordcount.txt' into table file_data;
Loading data to table wordcount.file_data
Table wordcount.file_data stats: [numFiles=1, totalSize=75]
OK
Time taken: 3.736 seconds
Check the result:
hive> select * from file_data;
OK
hello world
hello hadoop
hello java
hello mysql
c
c++
lisi zhangsan wangwu
Time taken: 0.736 seconds, Fetched: 7 row(s)
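As an aside, load data local inpath copies the file from the local filesystem; without the local keyword, Hive instead moves a file that is already in HDFS, and adding overwrite replaces whatever the table currently holds. A sketch, using a hypothetical HDFS path:

hive> load data inpath '/tmp/wordcount.txt' overwrite into table file_data;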
5. Split the data
- Split the data on spaces; each word produced by the split becomes its own row in a result table.
First create the result table that will hold the individual words.
hive> create table words(word string);
OK
Time taken: 0.606 seconds
split works like Java's split function; here it splits each line on spaces. explode then turns the resulting array into one row per element, so once the HQL statement below finishes, the words table holds one word per row.
hive> insert into table words select explode(split(context, ' ')) from file_data;
hive> select * from words;
OK
hello
world
hello
hadoop
hello
java
hello
mysql
c
c++
lisi
zhangsan
wangwu
Time taken: 0.304 seconds, Fetched: 13 row(s)
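Note that explode must be the only expression in the select list. If you wanted to keep the original line next to each extracted word, you would use lateral view instead; a minimal sketch:

hive> select f.context, w.word
    > from file_data f lateral view explode(split(f.context, ' ')) w as word;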
6. Use count
hive> select word, count(word) from words group by word;
Query ID = hadoop_20171222143131_4636629a-1983-4b0b-8d96-39351f3cd53b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1513893519773_0004, Tracking URL = http://zydatahadoop001:8088/proxy/application_1513893519773_0004/
Kill Command = /opt/software/hadoop-cdh/bin/hadoop job -kill job_1513893519773_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-12-22 15:47:06,871 Stage-1 map = 0%, reduce = 0%
2017-12-22 15:47:50,314 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 22.96 sec
2017-12-22 15:48:09,496 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 25.59 sec
MapReduce Total cumulative CPU time: 25 seconds 590 msec
Ended Job = job_1513893519773_0004
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1  Cumulative CPU: 25.59 sec  HDFS Read: 6944 HDFS Write: 77 SUCCESS
Total MapReduce CPU Time Spent: 25 seconds 590 msec
OK
c         1
c++       1
hadoop    1
hello     4
java      1
lisi      1
mysql     1
wangwu    1
world     1
zhangsan  1
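hello appears 4 times, matching the input. If you want the output sorted by frequency, or want to skip the intermediate words table altogether, the whole job collapses into a single statement; a sketch:

hive> select word, count(1) as cnt
    > from (select explode(split(context, ' ')) as word from file_data) t
    > group by word
    > order by cnt desc;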
From @若泽大数据