1. Preparing the data source
First, download the sample dataset from http://grouplens.org/ , then import it into Hive for the experiments below.
2. Internal (managed) tables
Create an internal table and load data into it
hadoop@hadoopmaster:~$ beeline -u jdbc:hive2://hadoopmaster:10000/
Beeline version 2.1.0 by Apache Hive
0: jdbc:hive2://hadoopmaster:10000/> show databases;
OK
+----------------+--+
| database_name  |
+----------------+--+
| default        |
| fincials       |
+----------------+--+
2 rows selected (1.038 seconds)
0: jdbc:hive2://hadoopmaster:10000/> use default;
OK
No rows affected (0.034 seconds)
0: jdbc:hive2://hadoopmaster:10000/> create table u_data (userid INT, movieid INT, rating INT, unixtime STRING) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
No rows affected (0.242 seconds)
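The CREATE TABLE statement above declares fields terminated by '\t', so the input file must be tab-separated. A quick way to sanity-check a file's layout before loading (the two sample rows are copied from the query output further down; the /tmp path is just for illustration):

```shell
# Write two rows in u.data's layout (userid, movieid, rating, unixtime,
# tab-separated) and verify every line splits into exactly 4 fields.
printf '196\t242\t3\t881250949\n186\t302\t3\t891717742\n' > /tmp/u.data.sample
awk -F'\t' '{ print NF }' /tmp/u.data.sample   # prints 4 for each line
```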
0: jdbc:hive2://hadoopmaster:10000/> LOAD DATA LOCAL INPATH '/home/hadoop/u.data' OVERWRITE INTO TABLE u_data;
Loading data to table default.u_data
OK
No rows affected (0.351 seconds)
0: jdbc:hive2://hadoopmaster:10000/> select * from u_data;
OK
+----------------+-----------------+----------------+------------------+--+
| u_data.userid  | u_data.movieid  | u_data.rating  | u_data.unixtime  |
+----------------+-----------------+----------------+------------------+--+
| 196            | 242             | 3              | 881250949        |
| 186            | 302             | 3              | 891717742        |
| 22             | 377             | 1              | 878887116        |
| 244            | 51              | 2              | 880606923        |
| 166            | 346             | 1              | 886397596        |
| 298            | 474             | 4              | 884182806        |
| 115            | 265             | 2              | 881171488        |
| 253            | 465             | 5              | 891628467        |
| 305            | 451             | 3              | 886324817        |
| 6              | 86              | 3              | 883603013        |
| 62             | 257             | 2              | 879372434        |
| 286            | 1014            | 5              | 879781125        |
Check the HDFS space the table occupies
hadoop@hadoopmaster:~$ hdfs dfs -ls /user/hive/warehouse/u_data
Found 1 items
-rwxrwxr-x   2 hadoop supergroup    1979173 2016-07-22 10:19 /user/hive/warehouse/u_data/u.data
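Since one copy of u.data occupies 1,979,173 bytes, appending it 100 more times should grow the table directory roughly linearly. A quick back-of-envelope estimate (shell arithmetic only; HDFS replication and block overhead are ignored):

```shell
# One copy of u.data is 1,979,173 bytes (from the hdfs dfs -ls output above).
# After appending it 100 more times, ~101 copies sit in the directory:
echo $((1979173 * 101))   # 199896473 bytes, roughly 190 MiB
```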
Write a script that loads the data repeatedly, 100 times
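A minimal sketch of such a script (an assumption, not the author's original: it presumes beeline is on the PATH and reuses the HiveServer2 URL from above; note it uses INTO TABLE without OVERWRITE, since OVERWRITE would replace the data on each iteration instead of appending):

```shell
#!/bin/bash
# Hypothetical repeated-load helper: appends the same file to u_data n times.
# INTO TABLE (no OVERWRITE) adds another copy of the file on every iteration.
load_n_times() {
    local n="$1" file="$2"
    for i in $(seq 1 "$n"); do
        beeline -u jdbc:hive2://hadoopmaster:10000/ -e \
            "LOAD DATA LOCAL INPATH '${file}' INTO TABLE u_data;"
    done
}

# Intended usage: load_n_times 100 /home/hadoop/u.data
```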
First, check how many rows the table currently has
0: jdbc:hive2://hadoopmaster:10000/> select count(*) from u_data;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = hadoop_20160722102853_77aa1bc6-79c2-4916-9b07-a763d112ef41
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1468978056881_0003, Tracking URL = http://hadoopmaster:8088/proxy/application_1468978056881_0003/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1468978056881_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-07-22 10:28:58,786 Stage-1 map = 0%, reduce = 0%
2016-07-22 10:29:03,890 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.89 sec
2016-07-22 10:29:10,005 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.71 sec
MapReduce Total cumulative CPU time: 1 seconds 710 msec
Ended Job = job_1468978056881_0003
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 1.71 sec   HDFS Read: 198