1、创建数据库与表
1、创建库
hive> create database sogou;
2、创建表
hive> create table if not exists sogou.sogou_500w(
> ts string,
> uid string,
> keyword string,
> rank int,
> orders int,
> url string) row format delimited fields terminated by '\t' stored as textfile;
OK
Time taken: 0.385 seconds
2、数据装载
hive> load data local inpath '/root/sogou.500w.utf8' overwrite into table sogou.sogou_500w;
Loading data to table sogou.sogou_500w
OK
Time taken: 5.18 seconds
- 查看;500w数据真的是眼都花了
3、提取数据
1、创建表sogou_xj
hive> create table if not exists sogou.sogou_xj(
> ts string,
> uid string,
> keyword string,
> rank int,
> orders int,
> url string) row format delimited fields terminated by '\t' stored as textfile;
OK
Time taken: 0.455 seconds
2、筛选带有‘仙剑奇侠装’的数据插入上表中
hive> insert overwrite table sogou.sogou_xj select * from sogou.sogou_500w where keyword like '%仙剑奇侠传%';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20231126183129_57894c50-cf47-4706-b6a8-dca2760e413e
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2023-11-26 18:31:30,701 Stage-1 map = 0%, reduce = 0%
2023-11-26 18:31:31,761 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local1259273538_0003
Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2023-11-26 18:31:35,004 Stage-3 map = 100%, reduce = 0%
Ended Job = job_local1385583283_0004
Loading data to table sogou.sogou_xj
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 3100078204 HDFS Write: 411085 SUCCESS
Stage-Stage-3: HDFS Read: 1147541830 HDFS Write: 338082 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 6.224 seconds
- 查看
hive> select * from sogou.sogou_xj limit 100;
OK
20111230003647 2e6a9aa494ae241b95664ff56971b72a 仙剑奇侠传三 2 1 http://www.tvmao.com/drama/LWlTVw==/episode
20111230003727 2e6a9aa494ae241b95664ff56971b72a 仙剑奇侠传三 3 2 http://www.youku.com/show_page/id_zcc07361a962411de83b1.html
20111230010508 89072e859c41da47298d967ab8d787c5 仙剑奇侠传1刘亦菲版全集 2 1 http://www.tudou.com/playlist/id11482399.html
20111230012415 3a195290e645f623d13c2b12cce4c273 仙剑奇侠传5破解 游民 2 1 http://bbs.gamersky.com/thread-927101-1-1.html
20111230012515 3a195290e645f623d13c2b12cce4c273 仙剑奇侠传5破解 游民 4 2 http://www.gamersky.com/Soft/200712/10802.shtml
20111230012532 3a195290e645f623d13c2b12cce4c273 仙剑奇侠传5破解教程 2 1 http://zhidao.baidu.com/question/293761707
20111230012544 3a195290e645f623d13c2b12cce4c273 仙剑奇侠传5破解教程 1 2 http://www.duote.com/tech/7/18820.html
20111230012544 3a195290e645f623d13c2b12cce4c273 仙剑奇侠传5破解教程 9 3 http://game.shangdu.com/news/pcgl/2011-08-17/359792.html
20111230013132 3a195290e645f623d13c2b12cce4c273 仙剑奇侠传5破解教程 8 4 http://my.nuaa.edu.cn/thread-152913-1-1.html
20111230015346 3a195290e645f623d13c2b12cce4c273 仙剑奇侠传5破解教程 10 5 http://tieba.baidu.com/p/1240255091
20111230020738 ebaf6800291f038901bfa25bf193262d 仙剑奇侠传5九宫格攻略 1 1 http://cheats.3234.com/danjigonglue/10216_2.htm
20111230024104 0d14ebeaf5c4429a519f0aa8b73e8b93 仙剑奇侠传2 4 1 http://zhidao.baidu.com/question/109364646
20111230024213 0d14ebeaf5c4429a519f0aa8b73e8b93 仙剑奇侠传4电视剧 2 1 http://tv.sogou.com/series/wxt4vu5644qm7sn5updont6awsv4xrgtz3h3pnvpxowq.html?p=40230600
20111230024246 0d14ebeaf5c4429a519f0aa8b73e8b93 仙剑奇侠传4电视剧 1 2 http://hi.baidu.com/sousun/blog/item/1a3ae9180b0253bc4aedbcc1.html
4、数据备份
hive> create table sogou.sogou_xj_backup as select * from sogou.sogou_xj;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20231126184539_8337b51c-413c-4927-a11d-6b80d3bb4bf9
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2023-11-26 18:45:40,986 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local728323564_0005
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://hadoop:8020/user/hive/warehouse/sogou.db/.hive-staging_hive_2023-11-26_18-45-39_474_8148677371035998524-1/-ext-10002
Moving data to directory hdfs://hadoop:8020/user/hive/warehouse/sogou.db/sogou_xj_backup
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 1147726908 HDFS Write: 506639 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 1.785 seconds
- 查询
hive> select * from sogou.sogou_xj_backup limit 10;
OK
20111230003647 2e6a9aa494ae241b95664ff56971b72a 仙剑奇侠传三 2 1 http://www.tvmao.com/drama/LWlTVw==/episode
20111230003727 2e6a9aa494ae241b95664ff56971b72a 仙剑奇侠传三 3 2 http://www.youku.com/show_page/id_zcc07361a962411de83b1.html
20111230010508 89072e859c41da47298d967ab8d787c5 仙剑奇侠传1刘亦菲版全集 2 1 http://www.tudou.com/playlist/id11482399.html
20111230012415 3a195290e645f623d13c2b12cce4c273 仙剑奇侠传5破解 游民 2 1 http://bbs.gamersky.com/thread-927101-1-1.html
20111230012515 3a195290e645f623d13c2b12cce4c273 仙剑奇侠传5破解 游民 4 2 http://www.gamersky.com/Soft/200712/10802.shtml
20111230012532 3a195290e645f623d13c2b12cce4c273 仙剑奇侠传5破解教程 2 1 http://zhidao.baidu.com/question/293761707
20111230012544 3a195290e645f623d13c2b12cce4c273 仙剑奇侠传5破解教程 1 2 http://www.duote.com/tech/7/18820.html
20111230012544 3a195290e645f623d13c2b12cce4c273 仙剑奇侠传5破解教程 9 3 http://game.shangdu.com/news/pcgl/2011-08-17/359792.html
20111230013132 3a195290e645f623d13c2b12cce4c273 仙剑奇侠传5破解教程 8 4 http://my.nuaa.edu.cn/thread-152913-1-1.html
20111230015346 3a195290e645f623d13c2b12cce4c273 仙剑奇侠传5破解教程 10 5 http://tieba.baidu.com/p/1240255091
Time taken: 0.11 seconds, Fetched: 10 row(s)
5、导入数据
1、创建表sogou_liangjian
hive> create external table if not exists sogou.sogou_liangjian(
> ts string,
> uid string,
> keyword string,
> rank int,
> orders int,
> url string) row format delimited fields terminated by '\t' stored as textfile location '/sogou/liangjian';
OK
Time taken: 0.076 seconds
hive> show tables;
OK
sogou_500w
sogou_liangjian
sogou_xj
sogou_xj_backup
Time taken: 0.041 seconds, Fetched: 4 row(s)
hive> select * from sogou_liangjian;
OK
Time taken: 0.143 seconds
hive>
2、hive创建一个临时目录
hive> dfs -mkdir /tmp/export;
hive> dfs -ls /tmp/;
Found 2 items
drwxr-xr-x - root supergroup 0 2023-11-26 20:13 /tmp/export
drwx-wx-wx - root supergroup 0 2023-11-20 21:16 /tmp/hive
3、创建文件liangjian.txt;查询 ‘亮剑’ 数据写入临时目录
hive> insert overwrite directory '/tmp/export'
> row format delimited
> fields terminated by '\t'
> select * from sogou.sogou_500w where keyword like '亮剑';
4、将临时目录中数据导入本地文件
[root@hadoop ~]# hadoop fs -cat /tmp/export/* > liangjian.txt
[root@hadoop ~]# ls
a.txt e1.txt e2.txt e3.txt e.txt liangjian.txt sogou.500w.utf8 start.sh student.txt
[root@hadoop ~]# cat liangjian.txt
20111230003230 fa1f5f9f7983637d04179187c9fad3b0 亮剑 1 1 http://tv.sohu.com/s2011/liangjian/
20111230010001 574a4e94e666fce031689064ce6b524e 亮剑 2 1 http://tv.sogou.com/series/wxt4vu5644qmdqn5um.html?p=40230600
20111230012058 2a0f064f2afdf8fc5084f7e57c6ab007 亮剑 1 1 http://tv.sohu.com/s2011/liangjian/
5、将上边到处文件进行移动
hadoop fs -put liangjian.txt /sogou/liangjian
- 查看表数据
hive> select * from sogou.sogou_liangjian limit 20;
OK
20111230003230 fa1f5f9f7983637d04179187c9fad3b0 亮剑 1 1 http://tv.sohu.com/s2011/liangjian/
20111230010001 574a4e94e666fce031689064ce6b524e 亮剑 2 1 http://tv.sogou.com/series/wxt4vu5644qmdqn5um.html?p=40230600
20111230012058 2a0f064f2afdf8fc5084f7e57c6ab007 亮剑 1 1 http://tv.sohu.com/s2011/liangjian/
20111230012451 2dd3dff74740d3b3916f1e677622d27b 亮剑 2 1 http://tv.sogou.com/series/wxt4vu5644qmdqn5um.html?p=40230600
20111230012529 2dd3dff74740d3b3916f1e677622d27b 亮剑 4 2 http://www.tvmao.com/drama/LW5SVw==/episode
20111230012609 2dd3dff74740d3b3916f1e677622d27b 亮剑 1 3 http://tv.sohu.com/s2011/liangjian/
20111230013423 70ccb6376487856cbe3bb504bffa9c1c 亮剑 1 1 http://tv.sohu.com/s2011/liangjian/
20111230032827 118c57f32d2c053059d6c0690977080a 亮剑 2 1 http://tv.sogou.com/series/wxt4vu5644qmdqn5um.html?p=40230600
20111230040244 de4112c7c970329fb99730cfbf408c62 亮剑 2 1 http://tv.sogou.com/series/wxt4vu5644qmdqn5um.html?p=40230600
20111230040323 295ecf2ec896525e065d755c462d1410 亮剑 1 1 http://tv.sohu.com/s2011/liangjian/
20111230043913 8196855fbb30033cde52aee315b11bbd 亮剑 5 1 http://www.youku.com/show_page/id_zcbfb715e962411de83b1.html
20111230052332 325807155a90a376b4fea826f13282e5 亮剑 2 1 http://tv.sogou.com/series/wxt4vu5644qmdqn5um.html?p=40230600
20111230052335 325807155a90a376b4fea826f13282e5 亮剑 3 2 http://www.iqiyi.com/dianshiju/lj.html?src=alddsj
20111230084909 1870be054489ec858a7694a78c1311e6 亮剑 2 1 http://tv.sogou.com/series/wxt4vu5644qmdqn5um.html?p=40230600
20111230091503 c0fbc36d1c97babcd336b26926709197 亮剑 6 1 http://baike.baidu.com/view/3340.htm
20111230091701 680fac79aab4682d3bb91a0241ee79a6 亮剑 1 1 http://tv.sohu.com/s2011/liangjian/
20111230092558 18a523a8c39c242d44236835a9b3a7b6 亮剑 1 1 http://tv.sohu.com/s2011/liangjian/
20111230093823 a9bfdff7b47717cc3d0d946ebb714f46 亮剑 5 1 http://www.youku.com/show_page/id_zcbfb715e962411de83b1.html
20111230094129 75ff86584a04922da136bc5001a793de 亮剑 3 1 http://www.iqiyi.com/dianshiju/lj.html?src=alddsj
20111230095018 2cbc43d3721f8a76c027b9dd8350979b 亮剑 1 1 http://tv.sohu.com/s2011/liangjian/
Time taken: 0.119 seconds, Fetched: 20 row(s)
6、导出数据
- 查看文件
hadoop fs -ls /user/hive/warehouse/sogou.db/sogou_xj_backup/000000_0
- 导出文件
hadoop fs -get /user/hive/warehouse/sogou.db/sogou_xj_backup/000000_0
- 查看数据
[root@hadoop ~]# head 000000_0
201112300036472e6a9aa494ae241b95664ff56971b72a仙剑奇侠传三21http://www.tvmao.com/drama/LWlTVw==/episode
201112300037272e6a9aa494ae241b95664ff56971b72a仙剑奇侠传三32http://www.youku.com/show_page/id_zcc07361a962411de83b1.html
2011123001050889072e859c41da47298d967ab8d787c5仙剑奇侠传1刘亦菲版全集21http://www.tudou.com/playlist/id11482399.html
201112300124153a195290e645f623d13c2b12cce4c273仙剑奇侠传5破解 游民21http://bbs.gamersky.com/thread-927101-1-1.html
201112300125153a195290e645f623d13c2b12cce4c273仙剑奇侠传5破解 游民42http://www.gamersky.com/Soft/200712/10802.shtml
201112300125323a195290e645f623d13c2b12cce4c273仙剑奇侠传5破解教程21http://zhidao.baidu.com/question/293761707
201112300125443a195290e645f623d13c2b12cce4c273仙剑奇侠传5破解教程12http://www.duote.com/tech/7/18820.html
201112300125443a195290e645f623d13c2b12cce4c273仙剑奇侠传5破解教程93http://game.shangdu.com/news/pcgl/2011-08-17/359792.html
201112300131323a195290e645f623d13c2b12cce4c273仙剑奇侠传5破解教程84http://my.nuaa.edu.cn/thread-152913-1-1.html
201112300153463a195290e645f623d13c2b12cce4c273仙剑奇侠传5破解教程105http://tieba.baidu.com/p/1240255091
7、总结
- 需要注意的是hive中主要是操作数据库创建与数据导入,在bash命令中使用hadoop操作HDFS文件
- 需要手动创建目录,掌握数据导出到本地文件的方法
- 注意数据库备份命令,备份文件位置及写法