Hive简单SQL语句描述

最新推荐文章于 2024-06-21 15:56:07 发布

大饼辉

最新推荐文章于 2024-06-21 15:56:07 发布

阅读量3k

点赞数 1

分类专栏：大数据云计算Linux相关文章标签： Hive SQL 常用命令

本文链接：https://blog.csdn.net/ghuiL/article/details/45170245

版权

大数据云计算Linux相关专栏收录该内容

13 篇文章 0 订阅

订阅专栏

hive是基于Hadoop的一个数据仓库工具，可以将结构化的数据文件映射为一张数据库表，并提供简单的sql查询功能，可以将sql语句转换为MapReduce任务进行运行。其优点是学习成本低，可以通过类SQL语句快速实现简单的MapReduce统计，不必开发专门的MapReduce应用，十分适合数据仓库的统计分析。（该段话来自于百度百科）

下面进行一些简单的Hive命令，供Hive小白看。

1.准备测试数据

新建文件ht_p.txt，然后输入一组数据：

vi /ht_p.txt

数据之间以Tab符号隔开。

01      lgh     24
02      zmy     25
03      zyx     23
04      lgq     19
05      zly     28

2.建表

首先 输入hive命令进入hive模式，然后通过以下命令建表：

hive> create table ht_person(id int, name string, age int) row format delimited fields terminated by '\t';
OK
Time taken: 3.77 seconds

建表完成后，输入show tables查询表：

hive> show tables;
OK
ht_person
Time taken: 0.04 seconds, Fetched: 1 row(s)

3.数据导入

将/ht_p.txt文本数据导入到hive库中，SQL代码如下：

hive> load data local inpath '/ht_p.txt' overwrite into table ht_person;
Copying data from file:/ht_p.txt
Copying file: file:/ht_p.txt
Loading data to table default.ht_person
Table default.ht_person stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 50, raw_data_size: 0]
OK
Time taken: 1.679 seconds

4.查询数据

查询表格中的所有数据：

hive> select * from ht_person;
OK
1       lgh     24
2       zmy     25
3       zyx     23
4       lgq     19
5       zly     28
Time taken: 0.431 seconds, Fetched: 5 row(s)

查询表结构：

hive> desc ht_person;
OK
id                      int                     None
name                    string                  None
age                     int                     None
Time taken: 0.11 seconds, Fetched: 3 row(s)

5.修改表

5.1 增加一个列

hive> alter table ht_person add columns (place string);
OK
Time taken: 0.834 seconds

插入完成后，查询表结构：

hive> desc ht_person;
OK
id                      int                     None
name                    string                  None
age                     int                     None
place                   string                  None
Time taken: 0.08 seconds, Fetched: 4 row(s)

5.2 修改表名

hive> alter table ht_person rename to ht_persons;
OK
Time taken: 0.763 seconds

修改完成后，查看表如下：

hive> show tables;
OK
ht_persons
Time taken: 0.035 seconds, Fetched: 1 row(s)

5.3 删除表（最好别删除，后面例子用）

hive> drop table ht_persons;
OK
Time taken: 0.763 seconds

删除完成后，查看表如下：

hive> show tables;
OK
Time taken: 0.763 seconds

6.用已有表创建新表，并复制原有表

如果在创建过程中出现各种错误可参见： http://blog.csdn.net/ghuil/article/details/45170993

hive> create table ht_p2 as select * from ht_persons;
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1429240731064_0011, Tracking URL = http://compute-24-13.local:8088/proxy/application_1429240731064_0011/
Kill Command = /home/hbase/install/hadoop-2.3.0-cdh5.1.0/bin/hadoop job  -kill job_1429240731064_0011
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-04-21 16:00:12,105 Stage-1 map = 0%,  reduce = 0%
2015-04-21 16:00:17,307 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.46 sec
2015-04-21 16:00:18,344 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.46 sec
2015-04-21 16:00:19,377 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.46 sec
MapReduce Total cumulative CPU time: 1 seconds 460 msec
Ended Job = job_1429240731064_0011
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://compute-24-13.local:9000/tmp/hive-root/hive_2015-04-21_16-00-01_866_8125068571975125560-1/-ext-10001
Moving data to: hdfs://compute-24-13.local:9000/user/hive/warehouse/ht_p2
Table default.ht_p2 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 60, raw_data_size: 0]
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.46 sec   HDFS Read: 275 HDFS Write: 60 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 460 msec
OK
Time taken: 20.57 seconds
hive> show tables;
OK
ht_p2
ht_persons
Time taken: 0.088 seconds, Fetched: 2 row(s)

创建完成后，通过select查询如下：

hive> select * from ht_p2;
OK
1       lgh     24      NULL
2       zmy     25      NULL
3       zyx     23      NULL
4       lgq     19      NULL
5       zly     28      NULL
Time taken: 0.314 seconds, Fetched: 5 row(s)

可见创建成功。

7. 创建新表时仅复制表结构，不要数据

hive> create table ht_p3 like ht_persons;
OK
Time taken: 1.839 seconds

创建完成后，通过select查询如下：

hive> select * from ht_p3;
OK
Time taken: 0.279 seconds

可见表中并无数据。

8.复合查询

8.1 查询年龄23岁以上的人：

hive> from (select name, age as big from ht_persons) t
    > select t.name, t.big
    > where t.big > 23
    > limit 3;

结果如下：

Total MapReduce CPU Time Spent: 1 seconds 770 msec
OK
lgh     24
zmy     25
zly     28
Time taken: 22.579 seconds, Fetched: 3 row(s)

8.2 JOIN链接查询

hive> select t1.name, t1.age
    > from ht_persons t1 join ht_p2 t2 on t1.age=t2.age
    > where t1.age>23;

结果如下：

Total MapReduce CPU Time Spent: 4 seconds 930 msec
OK
lgh     24
zmy     25
zly     28
Time taken: 24.229 seconds, Fetched: 3 row(s)

8.3 聚合查询

hive> select count(*), avg(t.age)
    > from ht_persons t;

结果如下：

Total MapReduce CPU Time Spent: 3 seconds 220 msec
OK
5       23.8
Time taken: 24.699 seconds, Fetched: 1 row(s)

全文完，转载请注明出处： http://blog.csdn.net/ghuil/article/details/45170245

大饼辉

关注

1
点赞
踩
2

收藏

觉得还不错? 一键收藏
0
评论
Hive简单SQL语句描述

hive是基于Hadoop的一个数据仓库工具，可以将结构化的数据文件映射为一张数据库表，并提供简单的sql查询功能，可以将sql语句转换为MapReduce任务进行运行。其优点是学习成本低，可以通过类SQL语句快速实现简单的MapReduce统计，不必开发专门的MapReduce应用，十分适合数据仓库的统计分析。（该段话来自于百度百科）下面进行一些简单的Hive命令，供Hive小白看。
复制链接

扫一扫