Table of Contents
I. Installing Hive
Hive configuration steps
- step1
- step2
- step3
- step4
- step5
Starting Hive
- Start the metastore service
- Start the hiveserver service
- Start the Hive client
- Query the metadata in the MySQL database
II. Basic Hive Usage
1. Enter the Hive console
2. Create a table
Create a new table
Load t_hive.txt into the t_hive table
3. View tables and data
View tables
Match table names with a regex
View table data
View table structure
4. Alter a table
- Add a column
- Rename a table
5. Drop a table
III. Hive Interactive Mode
IV. Importing Data
1. Load data from the local file system (LOCAL)
- Find the newly imported data in HDFS
- Load data from HDFS
V. Exporting Data
1. Copy from HDFS to another HDFS location
2. Export to the local file system through Hive
3. Check the local file system
VI. Querying with HiveQL
1. Basic queries: ordering, column aliases, nested subqueries
2. Join queries: JOIN
3. Aggregate queries 1: count, avg
4. Aggregate queries 2: count, distinct
5. Aggregate queries 3: GROUP BY, HAVING
VII. Hive Views
- Drop a view
VIII. Hive Partitioned Tables
1. Create the data
2. Create the data table
3. Create the partitioned table
4. Import the data
5. View the partitions
6. Query the data
Preface
Hive is a programming interface on top of Hadoop. Its SQL-like syntax lets data analysts get up to speed quickly, makes the Java world feel simple and lightweight, and opens Hadoop up to people beyond programmers.
Starting with Hive, analysts can work with big data too.
I. Installing Hive
System environment
With a working Hadoop environment in place, we can install Hive on the NameNode machine (c1).
For setting up Hadoop, see the series "Running Hadoop in the Cloud" and "RHadoop in Practice, Part 1: Setting Up a Hadoop Environment".
Download: apache-hive-1.2.2-bin.tar.gz
Extract it under /home/cos/ (pick whatever directory you like).
Hive configuration
1. step1
(1) Extract the package:
tar -zxvf apache-hive-1.2.2-bin.tar.gz
(2) Add the environment variables:
[user@user ~]$ vim .bashrc
Add the following:
export HIVE_HOME=<path where you extracted Hive>
export PATH=$PATH:$HIVE_HOME/bin
Before installing Hive, MySQL needs to be in place. Hive uses the embedded Derby database for its metastore by default, but because Derby is limited to a single session, we use MySQL as the metastore here.
Check whether MySQL is already installed:
netstat -tap | grep mysql
If there is output, MySQL is already present; otherwise it needs to be installed.
2. step2
Install MySQL (skip this step if MySQL is already installed):
[user@user home]# wget http://dev.mysql.com/get/mysql-community-release-el7-5.noarch.rpm
[user@user home]# rpm -ivh mysql-community-release-el7-5.noarch.rpm
[user@user home]# yum install mysql-community-server
Loaded plugins: fastestmirror, langpacks
base | 3.6 kB 00:00
extras | 3.4 kB 00:00
mysql-connectors-community | 2.5 kB 00:00
mysql-tools-community | 2.5 kB 00:00
mysql56-community | 2.5 kB 00:00
updates | 3.4 kB 00:00
(1/4): mysql-connectors-community/x86_64/primary_db | 14 kB 00:03
(2/4): mysql-tools-community/x86_64/primary_db | 33 kB 00:03
(3/4): mysql56-community/x86_64/primary_db | 168 kB 00:04
(4/4): updates/7/x86_64/primary_db | 4.8 MB 00:14
Loading mirror speeds from cached hostfile
* base: mirrors.aliyun.com
* extras: mirrors.aliyun.com
* updates: mirrors.aliyun.com
Resolving dependencies
--> Running transaction check
......
......
Verifying : mysql-community-libs-5.6.36-2.el7.x86_64 11/11
Installed:
mysql-community-server.x86_64 0:5.6.36-2.el7
Installed as dependencies:
mysql-community-client.x86_64 0:5.6.36-2.el7
mysql-community-common.x86_64 0:5.6.36-2.el7
mysql-community-libs.x86_64 0:5.6.36-2.el7
perl-Compress-Raw-Bzip2.x86_64 0:2.061-3.el7
perl-Compress-Raw-Zlib.x86_64 1:2.061-4.el7
perl-DBI.x86_64 0:1.627-4.el7
perl-Data-Dumper.x86_64 0:2.145-3.el7
perl-IO-Compress.noarch 0:2.061-2.el7
perl-Net-Daemon.noarch 0:0.48-5.el7
perl-PlRPC.noarch 0:0.2020-14.el7
Complete!
3. step3
Start the MySQL service and log in:
[user@localhost ~]$ service mysqld start
[user@localhost ~]$ mysql -u root
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.6.36 MySQL Community Server (GPL)
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
//set a password for the root user
mysql> update user set password = password('123456') where user='root';
Query OK, 4 rows affected (0.00 sec)
Rows matched: 4 Changed: 4 Warnings: 0
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> exit
Bye
[user@localhost ~]$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 10
Server version: 5.6.36 MySQL Community Server (GPL)
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
Create the hive user, the hive database, and grant privileges:
mysql> use mysql;
mysql> GRANT USAGE ON *.* TO 'hive'@'localhost' IDENTIFIED BY 'hive' WITH GRANT OPTION;
mysql> grant all on hive.* to hive@'%' identified by 'hive';
mysql> grant all on hive.* to hive@'localhost' identified by 'hive';
mysql> create database hive;
mysql> flush privileges;
mysql> ALTER DATABASE hive CHARACTER SET latin1;
4. step4
Configure Hive's files:
[user@localhost conf]$ sudo cp hive-default.xml.template hive-site.xml
[sudo] password for user:
[user@localhost conf]$ ls
beeline-log4j.properties.template hive-log4j.properties.template
hive-default.xml.template hive-site.xml
hive-env.sh.template ivysettings.xml
hive-exec-log4j.properties.template
[user@localhost conf]$ sudo vim hive-site.xml
[sudo] password for user:
The main parameters to change:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive</value> <!-- hive is the name of the database you created -->
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value> <!-- the JDBC driver -->
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value> <!-- the database password -->
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value> <!-- the database user name -->
<description>Username to use against metastore database</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/home/hdpsrc/hive/iotmp</value> <!-- a directory of your choice -->
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/home/hdpsrc/hive/iotmp</value> <!-- a directory of your choice -->
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/home/hdpsrc/hive/iotmp</value> <!-- a directory of your choice -->
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/home/hdpsrc/hive/iotmp/operation_logs</value> <!-- a directory of your choice; the operation_logs folder must be created manually -->
<description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
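After editing, it is easy to misplace a tag or leave a value stale. As a quick sanity check, the <property> name/value pairs of a Hadoop-style config file such as hive-site.xml can be listed with a few lines of Python (a sketch; the inline XML snippet here is illustrative, not a full hive-site.xml):

```python
# Sketch: list the <property> name/value pairs of a hive-site.xml-style
# configuration file, to verify the edits above took effect.
import xml.etree.ElementTree as ET

# Illustrative snippet; in practice read the real file:
# ET.parse("hive-site.xml").getroot()
SAMPLE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
</configuration>"""

def hive_properties(xml_text):
    """Return {name: value} for every <property> element."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name", "").strip(): p.findtext("value", "").strip()
            for p in root.iter("property")}

props = hive_properties(SAMPLE)
for name, value in props.items():
    print(f"{name} = {value}")
```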
[user@localhost conf]$ sudo cp hive-log4j.properties.template hive-log4j.properties
[sudo] password for user:
[user@localhost conf]$ sudo vim hive-log4j.properties
#log4j.appender.EventCounter=org.apache.hadoop.hive.shims.HiveEventCounter
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
After configuring the files above, copy the mysql-connector JAR into Hive's lib directory. Do not skip this step.
5. step5
Start Hadoop:
[user@localhost hadoop-2.7.3]$ ./sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: namenode running as process 3931. Stop it first.
localhost: datanode running as process 4055. Stop it first.
Starting secondary namenodes [0.0.0.0]
0.0.0.0: secondarynamenode running as process 4255. Stop it first.
starting yarn daemons
resourcemanager running as process 4451. Stop it first.
localhost: nodemanager running as process 4566. Stop it first.
[user@localhost hadoop-2.7.3]$ jps
4451 ResourceManager
4566 NodeManager
4055 DataNode
3931 NameNode
10061 Jps
4255 SecondaryNameNode
[user@localhost apache-hive-1.2.2-bin]$ hive
hive> show databases;
OK
default
Time taken: 1.926 seconds, Fetched: 1 row(s)
Possible problems:
1. While installing MySQL: /var/run/yum.pid is locked, another program with PID 1610 is running.
Fix: rm -f /var/run/yum.pid
After deleting the file, yum works again.
2. After Hadoop is up, running hive throws a RuntimeException.
Try a different version of the mysql-connector JAR. Dragging the JAR from Windows straight into the VM can corrupt it; download it inside the VM instead.
Next, create the directories on HDFS:
$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir /user/warehouse
# if this fails, run: hadoop fs -mkdir -p /user/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp            # grant group write permission
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/warehouse # grant group write permission
Starting Hive
Start the metastore service
~ bin/hive --service metastore &
Output: Starting Hive Metastore Server...
Start the hiveserver service
~ bin/hive --service hiveserver2 &
Output: Starting Hive Thrift Server...
Start the Hive client
~ bin/hive shell
hive> show tables;
OK
Query the metadata in the MySQL database
~ mysql -u root -p
mysql> use hive;
Database changed
mysql> show tables;
+-------------------------+
| Tables_in_hive |
+-------------------------+
| BUCKETING_COLS |
| CDS |
| COLUMNS_V2 |
| DATABASE_PARAMS |
| DBS |
| IDXS |
| INDEX_PARAMS |
| PARTITIONS |
| PARTITION_KEYS |
| PARTITION_KEY_VALS |
| PARTITION_PARAMS |
| PART_COL_PRIVS |
| PART_PRIVS |
| SDS |
| SD_PARAMS |
| SEQUENCE_TABLE |
| SERDES |
| SERDE_PARAMS |
| SORT_COLS |
| TABLE_PARAMS |
| TBLS |
| TBL_COL_PRIVS |
| TBL_PRIVS |
+-------------------------+
23 rows in set (0.00 sec)
Hive is now installed; the rest of this guide covers how to use it.
II. Basic Hive Usage
1. Enter the Hive console
~ cd /home/apache-hive-1.2.2-bin/
~ bin/hive shell
Output: Logging initialized using configuration in...
2. Create a table
Create the data (fields separated by tabs):
sudo vim /home/hive/t_hive.txt  # create t_hive.txt wherever you like
Enter the data with a tab character between the numbers (pasting it in directly and then uploading to Hive may cause problems):
16 2 3
61 12 13
41 2 31
17 21 3
71 2 31
1 12 34
11 2 34
Create a new table
hive> CREATE TABLE t_hive (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.489 seconds
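ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' just tells Hive to split each line of the data file on tab characters and cast the pieces to the declared column types. A minimal sketch of that parsing, outside Hive, for intuition only:

```python
# Sketch: how one tab-delimited line maps to the (a int, b int, c int)
# schema declared above -- split on '\t', cast each field to int.
def parse_row(line, types=(int, int, int)):
    fields = line.rstrip("\n").split("\t")
    return tuple(cast(f) for cast, f in zip(types, fields))

print(parse_row("16\t2\t3"))  # (16, 2, 3)
```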
Load t_hive.txt into the t_hive table
hive> LOAD DATA LOCAL INPATH '/home/hive/t_hive.txt' OVERWRITE INTO TABLE t_hive;
Loading data to table default.t_hive
OK
Time taken: 0.397 seconds
3. View tables and data
View tables
hive> show tables;
OK
t_hive
Time taken: 0.099 seconds
Match table names with a regex
hive> show tables '*t*';
OK
t_hive
Time taken: 0.065 seconds
View table data
hive> select * from t_hive;
OK
16 2 3
61 12 13
41 2 31
17 21 3
71 2 31
1 12 34
11 2 34
Time taken: 0.264 seconds
View table structure
hive> desc t_hive;
OK
a int
b int
c int
Time taken: 0.1 seconds
4. Alter a table
Add a column
hive> ALTER TABLE t_hive ADD COLUMNS (new_col String);
OK
Time taken: 0.186 seconds
hive> desc t_hive;
OK
a int
b int
c int
new_col string
Time taken: 0.086 seconds
Rename a table
~ ALTER TABLE t_hive RENAME TO t_hadoop;
OK
Time taken: 0.45 seconds
hive> show tables;
OK
t_hadoop
Time taken: 0.07 seconds
5. Drop a table
hive> DROP TABLE t_hadoop;
OK
Time taken: 0.767 seconds
hive> show tables;
OK
Time taken: 0.064 seconds
III. Hive Interactive Mode
• quit, exit: leave the interactive shell
• reset: reset the configuration to its defaults
• set <key>=<value>: set the value of a particular variable (note: a misspelled variable name does not raise an error)
• set: print the Hive configuration variables overridden by the user
• set -v: print all Hadoop and Hive configuration variables
• add FILE[S] *, add JAR[S] *, add ARCHIVE[S] *: add one or more files, JARs, or archives to the distributed cache
• list FILE[S], list JAR[S], list ARCHIVE[S]: print the resources already added to the distributed cache
• list FILE[S] *, list JAR[S] *, list ARCHIVE[S] *: check whether the given resources have been added to the distributed cache
• delete FILE[S] *, delete JAR[S] *, delete ARCHIVE[S] *: remove the given resources from the distributed cache
• ! <command>: run a shell command from the Hive shell
• dfs <dfs command>: run a dfs command from the Hive shell
• <query string>: run a Hive query and print the result to standard output
• source FILE <filepath>: run a Hive script file inside the CLI
IV. Importing Data
Again using t_hive as the example.
Create the table structure:
hive> CREATE TABLE t_hive (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
1. Load data from the local file system (LOCAL)
hive> LOAD DATA LOCAL INPATH '/home/hive/t_hive.txt' OVERWRITE INTO TABLE t_hive ;
Loading data to table default.t_hive
OK
Time taken: 0.612 seconds
Find the newly imported data in HDFS
[user@localhost hadoop-2.7.3]$ cd bin/
[user@localhost bin]$ hadoop fs -cat /user/warehouse/t_hive/t_hive.txt
16 2 3
61 12 13
41 2 31
17 21 3
71 2 31
1 12 34
11 2 34
Load data from HDFS
Create table t_hive2:
hive> CREATE TABLE t_hive2 (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
Load the data from HDFS:
hive> LOAD DATA INPATH '/user/warehouse/t_hive/t_hive.txt' OVERWRITE INTO TABLE t_hive2;
Loading data to table default.t_hive2
Table default.t_hive2 stats: [numFiles=1, numRows=0, totalSize=68, rawDataSize=0]
OK
Time taken: 2.649 seconds
hive>
View the data
hive> select * from t_hive2;
OK
16 2 3
61 12 13
41 2 31
17 21 3
71 2 31
1 12 34
11 2 34
Time taken: 0.287 seconds
Import data from another table
hive> INSERT OVERWRITE TABLE t_hive2 SELECT * FROM t_hive ;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307131407_0002, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0002
Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-07-16 10:32:41,979 Stage-1 map = 0%, reduce = 0%
2013-07-16 10:32:48,034 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:49,050 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:50,068 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:51,082 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:52,093 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:53,102 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:54,112 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.03 sec
MapReduce Total cumulative CPU time: 1 seconds 30 msec
Ended Job = job_201307131407_0002
Ended Job = -314818888, job is filtered out (removed at runtime).
Moving data to: hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-32-31_323_5732404975764014154/-ext-10000
Loading data to table default.t_hive2
Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive2
Table default.t_hive2 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 56, raw_data_size: 0]
7 Rows loaded to t_hive2
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.03 sec HDFS Read: 273 HDFS Write: 56 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 30 msec
OK
Time taken: 23.227 seconds
hive> select * from t_hive2;
OK
16 2 3
61 12 13
41 2 31
17 21 3
71 2 31
1 12 34
11 2 34
Time taken: 0.134 seconds
Create a table and populate it from another table
First drop the table:
hive> DROP TABLE t_hive;
Then create it from a query:
hive> CREATE TABLE t_hive AS SELECT * FROM t_hive2 ;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307131407_0003, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0003
Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-07-16 10:36:48,612 Stage-1 map = 0%, reduce = 0%
2013-07-16 10:36:54,648 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:55,657 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:56,666 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:57,673 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:58,683 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:59,691 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.13 sec
MapReduce Total cumulative CPU time: 1 seconds 130 msec
Ended Job = job_201307131407_0003
Ended Job = -670956236, job is filtered out (removed at runtime).
Moving data to: hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-36-39_986_1343249562812540343/-ext-10001
Moving data to: hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive
Table default.t_hive stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 56, raw_data_size: 0]
7 Rows loaded to hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-36-39_986_1343249562812540343/-ext-10000
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.13 sec HDFS Read: 272 HDFS Write: 56 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 130 msec
OK
Time taken: 20.13 seconds
hive> select * from t_hive;
OK
16 2 3
61 12 13
41 2 31
17 21 3
71 2 31
1 12 34
11 2 34
Time taken: 0.109 seconds
Copy only the table structure, without the data
hive> CREATE TABLE t_hive3 LIKE t_hive;
hive> select * from t_hive3;
OK
Time taken: 0.077 seconds
Importing data from MySQL
We will cover this when introducing Sqoop.
V. Exporting Data
Copy from HDFS to another HDFS location
~ hadoop fs -cp /user/warehouse/t_hive/
~ hadoop fs -ls /t_hive
Found 1 items
-rw-r--r-- 1 cos supergroup 56 2013-07-16 10:41 /t_hive/000000_0
~ hadoop fs -cat /t_hive/000000_0
16 2 3
61 12 13
41 2 31
17 21 3
71 2 31
1 12 34
11 2 34
Export to the local file system through Hive
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/t_hive' SELECT * FROM t_hive;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307131407_0005, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0005
Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-07-16 10:46:24,774 Stage-1 map = 0%, reduce = 0%
2013-07-16 10:46:30,823 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:31,833 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:32,844 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:33,856 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:34,865 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:35,873 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:36,884 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.87 sec
MapReduce Total cumulative CPU time: 870 msec
Ended Job = job_201307131407_0005
Copying data to local directory /tmp/t_hive
Copying data to local directory /tmp/t_hive
7 Rows loaded to /tmp/t_hive
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 0.87 sec HDFS Read: 271 HDFS Write: 56 SUCCESS
Total MapReduce CPU Time Spent: 870 msec
OK
Time taken: 23.369 seconds
Check the local file system
hive> ! cat /tmp/t_hive/000000_0;
16 2 3
61 12 13
41 2 31
17 21 3
71 2 31
1 12 34
11 2 34
VI. Querying with HiveQL
Note: the map/reduce log output is omitted from the listings below.
Basic queries: ordering, column aliases, nested subqueries
hive> FROM (
SELECT b,c as c2 FROM t_hive
) t
SELECT t.b, t.c2
WHERE b>2
LIMIT 2;
12 13
21 3
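For intuition, the same nested query can be mimicked in plain Python over t_hive's seven rows (a sketch only; Hive actually runs this as a MapReduce job, and the row order here simply follows the file, matching the listing above):

```python
# Sketch: FROM (SELECT b, c AS c2 FROM t_hive) t
#         SELECT t.b, t.c2 WHERE b > 2 LIMIT 2
t_hive = [(16, 2, 3), (61, 12, 13), (41, 2, 31), (17, 21, 3),
          (71, 2, 31), (1, 12, 34), (11, 2, 34)]

inner = [(b, c) for a, b, c in t_hive]              # the subquery: b, c AS c2
result = [(b, c2) for b, c2 in inner if b > 2][:2]  # WHERE b > 2 LIMIT 2
print(result)  # [(12, 13), (21, 3)]
```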
Join queries: JOIN
hive> SELECT t1.a,t1.b,t2.a,t2.b
FROM t_hive t1 JOIN t_hive2 t2 on t1.a=t2.a
WHERE t1.c>10;
1 12 1 12
11 2 11 2
41 2 41 2
61 12 61 12
71 2 71 2
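The join can likewise be emulated in Python (a sketch; at this point t_hive and t_hive2 hold identical rows, so every row joins with itself on column a):

```python
# Sketch: SELECT t1.a, t1.b, t2.a, t2.b
#         FROM t_hive t1 JOIN t_hive2 t2 ON t1.a = t2.a
#         WHERE t1.c > 10
t_hive = [(16, 2, 3), (61, 12, 13), (41, 2, 31), (17, 21, 3),
          (71, 2, 31), (1, 12, 34), (11, 2, 34)]
t_hive2 = list(t_hive)

joined = sorted(
    (a1, b1, a2, b2)
    for a1, b1, c1 in t_hive
    for a2, b2, c2 in t_hive2
    if a1 == a2 and c1 > 10        # ON t1.a = t2.a, WHERE t1.c > 10
)
for row in joined:
    print(*row)
```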
Aggregate queries 1: count, avg
hive> SELECT count(*), avg(a) FROM t_hive;
7 31.142857142857142
Aggregate queries 2: count, distinct
hive> SELECT count(DISTINCT b) FROM t_hive;
3
Aggregate queries 3: GROUP BY, HAVING
GROUP BY
hive> SELECT avg(a),b,sum(c) FROM t_hive GROUP BY b,c;
16.0 2 3
56.0 2 62
11.0 2 34
61.0 12 13
1.0 12 34
17.0 21 3
HAVING
hive> SELECT avg(a),b,sum(c) FROM t_hive GROUP BY b,c HAVING sum(c)>30;
56.0 2 62
11.0 2 34
1.0 12 34
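The GROUP BY/HAVING pair can be emulated the same way (a sketch; Hive may emit the groups in a different order than the sorted order used here):

```python
# Sketch: SELECT avg(a), b, sum(c) FROM t_hive
#         GROUP BY b, c HAVING sum(c) > 30
from collections import defaultdict

t_hive = [(16, 2, 3), (61, 12, 13), (41, 2, 31), (17, 21, 3),
          (71, 2, 31), (1, 12, 34), (11, 2, 34)]

groups = defaultdict(list)
for a, b, c in t_hive:
    groups[(b, c)].append(a)                  # GROUP BY b, c

# c is constant within a group (it is part of the key), so sum(c) = c * count
result = sorted(
    (sum(as_) / len(as_), b, c * len(as_))    # avg(a), b, sum(c)
    for (b, c), as_ in groups.items()
    if c * len(as_) > 30                      # HAVING sum(c) > 30
)
for row in result:
    print(*row)
```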
VII. Hive Views
A Hive view is the same concept as a view in any other database; again we use t_hive as the example.
hive> CREATE VIEW v_hive AS SELECT a,b FROM t_hive where c>30;
hive> select * from v_hive;
41 2
71 2
1 12
11 2
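A view is just a stored query: v_hive materializes nothing, and every SELECT against it re-runs the defining query. In Python terms (a sketch for intuition only):

```python
# Sketch: a view as a stored query -- nothing is materialized,
# the defining SELECT runs each time the view is read.
t_hive = [(16, 2, 3), (61, 12, 13), (41, 2, 31), (17, 21, 3),
          (71, 2, 31), (1, 12, 34), (11, 2, 34)]

def v_hive():
    """CREATE VIEW v_hive AS SELECT a, b FROM t_hive WHERE c > 30"""
    return [(a, b) for a, b, c in t_hive if c > 30]

print(v_hive())  # [(41, 2), (71, 2), (1, 12), (11, 2)]
```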
Drop the view
hive> DROP VIEW IF EXISTS v_hive;
OK
Time taken: 0.495 seconds
VIII. Hive Partitioned Tables
Partitioned tables are a basic database concept, and with small data volumes they are often unnecessary. Hive, however, is OLAP data-warehouse software that deals with very large volumes, so partitioning matters a great deal in this setting!
Below we define a new table structure: t_hft
Create the data
~ sudo vim /home/cos/demo/t_hft_20130627.csv
000001,092023,9.76
000002,091947,8.99
000004,092002,9.79
000005,091514,2.2
000001,092008,9.70
000001,092059,9.45
~ vi /home/cos/demo/t_hft_20130628.csv
000001,092023,9.76
000002,091947,8.99
000004,092002,9.79
000005,091514,2.2
000001,092008,9.70
000001,092059,9.45
Create the data table
DROP TABLE IF EXISTS t_hft;
CREATE TABLE t_hft(
SecurityID STRING,
tradeTime STRING,
PreClosePx DOUBLE
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Create the partitioned table
Following the business need, design the partitions by trading day (tradeDate):
DROP TABLE IF EXISTS t_hft;
CREATE TABLE t_hft(
SecurityID STRING,
tradeTime STRING,
PreClosePx DOUBLE
) PARTITIONED BY (tradeDate INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Import the data
20130627:
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hft_20130627.csv' OVERWRITE INTO TABLE t_hft PARTITION (tradeDate=20130627);
Copying data from file:/home/cos/demo/t_hft_20130627.csv
Copying file: file:/home/cos/demo/t_hft_20130627.csv
Loading data to table default.t_hft partition (tradedate=20130627)
20130628:
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hft_20130628.csv' OVERWRITE INTO TABLE t_hft PARTITION (tradeDate=20130628);
Copying data from file:/home/cos/demo/t_hft_20130628.csv
Copying file: file:/home/cos/demo/t_hft_20130628.csv
Loading data to table default.t_hft partition (tradedate=20130628)
View the partitions
hive> SHOW PARTITIONS t_hft;
tradedate=20130627
tradedate=20130628
Time taken: 0.082 seconds
Query the data
hive> select * from t_hft where securityid='000001';
000001 092023 9.76 20130627
000001 092008 9.7 20130627
000001 092059 9.45 20130627
000001 092023 9.76 20130628
000001 092008 9.7 20130628
000001 092059 9.45 20130628
hive> select * from t_hft where tradedate=20130627 and PreClosePx<9;
000002 091947 8.99 20130627
000005 091514 2.2 20130627
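Under the hood each partition is a subdirectory of the table (e.g. .../t_hft/tradedate=20130627/), so a predicate on the partition column lets Hive read only the matching directories instead of scanning the whole table. A sketch of that pruning, with illustrative paths:

```python
# Sketch: partition pruning -- each partition value maps to its own
# directory, and a filter on tradedate selects whole directories
# instead of scanning every file.  Paths here are illustrative.
partitions = {
    20130627: "/user/warehouse/t_hft/tradedate=20130627",
    20130628: "/user/warehouse/t_hft/tradedate=20130628",
}

def prune(pred):
    """Return the partition directories whose tradedate satisfies pred."""
    return [path for d, path in sorted(partitions.items()) if pred(d)]

# WHERE tradedate=20130627 touches only one directory:
print(prune(lambda d: d == 20130627))
```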
That completes basic Hive usage; these are the day-to-day operations. Later I will continue with HiveQL optimization and Hive operations.
Possible problems
1. hive --service hiveserver &
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.hadoop.hive.service.HiveServer
(The old hiveserver was removed in Hive 1.x; use hive --service hiveserver2 instead.)
2. ALTER TABLE fails. The original table structure:
hive> desc t_hive;
OK
a int
b int
c int
Time taken: 3.201 seconds, Fetched: 3 row(s)
Trying to add a new column fails:
hive> ALTER TABLE t_hive ADD COLUMNS (new_col String);
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table.
For direct MetaStore DB connections, we don't support retries at the client level.
Contents of hive.log:
2017-05-19 14:53:41,036 ERROR [main]: ql.Driver (SessionState.java:printError(960)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. For direct MetaStore DB connections, we don't support retries at the client level.
2017-05-19 14:53:41,036 INFO [main]: metadata.Hive (Hive.java:logDumpPhase(3294)) - Dumping metastore api call timing information for : execution phase
2017-05-19 14:53:41,037 INFO [main]: metadata.Hive (Hive.java:dumpAndClearMetaCallTiming(3285)) - Total time spent in this metastore function was greater than 1000ms : getTable_(String, String, )=2108
2017-05-19 14:53:41,037 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) -
2017-05-19 14:53:41,037 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) -
2017-05-19 14:53:41,037 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) -
2017-05-19 14:53:41,037 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) -
2017-05-19 14:53:41,037 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) -
Fix: replace the MySQL driver JAR in Hive's lib directory with a different version; 5.1.41 worked for me.