为什么使用Hive
1.直接使用hadoop所面临的问题
人员学习成本太高
项目周期要求太短
MapReduce实现复杂查询逻辑开发难度太大
2.操作接口采用类SQL语法,提供快速开发的能力。
避免了去写MapReduce,减少开发人员的学习成本。
扩展功能很方便。
Hive安装
1.上传tar包
2.解压
tar -zxvf hive-1.2.1.tar.gz -C /usr/local/
3.安装mysql数据库(切换到root用户)(装在哪里没有限制,只有能联通hadoop集群的节点)
4.配置hive
(1)配置HIVE_HOME环境变量 vi conf/hive-env.sh 配置其中的$hadoop_home
(2)配置元数据库信息 vi hive-site.xml
<configuration>
<!-- mysql 连接 -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<!-- 用户名 -->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<!-- 密码 -->
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
<description>password to use against metastore database</description>
</property>
</configuration>
5.安装hive和mysq完成后,将mysql的连接jar包拷贝到$HIVE_HOME/lib目录下
如果出现没有权限的问题,在mysql授权(在安装mysql的机器上执行)
mysql -uroot -p
执行下面的语句 *.*
:所有库下的所有表 %:任何IP地址或主机都可以连接
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'root' WITH GRANT OPTION;
FLUSH PRIVILEGES;
6.启动hive
bin/hive
使用thrift server启动方式
hiveserver2
#启动另一个终端
beeline
!connect jdbc:hive2://hadoop1:10000 #hadoop1为安装mysql节点ip地址
7.建表(默认是内部表)
create table trade_detail(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t';
#建分区表
create table td_part(id bigint, account string, income double, expenses double, time string) partitioned by (logdate string) row format delimited fields terminated by '\t';
#建外部表
create external table td_ext(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t' location '/td_ext';
7.创建分区表
普通表和分区表区别:有大量数据增加的需要建分区表
create table book (id bigint, name string) partitioned by (pubdate string) row format delimited fields terminated by '\t';
#分区表加载数据
load data local inpath './book.txt' overwrite into table book partition (pubdate='2010-08-22');
load data local inpath '/root/data.am' into table beauty partition (nation="USA");
select nation, avg(size) from beauties group by nation order by avg(size);