Environment:
Linux: Hadoop
Database: MySQL
Required JAR: mysql-connector-java-5.1.34-bin.jar
Installing Hive
See the official Getting Started guide: https://cwiki.apache.org/confluence/display/Hive/GettingStarted
1. Extract the Hive tarball:
$ tar -zxf hive-0.13.1-bin.tar.gz -C ../modules/
2. Create the /tmp and Hive warehouse directories on HDFS:
$ bin/hdfs dfs -mkdir -p /user/hive/warehouse
$ bin/hdfs dfs -mkdir /tmp    # usually created already
Grant group write permission on both directories:
$ bin/hdfs dfs -chmod g+w /user/hive/warehouse
$ bin/hdfs dfs -chmod g+w /tmp
3. In ${HIVE_HOME}/conf/, generate the configuration files from the templates:
$ cp hive-env.sh.template hive-env.sh
$ cp hive-default.xml.template hive-site.xml
$ cp hive-log4j.properties.template hive-log4j.properties
4. In ${HIVE_HOME}/conf/, edit hive-env.sh:
JAVA_HOME=/opt/modules/jdk1.7.0_67
HADOOP_HOME=/opt/modules/hadoop-2.5.0
export HIVE_CONF_DIR=/opt/modules/apache-hive/conf
5. In ${HIVE_HOME}/conf/, edit hive-site.xml.
See the official Metastore Administration guide:
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin
The four elements of the JDBC connection: ConnectionURL, DriverName, UserName, Password.
<!-- around line 132 -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://[hostname]:3306/metastore?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<!-- around line 138 -->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<!-- around line 162 -->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<!-- around line 168 -->
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root123</value>
<description>password to use against metastore database</description>
</property>
=====================
6. Copy the JDBC driver JAR into ${HIVE_HOME}/lib:
$ cp mysql-connector-java-5.1.34-bin.jar ../modules/apache-hive/lib/
Basic Hive Operations
(1) Prerequisites for starting Hive
1. Check that the Hadoop daemons are running:
$ jps
26514 SecondaryNameNode
27934 ResourceManager
28033 NodeManager
26232 NameNode
28590 Jps
26329 DataNode
2. Start the Hive CLI:
The hive startup script lives in ${HIVE_HOME}/bin.
$ bin/hive
After startup, check in MySQL that the metastore database was created automatically.
3. Creating Hive tables
-- create a database
create database db01;
-- create a table
create table student(
id int,
name string,
age int
)
row format delimited fields terminated by '\t';  -- specify the table's field delimiter
-- load data into the table
load data local inpath '/home/user01/student.txt' into table student;
-- query the table
select * from student;
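The load data local inpath statement above expects a local file whose rows match the student schema, with columns separated by tabs. A minimal sketch of preparing such a file (the filename and sample rows here are made up for illustration):

```shell
# Build a tab-delimited data file matching student(id int, name string, age int).
printf '1\ttom\t20\n2\tjack\t21\n3\tlucy\t19\n' > student.txt

# Sanity-check: every row should have exactly 3 tab-separated fields.
awk -F'\t' '{ print NF }' student.txt   # prints 3 for each of the 3 rows
```

If the file's delimiter does not match the table's declared delimiter, the extra columns typically load as NULL, which is a common first-run mistake.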
4. Edit ${HIVE_HOME}/conf/hive-site.xml to show the current database name in the prompt and column names in query results:
<!-- around line 55 -->
<property>
<name>hive.cli.print.header</name>
<value>true</value>
<description>Whether to print the names of the columns in query output.</description>
</property>
<!-- around line 61 -->
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
<description>Whether to include the current database in the Hive prompt.</description>
</property>
Hive DDL
Basic syntax for creating, dropping, and truncating tables:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...)      -- (Note: Available in Hive 0.10.0 and later)
  ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
  [STORED AS DIRECTORIES]]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
Hive Table Types
Type 1: managed (internal) tables (Table Type: MANAGED_TABLE)
create table if not exists dept(
deptno int,
deptname string,
address string
)
row format delimited fields terminated by '\t';
-- load an HDFS file into the Hive table
load data inpath '/input/dept.txt' into table dept;
-- 'row format delimited fields terminated by' specifies the column delimiter of the source file
Note: when load reads a file that is already on HDFS, the file is moved (cut) into the table directory, not copied.
desc formatted dept;  -- describe the table's structure and metadata
Location: hdfs://bigdata.ibeifeng.com:8020/user/hive/warehouse/db01.db/dept
Table Type: MANAGED_TABLE
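As noted above, load data inpath on an HDFS file moves the file rather than copying it. A local-filesystem analogy of that move semantics (directory names are illustrative; on a real cluster the rename happens inside HDFS):

```shell
# Simulate the warehouse directory Hive uses for the managed table dept.
mkdir -p warehouse/db01.db/dept
printf '10\tACCOUNTING\tNEW YORK\n' > dept.txt

# 'load data inpath' behaves like a move: the source file disappears.
mv dept.txt warehouse/db01.db/dept/

ls warehouse/db01.db/dept/            # dept.txt now lives under the table directory
ls dept.txt 2>/dev/null || echo "source file is gone"
```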
Type 2: external tables
create external table emp(
empno int,
empname string,
empjob string,
mgno int,
birthday string,
salary float,
bonus float,
depno int
)
row format delimited fields terminated by '\t'
location '/input/demo';
-- describe the table structure
desc formatted emp;
Location: hdfs://bigdata.ibeifeng.com:8020/input/demo
Table Type: EXTERNAL_TABLE
Drop an internal table:
drop table dept;
Drop an external table:
drop table emp;
Truncate (empty) a table:
truncate table student;
Differences between internal and external tables:
Creation:
An external table does not move data into the warehouse directory (/user/hive/warehouse); it only records the path where the table's data lives.
An internal table copies or moves the data into the table's own directory.
Deletion:
Dropping an external table removes only the table's metadata; the data itself is kept.
Dropping an internal table removes both the metadata and the data.
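The drop-behavior difference can be pictured with a local-filesystem analogy (the paths are illustrative; real Hive performs the deletes on HDFS and in the metastore):

```shell
# Managed table: data sits under the warehouse. External table: data stays at its own location.
mkdir -p managed_wh/dept ext_data/demo
echo 'sample' > managed_wh/dept/data.txt
echo 'sample' > ext_data/demo/data.txt

# 'drop table' on a managed table deletes the data directory as well.
rm -r managed_wh/dept

# 'drop table' on an external table removes only the metastore entry,
# so the data files survive untouched.
ls ext_data/demo/                     # data.txt is still there
```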
Type 3: partitioned tables
create table emp_part(
empno int,
empname string,
empjob string,
mgrno int,
birthday string,
salary float,
bonus float,
deptno int
)
partitioned by (province string)
row format delimited fields terminated by '\t';
-- load data into a partition
load data local inpath '/home/user01/emp.txt' into table emp_part partition (province='CHICAGO');
-- describe the table
desc formatted emp_part;
-- query the whole table
select * from emp_part;
-- query by the partition column
select * from emp_part where province='CHICAGO';
-- list the table's partitions
show partitions emp_part;
-- add a partition (emp_part is partitioned only by province)
alter table emp_part add if not exists partition (province='zhejiang');
-- drop a partition
alter table emp_part drop if exists partition (province='zhejiang');
A partitioned table must declare its partition columns at creation time. A partition column differs from a regular column in that it materializes as a subdirectory (named column=value) under the table's HDFS directory, which regular columns never do; in queries it can be used like any ordinary column. Partition columns are usually not part of the business data itself.
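The partition-as-subdirectory layout described above can be sketched locally (the warehouse path mirrors the managed-table location shown earlier; this is the shape Hive produces on HDFS after the CHICAGO load):

```shell
# After loading partition (province='CHICAGO'), Hive stores the data file under
# a 'province=CHICAGO' subdirectory of the table directory.
mkdir -p 'warehouse/db01.db/emp_part/province=CHICAGO'
touch 'warehouse/db01.db/emp_part/province=CHICAGO/emp.txt'

# The partition value is encoded in the path, never inside the data file.
find warehouse/db01.db/emp_part -type f
```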
(2) Interacting with other file systems
1. Linux shell commands (prefix with !):
!ls /opt/software/;
2. HDFS commands:
dfs -ls /input;
dfs -mkdir /hive;
3. show databases;  -- list databases (the -- syntax can be used to comment statements)