Tools and software
1. Ubuntu 12.04
2. Hive 2.1.1
Prerequisite: Hadoop is already installed and the $HADOOP_HOME environment variable is configured.
Extract the downloaded Hive archive; no files need to be modified yet.
Enter the Hive command line:
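As a concrete sketch of the extraction step (the archive file name and the /finder/hive install path are assumptions, inferred from the shell prompt shown below; adjust them to your environment):

```shell
# Extract the Hive tarball and move it to the install location.
tar -xzf apache-hive-2.1.1-bin.tar.gz
mv apache-hive-2.1.1-bin /finder/hive

# Append to ~/.bashrc so the hive command is on the PATH
# (HADOOP_HOME is assumed to be configured already):
export HIVE_HOME=/finder/hive
export PATH=$PATH:$HIVE_HOME/bin
```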
root@finder03:/finder/hive/bin# ./hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/finder/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/finder/hadoop/hadoop-2.6.5/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/finder/hive/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
hive>
Test:
hive> show databases;
OK
default
Time taken: 0.607 seconds, Fetched: 1 row(s)
By default, Hive stores its metadata in an embedded Derby database.
We will now switch to MySQL for metadata storage; Derby and MySQL are the most commonly used options here.
1. Edit the configuration file hive-site.xml
Go into the conf directory:
mv hive-default.xml.template hive-site.xml
Then change the configuration to the following properties:
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.1.101:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password$1</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <name>datanucleus.metadata.validate</name>
    <value>false</value>
  </property>
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>true</value>
  </property>
</configuration>
2. Copy the mysql-connector-5.1.8.jar JDBC driver into Hive's lib directory
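A sketch of the copy step (the source path is an assumption; use wherever you saved the connector download):

```shell
# Copy the MySQL JDBC driver into Hive's classpath so the metastore
# can reach MySQL. Source path is an assumption.
cp /root/downloads/mysql-connector-5.1.8.jar /finder/hive/lib/
```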
3. Initialize the metastore schema for MySQL
root@finder03:/finder/hive/bin# ./schematool -initSchema -dbType mysql
4. Grant remote access in MySQL (in the statement below, *.* means every table in every database, and % means connections from any IP address or host are allowed):
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '<password>' WITH GRANT OPTION;
FLUSH PRIVILEGES;
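These statements are run in a MySQL client on the metastore host (192.168.1.101 in the configuration above). A sketch from the shell, with the password filled in interactively:

```shell
# Run the grant on the MySQL server; -p prompts for the root password.
# The \$ keeps the shell from expanding the password literal.
mysql -u root -p -e "GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'password\$1' WITH GRANT OPTION; FLUSH PRIVILEGES;"
```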
5. Start Hive
Note: if an old metastore_db directory (left over from Derby) exists in the current directory, delete it first.
# bin/hive
You can then inspect the metadata tables that were created automatically:
hive> show databases;
OK
default
Time taken: 0.607 seconds, Fetched: 1 row(s)
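After schematool has run, the auto-created metastore tables can also be inspected directly in MySQL (a sketch; the hive database name comes from the JDBC URL configured above):

```shell
# List the metastore tables that schematool created in the hive database.
mysql -u root -p -e "USE hive; SHOW TABLES;"
```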
Demo:
Create a student table and a person table:
hive> create table student(id bigint,nam varchar(255) row format delimited fields terminated by '\t';
FAILED: ParseException line 1:48 missing ) at 'row' near '<EOF>'
hive> create table student(id bigint,nam varchar(255)) row format delimited fields terminated by '\t';
OK
Time taken: 0.923 seconds
hive> show tables;
OK
student
Time taken: 0.1 seconds, Fetched: 1 row(s)
hive> create table person(id int,name string);
OK
Time taken: 0.447 seconds
hive> show tables;
OK
person
student
Time taken: 0.07 seconds, Fetched: 2 row(s)
Check the HDFS admin page in the web UI:
a corresponding directory is created under /user/hive/warehouse.
The Hive metadata tables in MySQL are updated as well:
the TBLS table stores each table's name, type, and related information;
COLUMNS_V2 records the column definitions;
SDS records the table's corresponding HDFS storage location.
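To see how these tables fit together, a hedged example query (table and column names per the Hive 2.x metastore schema) that lists each Hive table with its type and HDFS location:

```shell
# TBLS (tables) joins to SDS (storage descriptors) via the SD_ID key.
mysql -u root -p -e "
USE hive;
SELECT t.TBL_NAME, t.TBL_TYPE, s.LOCATION
FROM TBLS t JOIN SDS s ON t.SD_ID = s.SD_ID;"
```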
hive> load data local inpath '/finder/student.txt' into table student;
Loading data to table default.student
OK
Time taken: 1.595 seconds
hive> select * from student;
OK
1 的金佛
2 hellen
3 marry
NULL NULL
Time taken: 1.742 seconds, Fetched: 4 row(s)
hive> select * from student limit 2;
OK
1 的金佛
2 hellen
Time taken: 0.288 seconds, Fetched: 2 row(s)
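The NULL NULL row appears because one line in student.txt does not match the declared '\t' field delimiter (for example a trailing empty line, or fields separated by spaces); Hive returns NULL for any field it cannot parse. The rule can be sketched with awk standing in for Hive's delimited-row deserializer (the sample lines are made up):

```shell
# Split each input line on a tab, as "fields terminated by '\t'" does.
# A line without a tab yields no second field, which Hive renders as NULL.
printf '1\tjohn\n2 mary\n' | awk -F'\t' '{
  if (NF < 2) print "NULL\tNULL"; else print $0
}'
# prints:
# 1	john
# NULL	NULL
```

Note that LOAD DATA LOCAL INPATH copies a file from the local filesystem; without LOCAL, the path is read from HDFS instead.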
Creating an external table
MANAGED_TABLE: managed (internal) table
EXTERNAL_TABLE: external table
An external table can be created to point at an existing HDFS directory:
hive> dfs -ls /
> ;
Found 9 items
-rw-r--r-- 3 root supergroup 7590 2017-03-01 12:18 /cert.zip
drwxr-xr-x - root supergroup 0 2017-03-02 12:00 /finder
drwxr-xr-x - root supergroup 0 2017-03-08 14:56 /hbase
-rw-r--r-- 3 root supergroup 142249690 2017-03-01 11:41 /jdk7
drwxr-xr-x - root supergroup 0 2017-03-02 10:49 /sqoop
drwx------ - root supergroup 0 2017-03-03 17:25 /tmp
drwxr-xr-x - root supergroup 0 2017-03-03 17:26 /user
drwxr-xr-x - root supergroup 0 2017-03-01 12:35 /wcin
drwxr-xr-x - root supergroup 0 2017-03-01 12:44 /wcout
hive> dfs -mkdir /mydata;
hive> dfs -ls /
> ;
Found 10 items
-rw-r--r-- 3 root supergroup 7590 2017-03-01 12:18 /cert.zip
drwxr-xr-x - root supergroup 0 2017-03-02 12:00 /finder
drwxr-xr-x - root supergroup 0 2017-03-08 14:56 /hbase
-rw-r--r-- 3 root supergroup 142249690 2017-03-01 11:41 /jdk7
drwxr-xr-x - root supergroup 0 2017-03-08 15:53 /mydata
drwxr-xr-x - root supergroup 0 2017-03-02 10:49 /sqoop
drwx------ - root supergroup 0 2017-03-03 17:25 /tmp
drwxr-xr-x - root supergroup 0 2017-03-03 17:26 /user
drwxr-xr-x - root supergroup 0 2017-03-01 12:35 /wcin
drwxr-xr-x - root supergroup 0 2017-03-01 12:44 /wcout
hive> dfs -put /finder/student.txt /mydata/s1.txt
> ;
hive> dfs -ls /mydata
> ;
Found 1 items
-rw-r--r-- 3 root supergroup 44 2017-03-08 15:54 /mydata/s1.txt
hive> dfs -put /finder/student.txt /mydata/s2.txt
> ;
hive> dfs -ls /mydata;
Found 2 items
-rw-r--r-- 3 root supergroup 44 2017-03-08 15:54 /mydata/s1.txt
-rw-r--r-- 3 root supergroup 44 2017-03-08 15:54 /mydata/s2.txt
hive> create external student_ext (id int,name varchar(255)) row format delimited fields terminated by '\t' location '/mydata';
FAILED: ParseException line 1:16 missing TABLE at 'student_ext' near '<EOF>'
hive> create external table student_ext (id int,name varchar(255)) row format delimited fields terminated by '\t' location '/mydata';
OK
Time taken: 1.138 seconds
hive> select * from student_ext;    -- the table points at an HDFS directory; every data file under that directory becomes part of the table's data
OK
1 的金佛
2 hellen
3 marry
NULL NULL
1 的金佛
2 hellen
3 marry
NULL NULL
Time taken: 1.324 seconds, Fetched: 8 row(s)
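The practical difference between the two table types shows up on DROP TABLE: dropping the managed student table deletes its data under /user/hive/warehouse, while dropping student_ext only removes the metastore entry and leaves /mydata untouched. A sketch (requires the running cluster from this walkthrough):

```shell
# Dropping an EXTERNAL_TABLE removes only the metastore entry.
hive -e "DROP TABLE student_ext;"
hadoop fs -ls /mydata        # s1.txt and s2.txt are still present

# Dropping a MANAGED_TABLE also deletes its warehouse directory in HDFS.
hive -e "DROP TABLE student;"
```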