At work we actually run version 0.13.0. I noticed 2.1.1 on the official site and wanted to try it first; I haven't dug into what exactly differs between the two versions yet.
First, the ways Hive can be installed:
1. Embedded mode: metadata is stored in a Derby database, which is also the default. Drawback: only one Hive client can connect at a time.
2. Local mode: the metastore runs inside the Hive process and stores metadata in a local MySQL database.
3. Remote mode: the metastore runs as a standalone service, with MySQL and Hive decoupled from each other.
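Concretely, the local and remote modes differ mainly in which keys end up in a client's hive-site.xml; roughly (the values here are only illustrative):

```xml
<!-- local mode: each client connects to MySQL itself -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
</property>

<!-- remote mode: clients only know the metastore service address -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
</property>
```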
In production the third mode is the usual choice: install Hive in remote mode and start the metastore service, which exposes a Thrift endpoint at thrift://localhost:9083.
Each team then gets a Hive client configured with nothing more than that metastore address.
You also need HADOOP_HOME configured. I usually put everything straight into ~/.bashrc; below is the full set of variables on my machine, so adjust the paths to your own install directories.
vi ~/.bashrc
ELASTICSEARCH_HOME=/home/qun/soft/elasticsearch-2.3.4
OPENRESTY=/home/qun/nginx/openresty
FLUME_HOME=/home/qun/apache-flume-1.6.0-bin
export FLUME_CONF_DIR=/home/qun/apache-flume-1.6.0-bin/conf
JAVA_HOME=/home/qun/soft/jdk1.8.0_91
SOLR_HOME=/home/qun/solr/solr-6.0.0
SCALA_HOME=/home/qun/scala-2.11.8
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
SPARK_HOME=/home/qun/spark
#HADOOP_HOME=/home/qun/hadoop-2.6.0
HADOOP_HOME=/home/qun/soft/hadoop-2.8.0
MAVEN_HOME=/home/qun/apache-maven-3.3.9
STORM_HOME=/home/qun/apache-storm-0.9.3
HIVE_HOME=/home/qun/soft/apache-hive-2.1.1-bin
KAFKA_HOME=/home/qun/kafka_2.11-0.10.0.0
ZOOKEEPER_HOME=/home/qun/zookeeper-3.4.6
export JAVA_HOME HADOOP_HOME HIVE_HOME PATH CLASSPATH
export PATH=$HIVE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH:$SQOOP_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PRESTO_HOME/bin:$HBASE_HOME/bin:$SCALA_HOME/bin:$STORM_HOME/bin:$MAVEN_HOME/bin:$KAFKA_HOME/bin:$ZOOKEEPER_HOME/bin:$SOLR_HOME/bin:$FLUME_HOME/bin:$OPENRESTY/nginx/sbin:$ELASTICSEARCH_HOME/bin
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib
After editing, remember to reload the file: source ~/.bashrc
First pick one machine, master, to install Hive on and start the metastore (there is only ever one metastore service, but any number of clients can connect to it).
Download the Hive release:
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.1.1/apache-hive-2.1.1-bin.tar.gz
tar -zxvf apache-hive-2.1.1-bin.tar.gz
cd apache-hive-2.1.1-bin/conf
mv hive-default.xml.template hive-site.xml
vi hive-site.xml
<property>
<name>system:java.io.tmpdir</name>
<value>/home/qun/soft/apache-hive-2.1.1-bin/tmpdir</value>
<description/>
</property>
<property>
<name>system:user.name</name>
<value>hive</value>
<description/>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>presto</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/hive3?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
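The first two properties above are needed because the hive-site.xml generated from the template is full of ${system:java.io.tmpdir} and ${system:user.name} placeholders, which Hive 2.x otherwise fails to resolve at startup (a URISyntaxException about a relative path in an absolute URI). An equivalent workaround is to substitute the placeholders in place; a sketch, wrapped in a function here so the target file is explicit (the paths are the ones used in this post, so change them to match your setup):

```shell
# Replace Hive's ${system:...} placeholders with concrete values.
# $1: path to the hive-site.xml to rewrite in place
fix_placeholders() {
  sed -i \
    -e 's#\${system:java.io.tmpdir}#/home/qun/soft/apache-hive-2.1.1-bin/tmpdir#g' \
    -e 's#\${system:user.name}#hive#g' \
    "$1"
}
# usage: fix_placeholders hive-site.xml
```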
Copy the MySQL JDBC driver jar into $HIVE_HOME/lib:
cp mysql-connector-java-5.1.32.jar /home/qun/soft/apache-hive-2.1.1-bin/lib/
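The JDBC settings above assume the MySQL server on master already has a presto/123456 account with the rights to create the hive3 database. If it doesn't exist yet, something along these lines works (hypothetical; run as a MySQL admin and match it to your own setup):

```sql
-- the account/password must match javax.jdo.option.ConnectionUserName
-- and ConnectionPassword in hive-site.xml
CREATE USER 'presto'@'%' IDENTIFIED BY '123456';
GRANT ALL PRIVILEGES ON hive3.* TO 'presto'@'%';
FLUSH PRIVILEGES;
```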
Initialize the metastore schema:
bin/schematool -dbType mysql -initSchema
Smoke-test Hive (here the CLI reads metadata directly from MySQL):
bin/hive
hive> show databases;
OK
default
Time taken: 1.399 seconds, Fetched: 1 row(s)
hive> create database test;
OK
Time taken: 0.429 seconds
hive> use test;
OK
Time taken: 0.037 seconds
hive> create table t(line string);
OK
Time taken: 0.48 seconds
[qun@master tmpdir]$ hadoop dfs -ls /user/hive/warehouse/test.db
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 1 items
drwxr-xr-x - qun supergroup 0 2017-06-17 21:36 /user/hive/warehouse/test.db/t
hive> load data local inpath '/home/qun/soft/apache-hive-2.1.1-bin/conf/hive-site.xml' into table t;
Loading data to table test.t
OK
Time taken: 0.628 seconds
Start the metastore, which listens on port 9083 by default:
./hive --service metastore &
[qun@master bin]$ netstat -anpl|grep 9083
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:9083 0.0.0.0:* LISTEN 8540/java
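Clients fail with opaque Thrift connection errors if they start before the metastore is listening, so in scripts it can help to gate on port 9083. A small bash helper along these lines (hypothetical; relies on bash's /dev/tcp feature, so it won't work in plain sh):

```shell
# Poll until a TCP port accepts connections, or give up after N tries.
# /dev/tcp/<host>/<port> is a bash construct, not a real device file.
wait_for_port() {
  local host=$1 port=$2 tries=${3:-30}
  local i
  for ((i = 0; i < tries; i++)); do
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# usage, after `hive --service metastore &` on master:
#   wait_for_port master 9083 && echo "metastore is up"
```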
Configuring a client
Pick another machine, slave2, as the Hive client: copy the earlier apache-hive-2.1.1-bin.tar.gz over to slave2, extract it, and set hive.metastore.uris
so the client fetches its metadata through thrift://master:9083:
vi hive-site.xml
<property>
<name>hive.metastore.uris</name>
<value>thrift://master:9083</value>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
<property>
<name>system:java.io.tmpdir</name>
<value>/home/qun/soft/apache-hive-2.1.1-bin/tmpdir</value>
<description/>
</property>
<property>
<name>system:user.name</name>
<value>hive</value>
<description/>
</property>
Test the client (it obtains metadata via the metastore service started on master):
hive> show tables;
OK
Time taken: 1.351 seconds
hive> use test;
OK
Time taken: 0.055 seconds
hive> show tables;
OK
t
Time taken: 0.048 seconds, Fetched: 1 row(s)
hive> select coutn(*) from t;
FAILED: SemanticException [Error 10011]: Invalid function coutn
hive> select count(*) from t;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = qun_20170617222212_d724486b-70a1-4937-b7eb-e72183721c3f
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1497708218417_0002, Tracking URL = http://master:8888/proxy/application_1497708218417_0002/
Kill Command = /home/qun/soft/hadoop-2.8.0/bin/hadoop job -kill job_1497708218417_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-06-17 22:22:33,632 Stage-1 map = 0%, reduce = 0%
2017-06-17 22:22:45,680 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.17 sec
2017-06-17 22:22:56,668 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.55 sec
MapReduce Total cumulative CPU time: 4 seconds 550 msec
Ended Job = job_1497708218417_0002
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.55 sec HDFS Read: 466382 HDFS Write: 105 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 550 msec
OK
10718
Time taken: 45.346 seconds, Fetched: 1 row(s)
Normally the metastore and MySQL each occupy a dedicated node, and every other client only needs hive.metastore.uris configured to run Hive SQL.
If you want to connect to Hive over JDBC you additionally need HiveServer or HiveServer2 running; I'll cover configuring and using HiveServer/HiveServer2 in a follow-up post.
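As a quick preview of that route (a sketch only, not verified against this exact setup): HiveServer2 listens on port 10000 by default, and beeline is the bundled JDBC client.

```shell
# the JDBC endpoint beeline (or any JDBC client) would use; assumes
# HiveServer2 was started on master with: hive --service hiveserver2 &
HS2_URL="jdbc:hive2://master:10000/default"
# from a client machine:
#   beeline -u "$HS2_URL" -n qun
```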