I installed Hive under the hadoop user, in order to use it together with Spark. Before installing Hive, MySQL must already be set up.
For a MySQL installation guide, see: http://blog.csdn.net/yl20175514/article/details/79009468
Environment: CentOS 7.
1. Download the Hive package for Linux from: https://hive.apache.org/downloads.html
The version I downloaded is apache-hive-2.3.2-bin.tar.gz.
Extract the package with:
tar -zxf apache-hive-2.3.2-bin.tar.gz
After extraction, set up the environment variables as follows:
[hadoop@hadoop ~]$ vi .bashrc
# Append the following lines to .bashrc:
export HIVE_HOME=/home/hadoop/opt/apache-hive-2.3.2-bin
export PATH=$PATH:$HIVE_HOME/bin
[hadoop@hadoop ~]$ source .bashrc
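As a quick sanity check, you can confirm the Hive bin directory really ended up on the PATH. A minimal sketch (the paths are the ones used in this guide; adjust them to your install directory):

```shell
# Sketch: re-create the two exports from .bashrc and confirm PATH now
# contains Hive's bin directory (paths assume this guide's layout).
export HIVE_HOME=/home/hadoop/opt/apache-hive-2.3.2-bin
export PATH=$PATH:$HIVE_HOME/bin
case ":$PATH:" in
  *":$HIVE_HOME/bin:"*) result="HIVE_HOME on PATH" ;;
  *)                    result="HIVE_HOME missing from PATH" ;;
esac
echo "$result"
```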
With that done, go into the extracted Hive directory and then into its lib directory:
[hadoop@hadoop ~]$ cd opt/apache-hive-2.3.2-bin/
[hadoop@hadoop apache-hive-2.3.2-bin]$ ll
total 60
drwxrwxr-x. 3 hadoop hadoop  4096 Jan  5 23:59 bin
drwxrwxr-x. 2 hadoop hadoop  4096 Jan  5 23:59 binary-package-licenses
drwxrwxr-x. 2 hadoop hadoop  4096 Jan  6 00:32 conf
drwxrwxr-x. 4 hadoop hadoop    32 Jan  5 23:59 examples
drwxrwxr-x. 7 hadoop hadoop    63 Jan  5 23:59 hcatalog
drwxrwxr-x. 2 hadoop hadoop    43 Jan  5 23:59 jdbc
drwxrwxr-x. 4 hadoop hadoop 12288 Jan  6 00:06 lib
-rw-r--r--. 1 hadoop hadoop 20798 Nov 10 00:26 LICENSE
-rw-r--r--. 1 hadoop hadoop   230 Nov 10 00:26 NOTICE
-rw-r--r--. 1 hadoop hadoop  1979 Nov 10 00:58 RELEASE_NOTES.txt
drwxrwxr-x. 4 hadoop hadoop    33 Jan  5 23:59 scripts
[hadoop@hadoop apache-hive-2.3.2-bin]$ cd lib/
[hadoop@hadoop lib]$
In lib, download the MySQL JDBC driver (mine is mysql-connector-java-5.1.38.jar):
[hadoop@hadoop lib]$ wget http://cdn.mysql.com/Downloads/Connector-J/mysql-connector-java-5.1.38.tar.gz
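Note that the wget above fetches a .tar.gz archive, not the bare jar, so the driver jar inside it still has to be extracted into lib/. A self-contained sketch of that step, using a throwaway stand-in archive so it runs anywhere (in the real archive the jar may carry a -bin suffix; swap in the actual download):

```shell
# Sketch: extract the connector jar out of the downloaded tarball into lib/.
# A stand-in tarball is built here so the snippet is runnable offline.
work=$(mktemp -d)
cd "$work"
mkdir -p mysql-connector-java-5.1.38 lib
touch mysql-connector-java-5.1.38/mysql-connector-java-5.1.38.jar  # stand-in jar
tar -czf mysql-connector-java-5.1.38.tar.gz mysql-connector-java-5.1.38
rm -r mysql-connector-java-5.1.38

tar -xzf mysql-connector-java-5.1.38.tar.gz   # unpack the archive
cp mysql-connector-java-5.1.38/*.jar lib/     # keep only the jar in lib/
ls lib
```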
Edit the configuration file
Go into Hive's configuration directory (conf); the bundled hive-default.xml.template documents the available properties and how they are used:
vi hive-site.xml
Create hive-site.xml and add the following properties:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive_metastore?autoReconnect=true&amp;useUnicode=true&amp;createDatabaseIfNotExist=true&amp;characterEncoding=utf8</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
</property>
</configuration>
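One detail worth stressing: hive-site.xml is XML, so every literal "&" in the JDBC URL must be written as "&amp;" or the file will fail to parse. A minimal sketch of the escaping with sed:

```shell
# Sketch: escape the raw "&" separators in the JDBC URL for use inside XML.
url='jdbc:mysql://localhost:3306/hive_metastore?autoReconnect=true&useUnicode=true&createDatabaseIfNotExist=true&characterEncoding=utf8'
escaped=$(printf '%s' "$url" | sed 's/&/\&amp;/g')
echo "$escaped"
```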
Press Esc, then type :wq to save and quit. Next, go into the extracted Spark directory:
[hadoop@hadoop ~]$ cd opt/spark-2.2.1-bin-hadoop2.7/
[hadoop@hadoop spark-2.2.1-bin-hadoop2.7]$ ll
total 88
drwxrwxr-x. 2 hadoop hadoop  4096 Nov 25 07:31 bin
drwxrwxr-x. 2 hadoop hadoop  4096 Jan  6 00:11 conf
drwxrwxr-x. 5 hadoop hadoop    47 Nov 25 07:31 data
drwxrwxr-x. 4 hadoop hadoop    27 Nov 25 07:31 examples
drwxrwxr-x. 2 hadoop hadoop  8192 Jan  6 00:10 jars
-rw-rw-r--. 1 hadoop hadoop 17881 Nov 25 07:31 LICENSE
drwxrwxr-x. 2 hadoop hadoop  4096 Nov 25 07:31 licenses
-rw-rw-r--. 1 hadoop hadoop 24645 Nov 25 07:31 NOTICE
drwxrwxr-x. 8 hadoop hadoop  4096 Nov 25 07:31 python
drwxrwxr-x. 3 hadoop hadoop    16 Nov 25 07:31 R
-rw-rw-r--. 1 hadoop hadoop  3809 Nov 25 07:31 README.md
-rw-rw-r--. 1 hadoop hadoop   128 Nov 25 07:31 RELEASE
drwxrwxr-x. 2 hadoop hadoop  4096 Nov 25 07:31 sbin
drwxrwxr-x. 2 hadoop hadoop    41 Nov 25 07:31 yarn
[hadoop@hadoop spark-2.2.1-bin-hadoop2.7]$ cd jars
[hadoop@hadoop jars]$
Inside jars, create a symbolic link to mysql-connector-java-5.1.38.jar:
[hadoop@hadoop jars]$ ln -s /home/hadoop/opt/apache-hive-2.3.2-bin/lib/mysql-connector-java-5.1.38.jar mysql-connector-java-5.1.38.jar
Then go into conf and create a symbolic link to hive-site.xml:
[hadoop@hadoop jars]$ cd ../
[hadoop@hadoop spark-2.2.1-bin-hadoop2.7]$ cd conf/
[hadoop@hadoop conf]$ ln -s /home/hadoop/opt/apache-hive-2.3.2-bin/conf/hive-site.xml hive-site.xml
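Both links work the same way: Spark resolves the link and reads the single Hive-owned copy, so the two installations can never drift apart. A minimal sketch of the mechanism using throwaway paths (the real commands above use the actual Hive and Spark directories):

```shell
# Sketch: demonstrate the symlink on throwaway directories standing in for
# the Hive and Spark trees.
demo=$(mktemp -d)
mkdir -p "$demo/hive/conf" "$demo/spark/conf"
echo '<configuration/>' > "$demo/hive/conf/hive-site.xml"
ln -s "$demo/hive/conf/hive-site.xml" "$demo/spark/conf/hive-site.xml"
target=$(readlink "$demo/spark/conf/hive-site.xml")  # resolves to Hive's copy
echo "$target"
```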
Next, start the Hadoop cluster and YARN with start-dfs.sh and start-yarn.sh, then go into the bin directory of the Hive installation and initialize the metastore schema in MySQL:
[hadoop@hadoop ~]$ cd opt/apache-hive-2.3.2-bin/
[hadoop@hadoop apache-hive-2.3.2-bin]$ cd bin
[hadoop@hadoop bin]$ ls
beeline ext hive hive-config.sh hiveserver2 hplsql metatool schematool
[hadoop@hadoop bin]$ ./schematool -dbType mysql -initSchema
Once that succeeds, open MySQL and check whether the database was created:
[hadoop@hadoop bin]$
[hadoop@hadoop bin]$ mysql -uroot -proot
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.20 MySQL Community Server (GPL)
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show databases
-> ;
+--------------------+
| Database |
+--------------------+
| information_schema |
| hive_metastore |
| mysql |
| performance_schema |
| sys |
+--------------------+
5 rows in set (0.07 sec)
mysql>
The hive_metastore database is the one we just created.
Next we can grant the hadoop user proxy (impersonation) permissions. Go into etc/hadoop under the Hadoop installation and edit core-site.xml:
[hadoop@hadoop ~]$ cd opt/hadoop-2.9.0/etc/hadoop/
[hadoop@hadoop hadoop]$ ls
capacity-scheduler.xml hadoop-env.cmd hadoop-policy.xml httpfs-signature.secret kms-log4j.properties mapred-env.sh slaves yarn-env.sh
configuration.xsl hadoop-env.sh hdfs-site.xml httpfs-site.xml kms-site.xml mapred-queues.xml.template ssl-client.xml.example yarn-site.xml
container-executor.cfg hadoop-metrics2.properties httpfs-env.sh kms-acls.xml log4j.properties mapred-site.xml ssl-server.xml.example
core-site.xml hadoop-metrics.properties httpfs-log4j.properties kms-env.sh mapred-env.cmd mapred-site.xml.template yarn-env.cmd
[hadoop@hadoop hadoop]$ vi core-site.xml
Edit core-site.xml with vi and add the following properties inside the <configuration> element:
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
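A quick way to sanity-check the edit is to grep for the two proxyuser property names. The sketch below runs against a throwaway copy of the file so it works anywhere; point CONF at your real core-site.xml instead:

```shell
# Sketch: verify core-site.xml contains both proxyuser properties.
# A minimal stand-in file is created here so the snippet is self-contained.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
<configuration>
  <property><name>hadoop.proxyuser.hadoop.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.hadoop.groups</name><value>*</value></property>
</configuration>
EOF
hits=$(grep -c 'hadoop.proxyuser.hadoop' "$CONF")
echo "$hits"   # both the hosts and groups entries should be present
```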
Then restart the Hadoop cluster, and start hiveserver2:
[hadoop@hadoop bin]$ ./hiveserver2
Then start beeline from the bin directory of the Hive installation:
[hadoop@hadoop ~]$
[hadoop@hadoop ~]$ cd opt/apache-hive-2.3.2-bin/bin
[hadoop@hadoop bin]$ ls
beeline ext hive hive-config.sh hiveserver2 hplsql metatool schematool
[hadoop@hadoop bin]$ ./beeline -u jdbc:hive2://localhost:10000 -nhadoop
Once connected, we can create a new database from beeline:
Last login: Wed Jan 10 23:46:11 2018 from 192.168.233.1
[hadoop@hadoop ~]$ cd opt/apache-hive-2.3.2-bin/bin
[hadoop@hadoop bin]$ ls
beeline ext hive hive-config.sh hiveserver2 hplsql metatool schematool
[hadoop@hadoop bin]$ ./beeline -u jdbc:hive2://localhost:10000 -nhadoop
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/opt/apache-hive-2.3.2-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/opt/hadoop-2.9.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 2.3.2)
Driver: Hive JDBC (version 2.3.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.2 by Apache Hive
0: jdbc:hive2://localhost:10000>show databases;
+----------------+
| database_name |
+----------------+
| default |
| |
+----------------+
2 rows selected (1.323 seconds)
0: jdbc:hive2://localhost:10000> create database mydb;
No rows affected (0.5 seconds)
0: jdbc:hive2://localhost:10000> show databases;
+----------------+
| database_name |
+----------------+
| default |
| |
| mydb |
+----------------+
3 rows selected (0.154 seconds)
0: jdbc:hive2://localhost:10000>
Beeline then automatically creates /user/hive/warehouse/mydb.db on the Hadoop cluster.
Open http://hadoop:50070 in a browser (replace hadoop with your host's IP) and inspect the data under Utilities / Browse the file system.
At this point Hive is installed and starts correctly, but running SQL queries from beeline fails: many jars cannot be found and versions do not match.
That is because Hive on Spark requires downloading the Hive source and building it by hand with Maven before it will work.
So we take the opposite approach and use Spark with Hive support instead.
Go into Spark's sbin directory, find start-thriftserver.sh, and run ./start-thriftserver.sh:
[hadoop@hadoop ~]$
[hadoop@hadoop ~]$ cd opt/spark-2.2.1-bin-hadoop2.7/
[hadoop@hadoop spark-2.2.1-bin-hadoop2.7]$ ls
bin conf data examples jars LICENSE licenses NOTICE python R README.md RELEASE sbin yarn
[hadoop@hadoop spark-2.2.1-bin-hadoop2.7]$ cd sbin/
[hadoop@hadoop sbin]$ ls
slaves.sh spark-daemons.sh start-master.sh start-shuffle-service.sh start-thriftserver.sh stop-master.sh stop-shuffle-service.sh stop-thriftserver.sh
spark-config.sh start-all.sh start-mesos-dispatcher.sh start-slave.sh stop-all.sh stop-mesos-dispatcher.sh stop-slave.sh
spark-daemon.sh start-history-server.sh start-mesos-shuffle-service.sh start-slaves.sh stop-history-server.sh stop-mesos-shuffle-service.sh stop-slaves.sh
[hadoop@hadoop sbin]$ ./start-thriftserver.sh
starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to /home/hadoop/opt/spark-2.2.1-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-hadoop.out
[hadoop@hadoop sbin]$
After it starts, cd ../ back to the parent directory and into bin, where the beeline script lives.
Start it with ./beeline -u jdbc:hive2://localhost:10000 -nhadoop:
[hadoop@hadoop sbin]$ cd ../
[hadoop@hadoop spark-2.2.1-bin-hadoop2.7]$ ls
bin conf data examples jars LICENSE licenses logs NOTICE python R README.md RELEASE sbin yarn
[hadoop@hadoop spark-2.2.1-bin-hadoop2.7]$ cd bin/
[hadoop@hadoop bin]$ ls
beeline find-spark-home load-spark-env.cmd pyspark pyspark.cmd run-example.cmd spark-class2.cmd sparkR sparkR.cmd spark-shell2.cmd spark-sql spark-submit2.cmd
beeline.cmd find-spark-home.cmd load-spark-env.sh pyspark2.cmd run-example spark-class spark-class.cmd sparkR2.cmd spark-shell spark-shell.cmd spark-submit spark-submit.cmd
[hadoop@hadoop bin]$ ./beeline -u jdbc:hive2://localhost:10000 -nhadoop
Connecting to jdbc:hive2://localhost:10000
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Connected to: Spark SQL (version 2.2.1)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://localhost:10000>
And with that, Spark with Hive support is up and running.