1. Install MySQL
yum install wget
wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
rpm -ivh mysql-community-release-el7-5.noarch.rpm
yum install mysql-server
Start MySQL
service mysqld start
Enable start on boot
systemctl enable mysqld.service
Set the root password (run as root)
/usr/bin/mysql_secure_installation
[...]
Enter current password for root (enter for none):
OK, successfully used password, moving on...
[...]
Set root password? [Y/n] y
New password:
Re-enter new password:
Remove anonymous users? [Y/n] Y
[...]
Disallow root login remotely? [Y/n] N
[...]
Remove test database and access to it [Y/n] Y
[...]
Reload privilege tables now? [Y/n] Y
All done!
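2. Install and configure Hive
Download Hive and mysql-connector-java-5.*.*-bin.jar from their official sites
Upload the Hive archive to the server, then extract it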
tar -zxvf apache-hive-2.3.6-bin.tar.gz -C ../app/
Set the Hive environment variables
vi ~/.bashrc
# Hive environment (lines starting with # are comments)
export HIVE_HOME=/home/hadoop/app/apache-hive-2.3.6-bin
export PATH=$HIVE_HOME/bin:$HIVE_HOME/conf:$PATH
Apply the environment variables
source ~/.bashrc
Edit the configuration file
cd $HIVE_HOME/conf
Create hive-site.xml in Hive's conf/ directory
vi hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>
Enforce metastore schema version consistency.
</description>
</property>
</configuration>
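The same file can also be generated instead of typed by hand. The following is a minimal sketch using only Python's standard library; the helper name build_hive_site is hypothetical, and the property values are simply the ones shown above (the user name and password must match whatever was set for MySQL). It only prints the XML; writing it to $HIVE_HOME/conf/hive-site.xml is left to the reader.

```python
import xml.etree.ElementTree as ET


def build_hive_site(properties):
    """Return a <configuration> element with one <property> per entry.

    Hypothetical helper for illustration; not part of Hive.
    """
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


# Values copied from the hive-site.xml above; adjust to your own setup.
settings = {
    "javax.jdo.option.ConnectionURL":
        "jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "root",
    "javax.jdo.option.ConnectionPassword": "root",
    "hive.metastore.schema.verification": "false",
}

config = build_hive_site(settings)
if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
    ET.indent(config)
xml_text = ET.tostring(config, encoding="unicode")
print(xml_text)
```

Keeping the settings in one dictionary makes it easier to reuse the same values when the file is later copied for Spark.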
Copy mysql-connector-java-5.1.15-bin.jar into $HIVE_HOME/lib (here /home/hadoop/app/apache-hive-2.3.6-bin/lib)
3. Initialize the metastore schema
[hadoop@sparkServer apache-hive-2.3.6-bin]$ bin/schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/app/apache-hive-2.3.6-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/app/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: root
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
4. Test
(1) Create a database
create database db_hive_test;
(2) Create a test table
use db_hive_test;
create table student(id int,name string) row format delimited fields terminated by '\t';
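(3) Back in the Linux shell, create student.txt and write the data into it (id and name separated by a single tab character, matching the table's '\t' delimiter)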
1001 zhangsan
1002 lisi
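Editors sometimes turn a typed tab into spaces, and the rows then load as NULL values. As a hedged alternative, the file can be written by a short script that sets the tab delimiter explicitly. This sketch writes to the system temporary directory (an assumption made so it runs anywhere); change out_path to /home/hadoop/student.txt to match the load statement used in this walkthrough. The helper name write_tsv is hypothetical.

```python
import csv
import os
import tempfile

# The two sample rows from the text above.
rows = [(1001, "zhangsan"), (1002, "lisi")]


def write_tsv(path, records):
    """Write records as tab-separated lines with Unix line endings."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerows(records)


# Assumed location for the sketch; the walkthrough uses /home/hadoop/student.txt.
out_path = os.path.join(tempfile.gettempdir(), "student.txt")
write_tsv(out_path, rows)
print("wrote", out_path)
```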
(4) Load the data in Hive
load data local inpath '/home/hadoop/student.txt' into table db_hive_test.student;
(5) Check the result
select * from db_hive_test.student;
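5. Connect Spark to the Hive metastore (MySQL)
1) Copy Hive's hive-site.xml into Spark's conf directory
2) Edit the hive-site.xml in Spark's conf directory
Add the following property (the copied file already has a <configuration> element, so put the <property> inside it rather than adding a second <configuration>):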
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>
</property>
</configuration>
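Because the copied file already contains a <configuration> element, the new property has to be merged into it. One way to do that without hand-editing is sketched below with Python's standard library. The helper set_property and the inline sample XML are hypothetical; in practice the input would be the hive-site.xml in Spark's conf directory.

```python
import xml.etree.ElementTree as ET


def set_property(root, name, value):
    """Add the property, or update it if it already exists (hypothetical helper)."""
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return prop
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop


# Stand-in for the copied hive-site.xml, shortened for the example.
existing = """<configuration>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
</configuration>"""

root = ET.fromstring(existing)
set_property(root, "hive.metastore.uris", "thrift://localhost:9083")
merged = ET.tostring(root, encoding="unicode")
print(merged)
```

Running the helper twice leaves a single hive.metastore.uris entry, so it is safe to reapply after copying the file again.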
3) In a separate terminal window, start the metastore service:
[root@head42 conf]$ hive --service metastore
4) Start pyspark:
[root@head42 conf]$ pyspark
5) Test:
>>> from pyspark.sql import HiveContext
>>> sqlContext = HiveContext(sc)
>>> read_hive_score = sqlContext.sql("select * from db_hive_test.student")
>>> read_hive_score.show()
+----+--------+
|  id|    name|
+----+--------+
|1001|zhangsan|
|1002|    lisi|
+----+--------+
That's it, everything works!
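If the query in step 5 cannot reach the metastore, a first thing to check is whether the service started in step 3 is accepting connections on the port from hive.metastore.uris. The sketch below only tests that a TCP connection can be opened; it does not speak the Thrift protocol, and the function name port_is_open is hypothetical.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Host and port taken from thrift://localhost:9083 in hive-site.xml.
if port_is_open("localhost", 9083):
    print("metastore port is reachable")
else:
    print("nothing is listening on localhost:9083; is 'hive --service metastore' running?")
```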
References: