Hive – Complete Download and Configuration Tutorial
1. Rebuilding the Hadoop image:
This step is only for readers who need to modify the Hadoop configuration files but do not want to wipe the files already stored in HDFS.
Run the following commands from the hadoop folder of the project:
$ cd hadoop # assuming you are in the project root, cd into the hadoop directory
$ docker build -t netName/hadoop . # netName is the prefix you defined earlier (this tutorial reuses the network name as the image namespace); do not forget the trailing '.', which is the build context; the command fails without it.
2. Modifying the Hadoop configuration files:
Remove the Hadoop master and worker containers from Docker (skip this if you have no Hadoop cluster yet).
For building the cluster from scratch, see:
Building a simple Hadoop cluster with Docker in IDEA on Deepin (Part 1)
Building a simple Hadoop cluster with Docker in IDEA on Deepin (Part 2)
If you already have a cluster, edit core-site.xml as shown below; the two hadoop.proxyuser.root.* properties let hiveserver2 impersonate the root user, which beeline needs later (skip this step if your existing configuration already matches):
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop-master:9000/</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
</configuration>
Then recreate the nodes with:
$ docker-compose -f docker-compose.yml up -d
Once the nodes are up, enter the master node with:
$ docker exec -it hadoop-master bash
(replace hadoop-master with your own master node name), then run the following three commands to start the Hadoop daemons:
$ start-dfs.sh
$ start-yarn.sh
$ mr-jobhistory-daemon.sh start historyserver
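The restart sequence above can be collected into one script (a sketch; it assumes the master container is named hadoop-master and that the Hadoop start scripts are on the container's PATH):

```shell
#!/bin/bash
# Recreate the cluster containers with the updated configuration
docker-compose -f docker-compose.yml up -d

# Start HDFS, YARN, and the MapReduce job history server inside the master node
docker exec hadoop-master bash -c \
  "start-dfs.sh && start-yarn.sh && mr-jobhistory-daemon.sh start historyserver"

# Sanity check: the effective fs.defaultFS should match core-site.xml
docker exec hadoop-master hdfs getconf -confKey fs.defaultFS
```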
3. Create a hive directory under the project root and add the configuration files below.
The full contents of each file:
(1)beeline-log4j2.properties:
status = INFO
name = BeelineLog4j2
packages = org.apache.hadoop.hive.ql.log
# list of properties
property.hive.log.level = WARN
property.hive.root.logger = console
# list of all appenders
appenders = console
# console appender
appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_ERR
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} [%t]: %p %c{2}: %m%n
# list of all loggers
loggers = HiveConnection
# HiveConnection logs useful info for dynamic service discovery
logger.HiveConnection.name = org.apache.hive.jdbc.HiveConnection
logger.HiveConnection.level = INFO
# root logger
rootLogger.level = ${sys:hive.log.level}
rootLogger.appenderRefs = root
rootLogger.appenderRef.root.ref = ${sys:hive.root.logger}
(2)hive-log4j2.properties:
status = INFO
name = HiveLog4j2
packages = org.apache.hadoop.hive.ql.log
# list of properties
property.hive.log.level = INFO
property.hive.root.logger = DRFA
property.hive.log.dir = /data/logs
property.hive.log.file = hive.log
property.hive.perflogger.log.level = INFO
# list of all appenders
appenders = console, DRFA
# console appender
appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_ERR
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n
# daily rolling file appender
appender.DRFA.type = RollingRandomAccessFile
appender.DRFA.name = DRFA
appender.DRFA.fileName = ${sys:hive.log.dir}/${sys:hive.log.file}
# Use %pid in the filePattern to append <process-id>@<host-name> to the filename if you want separate log files for different CLI session
appender.DRFA.filePattern = ${sys:hive.log.dir}/${sys:hive.log.file}.%d{yyyy-MM-dd}
appender.DRFA.layout.type = PatternLayout
appender.DRFA.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n
appender.DRFA.policies.type = Policies
appender.DRFA.policies.time.type = TimeBasedTriggeringPolicy
appender.DRFA.policies.time.interval = 1
appender.DRFA.policies.time.modulate = true
appender.DRFA.strategy.type = DefaultRolloverStrategy
appender.DRFA.strategy.max = 30
# list of all loggers
loggers = NIOServerCnxn, ClientCnxnSocketNIO, DataNucleus, Datastore, JPOX, PerfLogger
logger.NIOServerCnxn.name = org.apache.zookeeper.server.NIOServerCnxn
logger.NIOServerCnxn.level = WARN
logger.ClientCnxnSocketNIO.name = org.apache.zookeeper.ClientCnxnSocketNIO
logger.ClientCnxnSocketNIO.level = WARN
logger.DataNucleus.name = DataNucleus
logger.DataNucleus.level = ERROR
logger.Datastore.name = Datastore
logger.Datastore.level = ERROR
logger.JPOX.name = JPOX
logger.JPOX.level = ERROR
logger.PerfLogger.name = org.apache.hadoop.hive.ql.log.PerfLogger
logger.PerfLogger.level = ${sys:hive.perflogger.log.level}
# root logger
rootLogger.level = ${sys:hive.log.level}
rootLogger.appenderRefs = root
rootLogger.appenderRef.root.ref = ${sys:hive.root.logger}
(3) hive-site.xml (the MySQL host in the JDBC URL must match your MySQL container name, and the & separators in the URL must be escaped as &amp; for the XML to parse):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Hive Execution Parameters -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>password</value>
        <description>password to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://test-mysql/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useUnicode=true</value>
        <description>
            JDBC connect string for a JDBC metastore.
            To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
            For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
        </description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.cj.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>Username to use against metastore database</description>
    </property>
</configuration>
(4)Dockerfile:
# The base image must be the Hadoop image built in step 1 (replace beimei6/hadoop with your own name, e.g. netName/hadoop)
FROM beimei6/hadoop
# MAINTAINER is deprecated; a label carries the same information
LABEL maintainer="seal.jing <744327309@qq.com>"
ADD apache-hive-2.3.6-bin.tar.gz /usr/local/
ENV HIVE_HOME=/usr/local/apache-hive-2.3.6-bin
ENV PATH=$PATH:$HIVE_HOME/bin
ENV HIVE_CONF_DIR=$HIVE_HOME/conf
# Create the log directory declared in hive-log4j2.properties; chmod alone fails if it does not exist
RUN mkdir -p /data/logs && chmod 1777 /data/logs
COPY config/* $HIVE_CONF_DIR/
COPY mysql-connector-java-8.0.19.jar $HIVE_HOME/lib/
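For the ADD and COPY instructions above to work, the hive build context needs roughly this layout (the config/ subdirectory is this tutorial's convention, not something Hive requires):

```
hive/
├── Dockerfile
├── apache-hive-2.3.6-bin.tar.gz
├── mysql-connector-java-8.0.19.jar
└── config/
    ├── beeline-log4j2.properties
    ├── hive-log4j2.properties
    └── hive-site.xml
```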
(5) Hive tarball download:
https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.3.6/
(6) mysql-connector-java-8.0.19.jar download:
https://mvnrepository.com/artifact/mysql/mysql-connector-java
Open the 8.0.19 entry on that page and download the jar from there.
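Both downloads can also be fetched from the command line; this is a sketch, and the mirror path may change once 2.3.6 is archived, so check the URLs if wget reports 404:

```shell
#!/bin/bash
cd hive  # the build-context directory from step 3

# Hive 2.3.6 binary tarball (older releases eventually move to archive.apache.org)
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.3.6/apache-hive-2.3.6-bin.tar.gz

# MySQL Connector/J 8.0.19, straight from Maven Central
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.19/mysql-connector-java-8.0.19.jar
```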
4. Add the MySQL and Hive compose files and install scripts to the project root:
(1)docker-compose-hive.yml:
version: "3"
services:
hadoop-hive:
image: test/hive
container_name: hadoop-hive
networks:
test-net:
ipv4_address: 172.20.0.6
volumes:
- ./data/hive:/data
stdin_open: true # -i interactive
tty: true # -t tty
entrypoint: ["sh" ,"-c","service ssh start; bash"]
(2)install-hive.sh:
#!/bin/bash
cd hive
docker build -t test/hive .
cd -
docker-compose -f docker-compose.yml -f docker-compose-hive.yml up -d hadoop-hive
(3)docker-compose-mysql.yml:
version: "3"
services:
test-mysql:
image: mysql
container_name: test-mysql
networks:
test-net:
ipv4_address: 172.20.0.5
volumes:
- ./data/mysql:/var/lib/mysql
environment:
MYSQL_ROOT_PASSWORD: password
(4)install-mysql.sh:
#!/bin/bash
docker-compose -f docker-compose.yml -f docker-compose-mysql.yml up -d test-mysql
5. Install MySQL in Docker:
$ bash install-mysql.sh # run from the project root
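MySQL takes a while to initialize on first start, and schematool in step 7 fails if it connects too early; a small wait loop (a sketch, assuming the container name test-mysql and the root password from docker-compose-mysql.yml) avoids that:

```shell
#!/bin/bash
# Poll mysqladmin ping inside the container until the server accepts connections
until docker exec test-mysql mysqladmin ping -uroot -ppassword --silent; do
    echo "waiting for MySQL..."
    sleep 2
done
echo "MySQL is ready"
```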
6. Install Hive in Docker:
$ bash install-hive.sh # run from the project root
7. Enter the hadoop-hive container, initialize the Hive metastore, and start hiveserver2 (it runs in the foreground, which is why the next step opens a new terminal):
$ docker exec -it hadoop-hive bash
$ schematool -dbType mysql -initSchema
$ hiveserver2
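To confirm that schematool actually created the metastore schema, you can list the tables it generated in the hive database (a sketch using the credentials from this tutorial's compose files):

```shell
#!/bin/bash
# createDatabaseIfNotExist=true in hive-site.xml created the 'hive' database;
# schematool then filled it with metastore tables such as DBS, TBLS, and VERSION
docker exec test-mysql mysql -uroot -ppassword -e "SHOW TABLES;" hive
```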
8. Open a new terminal, enter the hadoop-hive container again, and connect to hiveserver2 with beeline:
$ docker exec -it hadoop-hive bash
$ beeline -u jdbc:hive2://hadoop-hive:10000 -n root
If beeline leaves you at a prompt such as 0: jdbc:hive2://hadoop-hive:10000>, the connection succeeded and the setup is complete.
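For a quick end-to-end check beyond the interactive prompt, beeline's -e flag runs statements non-interactively; this sketch creates and drops a throwaway table named smoke_test, exercising beeline, hiveserver2, the metastore, and HDFS in one go:

```shell
#!/bin/bash
docker exec hadoop-hive beeline -u jdbc:hive2://hadoop-hive:10000 -n root \
  -e "CREATE TABLE IF NOT EXISTS smoke_test (id INT);" \
  -e "SHOW TABLES;" \
  -e "DROP TABLE IF EXISTS smoke_test;"
```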