To explore the mystery and greatness of Hive, we set out on the road to learning it. Whether the tool is good or bad can wait; first, let's install Hive...
We will use MySQL to store Hive's metadata (the Metastore), so MySQL gets installed first. The installation and configuration steps are as follows.
The whole process is divided into 7 parts:
1. Install MySQL
2. Install Hive
3. Configure the Hive Metastore to use MySQL
4. Hadoop cluster setup
5. Hive warehouse directory configuration
6. Query result display configuration (optional, per your preference)
7. Hive log file configuration
----------------------------------OK, here begins my long write-up-------------------------------------------------
》》》》》》> 1. Install MySQL
step 1: check whether MySQL is already installed; if so, remove the old installation
# yum list installed | grep mysql ---- check
# yum -y remove mysql-libs.x86_64 ---- remove
step 2: install the MySQL community repo package and MySQL server
# rpm -Uvh http://repo.mysql.com/mysql-community-release-el6-5.noarch.rpm (repo RPM for el6)
# yum install mysql-community-server -y
or, from the base repositories:
# yum -y install mysql mysql-server mysql-devel
or, on el7:
# wget http://dev.mysql.com/get/mysql-community-release-el7-5.noarch.rpm
# rpm -ivh mysql-community-release-el7-5.noarch.rpm
# yum -y install mysql-community-server
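The two repo RPMs above target different OS major releases (el6 vs el7). As a rough sketch (the `el_tag` helper is hypothetical, not from the original post), the matching repo URL could be picked automatically from /etc/redhat-release:

```shell
#!/bin/sh
# Pick the MySQL community repo RPM that matches the OS major release.
# el_tag derives "el6" / "el7" from a redhat-release string.
el_tag() {
    # e.g. "CentOS release 6.8 (Final)" -> el6
    major=$(echo "$1" | grep -o '[0-9]\+' | head -n 1)
    echo "el${major}"
}

# Fall back to a sample string so the sketch runs on any machine.
tag=$(el_tag "$(cat /etc/redhat-release 2>/dev/null || echo 'release 6')")
case "$tag" in
    el6) url="http://repo.mysql.com/mysql-community-release-el6-5.noarch.rpm" ;;
    el7) url="http://dev.mysql.com/get/mysql-community-release-el7-5.noarch.rpm" ;;
esac
echo "repo RPM: $url"
# Then: rpm -Uvh "$url" && yum -y install mysql-community-server
```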
step 3: start MySQL
# service mysqld start
step 4: set a login password for the root user
# mysqladmin -uroot password 'rootroot'
step 5: log in to MySQL
# mysql -uroot -prootroot
step 6: after logging in, grant access privileges so that other machines inside the cluster can connect
mysql> grant all privileges on *.* to root@'%' identified by 'rootroot';
(the password here should match the one set in step 4)
You can then quit with exit;
At this point, the MySQL installation is complete.
》》》》》》> 2. Install Hive
step 1: upload the Hive package apache-hive-1.2.1-bin.tar.gz, downloaded earlier, from the local machine to the VM (hadoop011)
Press Alt+p to open the sftp session
sftp> put G:/Hive/apache-hive-1.2.1-bin.tar.gz
Then go to the home directory (# cd ~), locate the file, and move it to the target path:
# mv apache-hive-1.2.1-bin.tar.gz /opt/soft
step 2: extract the package into /opt/app/
# tar -zxvf apache-hive-1.2.1-bin.tar.gz -C /opt/app/
Go to /opt/app/ and rename the extracted directory:
# cd /opt/app
# mv apache-hive-1.2.1-bin apache-hive-1.2.1
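The `-C` flag tells tar to extract into the given directory. A self-contained sketch of the extract-and-rename steps above, using a throwaway archive under a temp dir instead of the real Hive tarball:

```shell
#!/bin/sh
# Demonstrate `tar -zxf ... -C <dir>` followed by a rename, mirroring the
# Hive steps above. A fake archive stands in for apache-hive-1.2.1-bin.tar.gz.
work=$(mktemp -d)
mkdir -p "$work/apache-hive-1.2.1-bin"
echo demo > "$work/apache-hive-1.2.1-bin/README"
tar -czf "$work/pkg.tar.gz" -C "$work" apache-hive-1.2.1-bin

mkdir -p "$work/app"
tar -zxf "$work/pkg.tar.gz" -C "$work/app"    # like: tar -zxvf ... -C /opt/app/
mv "$work/app/apache-hive-1.2.1-bin" "$work/app/apache-hive-1.2.1"
ls "$work/app"                                 # the renamed directory
```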
step 3: configure hive-env.sh
Go to /opt/app/apache-hive-1.2.1/conf, find hive-env.sh.template, and rename it:
[root@hadoop011 conf]# mv hive-env.sh.template hive-env.sh
[root@hadoop011 conf]# vim hive-env.sh
Edit hive-env.sh and add the HADOOP_HOME and HIVE_CONF_DIR paths:
export HADOOP_HOME=/opt/app/hadoop-2.7.2
export HIVE_CONF_DIR=/opt/app/apache-hive-1.2.1/conf
The full hive-env.sh after editing:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Set Hive and Hadoop environment variables here. These variables can be used
# to control the execution of Hive. It should be used by admins to configure
# the Hive installation (so that users do not have to set environment variables
# or set command line parameters to get correct behavior).
#
# The hive service being invoked (CLI/HWI etc.) is available via the environment
# variable SERVICE
# Hive Client memory usage can be an issue if a large number of clients
# are running at the same time. The flags below have been useful in
# reducing memory usage:
#
# if [ "$SERVICE" = "cli" ]; then
# if [ -z "$DEBUG" ]; then
# export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
# else
# export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
# fi
# fi
# The heap size of the jvm stared by hive shell script can be controlled via:
# export HADOOP_HEAPSIZE=1024
# Larger heap size may be required when running queries over large number of files or partitions.
# By default hive shell scripts use a heap size of 256 (MB). Larger heap size would also be
# appropriate for hive server (hwi etc).
# Set HADOOP_HOME to point to a specific hadoop install directory
export HADOOP_HOME=/opt/app/hadoop-2.7.2
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/opt/app/apache-hive-1.2.1/conf
# Folder containing extra ibraries required for hive compilation/execution can be controlled by:
# export HIVE_AUX_JARS_PATH=
"hive-env.sh" 54L, 2407C 已写入
At this point Hive can already be started, but the overall setup is not finished yet, so let's continue...
》》》》》》> 3. Configure the Hive Metastore to use MySQL
step 1: upload the downloaded MySQL JDBC driver to the VM
Alt+p:
sftp> put G:/Hive/mysql-connector-java-5.1.37-bin.jar
Uploading mysql-connector-java-5.1.37-bin.jar to /root/mysql-connector-java-5.1.37-bin.jar
100% 962KB 962KB/s 00:00:00
G:/Hive/mysql-connector-java-5.1.37-bin.jar: 985603 bytes transferred in 0 seconds (962 KB/s)
sftp>
Move (or copy) mysql-connector-java-5.1.37-bin.jar to /opt/app/apache-hive-1.2.1/lib/
[root@hadoop011 ~]# mv mysql-connector-java-5.1.37-bin.jar /opt/app/apache-hive-1.2.1/lib/
step 2: configure the Metastore to use MySQL
Create hive-site.xml under /opt/app/apache-hive-1.2.1/conf/
[root@hadoop011 ~]# cd /opt/app/apache-hive-1.2.1/conf
[root@hadoop011 conf]# touch hive-site.xml
Put the configuration content (see https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin) into hive-site.xml
[root@hadoop011 conf]# vim hive-site.xml
Configuration:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>000000</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>
Two of the values need to be changed:
a. My MySQL instance runs on hadoop015:
<value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
——> <value>jdbc:mysql://hadoop015:3306/metastore?createDatabaseIfNotExist=true</value>
b. The MySQL password was set to 'rootroot', so it needs to be changed here:
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>rootroot</value>
  <description>password to use against metastore database</description>
</property>
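Both edits (a and b) can be scripted with sed instead of editing by hand. A minimal sketch, assuming the hostname hadoop015 and password rootroot used in this walkthrough (the file below is a trimmed stand-in for the real hive-site.xml):

```shell
#!/bin/sh
# Apply edits a and b from above: point the JDBC URL at the MySQL host
# (hadoop015) and replace the placeholder password with the real one.
site=$(mktemp)
cat > "$site" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>000000</value>
  </property>
</configuration>
EOF
sed -i -e 's/hadoop102/hadoop015/' -e 's/000000/rootroot/' "$site"
grep '<value>' "$site"
```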
step 3: the configuration is done; you can reboot and verify
# reboot
# service mysqld start
# mysql -uroot -prootroot
mysql> show databases;
After Hive has connected once, you will see a metastore database here (created automatically thanks to createDatabaseIfNotExist=true), which means the configuration works.
Of course, you can also start the cluster and Hive at this point.
Think we're done? No, no, no; there are more settings to go through...
》》》》》》> 4. Hadoop cluster setup
step 1: start the cluster
[root@hadoop011 ~]# start-dfs.sh
[root@hadoop012 ~]# start-yarn.sh
step 2: create the directory /user/hive/warehouse on HDFS (any path you like) to serve as the Hive warehouse
[root@hadoop011 conf]# hadoop fs -mkdir -p /user/hive/warehouse
step 3: change the permissions
[root@hadoop011 conf]# hadoop fs -chmod g+w /user/hive/warehouse
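The g+w bit gives the group write access, so Hive users other than the directory owner can create databases and tables under the warehouse; HDFS permissions mirror POSIX mode bits. A local illustration of the same permission change:

```shell
#!/bin/sh
# Local illustration of what `hadoop fs -chmod g+w` grants: adding the
# group-write bit to a warehouse directory.
d=$(mktemp -d)/warehouse
mkdir -p "$d"
chmod 755 "$d"            # before: group has r-x only
chmod g+w "$d"            # after:  group has rwx
stat -c '%A' "$d"         # prints the mode string, e.g. drwxrwxr-x
```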
》》》》》》> 5. Warehouse directory configuration
Change the default warehouse location: copy the following property from hive-default.xml.template (under /opt/app/apache-hive-1.2.1/conf) into hive-site.xml.
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>
》》》》》》> 6. Query result display configuration
To display the current database name in the prompt and column headers in query results, add the following to hive-site.xml:
<property>
  <name>hive.cli.print.header</name>
  <value>true</value>
</property>
<property>
  <name>hive.cli.print.current.db</name>
  <value>true</value>
</property>
》》》》》》> 7. Hive log file configuration
Locate the log configuration file:
# cd /opt/app/apache-hive-1.2.1/conf
There you will find hive-log4j.properties.template; rename it to hive-log4j.properties
[root@hadoop011 conf]# mv hive-log4j.properties.template hive-log4j.properties
[root@hadoop011 conf]# pwd
/opt/app/apache-hive-1.2.1/conf
Edit the file and change the log directory:
[root@hadoop011 conf]# vim hive-log4j.properties
File contents:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Define some default values that can be overridden by system properties
hive.log.threshold=ALL
hive.root.logger=INFO,DRFA
hive.log.dir=/opt/app/apache-hive-1.2.1/logs
hive.log.file=hive.log
Original value: hive.log.dir=${java.io.tmpdir}/${user.name}
New value: hive.log.dir=/opt/app/apache-hive-1.2.1/logs
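This change, too, can be scripted. A sketch that rewrites the hive.log.dir line (shown here against a one-line stand-in for hive-log4j.properties; you may also want to create the target directory beforehand):

```shell
#!/bin/sh
# Rewrite hive.log.dir from the tmpdir-based default to a fixed path
# under the Hive install, as described above.
props=$(mktemp)
echo 'hive.log.dir=${java.io.tmpdir}/${user.name}' > "$props"
sed -i 's|^hive.log.dir=.*|hive.log.dir=/opt/app/apache-hive-1.2.1/logs|' "$props"
cat "$props"
# Suggested: mkdir -p /opt/app/apache-hive-1.2.1/logs
```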
---------------------------------
OK, that covers the essential configuration. You can of course configure more as needed, such as runtime parameters.
For my current stage of learning, this is enough.
So, finally, start Hive:
# reboot
[root@hadoop011 ~]# start-dfs.sh
[root@hadoop012 ~]# start-yarn.sh
Go to the directory /opt/app/apache-hive-1.2.1/bin
[root@hadoop011 bin]# ./hive
------------ Let the learning journey begin --------------------------------