Setting up a fully distributed cluster on Ubuntu 16.04: installing Hive

As far as the Hadoop cluster goes, my understanding is that Hive can be installed on any node. Note that Hive itself is not a relational database: it is a data warehouse layer on top of Hadoop, and only its metastore lives in a relational database (MySQL here).
Anyway, I set it up on the NameNode machine.

Ugh... I wrote a long post yesterday, then assumed I could reuse the same submission page, and submitted my Flume notes over the Hive draft. It got overwritten. I'm gutted!

So here is a rough record of the Hive setup process, plus the pitfalls I hit along the way.

Environment: Hadoop 2.7.7

Introduction

Unlike HBase, Hive is a data warehouse tool built on top of Hadoop. It maps structured data files to database tables, provides a full SQL-like query language (HiveQL), and translates those queries into MapReduce jobs for execution.

Advantages

You can run MapReduce-style aggregations directly through SQL-like statements, without writing a dedicated MapReduce application.
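For instance, an aggregation like the one below (the table and column names are just placeholders for illustration) would normally require a custom MapReduce job; Hive compiles the GROUP BY into one for you:

```sql
-- Hive turns this GROUP BY into a MapReduce job behind the scenes.
SELECT name, COUNT(*) AS cnt
FROM test
GROUP BY name;
```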

Installation

Install MySQL

sudo apt install mysql-server
sudo mysql_secure_installation
sudo mysql -u root -p   # log in once to check the server works
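Optionally, instead of letting Hive connect as root (which is what my hive-site.xml below does), you can create a dedicated metastore database and user. A sketch, where the database, user, and password names are all my own placeholders:

```sql
-- Optional: dedicated metastore database and user (names are placeholders).
CREATE DATABASE hive;
CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY 'hivepass';
GRANT ALL PRIVILEGES ON hive.* TO 'hiveuser'@'localhost';
FLUSH PRIVILEGES;
```

If you skip this, the `createDatabaseIfNotExist=true` parameter in the JDBC URL below will create the `hive` database on first use.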

Install the MySQL connector

Check which Connector/J version matches your Hadoop/Hive setup. Avoid the one apt installs: I tried it, symlinked the jar in, and it was no use at all.
I used Connector/J 5.1.47. Do not install Connector/J 8.x here; it throws errors with this setup.
Then put the connector jar into /usr/app/hive/lib (i.e. $HIVE_HOME/lib).
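Deploying the jar might look like this (a sketch: the Maven Central URL and the /usr/app/hive path are my assumptions; adjust to your layout):

```shell
# Fetch Connector/J 5.1.47 and drop it into Hive's lib directory.
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.47/mysql-connector-java-5.1.47.jar
cp mysql-connector-java-5.1.47.jar /usr/app/hive/lib/
```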

Install Hive

Set the Hive environment variables in /etc/profile (optional; Hive also runs fine via its full path).
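If you do add them to your profile, the entries might look like this (the paths reflect my install locations and are assumptions; adjust to yours):

```shell
# Append to /etc/profile (or ~/.bashrc), then `source` it.
export HIVE_HOME=/usr/app/hive
export PATH=$PATH:$HIVE_HOME/bin
```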
Then configure conf/hive-env.sh:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hive and Hadoop environment variables here. These variables can be used
# to control the execution of Hive. It should be used by admins to configure
# the Hive installation (so that users do not have to set environment variables
# or set command line parameters to get correct behavior).
#
# The hive service being invoked (CLI etc.) is available via the environment
# variable SERVICE


# Hive Client memory usage can be an issue if a large number of clients
# are running at the same time. The flags below have been useful in 
# reducing memory usage:
#
# if [ "$SERVICE" = "cli" ]; then
#   if [ -z "$DEBUG" ]; then
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
#   else
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
#   fi
# fi

# The heap size of the jvm started by hive shell script can be controlled via:
#
# export HADOOP_HEAPSIZE=1024
#
# Larger heap size may be required when running queries over large number of files or partitions. 
# By default hive shell scripts use a heap size of 256 (MB).  Larger heap size would also be 
# appropriate for hive server.


# Set HADOOP_HOME to point to a specific hadoop install directory
# HADOOP_HOME=${bin}/../../hadoop

# Hive Configuration Directory can be controlled by:
# export HIVE_CONF_DIR=

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
# export HIVE_AUX_JARS_PATH=

export JAVA_HOME=/usr/java/jdk1.8.0_221
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/app/hive

Configure conf/hive-site.xml:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>sl159753</value>
    <description>password to use against metastore database</description>
  </property>

  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>true</value>
    <description>auto-create the metastore schema if it does not exist</description>
  </property>
  <property>
    <name>hive.server2.thrift.sasl.qop</name>
    <value>auth</value>
    <description>SASL quality of protection for the HiveServer2 Thrift transport</description>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>disable strict metastore schema version verification</description>
  </property>
</configuration>
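On Hive 2.x and later, the metastore schema usually has to be initialized explicitly before first use with the schematool that ships with Hive (the path here assumes my layout); on older releases, datanucleus.autoCreateSchema=true may make this step unnecessary:

```shell
# Initialize the metastore schema in MySQL (run once, after hive-site.xml is in place).
/usr/app/hive/bin/schematool -dbType mysql -initSchema
```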

Pitfalls

The problem I hit was failing to connect to MySQL. Two things to rule out:

  1. The wrong Connector/J version. Even after picking the right one it still didn't work, which led me to the next point.
  2. By default MySQL only listens on localhost. So if Hive connects from the same machine, set the address in hive-site.xml to localhost; otherwise open MySQL up to remote connections and you're fine.
  3. There is no third.
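Opening MySQL to remote connections might look like the sketch below; the user, host wildcard, and password are placeholders, and on many setups you also need to change `bind-address` in mysqld's config and restart the server:

```sql
-- Allow the metastore user to connect from any host (placeholders; tighten as needed).
GRANT ALL PRIVILEGES ON hive.* TO 'root'@'%' IDENTIFIED BY 'your_password';
FLUSH PRIVILEGES;
```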
Testing

$HIVE_HOME/bin/hive

create table test(id int, name string);
show tables;

Only when both commands run OK can you say the MySQL -> connector -> Hive chain is fully working.
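You can also double-check that the metastore really landed in MySQL: Hive records every table in the metastore's TBLS table (the `hive` database name here matches the JDBC URL above):

```sql
-- Run inside the mysql client: the newly created `test` table should be listed.
SELECT TBL_NAME FROM hive.TBLS;
```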

Data skew

This topic is too big to cover here; I'll expand on it with a concrete example when I get the chance. Consider this a placeholder.
