Setting up a fully distributed cluster on Ubuntu 16.04: installing Hive

As far as the Hadoop cluster goes, my understanding is that Hive can be installed on any node. Note that Hive itself is not a relational database: it is a data warehouse layer on top of Hadoop, and only its metastore lives in a relational database (MySQL here).
Anyway, I set it up on the NameNode machine.

Ugh... I wrote a long post yesterday, then assumed I could reuse the same submission page, and submitted my Flume notes over the Hive draft. It got overwritten. I'm gutted!

So here is a rough record of the Hive setup process, plus the pitfalls I hit along the way.

Environment: Hadoop 2.7.7

Introduction

Unlike HBase, Hive is a data warehouse tool built on top of Hadoop. It maps structured data files to database tables, provides a full SQL-like query language (HiveQL), and translates those queries into MapReduce jobs for execution.

Advantages

You can run MapReduce-style aggregations directly through SQL-like statements, without writing a dedicated MapReduce application.
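For instance, an aggregation like the one below (the table and column names are just placeholders for illustration) would normally require a custom MapReduce job; Hive compiles the GROUP BY into one for you:

```sql
-- Hive turns this GROUP BY into a MapReduce job behind the scenes.
SELECT name, COUNT(*) AS cnt
FROM test
GROUP BY name;
```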

Installation

Install MySQL

sudo apt install mysql-server
sudo mysql_secure_installation
sudo mysql -u root -p   # log in once to check the server works
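Optionally, instead of letting Hive connect as root (which is what my hive-site.xml below does), you can create a dedicated metastore database and user. A sketch, where the database, user, and password names are all my own placeholders:

```sql
-- Optional: dedicated metastore database and user (names are placeholders).
CREATE DATABASE hive;
CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY 'hivepass';
GRANT ALL PRIVILEGES ON hive.* TO 'hiveuser'@'localhost';
FLUSH PRIVILEGES;
```

If you skip this, the `createDatabaseIfNotExist=true` parameter in the JDBC URL below will create the `hive` database on first use.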

Install the MySQL connector

Check which Connector/J version matches your Hadoop/Hive setup. Avoid the one apt installs: I tried it, symlinked the jar in, and it was no use at all.
I used Connector/J 5.1.47. Do not install Connector/J 8.x here; it throws errors with this setup.
Then put the connector jar into /usr/app/hive/lib (i.e. $HIVE_HOME/lib).
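Deploying the jar might look like this (a sketch: the Maven Central URL and the /usr/app/hive path are my assumptions; adjust to your layout):

```shell
# Fetch Connector/J 5.1.47 and drop it into Hive's lib directory.
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.47/mysql-connector-java-5.1.47.jar
cp mysql-connector-java-5.1.47.jar /usr/app/hive/lib/
```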

Install Hive

Set the Hive environment variables in /etc/profile (optional; Hive also runs fine via its full path).
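If you do add them to your profile, the entries might look like this (the paths reflect my install locations and are assumptions; adjust to yours):

```shell
# Append to /etc/profile (or ~/.bashrc), then `source` it.
export HIVE_HOME=/usr/app/hive
export PATH=$PATH:$HIVE_HOME/bin
```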
Then configure conf/hive-env.sh:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hive and Hadoop environment variables here. These variables can be used
# to control the execution of Hive. It should be used by admins to configure
# the Hive installation (so that users do not have to set environment variables
# or set command line parameters to get correct behavior).
#
# The hive service being invoked (CLI etc.) is available via the environment
# variable SERVICE


# Hive Client memory usage can be an issue if a large number of clients
# are running at the same time. The flags below have been useful in 
# reducing memory usage:
#
# if [ "$SERVICE" = "cli" ]; then
#   if [ -z "$DEBUG" ]; then
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
#   else
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
#   fi
# fi

# The heap size of the jvm started by hive shell script can be controlled via:
#
# export HADOOP_HEAPSIZE=1024
#
# Larger heap size may be required when running queries over large number of files or partitions. 
# By default hive shell scripts use a heap size of 256 (MB).  Larger heap size would also be 
# appropriate for hive server.


# Set HADOOP_HOME to point to a specific hadoop install directory
# HADOOP_HOME=${bin}/../../hadoop

# Hive Configuration Directory can be controlled by:
# export HIVE_CONF_DIR=

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
# export HIVE_AUX_JARS_PATH=

export JAVA_HOME=/usr/java/jdk1.8.0_221
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/app/hive

Configure conf/hive-site.xml:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>sl159753</value>
    <description>password to use against metastore database</description>
  </property>

  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>true</value>
    <description>auto-create the metastore schema if it does not exist</description>
  </property>
  <property>
    <name>hive.server2.thrift.sasl.qop</name>
    <value>auth</value>
    <description>SASL quality of protection for the HiveServer2 Thrift transport</description>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>disable strict metastore schema version verification</description>
  </property>
</configuration>
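On Hive 2.x and later, the metastore schema usually has to be initialized explicitly before first use with the schematool that ships with Hive (the path here assumes my layout); on older releases, datanucleus.autoCreateSchema=true may make this step unnecessary:

```shell
# Initialize the metastore schema in MySQL (run once, after hive-site.xml is in place).
/usr/app/hive/bin/schematool -dbType mysql -initSchema
```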

Pitfalls

The problem I hit was failing to connect to MySQL. Two things to rule out:

  1. The wrong Connector/J version. Even after picking the right one it still didn't work, which led me to the next point.
  2. By default MySQL only listens on localhost. So if Hive connects from the same machine, set the address in hive-site.xml to localhost; otherwise open MySQL up to remote connections and you're fine.
  3. There is no third.
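Opening MySQL to remote connections might look like the sketch below; the user, host wildcard, and password are placeholders, and on many setups you also need to change `bind-address` in mysqld's config and restart the server:

```sql
-- Allow the metastore user to connect from any host (placeholders; tighten as needed).
GRANT ALL PRIVILEGES ON hive.* TO 'root'@'%' IDENTIFIED BY 'your_password';
FLUSH PRIVILEGES;
```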
Testing

$HIVE_HOME/bin/hive

create table test(id int, name string);
show tables;

Only when both commands run OK can you say the MySQL -> connector -> Hive chain is fully working.
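You can also double-check that the metastore really landed in MySQL: Hive records every table in the metastore's TBLS table (the `hive` database name here matches the JDBC URL above):

```sql
-- Run inside the mysql client: the newly created `test` table should be listed.
SELECT TBL_NAME FROM hive.TBLS;
```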

Data skew

This topic is too big to cover here; I'll expand on it with a concrete example when I get the chance. Consider this a placeholder.
