Hive Installation and Configuration

1. Hive is a data warehouse built on top of HDFS, so it needs a working Hadoop environment. For how to set up Hadoop, see this earlier post: http://blog.csdn.net/jthink_/article/details/38622297

2. Download Hive (hive-0.11.0.tar.gz) and put it in a suitable location (note: here it goes on the bg01 host):

For example, mine lives under /usr/local/bg.
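
A minimal sketch of the download and unpack steps, assuming the Apache archive URL for the 0.11.0 release (adjust the URL and target directory to your own setup):

# download the 0.11.0 release tarball (URL is an assumption: the Apache archive layout)
wget https://archive.apache.org/dist/hive/hive-0.11.0/hive-0.11.0.tar.gz
# unpack into /usr/local/bg on bg01
tar -zxvf hive-0.11.0.tar.gz -C /usr/local/bg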

3. Edit the configuration (the files below all live in the conf directory of the Hive installation):

Copy hive-default.xml.template to hive-default.xml; this file holds Hive's default configuration.

Create a new file, hive-site.xml; settings in it override those in hive-default.xml, so any tuning changes go into this file.

Copy hive-env.sh.template to hive-env.sh, with the following content:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hive and Hadoop environment variables here. These variables can be used
# to control the execution of Hive. It should be used by admins to configure
# the Hive installation (so that users do not have to set environment variables
# or set command line parameters to get correct behavior).
#
# The hive service being invoked (CLI/HWI etc.) is available via the environment
# variable SERVICE

# Hive Client memory usage can be an issue if a large number of clients
# are running at the same time. The flags below have been useful in
# reducing memory usage:
#
# if [ "$SERVICE" = "cli" ]; then
#   if [ -z "$DEBUG" ]; then
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
#   else
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
#   fi
# fi

# The heap size of the jvm started by hive shell script can be controlled via:
#
export HADOOP_HEAPSIZE=1024
#
# Larger heap size may be required when running queries over large number of files or partitions.
# By default hive shell scripts use a heap size of 256 (MB).  Larger heap size would also be
# appropriate for hive server (hwi etc).

# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/usr/local/bg/hadoop-1.2.1

# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/usr/local/bg/hive-0.11.0/conf

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/usr/local/bg/hive-0.11.0/lib

4. Hive's metadata needs to be stored in a traditional RDBMS; MySQL is used here, so install MySQL first:

sudo apt-get install mysql-server

Then edit the my.cnf file and comment out the bind-address = 127.0.0.1 line (remember to use a # for the comment), so that MySQL is not restricted to local connections.
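
A minimal sketch of that edit, assuming the Ubuntu default config path /etc/mysql/my.cnf (check where your distribution keeps it):

# comment out bind-address so MySQL listens on all interfaces
sudo sed -i 's/^bind-address/#bind-address/' /etc/mysql/my.cnf
# restart MySQL so the change takes effect
sudo service mysql restart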

With that done, our hive-site.xml looks like this:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>

  <property>
    <name>hive.querylog.location</name>
    <value>/usr/local/bg/hive-0.11.0/log</value>
    <description>Location of Hive run time structured log file</description>
  </property>

  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive-${user.name}</value>
    <description>Scratch space for Hive jobs</description>
  </property>

  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
  </property>
</configuration>
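
Before starting Hive, you can sanity-check the JDBC credentials above by connecting with the mysql client directly (an optional check; root/root matches the values in hive-site.xml):

# connect with the same account Hive will use
mysql -u root -proot -e 'SHOW DATABASES;'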


The /usr/local/bg/hive-0.11.0/log directory does not exist by default; you have to create it yourself.
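
For example:

mkdir -p /usr/local/bg/hive-0.11.0/log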

Oh, and you also need to download the MySQL JDBC driver JAR into Hive's lib directory.
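
A sketch of fetching it from Maven Central; the 5.1.49 version here is just an example, any mysql-connector-java release compatible with your MySQL server will do:

# fetch the connector JAR (version is an example)
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.49/mysql-connector-java-5.1.49.jar
# drop it into Hive's lib directory so the metastore can load it
cp mysql-connector-java-5.1.49.jar /usr/local/bg/hive-0.11.0/lib/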

5. Start Hive

First set up the environment variables:

sudo vim /etc/profile

# set hive environment
export HIVE_HOME=/usr/local/bg/hive-0.11.0
export PATH=$PATH:$HIVE_HOME/bin
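
Then reload the profile so the new variables take effect in the current shell:

source /etc/profile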

The command is simply: hive. Then, inside the Hive shell, run:

show tables;

If that runs without errors, the configuration is working.

One more note: Hadoop has to be started before running Hive.
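
For a slightly fuller smoke test (a sketch; t_test is just an example table name), you can create, list, and drop a table in one shot from the command line:

# end-to-end check: DDL goes through the MySQL metastore, data dirs land in HDFS
hive -e "CREATE TABLE t_test (id INT, name STRING); SHOW TABLES; DROP TABLE t_test;"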
