Integrated Installation of Hadoop 2.2.0 + Hive 0.13 + MySQL 5.1

[b][color=olive][size=large]The Hive installed here is the current stable release, built on Hadoop 2.2.0. I have written before about installing Hive 0.8 on Hadoop 1.x; this time the version is Hive 0.13, which can be downloaded straight from the Hive website as a binary package, with no need to build from source. Hive depends on an underlying Hadoop environment, so before installing Hive, make sure your Hadoop cluster is already up and working.
Download page for the stable Hive 0.13 release:
[url]http://apache.fayea.com/apache-mirror/hive/stable/[/url]
On setting up a Hadoop 2.2.0 distributed cluster:
[url]http://qindongliang1922.iteye.com/blog/2078423[/url]
On installing MySQL:
[url]http://qindongliang1922.iteye.com/blog/1987199[/url]
[/size][/color][/b]

[b][color=green][size=large] Here is an overview of the installation steps:
[table]
|No.|Step|Notes
|1|Set up the Hadoop 2.2.0 cluster|the underlying environment Hive depends on
|2|Download the Hive 0.13 binary package and unpack it|the Hive package itself (see the sketch after this table)
|3|Configure the HIVE_HOME environment variable|needed by the environment
|4|Configure hive-env.sh|points at the Hadoop directory and Hive's conf directory
|5|Configure hive-site.xml|Hive properties and the MySQL metastore integration
|6|Copy the MySQL JDBC jar into Hive's lib directory|metadata is stored in MySQL
|7|Start the bin/hive client|test that Hive starts
|8|Create a database and a table, test Hive|verify Hive works correctly
|9|Exit the Hive client|run the exit command
[/table][/size][/color][/b]
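For step 2, a minimal download-and-unpack sketch. The exact tarball name depends on the mirror and point release (apache-hive-0.13.0-bin.tar.gz is an assumption here); the /home/search/hive install path matches the HIVE_HOME used later in this post:
# fetch the Hive 0.13 binary tarball from the mirror linked above
# (the exact filename may differ; check the mirror's listing first)
wget http://apache.fayea.com/apache-mirror/hive/stable/apache-hive-0.13.0-bin.tar.gz
tar -zxvf apache-hive-0.13.0-bin.tar.gz
mv apache-hive-0.13.0-bin /home/search/hive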
[b][color=olive][size=large]
First, run the following four commands (inside $HIVE_HOME/conf) to turn Hive's bundled template files into the files Hive actually reads:
cd /home/search/hive/conf
cp hive-default.xml.template hive-site.xml
cp hive-env.sh.template hive-env.sh
cp hive-exec-log4j.properties.template hive-exec-log4j.properties
cp hive-log4j.properties.template hive-log4j.properties


Setting the Hive-related environment variables:
[/size][/color][/b]
export PATH=.:$PATH
# JDK environment
export JAVA_HOME="/usr/local/jdk"
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin
# Hadoop environment
export HADOOP_HOME=/home/search/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop/
export CLASSPATH=.:$CLASSPATH:$HADOOP_HOME/lib
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Ant environment
export ANT_HOME=/usr/local/ant
export CLASSPATH=$CLASSPATH:$ANT_HOME/lib
export PATH=$PATH:$ANT_HOME/bin
# Maven environment
export MAVEN_HOME="/usr/local/maven"
export CLASSPATH=$CLASSPATH:$MAVEN_HOME/lib
export PATH=$PATH:$MAVEN_HOME/bin

# Hive environment
export HIVE_HOME=/home/search/hive
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib
export PATH=$PATH:$HIVE_HOME/bin:$HIVE_HOME/conf
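After adding these lines (assuming they go in ~/.bashrc or /etc/profile; the exact file is up to you), reload the shell and do a quick sanity check:
# reload the profile so the new variables take effect
source ~/.bashrc
# verify the variables point where you expect
echo $HADOOP_HOME
echo $HIVE_HOME
which hive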


[b][color=green][size=large]Below is the content of hive-env.sh:[/size][/color][/b]
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hive and Hadoop environment variables here. These variables can be used
# to control the execution of Hive. It should be used by admins to configure
# the Hive installation (so that users do not have to set environment variables
# or set command line parameters to get correct behavior).
#
# The hive service being invoked (CLI/HWI etc.) is available via the environment
# variable SERVICE


# Hive Client memory usage can be an issue if a large number of clients
# are running at the same time. The flags below have been useful in
# reducing memory usage:
#
# if [ "$SERVICE" = "cli" ]; then
# if [ -z "$DEBUG" ]; then
# export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
# else
# export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
# fi
# fi

# The heap size of the jvm started by hive shell script can be controlled via:
#
# export HADOOP_HEAPSIZE=1024
#
# Larger heap size may be required when running queries over large number of files or partitions.
# By default hive shell scripts use a heap size of 256 (MB). Larger heap size would also be
# appropriate for hive server (hwi etc).


# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/home/search/hadoop

# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/home/search/hive/conf

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
# export HIVE_AUX_JARS_PATH=


[b][color=olive][size=large]The configuration in hive-site.xml is as follows:[/size][/color][/b]
<configuration>
  <!-- JDBC URL of the MySQL database holding the metastore -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <!-- database user name -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <!-- the JDBC driver class must be set here, matching the database you use -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <!-- database password -->
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>qin</value>
  </property>
  <!-- HDFS path where Hive table data is stored -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://h1:9000/user/hive/warehouse</value>
  </property>
  <!-- HDFS scratch path for the per-stage map/reduce plans and intermediate output -->
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx4096m</value>
  </property>
  <!-- where query logs are written -->
  <property>
    <name>hive.querylog.location</name>
    <value>/home/search/hive/logs</value>
  </property>

  <!-- note: deprecated in Hive 0.13 and ignored, as the startup warning below shows;
       kept here only to match the original setup -->
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
</configuration>
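The metastore settings above assume the MySQL JDBC driver is on Hive's classpath and that the configured account can reach MySQL. A minimal sketch of both (the connector jar name/version here is just an example; use whichever connector jar you have):
# copy the MySQL JDBC driver into Hive's lib directory
# (jar name varies by version, e.g. mysql-connector-java-5.1.x-bin.jar)
cp mysql-connector-java-5.1.31-bin.jar /home/search/hive/lib/
# confirm the configured account can log in; the hive database itself
# is created on first use thanks to createDatabaseIfNotExist=true
mysql -u root -pqin -e "select version();"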


[b][color=olive][size=large]On HDFS, create the directory where Hive stores its table data, plus the temp directory for intermediate MapReduce output, and grant group write permission, with the following commands:[/size][/color][/b]
hadoop fs -mkdir /tmp
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
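To double-check the directories and their permissions before starting Hive:
# the listing should show group write permission on /tmp and /user/hive/warehouse
hadoop fs -ls /
hadoop fs -ls /user/hive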


[b][color=green][size=large]Start Hive:
Run the command bin/hive; the startup output is as follows:
[/size][/color][/b]
[search@h1 hive]$ bin/hive
14/07/30 04:18:08 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/07/30 04:18:08 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/07/30 04:18:08 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/07/30 04:18:08 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/07/30 04:18:08 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/07/30 04:18:08 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/07/30 04:18:08 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/07/30 04:18:08 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
14/07/30 04:18:09 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.

Logging initialized using configuration in file:/home/search/hive/conf/hive-log4j.properties
hive>

[b][color=olive][size=large]Run the create-table command and load some data:

Create the table:
create table info (name string, count int) row format delimited fields terminated by '#' stored as textfile ;
Load the data:
LOAD DATA LOCAL INPATH '/home/search/abc1.txt' OVERWRITE INTO TABLE info;
Run a query sorted in descending order (note that in HiveQL the LIMIT clause must come after ORDER BY, as the failed first attempt below shows):[/size][/color][/b]
hive> select * from info limit 5 order by count desc;
FAILED: ParseException line 1:27 missing EOF at 'order' near '5'
hive> select * from info order by count desc limit 5 ;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1406660797211_0003, Tracking URL = http://h1:8088/proxy/application_1406660797211_0003/
Kill Command = /home/search/hadoop/bin/hadoop job -kill job_1406660797211_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-07-30 04:26:13,538 Stage-1 map = 0%, reduce = 0%
2014-07-30 04:26:26,398 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 5.41 sec
2014-07-30 04:26:27,461 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.64 sec
2014-07-30 04:26:39,177 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 10.02 sec
MapReduce Total cumulative CPU time: 10 seconds 20 msec
Ended Job = job_1406660797211_0003
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 10.02 sec HDFS Read: 143906707 HDFS Write: 85 SUCCESS
Total MapReduce CPU Time Spent: 10 seconds 20 msec
OK
英的国 999999
中的国 999997
美的国 999996
中的国 999993
英的国 999992
Time taken: 37.892 seconds, Fetched: 5 row(s)
hive>
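For reference, the '#' in the table definition above is the field delimiter, so each line of the input file pairs a name and a count separated by '#'. A few hypothetical lines of /home/search/abc1.txt, consistent with the query output above, would look like:
英的国#999999
中的国#999997
美的国#999996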



[b][color=green][size=large]Some interactive commands available in the hive shell (a short example session follows the list):[/size][/color][/b]
quit, exit:  leave the interactive shell
reset: reset the configuration to its defaults
set <key>=<value> : set the value of a particular variable (note: no error is reported if the variable name is misspelled)
set : print the configuration variables overridden by the user
set -v : print all Hadoop and Hive configuration variables
add FILE[S] *, add JAR[S] *, add ARCHIVE[S] * : add one or more files, jars, or archives to the distributed cache
list FILE[S], list JAR[S], list ARCHIVE[S] : list the resources already added to the distributed cache
list FILE[S] *, list JAR[S] *, list ARCHIVE[S] * : check whether the given resources have been added to the distributed cache
delete FILE[S] *, delete JAR[S] *, delete ARCHIVE[S] * : remove the given resources from the distributed cache
! <command> : run a shell command from the hive shell
dfs <dfs command> : run a dfs command from the hive shell
<query string> : run a Hive query and print the results to standard output
source FILE <filepath> : run a Hive script file inside the CLI
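A brief illustrative session with a few of these commands (output trimmed; the warehouse path matches the configuration above):
hive> set hive.cli.print.header=true;
hive> !date;
hive> dfs -ls /user/hive/warehouse;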



[b][color=olive][size=large]
To start Hive in debug mode: hive -hiveconf hive.root.logger=DEBUG,console

With this, our Hive installation is complete and working normally.
[/size][/color][/b]