Understanding Hive in Depth

Hive is a data warehouse analysis system built on top of Hadoop. It provides a rich SQL interface for analyzing data stored in HDFS: structured data files can be mapped onto database tables with full SQL query support, and the SQL statements are translated into MapReduce jobs for execution, so you can query and analyze the data you need with plain SQL. This SQL dialect is known as Hive SQL (HQL).
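As a minimal sketch of that file-to-table mapping (the table name, columns, delimiter, and HDFS path below are illustrative assumptions, not part of the setup that follows):

-- map a tab-delimited file already sitting in HDFS onto a table
CREATE EXTERNAL TABLE access_log (
    user_id  STRING,
    url      STRING,
    ts       BIGINT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/access_log';

-- this query is compiled into one or more MapReduce jobs behind the scenes
SELECT user_id, count(*) AS pv
FROM access_log
GROUP BY user_id;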

Copy MySQL's mysql-connector-java-5.1.27-bin.jar to /opt/module/hive/lib/ (Hive needs the JDBC driver to reach the MySQL metastore):
$ cp /opt/software/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar /opt/module/hive/lib/
Create a hive-site.xml file under /opt/module/hive/conf:
vim hive-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>000000</value>
        <description>password to use against metastore database</description>
    </property>

    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
        <description>location of default database for the warehouse</description>
    </property>

    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>

    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>

    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>

    <property>
        <name>datanucleus.schema.autoCreateAll</name>
        <value>true</value>
    </property>

    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop102:9083</value>
    </property>
</configuration>
Start the metastore service first; once it is running, start the Hive CLI:
$ nohup bin/hive --service metastore &
$ bin/hive
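To confirm the metastore is wired up correctly, you can check that Hive created its schema in MySQL and that the CLI responds (the exact list of metastore tables depends on the Hive version; this is only a quick sanity check):

$ mysql -uroot -p000000 -e "USE metastore; SHOW TABLES;"    # should list tables such as DBS and TBLS
hive (default)> show databases;                             # the prompt shows the current db because hive.cli.print.current.db=true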
Integrating the Tez Engine with Hive
Tez is an execution engine for Hive that outperforms MapReduce.
Tez can merge multiple dependent jobs into a single job, so intermediate results are written to HDFS only once and there are fewer intermediate stages, which greatly improves query performance.
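For intuition, a query like the following (the tables and columns are hypothetical) would run as several chained MapReduce jobs under MR, each materializing its output to HDFS, whereas Tez executes it as a single DAG with intermediate data passed between stages:

SELECT d.dept_name, count(*) AS cnt
FROM employee e
JOIN dept d ON e.dept_id = d.dept_id
GROUP BY d.dept_name
ORDER BY cnt DESC;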
Upload apache-tez-0.9.1-bin.tar.gz to the /tez directory on HDFS:
$ hadoop fs -put /opt/software/apache-tez-0.9.1-bin.tar.gz /tez
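You can confirm the upload with:
$ hadoop fs -ls /tez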
Create a tez-site.xml file under Hive's /opt/module/hive/conf directory:
vim tez-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>tez.lib.uris</name>
        <value>${fs.defaultFS}/tez/apache-tez-0.9.1-bin.tar.gz</value>
    </property>
    <property>
        <name>tez.use.cluster.hadoop-libs</name>
        <value>true</value>
    </property>
    <property>
        <name>tez.history.logging.service.class</name>
        <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
    </property>
</configuration>
Add the Tez environment variables and the dependency JAR path configuration to hive-env.sh:
mv hive-env.sh.template hive-env.sh
vim hive-env.sh
#Set HADOOP_HOME to point to a specific hadoop install directory

export HADOOP_HOME=/opt/module/hadoop-2.7.2
#Hive Configuration Directory can be controlled by:

export HIVE_CONF_DIR=/opt/module/hive/conf
#Folder containing extra libraries required for hive compilation/execution can be controlled by:

export TEZ_HOME=/opt/module/tez-0.9.1    # the directory where Tez was extracted
export TEZ_JARS=""
for jar in $(ls $TEZ_HOME | grep jar); do
export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done
for jar in $(ls $TEZ_HOME/lib); do
export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done
export HIVE_AUX_JARS_PATH=/opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar$TEZ_JARS
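A quick way to check that the JAR list was assembled as expected (purely a sanity check; the exact entries depend on the Tez distribution):

$ source /opt/module/hive/conf/hive-env.sh
$ echo $TEZ_JARS | tr ':' '\n' | head    # should list tez-*.jar plus the lib/ dependencies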
Add the following to hive-site.xml to switch Hive's execution engine to Tez:
<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>
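Alternatively, the engine can be switched per session from the Hive CLI without editing the file:

hive (default)> set hive.execution.engine=tez;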
Start Hive:
$ bin/hive
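Any query that launches a job should now show Tez DAG progress in the console instead of MapReduce stage output (student here is a hypothetical table used only for illustration):

hive (default)> select count(*) from student;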
If a Tez container is detected using too much memory and gets killed by the NodeManager, disable YARN's virtual-memory check by adding the following to yarn-site.xml:
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
xsync yarn-site.xml    # distribute the updated yarn-site.xml to every node in the cluster
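After distributing the change, restart YARN so the NodeManagers pick it up (assuming the standard Hadoop sbin scripts under /opt/module/hadoop-2.7.2):

$ sbin/stop-yarn.sh
$ sbin/start-yarn.sh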
