02.03 Part Two: Data Environment Preparation

Chapter 3: Data Environment Preparation

3.1 Installing Hive 2.3

1) Upload apache-hive-2.3.6-bin.tar.gz to the /opt/software directory and extract it to /opt/module

[atguigu@hadoop102 software]$ tar -zxvf apache-hive-2.3.6-bin.tar.gz -C /opt/module/

2) Rename apache-hive-2.3.6-bin to hive

[atguigu@hadoop102 module]$ mv apache-hive-2.3.6-bin hive

3) Copy MySQL's mysql-connector-java-5.1.27-bin.jar to /opt/module/hive/lib/

[atguigu@hadoop102 module]$ cp /opt/software/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar /opt/module/hive/lib/

4) Create a hive-site.xml file under /opt/module/hive/conf

[atguigu@hadoop102 conf]$ vim hive-site.xml

Add the following content:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>000000</value>
        <description>password to use against metastore database</description>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
        <description>location of default database for the warehouse</description>
    </property>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>datanucleus.schema.autoCreateAll</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop102:9083</value>
    </property>
</configuration>

Note: whichever node Hive is installed on, replace the hostname in thrift://hadoop102:9083 with that node's hostname.

5) Start the services

[atguigu@hadoop102 hive]$ nohup bin/hive --service metastore &

Unlike earlier Hive versions, Hive 2.x requires the metastore service above to be started first.

 

hiveserver2 is not started in this walkthrough; it is only needed for JDBC/Beeline clients. To start it:

[atguigu@hadoop102 hive]$ nohup bin/hive --service hiveserver2 &

Note: with hive.metastore.uris configured, the metastore service must be running before any client starts; otherwise Hive 2.x fails with an error like: Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

6) Check the processes with jps; each running Hive service appears as a RunJar process

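As a quick sanity check, you can count the RunJar entries in the jps output: with only the metastore running there should be one, and two once hiveserver2 is also up. A minimal sketch (the helper name and the sample jps output are hypothetical):

```shell
# Hypothetical helper: count Hive service processes (RunJar) in jps-style output.
count_runjar() {
    # Reads "pid name" lines on stdin and prints how many end in RunJar.
    grep -c ' RunJar$'
}

# Example with fabricated jps output (metastore + hiveserver2 both running):
printf '3301 NameNode\n3502 RunJar\n3617 RunJar\n' | count_runjar
# → 2
```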

7) Once the services are up, start the Hive CLI

[atguigu@hadoop102 hive]$ bin/hive

hive (default)>


3.2 Integrating Hive with the Tez Engine

Tez is an execution engine for Hive that performs better than MapReduce. The figure below shows why.

(Figure: a pipeline of four dependent MR jobs vs. the equivalent single Tez DAG)

Written directly as MapReduce from Hive, suppose there are four MR jobs with dependencies among them. In the figure, the green nodes are Reduce Tasks, and the cloud shapes mark write barriers where intermediate results must be persisted to HDFS.

Tez can merge several dependent jobs into a single job, so HDFS is written only once and there are fewer intermediate stages, which greatly improves job performance.

3.2.1 Preparing the Installation Package

1) Download the Tez package from http://tez.apache.org

2) Copy apache-tez-0.9.1-bin.tar.gz to the /opt/software directory on hadoop102

[atguigu@hadoop102 software]$ ls

apache-tez-0.9.1-bin.tar.gz

3) Extract apache-tez-0.9.1-bin.tar.gz

[atguigu@hadoop102 software]$ tar -zxvf apache-tez-0.9.1-bin.tar.gz -C /opt/module

4) Rename the directory

[atguigu@hadoop102 module]$ mv apache-tez-0.9.1-bin/ tez-0.9.1

3.2.2 Integrating Tez

1) Create a tez-site.xml file under Hive's /opt/module/hive/conf directory

[atguigu@hadoop102 conf]$ pwd

/opt/module/hive/conf

[atguigu@hadoop102 conf]$ vim tez-site.xml

Add the following content:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>tez.lib.uris</name>
        <value>${fs.defaultFS}/tez/tez-0.9.1,${fs.defaultFS}/tez/tez-0.9.1/lib</value>
    </property>
    <property>
        <name>tez.use.cluster.hadoop-libs</name>
        <value>true</value>
    </property>
    <property>
        <name>tez.history.logging.service.class</name>
        <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
    </property>
</configuration>

2) Go to Hive's configuration directory: /opt/module/hive/conf

[atguigu@hadoop102 conf]$ pwd

/opt/module/hive/conf

3) Add the Tez environment variable and dependency-jar settings to hive-env.sh

[atguigu@hadoop102 conf]$ mv hive-env.sh.template hive-env.sh

[atguigu@hadoop102 conf]$ vim hive-env.sh

Add the following configuration:

# Set HADOOP_HOME to point to a specific hadoop install directory
export HADOOP_HOME=/opt/module/hadoop-2.7.2

# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/opt/module/hive/conf

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export TEZ_HOME=/opt/module/tez-0.9.1    # the directory where Tez was extracted
export TEZ_JARS=""
for jar in `ls $TEZ_HOME |grep jar`; do
    export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done
for jar in `ls $TEZ_HOME/lib`; do
    export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done

export HIVE_AUX_JARS_PATH=/opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar$TEZ_JARS
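Note that every entry appended to TEZ_JARS is prefixed with a colon, so the variable always begins with ":"; that is why the HIVE_AUX_JARS_PATH line above can concatenate the hadoop-lzo jar and $TEZ_JARS directly and still get a valid colon-separated list. A minimal sketch of what the loops produce (the jar names here are hypothetical):

```shell
# Sketch of how the loops above assemble TEZ_JARS (jar names are hypothetical).
TEZ_HOME=/opt/module/tez-0.9.1
TEZ_JARS=""
for jar in tez-api-0.9.1.jar tez-common-0.9.1.jar; do
    TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar   # each entry carries its own leading colon
done
echo "$TEZ_JARS"
# → :/opt/module/tez-0.9.1/tez-api-0.9.1.jar:/opt/module/tez-0.9.1/tez-common-0.9.1.jar
```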

4) In hive-site.xml, add the following to switch Hive's execution engine:

<property>
    <name>hive.execution.engine</name>
    <value>tez</value>
</property>

3.2.3 Uploading Tez to the Cluster

1) Upload /opt/module/tez-0.9.1 to the /tez path on HDFS

[atguigu@hadoop102 conf]$ hadoop fs -mkdir /tez

[atguigu@hadoop102 conf]$ hadoop fs -put /opt/module/tez-0.9.1/ /tez

[atguigu@hadoop102 conf]$ hadoop fs -ls /tez

/tez/tez-0.9.1
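The path listed here must match the tez.lib.uris value configured in tez-site.xml earlier. A hypothetical helper that derives that value from the HDFS upload directory (Hadoop itself resolves the ${fs.defaultFS} prefix at runtime):

```shell
# Hypothetical helper: build the tez.lib.uris value for a given HDFS directory.
tez_lib_uris() {
    local hdfs_dir=$1   # e.g. /tez/tez-0.9.1
    # Emit the directory itself plus its lib/ subdirectory, comma-separated,
    # each behind the literal ${fs.defaultFS} placeholder.
    printf '${fs.defaultFS}%s,${fs.defaultFS}%s/lib\n' "$hdfs_dir" "$hdfs_dir"
}

tez_lib_uris /tez/tez-0.9.1
# → ${fs.defaultFS}/tez/tez-0.9.1,${fs.defaultFS}/tez/tez-0.9.1/lib
```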

3.2.4 Testing

1) Start Hive

[atguigu@hadoop102 hive]$ bin/hive

2) Create a test table

hive (default)> create table student(
    id int,
    name string);

3) Insert a row into the table

hive (default)> insert into student values(1,"zhangsan");

4) If no error occurs, the integration works

hive (default)> select * from student;

1          zhangsan

If you instead see the following error, apply the remedies in the next section:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask

3.2.5 Notes

1) Tez jobs killed by the NodeManager for using too much memory:

Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession
has already shutdown. Application application_1546781144082_0005
failed 2 times due to AM Container for
appattempt_1546781144082_0005_000002 exited with exitCode: -103
For more detailed output, check application tracking
page: http://hadoop103:8088/cluster/app/application_1546781144082_0005
Then, click on links to logs of each attempt.
Diagnostics: Container
[pid=11116,containerID=container_1546781144082_0005_02_000001]
is running beyond virtual memory limits. Current usage: 216.3 MB
of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used.
Killing container.

This happens when a container on a worker node tries to use more memory than its limit and is killed by the NodeManager.
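The numbers in the log above are consistent with YARN's defaults: the virtual-memory cap is the container's physical memory (1 GB here) multiplied by yarn.nodemanager.vmem-pmem-ratio, which defaults to 2.1:

```shell
# The "2.1 GB virtual memory" limit in the log is derived from the container's
# physical memory (1 GB) times yarn.nodemanager.vmem-pmem-ratio (default 2.1).
awk 'BEGIN { printf "%.1f GB\n", 1 * 2.1 }'
# → 2.1 GB
```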

[Excerpt] The NodeManager is killing your container. It sounds like
you are trying to use hadoop streaming which is running as a child
process of the map-reduce task. The NodeManager monitors the entire
process tree of the task and if it eats up more memory than the
maximum set in mapreduce.map.memory.mb or
mapreduce.reduce.memory.mb respectively, we would expect the
Nodemanager to kill the task, otherwise your task is stealing memory
belonging to other containers, which you don't want.

2) Solutions:

(1) Disable the virtual-memory check by adding the following to yarn-site.xml. (In the author's test this alone did not resolve the error, which suggests memory was genuinely insufficient.)

<!-- Whether virtual memory limits will be enforced for containers -->
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
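An alternative worth trying before disabling the check entirely (this is a suggestion not in the original steps) is to raise the virtual-to-physical ratio, which defaults to 2.1:

```xml
<!-- Suggested alternative (not from the original steps): raise the
     virtual/physical memory ratio instead of disabling the check -->
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
</property>
```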

If that still does not help, consider adding the following as well:

<property>
    <name>hadoop.zk.address</name>
    <value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value>
</property>


(2) After changing the file, be sure to distribute it to every node and restart the Hadoop cluster.

[atguigu@hadoop102 hadoop]$ xsync yarn-site.xml

Review: become familiar with the common tables in the business data.
