Installing Tez 0.8.5 on CDH


  • 1.1 Prerequisites

1) Install the JDK

2) Install Maven

Download the package: apache-maven-3.5.4-bin.tar.gz

  • Extract:
tar -zxvf apache-maven-3.5.4-bin.tar.gz -C /usr/local/software/maven
  • Configure:
[joy@hadoop002 dev_env]$ vim /etc/profile
export MAVEN_HOME=/usr/local/software/maven/apache-maven-3.5.4
export PATH=${MAVEN_HOME}/bin:$PATH


[joy@hadoop002 maven]$ source /etc/profile
[joy@hadoop002 maven]$ mvn -v
Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-18T02:33:14+08:00)
Maven home: /opt/dev_env/maven/maven
Java version: 1.8.0_91, vendor: Oracle Corporation, runtime: /usr/local/jdk1.8.0_91/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-862.el7.x86_64", arch: "amd64", family: "unix"
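
Re-running these steps would append the same export lines to /etc/profile a second time. Below is a minimal sketch that makes the Maven part idempotent; `append_once` and `configure_maven_env` are hypothetical helpers, and the install path is the one used above.

```shell
#!/bin/sh
# append_once FILE LINE: append LINE to FILE only if an identical line is not
# already present, so running the setup twice does not duplicate exports.
append_once() {
  file=$1
  line=$2
  # -q quiet, -x whole-line match, -F fixed string (no regex interpretation)
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# configure_maven_env [PROFILE]: add the two Maven exports used in this article.
# PROFILE defaults to /etc/profile, which requires root to modify.
configure_maven_env() {
  profile=${1:-/etc/profile}
  append_once "$profile" 'export MAVEN_HOME=/usr/local/software/maven/apache-maven-3.5.4'
  append_once "$profile" 'export PATH=${MAVEN_HOME}/bin:$PATH'
}
```

Run `configure_maven_env` as root, then `source /etc/profile` and `mvn -v` as shown above. The single quotes keep `${MAVEN_HOME}` and `$PATH` unexpanded, so they are evaluated when the profile is sourced.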


cd /opt/dev_env/maven
mkdir mavenRepository
[joy@hadoop002 maven]$ vim $MAVEN_HOME/conf/settings.xml

# Add the mavenRepository path:
<localRepository>/opt/dev_env/maven/mavenRepository/</localRepository>

# Find <mirrors> and add the Aliyun Maven mirror between <mirrors> and </mirrors>:
     <!-- Aliyun Maven mirror -->
     <mirror>
        <id>nexus-aliyun</id>
        <mirrorOf>*,!cloudera</mirrorOf>
        <name>Nexus aliyun</name>
        <url>http://maven.aliyun.com/nexus/content/groups/public</url>
     </mirror>
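
Before building, the edited settings.xml can be sanity-checked. A small sketch; `maven_settings_ok` is a hypothetical helper that only greps for the elements added above, it does not validate the XML.

```shell
#!/bin/sh
# maven_settings_ok SETTINGS_XML: succeed only if the file contains the local
# repository and the nexus-aliyun mirror configured above.
maven_settings_ok() {
  s=$1
  grep -q '<localRepository>' "$s" ||
    { echo "missing <localRepository>" >&2; return 1; }
  grep -q '<id>nexus-aliyun</id>' "$s" ||
    { echo "missing nexus-aliyun mirror" >&2; return 1; }
  # "*,!cloudera" mirrors every repository except the one with id "cloudera",
  # so CDH artifacts are still fetched from Cloudera rather than from Aliyun
  grep -q '<mirrorOf>\*,!cloudera</mirrorOf>' "$s" ||
    { echo "mirrorOf does not exclude cloudera" >&2; return 1; }
}
```

Usage: `maven_settings_ok "$MAVEN_HOME/conf/settings.xml" && echo ok`.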


3) Install protobuf-2.5.0

protobuf 2.5.0 (it must be exactly this version, as the official documentation states): https://github.com/protocolbuffers/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz

Extract:

tar -zxvf protobuf-2.5.0.tar.gz -C ./
  • Install the OS dependency packages:
yum -y install gcc gcc-c++ libstdc++-devel make build
  • Build:
[joy@hadoop002 protobuf]$ ln -s protobuf-2.5.0/ protobuf
[joy@hadoop002 protobuf]$ cd protobuf
[joy@hadoop002 protobuf]$ ./configure
# ./configure --prefix=/usr/local/protobuf # recommended alternative
# (with the default prefix, make install puts the protoc executable in /usr/local/bin)
[joy@hadoop002 protobuf]$ make
[joy@hadoop002 protobuf]$ make install
# Verify the installation: with the default prefix, protoc ends up in /usr/local/bin/
[joy@hadoop002 protobuf]$ /usr/local/bin/protoc --version
# /usr/local/protobuf/bin/protoc --version   (if configured with --prefix=/usr/local/protobuf)
libprotoc 2.5.0
  • Configure environment variables
# protobuf
# Append the following two lines to the end of /etc/profile:
export PATH=$PATH:/usr/local/bin/
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig/
# export PATH=$PATH:/usr/local/protobuf/bin/
# export PKG_CONFIG_PATH=/usr/local/protobuf/lib/pkgconfig/
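
Because the build needs exactly protobuf 2.5.0, a quick check helps when several protoc binaries are on the PATH. A sketch; `protoc_version_ok` is a hypothetical helper.

```shell
#!/bin/sh
# protoc_version_ok [PROTOC]: succeed only if the given protoc (default: the
# first one on PATH) reports exactly "libprotoc 2.5.0".
protoc_version_ok() {
  bin=${1:-protoc}
  v=$("$bin" --version 2>/dev/null) ||
    { echo "cannot run $bin" >&2; return 1; }
  [ "$v" = "libprotoc 2.5.0" ] ||
    { echo "found '$v', need libprotoc 2.5.0" >&2; return 1; }
}
```

Usage: `protoc_version_ok` after `source /etc/profile`, or `protoc_version_ok /usr/local/protobuf/bin/protoc` for the `--prefix` variant.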
  • 1.2 Download and extract Tez

(1) Download the source release from: http://tez.apache.org/releases/

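  • 1.3 Modifications

It is easier to make these edits in a GUI text editor (for example on Windows 10): the files are large, and the places to change are hard to find.

(1) Modify pom.xml

Change 1: set hadoop.version to the version our CDH cluster uses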

<hadoop.version>2.6.0-cdh5.16.2</hadoop.version>
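
The same edit can be scripted with sed. A sketch, assuming the property is written on a single line as above; `set_hadoop_version` is a hypothetical helper and keeps a pom.xml.bak copy of the original.

```shell
#!/bin/sh
# set_hadoop_version POM VERSION: replace the value of the single-line
# <hadoop.version> property; the original file is kept as POM.bak.
set_hadoop_version() {
  pom=$1
  version=$2
  sed -i.bak \
    "s|<hadoop.version>[^<]*</hadoop.version>|<hadoop.version>${version}</hadoop.version>|" \
    "$pom"
}
```

Usage: `set_hadoop_version apache-tez-0.8.5-src/pom.xml 2.6.0-cdh5.16.2`.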

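Changes 2 and 3: add the Cloudera Maven repository to both <repositories> and <pluginRepositories> (needed because the Hadoop environment is a CDH build)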

<repositories>
  <repository>
    <id>${distMgmtSnapshotsId}</id>
    <name>${distMgmtSnapshotsName}</name>
    <url>${distMgmtSnapshotsUrl}</url>
  </repository>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    <name>Cloudera Repositories</name>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
</repositories>


<pluginRepositories>

  <pluginRepository>
    <id>maven2-repository.atlassian</id>
    <name>Atlassian Maven Repository</name>
    <url>https://maven.atlassian.com/repository/public</url>
    <layout>default</layout>
  </pluginRepository>
  <pluginRepository>
    <id>${distMgmtSnapshotsId}</id>
    <name>${distMgmtSnapshotsName}</name>
    <url>${distMgmtSnapshotsUrl}</url>
    <layout>default</layout>
  </pluginRepository>
  <pluginRepository>
    <id>cloudera</id>
    <name>Cloudera Repositories</name>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </pluginRepository>
</pluginRepositories>

Change 4: comment out the tez-ext-service-tests, tez-ui and tez-ui2 modules

# Note: this differs slightly between versions
tez 0.9.2: comment out tez-ext-service-tests and tez-ui
tez 0.8.5: comment out tez-ext-service-tests, tez-ui and tez-ui2
<modules>
    <module>hadoop-shim</module>
    <module>tez-api</module>
    <module>tez-common</module>
    <module>tez-runtime-library</module>
    <module>tez-runtime-internals</module>
    <module>tez-mapreduce</module>
    <module>tez-examples</module>
    <module>tez-tests</module>
    <module>tez-dag</module>
    <!-- <module>tez-ext-service-tests</module> -->
    <!-- <module>tez-ui</module> -->
    <!-- <module>tez-ui2</module> -->
    <module>tez-plugins</module>
    <module>tez-tools</module>
    <module>hadoop-shim-impls</module>
    <module>tez-dist</module>
    <module>docs</module>
  </modules>
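
Commenting the modules out can also be scripted. A sketch, assuming each `<module>` sits on its own line as shown; `comment_out_modules` is a hypothetical helper. It copies the file to pom.xml.orig before editing and does not re-wrap lines that are already commented out.

```shell
#!/bin/sh
# comment_out_modules POM MODULE...: turn "<module>NAME</module>" lines into
# "<!-- <module>NAME</module> -->", keeping the original indentation.
comment_out_modules() {
  pom=$1
  shift
  cp "$pom" "$pom.orig" || return 1
  for m in "$@"; do
    # only lines with nothing but whitespace before <module> match, so a line
    # that is already commented out is left alone
    sed "s|^\([[:space:]]*\)<module>${m}</module>[[:space:]]*\$|\1<!-- <module>${m}</module> -->|" \
      "$pom" > "$pom.tmp" && mv "$pom.tmp" "$pom" || return 1
  done
}
```

Usage for 0.8.5: `comment_out_modules pom.xml tez-ext-service-tests tez-ui tez-ui2`; for 0.9.2, drop tez-ui2 from the list.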

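(2) Modify a Java class

# Go to the source directory:

cd apache-tez-0.8.5-src
vim tez-mapreduce/src/main/java/org/apache/tez/mapreduce/hadoop/mapreduce/JobContextImpl.java

Edit JobContextImpl.java and add the following method at the end of the class (just before its final closing brace). The CDH version of the org.apache.hadoop.mapreduce.JobContext interface declares userClassesTakesPrecedence(), which the upstream Tez class does not implement, so the build fails without it: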

  /**
   * Get the boolean value for the property that specifies which classpath
   * takes precedence when tasks are launched. True - the user's classes take
   * precedence. False - the system's classes take precedence.
   * @return true if the user's classes should take precedence
   */
  @Override
  public boolean userClassesTakesPrecedence() {
    return getJobConf().getBoolean(MRJobConfig.MAPREDUCE_JOB_USER_CLASSPATH_FIRST, false);
  }
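
A missing patch only surfaces when Maven reaches the tez-mapreduce module, so a quick pre-build check can save time. A sketch; `jobcontext_patched` is a hypothetical helper that only greps for the method signature and does not compile anything.

```shell
#!/bin/sh
# jobcontext_patched SRC_ROOT: succeed if the Tez copy of JobContextImpl.java
# already declares userClassesTakesPrecedence(), i.e. the patch is applied.
jobcontext_patched() {
  f="$1/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/hadoop/mapreduce/JobContextImpl.java"
  [ -f "$f" ] || { echo "not found: $f" >&2; return 1; }
  grep -q 'public boolean userClassesTakesPrecedence()' "$f" ||
    { echo "patch missing in $f" >&2; return 1; }
}
```

Usage: `jobcontext_patched /opt/dev_env/apache-tez-0.8.5-src && mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true`.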
  • 1.4 Build
cd /opt/dev_env/apache-tez-0.8.5-src
mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true

1.5 Get the built Tez package: tez-0.8.5.tar.gz

[joy@hadoop002 apache-tez-0.8.5-src]$ cd tez-dist/target/
[joy@hadoop002 target]$ ll
total 57616
drwxrwxr-x. 2 joy joy        6 Jun 12 19:34 archive-tmp
drwxrwxr-x. 2 joy joy       28 Jun 12 19:34 maven-archiver
drwxrwxr-x. 3 joy joy     4096 Jun 12 19:34 tez-0.8.5
drwxrwxr-x. 3 joy joy     4096 Jun 12 19:34 tez-0.8.5-minimal
-rw-rw-r--. 1 joy joy 12623396 Jun 12 19:34 tez-0.8.5-minimal.tar.gz
-rw-rw-r--. 1 joy joy 46359820 Jun 12 19:34 tez-0.8.5.tar.gz
-rw-rw-r--. 1 joy joy     2869 Jun 12 19:34 tez-dist-0.8.5-tests.jar
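
A quick check that the build produced both tarballs: the full one goes to HDFS and the minimal one to the client. A sketch; `tez_dist_ready` is a hypothetical helper.

```shell
#!/bin/sh
# tez_dist_ready SRC_ROOT [VERSION]: check that tez-dist/target contains
# non-empty full and minimal tarballs for VERSION (default 0.8.5).
tez_dist_ready() {
  target="$1/tez-dist/target"
  v=${2:-0.8.5}
  for f in "tez-$v.tar.gz" "tez-$v-minimal.tar.gz"; do
    [ -s "$target/$f" ] || { echo "missing or empty: $target/$f" >&2; return 1; }
  done
}
```

Usage: `tez_dist_ready /opt/dev_env/apache-tez-0.8.5-src && echo ok`.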
  • (1) Upload the tez-0.8.5.tar.gz tarball to HDFS under /user/tez:
[joy@hadoop002 target]$ hdfs dfs -mkdir /user/tez
[joy@hadoop002 target]$ hdfs dfs -chmod -R 775 /user/tez/
[joy@hadoop002 target]$ hdfs dfs -put tez-0.8.5.tar.gz /user/tez/
[joy@hadoop002 target]$ hdfs dfs -ls /user/tez
Found 1 items
-rw-r--r--   3 joy supergroup   46359820 2020-06-15 15:50 /user/tez/tez-0.8.5.tar.gz
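
The four commands can be wrapped in one function. A sketch: `upload_tez_tarball` and the `HDFS_CMD` override are hypothetical (the override only exists so the function can be dry-run). `-mkdir -p` and `-put -f` are standard `hdfs dfs` options; they keep a re-run from failing on an existing directory or file.

```shell
#!/bin/sh
# upload_tez_tarball TARBALL [HDFS_DIR]: create HDFS_DIR (default /user/tez),
# set its permissions to 775, upload TARBALL (overwriting) and list the dir.
# HDFS_CMD is a hypothetical override for the hdfs binary.
upload_tez_tarball() {
  tarball=$1
  dir=${2:-/user/tez}
  hdfs_cmd=${HDFS_CMD:-hdfs}
  "$hdfs_cmd" dfs -mkdir -p "$dir" &&
    "$hdfs_cmd" dfs -chmod -R 775 "$dir" &&
    "$hdfs_cmd" dfs -put -f "$tarball" "$dir/" &&
    "$hdfs_cmd" dfs -ls "$dir"
}
```

Usage, from tez-dist/target: `upload_tez_tarball tez-0.8.5.tar.gz`.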

(2) Create the tez directory

Create a tez directory under /opt/cloudera/parcels/CDH/lib:

sudo mkdir /opt/cloudera/parcels/CDH/lib/tez

Inside it, create a conf directory:

sudo mkdir /opt/cloudera/parcels/CDH/lib/tez/conf
  • Copy the top-level jars of the tez-0.8.5-minimal build output, together with its lib directory, into tez

Populate /opt/cloudera/parcels/CDH/lib/tez as follows:

### Note: copy the contents of tez-0.8.5-minimal, not of tez-0.8.5
# Go to apache-tez-0.8.5-src/tez-dist/target/tez-0.8.5-minimal
[joy@hadoop002 target]$ cd /opt/dev_env/apache-tez-0.8.5-src/tez-dist/target/tez-0.8.5-minimal
# Copy *.jar to /opt/cloudera/parcels/CDH/lib/tez
[joy@hadoop002 tez-0.8.5-minimal]$ sudo cp ./*.jar /opt/cloudera/parcels/CDH/lib/tez
# Copy lib to /opt/cloudera/parcels/CDH/lib/tez
[joy@hadoop002 tez-0.8.5-minimal]$ sudo cp -r ./lib /opt/cloudera/parcels/CDH/lib/tez

[joy@hadoop002 tez-0.8.5-minimal]$ cd /opt/cloudera/parcels/CDH/lib/tez
[joy@hadoop002 tez]$ ll
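
The copy steps can be wrapped in a function that also creates conf/. A sketch; `install_tez_client` is a hypothetical helper, and the real target under /opt/cloudera needs root, so run the script as root rather than as a normal user.

```shell
#!/bin/sh
# install_tez_client MINIMAL_DIR DEST_DIR: copy the top-level jars and the lib/
# directory of the minimal build into DEST_DIR and create DEST_DIR/conf.
install_tez_client() {
  src=$1
  dest=$2
  [ -d "$src/lib" ] || { echo "not a tez minimal build: $src" >&2; return 1; }
  mkdir -p "$dest/conf" &&
    cp "$src"/*.jar "$dest"/ &&
    cp -r "$src/lib" "$dest"/
}
```

Usage: `install_tez_client /opt/dev_env/apache-tez-0.8.5-src/tez-dist/target/tez-0.8.5-minimal /opt/cloudera/parcels/CDH/lib/tez`.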

Put tez-site.xml in the conf folder:

 sudo vim /opt/cloudera/parcels/CDH/lib/tez/conf/tez-site.xml
### Contents:

<configuration>

    <property>
      <name>tez.lib.uris</name>
      <!-- points to the Tez tarball on HDFS -->
      <value>${fs.defaultFS}/user/tez/tez-0.8.5.tar.gz</value>
    </property>
    <property>
      <name>tez.use.cluster.hadoop-libs</name>
      <value>false</value>
      <description>Whether to use the cluster's own Hadoop jars. If true, the minimal Tez tarball can be used; if false, the full tez-0.8.5.tar.gz tarball is required.</description>
    </property>
    <property>
      <name>hive.tez.container.size</name>
      <value>1024</value>
      <description>Set hive.tez.container.size to be the same as or a small multiple (1 or 2 times) of the YARN container size yarn.scheduler.minimum-allocation-mb, but NEVER more than yarn.scheduler.maximum-allocation-mb</description>
    </property>

    <!-- 
    <property>
      <name>tez.container.max.java.heap.fraction</name>
      <description>Added because my machine was short on memory</description>
      <value>0.2</value>
    </property>
    <property> 
       <name>tez.am.launch.cluster-default.env</name>
       <value>LD_LIBRARY_PATH={{HADOOP_COMMON_HOME}}/lib/native</value>
       <description>Tez tasks use the native Hadoop libraries; without this setting they fail with an error</description>
    </property>
    <property>
       <name>mapreduce.admin.user.env</name>
       <value>LD_LIBRARY_PATH={{HADOOP_COMMON_HOME}}/lib/native</value>
       <description>Tez tasks use the native Hadoop libraries; without this setting they fail with an error</description>
    </property>
    -->

</configuration>
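
To confirm that tez.lib.uris points at the tarball uploaded in 1.5, the value can be read from tez-site.xml and tested with `hdfs dfs -test -e`. A sketch: `tez_lib_uri`, `check_tez_lib_uri` and the `HDFS_CMD` override are hypothetical, and the parsing is line-based awk, not a real XML parser, so it assumes the layout shown above. Stripping the `${fs.defaultFS}` prefix works because a bare path is resolved against the default file system.

```shell
#!/bin/sh
# tez_lib_uri TEZ_SITE_XML: print the <value> that follows <name>tez.lib.uris</name>.
tez_lib_uri() {
  awk '/<name>tez\.lib\.uris<\/name>/ { found = 1; next }
       found && /<value>/ { sub(/.*<value>/, ""); sub(/<\/value>.*/, ""); print; exit }' "$1"
}

# check_tez_lib_uri TEZ_SITE_XML: drop the literal ${fs.defaultFS} prefix and
# ask HDFS whether the tarball exists (HDFS_CMD overrides the hdfs binary).
check_tez_lib_uri() {
  prefix='${fs.defaultFS}'
  uri=$(tez_lib_uri "$1")
  path=${uri#"$prefix"}
  [ -n "$path" ] || { echo "tez.lib.uris not found in $1" >&2; return 1; }
  "${HDFS_CMD:-hdfs}" dfs -test -e "$path" ||
    { echo "$path not found on HDFS" >&2; return 1; }
}
```

Usage: `check_tez_lib_uri /opt/cloudera/parcels/CDH/lib/tez/conf/tez-site.xml && echo ok`.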

Copy kryo-2.22.jar from /opt/cloudera/parcels/CDH/jars into the lib folder of the tez directory

[joy@hadoop002 tez]$ sudo cp /opt/cloudera/parcels/CDH/jars/kryo-2.22.jar /opt/cloudera/parcels/CDH/lib/tez/lib/

This prevents the following exception:

java.lang.ClassNotFoundException: com.esotericsoftware.kryo.Serializer

/opt/cloudera/parcels/CDH/lib/tez/lib contains slf4j jars, which produce a lot of log output. On the client you can remove slf4j-api-1.7.10.jar and slf4j-log4j12-1.7.10.jar to reduce logging:

# Renaming them is enough
[joy@hadoop002 lib]$ cd /opt/cloudera/parcels/CDH/lib/tez/lib
sudo mv slf4j-api-1.7.10.jar slf4j-api-1.7.10.jar.bak
sudo mv slf4j-log4j12-1.7.10.jar slf4j-log4j12-1.7.10.jar.bak
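
Both lib/ adjustments fit in one function. A sketch; `prepare_tez_lib` is a hypothetical helper. The globs assume the jar names start with slf4j-api- and slf4j-log4j12- as in this build, and the kryo jar is the kryo-2.22.jar from the CDH parcel.

```shell
#!/bin/sh
# prepare_tez_lib TEZ_HOME [CDH_JARS]: copy kryo from the CDH jars directory
# and disable the bundled slf4j jars by renaming them to *.bak.
prepare_tez_lib() {
  tez_home=$1
  cdh_jars=${2:-/opt/cloudera/parcels/CDH/jars}
  cp "$cdh_jars/kryo-2.22.jar" "$tez_home/lib/" || return 1
  # the globs do not match *.jar.bak, so a second run changes nothing
  for j in "$tez_home"/lib/slf4j-api-*.jar "$tez_home"/lib/slf4j-log4j12-*.jar; do
    [ -e "$j" ] && mv "$j" "$j.bak"
  done
  return 0
}
```

Usage, as root: `prepare_tez_lib /opt/cloudera/parcels/CDH/lib/tez`.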

This completes the basic client configuration.

Copy the tez directory to the same location on the other hosts in the cluster:

scp -r /opt/cloudera/parcels/CDH/lib/tez joy@hadoop003:/opt/cloudera/parcels/CDH/lib/
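
With more than one extra host, a loop saves typing. A sketch; `distribute_tez` and the `DRY_RUN` switch are hypothetical, and it assumes the remote user can write to /opt/cloudera/parcels/CDH/lib/ on each host.

```shell
#!/bin/sh
# distribute_tez USER HOST...: copy the local Tez client directory to the same
# location on every HOST. With DRY_RUN=1 the scp commands are only printed.
distribute_tez() {
  user=$1
  shift
  src=/opt/cloudera/parcels/CDH/lib/tez
  for h in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo scp -r "$src" "$user@$h:/opt/cloudera/parcels/CDH/lib/"
    else
      scp -r "$src" "$user@$h:/opt/cloudera/parcels/CDH/lib/" ||
        { echo "copy to $h failed" >&2; return 1; }
    fi
  done
}
```

Usage: `DRY_RUN=1 distribute_tez joy hadoop003` to preview, then run it again without `DRY_RUN`.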

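Configure the Hive environment variables

  • In Cloudera Manager, open the Hive client configuration.

  • Searching for "client" (客户端) brings up the two settings below directly:
  • HiveServer2 Environment Advanced Configuration Snippet (Safety Valve)
HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH/lib/tez/conf:/opt/cloudera/parcels/CDH/lib/tez/*:/opt/cloudera/parcels/CDH/lib/tez/lib/*
  • Gateway Client Environment Advanced Configuration Snippet (Safety Valve) for hive-env.sh (set the same HADOOP_CLASSPATH value here)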
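
The classpath value can be generated from the install directory, which helps if Tez lives elsewhere. A sketch; `tez_hadoop_classpath` is a hypothetical helper. The `*` entries are Java classpath wildcards (all jars in that directory), so the function quotes them to keep the shell from expanding them.

```shell
#!/bin/sh
# tez_hadoop_classpath [TEZ_HOME]: print the HADOOP_CLASSPATH line for the
# Cloudera Manager safety valves; "*" is left for the JVM to expand.
tez_hadoop_classpath() {
  t=${1:-/opt/cloudera/parcels/CDH/lib/tez}
  printf 'HADOOP_CLASSPATH=%s\n' "$t/conf:$t/*:$t/lib/*"
}
```

Called without arguments, it prints the same line as above.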

Then restart Hive so that the environment variables take effect.

Redeploy the client configuration. The Tez installation is now complete.

Testing:

  • 1. Hive CLI

    2. beeline
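
Both tests can run the same small query. A sketch: the helper names are hypothetical, `ce_user` is the table from the error log below, the HiveServer2 host is a parameter, and 10000 is HiveServer2's default port. `select count(*)` normally runs as a Tez DAG, unless Hive answers it from table statistics (`hive.compute.query.using.stats`).

```shell
#!/bin/sh
# tez_smoke_sql TABLE: print a small aggregation query that runs on Tez.
tez_smoke_sql() {
  printf 'set hive.execution.engine=tez; select count(*) from %s;' "$1"
}

# 1. Hive CLI: smoke_hive_cli TABLE
smoke_hive_cli() {
  hive -e "$(tez_smoke_sql "$1")"
}

# 2. beeline: smoke_beeline TABLE HS2_HOST [USER]
smoke_beeline() {
  beeline -u "jdbc:hive2://$2:10000/default" -n "${3:-$(id -un)}" \
    -e "$(tez_smoke_sql "$1")"
}
```

Usage: `smoke_hive_cli ce_user`, or `smoke_beeline ce_user hadoop002 joy` if HiveServer2 runs on hadoop002.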

  • Errors encountered during installation

  • java.lang.ArithmeticException: / by zero

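Solution:

# Set hive.tez.container.size directly in tez-site.xml (already included in the configuration above; choose a value that fits your cluster), or per session:
set hive.tez.container.size = 1024;

The failing query produced the following output: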
Status: Running (Executing on YARN cluster with App id application_1592204457838_0003)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1                 FAILED     -1          0        0       -1       0       0
Reducer 2             KILLED      1          0        0        1       0       0
--------------------------------------------------------------------------------
VERTICES: 00/02  [>>--------------------------] 0%    ELAPSED TIME: 0.42 s
--------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1592204457838_0003_1_00, diagnostics=[Vertex vertex_1592204457838_0003_1_00 [Map 1] 
killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: ce_user initializer failed, vertex=vertex_1592204457838_0003_1_00 [Map 1], 
java.lang.ArithmeticException: / by zero
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:123)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1592204457838_0003_1_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1592204457838_0003_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
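
The rule in the `hive.tez.container.size` description in tez-site.xml can be checked before picking a value. A sketch; `container_size_ok` is a hypothetical helper, and the three numbers (in MB) must come from your own Hive and YARN configuration. Rejecting a non-positive size is an assumption: a zero per-task memory size is one plausible source of the `/ by zero` in `HiveSplitGenerator`, but the log above does not prove it.

```shell
#!/bin/sh
# container_size_ok SIZE MIN_ALLOC MAX_ALLOC (all in MB): SIZE must be 1x or 2x
# yarn.scheduler.minimum-allocation-mb and never above
# yarn.scheduler.maximum-allocation-mb. Rejecting SIZE <= 0 is an assumption.
container_size_ok() {
  size=$1
  min=$2
  max=$3
  [ "$size" -gt 0 ] ||
    { echo "size must be a positive number of MB" >&2; return 1; }
  [ "$size" -le "$max" ] ||
    { echo "$size exceeds the YARN maximum allocation ($max)" >&2; return 1; }
  [ "$size" -eq "$min" ] || [ "$size" -eq $((2 * min)) ] ||
    { echo "$size is not 1x or 2x the YARN minimum allocation ($min)" >&2; return 1; }
}
```

For example, with a hypothetical 1024 MB minimum and 8192 MB maximum allocation, 1024 and 2048 pass, while 0 and 3072 are rejected.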
