Integrating Tez 0.9.1 with CDH 6.3.2

Table of Contents

1 References
2 My environment
3 Installing Maven
3.1 Download Maven
3.2 Upload and extract
3.3 Configure MVN_HOME
3.4 Verify Maven
4 Installing protobuf-2.5.0.tar.gz
4.1 Download
4.2 Upload and extract
4.3 Run configure
4.4 make
4.5 make install
4.6 Verify protobuf
5 Installing Tez
5.1 Download
5.2 Upload and extract
5.3 Modify pom.xml
5.4 Build with Maven
5.5 Problems encountered
5.6 Successful build
5.7 Upload Tez to HDFS
5.8 Deploy Tez for Hive
6 Configure the Hive environment
6.1 Configure the environment
6.2 Test
6.3 Troubleshooting
6.4 Reducing excessive logging
6.5 Enabling Tez for HiveServer2
6.6 Making Tez the default Hive engine

1 References

https://www.freesion.com/article/9435149734/
https://blog.csdn.net/Shea1992/article/details/101041244
https://www.jianshu.com/p/9fb9f32e1f0f
https://www.jianshu.com/p/45c95a51a8c2
https://blog.csdn.net/weixin_43941899/article/details/105787688

2 My environment

  • Hadoop version: 3.0.0-cdh6.3.2
  • OS: CentOS 7.6
  • JDK: 1.8.0_131

3 Installing Maven

3.1 Download Maven

https://maven.apache.org/download.cgi

3.2 Upload and extract

tar -zxvf apache-maven-3.6.3-bin.tar.gz -C /data/module

3.3 Configure MVN_HOME

[root@cluster2-slave2 ~]# vim /etc/profile

export MVN_HOME=/data/module/apache-maven-3.6.3
export PATH=$PATH:$MVN_HOME/bin
[root@cluster2-slave2 ~]# source /etc/profile

3.4 Verify Maven

[root@cluster2-slave2 ~]# mvn -v

4 Installing protobuf-2.5.0.tar.gz

4.1 Download

Only version 2.5.0 will do: once the Tez 0.9.1 source is unpacked later (section 5.2), its pom.xml shows that 2.5.0 is exactly the version it requires.
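A hedged way to confirm this once the source is unpacked (the path assumes the layout used in section 5.2):

grep protobuf.version /data/module/apache-tez-0.9.1-src/pom.xml
# expected: <protobuf.version>2.5.0</protobuf.version>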

Hadoop uses Protocol Buffers for its RPC communication, so protobuf-2.5.0.tar.gz has to be downloaded and installed. However, it can no longer be downloaded from the old official page at https://code.google.com/p/protobuf/downloads/list.

I found a copy on Baidu Netdisk; the download link is:

Link: https://pan.baidu.com/s/1hm7D2_wxIxMKbN9xnlYWuA
Extraction code: haz4

4.2 Upload and extract

tar -zxvf protobuf-2.5.0.tar.gz

4.3 Run configure

cd protobuf-2.5.0/

[root@cluster2-slave2 protobuf-2.5.0]# ./configure

My first run failed:

checking whether to enable maintainer-specific portions of Makefiles... yes
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking target system type... x86_64-unknown-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for style of include used by make... GNU
checking dependency style of gcc... gcc3
checking for g++... no
checking for c++... no
checking for gpp... no
checking for aCC... no
checking for CC... no
checking for cxx... no
checking for cc++... no
checking for cl.exe... no
checking for FCC... no
checking for KCC... no
checking for RCC... no
checking for xlC_r... no
checking for xlC... no
checking whether we are using the GNU C++ compiler... no
checking whether g++ accepts -g... no
checking dependency style of g++... none
checking how to run the C++ preprocessor... /lib/cpp
configure: error: in `/data/software/protobuf-2.5.0':
configure: error: C++ preprocessor "/lib/cpp" fails sanity check
See `config.log' for more details

The error message:

configure: error: in `/data/software/protobuf-2.5.0':
configure: error: C++ preprocessor "/lib/cpp" fails sanity check

The root cause is a missing C++ toolchain. On CentOS, the following commands fix it:

yum install glibc-headers

yum install gcc-c++ 

Output once the installation finishes:

... (omitted) ...


Installed:
  gcc-c++.x86_64 0:4.8.5-39.el7                                                                                                                  

Dependency Installed:
  libstdc++-devel.x86_64 0:4.8.5-39.el7                                                                                                          

Dependency Updated:
  libstdc++.x86_64 0:4.8.5-39.el7                                                                                                                

Complete!

Running configure again, the checks pass:

[root@cluster2-slave2 protobuf-2.5.0]# ./configure
checking whether to enable maintainer-specific portions of Makefiles... yes
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking target system type... x86_64-unknown-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p

... (omitted) ...

checking whether the g++ linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking dynamic linker characteristics... (cached) GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking for python... /usr/bin/python
checking for the pthreads library -lpthreads... no
checking whether pthreads work without any flags... no
checking whether pthreads work with -Kthread... no
checking whether pthreads work with -kthread... no
checking for the pthreads library -llthread... no
checking whether pthreads work with -pthread... yes
checking for joinable pthread attribute... PTHREAD_CREATE_JOINABLE
checking if more special flags are required for pthreads... no
checking whether to check for GCC pthread/shared inconsistencies... yes
checking whether -pthread is sufficient with -shared... yes
configure: creating ./config.status
config.status: creating Makefile
config.status: creating scripts/gtest-config
config.status: creating build-aux/config.h
config.status: executing depfiles commands
config.status: executing libtool commands
[root@cluster2-slave2 protobuf-2.5.0]# 

4.4 make

[root@cluster2-slave2 protobuf-2.5.0]# make
... (omitted) ...

libtool: link: ranlib .libs/libprotobuf-lite.a
libtool: link: ( cd ".libs" && rm -f "libprotobuf-lite.la" && ln -s "../libprotobuf-lite.la" "libprotobuf-lite.la" )
make[3]: Leaving directory `/data/software/protobuf-2.5.0/src'
make[2]: Leaving directory `/data/software/protobuf-2.5.0/src'
make[1]: Leaving directory `/data/software/protobuf-2.5.0'

4.5 make install

[root@cluster2-slave2 protobuf-2.5.0]# make install
... (omitted) ...

/usr/bin/mkdir -p '/usr/local/include/google/protobuf/io'
 /usr/bin/install -c -m 644  google/protobuf/io/coded_stream.h google/protobuf/io/gzip_stream.h google/protobuf/io/printer.h google/protobuf/io/tokenizer.h google/protobuf/io/zero_copy_stream.h google/protobuf/io/zero_copy_stream_impl.h google/protobuf/io/zero_copy_stream_impl_lite.h '/usr/local/include/google/protobuf/io'
make[3]: Leaving directory `/data/software/protobuf-2.5.0/src'
make[2]: Leaving directory `/data/software/protobuf-2.5.0/src'
make[1]: Leaving directory `/data/software/protobuf-2.5.0/src'

4.6 Verify protobuf

[root@cluster2-slave2 protobuf-2.5.0]# protoc --version
libprotoc 2.5.0

The installation succeeded.

5 Installing Tez

5.1 Download

http://www.apache.org/dyn/closer.lua/tez/0.9.1/

Download the source package and build it for your own environment.

5.2 Upload and extract

[root@cluster2-slave2 software]# tar -zxvf apache-tez-0.9.1-src.tar.gz -C ../module/

5.3 Modify pom.xml

Four changes are needed:

  • (1) Change hadoop.version to match your cluster. Confirm your Hadoop version first (here it is 3.0.0-cdh6.3.2) and update the property accordingly.
  • (2) Add the Cloudera repository (repository.cloudera).
  • (3) Add a Cloudera pluginRepository (pluginRepository.cloudera).
  • (4) Comment out modules that are not needed (tez-ext-service-tests and tez-ui) to reduce downloads and the chance of build failures.

The full pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.tez</groupId>
  <artifactId>tez</artifactId>
  <packaging>pom</packaging>
  <version>0.9.1</version>
  <name>tez</name>

  <licenses>
    <license>
      <name>The Apache Software License, Version 2.0</name>
      <url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
    </license>
  </licenses>

  <organization>
    <name>Apache Software Foundation</name>
    <url>http://www.apache.org</url>
  </organization>

  <properties>
    <maven.test.redirectTestOutputToFile>true</maven.test.redirectTestOutputToFile>
    <clover.license>${user.home}/clover.license</clover.license>
    <hadoop.version>3.0.0-cdh6.3.2</hadoop.version>
    <jetty.version>6.1.26</jetty.version>
    <netty.version>3.6.2.Final</netty.version>
    <pig.version>0.13.0</pig.version>
    <javac.version>1.8</javac.version>
    <slf4j.version>1.7.10</slf4j.version>
    <enforced.java.version>[${javac.version},)</enforced.java.version>
    <distMgmtSnapshotsId>apache.snapshots.https</distMgmtSnapshotsId>
    <distMgmtSnapshotsName>Apache Development Snapshot Repository</distMgmtSnapshotsName>
    <distMgmtSnapshotsUrl>https://repository.apache.org/content/repositories/snapshots</distMgmtSnapshotsUrl>
    <distMgmtStagingId>apache.staging.https</distMgmtStagingId>
    <distMgmtStagingName>Apache Release Distribution Repository</distMgmtStagingName>
    <distMgmtStagingUrl>https://repository.apache.org/service/local/staging/deploy/maven2</distMgmtStagingUrl>
    <failIfNoTests>false</failIfNoTests>
    <protobuf.version>2.5.0</protobuf.version>
    <protoc.path>${env.PROTOC_PATH}</protoc.path>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <scm.url>scm:git:https://git-wip-us.apache.org/repos/asf/tez.git</scm.url>
    <build.time>${maven.build.timestamp}</build.time>
    <frontend-maven-plugin.version>1.4</frontend-maven-plugin.version>
    <findbugs-maven-plugin.version>3.0.1</findbugs-maven-plugin.version>
    <javadoc-maven-plugin.version>2.10.4</javadoc-maven-plugin.version>
    <shade-maven-plugin.version>2.4.3</shade-maven-plugin.version>
  </properties>
  <scm>
    <connection>${scm.url}</connection>
  </scm>

  <distributionManagement>
    <repository>
      <id>${distMgmtStagingId}</id>
      <name>${distMgmtStagingName}</name>
      <url>${distMgmtStagingUrl}</url>
    </repository>
    <snapshotRepository>
      <id>${distMgmtSnapshotsId}</id>
      <name>${distMgmtSnapshotsName}</name>
      <url>${distMgmtSnapshotsUrl}</url>
    </snapshotRepository>
  </distributionManagement>

  <repositories>
    <repository>
      <id>${distMgmtSnapshotsId}</id>
      <name>${distMgmtSnapshotsName}</name>
      <url>${distMgmtSnapshotsUrl}</url>
    </repository>
	<repository>
	  <id>cloudera</id>
	  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
	  <name>Cloudera Repositories</name>
	  <snapshots>
       <enabled>false</enabled>
      </snapshots>
    </repository>
  </repositories>

  <pluginRepositories>
    <pluginRepository>
      <id>maven2-repository.atlassian</id>
      <name>Atlassian Maven Repository</name>
      <url>https://maven.atlassian.com/repository/public</url>
      <layout>default</layout>
    </pluginRepository>
    <pluginRepository>
      <id>${distMgmtSnapshotsId}</id>
      <name>${distMgmtSnapshotsName}</name>
      <url>${distMgmtSnapshotsUrl}</url>
      <layout>default</layout>
    </pluginRepository>
	<pluginRepository>
      <id>cloudera</id>
      <name>Cloudera Repositories</name>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </pluginRepository>
  </pluginRepositories>

  <dependencyManagement>
    <dependencies>
     <!-- omitted in the original post -->
    </dependencies>
  </dependencyManagement>

  <modules>
    <module>hadoop-shim</module>
    <module>tez-api</module>
    <module>tez-common</module>
    <module>tez-runtime-library</module>
    <module>tez-runtime-internals</module>
    <module>tez-mapreduce</module>
    <module>tez-examples</module>
    <module>tez-tests</module>
    <module>tez-dag</module>
	<!--
    <module>tez-ext-service-tests</module>
    <module>tez-ui</module>
	-->
    <module>tez-plugins</module>
    <module>tez-tools</module>
    <module>hadoop-shim-impls</module>
    <module>tez-dist</module>
    <module>docs</module>
  </modules>

  <build>
    <!-- omitted in the original post -->
  </build>

  <profiles>
    <!-- omitted in the original post -->
  </profiles>

  <reporting>
    <!-- omitted in the original post -->
  </reporting>

</project>

5.4 Build with Maven

Note: we don't need javadoc or the tests, so both can be skipped during the build. Run the following command from the directory containing the root pom.xml:

mvn clean package -Dmaven.javadoc.skip=true -Dmaven.test.skip=true
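One hedged note before kicking it off: the root pom.xml resolves the protoc binary from the PROTOC_PATH environment variable (see <protoc.path>${env.PROTOC_PATH}</protoc.path> above), so if the build cannot find protoc, point the variable at the copy installed in section 4 (assuming the default /usr/local prefix) before re-running:

export PROTOC_PATH=/usr/local/bin/protoc
mvn clean package -Dmaven.javadoc.skip=true -Dmaven.test.skip=true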

The build starts (console screenshot omitted).

5.5 Problems encountered

The first build downloads a large number of artifacts, and a download can fail partway through; on my first attempt one dependency failed to download and the build errored out (screenshot omitted). After retrying, all dependency jars downloaded successfully.

The build then hit one more problem:

ApplicationReport.newInstance() fails at line 89:

/data/module/apache-tez-0.9.1-src/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/NotRunningJob.java:[89,29] no suitable method found for newInstance(org.apache.hadoop.yarn.api.records.ApplicationId,org.apache.hadoop.yarn.api.records.ApplicationAttemptId,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int,<nulltype>,org.apache.hadoop.yarn.api.records.YarnApplicationState,java.lang.String,java.lang.String,int,int,org.apache.hadoop.yarn.api.records.FinalApplicationStatus,<nulltype>,java.lang.String,float,java.lang.String,<nulltype>)

Fix: switch to the other overload of ApplicationReport.newInstance(). CDH 6 ships Hadoop 3, whose overload takes one extra long argument (the application launch time, as far as I can tell), so an extra 0 is passed.

Edit tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/NotRunningJob.java and replace the original code:

return ApplicationReport.newInstance(unknownAppId, unknownAttemptId, "N/A",
        "N/A", "N/A", "N/A", 0, null, YarnApplicationState.NEW, "N/A", "N/A",
        0, 0, FinalApplicationStatus.UNDEFINED, null, "N/A", 0.0f, "TEZ_MRR", null);

with the new code:

return ApplicationReport.newInstance(unknownAppId, unknownAttemptId, "N/A",
        "N/A", "N/A", "N/A", 0, null, YarnApplicationState.NEW, "N/A", "N/A",
        0, 0, 0, FinalApplicationStatus.UNDEFINED, null, "N/A", 0.0f, "TEZ_MRR", null);

 

Then rebuild:

mvn clean package -Dmaven.javadoc.skip=true -Dmaven.test.skip=true

5.6 Successful build

The build artifacts land under tez-dist/target; mine are in /data/module/apache-tez-0.9.1-src/tez-dist/target.
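A hedged sanity check of the build output (the exact file list may vary; the two archives named below are the ones used in sections 5.7 and 5.8):

ls /data/module/apache-tez-0.9.1-src/tez-dist/target
# among the entries you should see:
#   tez-0.9.1.tar.gz           <- full package, uploaded to HDFS in 5.7
#   tez-0.9.1-minimal.tar.gz   <- minimal package, whose contents go into the CDH lib dir in 5.8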

5.7 Upload Tez to HDFS

Create a tez directory on HDFS:

[root@cluster2-slave2 target]# hadoop fs -mkdir /user/tez

Upload tez-0.9.1.tar.gz to /user/tez:

[root@cluster2-slave2 target]# hadoop fs -put tez-0.9.1.tar.gz /user/tez
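Optionally verify the upload (a hedged sanity check):

hadoop fs -ls /user/tez
# should list /user/tez/tez-0.9.1.tar.gz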

5.8 Deploy Tez for Hive

(1) Enter the CDH lib directory:

cd /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib

(2) Create the tez directories:

[root@cluster2-slave2 lib]# mkdir -p tez/conf

(3) Create the tez-site.xml file:

[root@cluster2-slave2 lib]# cd tez/conf/
[root@cluster2-slave2 conf]# vim tez-site.xml
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/user/tez/tez-0.9.1.tar.gz</value>
  </property>
  <property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>false</value>
  </property>
</configuration>

(4) Copy the contents of the tez-0.9.1-minimal directory (unpacked from tez-0.9.1-minimal.tar.gz under tez-dist/target) into /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/tez:

[root@cluster2-slave2 tez-0.9.1-minimal]# cp ./*.jar /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/tez
[root@cluster2-slave2 tez-0.9.1-minimal]# cp -r lib /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/tez
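After the copy, a hedged sketch of what the directory should roughly contain:

ls /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/tez
# expected: the tez-*.jar files, a lib/ directory of dependency jars, and conf/tez-site.xml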

 

(5) Distribute the tez directory to the other nodes:

[root@cluster2-slave2 lib]# scp -r tez root@cluster2-slave1:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib
[root@cluster2-slave2 lib]# scp -r tez root@cluster2-master:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib

 

6 Configure the Hive environment

6.1 Configure the environment

In Cloudera Manager, locate the Hive client configuration (the Gateway client environment advanced configuration snippet for hive-env.sh; the exact field name is from memory) and add:

HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/tez/conf:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/tez/*:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/tez/lib/*

Then save and deploy the client configuration so it takes effect.

6.2 Test

The first test failed:

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1            container        FAILED     -1          0        0       -1       0       0  
Reducer 2        container        KILLED      1          0        0        1       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 00/02  [>>--------------------------] 0%    ELAPSED TIME: 0.15 s     
----------------------------------------------------------------------------------------------
20/08/07 13:01:23 INFO SessionState: Map 1: -/-	Reducer 2: 0/1	
Status: Failed
20/08/07 13:01:23 ERROR SessionState: Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1596710469388_0001_1_00, diagnostics=[Vertex vertex_1596710469388_0001_1_00 [Map 1] killed/failed due to:INIT_FAILURE, Fail to create InputInitializerManager, org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
	at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:71)
	at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:89)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:152)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:148)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:148)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:121)
	at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:4101)
	at org.apache.tez.dag.app.dag.impl.VertexImpl.access$3100(VertexImpl.java:205)
	at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:2912)
	at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2859)
	at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2841)
	at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
	at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
	at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1939)
	at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:204)
	at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2317)
	at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2303)
	at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
	at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
	... 25 more
Caused by: java.lang.NoClassDefFoundError: com/esotericsoftware/kryo/Serializer
	at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:404)
	at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:317)
	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:131)
	... 30 more
Caused by: java.lang.ClassNotFoundException: com.esotericsoftware.kryo.Serializer
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 33 more
]

6.3 Troubleshooting

ClassNotFoundException: com.esotericsoftware.kryo.Serializer

The fix: delete or rename hive-exec-2.1.1-cdh6.3.2-core.jar and hive-exec-core.jar under hive/auxlib. I did this on every node (a loop over all nodes is sketched after the rename commands below).

cd /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hive/auxlib

The directory originally contained these symlinks:

hive-exec-2.1.1-cdh6.3.2-core.jar -> ../../../jars/hive-exec-2.1.1-cdh6.3.2-core.jar
hive-exec-core.jar -> hive-exec-2.1.1-cdh6.3.2-core.jar

Rename them:

mv hive-exec-2.1.1-cdh6.3.2-core.jar hive-exec-2.1.1-cdh6.3.2-core.jar.bck
mv hive-exec-core.jar hive-exec-core.jar.bck
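Since the rename has to happen on every node, here is a hedged sketch of doing it in one loop (the hostnames are the ones used in this cluster; it assumes passwordless SSH as root):

for h in cluster2-master cluster2-slave1 cluster2-slave2; do
  ssh root@$h 'cd /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hive/auxlib \
    && mv hive-exec-2.1.1-cdh6.3.2-core.jar hive-exec-2.1.1-cdh6.3.2-core.jar.bck \
    && mv hive-exec-core.jar hive-exec-core.jar.bck'
done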

Then restart Hive for the change to take effect and test again.

Note: set hive.tez.container.size=3020; is required; without it the query errors out.

hive> set hive.tez.container.size=3020;
hive> set hive.execution.engine=tez;

Run a query:

hive> select sum(salemoney) from crm_sal_shop_sale;

(The result prints successfully; the output screenshot is omitted.)

6.4 Reducing excessive logging

The run above clearly produces far too much log output; the steps below cut it down.

The cause: /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/tez/lib contains these two jars:

slf4j-api-1.7.10.jar
slf4j-log4j12-1.7.10.jar

and the Hive log level is currently INFO. Set the Hive log level to ERROR via the Cloudera Manager console, then restart Hive to apply the change.

(Screenshots of the resulting configuration and of the Hive query log are omitted; the log output is far less noisy afterwards.)

Alternatively, you can simply delete these two jars (a removal sketch follows):

slf4j-api-1.7.10.jar
slf4j-log4j12-1.7.10.jar
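A hedged sketch (run on every node that has the tez directory):

rm /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/tez/lib/slf4j-api-1.7.10.jar
rm /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/tez/lib/slf4j-log4j12-1.7.10.jar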

 

At this point, YARN also shows the ApplicationType as Tez.

6.5 Enabling Tez for HiveServer2

The same session-level settings enable Tez through HiveServer2 as well:

set hive.tez.container.size=3020;
set hive.execution.engine=tez;

select brandid,vipid,sku_id,`time` from pro30049.add_to_cart_dt_partition where vipid=1197032 and `time`>=1588953500000 and `time`<=1589039999000;
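For reference, a hedged sketch of running the same test through HiveServer2 with Beeline (the host and the default HiveServer2 port 10000 are assumptions; substitute your own):

beeline -u jdbc:hive2://cluster2-slave2:10000 -n root
# then issue the same set statements and query inside the Beeline session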

6.6 Making Tez the default Hive engine

As shown above, using Tez so far requires per-session settings:

set hive.tez.container.size=3020;
set hive.execution.engine=tez;

One problem I ran into: making this change in the Cloudera Manager Hive configuration (screenshot omitted) should have pushed the settings into hive-site.xml, but in my case they were not distributed, and testing still required the same per-session settings:

set hive.tez.container.size=3020;
set hive.execution.engine=tez;

I then tried editing hive-site.xml directly on one node (cluster2-slave2), restarted the Hive service, and tested again: Tez was now the engine without any per-session settings.

However, the other nodes still default to the MR engine, so the same manual edit is needed on every node that should default to Tez; I have not done that yet. A minimal sketch of the hive-site.xml change follows.
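A hedged sketch of the properties added to hive-site.xml (the values are the ones used per-session above):

<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
<property>
  <name>hive.tez.container.size</name>
  <value>3020</value>
</property>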
