1. Tez depends on Hadoop and must be built against the specific Hadoop version you run. Change the hadoop.version property in the root pom to the target version. protobuf 2.5 must be installed on the build machine. The tez-ui module may fail to build; if so, comment out tez-ui/tez-ui2 in the root pom.
mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true
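Before running the build it is worth checking the protobuf version, and the Hadoop version can usually be overridden on the Maven command line instead of editing the pom (2.6.0 below is just an example):
protoc --version    # should report libprotoc 2.5.0
mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true -Dhadoop.version=2.6.0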
2. When the build finishes, tez-x.y.z.tar.gz and tez-x.y.z-minimal.tar.gz are generated under tez/dist/target. Copy tez-x.y.z.tar.gz to HDFS and unpack the minimal tarball locally:
hadoop fs -copyFromLocal tez-x.y.z.tar.gz /hdfs/tez/
tar xzvf tez-x.y.z-minimal.tar.gz -C /usr/local/tez
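To confirm the upload and the local unpack worked (same paths as above):
hadoop fs -ls /hdfs/tez/    # the full tarball should show up here
ls /usr/local/tez           # the unpacked minimal distribution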
3. Place tez-site.xml under $HADOOP_CONF_DIR, with at least the following property inside its <configuration> element:
<property>
  <name>tez.lib.uris</name>
  <value>/hdfs/tez/tez.tgz</value>
</property>
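Note that tez.lib.uris has to point at exactly the archive uploaded in step 2; if the file on HDFS is still named tez-x.y.z.tar.gz, either rename it or adjust the value. A quick check:
hadoop fs -ls /hdfs/tez/tez.tgz    # must list the archive referenced by tez.lib.uris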
4. Add the Tez libraries to the classpath in $HADOOP_CONF_DIR/hadoop-env.sh:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$TEZ_JARS/*:$TEZ_JARS/lib/*
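TEZ_JARS is not set by anything above; I assume it should point at the directory the minimal tarball was unpacked into in step 2, so a line like this needs to come before the export above:
export TEZ_JARS=/usr/local/tez    # where tez-x.y.z-minimal.tar.gz was extracted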
5. Test Tez:
yarn jar tez-examples.jar orderedwordcount <input> <output>
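A concrete run might look like this (the input path, input file, and jar version are placeholders):
hadoop fs -mkdir -p /tmp/tez-test/input
hadoop fs -put README.txt /tmp/tez-test/input/
yarn jar /usr/local/tez/tez-examples-x.y.z.jar orderedwordcount /tmp/tez-test/input /tmp/tez-test/output
hadoop fs -cat /tmp/tez-test/output/part*    # inspect the word counts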
During testing you may hit a ClassNotFoundException; in my case some LZO-related classes were missing. The fix is simply to add the relevant jars to tez-x.y.z.tar.gz before step 2 and then upload it to HDFS again, as sketched below.
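A repackaging sketch; the hadoop-lzo jar location differs per cluster, so treat that path as an assumption:
mkdir tez-repack && tar xzf tez-x.y.z.tar.gz -C tez-repack
cp /usr/local/hadoop/lib/hadoop-lzo.jar tez-repack/lib/    # whichever jar contains the missing classes
(cd tez-repack && tar czf ../tez-x.y.z.tar.gz .)
hadoop fs -copyFromLocal -f tez-x.y.z.tar.gz /hdfs/tez/    # overwrite the copy from step 2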
6. Enable Tez in Hive:
set hive.execution.engine=tez;
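For example, from the shell (the table name is made up):
hive -e "set hive.execution.engine=tez; select count(*) from some_table;"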
Running an HQL query, I then hit this error:
java.lang.RuntimeException: java.io.IOException: Previous writer likely failed to write hdfs://biphdfs/tmp/hive/pplive/_tez_session_dir/7ecca665-ddbd-4f31-87a7-11d972e096bc/bip.common-0.2.7.518.jar. Failing because I am unlikely to write too.
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:523)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79)
Caused by: java.io.IOException: Previous writer likely failed to write hdfs://biphdfs/tmp/hive/pplive/_tez_session_dir/7ecca665-ddbd-4f31-87a7-11d972e096bc/bip.common-0.2.7.518.jar. Failing because I am unlikely to write too.
at org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeResource(DagUtils.java:978)
at org.apache.hadoop.hive.ql.exec.tez.DagUtils.addTempResources(DagUtils.java:859)
at org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeTempFilesFromConf(DagUtils.java:802)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.refreshLocalResourcesFromConf(TezSessionState.java:228)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:154)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:123)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:520)
It turned out that the jars under $HIVE_HOME/auxlib duplicated jars added through the hiverc file; removing the duplicated jars from auxlib fixed it.
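A quick way to spot such duplicates is to compare the jar names in auxlib with those added via ADD JAR in the hiverc file (the ~/.hiverc location is an assumption):
ls $HIVE_HOME/auxlib/*.jar | xargs -n1 basename | sort > /tmp/auxlib-jars.txt
grep -io 'add jar .*\.jar' ~/.hiverc | awk '{print $NF}' | xargs -n1 basename | sort > /tmp/hiverc-jars.txt
comm -12 /tmp/auxlib-jars.txt /tmp/hiverc-jars.txt    # jar names present in both lists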
7. Install tez-ui. The final verification reported the following error:
[FATAL] [HistoryEventHandlingThread] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[HistoryEventHandlingThread,5,main] threw an Error. Shutting down now...
java.lang.AbstractMethodError: org.codehaus.jackson.map.AnnotationIntrospector.findSerializer(Lorg/codehaus/jackson/map/introspect/Annotated;)Ljava/lang/Object;
at org.codehaus.jackson.map.ser.BasicSerializerFactory.findSerializerFromAnnotation(BasicSerializerFactory.java:362)
at org.codehaus.jackson.map.ser.BeanSerializerFactory.createSerializer(BeanSerializerFactory.java:252)
at org.codehaus.jackson.map.ser.StdSerializerProvider._createUntypedSerializer(StdSerializerProvider.java:782)
at org.codehaus.jackson.map.ser.StdSerializerProvider._createAndCacheUntypedSerializer(StdSerializerProvider.java:735)
at org.codehaus.jackson.map.ser.StdSerializerProvider.findValueSerializer(StdSerializerProvider.java:344)
at org.codehaus.jackson.map.ser.StdSerializerProvider.findTypedValueSerializer(StdSerializerProvider.java:420)
at org.codehaus.jackson.map.ser.StdSerializerProvider._serializeValue(StdSerializerProvider.java:601)
at org.codehaus.jackson.map.ser.StdSerializerProvider.serializeValue(StdSerializerProvider.java:256)
at org.codehaus.jackson.map.ObjectMapper.writeValue(ObjectMapper.java:1604)
at org.codehaus.jackson.jaxrs.JacksonJsonProvider.writeTo(JacksonJsonProvider.java:527)
at com.sun.jersey.api.client.RequestWriter.writeRequestEntity(RequestWriter.java:300)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:204)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:226)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:162)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:237)
at com.sun.jersey.api.client.Client.handle(Client.java:648)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:472)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:321)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301)
at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:357)
at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:53)
at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.java:190)
at java.lang.Thread.run(Thread.java:745)
The cause appears to be a version conflict between the jackson-mapper and jersey-json jars bundled when Tez is built against CDH 5.5: the CDH build ships jersey 1.9 but jackson 1.8, whereas a build against Apache Hadoop 2.6 ships version 1.9 of both. CDH does not seem likely to support Tez, so for now I simply run without the web UI.
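One way to confirm which jackson/jersey versions actually land on the classpath is to list the jars bundled with the Tez distribution next to those shipped with the cluster's Hadoop; the directories below are only illustrative, since CDH and Apache Hadoop lay things out differently:
ls /usr/local/tez/lib | grep -Ei 'jackson|jersey'
ls $HADOOP_HOME/share/hadoop/common/lib | grep -Ei 'jackson|jersey'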