为了更高效地运行存在依赖关系的作业(比如Pig和Hive产生的MapReduce作业),减少磁盘和网络IO,Hortonworks开发了DAG计算框架Tez。Tez是从MapReduce计算框架演化而来的通用DAG计算框架,可作为MapReduceR/Pig/Hive等系统的底层数据处理引擎,它天生融入Hadoop 2.0中的资源管理平台YARN,且由Hadoop 2.0核心人员精心打造,势必将会成为计算框架中的后起之秀。本文将重点介绍Tez的最新进展。
Tez有以下几个特色:
- (1) 丰富的数据流(dataflow,NOT Streaming!)编程接口;
- (2) 扩展性良好的“Input-Processor-Output”运行模型;
- (3) 简化数据部署(充分利用了YARN框架,Tez本身仅是一个客户端编程库,无需事先部署相关服务)
- (4) 性能优于MapReduce
- (5) 优化的资源管理(直接运行在资源管理系统YARN之上)
- (6) 动态生成物理数据流(dataflow)
DAG task
git获取源码
1 | $ git clone git@github.com:apache/tez.git |
一、安装必要软件
1、安装java
1 | # wget http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jdk-7u79-linux-x64.tar.gz # tar -zxvf jdk-7u79-linux-x64.tar.gz # ln -s jdk1.7.0_79 jdk 加入环境变量 # source ~/.bash_profile # java -version java version "1.7.0_79" Java(TM) SE Runtime Environment (build 1.7.0_79-b15) Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode) |
2、安装apache maven
1 | # wget http://archive.apache.org/dist/maven/maven-3/3.0.5/binaries/apache-maven-3.0.5-bin.tar.gz # tar -zxf apache-maven-3.0.5-bin.tar.gz # ln -s apache-maven-3.0.5 maven # echo "export MAVEN_HOME=/usr/local/maven" >> ~/.bash_profile # echo "export PATH=$MAVEN_HOME/bin:$PATH:$HOME/bin" >> ~/.bash_profile # source ~/.bash_profile # mvn -version Apache Maven 3.0.5 (r01de14724cdef164cd33c7c8c2fe155faf9602da; 2013-02-19 21:51:28+0800) Maven home: /usr/local/maven Java version: 1.7.0_79, vendor: Oracle Corporation Java home: /usr/local/jdk1.7.0_79/jre Default locale: en_US, platform encoding: UTF-8 OS name: "linux", version: "2.6.32-504.el6.x86_64", arch: "amd64", family: "unix" |
3、Protocol Buffers 2.5.0
1 | # tar -zxvf protobuf-2.5.0.tar.gz # cd protobuf-2.5.0 # ./configure --prefix=/usr/local/protobuf-2.5/ # make && make install # ln -s protobuf-2.5 protobuf #在~/.bash_porfile添加 export PATH=/usr/local/protobuf/bin # source ~/.bash_profile # protoc --version libprotoc 2.5.0 |
注意: 如果是ubuntu系统,编译报错,执行sudo apt-get install build-essential,来安装许多基本库,然后编译即可顺利完成!
4、编译tez
github获取某个release版本
1 | # git clone https://github.com/apache/tez.git # cd tez/ 修改pom中hadoop版本为<hadoop.version>2.4.0</hadoop.version> # git tag release-0.2.0-rc0 release-0.3.0-incubating release-0.3.0-incubating-rc0 release-0.3.0-incubating-rc1 release-0.4.0-incubating release-0.4.1-incubating release-0.5.0 release-0.5.1 release-0.5.2 release-0.5.3 release-0.5.4 release-0.6.0 release-0.6.1 release-0.7.0 # git show-branch [master] TEZ-2610: yarn-swimlanes for DAGs that use containers of a previous DAG (gopalv) |
5、 change hadoop-version为你的版本
Build tez
1 | # mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true 报错1: [ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:0.0.22:install-node-and-npm (install node and npm) on project tez-ui: Could not download npm: Could not download http://registry.npmjs.org/npm/-/npm-1.3.8.tgz: Unknown host registry.npmjs.org: Name or service not known -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn <goals> -rf :tez-ui 解决:https://issues.apache.org/jira/browse/TEZ-2229 # cd tez-ui/ then I changed tez-ui/pom.xml, add '--allow-root' argument: <configuration> <workingDirectory>$ {webappDir} </workingDirectory> <executable>$ {node.executable} </executable> <arguments> <argument>node_modules/bower/bin/bower</argument> <argument>install</argument> <argument>--remove-unnecessary-resolutions=false</argument> <argument>--allow-root</argument> </arguments> </configuration> --通过手动安装tez后解决问题如下 # wget http://mirror.bit.edu.cn/apache/tez/0.7.0/apache-tez-0.7.0-src.tar.gz # cd apache-tez-0.7.0-src # mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true # yum -y install gcc make gcc-c++ openssl # wget http://nodejs.org/dist/v0.10.26/node-v0.10.26.tar.gz # tar -zxvf node-v0.10.26.tar.gz # make && make install # node -v v0.10.26 --继续编译tez # mvn -X clean package -DskipTests=true -Dmaven.javadoc.skip=true 报错2: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project tez-yarn-timeline-history-with-acls: Compilation failure: Compilation failure: [ERROR] symbol: class TimelineDomain 原因:由于修改了pom中hadoop.version=2.4.0,而tez-0.7默认是2.6.0版本 --继续编译 # mvn -X clean package -DskipTests=true -Dmaven.javadoc.skip=true [INFO] ------------------------------------------------------------------------ [INFO] Reactor Summary: [INFO] [INFO] tez ............................................... SUCCESS [3.563s] [INFO] tez-api ........................................... SUCCESS[24.276s] [INFO] tez-common ........................................ SUCCESS [1.977s] [INFO] tez-runtime-internals ............................. SUCCESS [3.696s] [INFO] tez-runtime-library ............................... SUCCESS[11.011s] [INFO] tez-mapreduce ..................................... SUCCESS [4.218s] [INFO] tez-examples ...................................... SUCCESS [1.187s] [INFO] tez-dag ........................................... SUCCESS[10.370s] [INFO] tez-tests ......................................... SUCCESS [3.100s] [INFO] tez-ui ............................................ SUCCESS [14.876s] [INFO] tez-plugins ....................................... SUCCESS [0.097s] [INFO] tez-yarn-timeline-history ......................... SUCCESS [2.782s] [INFO] tez-yarn-timeline-history-with-acls ............... SUCCESS [2.044s] [INFO] tez-mbeans-resource-calculator .................... SUCCESS [2.047s] [INFO] tez-tools ......................................... SUCCESS [0.263s] [INFO] tez-dist .......................................... SUCCESS [1:17:56.763s] [INFO] Tez ............................................... SUCCESS [0.032s] [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 1:19:24.309s [INFO] Finished at: Sat Jul 18 07:33:17 HKT 2015 [INFO] Final Memory: 43M/202M [INFO] ------------------------------------------------------------------------ # tree -L 1 . |-- BUILDING.txt |-- CHANGES.txt |-- INSTALL.md -> docs/src/site/markdown/install.md |-- KEYS |-- LICENSE.txt |-- NOTICE.txt |-- README.md |-- Tez_DOAP.rdf |-- build-tools |-- docs |-- pom.xml |-- target |-- tez-api |-- tez-common |-- tez-dag |-- tez-dist |-- tez-examples |-- tez-mapreduce |-- tez-plugins |-- tez-runtime-internals |-- tez-runtime-library |-- tez-tests |-- tez-tools `-- tez-ui # tar zcf apache-tez-0.7.0.tar.gz apache-tez-0.7.0-src # sz apache-tez-0.7.0.tar.gz |
总结:编译阶段不断重试,国内网络大家都懂,各种包下载不惜来,多重试几次就ok了
Tez版本兼容情况
https://cwiki.apache.org/confluence/display/TEZ/Version+Compatibility
二、tez install
1、上传编译后的Tez的tarball到hdfs
1 | # rz rz waiting to receive. ¿ª' zmodem ´«?¡£ °´ Ctrl+C ??¡£ Transferring apache-tez-0.7.0.tar.gz... 100% 128772 KB 11706 KB/s 00:00:11 0 Errors # tar -zxvf apache-tez-0.7.0.tar.gz # cd tez-dist/target/ # tree -L 2 . |-- archive-tmp |-- maven-archiver | `-- pom.properties |-- tez-0.7.0 | |-- lib | |-- tez-api-0.7.0.jar | |-- tez-common-0.7.0.jar | |-- tez-dag-0.7.0.jar | |-- tez-examples-0.7.0.jar | |-- tez-mapreduce-0.7.0.jar | |-- tez-mbeans-resource-calculator-0.7.0.jar | |-- tez-runtime-internals-0.7.0.jar | |-- tez-runtime-library-0.7.0.jar | |-- tez-tests-0.7.0.jar | |-- tez-ui-0.7.0.war | |-- tez-yarn-timeline-history-0.7.0.jar | `-- tez-yarn-timeline-history-with-acls-0.7.0.jar |-- tez-0.7.0-minimal.tar.gz |-- tez-0.7.0.tar.gz `-- tez-dist-0.7.0-tests.jar # ls apache-tez-0.7.0-src/tez-dist/target/ archive-tmp maven-archiver tez-0.7.0 tez-0.7.0-minimal.tar.gz tez-0.7.0.tar.gz tez-dist-0.7.0-tests.jar # hadoop version Hadoop 2.4.0 Subversion Unknown -r Unknown Compiled by root on 2014-07-27T04:58Z Compiled with protoc 2.5.0 From source with checksum 375b2832a6641759c6eaf6e3e998147 This command was run using /usr/local/hadoop-2.4.0/share/hadoop/common/hadoop-common-2.4.0.jar # hadoop fs -mkdir -p /apps/tez-0.7.0 # hadoop fs -copyFromLocal apache-tez-0.7.0-src/tez-dist/target/tez-0.7.0.tar.gz /apps/tez-0.7.0 |
2、解压tez部署包
1 | # tar -zxvf tez-0.7.0.tar.gz # cp apache-tez-0.7.0-src/tez-dist/target/tez-0.7.0.tar.gz . # mkdir tez-0.7.0 # tar zxf tez-0.7.0.tar.gz -C tez-0.7.0 # ln -s tez-0.7.0 tez |
3、配置环境变量
1 | export TEZ_HOME=/usr/local/tez export TEZ_CONF_DIR=$TEZ_HOME/conf export TEZ_JARS=$TEZ_HOME export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/* |
4、配置文件
1 | # mkdir tez/conf # cat tez/conf/tez-site.xml <?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>tez.lib.uris</name> <value>${fs.defaultFS}/apps/tez-0.7.0/tez-0.7.0.tar.gz</value> </property> </configuration> |
5、修改 mapred-site.xml
1 | <configuration> <property> <name>mapreduce.framework.name</name> <value>yarn-tez</value> </property> </configuration> |
6、example of using an MRR job in the tez-examples.jar
1 | # hadoop fs -cat /testdata/external_table/external_teble 1 zhangsan 2 lisi 3 wanwu 4 zhaoliu 5 da 6 abc # hadoop fs -put tez-0.7.0 /apps # hadoop jar tez-examples-0.7.0.jar orderedwordcount /apps/tez-0.7.0/conf/tez-site.xml /output INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://mycluster/apps/tez-0.7.0,hdfs://mycluster/apps/tez-0.7.0/lib/ INFO client.DAGClientImpl: DAG initialized: CurrentState=Running 报错: 15/07/18 02:17:38 INFO examples.OrderedWordCount: DAG diagnostics: [Application application_1437156593890_0002 failed 2 times due to AM Container for appattempt_1437156593890_0002_000002 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: org.apache.hadoop.util.Shell$ExitCodeException: at org.apache.hadoop.util.Shell.runCommand(Shell.java:505) at org.apache.hadoop.util.Shell.run(Shell.java:418) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Container exited with a non-zero exit code 1 .Failing this attempt.. Failing the application.] |
7、hive on tez
hive版本说明:hive-0.14+
版本为hive-0.13.1和hadoop2.4测试失败
1 | 修改hive的执行引擎为 Tez <property> <name>hive.execution.engine</name> <value>tez</value> </property> # cp tez-0.7.0/tez-*.jar hive/lib/ # cp tez-0.7.0/lib/* hive/lib/ hive (default)> set hive.execution.engine; hive.execution.engine=mr hive (default)> set hive.execution.engine=tez; hive (default)> select count(*) from order_test; Total jobs = 1 Launching Job 1 out of 1 Exception in thread "Thread-6" java.lang.NoSuchMethodError: org.apache.tez.mapreduce.hadoop.MRHelpers.updateEnvironmentForMRAM(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Map;)V |
升级hadoop版本为2.6,hive版本为1.2.1测试
1 | $ mv hadoop-2.6.0/share/hadoop/yarn/lib/jline-0.9.94.jar hadoop-2.6.0/share/hadoop/yarn/lib/jline-0.9.94.jar.bak [hsu@server2 ~]$ hive Logging initialized using configuration in file:/home/hsu/apache-hive-1.2.1-bin/conf/hive-log4j.properties hive (default)> show tables; OK tab_name test Time taken: 2.33 seconds, Fetched: 1 row(s) hive (default)> select * from test; OK test.id 1 2 3 Time taken: 0.648 seconds, Fetched: 3 row(s) hive (default)> set hive.execution.engine; hive.execution.engine=tez hive (default)> select count(*) from test; Query ID = hsu_20150813161742_a64d46a7-7349-41b3-be27-9a91bb490a8b Total jobs = 1 Launching Job 1 out of 1 Status: Running (Executing on YARN cluster with App id application_1439448136822_0002) -------------------------------------------------------------------------------- VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED -------------------------------------------------------------------------------- Map 1 .......... SUCCEEDED 1 1 0 0 0 0 Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0 -------------------------------------------------------------------------------- VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 51.17 s -------------------------------------------------------------------------------- OK _c0 3 Time taken: 54.354 seconds, Fetched: 1 row(s) #队列使用 set tez.queue.name=eda; hive (default)> set tez.queue.name=test; hive (default)> select count(*) from test; Query ID = hsu_20150813162458_b0ce5d5f-09d2-40ae-8b6a-90f1f4cebe91 Total jobs = 1 Launching Job 1 out of 1 Status: Running (Executing on YARN cluster with App id application_1439448136822_0003) -------------------------------------------------------------------------------- VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED -------------------------------------------------------------------------------- Map 1 .......... SUCCEEDED 1 1 0 0 0 0 Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0 -------------------------------------------------------------------------------- VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 32.71 s -------------------------------------------------------------------------------- OK _c0 3 Time taken: 84.191 seconds, Fetched: 1 row(s) hive (default)> set hive.execution.engine; hive.execution.engine=tez hive (default)> set hive.execution.engine=mr; hive (default)> select count(*) from test; Query ID = hsu_20150813162723_31559eda-c8ab-4ca6-a445-b75afc908d28 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Starting Job = job_1439448136822_0004, Tracking URL = http://server1:23188/proxy/application_1439448136822_0004/ Kill Command = /home/hsu/hadoop/bin/hadoop job -kill job_1439448136822_0004 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2015-08-13 16:28:33,107 Stage-1 map = 0%, reduce = 0% 2015-08-13 16:29:20,688 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.99 sec 2015-08-13 16:29:49,215 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.27 sec MapReduce Total cumulative CPU time: 2 seconds 270 msec Ended Job = job_1439448136822_0004 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.27 sec HDFS Read: 210 HDFS Write: 2 SUCCESS Total MapReduce CPU Time Spent: 2 seconds 270 msec OK _c0 3 Time taken: 148.332 seconds, Fetched: 1 row(s) |
-
- hive on tez,直接配置hive支持tez,其实是在yarn上面启动一个app,这个app永久运行,接受不同sql,除非退出hive cli此app才会结束
-
- 可以通过tez提高的ui查看任务执行情况,需要单独配置tezUI!
-
- tez runing app在yarn上面和spark on yarn很类似,而且tez,spark底层原理也很相似!
-
- 在切换队列时,会重启一个app运行!
-
- 在切换底层为mr引擎的时候也会重启app运行,但是tez那个app也没结束,等待mr app运行结束后,两个一起结束!
8、排查问题
-
1、取消所有tez依赖单独运行
- 结论:可以成功运行yarn任务,断定是不兼容问题,由于我编译tez使用了2.6.0
- 编译通过,而是用2.4.0的包编译失败!目前集群hadoop版本2.4.0!
-
2、升级hadoop版本为2.6.0再次测试
升级为2.6.0以后测试成功,说明tez兼容性还是比较差。
1 | $ hadoop jar /usr/local/tez/tez-examples-0.7.0.jar orderedwordcount /apps/tez- 0.7.0/conf/tez-site.xml /output 15/07/20 13:53:52 INFO client.TezClient: Tez Client Version: [ component=tez- api, version=0.7.0, revision=${buildNumber}, SCM-URL=scm:git:https://git-wip-us .apache.org/repos/asf/tez.git, buildTime=20150718-0613 ] 15/07/20 13:53:52 INFO client.RMProxy: Connecting to ResourceManager at itr- mastertest01/192.168.2.200:8032 15/07/20 13:54:13 INFO examples.OrderedWordCount: Running OrderedWordCount 15/07/20 13:54:13 INFO client.TezClient: Submitting DAG application with id: application_1437371604990_0001 15/07/20 13:54:13 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://mycluster/apps/tez-0.7.0,hdfs://mycluster/apps/tez-0.7.0/ lib/ 15/07/20 13:54:13 INFO client.TezClient: Stage directory /tmp/hsu/tez/staging doesn't exist and is created 15/07/20 13:54:14 INFO client.TezClient: Tez system stage directory hdfs:// mycluster/tmp/hsu/tez/staging/.tez/application_1437371604990_0001 doesn't exist and is created 15/07/20 13:54:14 INFO client.TezClient: Submitting DAG to YARN, applicationId=application_1437371604990_0001, dagName=OrderedWordCount 15/07/20 13:54:14 INFO impl.YarnClientImpl: Submitted application application_1437371604990_0001 15/07/20 13:54:14 INFO client.TezClient: The url to track the Tez AM: http:// server1:8088/proxy/application_1437371604990_0001/ 15/07/20 13:54:14 INFO client.RMProxy: Connecting to ResourceManager at itr- mastertest01/192.168.2.200:8032 15/07/20 13:54:15 INFO client.DAGClientImpl: Waiting for DAG to start running 15/07/20 13:55:06 INFO client.DAGClientImpl: DAG initialized: CurrentState=Running 15/07/20 13:55:08 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:08 INFO client.DAGClientImpl: VertexStatus: VertexName: Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:08 INFO client.DAGClientImpl: VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:08 INFO client.DAGClientImpl: VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:13 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:13 INFO client.DAGClientImpl: VertexStatus: VertexName: Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:13 INFO client.DAGClientImpl: VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:13 INFO client.DAGClientImpl: VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:19 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:19 INFO client.DAGClientImpl: VertexStatus: VertexName: Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:19 INFO client.DAGClientImpl: VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:19 INFO client.DAGClientImpl: VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:24 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:24 INFO client.DAGClientImpl: VertexStatus: VertexName: Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:24 INFO client.DAGClientImpl: VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:24 INFO client.DAGClientImpl: VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:29 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:29 INFO client.DAGClientImpl: VertexStatus: VertexName: Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:29 INFO client.DAGClientImpl: VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:29 INFO client.DAGClientImpl: VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:34 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:34 INFO client.DAGClientImpl: VertexStatus: VertexName: Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:34 INFO client.DAGClientImpl: VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:34 INFO client.DAGClientImpl: VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:39 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 3 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 15/07/20 13:55:39 INFO client.DAGClientImpl: VertexStatus: VertexName: Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 15/07/20 13:55:39 INFO client.DAGClientImpl: VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:39 INFO client.DAGClientImpl: VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:44 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 3 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 15/07/20 13:55:44 INFO client.DAGClientImpl: VertexStatus: VertexName: Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 15/07/20 13:55:44 INFO client.DAGClientImpl: VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:44 INFO client.DAGClientImpl: VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:49 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 3 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 15/07/20 13:55:49 INFO client.DAGClientImpl: VertexStatus: VertexName: Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 15/07/20 13:55:49 INFO client.DAGClientImpl: VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:49 INFO client.DAGClientImpl: VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:54 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 3 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 15/07/20 13:55:54 INFO client.DAGClientImpl: VertexStatus: VertexName: Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 15/07/20 13:55:54 INFO client.DAGClientImpl: VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:54 INFO client.DAGClientImpl: VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:59 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 3 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 15/07/20 13:55:59 INFO client.DAGClientImpl: VertexStatus: VertexName: Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 15/07/20 13:55:59 INFO client.DAGClientImpl: VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:55:59 INFO client.DAGClientImpl: VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:56:01 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 33.33% TotalTasks: 3 Succeeded: 1 Running: 1 Failed: 0 Killed: 0 15/07/20 13:56:01 INFO client.DAGClientImpl: VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0 15/07/20 13:56:01 INFO client.DAGClientImpl: VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 15/07/20 13:56:01 INFO client.DAGClientImpl: VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0 15/07/20 13:56:03 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 66.67% TotalTasks: 3 Succeeded: 2 Running: 1 Failed: 0 Killed: 0 15/07/20 13:56:03 INFO client.DAGClientImpl: VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0 15/07/20 13:56:03 INFO client.DAGClientImpl: VertexStatus: VertexName: Summation Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0 15/07/20 13:56:03 INFO client.DAGClientImpl: VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 15/07/20 13:56:04 INFO client.DAGClientImpl: DAG: State: SUCCEEDED Progress: 100% TotalTasks: 3 Succeeded: 3 Running: 0 Failed: 0 Killed: 0 15/07/20 13:56:04 INFO client.DAGClientImpl: VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0 15/07/20 13:56:04 INFO client.DAGClientImpl: VertexStatus: VertexName: Summation Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0 15/07/20 13:56:04 INFO client.DAGClientImpl: VertexStatus: VertexName: Sorter Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0 15/07/20 13:56:04 INFO client.DAGClientImpl: DAG completed. FinalState=SUCCEEDED |
$ hadoop jar tez-examples-0.7.0.jar orderedwordcount -DUSE_TEZ_SESSION=true /test /output
- 3、 yarn底层替换为tez,hive直接执行,效果如下!
1
[hsu@server1 ~]$ hive Logging initialized using configuration in file:/home/hsu/apache-hive-1.2.1-bin /conf/hive-log4j.properties hive (default)> show tables; OK tab_name test Time taken: 1.106 seconds, Fetched: 1 row(s) hive (default)> select count(1) from test; Query ID = hsu_20150813170521_77b18935-4f38-4f84-8da7-6774b9aff194 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Starting Job = job_1439456630574_0001, Tracking URL = http://itr- mastertest01:23188/proxy/application_1439456630574_0001/ Kill Command = /home/hsu/hadoop/bin/hadoop job -kill job_1439456630574_0001 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0 2015-08-13 17:06:36,372 Stage-1 map = 0%, reduce = 0% 2015-08-13 17:07:26,923 Stage-1 map = 100%, reduce = 100% Ended Job = job_1439456630574_0001 MapReduce Jobs Launched: Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 0 msec OK _c0 3 Time taken: 126.726 seconds, Fetched: 1 row(s)
注意:yarn页面:app tyep 直接变成TEZ,但是如上执行还是mr执行程序日志~!这里比较难理解具体是mr执行的还是tez执行的!还得去底层跟踪确认!
(1)、yarn mr替换为tez,hive不做任务配置,提交任务到yarn,效果如下图:
(2)、不同tez版本支持的hadoop版本需要对应
(3)、编译时,我手动制定hadoop.version它有回出现yarn模块兼容性编译通不过。而默认的它指定的hadoop版本可以成功编译!
(4)、tez从另一个方面讲还不算成熟!~
(5)、支持ResourceManager HA
- 3、目前xxx on yarn模式情况
(1) tez on yarn模式主要是hdp在推,相信未来和yarn集成兼容性各种问题都能迎刃而解!
(2) spark engine on yarn模式,主要是cdh在推,目前已经完成合并到hive主干!
ps:官网安装文档都说,自己可以修改hadoop.version进行编译,实际修改编译失败,比较费解,也说明tez的兼容性不好!
9、tez ui
(1) 配置tez-site.xml
1 | $ cat tez-site.xml <?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>tez.lib.uris</name> <value>${fs.defaultFS}/apps/tez-0.7.0,${fs.defaultFS}/apps/tez-0.7.0/lib/</value> </property> <property> <description>Enable Tez to use the Timeline Server for History Logging</description> <name>tez.history.logging.service.class</name> <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value> </property> <property> <description>URL for where the Tez UI is hosted</description> <!--tomcat 9999 端口--> <name>tez.tez-ui.history-url.base</name> <value>http://server2:9999/tez-ui/</value> </property> <property> <description>Publish configuration information to Timeline server.</description> <name>tez.runtime.convert.user-payload.to.history-text</name> <value>true</value> </property> <property> <name>tez.task.generate.counters.per.io</name> <value>true</value> </property> </configuration> |
(2) 配置yarn-site.xml
1 | <property> <description>Indicate to clients whether Timeline service is enabled or not. If enabled, the TimelineClient library used by end-users will post entities and events to the Timeline server.</description> <name>yarn.timeline-service.enabled</name> <value>true</value> </property> <property> <description>The hostname of the Timeline service web application.</description> <name>yarn.timeline-service.hostname</name> <value>server2</value> </property> <property> <description>Enables cross-origin support (CORS) for web services where cross-origin web response headers are needed. For example, javascript making a web services request to the timeline server.</description> <name>yarn.timeline-service.http-cross-origin.enabled</name> <value>true</value> </property> <property> <description>Publish YARN information to Timeline Server</description> <name>yarn.resourcemanager.system-metrics-publisher.enabled</name> <value>true</value> </property> <property> <description>The http address of the Timeline service web application.</description> <name>yarn.timeline-service.webapp.address</name> <value>${yarn.timeline-service.hostname}:8188</value> </property> <property> <description>The https address of the Timeline service web application.</description> <name>yarn.timeline-service.webapp.https.address</name> <value>${yarn.timeline-service.hostname}:2191</value> </property> |
(3) 启动hadoop timeline
1 | [hsu@server2 hadoop]$ yarn-daemon.sh start timelineserver [hsu@server2 ~]$ lsof -i :8188 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME java 7189 hsu 281u IPv6 71981 0t0 TCP server2:8188 (LISTEN) |
(4) tez ui war包解压放到tomcat的webapps下
1 | ## 在server2上部署tomcat $ tar -zxvf apache-tomcat-8.0.24.tar.gz [hsu@server2 ~]$ ln -s apache-tomcat-8.0.24 tomcat [hsu@server2 ~]$ cp /usr/local/tez/tez-ui-0.7.0.war ./tomcat/webapps/ [hsu@server2 webapps]$ mkdir tez-ui [hsu@server2 webapps]$ mv tez-ui-0.7.0.war tez-ui/ [hsu@server2 webapps]$ cd tez-ui/ [hsu@server2 tez-ui]$ jar xvf tez-ui-0.7.0.war ## 配置scripts/config.js文件 --Configuring the Timeline Server URL and Resource Manager UI URL timelineBaseUrl: 'http://server1:8188', RMWebUrl: 'http://server1:23188', |
(5) 启动tomcat
1 | [hsu@server2 ~]$ tomcat/bin/startup.sh Using CATALINA_BASE: /home/hsu/tomcat Using CATALINA_HOME: /home/hsu/tomcat Using CATALINA_TMPDIR: /home/hsu/tomcat/temp Using JRE_HOME: /usr/local/jdk1.7.0_45 Using CLASSPATH: /home/hsu/tomcat/bin/bootstrap.jar:/home/hsu/tomcat/bin/tomcat-juli.jar Tomcat started. #修改端口为9999 [hsu@server2 ~]$ tomcat/bin/shutdown.sh Using CATALINA_BASE: /home/hsu/tomcat Using CATALINA_HOME: /home/hsu/tomcat Using CATALINA_TMPDIR: /home/hsu/tomcat/temp Using JRE_HOME: /usr/local/jdk1.7.0_45 Using CLASSPATH: /home/hsu/tomcat/bin/bootstrap.jar:/home/hsu/tomcat/bin/tomcat-juli.jar [hsu@server2 ~]$ tomcat/bin/startup.sh [hsu@server2 ~]$ lsof -i :9999 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME java 7824 hsu 43u IPv6 75557 0t0 TCP *:distinct (LISTEN) |
(6) 验证
1、webui验证:http://server2:9999/tez-ui/
2、后台执行sql验证前台
注意:resourcemanager节点必须为timelineserver配置的节点,如果RS配置HA的需要特别注意!
参考:http://zh.hortonworks.com/hadoop-tutorial/supercharging-interactive-queries-hive-tez/
http://tez.apache.org/tez-ui.html
http://zh.hortonworks.com/blog/evaluating-hive-with-tez-as-a-fast-query-engine