hive tez-use

http://www.itweet.cn/2015/07/20/tez-use/
文章目录
  1. 1. DAG task

为了更高效地运行存在依赖关系的作业(比如Pig和Hive产生的MapReduce作业),减少磁盘和网络IO,Hortonworks开发了DAG计算框架Tez。Tez是从MapReduce计算框架演化而来的通用DAG计算框架,可作为MapReduceR/Pig/Hive等系统的底层数据处理引擎,它天生融入Hadoop 2.0中的资源管理平台YARN,且由Hadoop 2.0核心人员精心打造,势必将会成为计算框架中的后起之秀。本文将重点介绍Tez的最新进展。

Tez有以下几个特色:

  • (1) 丰富的数据流(dataflow,NOT Streaming!)编程接口;
  • (2) 扩展性良好的“Input-Processor-Output”运行模型;
  • (3) 简化数据部署(充分利用了YARN框架,Tez本身仅是一个客户端编程库,无需事先部署相关服务)
  • (4) 性能优于MapReduce
  • (5) 优化的资源管理(直接运行在资源管理系统YARN之上)
  • (6) 动态生成物理数据流(dataflow)

DAG task


git获取源码

1
$ git clone git@github.com:apache/tez.git

一、安装必要软件

1、安装java

1
# wget http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jdk-7u79-linux-x64.tar.gz
# tar -zxvf jdk-7u79-linux-x64.tar.gz 
# ln -s jdk1.7.0_79 jdk

加入环境变量
# source ~/.bash_profile 
# java -version 
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)

2、安装apache maven

1
# wget http://archive.apache.org/dist/maven/maven-3/3.0.5/binaries/apache-maven-3.0.5-bin.tar.gz
# tar -zxf apache-maven-3.0.5-bin.tar.gz
# ln -s apache-maven-3.0.5 maven
# echo "export MAVEN_HOME=/usr/local/maven" >> ~/.bash_profile
# echo "export PATH=$MAVEN_HOME/bin:$PATH:$HOME/bin" >> ~/.bash_profile
# source ~/.bash_profile 
# mvn -version
Apache Maven 3.0.5 (r01de14724cdef164cd33c7c8c2fe155faf9602da; 2013-02-19 21:51:28+0800)
Maven home: /usr/local/maven
Java version: 1.7.0_79, vendor: Oracle Corporation
Java home: /usr/local/jdk1.7.0_79/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-504.el6.x86_64", arch: "amd64", family: "unix"

3、Protocol Buffers 2.5.0

1
# tar -zxvf protobuf-2.5.0.tar.gz 
# cd protobuf-2.5.0
# ./configure --prefix=/usr/local/protobuf-2.5/
# make && make install
# ln -s protobuf-2.5 protobuf
#在~/.bash_porfile添加 export PATH=/usr/local/protobuf/bin
# source ~/.bash_profile 
# protoc --version
libprotoc 2.5.0

注意: 如果是ubuntu系统,编译报错,执行sudo apt-get install build-essential,来安装许多基本库,然后编译即可顺利完成!

4、编译tez

github获取某个release版本

1
# git clone https://github.com/apache/tez.git
# cd tez/

修改pom中hadoop版本为<hadoop.version>2.4.0</hadoop.version>

# git tag
  release-0.2.0-rc0
  release-0.3.0-incubating
  release-0.3.0-incubating-rc0
  release-0.3.0-incubating-rc1
  release-0.4.0-incubating
  release-0.4.1-incubating
  release-0.5.0
  release-0.5.1
  release-0.5.2
  release-0.5.3
  release-0.5.4
  release-0.6.0
  release-0.6.1
  release-0.7.0

  # git show-branch
  [master] TEZ-2610: yarn-swimlanes for DAGs that use containers of a previous DAG (gopalv)

5、 change hadoop-version为你的版本

Build tez

1
# mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true

报错1:
[ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:0.0.22:install-node-and-npm (install node and npm) on project tez-ui: Could not download npm: Could not download http://registry.npmjs.org/npm/-/npm-1.3.8.tgz: Unknown host registry.npmjs.org: Name or service not known -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :tez-ui

解决:https://issues.apache.org/jira/browse/TEZ-2229
# cd tez-ui/
then I changed tez-ui/pom.xml, add '--allow-root' argument:
<configuration>
<workingDirectory>$
{webappDir}
</workingDirectory>
<executable>$
{node.executable}
</executable>
<arguments>
<argument>node_modules/bower/bin/bower</argument>
<argument>install</argument>
<argument>--remove-unnecessary-resolutions=false</argument>
<argument>--allow-root</argument>
</arguments>
</configuration>

--通过手动安装tez后解决问题如下
# wget http://mirror.bit.edu.cn/apache/tez/0.7.0/apache-tez-0.7.0-src.tar.gz
# cd apache-tez-0.7.0-src
# mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true

# yum -y install gcc make gcc-c++ openssl 
# wget http://nodejs.org/dist/v0.10.26/node-v0.10.26.tar.gz
# tar -zxvf node-v0.10.26.tar.gz 
# make && make install

# node -v
  v0.10.26

--继续编译tez
# mvn -X clean package -DskipTests=true -Dmaven.javadoc.skip=true

报错2:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project tez-yarn-timeline-history-with-acls: Compilation failure: Compilation failure:
[ERROR] symbol:   class TimelineDomain

原因:由于修改了pom中hadoop.version=2.4.0,而tez-0.7默认是2.6.0版本

--继续编译
# mvn -X clean package -DskipTests=true -Dmaven.javadoc.skip=true    
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] tez ............................................... SUCCESS [3.563s]
[INFO] tez-api ........................................... SUCCESS[24.276s]
[INFO] tez-common ........................................ SUCCESS [1.977s]
[INFO] tez-runtime-internals ............................. SUCCESS [3.696s]
[INFO] tez-runtime-library ............................... SUCCESS[11.011s]
[INFO] tez-mapreduce ..................................... SUCCESS [4.218s]
[INFO] tez-examples ...................................... SUCCESS [1.187s]
[INFO] tez-dag ........................................... SUCCESS[10.370s]
[INFO] tez-tests ......................................... SUCCESS [3.100s]
[INFO] tez-ui ............................................ SUCCESS [14.876s]
[INFO] tez-plugins ....................................... SUCCESS [0.097s]
[INFO] tez-yarn-timeline-history ......................... SUCCESS [2.782s]
[INFO] tez-yarn-timeline-history-with-acls ............... SUCCESS [2.044s]
[INFO] tez-mbeans-resource-calculator .................... SUCCESS [2.047s]
[INFO] tez-tools ......................................... SUCCESS [0.263s]
[INFO] tez-dist .......................................... SUCCESS [1:17:56.763s]
[INFO] Tez ............................................... SUCCESS [0.032s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:19:24.309s
[INFO] Finished at: Sat Jul 18 07:33:17 HKT 2015
[INFO] Final Memory: 43M/202M
[INFO] ------------------------------------------------------------------------

# tree -L 1
.
|-- BUILDING.txt
|-- CHANGES.txt
|-- INSTALL.md -> docs/src/site/markdown/install.md
|-- KEYS
|-- LICENSE.txt
|-- NOTICE.txt
|-- README.md
|-- Tez_DOAP.rdf
|-- build-tools
|-- docs
|-- pom.xml
|-- target
|-- tez-api
|-- tez-common
|-- tez-dag
|-- tez-dist
|-- tez-examples
|-- tez-mapreduce
|-- tez-plugins
|-- tez-runtime-internals
|-- tez-runtime-library
|-- tez-tests
|-- tez-tools
`-- tez-ui

# tar zcf apache-tez-0.7.0.tar.gz apache-tez-0.7.0-src
# sz apache-tez-0.7.0.tar.gz

总结:编译阶段不断重试,国内网络大家都懂,各种包下载不惜来,多重试几次就ok了

Tez版本兼容情况
https://cwiki.apache.org/confluence/display/TEZ/Version+Compatibility

二、tez install

1、上传编译后的Tez的tarball到hdfs

1
# rz
rz waiting to receive.
¿ª' zmodem ´«?¡£  °´ Ctrl+C ??¡£
Transferring apache-tez-0.7.0.tar.gz...
  100%  128772 KB 11706 KB/s 00:00:11       0 Errors

# tar -zxvf apache-tez-0.7.0.tar.gz 
# cd tez-dist/target/
# tree -L 2
.
|-- archive-tmp
|-- maven-archiver
|   `-- pom.properties
|-- tez-0.7.0
|   |-- lib
|   |-- tez-api-0.7.0.jar
|   |-- tez-common-0.7.0.jar
|   |-- tez-dag-0.7.0.jar
|   |-- tez-examples-0.7.0.jar
|   |-- tez-mapreduce-0.7.0.jar
|   |-- tez-mbeans-resource-calculator-0.7.0.jar
|   |-- tez-runtime-internals-0.7.0.jar
|   |-- tez-runtime-library-0.7.0.jar
|   |-- tez-tests-0.7.0.jar
|   |-- tez-ui-0.7.0.war
|   |-- tez-yarn-timeline-history-0.7.0.jar
|   `-- tez-yarn-timeline-history-with-acls-0.7.0.jar
|-- tez-0.7.0-minimal.tar.gz
|-- tez-0.7.0.tar.gz
`-- tez-dist-0.7.0-tests.jar

# ls apache-tez-0.7.0-src/tez-dist/target/
 archive-tmp  maven-archiver  tez-0.7.0  tez-0.7.0-minimal.tar.gz  tez-0.7.0.tar.gz  tez-dist-0.7.0-tests.jar

# hadoop version
Hadoop 2.4.0
Subversion Unknown -r Unknown
Compiled by root on 2014-07-27T04:58Z
Compiled with protoc 2.5.0
From source with checksum 375b2832a6641759c6eaf6e3e998147
This command was run using /usr/local/hadoop-2.4.0/share/hadoop/common/hadoop-common-2.4.0.jar

# hadoop fs -mkdir -p /apps/tez-0.7.0
# hadoop fs -copyFromLocal apache-tez-0.7.0-src/tez-dist/target/tez-0.7.0.tar.gz /apps/tez-0.7.0

2、解压tez部署包

1
# tar -zxvf tez-0.7.0.tar.gz 
# cp apache-tez-0.7.0-src/tez-dist/target/tez-0.7.0.tar.gz . 
# mkdir tez-0.7.0  
# tar zxf tez-0.7.0.tar.gz -C tez-0.7.0
# ln -s tez-0.7.0 tez

3、配置环境变量

1
export TEZ_HOME=/usr/local/tez
export TEZ_CONF_DIR=$TEZ_HOME/conf
export TEZ_JARS=$TEZ_HOME

export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*

4、配置文件

1
# mkdir tez/conf
# cat tez/conf/tez-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
  <name>tez.lib.uris</name>
  <value>${fs.defaultFS}/apps/tez-0.7.0/tez-0.7.0.tar.gz</value>
  </property>
</configuration>

5、修改 mapred-site.xml

1
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn-tez</value>
</property>
</configuration>

6、example of using an MRR job in the tez-examples.jar

1
# hadoop fs -cat /testdata/external_table/external_teble  
1       zhangsan
2       lisi
3       wanwu
4       zhaoliu
5       da
6       abc

# hadoop fs -put tez-0.7.0 /apps

# hadoop jar tez-examples-0.7.0.jar orderedwordcount  /apps/tez-0.7.0/conf/tez-site.xml /output
INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://mycluster/apps/tez-0.7.0,hdfs://mycluster/apps/tez-0.7.0/lib/
INFO client.DAGClientImpl: DAG initialized: CurrentState=Running

报错:
15/07/18 02:17:38 INFO examples.OrderedWordCount: DAG diagnostics: [Application application_1437156593890_0002 failed 2 times due to AM Container for appattempt_1437156593890_0002_000002 exited with  exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: 
org.apache.hadoop.util.Shell$ExitCodeException: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)


Container exited with a non-zero exit code 1
.Failing this attempt.. Failing the application.]

7、hive on tez

hive版本说明:hive-0.14+

版本为hive-0.13.1和hadoop2.4测试失败

1
修改hive的执行引擎为 Tez
 <property>
   <name>hive.execution.engine</name>
   <value>tez</value>
 </property>

# cp tez-0.7.0/tez-*.jar hive/lib/
# cp tez-0.7.0/lib/* hive/lib/

hive (default)> set hive.execution.engine;      
hive.execution.engine=mr
hive (default)> set hive.execution.engine=tez;  
hive (default)> select count(*) from order_test;
Total jobs = 1
Launching Job 1 out of 1
Exception in thread "Thread-6" java.lang.NoSuchMethodError: org.apache.tez.mapreduce.hadoop.MRHelpers.updateEnvironmentForMRAM(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Map;)V

升级hadoop版本为2.6,hive版本为1.2.1测试

1
$ mv hadoop-2.6.0/share/hadoop/yarn/lib/jline-0.9.94.jar hadoop-2.6.0/share/hadoop/yarn/lib/jline-0.9.94.jar.bak  

[hsu@server2 ~]$ hive

Logging initialized using configuration in file:/home/hsu/apache-hive-1.2.1-bin/conf/hive-log4j.properties
hive (default)> show tables;
OK
tab_name
test
Time taken: 2.33 seconds, Fetched: 1 row(s)
hive (default)> select * from test;
OK
test.id
1
2
3
Time taken: 0.648 seconds, Fetched: 3 row(s)

hive (default)> set hive.execution.engine;
hive.execution.engine=tez
hive (default)> select count(*) from test;
Query ID = hsu_20150813161742_a64d46a7-7349-41b3-be27-9a91bb490a8b
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1439448136822_0002)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 51.17 s    
--------------------------------------------------------------------------------
OK
_c0
3
Time taken: 54.354 seconds, Fetched: 1 row(s)

#队列使用
set tez.queue.name=eda;

hive (default)>  set tez.queue.name=test;
hive (default)> select count(*) from test;
Query ID = hsu_20150813162458_b0ce5d5f-09d2-40ae-8b6a-90f1f4cebe91
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1439448136822_0003)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 32.71 s    
--------------------------------------------------------------------------------
OK
_c0
3
Time taken: 84.191 seconds, Fetched: 1 row(s)

hive (default)> set hive.execution.engine;
hive.execution.engine=tez
hive (default)> set hive.execution.engine=mr;
hive (default)> select count(*) from test;
Query ID = hsu_20150813162723_31559eda-c8ab-4ca6-a445-b75afc908d28
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1439448136822_0004, Tracking URL = http://server1:23188/proxy/application_1439448136822_0004/
Kill Command = /home/hsu/hadoop/bin/hadoop job  -kill job_1439448136822_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2015-08-13 16:28:33,107 Stage-1 map = 0%,  reduce = 0%
2015-08-13 16:29:20,688 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.99 sec
2015-08-13 16:29:49,215 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.27 sec
MapReduce Total cumulative CPU time: 2 seconds 270 msec
Ended Job = job_1439448136822_0004
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.27 sec   HDFS Read: 210 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 270 msec
OK
_c0
3
Time taken: 148.332 seconds, Fetched: 1 row(s)
    1. hive on tez,直接配置hive支持tez,其实是在yarn上面启动一个app,这个app永久运行,接受不同sql,除非退出hive cli此app才会结束
    1. 可以通过tez提高的ui查看任务执行情况,需要单独配置tezUI!
    1. tez runing app在yarn上面和spark on yarn很类似,而且tez,spark底层原理也很相似!
    1. 在切换队列时,会重启一个app运行!
    1. 在切换底层为mr引擎的时候也会重启app运行,但是tez那个app也没结束,等待mr app运行结束后,两个一起结束!

8、排查问题

  • 1、取消所有tez依赖单独运行

    • 结论:可以成功运行yarn任务,断定是不兼容问题,由于我编译tez使用了2.6.0
    • 编译通过,而是用2.4.0的包编译失败!目前集群hadoop版本2.4.0!
  • 2、升级hadoop版本为2.6.0再次测试

升级为2.6.0以后测试成功,说明tez兼容性还是比较差。

1
$ hadoop jar /usr/local/tez/tez-examples-0.7.0.jar orderedwordcount  /apps/tez- 0.7.0/conf/tez-site.xml /output
15/07/20 13:53:52 INFO client.TezClient: Tez Client Version: [ component=tez-   api, version=0.7.0, revision=${buildNumber}, SCM-URL=scm:git:https://git-wip-us    .apache.org/repos/asf/tez.git, buildTime=20150718-0613 ]
15/07/20 13:53:52 INFO client.RMProxy: Connecting to ResourceManager at itr-    mastertest01/192.168.2.200:8032
15/07/20 13:54:13 INFO examples.OrderedWordCount: Running OrderedWordCount
15/07/20 13:54:13 INFO client.TezClient: Submitting DAG application with id:    application_1437371604990_0001
15/07/20 13:54:13 INFO client.TezClientUtils: Using tez.lib.uris value from     configuration: hdfs://mycluster/apps/tez-0.7.0,hdfs://mycluster/apps/tez-0.7.0/ lib/
15/07/20 13:54:13 INFO client.TezClient: Stage directory /tmp/hsu/tez/staging   doesn't exist and is created
15/07/20 13:54:14 INFO client.TezClient: Tez system stage directory hdfs:// mycluster/tmp/hsu/tez/staging/.tez/application_1437371604990_0001 doesn't    exist and is created
15/07/20 13:54:14 INFO client.TezClient: Submitting DAG to YARN,    applicationId=application_1437371604990_0001, dagName=OrderedWordCount
15/07/20 13:54:14 INFO impl.YarnClientImpl: Submitted application   application_1437371604990_0001
15/07/20 13:54:14 INFO client.TezClient: The url to track the Tez AM: http://   server1:8088/proxy/application_1437371604990_0001/
15/07/20 13:54:14 INFO client.RMProxy: Connecting to ResourceManager at itr-    mastertest01/192.168.2.200:8032
15/07/20 13:54:15 INFO client.DAGClientImpl: Waiting for DAG to start running
15/07/20 13:55:06 INFO client.DAGClientImpl: DAG initialized:   CurrentState=Running
15/07/20 13:55:08 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
15/07/20 13:55:08 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
15/07/20 13:55:08 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
15/07/20 13:55:08 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
15/07/20 13:55:13 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
15/07/20 13:55:13 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
15/07/20 13:55:13 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
15/07/20 13:55:13 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
15/07/20 13:55:19 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
15/07/20 13:55:19 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
15/07/20 13:55:19 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
15/07/20 13:55:19 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
15/07/20 13:55:24 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
15/07/20 13:55:24 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
15/07/20 13:55:24 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
15/07/20 13:55:24 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
15/07/20 13:55:29 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
15/07/20 13:55:29 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
15/07/20 13:55:29 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
15/07/20 13:55:29 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
15/07/20 13:55:34 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
15/07/20 13:55:34 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
15/07/20 13:55:34 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
15/07/20 13:55:34 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
15/07/20 13:55:39 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
15/07/20 13:55:39 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed:    0
15/07/20 13:55:39 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
15/07/20 13:55:39 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
15/07/20 13:55:44 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
15/07/20 13:55:44 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed:    0
15/07/20 13:55:44 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
15/07/20 13:55:44 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
15/07/20 13:55:49 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
15/07/20 13:55:49 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed:    0
15/07/20 13:55:49 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
15/07/20 13:55:49 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
15/07/20 13:55:54 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
15/07/20 13:55:54 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed:    0
15/07/20 13:55:54 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
15/07/20 13:55:54 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
15/07/20 13:55:59 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0%   TotalTasks: 3 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
15/07/20 13:55:59 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed:    0
15/07/20 13:55:59 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed:    0
15/07/20 13:55:59 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
15/07/20 13:56:01 INFO client.DAGClientImpl: DAG: State: RUNNING Progress:  33.33% TotalTasks: 3 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
15/07/20 13:56:01 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0  Killed: 0
15/07/20 13:56:01 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed:    0
15/07/20 13:56:01 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
15/07/20 13:56:03 INFO client.DAGClientImpl: DAG: State: RUNNING Progress:  66.67% TotalTasks: 3 Succeeded: 2 Running: 1 Failed: 0 Killed: 0
15/07/20 13:56:03 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0  Killed: 0
15/07/20 13:56:03 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0  Killed: 0
15/07/20 13:56:03 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
15/07/20 13:56:04 INFO client.DAGClientImpl: DAG: State: SUCCEEDED Progress:    100% TotalTasks: 3 Succeeded: 3 Running: 0 Failed: 0 Killed: 0
15/07/20 13:56:04 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0  Killed: 0
15/07/20 13:56:04 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Summation Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0  Killed: 0
15/07/20 13:56:04 INFO client.DAGClientImpl:    VertexStatus: VertexName:   Sorter Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
15/07/20 13:56:04 INFO client.DAGClientImpl: DAG completed. FinalState=SUCCEEDED

$ hadoop jar tez-examples-0.7.0.jar orderedwordcount -DUSE_TEZ_SESSION=true /test /output

  • 3、 yarn底层替换为tez,hive直接执行,效果如下!
    1
    [hsu@server1 ~]$ hive
    Logging initialized using configuration in file:/home/hsu/apache-hive-1.2.1-bin /conf/hive-log4j.properties
    hive (default)> show tables;
    OK
    tab_name
    test
    Time taken: 1.106 seconds, Fetched: 1 row(s)
    hive (default)> select count(1) from test;
    Query ID = hsu_20150813170521_77b18935-4f38-4f84-8da7-6774b9aff194
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1439456630574_0001, Tracking URL = http://itr-   mastertest01:23188/proxy/application_1439456630574_0001/
    Kill Command = /home/hsu/hadoop/bin/hadoop job  -kill job_1439456630574_0001
    Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
    2015-08-13 17:06:36,372 Stage-1 map = 0%,  reduce = 0%
    2015-08-13 17:07:26,923 Stage-1 map = 100%,  reduce = 100%
    Ended Job = job_1439456630574_0001
    MapReduce Jobs Launched: 
    Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
    Total MapReduce CPU Time Spent: 0 msec
    OK
    _c0
    3
    Time taken: 126.726 seconds, Fetched: 1 row(s)

注意:yarn页面:app tyep 直接变成TEZ,但是如上执行还是mr执行程序日志~!这里比较难理解具体是mr执行的还是tez执行的!还得去底层跟踪确认!

(1)、yarn mr替换为tez,hive不做任务配置,提交任务到yarn,效果如下图:

(2)、不同tez版本支持的hadoop版本需要对应
(3)、编译时,我手动制定hadoop.version它有回出现yarn模块兼容性编译通不过。而默认的它指定的hadoop版本可以成功编译!
(4)、tez从另一个方面讲还不算成熟!~
(5)、支持ResourceManager HA

  • 3、目前xxx on yarn模式情况

(1) tez on yarn模式主要是hdp在推,相信未来和yarn集成兼容性各种问题都能迎刃而解!

(2) spark engine on yarn模式,主要是cdh在推,目前已经完成合并到hive主干!

ps:官网安装文档都说,自己可以修改hadoop.version进行编译,实际修改编译失败,比较费解,也说明tez的兼容性不好!

9、tez ui

(1) 配置tez-site.xml

1
$ cat tez-site.xml         
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>   

  <property>        
  <name>tez.lib.uris</name>        
  <value>${fs.defaultFS}/apps/tez-0.7.0,${fs.defaultFS}/apps/tez-0.7.0/lib/</value>   
  </property>

 <property>
  <description>Enable Tez to use the Timeline Server for History Logging</description>
   <name>tez.history.logging.service.class</name>
   <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
 </property>

 <property>
  <description>URL for where the Tez UI is hosted</description>
  <!--tomcat 9999 端口-->
  <name>tez.tez-ui.history-url.base</name>
  <value>http://server2:9999/tez-ui/</value>
 </property>

<property>
    <description>Publish configuration information to Timeline server.</description>
    <name>tez.runtime.convert.user-payload.to.history-text</name>
    <value>true</value>
</property>

<property>
    <name>tez.task.generate.counters.per.io</name>
    <value>true</value>
</property>
</configuration>

(2) 配置yarn-site.xml

1
<property>
  <description>Indicate to clients whether Timeline service is enabled or not. If enabled, the TimelineClient library used by end-users will post entities and events to the Timeline server.</description>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>
<property>
  <description>The hostname of the Timeline service web application.</description>
  <name>yarn.timeline-service.hostname</name>
  <value>server2</value>
</property>
<property>
  <description>Enables cross-origin support (CORS) for web services where cross-origin web response headers are needed. For example, javascript making a web services request to the timeline server.</description>
  <name>yarn.timeline-service.http-cross-origin.enabled</name>
  <value>true</value>
</property>
<property>
  <description>Publish YARN information to Timeline Server</description>
  <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
  <value>true</value>
</property>
      <property>
       <description>The http address of the Timeline service web application.</description>
       <name>yarn.timeline-service.webapp.address</name>
       <value>${yarn.timeline-service.hostname}:8188</value>
     </property>
<property>
   <description>The https address of the Timeline service web application.</description>
  <name>yarn.timeline-service.webapp.https.address</name>
  <value>${yarn.timeline-service.hostname}:2191</value>
</property>

(3) 启动hadoop timeline

1
[hsu@server2 hadoop]$ yarn-daemon.sh start timelineserver

[hsu@server2 ~]$ lsof -i :8188
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
java    7189  hsu  281u  IPv6  71981      0t0  TCP server2:8188 (LISTEN)

(4) tez ui war包解压放到tomcat的webapps下

1
## 在server2上部署tomcat

$ tar -zxvf apache-tomcat-8.0.24.tar.gz 
[hsu@server2 ~]$ ln -s apache-tomcat-8.0.24 tomcat

[hsu@server2 ~]$ cp /usr/local/tez/tez-ui-0.7.0.war ./tomcat/webapps/

[hsu@server2 webapps]$ mkdir tez-ui           
[hsu@server2 webapps]$ mv tez-ui-0.7.0.war tez-ui/

[hsu@server2 webapps]$ cd tez-ui/
[hsu@server2 tez-ui]$ jar xvf tez-ui-0.7.0.war 

## 配置scripts/config.js文件

--Configuring the Timeline Server URL and Resource Manager UI URL 
 
 timelineBaseUrl: 'http://server1:8188',
 RMWebUrl: 'http://server1:23188',

(5) 启动tomcat

1
[hsu@server2 ~]$ tomcat/bin/startup.sh 
Using CATALINA_BASE:   /home/hsu/tomcat
Using CATALINA_HOME:   /home/hsu/tomcat
Using CATALINA_TMPDIR: /home/hsu/tomcat/temp
Using JRE_HOME:        /usr/local/jdk1.7.0_45
Using CLASSPATH:       /home/hsu/tomcat/bin/bootstrap.jar:/home/hsu/tomcat/bin/tomcat-juli.jar
Tomcat started.

#修改端口为9999
[hsu@server2 ~]$ tomcat/bin/shutdown.sh 
Using CATALINA_BASE:   /home/hsu/tomcat
Using CATALINA_HOME:   /home/hsu/tomcat
Using CATALINA_TMPDIR: /home/hsu/tomcat/temp
Using JRE_HOME:        /usr/local/jdk1.7.0_45
Using CLASSPATH:       /home/hsu/tomcat/bin/bootstrap.jar:/home/hsu/tomcat/bin/tomcat-juli.jar

[hsu@server2 ~]$ tomcat/bin/startup.sh 
[hsu@server2 ~]$ lsof -i :9999          
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
java    7824  hsu   43u  IPv6  75557      0t0  TCP *:distinct (LISTEN)

(6) 验证

1、webui验证:http://server2:9999/tez-ui/

tez-runingtez-runing
tez-runting_dagtez-runting_dag

2、后台执行sql验证前台

注意:resourcemanager节点必须为timelineserver配置的节点,如果RS配置HA的需要特别注意!

参考:http://zh.hortonworks.com/hadoop-tutorial/supercharging-interactive-queries-hive-tez/

http://tez.apache.org/tez-ui.html

http://zh.hortonworks.com/blog/evaluating-hive-with-tez-as-a-fast-query-engine

http://tez.apache.org/install.html

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值