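Notes:
- Download the latest release from the official site: https://oozie.apache.org/
- Environment: CentOS 7 + JDK 1.8 + maven-3.6.3 + pig-0.17.0
- Reference: the official Oozie documentation
I. Preparation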
- Download Maven, install it, and edit settings.xml so that it uses the Aliyun mirror.
(1) Install:
tar -zvxf /tools/apache-maven-3.6.3-bin.tar.gz -C /training
(2) Configure the environment variables: run vi ~/.bash_profile and add the following lines:
export MVN_HOME=/training/apache-maven-3.6.3/
export PATH=$PATH:$MVN_HOME/bin
(3) Apply the environment variables:
source ~/.bash_profile
(4) Generate the .m2 directory: right after Maven is installed, {user_home}/.m2/repository does not exist yet. Run the following command:
mvn help:effective-settings
Once the jars have finished downloading, the local repository appears, usually at /root/.m2/repository.
(5) Create settings.xml under /root/.m2 (Maven reads per-user settings from {user_home}/.m2/settings.xml) and add the Aliyun mirror:
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 https://maven.apache.org/xsd/settings-1.0.0.xsd">
  <mirrors>
    <mirror>
      <id>alimaven</id>
      <name>aliyun maven</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
(6) Copy settings.xml to $MVN_HOME/conf as well.
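Steps (5) and (6) can also be scripted. The following is a minimal sketch, not part of the original setup: the function name `write_aliyun_settings` is hypothetical, the target directory is passed in as an argument, and the mirror definition is copied from step (5).

```shell
# Hypothetical helper: write a settings.xml that routes "central" through the
# Aliyun mirror used in this post. $1 is the target directory (e.g. /root/.m2).
write_aliyun_settings() {
    target_dir=$1
    mkdir -p "$target_dir" || return 1
    # Quoted heredoc: the XML is written to the file exactly as shown.
    cat > "$target_dir/settings.xml" <<'EOF'
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 https://maven.apache.org/xsd/settings-1.0.0.xsd">
  <mirrors>
    <mirror>
      <id>alimaven</id>
      <name>aliyun maven</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
EOF
}
```

A possible use is `write_aliyun_settings "$HOME/.m2" && cp "$HOME/.m2/settings.xml" "$MVN_HOME/conf/"`. Running mvn help:effective-settings afterwards should list the alimaven mirror if the file was picked up.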
- Edit /etc/hosts on the CentOS 7 virtual machine and add the following entries:
182.92.29.13 maven.aliyun.com
111.13.210.19 archiva-maven-storage-prod.oss-cn-beijing.aliyuncs.com
136.243.146.148 repository.apache.org
223.113.13.64 maven.repository.redhat.com
88.97.7.126 www.datanucleus.org
54.197.228.20 conjars.org
209.132.182.97 repository.jboss.org
137.254.56.48 maven2-repository.dev.java.net
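Appending these entries by hand can create duplicates when the step is repeated. Below is a small sketch of an idempotent append; `add_host_entry` is a hypothetical name, and the addresses are simply the ones listed above, which may no longer be current when you read this.

```shell
# Hypothetical helper: append "IP NAME" to a hosts file unless NAME is already
# mapped there. $1 = hosts file, $2 = IP address, $3 = host name.
add_host_entry() {
    hosts_file=$1 ip=$2 name=$3
    # Skip if any non-comment line already lists this host name as a field.
    if awk -v n="$name" '
            !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}
```

For example, `add_host_entry /etc/hosts 182.92.29.13 maven.aliyun.com` (as root) adds the first entry once and does nothing on later runs.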
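- Download hadoop-2.7.3.tar.gz and install and deploy it yourself.
The only change needed here is in etc/hadoop/core-site.xml under the Hadoop installation directory. Add the following properties; root is the proxy user here, and you can create an ordinary user for this instead:
<!-- OOZIE -->
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>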
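The user name is part of the property name (hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups), so switching from root to an ordinary user changes both names. A small sketch that prints the pair for a given user; `proxyuser_properties` is a hypothetical helper, and the wildcard values match the ones above.

```shell
# Hypothetical helper: print the two proxy-user properties for the account
# that runs Oozie. $1 = user name (the post uses root).
proxyuser_properties() {
    user=$1
    for key in hosts groups; do
        printf '<property>\n  <name>hadoop.proxyuser.%s.%s</name>\n  <value>*</value>\n</property>\n' \
            "$user" "$key"
    done
}
```

Calling `proxyuser_properties oozie` (the user name oozie is only an example) prints the fragment to paste inside the <configuration> element of core-site.xml.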
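- Download ExtJS 2.2 (from the official site).
- Download Pig-0.17.0 and install and configure it yourself.
- Download HBase-1.3.1 and install and configure it yourself.
- Download Hive-2.3.3 and install and configure it yourself.
- Download Spark and install and configure it yourself (not covered in this post; add it if you need it).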
- Download oozie-5.2.1.tar.gz, extract it, and configure it:
1. Edit pom.xml in the extracted Oozie directory. This is the step that needs the most attention. Change the following properties:
<hadoop.version>2.7.3</hadoop.version>
<hadoop.majorversion>2</hadoop.majorversion>
<hadooplib.version>hadoop-${hadoop.majorversion}-${project.version}</hadooplib.version>
<hbase.version>1.3.1</hbase.version>
<!-- Sharelib component versions -->
<hive.version>2.3.3</hive.version>
<hive.jline.version>2.12</hive.jline.version>
<pig.version>0.17.0</pig.version>
<!-- pig-0.17.0 on the Aliyun repository has no classifier, so the default h2 is removed -->
<pig.classifier></pig.classifier>
<hive.classifier>core</hive.classifier>
<sqoop.version>1.4.7</sqoop.version>
These changes alone are not enough; pom.xml needs the further edits below.
a) Fix the oozie-core error:
Error: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.7.0:testCompile (default-testCompile) on project oozie-core: Compilation failure: Compilation failure:
Fix: add /core/src/test/java as the test source directory in pom.xml:
<build>
  <plugins>
    ... ...
  </plugins>
  <testSourceDirectory>/core/src/test/java</testSourceDirectory>
</build>
b) Work around the unreachable Cloudera repository by changing the plugin repository as follows:
<pluginRepositories>
  <pluginRepository>
    <!--
    <id>repository.cloudera.com</id>
    <name>repository.cloudera.com-releases</name>
    <url>https://repository.cloudera.com/artifactory/ext-release-local</url>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
    -->
    <!-- changed by me -->
    <id>central</id>
    <url>https://repo1.maven.org/maven2/org/apache/felix/maven-bundle-plugin/3.5.0/</url>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </pluginRepository>
</pluginRepositories>
- Edit pom.xml under /oozie-5.2.1/fluent-job/fluent-job-api and change the version of one of its plugins from 0.1.6 to 0.1.8 (a version incompatibility).
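The version properties set in the root pom.xml in step 1 can also be changed with a small script instead of an editor. This is only a sketch: `set_pom_property` is a hypothetical helper, and it assumes each property element sits on a single line and that the value contains no "|" or "&" characters.

```shell
# Hypothetical helper: set <NAME>VALUE</NAME> in a pom.xml in place.
# $1 = pom file, $2 = property name, $3 = new value (may be empty).
set_pom_property() {
    pom=$1 name=$2 value=$3
    tmp_pom=$(mktemp) || return 1
    # Rewrite into a temp file first, then copy back so the original file
    # keeps its permissions and is untouched if sed fails.
    sed "s|<$name>[^<]*</$name>|<$name>$value</$name>|" "$pom" > "$tmp_pom" &&
        cat "$tmp_pom" > "$pom"
    status=$?
    rm -f "$tmp_pom"
    return $status
}
```

For example, `set_pom_property pom.xml hadoop.version 2.7.3` and `set_pom_property pom.xml pig.classifier ""` apply two of the edits from step 1. Keeping a backup copy of pom.xml before running it is a sensible precaution.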
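II. Building
Make sure every step above is correct before continuing.
In the extracted Oozie directory, build the distribution (the versions must match those in pom.xml):
bin/mkdistro.sh -DskipTests -Puber -Dhadoop.version=2.7.3 -Dpig.version=0.17.0 -Dhive.version=2.3.3 -Dhbase.version=1.3.1
When the build succeeds, the package is under the extracted Oozie directory; for example, /tools/oozie-5.2.1/distro/target contains oozie-5.2.1-distro.tar.gz. The build output ends like this: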
[INFO] Reactor Summary for Apache Oozie Main 5.2.1:
[INFO]
[INFO] Apache Oozie Main .................................. SUCCESS [ 3.798 s]
[INFO] Apache Oozie Fluent Job ............................ SUCCESS [ 0.138 s]
[INFO] Apache Oozie Fluent Job API ........................ SUCCESS [ 18.445 s]
[INFO] Apache Oozie Client ................................ SUCCESS [ 10.356 s]
[INFO] Apache Oozie Share Lib Oozie ....................... SUCCESS [ 4.550 s]
[INFO] Apache Oozie Share Lib HCatalog .................... SUCCESS [ 9.372 s]
[INFO] Apache Oozie Share Lib Distcp ...................... SUCCESS [ 1.317 s]
[INFO] Apache Oozie Core .................................. SUCCESS [ 33.162 s]
[INFO] Apache Oozie Share Lib Streaming ................... SUCCESS [ 7.258 s]
[INFO] Apache Oozie Share Lib Pig ......................... SUCCESS [ 29.345 s]
[INFO] Apache Oozie Share Lib Git ......................... SUCCESS [ 22.844 s]
[INFO] Apache Oozie Share Lib Hive ........................ SUCCESS [ 29.006 s]
[INFO] Apache Oozie Share Lib Hive 2 ...................... SUCCESS [ 13.767 s]
[INFO] Apache Oozie Share Lib Sqoop ....................... SUCCESS [ 6.880 s]
[INFO] Apache Oozie Examples .............................. SUCCESS [ 51.169 s]
[INFO] Apache Oozie Share Lib Spark ....................... SUCCESS [01:11 min]
[INFO] Apache Oozie Share Lib ............................. SUCCESS [ 55.261 s]
[INFO] Apache Oozie Docs .................................. SUCCESS [ 8.078 s]
[INFO] Apache Oozie WebApp ................................ SUCCESS [ 36.713 s]
[INFO] Apache Oozie Tools ................................. SUCCESS [ 10.642 s]
[INFO] Apache Oozie MiniOozie ............................. SUCCESS [ 5.966 s]
[INFO] Apache Oozie Fluent Job Client ..................... SUCCESS [ 5.838 s]
[INFO] Apache Oozie Server ................................ SUCCESS [ 24.425 s]
[INFO] Apache Oozie Distro ................................ SUCCESS [01:32 min]
[INFO] Apache Oozie ZooKeeper Security Tests .............. SUCCESS [ 46.027 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10:01 min
[INFO] Finished at: 2021-03-06T15:45:28+08:00
[INFO] ------------------------------------------------------------------------
Oozie distro created, DATE[2021.03.06-07:35:24GMT] VC-REV[unavailable], available at [/tools/oozie-5.2.1/distro/target]
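**Note: this is the step where most failures happen, almost always because dependencies fail to download. The build only succeeds if every dependency downloads correctly.**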
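When the build does fail, the Reactor Summary shows which module stopped it. A small sketch for pulling those lines out of a saved log; `unfinished_modules` is a hypothetical helper, and it assumes the summary format shown above, with Maven's FAILURE and SKIPPED markers in place of SUCCESS.

```shell
# Hypothetical helper: list the modules whose Reactor Summary line is not
# SUCCESS (Maven marks them FAILURE or SKIPPED). $1 = saved build log.
unfinished_modules() {
    grep -E '^\[INFO\] .* \.+ (FAILURE|SKIPPED)' "$1" |
        # Print "<state> <module name>" without the dot padding and timing.
        sed -E 's/^\[INFO\] (.*[^ .]) \.+ (FAILURE|SKIPPED).*/\2 \1/'
}
```

One way to use it is to save the output with `bin/mkdistro.sh ... 2>&1 | tee build.log` and then run `unfinished_modules build.log`. The first FAILURE line names the module whose dependencies or sources to look at.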
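III. Installation and Deployment
The package built above is at /tools/oozie-5.2.1/distro/target/oozie-5.2.1-distro.tar.gz.
- Extract it to install.
- Configure the environment variables.
- Apply the environment variables.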
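The three steps above are listed without commands. The sketch below assumes the same layout as the Maven install (package extracted into /training, variables kept in ~/.bash_profile). `OOZIE_HOME` and `append_oozie_env` are hypothetical names; the post does not show the ones actually used.

```shell
# Hypothetical helper: append OOZIE_HOME and PATH exports to a profile file,
# once. $1 = profile file, $2 = Oozie install directory.
append_oozie_env() {
    profile=$1 oozie_home=$2
    # Do nothing if an OOZIE_HOME export is already present.
    if grep -q '^export OOZIE_HOME=' "$profile" 2>/dev/null; then
        return 0
    fi
    {
        printf 'export OOZIE_HOME=%s\n' "$oozie_home"
        # Single quotes keep $PATH and $OOZIE_HOME unexpanded in the file.
        printf '%s\n' 'export PATH=$PATH:$OOZIE_HOME/bin'
    } >> "$profile"
}
```

Under those assumptions the sequence would be `tar -zxvf /tools/oozie-5.2.1/distro/target/oozie-5.2.1-distro.tar.gz -C /training`, then `append_oozie_env ~/.bash_profile /training/oozie-5.2.1`, then `source ~/.bash_profile`.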
- Create libext under the Oozie installation directory:
mkdir libext
- Copy the jars from the share directory of the Hadoop installation, together with the ext-2.2.zip downloaded earlier, into libext:
cp -rf /training/hadoop-2.7.3/share/hadoop/*/hadoop-*.jar /training/oozie-5.2.1/libext/
cp -rf /training/hadoop-2.7.3/share/hadoop/*/lib/*.jar /training/oozie-5.2.1/libext/
cp /tools/ext-2.2.zip /training/oozie-5.2.1/libext/
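The copy step can be wrapped so that it reports how many jars were found, which makes a mistyped path or glob obvious. A minimal sketch; `copy_hadoop_jars` is a hypothetical helper, and the globs assume the share/hadoop/<component>/ layout of the Hadoop 2.7.3 tarball.

```shell
# Hypothetical helper: copy the Hadoop jars Oozie needs into libext.
# $1 = Hadoop install directory, $2 = Oozie libext directory.
copy_hadoop_jars() {
    hadoop_home=$1 libext=$2
    mkdir -p "$libext" || return 1
    copied=0
    # Component jars (hadoop-*.jar) and their dependencies (lib/*.jar).
    for jar in "$hadoop_home"/share/hadoop/*/hadoop-*.jar \
               "$hadoop_home"/share/hadoop/*/lib/*.jar; do
        # An unmatched glob stays literal, so check that the file exists.
        if [ -f "$jar" ]; then
            cp "$jar" "$libext/" || return 1
            copied=$((copied + 1))
        fi
    done
    echo "copied $copied jars into $libext"
}
```

With the paths used in this post the call would be `copy_hadoop_jars /training/hadoop-2.7.3 /training/oozie-5.2.1/libext`. A count of 0 means the Hadoop path is wrong; ext-2.2.zip still has to be copied separately.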
- Configure Oozie:
1. Edit conf/oozie-site.xml and add only the following property, where /training/hadoop-2.7.3/etc/hadoop/ is the directory that holds the Hadoop configuration files:
<property>
  <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
  <value>*=/training/hadoop-2.7.3/etc/hadoop/</value>
</property>
2. Edit conf/hadoop-conf/core-site.xml and add only the following property:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop001:9000</value>
</property>
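- Once the above is in place, make sure Hadoop is running, then run the following command from the Oozie installation directory to upload the Oozie sharelib to HDFS:
bin/oozie-setup.sh sharelib create -fs hdfs://hadoop001:9000 -locallib oozie-sharelib-5.2.1.tar.gz
The -locallib option and its argument can be omitted, but including them is recommended.
On success, the sharelib directory is created on HDFS.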
- Create the Oozie database with ooziedb.sh:
bin/ooziedb.sh create -sqlfile oozie.sql -run
On success, an oozie.sql file is generated under the Oozie installation directory.
- Start the Oozie service (as a daemon) with either of these commands:
bin/oozied.sh start
bin/oozie-start.sh
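- Check that it started:
Check for the process: run jps and look for EmbeddedOozieServer.
Check the Oozie service status:
bin/oozie admin -oozie http://localhost:11000/oozie -status
Normally the system mode is reported as NORMAL.
Check in a browser by opening http://hadoop001:11000/oozie/, which shows the Oozie web console.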
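The server can take a moment to come up after the start command, so the status check may need to be repeated. Below is a sketch of a simple retry loop; `wait_for_normal` is a hypothetical helper that only looks for the word NORMAL in whatever status command it is given.

```shell
# Hypothetical helper: run a status command until its output contains
# "NORMAL" or the attempts run out. $1 = attempts, $2 = seconds between
# attempts, remaining arguments = the status command to run.
wait_for_normal() {
    attempts=$1 delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        if "$@" 2>/dev/null | grep -q 'NORMAL'; then
            return 0
        fi
        attempts=$((attempts - 1))
        # Pause only if another attempt is coming.
        [ "$attempts" -gt 0 ] && sleep "$delay"
    done
    return 1
}
```

For example, `wait_for_normal 10 3 bin/oozie admin -oozie http://localhost:11000/oozie -status` tries up to ten times, three seconds apart, and returns non-zero if the server never reports NORMAL.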
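The Oozie server is now installed and deployed. Note that this also installs the Oozie client on the same machine; the client is usually installed separately on another server.
IV. Running a Test
- Command-line examples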
- Extract oozie-examples.tar.gz from the Oozie installation directory and upload the entire extracted examples directory to the user's home directory on HDFS (for example /user/root/).
- Pick the example to test, go to its directory, and edit job.properties to match your environment. I chose the map-reduce example, so I changed job.properties in the map-reduce directory to the following:
nameNode=hdfs://hadoop001:9000
resourceManager=hadoop001:8032
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce/workflow.xml
outputDir=map-reduce
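Only the first two values depend on the cluster, so the file can be generated from them. A sketch under that assumption; `write_job_properties` is a hypothetical helper, and the remaining lines are copied from the listing above.

```shell
# Hypothetical helper: write the map-reduce example's job.properties.
# $1 = output file, $2 = NameNode URI, $3 = ResourceManager host:port.
write_job_properties() {
    out_file=$1 name_node=$2 resource_manager=$3
    {
        printf 'nameNode=%s\n' "$name_node"
        printf 'resourceManager=%s\n' "$resource_manager"
        # Quoted heredoc: the ${...} placeholders below are resolved by Oozie,
        # not by the shell, so they must reach the file unexpanded.
        cat <<'EOF'
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce/workflow.xml
outputDir=map-reduce
EOF
    } > "$out_file"
}
```

With the host names from this post: `write_job_properties examples/apps/map-reduce/job.properties hdfs://hadoop001:9000 hadoop001:8032`.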
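- Run the example from the Oozie installation directory:
bin/oozie job -oozie http://hadoop001:11000/oozie -config examples/apps/map-reduce/job.properties -run
Note: run the oozie command from the Oozie installation directory. Even with the environment variables configured, running it from other directories failed for me.
If the command succeeds, it prints: job: 0000000-210307000139733-oozie-root-W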
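The job ID is needed for every later query, so it helps to capture it when the job is submitted. A minimal sketch, assuming the "job: <id>" line format shown above; `extract_job_id` is a hypothetical helper.

```shell
# Hypothetical helper: read the output of "oozie job ... -run" on stdin and
# print only the job ID from the "job: <id>" line.
extract_job_id() {
    sed -n 's/^job: *//p'
}
```

For example, `job_id=$(bin/oozie job -oozie http://hadoop001:11000/oozie -config examples/apps/map-reduce/job.properties -run | extract_job_id)` stores the ID in a variable (job_id is just an example name) for use with -info.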
- Check the job status:
bin/oozie job -oozie http://localhost:11000/oozie -info 0000000-210307000139733-oozie-root-W
The output looks like this:
Job ID : 0000000-210307000139733-oozie-root-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : map-reduce-wf
App Path      : hdfs://hadoop001:9000/user/root/examples/apps/map-reduce/workflow.xml
Status        : RUNNING
Run           : 0
User          : root
Group         : -
Created       : 2021-03-06 16:09 GMT
Started       : 2021-03-06 16:09 GMT
Last Modified : 2021-03-06 16:09 GMT
Ended         : -
CoordAction ID: -
Actions
------------------------------------------------------------------------------------------------------------------------------------
ID                                             Status    Ext ID                 Ext Status  Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000000-210307000139733-oozie-root-W@:start:   OK        -                      OK          -
------------------------------------------------------------------------------------------------------------------------------------
0000000-210307000139733-oozie-root-W@mr-node   RUNNING   application_1615045482449_0001RUNNING   -
------------------------------------------------------------------------------------------------------------------------------------
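For scripting, only the workflow status line of this output usually matters. A small sketch that extracts it, assuming the "Status : <value>" line format shown above; `workflow_status` is a hypothetical helper.

```shell
# Hypothetical helper: read "oozie job ... -info" output on stdin and print
# the workflow status, i.e. the value on the first "Status : ..." line.
workflow_status() {
    # Default field splitting: $1 = "Status", $2 = ":", $3 = the value.
    awk '$1 == "Status" && $2 == ":" { print $3; exit }'
}
```

For example, `bin/oozie job -oozie http://localhost:11000/oozie -info 0000000-210307000139733-oozie-root-W | workflow_status` prints RUNNING for the job above. The header row of the Actions table also contains the word Status, but it has no colon in the second field, so it is ignored.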
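- View the job in a browser:
Double-click the job entry in the list (highlighted in red in the original screenshot) to open its details.
Switch to the Job DAG tab to see the workflow graph.
- Java API (omitted)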
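V. Summary
Setting up this environment from scratch took me most of a day. The biggest problem at the start was dependencies that could not be downloaded, which wasted a lot of time. The official Apache Oozie documentation is reasonably good, but some details are not spelled out clearly, and that cost more time. In any case, the build and deployment are done; interested readers are welcome to try it.
VI. Prebuilt Package
Download: https://download.csdn.net/download/sujiangming/15623324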