Installation
$ bin/mkdistro.sh [-DskipTests]
Running mkdistro.sh will create the binary distribution of Oozie. By default, oozie war will not contain hadoop and hcatalog libraries, however they are required for oozie to work. There are 2 options to add these libraries:
1. At install time, copy the hadoop and hcatalog libraries to libext and run oozie-setup.sh to setup oozie war. This is suitable when same oozie package needs to be used in multiple set-ups with different hadoop/hcatalog versions.
2. Build with -Puber which will bundle the required libraries in the oozie war.
Further, the following options are available to customise the versions of the dependencies:
-P<profile> - default hadoop-1. Valid are hadoop-1, hadoop-0.23, hadoop-2 or hadoop-3. Choose the correct hadoop profile depending on the hadoop version used.
-Dhadoop.version=<version> - default 1.2.1 for hadoop-1, 0.23.5 for hadoop-0.23, 2.3.0 for hadoop-2 and 3.0.0-SNAPSHOT for hadoop-3
-Dhadoop.auth.version=<version> - defaults to hadoop version
-Ddistcp.version=<version> - defaults to hadoop version
-Dpig.version=<version> - default 0.12.1
-Dpig.classifier=<classifier> - default none
-Dsqoop.version=<version> - default 1.4.3
-Dsqoop.classifier=<classifier> - default hadoop100
-Dtomcat.version=<version> - default 6.0.41
-Dopenjpa.version=<version> - default 2.2.2
-Dxerces.version=<version> - default 2.10.0
-Dcurator.version=<version> - default 2.5.0
-Dhive.version=<version> - default 0.13.1
-Dhbase.version=<version> - default 0.94.2
cd oozie-4.2.0/bin
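./mkdistro.sh -DskipTests -Puber -P hadoop-2
-Puber bundles the third-party jars into the war, which is convenient. Without -Puber, the built oozie.war contains none of the dependency jars; you then have to download the dependencies yourself, put them in the libext directory, and build the war yourself.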
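The build invocation above can be assembled from a few variables, which makes switching profiles less error-prone. This is only a minimal sketch and not part of the Oozie distribution: the function name build_mkdistro_cmd and its three parameters are hypothetical, and it prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the mkdistro.sh command line for a given
# Hadoop profile. Only flags that appear in the notes above are used.
build_mkdistro_cmd() {
    profile="${1:-hadoop-2}"   # hadoop-1, hadoop-0.23, hadoop-2 or hadoop-3
    uber="${2:-yes}"           # "yes" adds -Puber (bundle libraries in the war)
    hadoop_version="${3:-}"    # optional, passed through as -Dhadoop.version

    # Reject profiles that the build notes do not list.
    case "$profile" in
        hadoop-1|hadoop-0.23|hadoop-2|hadoop-3) ;;
        *) echo "unknown profile: $profile" >&2; return 1 ;;
    esac

    cmd="./mkdistro.sh -DskipTests"
    if [ "$uber" = "yes" ]; then
        cmd="$cmd -Puber"
    fi
    cmd="$cmd -P $profile"
    if [ -n "$hadoop_version" ]; then
        cmd="$cmd -Dhadoop.version=$hadoop_version"
    fi
    printf '%s\n' "$cmd"
}

# Print the command used in these notes; run it by hand from oozie-4.2.0/bin.
build_mkdistro_cmd hadoop-2 yes
```

Printing instead of executing keeps the sketch safe to try on a machine without the Oozie sources.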
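2. hbase-1.0.3.jar could not be downloaded, so in the local m2 repository I substituted hbase-0.9.XX.jar for it, simply renaming the file to 1.0.3.
I do not know whether this will cause problems later.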
3. Failed to execute goal org.apache.maven.plugins:maven-site-plugin:2.0-beta-6:site (default) on project oozie-docs: The site descriptor cannot be resolved from the repository: Could not transfer artifact org.apache:apache:xml:site_en:16 from/to Codehaus repository (http://repository.codehaus.org/): repository.codehaus.org: unknown error
Solution:
I was able to resolve it by editing the parent pom.xml and removing the Codehaus repository entry:
<repository>
<id>Codehaus repository</id>
<url>http://repository.codehaus.org/</url>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
Or replace the URL:
<repositories>
<repository>
<id>Codehaus repository</id>
<name>codehaus-mule-repo</name>
<url>https://repository-master.mulesoft.org/nexus/content/groups/public/</url>
<layout>default</layout>
</repository>
</repositories>
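Before and after editing the pom, it can help to confirm whether the Codehaus host named in the error is still referenced. A minimal sketch, assuming a plain grep over the file is enough; the function name find_codehaus_refs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the numbered lines of a pom.xml that mention
# repository.codehaus.org. Exit status is 0 only if a line matches.
find_codehaus_refs() {
    pom_file="${1:-pom.xml}"
    grep -n 'repository\.codehaus\.org' "$pom_file"
}

# Example:
# find_codehaus_refs pom.xml || echo "no Codehaus references left"
```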
After the build, the output is written to oozie-4.2.0/distro/target/:
oozie-4.2.0-distro.tar.gz
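Installation:
Extract the built oozie-4.2.0-distro.tar.gz into the desired directory; this is the Oozie installation we need. Extraction produces oozie-4.2.0; cd into that directory.
1. Download ext-2.2.zip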
Extract the oozie-4.2.0-distro.tar.gz package
mkdir libext
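Copy the Hadoop jars into the libext directory:
cp /usr/local/hadoop/share/hadoop/*/*.jar libext/
cp /usr/local/hadoop/share/hadoop/*/lib/*.jar libext/
Remove the jars that conflict between Hadoop and Tomcat (taken from online sources; I am not sure whether this is required):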
mv servlet-api-2.5.jar servlet-api-2.5.jar.bak
mv jsp-api-2.1.jar jsp-api-2.1.jar.bak
mv jasper-compiler-5.5.23.jar jasper-compiler-5.5.23.jar.bak
mv jasper-runtime-5.5.23.jar jasper-runtime-5.5.23.jar.bak
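The four renames above can be wrapped in a loop so that jars missing from a particular Hadoop version are skipped instead of producing mv errors. This is a sketch under the assumption that the jars sit directly in libext; the function name backup_conflicting_jars is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rename the jars listed above to *.bak inside a
# libext directory, skipping any that are not present.
backup_conflicting_jars() {
    libext_dir="${1:-libext}"
    for jar in servlet-api-2.5.jar jsp-api-2.1.jar \
               jasper-compiler-5.5.23.jar jasper-runtime-5.5.23.jar; do
        if [ -f "$libext_dir/$jar" ]; then
            mv "$libext_dir/$jar" "$libext_dir/$jar.bak"
            echo "renamed $jar"
        fi
    done
}

# Example: backup_conflicting_jars libext
```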
2. bin/oozie-setup.sh prepare-war
3. Upload the shared lib
tar -zxvf oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-sharelib-4.2.0.tar.gz
This creates a share directory, which is needed shortly when creating the sharelib directory on HDFS.
bin/oozie-setup.sh sharelib create -fs hdfs://node01:8020
Replace hdfs://node01:8020 with your own HDFS URL.
Or:
hdfs dfs -put /opt/oozie-4.2.0/share /user/{username}
Note: the location must match the value of oozie.service.WorkflowAppService.system.libpath in oozie-site.xml, so the share directory has to go under /user/{username}.
4. Proxy user settings
Without this setting, submitting a job fails with an error like:
hadoop is not allowed to impersonate hadoop
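In other words, the hadoop user is not allowed to impersonate hadoop, i.e. hadoop has no permission to submit jobs on behalf of hadoop.
The reason is that Oozie itself does not execute any tasks, nor does it dispatch tasks to the TaskTrackers. Oozie's only interaction with the Hadoop cluster is to submit jobs to the JobTracker and to learn how they are progressing through callback URLs or polling.
Suppose the Hadoop cluster is installed under account A, and Oozie is installed on some node under account B, which belongs to group C. The proxy setting then means: on that node, account A has permission to submit jobs on behalf of group C.
Add the following to core-site.xml: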
<!-- OOZIE -->
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>IP</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>hadoop</value>
</property>
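In these properties, the hadoop in hadoop.proxyuser.hadoop.hosts and hadoop.proxyuser.hadoop.groups is account A mentioned above. The value of hadoop.proxyuser.hadoop.hosts should be the IP of the node where Oozie is installed, and the value of hadoop.proxyuser.hadoop.groups should be group C mentioned above. (Note that the Hadoop and Oozie documentation define the name in the property key as the user the Oozie server runs as; in this setup both accounts are hadoop, so the result is the same.)
Since Hadoop and Oozie are usually both installed under the hadoop account, which in turn belongs to the hadoop group, we end up with this amusing configuration: hadoop submitting jobs on behalf of hadoop.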
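To avoid typos in the two property names, the XML fragment can be generated from the account, host and group. This is a minimal sketch: the function name print_proxyuser_properties is hypothetical, and 192.0.2.10 is only a placeholder address standing in for the Oozie node's IP.

```shell
#!/bin/sh
# Hypothetical helper: print the two proxy-user properties for core-site.xml.
#   $1 = account name used in the property key
#   $2 = host or IP of the Oozie node
#   $3 = group whose users may be impersonated
print_proxyuser_properties() {
    proxy_user="$1"
    proxy_hosts="$2"
    proxy_groups="$3"
    cat <<EOF
<property>
<name>hadoop.proxyuser.${proxy_user}.hosts</name>
<value>${proxy_hosts}</value>
</property>
<property>
<name>hadoop.proxyuser.${proxy_user}.groups</name>
<value>${proxy_groups}</value>
</property>
EOF
}

# Placeholder values; substitute your own account, IP and group.
print_proxyuser_properties hadoop 192.0.2.10 hadoop
```

The output is meant to be pasted inside the <configuration> element of core-site.xml by hand.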
To make the configuration take effect without restarting the Hadoop cluster:
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration
Note that the user name must not contain a dot, for example:
hadoop.proxyuser.xing.ming.groups
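The dot restriction is easy to check before touching core-site.xml. A small sketch; the function name is_valid_proxy_user is hypothetical and it tests only the rule stated above (non-empty, no dot).

```shell
#!/bin/sh
# Hypothetical check: succeed only for a non-empty user name without a dot,
# since a dotted name would be ambiguous inside hadoop.proxyuser.<user>.groups.
is_valid_proxy_user() {
    case "$1" in
        ""|*.*) return 1 ;;
        *)      return 0 ;;
    esac
}

# Example:
# is_valid_proxy_user "xing.ming" || echo "user name must not contain a dot"
```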
5.
bin/oozie-setup.sh db create -run
The Oozie metadata DB settings can be changed in conf/oozie-site.xml.
6.
bin/oozied.sh start
7.
bin/oozie admin -oozie http://localhost:11000/oozie -status
You can also open it directly in a browser:
http://localhost:11000/oozie
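The status check in step 7 can be turned into a pass/fail test for scripts. This sketch assumes the command prints a line of the form "System mode: NORMAL" when the server is healthy; verify that against your own output first. The function name is_oozie_normal is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read status text on stdin and succeed only if it
# reports NORMAL mode. Assumes a line like "System mode: NORMAL".
is_oozie_normal() {
    grep -q 'System mode: NORMAL'
}

# Example (needs a running Oozie server, so it is left commented out):
# bin/oozie admin -oozie http://localhost:11000/oozie -status | is_oozie_normal \
#     && echo "Oozie is up"
```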