1. Deploy Hadoop (the CDH build)
Prerequisites:
oozie-4.0.0-cdh5.3.6.tar.gz、
hadoop-2.5.0-cdh5.3.6.tar.gz、
ext-2.2.zip
These have already been copied onto the machine, under /usr/local/hadoop.
Since there are quite a few packages, create an empty directory /usr/local/hadoop/module/cdh to hold the extracted files.
Extract hadoop-2.5.0-cdh5.3.6.tar.gz:
[root@hadoop105 hadoop]# tar -zxvf hadoop-2.5.0-cdh5.3.6.tar.gz -C /usr/local/hadoop/module/cdh/
Extract oozie-4.0.0-cdh5.3.6.tar.gz:
[root@hadoop105 hadoop]# tar -zxvf oozie-4.0.0-cdh5.3.6.tar.gz -C /usr/local/hadoop/module/
Modify the Hadoop configuration
Go into the configuration directory etc/hadoop under hadoop-2.5.0-cdh5.3.6.
Edit the following files:
hadoop-env.sh:
[root@hadoop105 module]# cd cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop
[root@hadoop105 hadoop]# vim hadoop-env.sh
export JAVA_HOME=/usr/local/java/module/jdk1.8
mapred-env.sh:
[root@hadoop105 hadoop]# vim mapred-env.sh
export JAVA_HOME=/usr/local/java/module/jdk1.8
yarn-env.sh:
[root@hadoop105 hadoop]# vim yarn-env.sh
export JAVA_HOME=/usr/local/java/module/jdk1.8
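The three edits above add the same JAVA_HOME line to each env script. A minimal sketch that applies the line to all three at once; it is demonstrated on stub files in a temp directory so it is safe to run anywhere, and you would point CONF_DIR at the real etc/hadoop to apply it for real:

```shell
# Append the JAVA_HOME export to each Hadoop env script.
# CONF_DIR here is a temp dir with stub files (an assumption for the sketch);
# set it to .../hadoop-2.5.0-cdh5.3.6/etc/hadoop to edit the real scripts.
CONF_DIR=$(mktemp -d)
for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
  touch "$CONF_DIR/$f"
  echo 'export JAVA_HOME=/usr/local/java/module/jdk1.8' >> "$CONF_DIR/$f"
done
# Show what was written
grep -h JAVA_HOME "$CONF_DIR"/*.sh
```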
core-site.xml:
[root@hadoop105 hadoop]# vim core-site.xml
<configuration>
<!-- Address of the NameNode in HDFS -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop105:8020</value>
</property>
<!-- Directory for files Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/module/cdh/hadoop-2.5.0-cdh5.3.6/tmp</value>
</property>
<!-- Hosts from which Oozie may proxy the root user -->
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<!-- Groups that may be proxied through Oozie -->
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
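To sanity-check what was just written, the NameNode address can be pulled back out of core-site.xml with a small grep/sed pipeline. This sketch runs against an inline copy of the file so it is self-contained; replace the sample path with the real core-site.xml on the cluster:

```shell
# Extract the fs.defaultFS value from a core-site.xml-style file.
extract_default_fs() {
  grep -A1 '<name>fs.defaultFS</name>' "$1" \
    | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p'
}

# Self-contained sample (an inline copy of the config above).
cat > /tmp/core-site-sample.xml <<'EOF'
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop105:8020</value>
</property>
</configuration>
EOF

extract_default_fs /tmp/core-site-sample.xml
# → hdfs://hadoop105:8020
```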
hdfs-site.xml:
[root@hadoop105 hadoop]# vim hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- Secondary NameNode host -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop105:50090</value>
</property>
</configuration>
mapred-site.xml:
[root@hadoop105 hadoop]# vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop105:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop105:19888</value>
</property>
</configuration>
yarn-site.xml:
[root@hadoop105 hadoop]# vim yarn-site.xml
<configuration>
<!-- Address of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop105</value>
</property>
<!-- How reducers fetch data -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Retain logs for 7 days -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
</configuration>
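The retain value 604800 above is simply seven days expressed in seconds, which is easy to verify in the shell:

```shell
# 7 days x 24 hours x 3600 seconds = 604800
echo $((7 * 24 * 3600))
# → 604800
```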
slaves:
[root@hadoop105 hadoop]# vim slaves
hadoop105
hadoop106
hadoop107
Distribute to the other machines, hadoop106 and hadoop107:
[root@hadoop105 cdh]# scp -r hadoop-2.5.0-cdh5.3.6/ hadoop106:/usr/local/hadoop/module/cdh/
[root@hadoop105 cdh]# scp -r hadoop-2.5.0-cdh5.3.6/ hadoop107:/usr/local/hadoop/module/cdh/
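The two scp commands differ only in the hostname, so a loop avoids repeating them. This sketch is a dry run that only echoes each command (hostnames are the ones from this guide); drop the leading `echo` to actually copy:

```shell
# Dry-run distribution loop: prints the scp command for each worker.
# Remove the leading `echo` to perform the real copy.
for host in hadoop106 hadoop107; do
  echo scp -r hadoop-2.5.0-cdh5.3.6/ "$host":/usr/local/hadoop/module/cdh/
done
```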
Start the cluster
(1) Before starting, format the NameNode on hadoop105:
[root@hadoop105 hadoop-2.5.0-cdh5.3.6]# bin/hdfs namenode -format
Formatting should report success.
(2) Once formatting succeeds, start HDFS:
[root@hadoop105 hadoop-2.5.0-cdh5.3.6]# sbin/start-dfs.sh
(3) On hadoop106, start YARN:
[root@hadoop106 hadoop-2.5.0-cdh5.3.6]# sbin/start-yarn.sh
(4) Then start the JobHistory service on hadoop105:
[root@hadoop105 hadoop-2.5.0-cdh5.3.6]# sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop/module/cdh/hadoop-2.5.0-cdh5.3.6/logs/mapred-root-historyserver-hadoop105.out
Note:
The JobHistoryServer must be running, and it is best to run an MR job as a test.
2. Deploy Oozie
Extract Oozie
(1) Extract it into /usr/local/hadoop/module:
[root@hadoop105 hadoop]# tar -zxvf oozie-4.0.0-cdh5.3.6.tar.gz -C /usr/local/hadoop/module/
(2) Enter the oozie-4.0.0-cdh5.3.6 directory and extract oozie-hadooplibs-4.0.0-cdh5.3.6.tar.gz into the parent directory:
[root@hadoop105 module]# cd oozie-4.0.0-cdh5.3.6/
[root@hadoop105 oozie-4.0.0-cdh5.3.6]# tar -zxvf oozie-hadooplibs-4.0.0-cdh5.3.6.tar.gz -C ../
Verify the extraction:
[root@hadoop105 module]# cd oozie-4.0.0-cdh5.3.6/
[root@hadoop105 oozie-4.0.0-cdh5.3.6]# ll
When it finishes, a hadooplibs directory appears under the Oozie directory.
(3) Create a libext directory under the Oozie directory:
[root@hadoop105 oozie-4.0.0-cdh5.3.6]# mkdir libext
Copy the dependency JARs
(1) Copy the jars from hadooplibs into libext:
[root@hadoop105 oozie-4.0.0-cdh5.3.6]# cp -ra hadooplibs/hadooplib-2.5.0-cdh5.3.6.oozie-4.0.0-cdh5.3.6/* libext/
(2) Copy the MySQL driver jar into libext:
[root@hadoop105 mysql-connector-java-5.1.27]# ll
total 1264
-rw-r--r--. 1 root root 47173 Oct 23 2013 build.xml
-rw-r--r--. 1 root root 222520 Oct 23 2013 CHANGES
-rw-r--r--. 1 root root 18122 Oct 23 2013 COPYING
drwxr-xr-x. 2 root root 71 Jan 26 15:04 docs
-rw-r--r--. 1 root root 872303 Oct 23 2013 mysql-connector-java-5.1.27-bin.jar
-rw-r--r--. 1 root root 61423 Oct 23 2013 README
-rw-r--r--. 1 root root 63674 Oct 23 2013 README.txt
drwxr-xr-x. 7 root root 67 Oct 23 2013 src
[root@hadoop105 mysql-connector-java-5.1.27]# cp mysql-connector-java-5.1.27-bin.jar /usr/local/hadoop/module/oozie-4.0.0-cdh5.3.6/libext/
(3) Copy ext-2.2.zip into libext.
Ext is a JavaScript framework used to render the Oozie web UI:
[root@hadoop105 cdh]# cp ext-2.2.zip ../oozie-4.0.0-cdh5.3.6/libext/
Modify the Oozie configuration
oozie-site.xml:
[root@hadoop105 oozie-4.0.0-cdh5.3.6]# cd conf/
[root@hadoop105 conf]# vim oozie-site.xml
# line 140:
<property>
<name>oozie.service.JPAService.jdbc.driver</name>
<value>com.mysql.jdbc.Driver</value>
<description>
JDBC driver class.
</description>
</property>
# line 148:
<property>
<name>oozie.service.JPAService.jdbc.url</name>
<value>jdbc:mysql://hadoop105:3306/oozie</value>
<description>
JDBC URL.
</description>
</property>
# line 156:
<property>
<name>oozie.service.JPAService.jdbc.username</name>
<value>root</value>
<description>
DB user name.
</description>
</property>
# line 164:
<property>
<name>oozie.service.JPAService.jdbc.password</name>
<value>123456</value>
<description>
DB user password.
IMPORTANT: if the password is empty, leave a 1-space string; the service trims the value,
and if it is empty, Configuration assumes it is NULL.
</description>
</property>
#231行:
<property>
<name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
<value>*=/usr/local/hadoop/module/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop</value>
<description>
Comma separated AUTHORITY=HADOOP_CONF_DIR, where AUTHORITY is the HOST:PORT of
the Hadoop service (JobTracker, HDFS). The wildcard '*' configuration is
used when there is no exact match for an authority. The HADOOP_CONF_DIR contains
the relevant Hadoop *-site.xml files. If the path is relative, it is looked up within
the Oozie configuration directory; the path can also be absolute (i.e. it can point
to Hadoop client conf/ directories in the local filesystem).
</description>
</property>
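As the description says, the value follows an AUTHORITY=HADOOP_CONF_DIR format. A small sketch of how such an entry (the one used above) splits into its two halves with shell parameter expansion:

```shell
# Split an AUTHORITY=HADOOP_CONF_DIR entry into its two parts.
entry='*=/usr/local/hadoop/module/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop'
authority=${entry%%=*}   # everything before the first '=' (here, the '*' wildcard)
conf_dir=${entry#*=}     # everything after the first '=' (the HADOOP_CONF_DIR)
echo "authority=$authority"
echo "conf_dir=$conf_dir"
```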
Log in to MySQL and create the oozie database:
[root@hadoop105 conf]# mysql -uroot -p
Enter password:
Create the Oozie database in MySQL:
mysql> create database oozie;
Query OK, 1 row affected (0.02 sec)
mysql>
The configuration check is OK.
Rename the MapReduce configuration template
(1) Rename mapred-site.xml.template to mapred-site.xml:
[root@hadoop105 hadoop]# mv mapred-site.xml.template mapred-site.xml
Do the same on the other machines:
[root@hadoop106 hadoop]# mv mapred-site.xml.template mapred-site.xml
[root@hadoop107 hadoop]# mv mapred-site.xml.template mapred-site.xml
Stop the services:
Since the configuration has changed, stop the services and reformat.
[root@hadoop105 hadoop-2.5.0-cdh5.3.6]# sbin/stop-dfs.sh
[root@hadoop105 hadoop-2.5.0-cdh5.3.6]# sbin/stop-yarn.sh
[root@hadoop105 hadoop-2.5.0-cdh5.3.6]# sbin/mr-jobhistory-daemon.sh stop historyserver
Check the processes:
[root@hadoop105 ~]# jps
7514 QuorumPeerMain
81199 Jps
[root@hadoop105 ~]#
Now the NameNode can be reformatted.
Note: before reformatting, delete the data from the previous format so it does not interfere!
[root@hadoop105 hadoop-2.5.0-cdh5.3.6]# ll
total 12
drwxr-xr-x. 2 1106 4001 137 Jul 28 2015 bin
drwxr-xr-x. 2 1106 4001 166 Jul 28 2015 bin-mapreduce1
drwxr-xr-x. 3 1106 4001 187 Jul 28 2015 cloudera
drwxr-xr-x. 6 1106 4001 109 Jul 28 2015 etc
drwxr-xr-x. 5 1106 4001 43 Jul 28 2015 examples
drwxr-xr-x. 3 1106 4001 28 Jul 28 2015 examples-mapreduce1
drwxr-xr-x. 2 1106 4001 106 Jul 28 2015 include
drwxr-xr-x. 3 1106 4001 20 Jul 28 2015 lib
drwxr-xr-x. 2 1106 4001 239 Jul 28 2015 libexec
drwxr-xr-x. 3 root root 4096 Jan 27 15:21 logs
drwxr-xr-x. 3 1106 4001 4096 Jul 28 2015 sbin
drwxr-xr-x. 4 1106 4001 31 Jul 28 2015 share
drwxr-xr-x. 17 1106 4001 4096 Jul 28 2015 src
drwxr-xr-x. 4 root root 37 Jan 27 15:21 tmp
[root@hadoop105 hadoop-2.5.0-cdh5.3.6]# rm -rf tmp/
[root@hadoop105 hadoop-2.5.0-cdh5.3.6]# bin/hdfs namenode -format
Formatting is OK.
Now everything can be started again:
[root@hadoop105 hadoop-2.5.0-cdh5.3.6]# sbin/start-dfs.sh
[root@hadoop106 hadoop-2.5.0-cdh5.3.6]# sbin/start-yarn.sh
[root@hadoop105 hadoop-2.5.0-cdh5.3.6]# sbin/mr-jobhistory-daemon.sh start historyserver
Initialize Oozie
(1) Upload the yarn.tar.gz sharelib under the Oozie directory to HDFS.
Note: the yarn.tar.gz archive is extracted automatically:
[root@hadoop105 oozie-4.0.0-cdh5.3.6]# bin/oozie-setup.sh sharelib create -fs hdfs://hadoop105:8020 -locallib oozie-sharelib-4.0.0-cdh5.3.6-yarn.tar.gz
After it succeeds, open the NameNode web UI on port 50070 and check that the corresponding directory was created.
(2) Generate the oozie.sql file and initialize the Oozie database:
[root@hadoop105 oozie-4.0.0-cdh5.3.6]# bin/ooziedb.sh create -sqlfile oozie.sql -run
setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
Validate DB Connection
DONE
Check DB schema does not exist
DONE
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
Create OOZIE_SYS table
DONE
Oozie DB has been created for Oozie version '4.0.0-cdh5.3.6'
The SQL commands have been written to: oozie.sql
(3) Build the Oozie WAR:
[root@hadoop105 oozie-4.0.0-cdh5.3.6]# bin/oozie-setup.sh prepare-war
This fails partway through:
bin/oozie-setup.sh: line 235: unzip: command not found
The unzip tool is missing, so install it:
yum -y install unzip
After that, another error appears:
Failed: creating new Oozie WAR
A web search suggested installing zip as well:
[root@hadoop105 oozie-4.0.0-cdh5.3.6]# yum install zip
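Both failures come down to missing archive tools: prepare-war needs zip and unzip on the PATH. A small pre-flight check sketch (the helper name is my own) that reports any missing tools before re-running the setup:

```shell
# Report which of the given tools are missing from PATH.
check_tools() {
  missing=""
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || missing="$missing $t"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
  else
    echo "all tools present"
  fi
}

# What prepare-war needs:
check_tools zip unzip
```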
Then re-run the setup to package the project and generate the WAR:
[root@hadoop105 oozie-4.0.0-cdh5.3.6]# bin/oozie-setup.sh prepare-war
... (output truncated) ...
New Oozie WAR file with added 'ExtJS library, JARs' at /usr/local/hadoop/module/oozie-4.0.0-cdh5.3.6/oozie-server/webapps/oozie.war
INFO: Oozie is ready to be started
Check the processes (Bootstrap is the Oozie process):
[root@hadoop105 oozie-4.0.0-cdh5.3.6]# jps
84641 JobHistoryServer
85344 NameNode
88082 Bootstrap
7514 QuorumPeerMain
85580 SecondaryNameNode
88094 Jps
[root@hadoop105 oozie-4.0.0-cdh5.3.6]#
We can also view the full class name of each process:
[root@hadoop105 oozie-4.0.0-cdh5.3.6]# jps -l
84641 org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer
85344 org.apache.hadoop.hdfs.server.namenode.NameNode
88082 org.apache.catalina.startup.Bootstrap
7514 org.apache.zookeeper.server.quorum.QuorumPeerMain
85580 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
88159 sun.tools.jps.Jps