1. Download CDH
wget http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.5.0.tar.gz
2. Set environment variables
(1) Set JAVA_HOME
-----------------------
$ cat ~/.bash_profile
#java settings
export PATH
export JAVA_HOME=/u01/app/softbase/jdk1.6.0_21
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
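As a quick sanity check, the exports above can be sourced in a throwaway shell to confirm that the JDK bin directory really lands on PATH. A minimal sketch; the JDK path is this guide's example location, so substitute your own:

```shell
# Source the profile fragment and confirm JAVA_HOME lands on PATH.
profile=$(mktemp)
cat > "$profile" <<'EOF'
export JAVA_HOME=/u01/app/softbase/jdk1.6.0_21
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
EOF
. "$profile"
case ":$PATH:" in
  *":$JAVA_HOME/bin:"*) path_ok=yes ;;
  *)                    path_ok=no ;;
esac
echo "JAVA_HOME=$JAVA_HOME path_ok=$path_ok"
rm -f "$profile"
```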
(2) Set up protoc
------------------------
>>> hadoop-2.0.0-cdh4.5.0 requires protoc 2.4.x
Download: https://github.com/google/protobuf/tree/v2.4.1
$ wget https://github.com/google/protobuf/archive/v2.4.1.zip
To build and install the C++ Protocol Buffer runtime and the Protocol
Buffer compiler (protoc) execute the following:
$ autoreconf -f -i -Wall,no-obsolete <== run this if there is no configure script
$ ./configure --prefix=/u01/app/softbase/protoc2.4.1
$ make
$ make install
Add to ~/.bash_profile:
export HADOOP_PROTOC_CDH4_PATH=/u01/app/softbase/protoc2.4.1/bin/protoc
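The build fails with a confusing error if the wrong protoc is picked up, so it is worth checking the version first. A sketch of that check, run here against a stub protoc so the snippet is self-contained; in real use, point PROTOC at your actual binary (e.g. $HADOOP_PROTOC_CDH4_PATH):

```shell
# Version-gate protoc the way the CDH4 build expects (2.4.x).
tmp=$(mktemp -d)
cat > "$tmp/protoc" <<'EOF'
#!/bin/sh
echo "libprotoc 2.4.1"
EOF
chmod +x "$tmp/protoc"
PROTOC=$tmp/protoc            # in real use: PROTOC=$HADOOP_PROTOC_CDH4_PATH
ver=$("$PROTOC" --version | awk '{print $2}')
case $ver in
  2.4.*) msg="protoc $ver: OK for CDH4" ;;
  *)     msg="protoc $ver: CDH4 expects 2.4.x" ;;
esac
echo "$msg"
rm -rf "$tmp"
```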
(3) Build the binary distribution with native code and without documentation
[bruce@iRobot src]$ pwd
/home/bruce/hadoop-install/hadoop/src
Build the native libraries and the dist tarball, skipping javadoc:
$ mvn package -Dmaven.javadoc.skip=true -Pdist,native -DskipTests -Dtar
The native libraries end up under /home/bruce/hadoop-install/hadoop/src/hadoop-dist/target/hadoop-2.0.0-cdh4.5.0/lib/native.
[bruce@iRobot hadoop]$ pwd
/home/bruce/hadoop-install/hadoop
[bruce@iRobot hadoop]$ ls -lrt
total 4
drwxr-xr-x  4 bruce oinstall 4096 Nov  1  2013 share
drwxr-xr-x  2 bruce oinstall 4096 Nov  1  2013 libexec
drwxr-xr-x  3 bruce oinstall 4096 Nov  1  2013 lib
drwxr-xr-x  2 bruce oinstall 4096 Nov  1  2013 include
drwxr-xr-x  3 bruce oinstall 4096 Nov  1  2013 examples-mapreduce1
drwxr-xr-x  2 bruce oinstall 4096 Nov  1  2013 bin-mapreduce1
drwxr-xr-x  4 bruce oinstall 4096 Nov  1  2013 examples
drwxr-xr-x  6 bruce oinstall 4096 Nov  1  2013 etc
drwxr-xr-x  3 bruce oinstall 4096 Nov  1  2013 cloudera
drwxr-xr-x  2 bruce oinstall 4096 Nov  7 10:49 bin
drwxr-xr-x 16 bruce oinstall 4096 Nov  7 13:49 src
-rw-r--r--  1 bruce oinstall 5882 Nov  7 17:03 hellohadoop-1.0-SNAPSHOT.jar
drwxr-xr-x  3 bruce oinstall 4096 Nov  7 23:04 sbin
drwxr-xr-x  2 bruce oinstall 4096 Nov  9 14:49 logs
-rw-r--r--  1 bruce oinstall   59 Nov  9 15:07 1.txt
[bruce@iRobot hadoop]$ cd src
[bruce@iRobot src]$ ls -lrt
total 6
drwxr-xr-x  2 bruce oinstall  4096 Nov  1  2013 dev-support
-rw-r--r--  1 bruce oinstall  3975 Nov  1  2013 BUILDING.txt
-rw-r--r--  1 bruce oinstall 13095 Nov  1  2013 pom.xml
drwxr-xr-x 10 bruce oinstall  4096 Nov  1  2013 hadoop-mapreduce1-project
drwxr-xr-x  2 bruce oinstall  4096 Nov  1  2013 build
drwxr-xr-x  4 bruce oinstall  4096 Nov  1 12:13 hadoop-project
drwxr-xr-x  4 bruce oinstall  4096 Nov  1 12:13 hadoop-assemblies
drwxr-xr-x  3 bruce oinstall  4096 Nov  1 12:13 hadoop-project-dist
drwxr-xr-x  9 bruce oinstall  4096 Nov  1 12:14 hadoop-common-project
drwxr-xr-x  6 bruce oinstall  4096 Nov  1 12:14 hadoop-hdfs-project
drwxr-xr-x  4 bruce oinstall  4096 Nov  1 12:15 hadoop-yarn-project
drwxr-xr-x 10 bruce oinstall  4096 Nov  1 12:15 hadoop-mapreduce-project
drwxr-xr-x 12 bruce oinstall  4096 Nov  1 12:15 hadoop-tools
drwxr-xr-x  3 bruce oinstall  4096 Nov  1 12:15 hadoop-dist
drwxr-xr-x  3 bruce oinstall  4096 Nov  1 12:16 hadoop-client
drwxr-xr-x  3 bruce oinstall  4096 Nov  1 12:16 hadoop-minicluster
[bruce@iRobot src]$ cd hadoop-dist/
[bruce@iRobot hadoop-dist]$ ls -lrt
total 2
-rw-r--r-- 1 bruce oinstall 7379 Nov  1  2013 pom.xml
drwxr-xr-x 6 bruce oinstall 4096 Nov  1 12:15 target
[bruce@iRobot hadoop-dist]$ cd target/
[bruce@iRobot target]$ ls -lrt
total 5584
drwxr-xr-x  2 bruce oinstall     4096 Nov  1 12:15 antrun
drwxr-xr-x  2 bruce oinstall     4096 Nov  1 12:15 test-dir
-rw-r--r--  1 bruce oinstall     1590 Nov  1 12:15 dist-layout-stitching.sh
drwxr-xr-x 10 bruce oinstall     4096 Nov  1 12:15 hadoop-2.0.0-cdh4.5.0
drwxr-xr-x  2 bruce oinstall     4096 Nov  1 12:15 maven-archiver
-rw-r--r--  1 bruce oinstall     2828 Nov  1 12:15 hadoop-dist-2.0.0-cdh4.5.0.jar
-rw-r--r--  1 bruce oinstall      643 Nov  1 12:15 dist-tar-stitching.sh
-rw-r--r--  1 bruce oinstall 77368435 Nov  1 12:16 hadoop-2.0.0-cdh4.5.0.tar.gz
[bruce@iRobot target]$ cd hadoop-2.0.0-cdh4.5.0
[bruce@iRobot hadoop-2.0.0-cdh4.5.0]$ ls -lrt
total 2
drwxr-xr-x 3 bruce oinstall 4096 Nov  1 12:15 lib
drwxr-xr-x 3 bruce oinstall 4096 Nov  1 12:15 etc
drwxr-xr-x 4 bruce oinstall 4096 Nov  1 12:15 share
drwxr-xr-x 2 bruce oinstall 4096 Nov  1 12:15 bin
drwxr-xr-x 2 bruce oinstall 4096 Nov  1 12:15 sbin
drwxr-xr-x 2 bruce oinstall 4096 Nov  1 12:15 libexec
drwxr-xr-x 3 bruce oinstall 4096 Nov  1 12:15 examples
drwxr-xr-x 2 bruce oinstall 4096 Nov  1 12:15 include
[bruce@iRobot hadoop-2.0.0-cdh4.5.0]$ cd lib
[bruce@iRobot lib]$ ls -lrt
drwxr-xr-x 2 bruce oinstall 4096 Nov  1 12:15 native
[bruce@iRobot lib]$ cd native/
[bruce@iRobot native]$ ls -lrt
total 576
-rwxr-xr-x 1 bruce oinstall  405409 Nov  1 12:15 libhadoop.so.1.0.0
lrwxrwxrwx 1 bruce oinstall      18 Nov  1 12:15 libhadoop.so -> libhadoop.so.1.0.0
-rw-r--r-- 1 bruce oinstall  721680 Nov  1 12:15 libhadoop.a
-rwxr-xr-x 1 bruce oinstall  180522 Nov  1 12:15 libhdfs.so.0.0.0
lrwxrwxrwx 1 bruce oinstall      16 Nov  1 12:15 libhdfs.so -> libhdfs.so.0.0.0
-rw-r--r-- 1 bruce oinstall  272316 Nov  1 12:15 libhdfs.a
-rw-r--r-- 1 bruce oinstall  582232 Nov  1 12:15 libhadooputils.a
-rw-r--r-- 1 bruce oinstall 1486812 Nov  1 12:15 libhadooppipes.a
[bruce@iRobot native]$
$ cp /home/bruce/hadoop-install/hadoop/src/hadoop-dist/target/hadoop-2.0.0-cdh4.5.0/lib/native/* $HADOOP_HOME/lib/native
Copy the freshly built lib/native over the original lib/native from hadoop-2.0.0-cdh4.5.0.tar.gz. Remember to edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh and append at the end:
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
(4) Set the Hadoop environment variables
------------------------
export HADOOP_HOME=/home/bruce/hadoop-install/hadoop
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_HOME=${HADOOP_MAPRED_HOME}
export YARN_CONF_DIR=${HADOOP_CONF_DIR}
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib:$HADOOP_CONF_DIR:$HADOOP_HOME/lib/native
export PATH=$HADOOP_HOME/sbin:$PATH
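Before relying on these variables, it helps to confirm the candidate HADOOP_HOME actually has the layout they assume. A sketch, shown against a scratch skeleton so it runs anywhere; point HADOOP_HOME at your real tree instead:

```shell
# Check that HADOOP_HOME contains the directories referenced above.
HADOOP_HOME=$(mktemp -d)
mkdir -p "$HADOOP_HOME/etc/hadoop" "$HADOOP_HOME/sbin" "$HADOOP_HOME/lib/native"
missing=0
for d in etc/hadoop sbin lib/native; do
  [ -d "$HADOOP_HOME/$d" ] || { echo "missing: $d"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "layout OK"
rm -rf "$HADOOP_HOME"
```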
3. Configuration files
Assume we are in the Hadoop home directory:
[bruce@iRobot hadoop]$ pwd
/home/bruce/hadoop-install/hadoop
(0) hadoop-env.sh
[bruce@iRobot hadoop]$ cat etc/hadoop/hadoop-env.sh
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
#export HADOOP_PREFIX=/home/bruce/hadoop-install/hadoop
#export HADOOP_HOME=/home/bruce/hadoop-install/hadoop
#export JAVA_HOME=/u01/app/softbase/jdk1.7.0_21
export JAVA_HOME=/u01/app/softbase/jdk1.6.0_21
# The jsvc implementation to use. Jsvc is required to run secure datanodes.
#export JSVC_HOME=${JSVC_HOME}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}
# Extra Java CLASSPATH elements. Automatically insert capacity-scheduler.
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
if [ "$HADOOP_CLASSPATH" ]; then
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
else
export HADOOP_CLASSPATH=$f
fi
done
# The maximum amount of heap to use, in MB. Default is 1000.
#export HADOOP_HEAPSIZE=
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""
# Extra Java runtime options. Empty by default.
#export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"
# !!!! NO SUCH CONFIGURATIONS IN CDH HADOOP?
#export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"
#export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
#export HADOOP_CLIENT_OPTS="-Xmx1024m $HADOOP_CLIENT_OPTS"
#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS"
# On secure datanodes, user to run the datanode as after dropping privileges
export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}
# Where log files are stored. $HADOOP_HOME/logs by default.
#export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER
# Where log files are stored in the secure data environment.
export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}
# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by
# the user that will run the hadoop daemons. Otherwise there is the
# potential for a symlink attack.
export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}
# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER
[bruce@iRobot hadoop]$
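The capacity-scheduler for-loop in the hadoop-env.sh above simply accumulates every jar it finds into HADOOP_CLASSPATH, colon-separated. The same logic, demonstrated against a throwaway directory with illustrative jar names:

```shell
# Reproduce the hadoop-env.sh classpath-building loop on scratch jars.
tmp=$(mktemp -d)
touch "$tmp/a.jar" "$tmp/b.jar"
unset HADOOP_CLASSPATH
for f in "$tmp"/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f   # append with a colon
  else
    HADOOP_CLASSPATH=$f                     # first jar starts the list
  fi
done
echo "$HADOOP_CLASSPATH"
rm -rf "$tmp"
```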
(1) core-site.xml
[bruce@iRobot hadoop]$
[bruce@iRobot hadoop]$ cat etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.100.200:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/u01/hadoopdata/hadoop-${user.name}/tmp</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
</configuration>
[bruce@iRobot hadoop]$
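For simple *-site.xml files like this one, a property value can be pulled out with plain sed to double-check what Hadoop will read. A sketch, not a real XML parser; it assumes the name/value layout shown above, and runs here against an inline copy of the fs.defaultFS property:

```shell
# Extract a <value> for a given <name> from a Hadoop *-site.xml.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.100.200:9000</value>
  </property>
</configuration>
EOF
getprop() {
  sed -n "/<name>$1<\/name>/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}" "$2"
}
val=$(getprop fs.defaultFS "$conf")
echo "fs.defaultFS = $val"
rm -f "$conf"
```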
(2) hdfs-site.xml
[bruce@iRobot hadoop]$ cat etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>192.168.100.200:50090</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///u01/hadoopdata/hadoop-${user.name}/dfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///u01/hadoopdata/hadoop-${user.name}/dfs/data</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:///u01/hadoopdata/hadoop-${user.name}/dfs/namesecondary</value>
<final>true</final>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>192.168.100.200:50010</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>oinstall</value>
</property>
</configuration>
[bruce@iRobot hadoop]$
(3) mapred-site.xml
[bruce@iRobot hadoop]$ cat etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.shuffle.port</name>
<value>13562</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>192.168.100.200:10020</value>
</property>
</configuration>
(4) yarn-site.xml
[bruce@iRobot hadoop]$ cat etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>192.168.100.200:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>192.168.100.200:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>192.168.100.200:8032</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>192.168.100.200:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>192.168.100.200:8088</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/u01/hadoopdata/hadoop-${user.name}/yarn/nmlocal</value>
</property>
<property>
<name>yarn.nodemanager.address</name>
<value>192.168.100.200:11000</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/u01/hadoopdata/hadoop-${user.name}/yarn/logs</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/u01/hadoopdata/hadoop-${user.name}/yarn/userlogs</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>iRobot</value>
</property>
<property>
<name>yarn.web-proxy.address</name>
<value>192.168.100.200:54315</value>
</property>
</configuration>
[bruce@iRobot hadoop]$
(5) capacity-scheduler.xml
Use the defaults.
(6) slaves configuration (important!)
List the slave hostnames, one per line.
For example: iRobot
[bruce@iRobot hadoop]$ cat etc/hadoop/slaves
iRobot
[bruce@iRobot hadoop]$
(7) Configure log4j levels
For easier debugging, you can change the log output level in etc/hadoop/log4j.properties:
for example, adjust hadoop.root.logger:
hadoop.root.logger=ALL,console
or change the level of an individual logger only.
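For a one-off debugging session it can be safer to edit a scratch copy than the live file. A sketch of the sed edit (the real file is etc/hadoop/log4j.properties; many Hadoop commands also honor the HADOOP_ROOT_LOGGER environment variable for a temporary override):

```shell
# Flip hadoop.root.logger to DEBUG in a scratch copy of log4j.properties.
props=$(mktemp)
echo 'hadoop.root.logger=INFO,console' > "$props"
sed -i 's/^hadoop\.root\.logger=.*/hadoop.root.logger=DEBUG,console/' "$props"
line=$(cat "$props")
echo "$line"
rm -f "$props"
```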
[bruce@iRobot hadoop]$ cat etc/hadoop/log4j.properties
# Copyright 2011 The Apache Software Foundation
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Define some default values that can be overridden by system properties
hadoop.root.logger=ALL,console
#hadoop.root.logger=INFO,console
hadoop.log.dir=.
hadoop.log.file=hadoop.log
# Define the root logger to the system property "hadoop.root.logger".
log4j.rootLogger=${hadoop.root.logger}, EventCounter
# Logging Threshold
log4j.threshold=ALL
# Null Appender
log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
#
# Rolling File Appender - cap space usage at 5gb.
#
hadoop.log.maxfilesize=256MB
hadoop.log.maxbackupindex=20
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=${hadoop.log.maxfilesize}
log4j.appender.RFA.MaxBackupIndex=${hadoop.log.maxbackupindex}
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
# Pattern format: Date LogLevel LoggerName LogMessage
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
# Debugging Pattern format
#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
#
# Daily Rolling File Appender
#
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
# Rollver at midnight
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
# 30-day backup
#log4j.appender.DRFA.MaxBackupIndex=30
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
# Pattern format: Date LogLevel LoggerName LogMessage
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
# Debugging Pattern format
#log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
#
# console
# Add "console" to rootlogger above if you want to use this
#
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
#
# TaskLog Appender
#
#Default values
hadoop.tasklog.taskid=null
hadoop.tasklog.iscleanup=false
hadoop.tasklog.noKeepSplits=4
hadoop.tasklog.totalLogFileSize=100
hadoop.tasklog.purgeLogSplits=true
hadoop.tasklog.logsRetainHours=12
log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
log4j.appender.TLA.isCleanup=${hadoop.tasklog.iscleanup}
log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
#
# HDFS block state change log from block manager
#
# Uncomment the following to suppress normal block state change
# messages from BlockManager in NameNode.
#log4j.logger.BlockStateChange=WARN
#
#Security appender
#
hadoop.security.logger=ALL,NullAppender
#hadoop.security.logger=INFO,NullAppender
hadoop.security.log.maxfilesize=256MB
hadoop.security.log.maxbackupindex=20
log4j.category.SecurityLogger=${hadoop.security.logger}
hadoop.security.log.file=SecurityAuth-${user.name}.audit
log4j.appender.RFAS=org.apache.log4j.RollingFileAppender
log4j.appender.RFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.RFAS.MaxFileSize=${hadoop.security.log.maxfilesize}
log4j.appender.RFAS.MaxBackupIndex=${hadoop.security.log.maxbackupindex}
#
# Daily Rolling Security appender
#
log4j.appender.DRFAS=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
log4j.appender.DRFAS.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.DRFAS.DatePattern=.yyyy-MM-dd
#
# hdfs audit logging
#
hdfs.audit.logger=ALL,NullAppender
#hdfs.audit.logger=INFO,NullAppender
hdfs.audit.log.maxfilesize=256MB
hdfs.audit.log.maxbackupindex=20
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}
log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.RFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.RFAAUDIT.MaxFileSize=${hdfs.audit.log.maxfilesize}
log4j.appender.RFAAUDIT.MaxBackupIndex=${hdfs.audit.log.maxbackupindex}
#
# mapred audit logging
#
mapred.audit.logger=ALL,NullAppender
#mapred.audit.logger=INFO,NullAppender
mapred.audit.log.maxfilesize=256MB
mapred.audit.log.maxbackupindex=20
log4j.logger.org.apache.hadoop.mapred.AuditLogger=${mapred.audit.logger}
log4j.additivity.org.apache.hadoop.mapred.AuditLogger=false
log4j.appender.MRAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.MRAUDIT.File=${hadoop.log.dir}/mapred-audit.log
log4j.appender.MRAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.MRAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.MRAUDIT.MaxFileSize=${mapred.audit.log.maxfilesize}
log4j.appender.MRAUDIT.MaxBackupIndex=${mapred.audit.log.maxbackupindex}
# Custom Logging levels
log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG
log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=DEBUG
# Jets3t library
log4j.logger.org.jets3t.service.impl.rest.httpclient.RestS3Service=ERROR
#
# Event Counter Appender
# Sends counts of logging messages at different severity levels to Hadoop Metrics.
#
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
#
# Job Summary Appender
#
# Use following logger to send summary to separate file defined by
# hadoop.mapreduce.jobsummary.log.file :
# hadoop.mapreduce.jobsummary.logger=INFO,JSA
#
hadoop.mapreduce.jobsummary.logger=${hadoop.root.logger}
hadoop.mapreduce.jobsummary.log.file=hadoop-mapreduce.jobsummary.log
hadoop.mapreduce.jobsummary.log.maxfilesize=256MB
hadoop.mapreduce.jobsummary.log.maxbackupindex=20
log4j.appender.JSA=org.apache.log4j.RollingFileAppender
log4j.appender.JSA.File=${hadoop.log.dir}/${hadoop.mapreduce.jobsummary.log.file}
log4j.appender.JSA.MaxFileSize=${hadoop.mapreduce.jobsummary.log.maxfilesize}
log4j.appender.JSA.MaxBackupIndex=${hadoop.mapreduce.jobsummary.log.maxbackupindex}
log4j.appender.JSA.layout=org.apache.log4j.PatternLayout
log4j.appender.JSA.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
log4j.logger.org.apache.hadoop.mapred.JobInProgress$JobSummary=${hadoop.mapreduce.jobsummary.logger}
log4j.additivity.org.apache.hadoop.mapred.JobInProgress$JobSummary=false
#
# Yarn ResourceManager Application Summary Log
#
# Set the ResourceManager summary log filename
yarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log
# Set the ResourceManager summary log level and appender
#yarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY
yarn.server.resourcemanager.appsummary.logger=ALL,RMSUMMARY
# Appender for ResourceManager Application Summary Log
# Requires the following properties to be set
# - hadoop.log.dir (Hadoop Log directory)
# - yarn.server.resourcemanager.appsummary.log.file (resource manager app summary log filename)
# - yarn.server.resourcemanager.appsummary.logger (resource manager app summary log level and appender)
#log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger}
#log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false
#log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender
#log4j.appender.RMSUMMARY.File=${hadoop.log.dir}/${yarn.server.resourcemanager.appsummary.log.file}
#log4j.appender.RMSUMMARY.MaxFileSize=256MB
#log4j.appender.RMSUMMARY.MaxBackupIndex=20
#log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout
#log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
[bruce@iRobot hadoop]$
4. Running Hadoop on the cluster
Prerequisite: passwordless SSH login between the cluster machines
--------------------------------------
The following sets up passwordless login on a single machine; for a multi-machine cluster, look up the equivalent setup separately.
$ ssh localhost
If this asks for a password, run:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
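Note that the `cat >>` append is not idempotent: running it twice leaves duplicate keys in authorized_keys. A guarded variant, sketched with temp files and a fake key so it runs anywhere; swap in the real ~/.ssh paths for actual use:

```shell
# Append the public key only if it is not already present.
pub=$(mktemp); auth=$(mktemp)
echo "ssh-dss AAAAB3...fake== bruce@iRobot" > "$pub"   # placeholder key
grep -qxF "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
grep -qxF "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"   # second run is a no-op
lines=$(wc -l < "$auth" | tr -d ' ')
echo "authorized_keys lines: $lines"
rm -f "$pub" "$auth"
```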
Start Hadoop
--------------
To start a Hadoop cluster you will need to start both the HDFS and YARN cluster.
The bracket prefix on the commands below indicates where (and as which user) to run them:
[hdfs]: run on the HDFS node
[yarn]: run on the YARN node
[mapred]: run on the MapReduce node
(0) To start HDFS, first format the filesystem
The first time you bring up HDFS, it must be formatted. Format a new distributed filesystem as hdfs:
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>
(1) Start HDFS
>>>>>>> Start HDFS (method 1): step by step
(a) Start the NameNode
Start the HDFS NameNode with the following command on the designated node as hdfs:
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
(b) Start the DataNodes
Start a HDFS DataNode with the following command on each designated node as hdfs:
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
(c) Start the SecondaryNameNode
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start secondarynamenode
>>>>>>> Start HDFS (method 2): equivalent to the steps above. Recommended!
If etc/hadoop/slaves and ssh trusted access is configured (see Single Node Setup), all of the HDFS processes can be started with a utility script. As hdfs
[hdfs]$ $HADOOP_PREFIX/sbin/start-dfs.sh
(2) Start YARN
>>>>>>> Start YARN (method 1): step by step
Start the YARN with the following command, run on the designated ResourceManager as yarn:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
Run a script to start a NodeManager on each designated host as yarn:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager
Start a standalone WebAppProxy server. Run on the WebAppProxy server as yarn. If multiple servers are used with load balancing it should be run on each of them:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start proxyserver
>>>>>>> Start YARN (method 2): equivalent to the steps above. Recommended!
If etc/hadoop/slaves and ssh trusted access is configured (see Single Node Setup), all of the YARN processes can be started with a utility script. As yarn:
[yarn]$ $HADOOP_PREFIX/sbin/start-yarn.sh
(Optional)
>>>>>>>>>>>>>>>>>>
Start the MapReduce JobHistory Server with the following command, run on the designated server as mapred:
[mapred]$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver --config $HADOOP_CONF_DIR
Running jobs
-----------------
Make the HDFS directories required to execute MapReduce jobs:
$ $HADOOP_PREFIX/bin/hdfs dfs -mkdir /user
$ $HADOOP_PREFIX/bin/hdfs dfs -mkdir /user/<username>
The actual commands look like this:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/bruce
$ bin/hdfs dfs -ls .
$ bin/hdfs dfs -put 1.txt <= upload a text file to run the count on
$ bin/hdfs dfs -ls .
Found 1 items
-rw-r--r-- 1 bruce oinstall 60 2015-11-18 18:01 1.txt
Run a job: count the words in a text file
Run some of the examples provided:
$ $HADOOP_PREFIX/bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.5.0.jar wordcount 1.txt 1.sort
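What the wordcount job computes can be reproduced locally with coreutils on a tiny input, which is handy for spot-checking the job's output (the MapReduce example emits the same word/count pairs, tab-separated):

```shell
# Local word count: split on whitespace, sort, count duplicates.
f=$(mktemp)
printf 'hello hadoop\nhello yarn\n' > "$f"
counts=$(tr -s ' \t' '\n\n' < "$f" | sort | uniq -c | awk '{print $2, $1}')
echo "$counts"
rm -f "$f"
```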
Run a job: count the words in the files under a directory
Delete any existing input/output directories:
$ $HADOOP_PREFIX/bin/hdfs dfs -rm -f -r input
$ $HADOOP_PREFIX/bin/hdfs dfs -rm -f -r output
Copy the input files into the distributed filesystem:
$ $HADOOP_PREFIX/bin/hdfs dfs -put etc/hadoop input
$ $HADOOP_PREFIX/bin/hadoop jar /home/bruce/hadoop-install/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.5.0.jar grep input output 'dfs[a-z.]+'
View the output files on the distributed filesystem:
$ $HADOOP_PREFIX/bin/hdfs dfs -cat output/*
Run a job: compute pi
$ $HADOOP_PREFIX/bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.5.0.jar pi 10 100
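The pi example estimates π by placing points in the unit square and counting how many fall inside the quarter circle. The same idea in plain awk with a deterministic grid (Hadoop's example uses quasi-Monte Carlo sampling instead, distributed across map tasks):

```shell
# Grid-based estimate of pi: fraction of grid cells inside x^2+y^2<=1,
# times 4 (the quarter circle covers pi/4 of the unit square).
pi=$(awk 'BEGIN {
  n = 400; hits = 0
  for (i = 0; i < n; i++)
    for (j = 0; j < n; j++) {
      x = (i + 0.5) / n; y = (j + 0.5) / n
      if (x * x + y * y <= 1) hits++
    }
  printf "%.4f", 4 * hits / (n * n)
}')
echo "pi ~= $pi"
```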
Run a job: your own wordcount program
$ $HADOOP_PREFIX/bin/hadoop jar hellohadoop-1.0-SNAPSHOT.jar WordCount 1.txt 1.sort
Logs produced by running the wordcount example on YARN (Hadoop 2.x)
(1) Service logs: the ResourceManager log and the NodeManager logs
ResourceManager log: yarn-*-resourcemanager-*.log under the logs directory of the Hadoop install
NodeManager log: yarn-*-nodemanager-*.log under the logs directory of the Hadoop install on each NodeManager node
(2) Application logs: the jobhistory logs and the container logs
jobhistory logs: the application run logs, including application start/end times, each task's start/end times, counter values, and so on.
Container logs: the ApplicationMaster log plus the ordinary task logs, all stored under the application_xxx directories inside the directories configured by yarn.nodemanager.log-dirs.
(a) The ApplicationMaster log directory is named container_xxx_000001.
(b) Ordinary task log directories are named container_xxx_000002, container_xxx_000003, ... Each directory contains three log files: stdout, stderr, and syslog.
stdout holds whatever the task printed to standard output, e.g. System.out.println;
output printed this way does not appear on a terminal but is captured into the stdout file.
syslog holds the output printed through log4j.
------------------------------------------------------------------------------
Stop Hadoop
-----------------
(1) Stop HDFS (method 1): step by step
Stop the NameNode with the following command, run on the designated NameNode as hdfs:
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode
Run a script to stop a DataNode as hdfs:
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
>>>>>>> Stop HDFS (method 2): equivalent to the steps above. Recommended!
If etc/hadoop/slaves and ssh trusted access is configured (see Single Node Setup), all of the HDFS processes may be stopped with a utility script. As hdfs:
[hdfs]$ $HADOOP_PREFIX/sbin/stop-dfs.sh
(2) Stop YARN (method 1): step by step
Stop the ResourceManager with the following command, run on the designated ResourceManager as yarn:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager
Run a script to stop a NodeManager on a slave as yarn:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR stop nodemanager
Stop the WebAppProxy server. Run on the WebAppProxy server as yarn. If multiple servers are used with load balancing it should be run on each of them:
[yarn]$ $HADOOP_YARN_HOME/bin/yarn stop proxyserver --config $HADOOP_CONF_DIR
>>>>>>> Stop YARN (method 2): equivalent to the steps above. Recommended!
If etc/hadoop/slaves and ssh trusted access is configured (see Single Node Setup), all of the YARN processes can be stopped with a utility script. As yarn:
[yarn]$ $HADOOP_PREFIX/sbin/stop-yarn.sh
(Optional)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Stop the MapReduce JobHistory Server with the following command, run on the designated server as mapred:
[mapred]$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOOP_CONF_DIR
5. Web UIs
Once the Hadoop cluster is up and running check the web-ui of the components as described below:
NameNode http://nn_host:port/ Default HTTP port is 50070.
ResourceManager http://rm_host:port/ Default HTTP port is 8088.
MapReduce JobHistory Server http://jhs_host:port/ Default HTTP port is 19888.
For example:
name node: http://192.168.100.200:50070
resource manager: http://192.168.100.200:8088
jobhistory server: http://192.168.100.200:19888
6. References:
1. http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.5.0/hadoop-project-dist/hadoop-common/SingleCluster.html
2. http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html
3. http://blog.csdn.net/jollypigclub/article/details/45914317
4. http://zh.hortonworks.com/blog/introducing-apache-hadoop-yarn/
5. http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop-yarn/
6. http://blog.csdn.net/bigdatahappy/article/details/14226235
Hadoop Series 1: Building and Installing Hadoop CDH4