System environment preparation
Disable SELinux
vim /etc/selinux/config
SELINUX=disabled
setenforce 0    # disable SELinux temporarily (until the next reboot)
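The persistent edit can also be scripted instead of done in vim; a minimal sketch, assuming the stock /etc/selinux/config layout:
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config    # persistent, applies after reboot
setenforce 0    # immediate, lasts until reboot
getenforce      # should now print Permissive (or Disabled after a reboot)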
Change the hostname
CentOS 6:
vim /etc/sysconfig/network
HOSTNAME=master
CentOS 7:
vim /etc/hostname
nmcli general hostname n1
systemctl restart systemd-hostnamed
Reboot for the change to take full effect.
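On CentOS 7 a single hostnamectl call writes /etc/hostname and notifies systemd-hostnamed in one step, so no reboot is needed; an alternative sketch:
hostnamectl set-hostname master    # run on the master node
hostnamectl set-hostname n1        # run on the slave node
hostname                           # verify the new name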
Edit /etc/hosts on both the master and slave nodes:
127.0.0.1 localhost
10.3.9.144 n1
10.3.13.208 master
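A quick check that both names resolve correctly on every node:
getent hosts master    # should print 10.3.13.208 master
getent hosts n1        # should print 10.3.9.144 n1
ping -c 1 n1           # confirms the slave is reachable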
Create the Hadoop group and user
groupadd hadoop
useradd -g hadoop hadoop -d /home/hadoop
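If the hadoop account will also log in by password (useful for the first scp before the SSH keys are in place), set one and confirm the account exists:
passwd hadoop    # prompts twice for the new password
id hadoop        # should report the hadoop uid and the hadoop group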
Set up passwordless SSH login for the hadoop user
vim /etc/ssh/sshd_config
Enable the following options:
HostKey /etc/ssh/ssh_host_rsa_key
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
Restart the SSH service:
service sshd restart       # CentOS 6
systemctl restart sshd     # CentOS 7
su hadoop
mkdir /home/hadoop/.ssh
ssh-keygen -t rsa    # press Enter three times (default path, empty passphrase)
This generates id_rsa and id_rsa.pub under /home/hadoop/.ssh.
cp /home/hadoop/.ssh/id_rsa.pub /home/hadoop/.ssh/authorized_keys
su root
Permission settings
chown -R hadoop:hadoop /home/hadoop
chmod 700 /home/hadoop
chmod 700 /home/hadoop/.ssh
chmod 644 /home/hadoop/.ssh/authorized_keys    # permissions on the public-key file
chmod 600 /home/hadoop/.ssh/id_rsa             # permissions on the private-key file
cd /home/hadoop/.ssh
scp id_rsa.pub hadoop@n1:/home/hadoop/.ssh/authorized_keys    # copy the public key (not the private key) to the slave
Repeat the key generation and copy in the other direction so passwordless login works both ways.
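ssh-copy-id does the same transfer while appending to (rather than overwriting) the remote authorized_keys and fixing its permissions; a hedged alternative to the manual scp:
su - hadoop
ssh-copy-id hadoop@n1    # from master to slave; run the mirror command on n1 as well
ssh n1 hostname          # should print n1 without prompting for a password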
JDK preparation
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Accept the license agreement and download:
Linux x64    173.04 MB    jdk-8u112-linux-x64.tar.gz
Create the JDK installation directory:
mkdir /usr/java
tar -xvf jdk-8u112-linux-x64.tar.gz -C /usr/java
vim /etc/profile.d/java.sh
export JAVA_HOME=/usr/java/jdk1.8.0_112
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
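Load the new variables into the current shell and verify; the jdk1.8.0_112 path must match what tar actually extracted:
source /etc/profile.d/java.sh
java -version      # should report java version "1.8.0_112"
echo $JAVA_HOME    # /usr/java/jdk1.8.0_112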
Hadoop environment variable setup
vim /etc/profile.d/hadoop.sh
export HADOOP_PREFIX="/bdapps/hadoop"
export PATH=$PATH:${HADOOP_PREFIX}/bin:${HADOOP_PREFIX}/sbin
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib:$HADOOP_PREFIX/lib/native"
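As with the JDK, load and sanity-check the variables (the hadoop binary itself only resolves once the tarball below is unpacked):
source /etc/profile.d/hadoop.sh
echo $HADOOP_PREFIX    # /bdapps/hadoop
which hadoop           # /bdapps/hadoop/bin/hadoop, after the install step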
Create the data and installation directories
mkdir /bdapps                             # Hadoop installation directory
mkdir -p /data/hadoop/hdfs/{nn,dn,snn}    # data directories for the NameNode, DataNode, and SecondaryNameNode
Download the Hadoop tarball
https://dist.apache.org/repos/dist/release/hadoop/common/
tar -xvf hadoop-2.6.5.tar.gz -C /bdapps/
cd /bdapps/
ln -sv hadoop-2.6.5 hadoop
Create a log directory under the installation directory:
cd hadoop
mkdir logs
Give the hadoop user and group ownership of both trees:
chown -R hadoop:hadoop /bdapps
chown -R hadoop:hadoop /data
Edit the configuration files
core-site.xml
Configures the NameNode host and listening port for the HDFS cluster (default 8020):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:8020</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>
hdfs-site.xml configures the HDFS replication factor (default 3) and the storage paths for the NameNode, DataNode, and SecondaryNameNode:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/hdfs/dn</value>
  </property>
  <property>
    <name>dfs.checkpoint.dir</name>
    <value>file:///data/hadoop/hdfs/snn</value>
  </property>
  <property>
    <name>dfs.checkpoint.edits.dir</name>
    <value>file:///data/hadoop/hdfs/snn</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
mapred-site.xml configures how the cluster runs the MapReduce framework: on top of YARN.
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
yarn-site.xml configures the YARN daemons: the ResourceManager address (default port 8032), the scheduler port (8030), and related settings:
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>3072</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>256</value>
  </property>
</configuration>
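Before distributing the configuration, it is worth confirming each file is well-formed XML; a small sketch assuming xmllint (from libxml2) is installed:
cd /bdapps/hadoop/etc/hadoop
for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
    xmllint --noout $f && echo "$f OK"
done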
Configure the slave node list
vim /bdapps/hadoop/etc/hadoop/slaves
n1
Run hadoop version to check that Hadoop is installed correctly:
Hadoop 2.6.4
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 5082c73637530b0b7e115f9625ed7fac69f937e6
Compiled by jenkins on 2016-02-12T09:45Z
Compiled with protoc 2.5.0
From source with checksum 8dee2286ecdbbbc930a6c87b65cbc010
This command was run using /bdapps/hadoop-2.6.4/share/hadoop/common/hadoop-common-2.6.4.jar
Check whether the Hadoop native library is 64-bit:
cd /bdapps/hadoop/lib/native
file libhadoop.so.1.0.0
Output:
libhadoop.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
su to the hadoop user, then format the NameNode:
hadoop namenode -format
Seeing "/data/hadoop/hdfs/nn has been successfully formatted." (together with the newly generated clusterID, e.g. CID-83a22e8d-7e0a-4f3d-ab1e-e3d6c1a035fe) means the format succeeded. Note: avoid re-formatting once the daemons have been started; a second format generates a new clusterID that no longer matches the DataNodes, and the services will fail to start.
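To make the success check scriptable rather than read by eye, capture the format output and grep it; a sketch, to be run once only, as the hadoop user on the master:
hdfs namenode -format 2>&1 | tee /tmp/nn-format.log
grep 'successfully formatted' /tmp/nn-format.log && echo FORMAT_OK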
Slave node setup
scp -r /bdapps/hadoop n1:/bdapps                  # copy the Hadoop installation to the slave
scp -r /data n1:/data                             # copy the Hadoop data directories to the slave
scp -r /usr/java n1:/usr                          # copy the Java installation to the slave
scp /etc/profile.d/java.sh n1:/etc/profile.d/     # copy the Java environment file
scp /etc/profile.d/hadoop.sh n1:/etc/profile.d/   # copy the Hadoop environment file
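With several slaves the five copies can be looped over a host list; a sketch assuming n1 is the only entry for now:
for node in n1; do
    scp -r /bdapps/hadoop $node:/bdapps
    scp -r /data $node:/data
    scp -r /usr/java $node:/usr
    scp /etc/profile.d/java.sh /etc/profile.d/hadoop.sh $node:/etc/profile.d/
done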
su to the hadoop user, then start the daemons:
start-dfs.sh     # start the HDFS daemons
start-yarn.sh    # start the YARN daemons
Or start them individually:
hadoop-daemon.sh start namenode             # start the NameNode
hadoop-daemon.sh start datanode             # start a DataNode
hadoop-daemon.sh start secondarynamenode    # start the SecondaryNameNode
yarn-daemon.sh start resourcemanager        # start the YARN ResourceManager
yarn-daemon.sh start nodemanager            # start a NodeManager (runs on the slave nodes)
jps    # check the running Java processes
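With the layout above (NameNode services on master, n1 in the slaves file), jps on the master should list NameNode, SecondaryNameNode, and ResourceManager, and jps on n1 should list DataNode and NodeManager; if one is missing, its log under /bdapps/hadoop/logs usually explains why.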
http://192.168.56.128:50070    # HDFS web UI; reachable from any host's browser
http://127.0.0.1:8088          # YARN web UI; at this address only a browser on the machine itself can open it
On a DataNode, the data behind the HDFS root / is stored on the local Linux filesystem under /data/hadoop/hdfs/dn.
hdfs dfs -mkdir /test               # create a directory on HDFS
hdfs dfs -put /a.txt /test/a.txt    # upload a file to HDFS
hdfs dfs -ls /                      # list the HDFS root directory
hdfs dfs -ls -R /                   # list the HDFS tree recursively
hdfs dfs -rm -r /test               # delete a directory (-rmr is the deprecated spelling)
yarn jar /bdapps/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /test/fstab /test/fstab.out
yarn jar /bdapps/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar pi 90 100
hdfs dfs -cat /test/a.txt.out/part-r-00000    # view the job output
hdfs dfsadmin -safemode leave                 # force the NameNode out of safe mode
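Safe mode can be inspected before forcing an exit; the NameNode normally leaves it on its own once enough block reports arrive:
hdfs dfsadmin -safemode get    # prints Safe mode is ON / Safe mode is OFF
hdfs dfsadmin -safemode leave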