hadoop
HDFS cluster (handles file reads and writes)
YARN cluster (allocates hardware resources for MapReduce)
NameNode, default port 9000 (what clients connect to)
ResourceManager (manages the workers)
DataNode (NodeManager) (the workers)
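Planned layout (just what the steps below end up building; adjust to your own hosts):
mini1  192.168.1.104  NameNode + ResourceManager
mini2  192.168.1.107  DataNode + NodeManager
mini3  192.168.1.103  DataNode + NodeManager
mini4  192.168.1.106  DataNode + NodeManager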
Prepare 4 virtual machines, at least 1 GB of RAM each (2 GB recommended)
sudo vi /etc/hosts on every VM, comment out the system's default entries, and add:
192.168.1.104 mini1
192.168.1.107 mini2
192.168.1.103 mini3
192.168.1.106 mini4
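Instead of editing every machine by hand you can edit /etc/hosts once on mini1 and copy it out; a sketch, assuming you can log in as root on each host:
for h in mini2 mini3 mini4; do scp /etc/hosts root@$h:/etc/; done  # enter each root password when prompted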
Set each VM's hostname
On 192.168.1.104:
hostname mini1
On 192.168.1.107:
hostname mini2
On 192.168.1.103:
hostname mini3
On 192.168.1.106:
hostname mini4
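Note: hostname only changes the name until the next reboot. On CentOS 7 (which the systemctl commands further down suggest) you can make it permanent instead, for example:
hostnamectl set-hostname mini1  # run the matching command on each machine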
On all four VMs, create the hadoop user:
useradd hadoop
passwd hadoop
Install the JDK yourself
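A minimal sketch of a manual install that matches the JAVA_HOME used later (the tarball name is an assumption, use whichever JDK 8 archive you downloaded):
sudo mkdir -p /usr/local/jdk
sudo tar -zxvf jdk-8u11-linux-x64.tar.gz -C /usr/local/jdk/  # yields /usr/local/jdk/jdk1.8.0_11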
Grant the hadoop user sudo rights
Switch back to root first
su
vi /etc/sudoers
Below the line root ALL=(ALL) ALL, add:
hadoop ALL=(ALL) ALL
:wq!
Copy it to the other VMs
scp /etc/sudoers mini2:/etc/
(repeat for mini3 and mini4), or edit each machine's file by hand
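A quick check that the sudoers entry took effect:
su - hadoop
sudo whoami  # should print root after entering the hadoop user's password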
Disable the firewall on every machine
systemctl stop firewalld.service  # stop firewalld
systemctl disable firewalld.service  # keep firewalld from starting at boot
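If you are on CentOS 6 instead (no systemctl), the equivalent is roughly:
service iptables stop
chkconfig iptables off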
Passwordless SSH login is configured later (see below)
Download Hadoop 2.6.4: http://archive.apache.org/dist/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz
As the hadoop user, create the folder /home/hadoop/app
mkdir /home/hadoop/app
Upload hadoop-2.6.4.tar.gz
Extract it
tar -zxvf hadoop-2.6.4.tar.gz -C /home/hadoop/app/
On mini1:
cd /home/hadoop/app/hadoop-2.6.4/etc/hadoop/
Configure hadoop-env.sh; first check where the JDK lives:
echo $JAVA_HOME
/usr/local/jdk/jdk1.8.0_11
vi hadoop-env.sh
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_11
:wq
Configure the four main config files:
core-site.xml (common)
hdfs-site.xml (HDFS)
mapred-site.xml (MapReduce)
yarn-site.xml (YARN)
vi core-site.xml
Inside <configuration>, add:
<property>
<name>fs.defaultFS</name>
<value>hdfs://mini1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hdpdata</value>
</property>
(The first property is the default filesystem URI, i.e. where the NameNode listens; the second is the base directory Hadoop uses for its working data.)
:wq
vi hdfs-site.xml
Inside <configuration>, add:
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Keep 2 replicas of each file; the default is 3</description>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
<description>Disable HDFS permission checks; be careful not to delete files by accident</description>
</property>
vi mapred-site.xml.template
Inside <configuration>, add:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
(Submitted MapReduce jobs are then handed to YARN to run. If this is left unset it defaults to local, and the job just runs as a single-machine simulation instead of being distributed.)
mv mapred-site.xml.template mapred-site.xml
vi yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>mini1</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<description>The http address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<description>The https address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
<description>Maximum memory, in MB, that a single container request may ask for; the default is 8192 MB</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Skip the virtual-memory check. Very useful when the cluster runs on virtual machines; with the check enabled, later jobs tend to get killed. On physical machines with plenty of memory this setting can be dropped.</description>
</property>
(yarn.nodemanager.aux-services registers the shuffle service MapReduce needs as an auxiliary service on each NodeManager.)
Distribute the installation to all the other machines
cd /home/hadoop/
scp -r app mini2:/home/hadoop/
scp -r app mini3:/home/hadoop/
scp -r app mini4:/home/hadoop/
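Optional: share/doc inside the 2.6.4 tarball is large, so deleting it before copying saves a lot of time and disk; the path assumes the standard tarball layout:
rm -rf /home/hadoop/app/hadoop-2.6.4/share/doc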
sudo vi /etc/profile
Add:
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_11
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.4
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
sudo scp /etc/profile 192.168.1.107:/etc/
sudo scp /etc/profile 192.168.1.103:/etc/
sudo scp /etc/profile 192.168.1.106:/etc/
First format HDFS (this generates the NameNode's data directories)
source /etc/profile
cd
hadoop namenode -format
ll /home/hadoop/hdpdata/dfs/name/current
cd /home/hadoop/app/hadoop-2.6.4/sbin
hadoop-daemon.sh start namenode
jps
Open the NameNode web UI in a browser:
192.168.1.104:50070
On mini2, run:
source /etc/profile
hadoop-daemon.sh start datanode
jps
On mini3, run:
source /etc/profile
hadoop-daemon.sh start datanode
jps
On mini4, run:
source /etc/profile
hadoop-daemon.sh start datanode
jps
Refresh 192.168.1.104:50070; the three DataNodes should now show up
If anything goes wrong, check the log files under /home/hadoop/app/hadoop-2.6.4/logs
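For example, on a DataNode (the file name follows the pattern hadoop-<user>-<daemon>-<hostname>.log):
tail -100 /home/hadoop/app/hadoop-2.6.4/logs/hadoop-hadoop-datanode-mini2.log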
Starting the DataNodes with the cluster scripts instead
First stop the DataNode on each node:
hadoop-daemon.sh stop datanode
On mini1:
hadoop-daemon.sh stop namenode
cd /home/hadoop/app/hadoop-2.6.4/etc/hadoop
vi slaves
Remove the existing localhost entry
and add:
mini2
mini3
mini4
:wq
Configure passwordless SSH login (on mini1, as the hadoop user)
ssh-keygen
press Enter three times (accept the defaults)
ssh-copy-id mini1
ssh-copy-id mini2
ssh-copy-id mini3
ssh-copy-id mini4
enter the hadoop user's password when prompted for each host
Test that it works (no password prompt should appear):
ssh mini4
cd /home/hadoop/app/hadoop-2.6.4/sbin
stop-all.sh
stop-dfs.sh
jps
start-all.sh
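A quick way to verify the cluster came up (a sketch; the examples jar path is the standard location inside the 2.6.4 tarball, adjust if yours differs):
jps                    # mini1 should list NameNode, SecondaryNameNode and ResourceManager; mini2-4 list DataNode and NodeManager
hdfs dfsadmin -report  # should report 3 live datanodes
hadoop jar /home/hadoop/app/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar pi 2 5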