Distributed Hadoop 2.6.0
Hostname | IP | Role | Linux system |
master | 192.168.218.133 | Hadoop NameNode | CentOS-6.9-x86_64-bin-DVD1.iso |
slave1 | 192.168.218.135 | Hadoop DataNode | CentOS-6.9-x86_64-bin-DVD1.iso |
slave2 | 192.168.218.134 | Hadoop DataNode | CentOS-6.9-x86_64-bin-DVD1.iso |
VMware version: VMware-workstation-full-14.1.2-8497320.exe
Linux version: CentOS-6.9-x86_64-bin-DVD1.iso
JDK version: jdk-8u171-linux-x64.tar.gz
Hadoop version: hadoop-2.6.0.tar.gz
Make CentOS boot to the command line by default instead of the graphical interface
sudo vim /etc/inittab
Change the default runlevel from 5 to 3
sudo reboot
Allow user mwt to run sudo
sudo vim /etc/sudoers
mwt ALL=(ALL) ALL
Create user hadoop and set its password to hadoop as well
sudo useradd hadoop
su
echo 'hadoop' | passwd --stdin hadoop
Configure the Java environment variables
vim ~/.bash_profile
Add the following lines:
export JAVA_HOME=/home/hadoop/work/jdk1.8.0_171
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH:$HOME/bin
Then reload the profile and verify:
source ~/.bash_profile
java -version
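Before moving on, it is worth checking that the JDK path actually contains a usable JDK. A small sketch; the install path is the one assumed throughout this guide:

```shell
# Sanity check that JAVA_HOME points at a real JDK.
# The path is the install location assumed in this guide.
JAVA_HOME=/home/hadoop/work/jdk1.8.0_171
if [ -x "$JAVA_HOME/bin/java" ]; then
  echo "JAVA_HOME ok"
  "$JAVA_HOME/bin/java" -version
else
  echo "JAVA_HOME does not point at a JDK: $JAVA_HOME"
fi
```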
Disable the firewall
Check firewall status: service iptables status
Stop the firewall: service iptables stop
Check whether the firewall starts on boot: chkconfig --list iptables
Disable firewall autostart: chkconfig iptables off
Set a static IP on each of the three machines
sudo vim /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
BOOTPROTO="static"
HWADDR="00:0C:29:07:34:04"
IPV6INIT="yes"
NM_CONTROLLED="yes"
ONBOOT="yes"
TYPE="Ethernet"
UUID="a4f0bcb9-b098-45f6-a959-413654d1510e"
IPADDR=192.168.218.133
NETMASK=255.255.255.0
GATEWAY=192.168.218.1
Restart the network service so the new address takes effect: sudo service network restart
Clone the other two machines from this VM, then adjust each clone's IP address (and HWADDR) to match the table at the top
Change the hostname on each of the three machines
sudo vim /etc/sysconfig/network
Make the new hostname take effect immediately: sudo hostname master
Each machine must have a distinct hostname
Edit the hosts file on all three machines
sudo vim /etc/hosts
192.168.218.133 master
192.168.218.135 slave1
192.168.218.134 slave2
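After editing /etc/hosts on each machine, a quick loop can confirm that all three names resolve. A sketch, using the hostnames from the table above:

```shell
# Count how many of the cluster hostnames resolve via /etc/hosts (or DNS)
unresolved=0
for host in master slave1 slave2; do
  if getent hosts "$host" >/dev/null; then
    echo "resolves: $host"
  else
    echo "does NOT resolve: $host"
    unresolved=$((unresolved + 1))
  fi
done
echo "$unresolved unresolved hostname(s)"
```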
Set up SSH trust
Before SSH trust is established, ssh 192.168.218.135 date prompts for a password
The purpose of SSH trust is passwordless SSH access
Generate a key pair: ssh-keygen -t rsa
Press Enter at every prompt; no input is needed
Copy the public key to every server
ssh-copy-id -i ~/.ssh/id_rsa.pub slave1
ssh-copy-id -i ~/.ssh/id_rsa.pub master
ssh-copy-id -i ~/.ssh/id_rsa.pub slave2
Note: the key must also be copied to the machine itself; that is, besides copying to slave1 and slave2, master also copies its key to master. Do the same with every other node's key: SSH trust must be set up on every machine
After that, ssh 192.168.218.135 date runs without asking for a password
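Trust on all three nodes can be verified in one pass. This is only a sketch and assumes the hostnames above are resolvable:

```shell
# BatchMode makes ssh fail immediately instead of prompting for a
# password, so each node is reported as trusted or not
ok=0
for host in master slave1 slave2; do
  if ssh -o BatchMode=yes -o ConnectTimeout=3 "$host" true 2>/dev/null; then
    echo "passwordless ok: $host"
    ok=$((ok + 1))
  else
    echo "needs ssh-copy-id: $host"
  fi
done
echo "$ok of 3 nodes reachable without a password"
```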
Configure the Hadoop environment
Unpack the tarball: tar -zxvf hadoop-2.6.0.tar.gz
Update the environment variables
vim ~/.bash_profile
Add or edit the following lines:
export JAVA_HOME=/home/hadoop/work/jdk1.8.0_171
export HADOOP_HOME=/home/hadoop/hadoop/hadoop-2.6.0
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH:$HOME/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
Then reload the profile:
source ~/.bash_profile
Run hadoop version to confirm the Hadoop environment variables are configured correctly
Copy the Hadoop directory to the other machines in the cluster
scp -r /home/hadoop/hadoop/ slave1:/home/hadoop/hadoop
scp -r /home/hadoop/hadoop/ slave2:/home/hadoop/hadoop
Edit the Hadoop configuration files under $HADOOP_HOME/etc/hadoop/
core-site.xml
hadoop-env.sh
hdfs-site.xml
mapred-env.sh
mapred-site.xml
yarn-env.sh
yarn-site.xml
slaves
First config file: hadoop-env.sh
sudo vim $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/hadoop/work/jdk1.8.0_171
Second config file: yarn-env.sh
sudo vim $HADOOP_HOME/etc/hadoop/yarn-env.sh
export JAVA_HOME=/home/hadoop/work/jdk1.8.0_171
Third config file: mapred-env.sh
sudo vim $HADOOP_HOME/etc/hadoop/mapred-env.sh
export JAVA_HOME=/home/hadoop/work/jdk1.8.0_171
Fourth config file: slaves (the hosts listed here run DataNodes; note that listing master makes it a DataNode in addition to the NameNode)
touch slaves
sudo chmod 644 slaves
sudo vim $HADOOP_HOME/etc/hadoop/slaves
master
slave1
slave2
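The slaves file can also be written in one step with a here-document instead of editing by hand. A sketch, assuming HADOOP_HOME is set as in the environment-variable section; the scratch-directory fallback only keeps the sketch self-contained:

```shell
# Write the slaves file non-interactively
slaves_dir="${HADOOP_HOME:-/tmp/hadoop-sketch}/etc/hadoop"
mkdir -p "$slaves_dir"
cat > "$slaves_dir/slaves" <<'EOF'
master
slave1
slave2
EOF
cat "$slaves_dir/slaves"
```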
Fifth config file: core-site.xml
sudo vim $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop/hadoop_data/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>fs.checkpoint.period</name>
<value>600</value>
</property>
<property>
<name>fs.checkpoint.size</name>
<value>67108864</value>
</property>
</configuration>
Sixth config file: hdfs-site.xml
sudo vim $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/hadoop/hadoop_data/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/hadoop/hadoop_data/data</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.support.append</name>
<value>true</value>
</property>
<property>
<name>dfs.support.broken.append</name>
<value>true</value>
</property>
<property>
<name>dfs.balance.bandwidthPerSec</name>
<value>1048576</value>
</property>
<property>
<name>dfs.http.address</name>
<value>master:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>slave2:50090</value>
</property>
</configuration>
Seventh config file: mapred-site.xml (Hadoop 2.6.0 ships only mapred-site.xml.template; copy it to mapred-site.xml first)
sudo vim $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/home/hadoop/YDB/localdata/mr/history-tmp</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/home/hadoop/YDB/localdata/mr/history-done</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/home/hadoop/YDB/localdata/user</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>master:9001</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>16</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024m</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>100</value>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>100</value>
</property>
<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>50</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx1024m</value>
</property>
<property>
<name>mapreduce.map.speculative</name>
<value>true</value>
</property>
<property>
<name>mapreduce.reduce.speculative</name>
<value>true</value>
</property>
<property>
<name>mapreduce.job.queuename</name>
<value>default</value>
</property>
<property>
<name>mapreduce.cluster.acls.enabled</name>
<value>false</value>
</property>
<property>
<name>mapreduce.job.acl-view-job</name>
<value></value>
</property>
<property>
<name>mapreduce.job.acl-modify-job</name>
<value></value>
</property>
</configuration>
Eighth config file: yarn-site.xml
sudo vim $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8084</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>10000</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>10000</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>3000</value>
</property>
</configuration>
Copy the Hadoop configuration files to the other nodes
cd $HADOOP_HOME/etc
scp -r . slave1:/home/hadoop/hadoop/hadoop-2.6.0/etc/
scp -r . slave2:/home/hadoop/hadoop/hadoop-2.6.0/etc/
Create the name and data storage directories
Per the dfs.namenode.name.dir and dfs.datanode.data.dir settings in $HADOOP_HOME/etc/hadoop/hdfs-site.xml, create the corresponding directories on the three servers
mkdir -pv /home/hadoop/hadoop/hadoop_data/{name,data}
Remember: create them on all three servers
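With SSH trust in place, the directories can be created from master in one loop. The sketch below only prints the commands so they can be reviewed first; pipe the output to sh to actually run them:

```shell
# Emit one mkdir command per machine; the braces are expanded by the
# remote shell, not locally, because they sit inside single quotes
for host in master slave1 slave2; do
  echo "ssh $host 'mkdir -pv /home/hadoop/hadoop/hadoop_data/{name,data}'"
done | tee /tmp/mkdir-cmds.txt
```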
Format the NameNode (run this as the hadoop user rather than via sudo, so the metadata directories end up with the right owner)
hdfs namenode -format
Start the Hadoop services
start-all.sh (equivalent to start-dfs.sh followed by start-yarn.sh)
Check that each Hadoop node started correctly
On the NameNode:
jps -l
On the DataNodes:
jps -l
Check with a browser
NameNode web UI: http://192.168.218.133:50070
ResourceManager web UI: http://192.168.218.133:8084
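The same check can be scripted with curl. A sketch using the IP and ports configured above; only connectivity is tested, not page content:

```shell
# Probe both web UIs and count how many answered
up=0
for url in http://192.168.218.133:50070 http://192.168.218.133:8084; do
  if curl -s -o /dev/null --connect-timeout 3 "$url"; then
    echo "up:   $url"
    up=$((up + 1))
  else
    echo "down: $url"
  fi
done
echo "$up of 2 web UIs reachable"
```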
Inspect the HDFS filesystem from the command line
hadoop fs -ls /
Upload a file to the HDFS root
hadoop fs -put jdk-8u171-linux-x64.tar.gz /
Download a file
hadoop fs -get /jdk-8u171-linux-x64.tar.gz
Download data, addressing the NameNode by full URI
hadoop fs -get hdfs://192.168.2.251:9000/anadata/person/n18/qol9ie20180307143009.csv /home/bjjd/dmq/n63.csv
Upload data by full URI
hadoop fs -put /home/bjjd/dmq/n63.csv hdfs://192.168.2.251:9000/anadata/person/n18
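A put/get round trip makes a convenient smoke test of the whole cluster. A sketch with placeholder file names; it skips cleanly when no hadoop binary is on the PATH:

```shell
# Upload a small file, download it under another name, and compare
if command -v hadoop >/dev/null 2>&1; then
  echo "roundtrip test" > /tmp/rt-src.txt
  rm -f /tmp/rt-copy.txt
  if hadoop fs -put -f /tmp/rt-src.txt /rt-src.txt &&
     hadoop fs -get /rt-src.txt /tmp/rt-copy.txt &&
     cmp /tmp/rt-src.txt /tmp/rt-copy.txt; then
    echo "round-trip ok"
    roundtrip=ok
  else
    echo "round-trip failed"
    roundtrip=failed
  fi
else
  echo "hadoop not on PATH; skipping round-trip check"
  roundtrip=skipped
fi
```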
Java code test
A minimal HDFS read test; the HDFS path and local output file are just examples:
import java.io.FileOutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

Configuration conf = new Configuration();
try {
    // Needed only if the HDFS FileSystem implementation is not found on the classpath
    // conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
    URI uri = URI.create("hdfs://192.168.218.133:9000");
    FileSystem fs = FileSystem.get(uri, conf);
    // Open the HDFS file and copy it to a local file;
    // the final 'true' argument closes both streams when done
    FSDataInputStream fsdis = fs.open(new Path("/blk_1073741825"));
    FileOutputStream fos = new FileOutputStream("C:\\Users\\test\\Desktop\\新建文本文档.txt");
    IOUtils.copyBytes(fsdis, fos, 1024, true);
} catch (Exception e) {
    e.printStackTrace();
}
Installing the Hadoop plugin for Eclipse
Copy hadoop-eclipse-plugin-2.6.0.jar from the unpacked release folder into the plugins directory of the Eclipse installation, restart Eclipse, then configure as follows
Under Window -> Show View -> Other -> MapReduce Tools, select Map/Reduce Locations
In the Map/Reduce Locations view, create a new Hadoop Location
Before the change: (screenshot not included in this document)
After the change: (screenshot not included in this document)
RPC
Protocol (interface fields are implicitly public static final; Hadoop RPC requires a versionID field)
public interface LoginServiceProtocol {
    long versionID = 1L;
    String login(String username, String passwd);
}
RPC server
public class LoginServiceImpl implements LoginServiceProtocol {
    @Override
    public String login(String username, String passwd) {
        return "success";
    }
}

// Build and start the RPC server on localhost:10000
Server server = new RPC.Builder(new Configuration())
        .setBindAddress("localhost")
        .setPort(10000)
        .setProtocol(LoginServiceProtocol.class)
        .setInstance(new LoginServiceImpl())
        .build();
server.start();
RPC client
// clientVersion must match the versionID declared in the protocol interface
long clientVersion = 1L;
InetSocketAddress addr = new InetSocketAddress("localhost", 10000);
Configuration conf = new Configuration();
LoginServiceProtocol proxy = RPC.getProxy(LoginServiceProtocol.class, clientVersion, addr, conf);
System.out.println(proxy.login("1", "123456"));