This article walks through deploying Hadoop with one master and three slaves.
1 First, clone the existing CentOS system.
2 nn_y is the master; dn1, dn2, and dn3 are the slaves. All are created quickly by cloning: right-click the VM, choose Manage > Clone, and select a full clone.
3 Configure the cluster network by assigning static IPs:
192.168.64.132
192.168.64.133
192.168.64.134
192.168.64.135
Set the static addresses according to your own network; only the last octet differs between machines.
4 Using Xshell, first set the static IP on nn_y:
1) Command: vi /etc/sysconfig/network-scripts/ifcfg-ens33
2) Command: vi /etc/hostname
Change the hostname to nn.
3) vi /etc/hosts
4) Repeat the same steps on the other three machines.
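As a sketch, the two files edited above might end up looking like this on nn; the gateway and DNS values are assumptions for a typical VMware NAT network, so adjust them to your own setup:

```text
# /etc/sysconfig/network-scripts/ifcfg-ens33 (relevant lines only)
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.64.132        # .133/.134/.135 on dn1/dn2/dn3
NETMASK=255.255.255.0
GATEWAY=192.168.64.2         # assumption: VMware NAT default
DNS1=192.168.64.2            # assumption

# /etc/hosts (identical on all four machines)
192.168.64.132 nn
192.168.64.133 dn1
192.168.64.134 dn2
192.168.64.135 dn3
```

After editing ifcfg-ens33, restart networking (e.g. systemctl restart network) for the new address to take effect.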
5 Add a user on each of the four machines: adduser hadoop
6 Set up passwordless SSH login, e.g. so nn can log into dn1 without a password (starting Hadoop would otherwise prompt for a password for each node):
1) Command: ssh-keygen -t rsa (press Enter through all prompts)
Then append the public key to the authorized_keys file and fix that file's permissions (important, do not skip this):
cat id_rsa.pub >> authorized_keys
chmod 600 authorized_keys
2) For how this works under the hood, see this post:
https://blog.csdn.net/wh_19910525/article/details/74331649
ssh-keygen generates the following files:
3) id_rsa (private key), id_rsa.pub (public key), known_hosts (records which hosts you have logged into).
Next, distribute id_rsa.pub (the public key) to dn1, dn2, and dn3.
Command: ssh-copy-id dn1 copies the key to dn1.
4) Do the same for dn2 and dn3; after that, nn can log into dn1, dn2, and dn3 without a password.
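The local half of the key setup on nn can be sketched as follows (run as the hadoop user; the empty passphrase and file locations are the ssh defaults):

```shell
mkdir -p "$HOME/.ssh"
# Generate a key pair non-interactively if one does not already exist (-N '' = empty passphrase).
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N '' -f "$HOME/.ssh/id_rsa" -q
# Authorize the key locally and fix permissions (sshd rejects group/world-writable key files).
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 700 "$HOME/.ssh"
chmod 600 "$HOME/.ssh/authorized_keys"
```

Then run ssh-copy-id dn1 (and likewise for dn2 and dn3) and verify with ssh dn1, which should connect without prompting.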
7 Configure the JDK and JAVA_HOME
1) Install the JDK and Hadoop.
2) Create symlinks: ln -s jdk1.8.0_152 jdk8 and ln -s hadoop-2.7.5 hadoop2
3) Configure the environment variables (use single quotes so that $HADOOP_HOME is expanded when .bashrc is sourced, not when echo runs):
1) echo 'export JAVA_HOME=/home/hadoop/opt/jdk8' >> ~/.bashrc
2) echo 'export HADOOP_HOME=/home/hadoop/opt/hadoop2' >> ~/.bashrc
3) echo 'export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop' >> ~/.bashrc
4) echo 'export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> ~/.bashrc (this line lets commands such as hdfs and start-dfs.sh be run by name)
5) Alternatively, open vi ~/.bashrc and type the lines in directly; either way .bashrc must end up with these four export lines.
After configuring, run source .bashrc to apply the changes.
6) Test the configuration with echo $JAVA_HOME and echo $HADOOP_CONF_DIR
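The quoting in the echo commands above matters: unquoted, the shell expands $HADOOP_HOME at the moment echo runs (when it is still empty), baking a wrong value into .bashrc. A quick demonstration:

```shell
unset HADOOP_HOME
# Unquoted: $HADOOP_HOME expands now, while it is empty.
echo export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# prints: export HADOOP_CONF_DIR=/etc/hadoop
# Single-quoted: the text stays literal and expands later, when .bashrc is sourced.
echo 'export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop'
# prints: export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
```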
7) By default Hadoop does not run distributed; to make it distributed, change its default filesystem from the file:/// scheme to hdfs:
- Command: vi $HADOOP_CONF_DIR/core-site.xml
- vi $HADOOP_CONF_DIR/hdfs-site.xml
Set the number of replicas: Hadoop defaults to 3, and since this deployment has one namenode and three datanodes it can be left unset, but it is worth knowing where to configure it.
- Configure the namenode path: vi $HADOOP_CONF_DIR/hdfs-site.xml
To start YARN, Hadoop's second-generation engine, copy mapred-site.xml.template and rename it mapred-site.xml:
cp mapred-site.xml.template mapred-site.xml
5) vi $HADOOP_CONF_DIR/yarn-site.xml
6) vi $HADOOP_CONF_DIR/slaves
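As a sketch, minimal versions of these files could look as follows. The property names are standard Hadoop 2.x keys, but the hdfs://nn:9000 address, the namenode directory, and the YARN hostname are assumptions based on the hostnames and paths used in this walkthrough:

```xml
<!-- core-site.xml: switch the default filesystem from file:/// to HDFS (port 9000 is an assumption) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nn:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: replica count and namenode storage path (the path is an assumption) -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/opt/hadoop2/namenode</value>
  </property>
</configuration>

<!-- mapred-site.xml: run MapReduce on YARN -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

<!-- yarn-site.xml: resource manager on nn, shuffle service for MapReduce -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>nn</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```

The slaves file is plain text listing the worker hostnames, one per line: dn1, dn2, dn3.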
8 Command: tar zcf opt.tar.gz opt compresses the opt folder for transfer to dn1, dn2, and dn3.
9 Command: scp opt.tar.gz dn1:~ (the archive) and scp .bashrc dn1:~ (the environment variables) copy both to dn1; repeat for dn2 and dn3.
10 On dn1, dn2, and dn3, extract the archive and run source .bashrc.
11 Format the filesystem: hdfs namenode -format
12 Formatting succeeded when the output ends with "Exiting with status 0".
13 Start HDFS: start-dfs.sh (jps should then show NameNode and SecondaryNameNode on nn, and DataNode on each of dn1, dn2, dn3).
14 Open your namenode address with port 50070 in a browser (e.g. http://192.168.64.132:50070); if the page loads, the cluster started successfully. Be sure to disable the firewall first (on CentOS 7: systemctl stop firewalld).
15 hdfs dfs -mkdir -p /user/hadoop creates the user's home folder in HDFS.
start-yarn.sh starts YARN.
16 Reading a file from the Hadoop cluster with Java:
Using a Maven project created in IDEA.
pom.xml dependencies (note: the Hadoop artifacts below are 2.8.1; ideally the client version should match the cluster's 2.7.5):
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.hadoop</groupId>
<artifactId>hadoop</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>war</packaging>
<name>hadoop Maven Webapp</name>
<!-- FIXME change it to the project's website -->
<url>http://www.example.com</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.7</maven.compiler.source>
<maven.compiler.target>1.7</maven.compiler.target>
</properties>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.8.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.8.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.8.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>2.8.1</version>
</dependency>
</dependencies>
<build>
<finalName>hadoop</finalName>
<pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
<plugins>
<plugin>
<artifactId>maven-clean-plugin</artifactId>
<version>3.1.0</version>
</plugin>
<!-- see http://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_war_packaging -->
<plugin>
<artifactId>maven-resources-plugin</artifactId>
<version>3.0.2</version>
</plugin>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.8.0</version>
</plugin>
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.22.1</version>
</plugin>
<plugin>
<artifactId>maven-war-plugin</artifactId>
<version>3.2.2</version>
</plugin>
<plugin>
<artifactId>maven-install-plugin</artifactId>
<version>2.5.2</version>
</plugin>
<plugin>
<artifactId>maven-deploy-plugin</artifactId>
<version>2.8.2</version>
</plugin>
</plugins>
</pluginManagement>
</build>
</project>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.IOException;

// Reads data from the Hadoop cluster, via the namenode configured in core-site.xml.
public class ReadHdfs {
    public static void main(String[] args) throws IOException {
        // Obtain the file system; Configuration picks up core-site.xml from the classpath.
        FileSystem fs = FileSystem.get(new Configuration());
        // Open the file at the given path (relative paths resolve under /user/<name>).
        FSDataInputStream fis = fs.open(new Path("out1/part-r-00000"));
        byte[] buffer = new byte[2048];
        while (true) {
            // read() fills the buffer and returns the number of bytes read.
            int n = fis.read(buffer);
            // -1 means the file has been read completely.
            if (n == -1) {
                break;
            }
            // print, not println: the file's own newlines are already in the buffer.
            System.out.print(new String(buffer, 0, n));
        }
        // Release resources.
        fis.close();
        fs.close();
    }
}
17 Writing a file to Hadoop:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.FileInputStream;
import java.io.IOException;

// Writes data to the Hadoop cluster.
public class WriteHdfs {
    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        Path path = new Path("data4");
        // Create the data4 folder.
        fs.mkdirs(path);
        // Read a file from the local path (backslashes must be escaped in Java string literals).
        FileInputStream fis =
                new FileInputStream("E:\\javaWorkSpace\\hadoop\\src\\main\\java\\ReadHdfs.java");
        // Write the local file into the cluster as data4/ReadHdfs.java.
        FSDataOutputStream fos = fs.create(new Path(path, "ReadHdfs.java"));
        byte[] buffer = new byte[2048];
        while (true) {
            int n = fis.read(buffer);
            if (n == -1) {
                break;
            }
            // Write bytes 0..n of the buffer to the cluster, then flush to the datanodes.
            fos.write(buffer, 0, n);
            fos.hflush();
        }
        // Release resources.
        fis.close();
        fos.close();
        fs.close();
    }
}
18 If jps and similar commands report "command not found", the JDK development package (which provides jps) is missing; install it:
1) yum list openjdk-devel
2) yum install java-1.8.0-openjdk-devel.x86_64