Hadoop Series (1)

1. Installation

Environment: CentOS 7 + JDK 1.8

@CentOS network configuration


Edit the network configuration with vi /etc/sysconfig/network-scripts/ifcfg-ens33 and set the interface to start on boot.

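A minimal sketch of the relevant ifcfg-ens33 entries (the IP, netmask, gateway, and DNS values below are placeholders; use your own network's addresses):

BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.1.10
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
DNS1=192.168.1.1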

 

Change the hostname: vi /etc/sysconfig/network


Run hostname master to apply the change for the current session.
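For example, on the master node the file would contain the lines below (on CentOS 7, hostnamectl set-hostname master achieves the same change persistently):

NETWORKING=yes
HOSTNAME=master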

 

Add IP-to-hostname mappings: vi /etc/hosts

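The entries look like the following (the IP addresses are placeholders for this cluster's actual addresses):

192.168.1.10 master
192.168.1.11 slave1
192.168.1.12 slave2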

 

Configure passwordless SSH login:

ssh-keygen -t rsa


 

After the keys are generated, the ~/.ssh/ directory contains two files: id_rsa (private key) and id_rsa.pub (public key). Append the public key to authorized_keys and set authorized_keys to permission 600.

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

chmod 600 ~/.ssh/authorized_keys

Append the id_rsa.pub of slave1 and slave2 to master's authorized_keys as well (a sketch follows).
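One way to do this from master, assuming ssh-keygen -t rsa has already been run on each slave (you will still be prompted for the slaves' passwords at this point):

ssh root@slave1 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh root@slave2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys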

 

Copy ~/.ssh/authorized_keys from master to slave1 and slave2:

scp ~/.ssh/authorized_keys root@slave1:~/.ssh/


 

Verify: logging in from master to slave1 and slave2 should no longer require a password.

ssh slave1


@Install JDK 1.8

Upload the JDK to /root/java on master, slave1, and slave2.

 

Configure the environment variables:

vi /etc/profile

Add the following:

export JAVA_HOME=/root/java/jdk1.8.0_191

export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export PATH=$JAVA_HOME/bin:$PATH

Apply the changes:

source /etc/profile

 

 

Install MySQL (on master)

Check whether MySQL (or MariaDB) is already installed on the system; if so, remove it first.

Check: rpm -qa | grep mariadb

Remove: rpm -e <package name> --nodeps

 

Download mysql-5.7.24-1.el7.x86_64.rpm-bundle.tar and upload it to master.

(Download: https://dev.mysql.com/downloads/file/?id=481064)

 

Unpack: tar -xvf mysql-5.7.24-1.el7.x86_64.rpm-bundle.tar

Install the MySQL rpm packages in order:

rpm -ivh mysql-community-common-5.7.24-1.el7.x86_64.rpm

rpm -ivh mysql-community-libs-5.7.24-1.el7.x86_64.rpm

rpm -ivh mysql-community-client-5.7.24-1.el7.x86_64.rpm

rpm -ivh mysql-community-server-5.7.24-1.el7.x86_64.rpm


The last package may fail with a dependency error because perl is not installed.

In that case, just install perl:

yum install perl


Then install mysql-community-server-5.7.24-1.el7.x86_64.rpm again.

 

MySQL can now be started: systemctl restart mysqld.

At this point there is no usable root password, so you cannot log in; work around it as follows:

Add one line at the end of /etc/my.cnf:

skip-grant-tables, then save and exit.

Restart MySQL: systemctl restart mysqld.

Log in to MySQL: mysql -u root -p (just press Enter at the password prompt).
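Once in, you will typically want to set a real root password and then re-enable normal authentication. A sketch for MySQL 5.7 (the password below is a placeholder):

update mysql.user set authentication_string=password('NewPass123!') where user='root';
flush privileges;

Afterwards remove the skip-grant-tables line from /etc/my.cnf and restart mysqld so that logins require the new password.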

 

 

Install Hadoop

Install and configure on master first; once that is done, simply copy the installation from master to the slaves.

 

Configure the Hadoop environment variables: vi /etc/profile

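The entries are along these lines (assuming Hadoop is unpacked under /root/hadoop/hadoop-2.8.4, as used later in this post), followed by source /etc/profile:

export HADOOP_HOME=/root/hadoop/hadoop-2.8.4
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin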

Preparation for building the Hadoop cluster

Create the following directories on the master node:

/root/hadoop/name

/root/hadoop/data

/root/hadoop/temp

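For example:

mkdir -p /root/hadoop/name /root/hadoop/data /root/hadoop/temp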

 

Edit the Hadoop configuration files

under /root/hadoop/hadoop-2.8.4/etc/hadoop:

 

hadoop-env.sh: set the JDK path

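The change is the same JAVA_HOME value used in /etc/profile, for example (yarn-env.sh below gets the same line):

export JAVA_HOME=/root/java/jdk1.8.0_191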

 

yarn-env.sh: set the JDK path


 

slaves: remove localhost and add the worker nodes slave1 and slave2

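After the change the slaves file contains only the worker hostnames:

slave1
slave2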

 

core-site.xml:

 

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/root/hadoop/temp</value>
  </property>
</configuration>

 

hdfs-site.xml:

 

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/root/hadoop/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/root/hadoop/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>

 

mapred-site.xml

 

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

 

yarn-site.xml

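The screenshot of yarn-site.xml is not reproduced here; a minimal configuration for this kind of cluster usually looks like the following sketch (the resourcemanager hostname assumes the master node; adjust to your own setup):

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
</configuration>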

yarn-site.xml property reference: https://www.cnblogs.com/yinchengzhe/p/5142659.html

 

Copy the configured Hadoop to the slave1 and slave2 nodes:

scp -r /root/hadoop/hadoop-2.8.4/ root@slave1:/root/hadoop/

 

scp -r /root/hadoop/hadoop-2.8.4/ root@slave2:/root/hadoop/

 

Then configure the Hadoop environment variables on slave1 and slave2 as well.

 

Start Hadoop

Format the NameNode:

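With the environment variables above in place, the command is (run once, on master):

hdfs namenode -format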

Start the cluster:

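On master:

start-dfs.sh
start-yarn.sh

Running jps on each node should then typically show NameNode, SecondaryNameNode, and ResourceManager on master, and DataNode and NodeManager on the slaves.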

 

 

Hadoop API

Jars required for the HDFS client:

hadoop-xxx\share\hadoop\common\hadoop-common-2.8.4.jar
hadoop-xxx\share\hadoop\hdfs\lib\*.jar
hadoop-xxx\share\hadoop\mapreduce\lib\hamcrest-core-1.3.jar
hadoop-xxx\share\hadoop\common\lib\commons-collections-3.2.2.jar
hadoop-xxx\share\hadoop\common\lib\servlet-api-2.5.jar
hadoop-xxx\share\hadoop\common\lib\slf4j-api-1.7.10.jar
hadoop-xxx\share\hadoop\common\lib\slf4j-log4j12-1.7.10.jar
hadoop-xxx\share\hadoop\common\lib\commons-configuration-1.6.jar
hadoop-xxx\share\hadoop\common\lib\hadoop-auth-2.8.4.jar
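If the project uses Maven instead of hand-copied jars, a single dependency pulls in the HDFS client classes (a sketch; the version matches the cluster used here):

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.8.4</version>
</dependency>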

 

 

HDFS example:

package dfs;

import java.io.File;
import java.io.FileInputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.io.IOUtils;
import org.junit.Test;

public class HdfsTools {

    Configuration config = new Configuration();
    FileSystem sf = null;

    private static final String defaultFS = "hdfs://localhost:9000";

    /**
     * Create the FileSystem; for an hdfs:// URI this actually returns an
     * org.apache.hadoop.hdfs.DistributedFileSystem.
     */
    private void init() {
        try {
            config.set("fs.defaultFS", defaultFS);
            sf = FileSystem.get(config);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * Create a directory on HDFS
     */
    @Test
    public void mkdir() throws Exception {
        init();
        sf.mkdirs(new Path("/a"));
    }

    /**
     * Upload a local file to HDFS
     */
    @Test
    public void put() throws Exception {
        init();
        sf.copyFromLocalFile(new Path(
                "F:\\hadoop-2.8.4\\share\\hadoop\\common\\lib\\commons-collections-3.2.2.jar"),
                new Path("/aaaa"));

        // Pass true as the first argument to delete the local source file after the copy:
        /*sf.copyFromLocalFile(true, new Path(
                "F:\\hadoop-2.8.4\\share\\hadoop\\common\\lib\\commons-collections-3.2.2.jar"),
                new Path("/aaaa"));*/
    }

    /**
     * Download a file from HDFS to the local file system
     */
    @Test
    public void download() throws Exception {
        init();
        sf.copyToLocalFile(new Path("/aaaa/commons-collections-3.2.2.jar"), new Path("F:"));
        // Pass true as the first argument to delete the HDFS source file after the copy:
        sf.copyToLocalFile(true, new Path("/aaaa/commons-collections-3.2.2.jar"), new Path("F:"));
    }

    /**
     * List files recursively
     */
    @Test
    public void find() throws Exception {
        init();
        RemoteIterator<LocatedFileStatus> remoteIterator = sf.listFiles(new Path("/"), true);
        while (remoteIterator.hasNext()) {
            LocatedFileStatus fileStatus = remoteIterator.next();
            System.out.println("path:" + fileStatus.getPath().toString()
                    + " size:" + fileStatus.getLen() / 1024);
        }
    }

    /**
     * Delete a directory on HDFS
     */
    @Test
    public void remove() throws Exception {
        init();
        // deleteOnExit removes the path when the FileSystem is closed;
        // sf.delete(path, true) would delete it immediately.
        sf.deleteOnExit(new Path("/a"));
    }

    /**
     * Upload a local file to HDFS via streams (useful on Windows)
     */
    public void putDfsForWindow() throws Exception {
        init();
        FileInputStream fis = new FileInputStream(new File("D:\\hello.txt"));
        OutputStream os = sf.create(new Path("/test/hello1.txt"));
        IOUtils.copyBytes(fis, os, 4096, true);
    }
}

 

MapReduce example:

Required jars:

hadoop-xxx\share\hadoop\mapreduce\*
hadoop-xxx\share\hadoop\yarn\*

package mr;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

/**
 * The four Mapper type parameters: input key (byte offset in the file), input value
 * (one line of text), output key (word), output value (count).
 * @author sunzy
 */
public class Mapper extends
        org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(
            LongWritable key,
            Text value,
            org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, LongWritable>.Context context)
            throws IOException, InterruptedException {
        // Split the line on spaces and emit (word, 1) for every word
        String[] line = value.toString().split(" ");
        for (String word : line) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}

 

 

package mr;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Reduce extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values,
            Reducer<Text, LongWritable, Text, LongWritable>.Context context)
            throws IOException, InterruptedException {
        // Count the occurrences of each word; every value emitted by the mapper is 1
        long count = 0;
        for (LongWritable value : values) {
            count += 1;
        }
        context.write(key, new LongWritable(count));
    }
}

 

 

package mr;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.junit.Test;

public class MRJobRunner {

    @Test
    public void run() throws Exception {
        Configuration config = new Configuration();
        config.set("fs.defaultFS", "hdfs://localhost:9000");

        Job job = Job.getInstance(config);
        job.setJarByClass(MRJobRunner.class);

        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reduce.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // Input directory and (not yet existing) output directory on HDFS
        FileInputFormat.setInputPaths(job, new Path("/test"));
        FileOutputFormat.setOutputPath(job, new Path("/t09"));

        job.waitForCompletion(true);
    }
}
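After the job finishes, the word counts can be checked from the command line (assuming the /t09 output path used above):

hdfs dfs -cat /t09/part-r-*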

 

HDFS FSImage and edits merge (checkpoint) flow:

(diagram)

Flow of a job submitted to YARN:

 

(diagram)

Reposted from: https://my.oschina.net/u/3759047/blog/3025820
