Implementing a MapReduce Program to Count Lines

新手小黑吖

Last modified: 2024-12-23 15:25:43

Tags: mapreduce, big data

First published: 2024-10-09 22:52:16

Article link: https://blog.csdn.net/2401_83253656/article/details/142798073

What you will need: Linux, a virtual machine, Eclipse, JDK 1.8, and Hadoop 2.7.4.

Adjust the steps to your own environment; this article is for reference only.

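Start the Hadoop services

Change to the Hadoop installation directory (use your own path): cd /home/hadoop/hadoop-2.7.4

Start the Hadoop cluster: sbin/start-all.sh

Check that the services started successfully: jps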
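Reading the jps listing by eye is easy to get wrong, so the check can be sketched in a few lines of Java. This is a minimal sketch under assumptions: the class name JpsCheck is made up, the sample listing in main is invented for illustration, and the expected daemon list assumes everything runs on one machine (on a multi-node cluster each host shows only a subset).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JpsCheck {
    // Assumption: daemons expected when all Hadoop 2.x services run on one host.
    static final List<String> EXPECTED = Arrays.asList(
            "NameNode", "DataNode", "SecondaryNameNode", "ResourceManager", "NodeManager");

    // Returns the expected daemon names that do not appear in the jps output.
    static List<String> missingDaemons(String jpsOutput) {
        List<String> running = new ArrayList<>();
        for (String line : jpsOutput.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) {
                running.add(parts[1]); // jps prints "<pid> <name>"
            }
        }
        List<String> missing = new ArrayList<>(EXPECTED);
        missing.removeAll(running);
        return missing;
    }

    public static void main(String[] args) {
        // Invented sample listing, not real output from the cluster in this article.
        String sample = "2481 NameNode\n2610 DataNode\n2790 SecondaryNameNode\n3215 Jps\n";
        System.out.println("Missing: " + missingDaemons(sample));
    }
}
```

If the list of missing daemons is not empty, check the Hadoop logs before continuing.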

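Open Eclipse and create a new Maven project

Set the JDK in Eclipse

Open Eclipse, choose Window > Preferences from the menu bar to open the Preferences window, and select Java > Installed JREs.

Click Add, and in the Add JRE window select Standard VM.

Click Next. In the next window, click the Directory button beside JRE home and select the directory where the JDK is installed.

Confirm the selection, then click Finish.

Click Apply, then Apply and Close. The JDK is now configured in Eclipse.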

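Create the Maven project

From the Eclipse menu bar, choose File > New > Maven Project.

Click Next, and in the window that opens check Create a simple project.

Click Next. In the next window, enter learning as the Group Id (usually the company or organization name) and MapReduce as the Artifact Id (the project name).

Click Finish.

Right-click the new MapReduce project and choose Build Path > Configure Build Path...

In the window that opens, select Java Build Path on the left, select JRE System Library under Libraries on the right, and click Edit.

In the next window, select the JDK 1.8 installation.

Click Finish, then save the settings.

Next, select Java Compiler on the left and set Compiler compliance level on the right to 1.8.

Click Apply, then Apply and Close.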
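As a quick sanity check on the JDK setup, a small class can print the Java version that Eclipse actually runs the project with. This is only a sketch; the class name JdkCheck is made up for illustration.

```java
public class JdkCheck {
    // Java 8 reports its specification version as "1.8"; later releases report "9", "11", and so on.
    static boolean isJava8(String specVersion) {
        return "1.8".equals(specVersion);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("java.specification.version = " + spec);
        System.out.println("Running on JDK 1.8: " + isJava8(spec));
    }
}
```

Run it from Eclipse with Run As > Java Application; if it does not report 1.8, revisit the Installed JREs and Build Path settings.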

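Edit the pom.xml file

After creating the Maven project, add the following inside the <project> element of pom.xml:

<properties>
  <hadoop.version>2.7.4</hadoop.version>
</properties>
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
</dependencies>
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <configuration>
        <!-- assumed values, chosen to match the JDK 1.8 setup above -->
        <source>1.8</source>
        <target>1.8</target>
      </configuration>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <configuration>
        <!-- settings not shown in the original -->
      </configuration>
    </plugin>
  </plugins>
</build>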

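Save the file. Maven then downloads the libraries and a Maven Dependencies entry appears in the project.

The project now shows the standard Maven layout, including src/main/java, src/test/java, pom.xml, and the Maven Dependencies library.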

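Develop the LineCount program

Create the package com.learning.mapreduce under src/main/java

Right-click src/main/java and create a new Package.

Click Finish. (I had already created the package; the screenshot was taken afterwards.)

Create the LineCount class file under com.learning.mapreduce

Right-click the new com.learning.mapreduce package, choose New > Class, and name the class LineCount.

The package and the class now appear under src/main/java.

The content of LineCount.java is as follows: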

package com.learning.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LineCount {

    // Single key under which every line is counted (note the trailing space).
    private static final Text LINE_COUNT_KEY = new Text("Total Line ");

    private static class Map extends Mapper<Object, Text, Text, IntWritable> {

        // Custom job counter that also tracks the number of lines read.
        public static enum FileRecorder {
            LineRecorder
        }

        private static final IntWritable one = new IntWritable(1);

        // Called once per input line: bump the counter and emit (key, 1).
        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            context.getCounter(FileRecorder.LineRecorder).increment(1);
            context.write(LINE_COUNT_KEY, one);
        }
    }

    private static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        // Sums the 1s emitted by the mappers to get the total line count.
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // Expect exactly two arguments: the input path and the output path.
        if (args.length != 2) {
            System.out.println("Please input 2 params!");
            System.exit(2);
        }
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Line Count");
        job.setJarByClass(LineCount.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Final output types written by the reducer.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Save the file.
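To see what the job computes without a cluster, the same map, shuffle, and reduce steps can be imitated in plain Java. This is a sketch, not the Hadoop API: the class LineCountSketch and its method names are made up, and the grouping by key that Hadoop performs between the two phases is replaced here by an ordinary in-memory map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LineCountSketch {
    static final String LINE_COUNT_KEY = "Total Line ";

    // Map + shuffle: emit (LINE_COUNT_KEY, 1) for every line and group the values by key.
    static Map<String, List<Integer>> mapAndShuffle(List<String> lines) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (String ignored : lines) { // the line content is never inspected
            grouped.computeIfAbsent(LINE_COUNT_KEY, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Reduce: sum the grouped values for each key.
    static Map<String, Integer> reducePhase(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            result.put(entry.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Empty lines are counted as well, since one record is emitted per line.
        List<String> lines = Arrays.asList("first line", "", "third line");
        System.out.println(reducePhase(mapAndShuffle(lines))); // prints {Total Line =3}
    }
}
```

Because every line maps to the same key, a single reduce call receives all the values, which is why the job output holds exactly one record.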

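Export the jar file

Right-click the MapReduce project and click Export.

In the dialog that opens, select JAR file.

Click Next >. In the next dialog, uncheck the three options on the right and choose the path for the jar file.

(I had already done this step; the screenshot was taken afterwards.)

Click Finish to complete the export. The generated MapReduce.jar file appears in the eclipse-workspace/MapReduce/target directory.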

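Upload the jar file to the master server

On the master server, create a directory named mapreduce under /home/hadoop.

Command: mkdir /home/hadoop/mapreduce

Send the MapReduce.jar file to the /home/hadoop/mapreduce directory on the master server.

Connect to the master server with Xftp, open /home/hadoop/mapreduce in the right pane and eclipse-workspace/MapReduce/target in the left pane, then copy MapReduce.jar from the left pane to the right pane.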

Prepare the test data

On the master server, create a file named loaddata1.txt in the /home/hadoop/mapreduce directory.
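The article does not show what loaddata1.txt contains, and any text file works for a line count. One option is to generate a file with a known number of lines so the job result can be checked later. The sketch below assumes exactly that; the class name MakeTestData, the placeholder text, and the count of 100 lines are all made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class MakeTestData {
    // Writes lineCount lines of placeholder text to target and returns the path.
    static Path writeSample(Path target, int lineCount) throws IOException {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= lineCount; i++) {
            lines.add("sample record " + i);
        }
        return Files.write(target, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Run from /home/hadoop/mapreduce so the file lands where the later steps expect it.
        Path file = writeSample(Paths.get("loaddata1.txt"), 100);
        int written = Files.readAllLines(file, StandardCharsets.UTF_8).size();
        System.out.println("Wrote " + written + " lines to " + file);
    }
}
```

With a file produced this way, the job should report a total of 100.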

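Upload the file to HDFS

From the hadoop-2.7.4 directory, use bin/hdfs dfs -mkdir to create the /input and /output directories in HDFS.

Commands:

cd /home/hadoop/hadoop-2.7.4

bin/hdfs dfs -mkdir /input

bin/hdfs dfs -mkdir /output

Use hdfs dfs -put to upload loaddata1.txt to the /input directory in HDFS: bin/hdfs dfs -put /home/hadoop/mapreduce/loaddata1.txt /input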

Run LineCount

Run the LineCount program: bin/hadoop jar /home/hadoop/mapreduce/MapReduce.jar com.learning.mapreduce.LineCount /input /output/linecount

List the directories and files created by a successful run: bin/hdfs dfs -ls -R /output

View the result: bin/hdfs dfs -cat /output/linecount/part-r-00000
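The result file holds one record: the key "Total Line " (with its trailing space), a tab, and the count, because the default text output format separates key and value with a tab. If the number is needed in another program, it can be pulled out as sketched below. The class name ResultParser and the sample line are made up; the sample is not real output from this job.

```java
public class ResultParser {
    // Parses a "<key>\t<value>" record and returns the value as an int.
    static int parseTotal(String resultLine) {
        String[] parts = resultLine.split("\t");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected <key>\\t<value>, got: " + resultLine);
        }
        return Integer.parseInt(parts[1].trim());
    }

    public static void main(String[] args) {
        String sample = "Total Line \t100"; // invented sample record
        System.out.println(parseTotal(sample)); // prints 100
    }
}
```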