Debugging MapReduce Programs in Eclipse with Hadoop Standalone Mode

In standalone (local) mode, Hadoop does not use HDFS and does not start any Hadoop daemons; everything runs inside a single JVM, and a job is allowed at most one reducer.
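Standalone mode is simply Hadoop's default configuration: fs.default.name points at the local filesystem and mapred.job.tracker is "local", which selects the in-process LocalJobRunner. As a quick check, here is a minimal sketch (ShowDefaults is a hypothetical helper class; it assumes the hadoop-1.2.1 jars imported below are already on the classpath):

import org.apache.hadoop.conf.Configuration;

public class ShowDefaults {
	public static void main(String[] args) {
		Configuration conf = new Configuration();
		// With no *-site.xml overrides, these print the standalone defaults:
		// fs.default.name = file:///  (local filesystem, no HDFS)
		// mapred.job.tracker = local  (in-process LocalJobRunner, single JVM)
		System.out.println("fs.default.name = " + conf.get("fs.default.name"));
		System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
	}
}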

Create a new Java project named hadoop-test in Eclipse (note in particular that Hadoop requires JDK 1.6 or later).

Download hadoop-1.2.1.tar.gz from the Hadoop release mirror at http://apache.fayea.com/apache-mirror/hadoop/common/

Unpack hadoop-1.2.1.tar.gz to obtain the hadoop-1.2.1 directory.

Import the jars from the hadoop-1.2.1 directory and from hadoop-1.2.1\lib into the build path of the hadoop-test project.

Next, write the MapReduce program (this one tallies the net balance of income and expenses per month).

Map:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MapBus extends MapReduceBase 
		implements Mapper<LongWritable, Text, Text, LongWritable> {
	@Override
	public void map(LongWritable key, Text value, 
			OutputCollector<Text, LongWritable> output,
			Reporter reporter) throws IOException {
		// Each input line looks like: 2013-01-11,-200
		String line = value.toString();
		if(line.contains(",")){
			String[] tmp = line.split(",");
			// Characters 5-6 of yyyy-MM-dd are the month, e.g. "01"
			String month = tmp[0].substring(5, 7);
			long money = Long.parseLong(tmp[1]);
			// Emit one (month, amount) pair per record
			output.collect(new Text(month), new LongWritable(money));
		}
	}
}
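For an input line such as 2013-01-11,-200, tmp[0].substring(5, 7) picks out the month ("01") and tmp[1] is the signed amount, so the mapper emits one (month, amount) pair per record.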

Reduce:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class ReduceBus extends MapReduceBase 
		implements Reducer<Text, LongWritable, Text, LongWritable> {
	@Override
	public void reduce(Text month, Iterator<LongWritable> money,
			OutputCollector<Text, LongWritable> output, Reporter reporter)
			throws IOException {
		// Sum every amount recorded for this month;
		// use long to match the LongWritable value type
		long totalMoney = 0;
		while(money.hasNext()){
			totalMoney += money.next().get();
		}
		output.collect(month, new LongWritable(totalMoney));
	}
}
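Because the reduce step is a plain sum (associative and commutative), ReduceBus could also double as a combiner via jobConf.setCombinerClass(ReduceBus.class) to pre-aggregate map output. In local mode this makes little difference, but on a real cluster it cuts shuffle traffic.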

Main:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class Wallet {
	public static void main(String[] args){
		if(args.length != 2){
			System.err.println("Usage: Wallet <input path> <output path>");
			System.exit(-1);
		}
		
		// JobConf(Wallet.class) tells Hadoop which jar/classes make up the job
		JobConf jobConf = new JobConf(Wallet.class);
		jobConf.setJobName("My Wallet");
		
		FileInputFormat.addInputPath(jobConf, new Path(args[0]));
		FileOutputFormat.setOutputPath(jobConf, new Path(args[1]));
		jobConf.setMapperClass(MapBus.class);
		jobConf.setReducerClass(ReduceBus.class);
		// Types of the reduce output key/value pairs
		jobConf.setOutputKeyClass(Text.class);
		jobConf.setOutputValueClass(LongWritable.class);
		
		try{
			// Submit the job and block until it finishes
			JobClient.runJob(jobConf);
		}catch(Exception e){
			e.printStackTrace();
		}
	}
}
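In Eclipse, supply the two arguments via Run As -> Run Configurations... -> Arguments, e.g. E:\cygwin_root\home\input E:\cygwin_root\home\output, matching the paths used below.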

You also need to prepare the files to analyze. Under E:\cygwin_root\home\input, create two files: one named 2013-01.txt and the other 2013-02.txt.

2013-01.txt:

2013-01-01,100
2013-01-02,-100
2013-01-07,100
2013-01-10,-100
2013-01-11,100
2013-01-21,-100
2013-01-22,100
2013-01-25,-100
2013-01-27,100
2013-01-18,-100
2013-01-09,500

2013-02.txt:

2013-02-01,100
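Each line is a date,amount pair in yyyy-MM-dd,amount form; negative amounts are expenses and positive amounts are income.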

Once the run arguments are set, the MapReduce program can be launched via Run As -> Java Application. On Windows, however, you may hit the following exception:

java.io.IOException: Failed to set permissions of path: 
\tmp\hadoop-linkage\mapred\staging\linkage1150562408\.staging to 0700

The main cause of this error is that later Hadoop versions added a permission check on the staging path, which fails on Windows. The fix is simple: replace hadoop-core-1.2.1.jar with hadoop-0.20.2-core.jar, which predates the check.
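If you would rather keep hadoop-core-1.2.1.jar, a commonly cited alternative workaround is to copy org.apache.hadoop.fs.FileUtil from the Hadoop 1.2.1 sources into your own project (same package and class name, so it shadows the jar's copy on the classpath) and disable the failing check. A sketch of the relevant method only; the exact signature assumes the Hadoop 1.2.1 source layout, and the rest of the class stays unchanged:

	private static void checkReturnValue(boolean rv, File p, FsPermission permission)
			throws IOException {
		// The stock implementation throws
		//   IOException("Failed to set permissions of path: ...")
		// when rv is false. chmod-style calls fail on Windows/NTFS,
		// so for local debugging the check is simply skipped.
	}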

The log printed while the MapReduce program runs is shown below:

14/02/11 10:54:16 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
14/02/11 10:54:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/02/11 10:54:16 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
14/02/11 10:54:16 INFO mapred.FileInputFormat: Total input paths to process : 2
14/02/11 10:54:17 INFO mapred.JobClient: Running job: job_local_0001
14/02/11 10:54:17 INFO mapred.FileInputFormat: Total input paths to process : 2
14/02/11 10:54:17 INFO mapred.MapTask: numReduceTasks: 1
14/02/11 10:54:17 INFO mapred.MapTask: io.sort.mb = 100
14/02/11 10:54:17 INFO mapred.MapTask: data buffer = 79691776/99614720
14/02/11 10:54:17 INFO mapred.MapTask: record buffer = 262144/327680
14/02/11 10:54:17 INFO mapred.MapTask: Starting flush of map output
14/02/11 10:54:18 INFO mapred.MapTask: Finished spill 0
14/02/11 10:54:18 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
14/02/11 10:54:18 INFO mapred.LocalJobRunner: file:/E:/cygwin_root/home/input/2013-01.txt:0+179
14/02/11 10:54:18 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
14/02/11 10:54:18 INFO mapred.MapTask: numReduceTasks: 1
14/02/11 10:54:18 INFO mapred.MapTask: io.sort.mb = 100
14/02/11 10:54:18 INFO mapred.MapTask: data buffer = 79691776/99614720
14/02/11 10:54:18 INFO mapred.MapTask: record buffer = 262144/327680
14/02/11 10:54:18 INFO mapred.MapTask: Starting flush of map output
14/02/11 10:54:18 INFO mapred.MapTask: Finished spill 0
14/02/11 10:54:18 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
14/02/11 10:54:18 INFO mapred.LocalJobRunner: file:/E:/cygwin_root/home/input/2013-02.txt:0+16
14/02/11 10:54:18 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
14/02/11 10:54:18 INFO mapred.LocalJobRunner: 
14/02/11 10:54:18 INFO mapred.Merger: Merging 2 sorted segments
14/02/11 10:54:18 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 160 bytes
14/02/11 10:54:18 INFO mapred.LocalJobRunner: 
14/02/11 10:54:18 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
14/02/11 10:54:18 INFO mapred.LocalJobRunner: 
14/02/11 10:54:18 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
14/02/11 10:54:18 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/E:/cygwin_root/home/output
14/02/11 10:54:18 INFO mapred.LocalJobRunner: reduce > reduce
14/02/11 10:54:18 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
14/02/11 10:54:18 INFO mapred.JobClient:  map 100% reduce 100%
14/02/11 10:54:18 INFO mapred.JobClient: Job complete: job_local_0001
14/02/11 10:54:18 INFO mapred.JobClient: Counters: 13
14/02/11 10:54:18 INFO mapred.JobClient:   FileSystemCounters
14/02/11 10:54:18 INFO mapred.JobClient:     FILE_BYTES_READ=39797
14/02/11 10:54:18 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=80473
14/02/11 10:54:18 INFO mapred.JobClient:   Map-Reduce Framework
14/02/11 10:54:18 INFO mapred.JobClient:     Reduce input groups=2
14/02/11 10:54:18 INFO mapred.JobClient:     Combine output records=0
14/02/11 10:54:18 INFO mapred.JobClient:     Map input records=12
14/02/11 10:54:18 INFO mapred.JobClient:     Reduce shuffle bytes=0
14/02/11 10:54:18 INFO mapred.JobClient:     Reduce output records=2
14/02/11 10:54:18 INFO mapred.JobClient:     Spilled Records=24
14/02/11 10:54:18 INFO mapred.JobClient:     Map output bytes=132
14/02/11 10:54:18 INFO mapred.JobClient:     Map input bytes=195
14/02/11 10:54:18 INFO mapred.JobClient:     Combine input records=0
14/02/11 10:54:18 INFO mapred.JobClient:     Map output records=12
14/02/11 10:54:18 INFO mapred.JobClient:     Reduce input records=12
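Note the job_local_0001 job id and the LocalJobRunner entries: the whole job ran inside the Eclipse JVM, and each map task reports numReduceTasks: 1, matching the single-reducer limit of standalone mode.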

After the run completes, two files are generated under E:\cygwin_root\home\output: .part-00000.crc and part-00000. .part-00000.crc is an internal binary file that stores the checksum of part-00000; part-00000 holds the final statistics:

01	500
02	100
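The figures check out: in 2013-01.txt the five +100 entries cancel the five -100 entries, leaving only the single 500 deposit, and 2013-02.txt contains just one 100 entry.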

Note in particular that the output directory must be deleted before every run; otherwise the job aborts with:

org.apache.hadoop.mapred.FileAlreadyExistsException: 
Output directory file:/E:/cygwin_root/home/output already exists

Hadoop performs this check so that, if a previous MapReduce run did not complete, rerunning the program cannot overwrite the intermediate files left behind by that earlier run.
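For local debugging you can also have the driver clear the output directory itself before submitting the job. A convenience sketch only (it additionally imports org.apache.hadoop.fs.FileSystem; in production you usually want to keep the safety check):

		// Inside the try block in Wallet.main(), before JobClient.runJob(jobConf):
		FileSystem fs = FileSystem.get(jobConf);   // the local FS in standalone mode
		Path outputPath = new Path(args[1]);
		if (fs.exists(outputPath)) {
			fs.delete(outputPath, true);           // true = recursive delete
		}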
