The following steps show how to write and run a Hadoop program with Eclipse on Windows 8.1:
1. Create a Java project in Eclipse
Double-click the Eclipse shortcut on the desktop. Select the menu "File -> New -> Java Project", enter the project name "WordCount" in the "Project name:" text box, and click the "Next" button.
2. Add the compile-time dependencies
Open the "Libraries" tab, then click the "Add External JARs…" button and add all of the JAR files under the directories "D:/hadoop-2.6.0-cdh5.4.2/share/hadoop/mapreduce1", "D:/hadoop-2.6.0-cdh5.4.2/share/hadoop/mapreduce1/lib", and "D:/hadoop-2.6.0-cdh5.4.2/share/hadoop/common". Finally, click the "Finish" button.
3. Add a class to the project
Right-click the WordCount project and select the menu "New -> Class". Enter the package name "lab2.module11" in the "Package:" text box and the class name "WordCount" in the "Name:" text box, then click the "Finish" button.
4. Write the word-count code
package lab2.module11;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // The four type parameters are the input key, input value, output key and
    // output value types; here they are Object, Text, Text and IntWritable.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // key is the input key (by default the offset of the line), value is
        // the text of the line, and context is the task context.
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken()); // set the key to the current token
                context.write(word, one);  // emit the pair (word, 1)
            }
        }
    }

    // The four type parameters are the input key, input value, output key and
    // output value types; here they are Text, IntWritable, Text and IntWritable.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        // key is the word, values is the list of counts collected for that
        // word, and context is the task context.
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result); // emit the pair (word, total count)
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // initialize the Hadoop configuration
        // Parse the generic Hadoop options and keep the remaining arguments.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        @SuppressWarnings("deprecation")
        Job job = new Job(conf, "word count");     // name the job "word count"
        job.setJarByClass(WordCount.class);        // set the main class
        job.setMapperClass(TokenizerMapper.class); // set the mapper class
        job.setReducerClass(IntSumReducer.class);  // set the reducer class
        job.setOutputKeyClass(Text.class);         // set the output key class
        job.setOutputValueClass(IntWritable.class);// set the output value class
        // All arguments but the last are input paths of the job.
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        // The last argument is the output path of the job.
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        // Wait for the job to finish and exit with its status.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
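Outside Hadoop, the combined effect of TokenizerMapper and IntSumReducer can be checked with a plain-Java sketch (the class WordCountLocal and its count method are illustrative helpers, not part of the lab):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountLocal {
    // Tokenize every line the same way the mapper does, then sum the
    // counts per word the same way the reducer does.
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                // Each token contributes a (word, 1) pair; merge sums them.
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(Arrays.asList("hello hadoop", "hello world"));
        System.out.println(counts.get("hello")); // prints 2
    }
}
```

This reproduces whitespace tokenization exactly; note that, like the job above, it treats "Word" and "word" as different keys.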
5. Prepare the input data
Create a folder "input" on the D: drive and copy the file "D:/LICENSE.txt" into "D:/input/".
6. Configure the run parameters
Select the menu "Run -> Run Configurations…". Select "Java Application" on the left and click the "New" button in the upper-left corner. Open the "Arguments" tab and enter the input and output paths of the job in the "Program arguments:" text box: "D:\input D:\output". Then open the "Environment" tab and click the "New" button on the right; in the dialog, enter "HADOOP_HOME" in the "Name:" text box and "D:\hadoop-2.6.0" in the "Value:" text box. Click "New" again and enter "PATH" in the "Name:" text box and "%PATH%;D:\hadoop-2.6.0\bin" in the "Value:" text box.
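Equivalently, the same two variables can be set in a Windows console before launching Eclipse from it (a sketch; the install path "D:\hadoop-2.6.0" is the one used in the step above):

```shell
:: Point HADOOP_HOME at the Hadoop install and put its bin on the PATH,
:: so winutils.exe and the native libraries can be found.
set HADOOP_HOME=D:\hadoop-2.6.0
set PATH=%PATH%;D:\hadoop-2.6.0\bin
```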
7. Test the program in Eclipse
Click the "Run" button to run the program inside Eclipse; it produces the console output below.
Run output:
2017-03-30 17:27:52,485 INFO jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2017-03-30 17:27:55,388 WARN mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(171)) - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2017-03-30 17:27:55,424 INFO input.FileInputFormat (FileInputFormat.java:listStatus(283)) - Total input paths to process : 1
2017-03-30 17:27:55,606 INFO mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(198)) - number of splits:1
2017-03-30 17:27:55,987 INFO mapreduce.JobSubmitter (JobSubmitter.java:printTokens(287)) - Submitting tokens for job: job_local1924825305_0001
2017-03-30 17:27:57,190 INFO mapreduce.Job (Job.java:submit(1294)) - The url to track the job: http://localhost:8080/
2017-03-30 17:27:57,192 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1339)) - Running job: job_local1924825305_0001
2017-03-30 17:27:57,197 INFO mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(471)) - OutputCommitter set in config null
2017-03-30 17:27:57,214 INFO output.FileOutputCommitter (FileOutputCommitter.java:<init>(100)) - File Output Committer Algorithm version is 1
2017-03-30 17:27:57,221 INFO mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(489)) - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2017-03-30 17:27:57,514 INFO mapred.LocalJobRunner (LocalJobRunner.java:runTasks(448)) - Waiting for map tasks
2017-03-30 17:27:57,518 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(224)) - Starting task: attempt_local1924825305_0001_m_000000_0
2017-03-30 17:27:57,762 INFO output.FileOutputCommitter (FileOutputCommitter.java:<init>(100)) - File Output Committer Algorithm version is 1
2017-03-30 17:27:57,813 INFO util.ProcfsBasedProcessTree (ProcfsBasedProcessTree.java:isAvailable(192)) - ProcfsBasedProcessTree currently is supported only on Linux.
2017-03-30 17:27:58,200 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1360)) - Job job_local1924825305_0001 running in uber mode : false
2017-03-30 17:27:58,342 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1367)) - map 0% reduce 0%
2017-03-30 17:27:58,956 INFO mapred.Task (Task.java:initialize(612)) - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@4d9fcb2
2017-03-30 17:27:58,990 INFO mapred.MapTask (MapTask.java:runNewMapper(756)) - Processing split: file:/D:/input.txt:0+17087
2017-03-30 17:27:59,350 INFO mapred.MapTask (MapTask.java:setEquator(1205)) - (EQUATOR) 0 kvi 26214396(104857584)
2017-03-30 17:27:59,350 INFO mapred.MapTask (MapTask.java:init(998)) - mapreduce.task.io.sort.mb: 100
2017-03-30 17:27:59,350 INFO mapred.MapTask (MapTask.java:init(999)) - soft limit at 83886080
2017-03-30 17:27:59,351 INFO mapred.MapTask (MapTask.java:init(1000)) - bufstart = 0; bufvoid = 104857600
2017-03-30 17:27:59,351 INFO mapred.MapTask (MapTask.java:init(1001)) - kvstart = 26214396; length = 6553600
2017-03-30 17:27:59,378 INFO mapred.MapTask (MapTask.java:createSortingCollector(403)) - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2017-03-30 17:27:59,626 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) -
2017-03-30 17:27:59,628 INFO mapred.MapTask (MapTask.java:flush(1460)) - Starting flush of map output
2017-03-30 17:27:59,628 INFO mapred.MapTask (MapTask.java:flush(1482)) - Spilling map output
2017-03-30 17:27:59,629 INFO mapred.MapTask (MapTask.java:flush(1483)) - bufstart = 0; bufend = 25275; bufvoid = 104857600
2017-03-30 17:27:59,629 INFO mapred.MapTask (MapTask.java:flush(1485)) - kvstart = 26214396(104857584); kvend = 26204872(104819488); length = 9525/6553600
2017-03-30 17:27:59,843 INFO mapred.MapTask (MapTask.java:sortAndSpill(1667)) - Finished spill 0
2017-03-30 17:27:59,928 INFO mapred.Task (Task.java:done(1038)) - Task:attempt_local1924825305_0001_m_000000_0 is done. And is in the process of committing
2017-03-30 17:27:59,968 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - map
2017-03-30 17:27:59,970 INFO mapred.Task (Task.java:sendDone(1158)) - Task 'attempt_local1924825305_0001_m_000000_0' done.
2017-03-30 17:27:59,970 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(249)) - Finishing task: attempt_local1924825305_0001_m_000000_0
2017-03-30 17:27:59,971 INFO mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - map task executor complete.
2017-03-30 17:27:59,978 INFO mapred.LocalJobRunner (LocalJobRunner.java:runTasks(448)) - Waiting for reduce tasks
2017-03-30 17:27:59,978 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(302)) - Starting task: attempt_local1924825305_0001_r_000000_0
2017-03-30 17:28:00,008 INFO output.FileOutputCommitter (FileOutputCommitter.java:<init>(100)) - File Output Committer Algorithm version is 1
2017-03-30 17:28:00,014 INFO util.ProcfsBasedProcessTree (ProcfsBasedProcessTree.java:isAvailable(192)) - ProcfsBasedProcessTree currently is supported only on Linux.
2017-03-30 17:28:00,419 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1367)) - map 100% reduce 0%
2017-03-30 17:28:00,521 INFO mapred.Task (Task.java:initialize(612)) - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@82ac87a
2017-03-30 17:28:00,568 INFO mapred.ReduceTask (ReduceTask.java:run(362)) - Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@7bb8724c
2017-03-30 17:28:00,624 INFO reduce.MergeManagerImpl (MergeManagerImpl.java:<init>(197)) - MergerManager: memoryLimit=1325976704, maxSingleShuffleLimit=331494176, mergeThreshold=875144640, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2017-03-30 17:28:00,632 INFO reduce.EventFetcher (EventFetcher.java:run(61)) - attempt_local1924825305_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2017-03-30 17:28:00,838 INFO reduce.LocalFetcher (LocalFetcher.java:copyMapOutput(144)) - localfetcher#1 about to shuffle output of map attempt_local1924825305_0001_m_000000_0 decomp: 30041 len: 30045 to MEMORY
2017-03-30 17:28:00,889 INFO reduce.InMemoryMapOutput (InMemoryMapOutput.java:shuffle(100)) - Read 30041 bytes from map-output for attempt_local1924825305_0001_m_000000_0
2017-03-30 17:28:00,894 INFO reduce.MergeManagerImpl (MergeManagerImpl.java:closeInMemoryFile(315)) - closeInMemoryFile -> map-output of size: 30041, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->30041
2017-03-30 17:28:00,899 INFO reduce.EventFetcher (EventFetcher.java:run(76)) - EventFetcher is interrupted.. Returning
2017-03-30 17:28:00,901 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 1 / 1 copied.
2017-03-30 17:28:00,902 INFO reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(687)) - finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2017-03-30 17:28:00,987 INFO mapred.Merger (Merger.java:merge(606)) - Merging 1 sorted segments
2017-03-30 17:28:00,988 INFO mapred.Merger (Merger.java:merge(705)) - Down to the last merge-pass, with 1 segments left of total size: 30035 bytes
2017-03-30 17:28:01,037 INFO reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(754)) - Merged 1 segments, 30041 bytes to disk to satisfy reduce memory limit
2017-03-30 17:28:01,043 INFO reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(784)) - Merging 1 files, 30045 bytes from disk
2017-03-30 17:28:01,045 INFO reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(799)) - Merging 0 segments, 0 bytes from memory into reduce
2017-03-30 17:28:01,046 INFO mapred.Merger (Merger.java:merge(606)) - Merging 1 sorted segments
2017-03-30 17:28:01,057 INFO mapred.Merger (Merger.java:merge(705)) - Down to the last merge-pass, with 1 segments left of total size: 30035 bytes
2017-03-30 17:28:01,059 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 1 / 1 copied.
2017-03-30 17:28:01,086 INFO Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1173)) - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2017-03-30 17:28:01,378 INFO mapred.Task (Task.java:done(1038)) - Task:attempt_local1924825305_0001_r_000000_0 is done. And is in the process of committing
2017-03-30 17:28:01,389 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 1 / 1 copied.
2017-03-30 17:28:01,390 INFO mapred.Task (Task.java:commit(1199)) - Task attempt_local1924825305_0001_r_000000_0 is allowed to commit now
2017-03-30 17:28:01,410 INFO output.FileOutputCommitter (FileOutputCommitter.java:commitTask(482)) - Saved output of task 'attempt_local1924825305_0001_r_000000_0' to file:/D:/output/_temporary/0/task_local1924825305_0001_r_000000
2017-03-30 17:28:01,412 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - reduce > reduce
2017-03-30 17:28:01,412 INFO mapred.Task (Task.java:sendDone(1158)) - Task 'attempt_local1924825305_0001_r_000000_0' done.
2017-03-30 17:28:01,413 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(325)) - Finishing task: attempt_local1924825305_0001_r_000000_0
2017-03-30 17:28:01,414 INFO mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - reduce task executor complete.
2017-03-30 17:28:01,421 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1367)) - map 100% reduce 100%
2017-03-30 17:28:02,423 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1378)) - Job job_local1924825305_0001 completed successfully
2017-03-30 17:28:02,462 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1385)) - Counters: 30
File System Counters
FILE: Number of bytes read=94574
FILE: Number of bytes written=542839
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=322
Map output records=2382
Map output bytes=25275
Map output materialized bytes=30045
Input split bytes=83
Combine input records=0
Combine output records=0
Reduce input groups=760
Reduce shuffle bytes=30045
Reduce input records=2382
Reduce output records=760
Spilled Records=4764
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=0
Total committed heap usage (bytes)=468713472
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=17087
File Output Format Counters
Bytes Written=8290
8. View the test output
In Windows Explorer, locate the file "D:/output/part-r-00000" and drag it into Eclipse to view the output.
9. Package the program into a JAR file
Right-click the WordCount project and select the menu "Export". Select "Java -> JAR file" and click the "Next" button. Then click the "Browse…" button to the right of the "JAR file:" text box, choose the desktop as the destination, and name the file "wordcount.jar". Finally, click the "Finish" button.
10. Upload the program JAR file to the cluster
11. Submit the MapReduce program
Run the command "yarn jar" to submit the MapReduce program. Note that the output path must not already exist, or the job will fail. The command "hdfs dfs -rm -r" can be used to delete a folder on HDFS.
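A minimal sketch of the two commands above; the HDFS paths "/input" and "/output" are assumptions that must match your cluster, and the fully qualified class name must match the package chosen in step 3:

```shell
# Remove any previous output folder on HDFS; the job fails if it exists.
hdfs dfs -rm -r /output
# Submit the job: the last argument is the output path, the rest are inputs.
yarn jar wordcount.jar lab2.module11.WordCount /input /output
```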