1. Running a Local Program with Hadoop
(1) Compile (SequenceFileWriterDemo.java)
javac -classpath /usr/local/Cellar/hadoop/2.8.0/libexec/share/hadoop/common/hadoop-common-2.8.0.jar SequenceFileWriterDemo.java -d classes
Where:
-classpath: the jar(s) the compiler needs on its classpath
/usr/local/Cellar/hadoop/2.8.0/libexec/share/hadoop/common/hadoop-common-2.8.0.jar: the jar from the locally installed Hadoop (the local environment is macOS)
-d: where to put the generated class files; "classes" here means the .class files go into the classes directory (the directory must already exist)
Note: the classes directory is /Users/zhuqiuhui/Downloads/classes
(2) Run
export HADOOP_CLASSPATH=/Users/zhuqiuhui/Downloads/classes (HADOOP_CLASSPATH adds the application's classes to the path; this is a local filesystem path, and the classes directory holds the SequenceFileWriterDemo.class compiled above)
hadoop SequenceFileWriterDemo res.txt (run from inside the classes directory; res.txt is written to the pseudo-distributed HDFS)
Note: core-site.xml sets the default HDFS filesystem:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
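Because fs.defaultFS is hdfs://localhost:9000, a path argument with no scheme (like res.txt above) is handed to the HDFS filesystem rather than the local one. As a loose illustration using plain java.net.URI resolution (the class name DefaultFsResolution is illustrative, and note that real HDFS additionally resolves relative paths against the user's home directory such as /user/<name>, which this sketch ignores):

```java
import java.net.URI;

public class DefaultFsResolution {
    public static void main(String[] args) {
        // "res.txt" carries no scheme, so Hadoop falls back to fs.defaultFS.
        // Plain URI resolution against the default filesystem root shows the idea:
        URI defaultFs = URI.create("hdfs://localhost:9000/");
        System.out.println(defaultFs.resolve("res.txt"));  // hdfs://localhost:9000/res.txt
    }
}
```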
(3) Code and output
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import java.io.IOException;
import java.net.URI;

/**
 * Created by zhuqiuhui on 2017/7/13.
 */
public class SequenceFileWriterDemo {
    private static final String[] DATA = {
        "One, two, buckle my shoe",
        "Three, four, shut the door",
        "Five, six, pick up sticks",
        "Seven, eight, lay them straight",
        "Nine, ten, a big fat hen"
    };

    public static void main(String[] args) throws IOException {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        Path path = new Path(uri);
        IntWritable key = new IntWritable();
        Text value = new Text();
        SequenceFile.Writer writer = null;
        writer = SequenceFile.createWriter(fs, conf, path, key.getClass(), value.getClass());
        for (int i = 0; i < 100; ++i) {
            key.set(100 - i);
            value.set(DATA[i % DATA.length]);
            System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
            writer.append(key, value);
        }
        IOUtils.closeStream(writer);
    }
}
Output:
[128] 100 One, two, buckle my shoe
[173] 99 Three, four, shut the door
[220] 98 Five, six, pick up sticks
[264] 97 Seven, eight, lay them straight
[314] 96 Nine, ten, a big fat hen
[359] 95 One, two, buckle my shoe
[404] 94 Three, four, shut the door
[451] 93 Five, six, pick up sticks
[495] 92 Seven, eight, lay them straight
[545] 91 Nine, ten, a big fat hen
[590] 90 One, two, buckle my shoe
[635] 89 Three, four, shut the door
[682] 88 Five, six, pick up sticks
[726] 87 Seven, eight, lay them straight
[776] 86 Nine, ten, a big fat hen
[821] 85 One, two, buckle my shoe
[866] 84 Three, four, shut the door
[913] 83 Five, six, pick up sticks
[957] 82 Seven, eight, lay them straight
[1007] 81 Nine, ten, a big fat hen
[1052] 80 One, two, buckle my shoe
[1097] 79 Three, four, shut the door
[1144] 78 Five, six, pick up sticks
[1188] 77 Seven, eight, lay them straight
[1238] 76 Nine, ten, a big fat hen
[1283] 75 One, two, buckle my shoe
[1328] 74 Three, four, shut the door
[1375] 73 Five, six, pick up sticks
[1419] 72 Seven, eight, lay them straight
[1469] 71 Nine, ten, a big fat hen
[1514] 70 One, two, buckle my shoe
[1559] 69 Three, four, shut the door
[1606] 68 Five, six, pick up sticks
[1650] 67 Seven, eight, lay them straight
[1700] 66 Nine, ten, a big fat hen
[1745] 65 One, two, buckle my shoe
[1790] 64 Three, four, shut the door
[1837] 63 Five, six, pick up sticks
[1881] 62 Seven, eight, lay them straight
[1931] 61 Nine, ten, a big fat hen
[1976] 60 One, two, buckle my shoe
[2021] 59 Three, four, shut the door
[2088] 58 Five, six, pick up sticks
[2132] 57 Seven, eight, lay them straight
[2182] 56 Nine, ten, a big fat hen
[2227] 55 One, two, buckle my shoe
[2272] 54 Three, four, shut the door
[2319] 53 Five, six, pick up sticks
[2363] 52 Seven, eight, lay them straight
[2413] 51 Nine, ten, a big fat hen
[2458] 50 One, two, buckle my shoe
[2503] 49 Three, four, shut the door
[2550] 48 Five, six, pick up sticks
[2594] 47 Seven, eight, lay them straight
[2644] 46 Nine, ten, a big fat hen
[2689] 45 One, two, buckle my shoe
[2734] 44 Three, four, shut the door
[2781] 43 Five, six, pick up sticks
[2825] 42 Seven, eight, lay them straight
[2875] 41 Nine, ten, a big fat hen
[2920] 40 One, two, buckle my shoe
[2965] 39 Three, four, shut the door
[3012] 38 Five, six, pick up sticks
[3056] 37 Seven, eight, lay them straight
[3106] 36 Nine, ten, a big fat hen
[3151] 35 One, two, buckle my shoe
[3196] 34 Three, four, shut the door
[3243] 33 Five, six, pick up sticks
[3287] 32 Seven, eight, lay them straight
[3337] 31 Nine, ten, a big fat hen
[3382] 30 One, two, buckle my shoe
[3427] 29 Three, four, shut the door
[3474] 28 Five, six, pick up sticks
[3518] 27 Seven, eight, lay them straight
[3568] 26 Nine, ten, a big fat hen
[3613] 25 One, two, buckle my shoe
[3658] 24 Three, four, shut the door
[3705] 23 Five, six, pick up sticks
[3749] 22 Seven, eight, lay them straight
[3799] 21 Nine, ten, a big fat hen
[3844] 20 One, two, buckle my shoe
[3889] 19 Three, four, shut the door
[3936] 18 Five, six, pick up sticks
[3980] 17 Seven, eight, lay them straight
[4030] 16 Nine, ten, a big fat hen
[4075] 15 One, two, buckle my shoe
[4140] 14 Three, four, shut the door
[4187] 13 Five, six, pick up sticks
[4231] 12 Seven, eight, lay them straight
[4281] 11 Nine, ten, a big fat hen
[4326] 10 One, two, buckle my shoe
[4371] 9 Three, four, shut the door
[4418] 8 Five, six, pick up sticks
[4462] 7 Seven, eight, lay them straight
[4512] 6 Nine, ten, a big fat hen
[4557] 5 One, two, buckle my shoe
[4602] 4 Three, four, shut the door
[4649] 3 Five, six, pick up sticks
[4693] 2 Seven, eight, lay them straight
[4743] 1 Nine, ten, a big fat hen
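For reference, the bracketed number in each line is writer.getLength(), the byte position in the file before that record is appended: the first record starts at offset 128, just after the SequenceFile header, and the occasional 20-byte jumps (e.g. 2021 to 2088) appear to be sync markers. The key/value pattern itself is just the loop logic, which can be replayed without any Hadoop I/O (KeyValuePattern is an illustrative name, not part of the original program):

```java
public class KeyValuePattern {
    static final String[] DATA = {
        "One, two, buckle my shoe",
        "Three, four, shut the door",
        "Five, six, pick up sticks",
        "Seven, eight, lay them straight",
        "Nine, ten, a big fat hen"
    };

    // Same logic as the writer loop, minus the Hadoop I/O: the key counts
    // down from 100 while the value cycles through the five DATA lines.
    static String line(int i) {
        return (100 - i) + "\t" + DATA[i % DATA.length];
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println(line(i));
        }
    }
}
```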
2. Running a Pseudo-Distributed Program with Hadoop (WordCount)
(1) Compile
javac -classpath /usr/local/Cellar/hadoop/2.8.0/libexec/share/hadoop/common/hadoop-common-2.8.0.jar:/usr/local/Cellar/hadoop/2.8.0/libexec/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.8.0.jar WordCount.java -d classes
This produces WordCount.class, WordCount$IntSumReducer.class, and WordCount$TokenizerMapper.class in the classes directory.
(2) Package
jar -cvf wordCount.jar classes (packages the classes into wordCount.jar)
(3) Run
hadoop jar wordCount.jar WordCount hdfs://localhost:9000/count.txt /output
The main program takes two arguments: an input path (the input file) and an output path (for the results).
The input file, count.txt, is already in the HDFS root directory; its contents:
hadoop mapreduce
hadoop yarn
hadoop hdfs
hadoop mapreduce
hadoop yarn
hadoop hdfs
zqh gkn
lzy zqh
(4) Code and output
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import java.io.IOException;
import java.util.StringTokenizer;

/**
 * Created by zhuqiuhui on 2017/7/14.
 */
public class WordCount extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    // Note: declared here as a non-static inner class -- this is the bug
    // diagnosed after the job log that follows.
    public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        IntWritable one = new IntWritable(1);
        Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Note: also a non-static inner class, with the same problem.
    public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new WordCount(), args);
        System.exit(exitCode);
    }
}
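The TokenizerMapper/IntSumReducer pipeline above can be checked without a cluster: this plain-Java sketch (the class name LocalWordCount is illustrative) applies the same tokenization and summing to the sample input, showing the counts the job should eventually produce.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    // Simulates TokenizerMapper + IntSumReducer in-process: tokenize on
    // whitespace (same as StringTokenizer in the mapper), then sum per word
    // (same as the reducer's loop over values).
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String input = "hadoop mapreduce\nhadoop yarn\nhadoop hdfs\n"
                     + "hadoop mapreduce\nhadoop yarn\nhadoop hdfs\n"
                     + "zqh gkn\nlzy zqh\n";
        count(input).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```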
17/07/14 14:33:44 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/07/14 14:33:45 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/07/14 14:33:45 WARN mapreduce.JobResourceUploader: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
17/07/14 14:33:45 INFO input.FileInputFormat: Total input files to process : 1
17/07/14 14:33:45 INFO mapreduce.JobSubmitter: number of splits:1
17/07/14 14:33:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1499913113300_0006
17/07/14 14:33:45 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
17/07/14 14:33:45 INFO impl.YarnClientImpl: Submitted application application_1499913113300_0006
17/07/14 14:33:45 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1499913113300_0006/
17/07/14 14:33:45 INFO mapreduce.Job: Running job: job_1499913113300_0006
17/07/14 14:33:51 INFO mapreduce.Job: Job job_1499913113300_0006 running in uber mode : false
17/07/14 14:33:51 INFO mapreduce.Job: map 0% reduce 0%
17/07/14 14:33:54 INFO mapreduce.Job: Task Id : attempt_1499913113300_0006_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class WordCount$TokenizerMapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2216)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.lang.ClassNotFoundException: Class WordCount$TokenizerMapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2122)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2214)
... 8 more
17/07/14 14:33:58 INFO mapreduce.Job: Task Id : attempt_1499913113300_0006_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class WordCount$TokenizerMapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2216)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.lang.ClassNotFoundException: Class WordCount$TokenizerMapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2122)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2214)
... 8 more
17/07/14 14:34:01 INFO mapreduce.Job: Task Id : attempt_1499913113300_0006_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class WordCount$TokenizerMapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2216)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.lang.ClassNotFoundException: Class WordCount$TokenizerMapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2122)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2214)
... 8 more
17/07/14 14:34:06 INFO mapreduce.Job: map 100% reduce 100%
17/07/14 14:34:06 INFO mapreduce.Job: Job job_1499913113300_0006 failed with state FAILED due to: Task failed task_1499913113300_0006_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
17/07/14 14:34:07 INFO mapreduce.Job: Counters: 13
Job Counters
Failed map tasks=4
Killed reduce tasks=1
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=7720
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=7720
Total time spent by all reduce tasks (ms)=0
Total vcore-milliseconds taken by all map tasks=7720
Total vcore-milliseconds taken by all reduce tasks=0
Total megabyte-milliseconds taken by all map tasks=7905280
Total megabyte-milliseconds taken by all reduce tasks=0
The ClassNotFoundException above occurred while running the MapReduce job because the map and reduce classes were not declared static. Hadoop instantiates the mapper and reducer classes through reflection, and a non-static inner class cannot be created that way: it requires an instance of the enclosing class, which the framework does not have. Adding the static modifier to both inner classes fixes the error.
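The root cause is visible with plain reflection (InnerClassDemo is an illustrative stand-in for WordCount): every constructor of a non-static inner class takes the enclosing instance as a hidden first parameter, so a reflective no-arg instantiation, which is what Hadoop attempts, has no constructor to call.

```java
public class InnerClassDemo {
    class NonStaticMapper {}       // like the buggy TokenizerMapper
    static class StaticMapper {}   // the fixed version

    public static void main(String[] args) {
        // The inner class's implicit constructor takes one hidden parameter
        // (the enclosing InnerClassDemo instance); the static nested class's
        // constructor takes none, so it can be instantiated reflectively.
        System.out.println(NonStaticMapper.class.getDeclaredConstructors()[0].getParameterCount()); // 1
        System.out.println(StaticMapper.class.getDeclaredConstructors()[0].getParameterCount());    // 0
    }
}
```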
3. Notes
When running a job on a Hadoop cluster, the program must be packaged as a jar file.
In local and pseudo-distributed modes you can run either a jar file or a class file directly; running a class file directly only works for programs with no mapper or reducer, i.e. programs that obtain a FileSystem and operate on it directly.