MapReduce can be divided into two phases: map and reduce. The map function turns each input record into intermediate key/value pairs; the framework then groups those pairs by key, so the reduce function receives every value that shares the same key and can aggregate them.
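This flow can be sketched with plain Java collections, with no Hadoop dependency. The logic below mirrors the WordCount job used later in this post, but the class and method names are illustrative only:

```java
import java.util.*;
import java.util.stream.*;

public class MapReduceSketch {
    // map + shuffle + reduce over in-memory lines
    static Map<String, Integer> wordCount(List<String> lines) {
        // map: each line becomes (word, 1) pairs
        List<Map.Entry<String, Integer>> mapped = lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .map(word -> Map.entry(word, 1))
                .collect(Collectors.toList());

        // shuffle: group pairs by key, as the framework does between map and reduce
        Map<String, List<Integer>> grouped = mapped.stream()
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

        // reduce: sum the values that share a key
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((word, ones) ->
                counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("hello world", "hello hadoop")));
        // prints {hadoop=1, hello=2, world=1}
    }
}
```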
Whether the job runs locally or on a remote cluster, a local Hadoop environment must be configured first.
1. Extract hadoop-3.2.1.tar.gz to a local directory
2. Place winutils.exe and hadoop.dll in the hadoop-3.2.1\bin directory
3. Configure environment variables: set HADOOP_HOME to the extracted directory and add %HADOOP_HOME%\bin to Path
4. For Hadoop 3.2.1 and later, configure yarn-site.xml:
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>node01</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address.rm2</name>
  <value>node02</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>node02</value>
</property>
<property>
  <name>yarn.application.classpath</name>
  <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
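Step 3 above (environment variables) can be done from an elevated Windows command prompt; the install path below is an assumption, and the values can equally be set through the System Properties GUI:

```
rem set HADOOP_HOME to the directory the tarball was extracted to (path is an example)
setx HADOOP_HOME "D:\hadoop-3.2.1"
rem append the bin directory to Path; setx stores the value expanded at this moment
setx Path "%Path%;D:\hadoop-3.2.1\bin"
```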
From here on, the code is the same as the usual examples found online.
Local run

The main function for a local run:
// 1. Load the configuration and create the job
Configuration configuration = new Configuration();
configuration.set("fs.defaultFS", "file:///");   // run against the local file system
System.setProperty("HADOOP_USER_NAME", "root");
Job job = Job.getInstance(configuration);
// 2. Set the jar, resolved from the driver class
job.setJarByClass(WordCountDriver.class);
// 3. Set the mapper and reducer classes and their output key/value types
job.setMapperClass(WordCountMapper.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setReducerClass(WordCountReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// 4. Set the input and output paths (the output directory must not exist yet)
FileInputFormat.setInputPaths(job, new Path("D:\\hadoop_demo\\input\\word.txt"));
FileOutputFormat.setOutputPath(job, new Path("D:\\hadoop_demo\\input\\word"));
// 5. Submit the job and wait for it to finish
boolean result = job.waitForCompletion(true);
After a successful run, the results are in the part-r-00000 file inside the output directory.
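With the default TextOutputFormat, each line of part-r-00000 is the key, a tab, and the value (here the Text word and the IntWritable count rendered as text). A minimal sketch for reading such a file back, with the parsing separated out so it works on any list of lines:

```java
import java.util.*;

public class PartFileReader {
    // parse "word<TAB>count" lines as written by TextOutputFormat
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] kv = line.split("\t");
            counts.put(kv[0], Integer.parseInt(kv[1]));
        }
        return counts;
    }

    public static void main(String[] args) {
        // in a real run the lines would come from, e.g.,
        // Files.readAllLines(Path.of("D:\\hadoop_demo\\input\\word\\part-r-00000"))
        Map<String, Integer> counts = parse(List.of("hadoop\t1", "hello\t2", "world\t1"));
        System.out.println(counts); // prints {hadoop=1, hello=2, world=1}
    }
}
```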
Remote run
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://192.168.100.5:8020");
// yarn.resourcemanager.hostname expects a host name only, without a port
conf.set("yarn.resourcemanager.hostname", "192.168.100.5");
// submit to YARN rather than running locally
conf.set("mapreduce.framework.name", "yarn");
// in Hadoop 2+ the property is dfs.permissions.enabled; note that permission
// checks are enforced by the NameNode, so a client-side setting alone may not take effect
conf.set("dfs.permissions.enabled", "false");
Job job = Job.getInstance(conf);
job.setJobName("job01..");
job.setJarByClass(JobRunner.class);
job.setMapperClass(FriendsMapper.class);
job.setReducerClass(FriendsReduce.class);
job.setMapOutputKeyClass(FoF.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path("/hdfsinput/input/data.txt"));
// delete the output directory if it already exists, otherwise the job fails
FileSystem fs = FileSystem.get(conf);
Path out = new Path("/hdfsinput/output/01/");
if (fs.exists(out)) {
    fs.delete(out, true);
}
FileOutputFormat.setOutputPath(job, out);
return job.waitForCompletion(true);
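The exists/delete guard above is there because FileOutputFormat refuses to start a job whose output directory already exists. The same idiom against a local directory, in plain Java with no Hadoop dependency (the paths are illustrative):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.Comparator;
import java.util.stream.Stream;

public class OutputGuard {
    // recursively delete a directory if it exists, mirroring fs.delete(out, true)
    static void deleteIfExists(Path out) throws IOException {
        if (Files.exists(out)) {
            try (Stream<Path> walk = Files.walk(out)) {
                // delete children before parents
                walk.sorted(Comparator.reverseOrder())
                    .forEach(p -> p.toFile().delete());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job-output");
        Files.createFile(out.resolve("part-r-00000"));
        deleteIfExists(out);
        System.out.println(Files.exists(out)); // prints false
    }
}
```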