Environment
Windows 10
Hadoop 2.9.0

Prerequisites
Eclipse is installed.
The Hadoop 2.9.0 package is unpacked on the Windows machine.
The plugin JAR is available: hadoop-eclipse-plugin-2.9.0.jar
(If you cannot download it, or it does not work, follow the linked article to compile the JAR yourself.)
The 2.9.0 versions of hadoop.dll and winutils.exe are available.
Put winutils.exe into the bin directory under the Hadoop root on Windows.
Put hadoop.dll into C:\Windows\System32.
Download: the Hadoop 2.9.0 plugin bundle.
Note:
The plugin JAR in that bundle is hadoop-eclipse-plugin-2.5.1.jar.
Its hadoop.dll and winutils.exe are for Hadoop 2.9.0, in both 64-bit and 32-bit variants.
I have personally verified that this hadoop.dll and winutils.exe work.
The 2.5.1 plugin JAR also works with Hadoop 2.9.0; whether it works depends mainly on your Eclipse version.
My Eclipse version is: Mars.2 Release (4.5.2)
I. Eclipse plugin setup
1. Copy hadoop-eclipse-plugin-2.9.0.jar into the plugins directory under the Eclipse installation, then restart Eclipse.
2. Window -----> Preferences -----> Hadoop Map/Reduce
Select the path where Hadoop is unpacked in the local (Windows) environment.
3. Open the Map/Reduce view.
4. Configure a Hadoop location.
Create a new location and configure it as follows:
Location name: anything you like
User name: the user name on the remote server
Host: the IP of the NameNode in the remote Hadoop cluster
Port: the port configured for fs.defaultFS in the NameNode's core-site.xml
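The host/port pair comes from the fs.defaultFS value in core-site.xml on the cluster. As a sketch (node1 and 9000 are the values used later in this article; substitute your own NameNode address):

```xml
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:9000</value>
</property>
```

With this value, Host is node1 (or its IP) and Port is 9000.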
II. Working with HDFS files from Eclipse
Here you can see the files in HDFS on the remote server.
Right-click DFS Locations to refresh.
Double-click a file to view its contents.
Right-click to download, upload, create directories, or delete content in HDFS.
If an upload fails with a message that user XXX has no permission,
it is because Eclipse accesses the remote DFS as the user currently logged in to Windows.
Workaround:
On a test machine you can simply edit hdfs-site.xml and add:
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
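Alternatively, instead of disabling permission checks cluster-wide, you can make the client identify itself as a permitted HDFS user. A minimal sketch, assuming the remote user is named "hadoop" (a placeholder, substitute your own user name); the Hadoop client reads HADOOP_USER_NAME from the environment or from a system property and uses it instead of the Windows login user:

```java
public class SetHadoopUser {
    public static void main(String[] args) {
        // Must run before any Hadoop FileSystem or Job object is created,
        // because the client caches its login identity on first use.
        // "hadoop" is a placeholder user name.
        System.setProperty("HADOOP_USER_NAME", "hadoop");

        System.out.println(System.getProperty("HADOOP_USER_NAME")); // prints hadoop
    }
}
```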
III. Creating the MapReduce project
Right-click the newly created project name and choose New --> Class.
Open the WordCount class and add the following code:
package hadoop.test;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/**
 * @author liyijie
 * @date May 10, 2018, 11:09:02 PM
 * @email 37024760@qq.com
 * @remark
 * @version
 */
public class WordCount {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJobName("word count");
        job.setJarByClass(WordCount.class);
        /*
         * This is a remote-submission setup, so the job jar has to be
         * shipped to the cluster. Build the jar into the given file first,
         * then point to it with the line below; it is then submitted to
         * the cluster along with the job. Otherwise the job fails with an
         * error that WordCount.class cannot be found.
         */
        job.setJar("F:\\eclipseworkspace\\wordcount\\wordcount.jar");

        // Configure the map and reduce classes for the job
        job.setMapperClass(WordCountMap.class);
        job.setReducerClass(WordCountReduce.class);

        // Output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // File formats
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Input and output paths
        // hdfs://node1:9000/data/input/wordcount  hdfs://node1:9000/data/output/wordcount
        FileInputFormat.addInputPath(job, new Path("hdfs://node1:9000/data/input/wordcount"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://node1:9000/data/output/wordcount"));

        // Launch the job
        job.waitForCompletion(true);
    }

    public static class WordCountMap extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static Text outKey = new Text();
        private static IntWritable outValue = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the line on whitespace and emit (word, 1) for each token
            String words = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(words);
            while (tokenizer.hasMoreTokens()) {
                String word = tokenizer.nextToken();
                outKey.set(word);
                context.write(outKey, outValue);
            }
        }
    }

    public static class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        private static IntWritable outValue = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the counts for each word
            int sum = 0;
            for (IntWritable i : values) {
                sum += i.get();
            }
            outValue.set(sum);
            context.write(key, outValue);
        }
    }
}
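To check the tokenize-and-sum logic of the mapper and reducer above without a cluster, the same behaviour can be sketched in plain Java (no Hadoop dependencies; a HashMap stands in for the shuffle/group-by-key step):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class LocalWordCount {
    // Mirrors WordCountMap + WordCountReduce: split each line on
    // whitespace, then sum the per-word counts.
    static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(new String[] {"hello world", "hello hadoop"});
        System.out.println(counts.get("hello")); // prints 2
        System.out.println(counts.get("world")); // prints 1
    }
}
```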
IV. Running the job
1. Copy the Hadoop configuration files you modified on the remote machine (under etc/hadoop in the Hadoop root) into the project's src folder in Eclipse.
(Here only core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml were modified.)
You also need a log4j.properties file, likewise found under etc/hadoop in the Hadoop root.
2. Right-click the project name and choose Export,
then package the project as a jar, using the same path as written in the code:
job.setJar("F:\\eclipseworkspace\\wordcount\\wordcount.jar");
3. Right-click WordCount.java and choose Run As ----> Run Configurations,
then create a new Java Application and enter the input/output arguments.
Alternatively, set the input and output paths directly in the code:
// Input and output paths
// hdfs://node1:9000/data/input/wordcount  hdfs://node1:9000/data/output/wordcount
FileInputFormat.addInputPath(job, new Path("hdfs://node1:9000/data/input/wordcount"));
FileOutputFormat.setOutputPath(job, new Path("hdfs://node1:9000/data/output/wordcount"));
Make sure the input path in HDFS contains the files whose words you want to count,
and that no wordcount directory exists yet under the output path.
4. Right-click WordCount.java and choose Run As ----> Run on Hadoop.
The job runs successfully.
View the results.
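The run fails if the output directory already exists, because FileOutputFormat refuses to overwrite it. A sketch of removing it from the driver before submission, assuming the Hadoop client libraries are on the classpath and the hdfs://node1:9000 address used in the code above:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CleanOutputDir {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path output = new Path("hdfs://node1:9000/data/output/wordcount");
        // Connect to the NameNode and recursively delete the old output
        // directory if a previous run left it behind.
        FileSystem fs = FileSystem.get(URI.create("hdfs://node1:9000"), conf);
        if (fs.exists(output)) {
            fs.delete(output, true);
        }
    }
}
```

This needs a reachable cluster to actually run; on a shared cluster, prefer deleting only paths your job owns.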
Troubleshooting
1. Error:
/bin/bash: line 0: fg: no job control
Add the following to mapred-site.xml in the server's Hadoop configuration:
<property>
    <name>mapreduce.app-submission.cross-platform</name>
    <value>true</value>
</property>
<property>
    <name>mapred.remote.os</name>
    <value>Linux</value>
</property>
<property>
    <name>mapreduce.application.classpath</name>
    <value>
        /app/hadoop/hadoop-2.9.0/etc/hadoop,
        /app/hadoop/hadoop-2.9.0/share/hadoop/common/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/common/lib/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/hdfs/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/hdfs/lib/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/mapreduce/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/mapreduce/lib/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/yarn/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/yarn/lib/*
    </value>
</property>
Add the following to yarn-site.xml in the server's Hadoop configuration:
<property>
    <name>yarn.application.classpath</name>
    <value>
        /app/hadoop/hadoop-2.9.0/etc/hadoop,
        /app/hadoop/hadoop-2.9.0/share/hadoop/common/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/common/lib/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/hdfs/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/hdfs/lib/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/mapreduce/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/mapreduce/lib/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/yarn/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/yarn/lib/*
    </value>
</property>
Note: use absolute paths.
Remember to also update the corresponding files in the project's src folder in Eclipse.
2. Error:
18/05/11 20:29:17 INFO mapreduce.Job: Task Id : attempt_1526041648588_0001_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class hadoop.test.WordCount$TokenizerMapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2395)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:751)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
Caused by: java.lang.ClassNotFoundException: Class hadoop.test.WordCount$TokenizerMapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2299)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2393)
... 8 more
The cause and fix are as described in the comment in the driver code above:
/*
 * This is a remote-submission setup, so the job jar has to be
 * shipped to the cluster. Build the jar into the given file first,
 * then point to it with the line below; it is then submitted to
 * the cluster along with the job. Otherwise the job fails with an
 * error that WordCount.class cannot be found.
 */
job.setJar("F:\\eclipseworkspace\\wordcount\\wordcount.jar");