Environment
Windows 10
Hadoop 2.9.0

Prerequisites
Eclipse is installed.
The Hadoop 2.9.0 package is unpacked on the Windows machine.
The plugin JAR is available: hadoop-eclipse-plugin-2.9.0.jar
(If you cannot download it, or it does not work, follow the linked article to compile the JAR yourself.)
The 2.9.0 versions of hadoop.dll and winutils.exe are available.
Put winutils.exe into the bin directory under the Hadoop root on Windows.
Put hadoop.dll into C:\Windows\System32.
Download: the Hadoop 2.9.0 plugin bundle.
Note:
The plugin JAR in that bundle is hadoop-eclipse-plugin-2.5.1.jar.
Its hadoop.dll and winutils.exe are for Hadoop 2.9.0, in both 64-bit and 32-bit variants.
I have personally verified that this hadoop.dll and winutils.exe work.
The 2.5.1 plugin JAR also works with Hadoop 2.9.0; whether it works depends mainly on your Eclipse version.
My Eclipse version is: Mars.2 Release (4.5.2)
I. Eclipse plugin setup
1. Copy hadoop-eclipse-plugin-2.9.0.jar into the plugins directory under the Eclipse installation, then restart Eclipse.
2. Window -----> Preferences -----> Hadoop Map/Reduce
Select the path where Hadoop is unpacked in the local (Windows) environment.
3. Open the Map/Reduce view.
4. Configure a Hadoop location.
Create a new location and configure it as follows:
Location name: anything you like
User name: the user name on the remote server
Host: the IP of the NameNode in the remote Hadoop cluster
Port: the port configured for fs.defaultFS in the NameNode's core-site.xml
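The host/port pair comes from the fs.defaultFS value in core-site.xml on the cluster. As a sketch (node1 and 9000 are the values used later in this article; substitute your own NameNode address):

```xml
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:9000</value>
</property>
```

With this value, Host is node1 (or its IP) and Port is 9000.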
II. Working with HDFS files from Eclipse
Here you can see the files in HDFS on the remote server.
Right-click DFS Locations to refresh.
Double-click a file to view its contents.
Right-click to download, upload, create directories, or delete content in HDFS.
If an upload fails with a message that user XXX has no permission,
it is because Eclipse accesses the remote DFS as the user currently logged in to Windows.
Workaround:
On a test machine you can simply edit hdfs-site.xml and add:
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
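Alternatively, instead of disabling permission checks cluster-wide, you can make the client identify itself as a permitted HDFS user. A minimal sketch, assuming the remote user is named "hadoop" (a placeholder, substitute your own user name); the Hadoop client reads HADOOP_USER_NAME from the environment or from a system property and uses it instead of the Windows login user:

```java
public class SetHadoopUser {
    public static void main(String[] args) {
        // Must run before any Hadoop FileSystem or Job object is created,
        // because the client caches its login identity on first use.
        // "hadoop" is a placeholder user name.
        System.setProperty("HADOOP_USER_NAME", "hadoop");

        System.out.println(System.getProperty("HADOOP_USER_NAME")); // prints hadoop
    }
}
```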
III. Creating the MapReduce project
Right-click the newly created project name and choose New --> Class.
Open the WordCount class and add the following code:
package hadoop.test;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/**
 * @author liyijie
 * @date May 10, 2018, 11:09:02 PM
 * @email 37024760@qq.com
 * @remark
 * @version
 */
public class WordCount {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJobName("word count");
        job.setJarByClass(WordCount.class);
        /*
         * This is a remote-submission setup, so the job jar has to be
         * shipped to the cluster. Build the jar into the given file first,
         * then point to it with the line below; it is then submitted to
         * the cluster along with the job. Otherwise the job fails with an
         * error that WordCount.class cannot be found.
         */
        job.setJar("F:\\eclipseworkspace\\wordcount\\wordcount.jar");

        // Configure the map and reduce classes for the job
        job.setMapperClass(WordCountMap.class);
        job.setReducerClass(WordCountReduce.class);

        // Output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // File formats
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Input and output paths
        // hdfs://node1:9000/data/input/wordcount  hdfs://node1:9000/data/output/wordcount
        FileInputFormat.addInputPath(job, new Path("hdfs://node1:9000/data/input/wordcount"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://node1:9000/data/output/wordcount"));

        // Launch the job
        job.waitForCompletion(true);
    }

    public static class WordCountMap extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static Text outKey = new Text();
        private static IntWritable outValue = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the line on whitespace and emit (word, 1) for each token
            String words = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(words);
            while (tokenizer.hasMoreTokens()) {
                String word = tokenizer.nextToken();
                outKey.set(word);
                context.write(outKey, outValue);
            }
        }
    }

    public static class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        private static IntWritable outValue = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the counts for each word
            int sum = 0;
            for (IntWritable i : values) {
                sum += i.get();
            }
            outValue.set(sum);
            context.write(key, outValue);
        }
    }
}
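To check the tokenize-and-sum logic of the mapper and reducer above without a cluster, the same behaviour can be sketched in plain Java (no Hadoop dependencies; a HashMap stands in for the shuffle/group-by-key step):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class LocalWordCount {
    // Mirrors WordCountMap + WordCountReduce: split each line on
    // whitespace, then sum the per-word counts.
    static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(new String[] {"hello world", "hello hadoop"});
        System.out.println(counts.get("hello")); // prints 2
        System.out.println(counts.get("world")); // prints 1
    }
}
```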
IV. Running the job
1. Copy the Hadoop configuration files you modified on the remote machine (under etc/hadoop in the Hadoop root) into the project's src folder in Eclipse.
(Here only core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml were modified.)
You also need a log4j.properties file, likewise found under etc/hadoop in the Hadoop root.
2. Right-click the project name and choose Export,
then package the project as a jar, using the same path as written in the code:
job.setJar("F:\\eclipseworkspace\\wordcount\\wordcount.jar");
3. Right-click WordCount.java and choose Run As ----> Run Configurations,
then create a new Java Application and enter the input/output arguments.
Alternatively, set the input and output paths directly in the code:
// Input and output paths
// hdfs://node1:9000/data/input/wordcount  hdfs://node1:9000/data/output/wordcount
FileInputFormat.addInputPath(job, new Path("hdfs://node1:9000/data/input/wordcount"));
FileOutputFormat.setOutputPath(job, new Path("hdfs://node1:9000/data/output/wordcount"));
Make sure the input path in HDFS contains the files whose words you want to count,
and that no wordcount directory exists yet under the output path.
4. Right-click WordCount.java and choose Run As ----> Run on Hadoop.
The job runs successfully.
View the results.
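The run fails if the output directory already exists, because FileOutputFormat refuses to overwrite it. A sketch of removing it from the driver before submission, assuming the Hadoop client libraries are on the classpath and the hdfs://node1:9000 address used in the code above:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CleanOutputDir {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path output = new Path("hdfs://node1:9000/data/output/wordcount");
        // Connect to the NameNode and recursively delete the old output
        // directory if a previous run left it behind.
        FileSystem fs = FileSystem.get(URI.create("hdfs://node1:9000"), conf);
        if (fs.exists(output)) {
            fs.delete(output, true);
        }
    }
}
```

This needs a reachable cluster to actually run; on a shared cluster, prefer deleting only paths your job owns.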
Troubleshooting
1. Error:
/bin/bash: line 0: fg: no job control
Add the following to mapred-site.xml in the server's Hadoop configuration:
<property>
    <name>mapreduce.app-submission.cross-platform</name>
    <value>true</value>
</property>
<property>
    <name>mapred.remote.os</name>
    <value>Linux</value>
</property>
<property>
    <name>mapreduce.application.classpath</name>
    <value>
        /app/hadoop/hadoop-2.9.0/etc/hadoop,
        /app/hadoop/hadoop-2.9.0/share/hadoop/common/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/common/lib/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/hdfs/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/hdfs/lib/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/mapreduce/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/mapreduce/lib/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/yarn/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/yarn/lib/*
    </value>
</property>
Add the following to yarn-site.xml in the server's Hadoop configuration:
<property>
    <name>yarn.application.classpath</name>
    <value>
        /app/hadoop/hadoop-2.9.0/etc/hadoop,
        /app/hadoop/hadoop-2.9.0/share/hadoop/common/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/common/lib/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/hdfs/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/hdfs/lib/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/mapreduce/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/mapreduce/lib/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/yarn/*,
        /app/hadoop/hadoop-2.9.0/share/hadoop/yarn/lib/*
    </value>
</property>
Note: use absolute paths.
Remember to also update the corresponding files in the project's src folder in Eclipse.
2. Error:
18/05/11 20:29:17 INFO mapreduce.Job: Task Id : attempt_1526041648588_0001_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class hadoop.test.WordCount$TokenizerMapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2395)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:751)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
Caused by: java.lang.ClassNotFoundException: Class hadoop.test.WordCount$TokenizerMapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2299)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2393)
... 8 more
The cause and fix are as described in the comment in the driver code above:
/*
 * This is a remote-submission setup, so the job jar has to be
 * shipped to the cluster. Build the jar into the given file first,
 * then point to it with the line below; it is then submitted to
 * the cluster along with the job. Otherwise the job fails with an
 * error that WordCount.class cannot be found.
 */
job.setJar("F:\\eclipseworkspace\\wordcount\\wordcount.jar");