Remote Debugging Hadoop with IntelliJ IDEA

Development environment: IntelliJ IDEA 2017.1.3

JDK version: JDK 1.8

Hadoop version: hadoop-1.0.0

VMs (fully distributed cluster):

node1  172.16.20.101  master

node2  172.16.20.102  slave1

node3  172.16.20.103  slave2


Books covering Hadoop 2.x are still scarce in China, so I started with Hadoop 1.x. Recommended reading: "Hadoop in Action" and "Hadoop: The Definitive Guide".


Plenty of DFS plugins exist for Eclipse, which makes development there fairly easy, but IDEA has far fewer. This article focuses on how to remotely debug Hadoop from IDEA.


1. Building a Hadoop Development Environment with Maven

<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.0.0</version>
</dependency>
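
For reference, here is a minimal pom.xml sketch the dependency would sit in. The coordinates are placeholders inferred from the jar name used later in this post; match them to your own project:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <!-- Placeholder coordinates; adjust to your own project -->
    <groupId>com.hadoop</groupId>
    <artifactId>hadoop-basic</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>1.0.0</version>
        </dependency>
    </dependencies>
</project>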

2. Adding the Configuration Files

Copy them directly from $HADOOP_HOME/conf on the master:



  core-site.xml 

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/tmp</value>
        <description>A base for other temporary directories. </description>
    </property>
    <!-- file system properties  -->
    <property>
        <name>fs.default.name</name>
        <value>hdfs://172.16.20.101:9000</value>
    </property>
</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>172.16.20.101:9001</value>
    </property>
</configuration>

When the following line runs, Hadoop automatically loads the configuration files it finds on the classpath (with Maven, put the copied XML files under src/main/resources):

Configuration conf = new Configuration();
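
As a quick sanity check (my own sketch, not part of the original example), print those properties from a main method; if the copied XML files are on the classpath, you should see the cluster addresses rather than the local defaults:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ConfCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Expect hdfs://172.16.20.101:9000; file:/// means the XML was not found
        System.out.println("fs.default.name = " + conf.get("fs.default.name"));
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
        // Opening the FileSystem also verifies the NameNode is reachable
        FileSystem fs = FileSystem.get(conf);
        System.out.println("connected to " + fs.getUri());
        fs.close();
    }
}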

3. Running the WordCount Example

The code below was copied from the Hadoop examples jar and then adapted:


package com.hadoop.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

import java.io.IOException;
import java.util.StringTokenizer;

/**
 * Created by nanzhou on 2017/9/13.
 */
public class WordCount {

    /** Splits each input line into tokens and emits (word, 1) for every token. */
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                this.word.set(itr.nextToken());
                context.write(this.word, one);
            }
        }
    }

    /** Sums the counts for each word; also used as the combiner. */
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            this.result.set(sum);
            context.write(key, this.result);
        }
    }

    public static void main(String[] args)
            throws Exception {
        Configuration conf = new Configuration();

        // HDFS input/output paths are hard-coded here instead of read from args
        String[] ioArgs = new String[]{"/user/hadoop/input", "/user/hadoop/output"};
        String[] otherArgs = new GenericOptionsParser(conf, ioArgs).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        // Point at the jar produced by "mvn install" so the remote TaskTrackers
        // can load the Mapper/Reducer classes (the Eclipse "Run on Hadoop" effect);
        // adjust this absolute path to your own build output.
        JobConf jobConf = new JobConf();
        jobConf.setJar("/Applications/file/work/JavaProject/hadoopbasic/target/hadoop-basic-1.0-SNAPSHOT.jar");

        Job job = new Job(jobConf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
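
The job expects input under /user/hadoop/input. If you have not loaded any data yet, a small sketch like the following (my own addition; the local path is a placeholder) uploads a text file and, after the job completes, prints the reducer output:

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsIo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Upload a local file as job input (placeholder path)
        fs.copyFromLocalFile(new Path("/tmp/words.txt"),
                new Path("/user/hadoop/input/words.txt"));
        // After the job finishes: the new-API reducer writes part-r-00000
        BufferedReader in = new BufferedReader(new InputStreamReader(
                fs.open(new Path("/user/hadoop/output/part-r-00000"))));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
        in.close();
        fs.close();
    }
}

Note that the output directory must not exist before the job runs, or Hadoop will refuse to start it.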

NOTICE: 


(1) Running Hadoop jobs from a local machine often fails with permission errors. There are two fixes:

         <1> Change your local username to match the one used on the Hadoop cluster, e.g. hadoop.

         <2> Disable permission checking in hdfs-site.xml:

<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>

 (2) When running a MapReduce job you may see "Map/Reduce class not found" errors.

   The fix is the JobConf setup in the code above: point it at the local jar, which reproduces what the Eclipse plugin's "Run on Hadoop" does.

   Re-run "mvn install" on the project before each run so the jar under target/ contains your latest classes.


 Source code for this article: https://github.com/stupidcupid/hadoop-1.x

  





