Remote Debugging Hadoop with IntelliJ IDEA

Development environment: IntelliJ IDEA 2017.1.3

JDK version: JDK 1.8

Hadoop version: hadoop-1.0.0

VMs (fully distributed cluster):

node1  172.16.20.101  master

node2  172.16.20.102  slave1

node3  172.16.20.103  slave2


Books covering Hadoop 2.x are still scarce in China, so I started with Hadoop 1.x. Recommended reading: "Hadoop in Action" and "Hadoop: The Definitive Guide".


Plenty of DFS plugins exist for Eclipse, which makes development there fairly easy, but IDEA has far fewer. This article focuses on how to remotely debug Hadoop from IDEA.


1. Building a Hadoop Development Environment with Maven

<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.0.0</version>
</dependency>
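
For reference, here is a minimal pom.xml sketch the dependency would sit in. The coordinates are placeholders inferred from the jar name used later in this post; match them to your own project:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <!-- Placeholder coordinates; adjust to your own project -->
    <groupId>com.hadoop</groupId>
    <artifactId>hadoop-basic</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>1.0.0</version>
        </dependency>
    </dependencies>
</project>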

2. Adding the Configuration Files

Copy them directly from $HADOOP_HOME/conf on the master:



  core-site.xml 

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/tmp</value>
        <description>A base for other temporary directories. </description>
    </property>
    <!-- file system properties  -->
    <property>
        <name>fs.default.name</name>
        <value>hdfs://172.16.20.101:9000</value>
    </property>
</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>172.16.20.101:9001</value>
    </property>
</configuration>

When the following line runs, Hadoop automatically loads the configuration files it finds on the classpath (with Maven, put the copied XML files under src/main/resources):

Configuration conf = new Configuration();
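
As a quick sanity check (my own sketch, not part of the original example), print those properties from a main method; if the copied XML files are on the classpath, you should see the cluster addresses rather than the local defaults:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ConfCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Expect hdfs://172.16.20.101:9000; file:/// means the XML was not found
        System.out.println("fs.default.name = " + conf.get("fs.default.name"));
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
        // Opening the FileSystem also verifies the NameNode is reachable
        FileSystem fs = FileSystem.get(conf);
        System.out.println("connected to " + fs.getUri());
        fs.close();
    }
}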

3. Running the WordCount Example

The code below was copied from the Hadoop examples jar and then adapted:


package com.hadoop.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

import java.io.IOException;
import java.util.StringTokenizer;

/**
 * Created by nanzhou on 2017/9/13.
 */
public class WordCount {

    /** Splits each input line into tokens and emits (word, 1) for every token. */
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                this.word.set(itr.nextToken());
                context.write(this.word, one);
            }
        }
    }

    /** Sums the counts for each word; also used as the combiner. */
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            this.result.set(sum);
            context.write(key, this.result);
        }
    }

    public static void main(String[] args)
            throws Exception {
        Configuration conf = new Configuration();

        // HDFS input/output paths are hard-coded here instead of read from args
        String[] ioArgs = new String[]{"/user/hadoop/input", "/user/hadoop/output"};
        String[] otherArgs = new GenericOptionsParser(conf, ioArgs).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        // Point at the jar produced by "mvn install" so the remote TaskTrackers
        // can load the Mapper/Reducer classes (the Eclipse "Run on Hadoop" effect);
        // adjust this absolute path to your own build output.
        JobConf jobConf = new JobConf();
        jobConf.setJar("/Applications/file/work/JavaProject/hadoopbasic/target/hadoop-basic-1.0-SNAPSHOT.jar");

        Job job = new Job(jobConf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
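
The job expects input under /user/hadoop/input. If you have not loaded any data yet, a small sketch like the following (my own addition; the local path is a placeholder) uploads a text file and, after the job completes, prints the reducer output:

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsIo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Upload a local file as job input (placeholder path)
        fs.copyFromLocalFile(new Path("/tmp/words.txt"),
                new Path("/user/hadoop/input/words.txt"));
        // After the job finishes: the new-API reducer writes part-r-00000
        BufferedReader in = new BufferedReader(new InputStreamReader(
                fs.open(new Path("/user/hadoop/output/part-r-00000"))));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
        in.close();
        fs.close();
    }
}

Note that the output directory must not exist before the job runs, or Hadoop will refuse to start it.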

NOTICE: 


(1) Running Hadoop jobs from a local machine often fails with permission errors. There are two fixes:

         <1> Change your local username to match the one used on the Hadoop cluster, e.g. hadoop.

         <2> Disable permission checking in hdfs-site.xml:

<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>

 (2) When running a MapReduce job you may see "Map/Reduce class not found" errors.

   The fix is the JobConf setup in the code above: point it at the local jar, which reproduces what the Eclipse plugin's "Run on Hadoop" does.

   Re-run "mvn install" on the project before each run so the jar under target/ contains your latest classes.


 Source code for this article: https://github.com/stupidcupid/hadoop-1.x

  





