My First Hadoop Program (Java)

Konago

Article link: https://blog.csdn.net/K_ona/article/details/96146115

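Preparation

Install the following:

1. Hadoop 2.7.7

2. JDK 1.8.0

3. IntelliJ IDEA 2019.1

The previous post covered installing the first two.

Here is a quick note on installing and running IDEA.

http://www.jetbrains.com/idea/download/#section=windows

Download the archive, extract it, and run idea.sh from the bin directory. (idea.sh is the Linux launcher, so pick the Linux package on that page.)

If that gets tedious, create a shortcut.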

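Creating the project

Point the SDK at the JDK installation path.

Choose your own project name and package name; just avoid clashing with existing ones.

Create two classes, TokenMapper and TokenReducer.

Import the required dependency JARs:

File->Project Structure->Modules->Dependencies

The JARs are under share/hadoop in the Hadoop installation directory.

For example, my path is /home/kona/app/hadoop2.7.7/share/hadoop/

Import the JARs under common, mapreduce, and yarn.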

Java code

TokenMapper:

package com.kona;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Mapper;


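// Mapper: for each input line, emits (word, 1) for every whitespace-separated token.
public class TokenMapper
        extends Mapper<Object, Text, Text, IntWritable> {
    // Reused across calls so no new object is allocated per token.
    private final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // StringTokenizer splits on whitespace only, so punctuation stays attached to words.
        StringTokenizer st = new StringTokenizer(value.toString());
        while (st.hasMoreTokens()) {
            word.set(st.nextToken());
            context.write(word, one);
        }
    }
}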
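To see what the mapper emits without starting Hadoop, here is a minimal plain-Java sketch of the same tokenizing step. The class name MapSketch and the use of Map.Entry in place of Hadoop's Text/IntWritable are my own assumptions for illustration; it is not part of the project.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical stand-in for TokenMapper.map, using plain Java types.
class MapSketch {
    // Splits a line on whitespace and returns one (token, 1) pair per token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<String, Integer>(st.nextToken(), 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // "cock.He" comes out as a single token because there is no space in it.
        for (Map.Entry<String, Integer> p : map("The fox saw the cock.He thought")) {
            System.out.println(p.getKey() + "\t" + p.getValue());
        }
    }
}
```

Note that "The" and "the" are counted as different words, and punctuation is not stripped, so the counts from this kind of tokenizing are only rough.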

TokenReducer:

package com.kona;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Reducer: receives one word with all of its 1s and emits (word, total).
public class TokenReducer extends
        Reducer<Text, IntWritable, Text, IntWritable> {
    // Reused output value object.
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
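Between map and reduce, Hadoop groups the emitted values by key and hands each key to the reducer once. The following plain-Java sketch imitates that grouping with a TreeMap and then sums each group the way TokenReducer does. ReduceSketch and shuffleAndReduce are made-up names, and the TreeMap only approximates Hadoop's key ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the shuffle and reduce steps.
class ReduceSketch {
    // Same logic as TokenReducer.reduce: add up all values for one key.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Groups a 1 under each emitted word (sorted by key), then reduces each group.
    static Map<String, Integer> shuffleAndReduce(List<String> emittedWords) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String w : emittedWords) {
            grouped.computeIfAbsent(w, k -> new ArrayList<>()).add(1);
        }
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            out.put(e.getKey(), reduce(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints {fox=1, the=2}
        System.out.println(shuffleAndReduce(Arrays.asList("the", "fox", "the")));
    }
}
```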

Rename the Main class to WordCount:

right-click Main and choose Refactor->Rename.

The final WordCount looks like this:

package com.kona;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

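        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        // Job.getInstance replaces the deprecated Job(Configuration, String) constructor.
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(TokenReducer.class);
        // Types of the final (reducer) output; the map output types default to these.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);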

        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true)?0:1);
    }
}

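Trying a direct run

First create a directory for the input file, for example /home/kona/input

Then create a file in it and write some words and sentences, such as: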

Weather Predict A film was on location deep in the. One day an old Indian went up to the director and, "Tomorrow rain." The next day it rained.A later, the Indian went up to the director and said, "Tomorrow." The next day there was a hailstorm. "This Indian is incredible," said the director. He told his secretary to hire the Indian to predict the weather. However, after several successful predictions, the old Indian didn't show up for two weeks. Finally the director sent for him. "I have to shoot a big scene tomorrow," said the director, "and I'm depending on you. What will the weather be like?"


One morning a fox saw a cock.He thought,"This is my breakfast.'' He came up to the cock and said,"I know you can sing very well.Can you sing for me?''The cock was glad.He closes his eyes and began to sing.The fox saw that and caught him in his mouth and carried him away. The people in the field saw the fox.They cried,"Look,look!The fox is carrying the cock away.'' The cock said to the fox,"Mr Fox,do you understand?The people say you are carrying their cock away.Tell them it is yours.Not theirs.'' The fox opened his mouth and said,"The cock is mine,not yours.''Just then the cock ran away from the fox and fled into the tree.

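Then set the command-line arguments in the run configuration.

The two arguments correspond to args[0] and args[1] of WordCount's main method: the input path and the output path.

You do not need to create the second one (the output directory) yourself; Hadoop creates it and refuses to run if it already exists.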
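Because the job fails when the output directory already exists, rerunning from the IDE means removing the old local output first. Below is a minimal sketch of such a cleanup using only java.nio. OutputDirSketch and its methods are hypothetical helpers of mine, meant only for throwaway local test directories, since the deletion is recursive and cannot be undone.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper for local test runs only; it deletes recursively.
class OutputDirSketch {
    // Deletes dir and everything under it; returns false if dir did not exist.
    static boolean deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return false;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order visits children before their parent directory.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    // Demo on a temp directory: create fake output, delete it, confirm it is gone.
    static boolean demo() {
        try {
            Path out = Files.createTempDirectory("wordcount-out");
            Files.createFile(out.resolve("part-r-00000"));
            return deleteRecursively(out) && !Files.exists(out) && !deleteRecursively(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```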

Then run it directly; if no error is reported, the job ran successfully.

(A common error, NoClassDefFoundError, means some dependency JAR was not imported.)

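The directory given as the second argument now contains two files.

Of these, part-r-00000 holds the output (the other is the _SUCCESS marker). Each line is a word and its count, separated by a tab.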
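To inspect the result programmatically, the tab-separated lines of part-r-00000 can be read back into a map. This is a small sketch under the assumption that the default text output format (key, tab, value) was used, as in the job above. OutputParseSketch is a made-up name, and the sample lines in main are invented placeholders, not real output from the run.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader for word<TAB>count lines.
class OutputParseSketch {
    // Parses each "word<TAB>count" line; lines without a tab are skipped.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue;
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path to part-r-00000 as the only argument, or use placeholder lines.
        List<String> lines = args.length == 1
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("fox\t3", "the\t5"); // invented sample, not real output
        parse(lines).forEach((w, c) -> System.out.println(w + " -> " + c));
    }
}
```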

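Building the JAR

File->Project Structure->Artifacts

Change the name in the first box to WordCount (any name works), double-click the second box, then click the third box and choose a path in the dialog that pops up. (The three boxes are the areas marked in the original post's screenshots.)

I simply chose the project root; after clicking OK, a .MF manifest file appears under the root directory.

Edit the manifest file as follows: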

Manifest-Version: 1.0
Main-Class: com.kona.WordCount
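In a manifest, the main attributes form one section and a blank line ends that section, so Manifest-Version and Main-Class belong on consecutive lines. A quick way to check that the manifest text resolves to the intended entry point is to parse it with java.util.jar.Manifest. ManifestSketch below is a hypothetical helper of mine, not something IDEA generates.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical check that manifest text declares the expected Main-Class.
class ManifestSketch {
    // Parses manifest text and returns its Main-Class attribute, or null if absent.
    static String mainClassOf(String manifestText) {
        try {
            byte[] bytes = manifestText.getBytes(StandardCharsets.UTF_8);
            Manifest mf = new Manifest(new ByteArrayInputStream(bytes));
            return mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The last line must end with a newline, or the parser may ignore it.
        String text = "Manifest-Version: 1.0\nMain-Class: com.kona.WordCount\n";
        System.out.println(mainClassOf(text)); // prints com.kona.WordCount
    }
}
```

With Main-Class set this way, hadoop jar can start the job without the class name being passed on the command line.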

Then, from the top menu bar, choose Build->Build Artifacts

Click Build and wait for it to finish.

The packaged JAR ends up in out/artifacts/WordCount

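Running the JAR on Hadoop

Start Hadoop.

Make sure all five Hadoop processes are running (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager; jps lists them).

Create a directory on HDFS for the input, for example /in, and upload the prepared test file to it.

Run the JAR.

Command: hadoop jar <jar path> <input> <output>

For example:

hadoop jar /home/kona/IdeaProjects/Test/out/artifacts/WordCount/WordCount.jar /in /out

Here the input and output paths are paths on HDFS, and the output directory must not exist yet.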

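Verifying the result

The output is written to /out on HDFS; view it with, for example, hdfs dfs -cat /out/part-r-00000

That completes the run.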
