Hadoop Java Exercises: Learning Hadoop (2), a First Small Exercise

weixin_39860849

Article link: https://blog.csdn.net/weixin_39860849/article/details/115064707

Use Eclipse to write a program that counts how many times each word occurs in a text file.

If Eclipse fails to start with the following error:

A Java Runtime Environment (JRE) or Java Development Kit (JDK) must be available in order to run Eclipse. No Java virtual machine was found after searching the following locations:…

it means Eclipse cannot find a JDK. To fix this, open a terminal, change into your Eclipse directory, and run:

mkdir jre

cd jre

ln -s <your JDK directory>/bin bin

This creates a symbolic link named bin that points to the JDK's bin directory. For example:

ln -s /home/chs/java/jdk1.8.0_172/bin bin
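The same two steps can also be done from Java with `java.nio.file`, which makes it easy to check that the link really resolves to the JDK. This is only a sketch: the class `JreLink` and its method `linkJdkBin` are hypothetical helpers, not part of Eclipse or the JDK, and the two directories passed on the command line are whatever your own Eclipse and JDK locations happen to be.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;

public class JreLink {

    // Creates <eclipseDir>/jre and a symbolic link <eclipseDir>/jre/bin that
    // points at <jdkDir>/bin, mirroring "mkdir jre; cd jre; ln -s <jdk>/bin bin".
    public static Path linkJdkBin(Path eclipseDir, Path jdkDir) throws IOException {
        Path jre = Files.createDirectories(eclipseDir.resolve("jre"));
        Path link = jre.resolve("bin");
        // Only create the link if nothing is there yet (do not follow an existing link)
        if (!Files.exists(link, LinkOption.NOFOLLOW_LINKS)) {
            Files.createSymbolicLink(link, jdkDir.resolve("bin").toAbsolutePath());
        }
        return link;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: JreLink <eclipse dir> <jdk dir>");
            return;
        }
        Path link = linkJdkBin(Paths.get(args[0]), Paths.get(args[1]));
        // Print where the link points so it can be checked by eye
        System.out.println(link + " -> " + Files.readSymbolicLink(link));
    }
}
```

Creating symbolic links may need extra privileges on some systems, so the shell commands above remain the simpler route on Linux.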

Create an in directory in HDFS:

hadoop fs -mkdir /in

Create a local file named words:

touch words

vim words

In vim, type in some words to analyze. For example:

hadoop spark word space

hdfs hadoop word mapper

reduce hello hadoop hello

word hai hadoop space

total google

Save and quit.

Upload the words file to the /in directory:

hadoop fs -put words /in
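To have something to compare the job's output against, the expected counts for this sample can be worked out locally in plain Java, without Hadoop. The sketch below assumes the words file contains exactly the five lines shown above; the class name `LocalWordCount` is made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalWordCount {

    // Every token adds 1 to its word's total. A TreeMap keeps the words
    // sorted alphabetically so the result is easy to read.
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) { // a blank line yields one empty token
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "hadoop spark word space",
                "hdfs hadoop word mapper",
                "reduce hello hadoop hello",
                "word hai hadoop space",
                "total google");
        count(words).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

For this sample, hadoop occurs 4 times, word 3 times, hello and space 2 times each, and the remaining seven words once each.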

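In Eclipse, create a class named WordCount.

Import the Hadoop JARs:

Right-click the project > Properties > Java Build Path > Libraries > Add External JARs, then browse to the share/hadoop folder inside the directory where Hadoop was extracted. Import the JARs you need, or simply import all of them.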

package hdfs;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

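public class WordCount {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // Expect exactly two arguments: the input path and the output path
        if (args.length != 2) {
            System.err.println("Usage: WordCount <input path> <output path>");
            System.exit(1);
        }
        Configuration con = new Configuration();
        // Job.getInstance replaces the deprecated Job constructor
        Job job = Job.getInstance(con);
        job.setJarByClass(WordCount.class);
        // Set the input path
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        // Set the output path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Set the class that implements the map function
        job.setMapperClass(MyMap.class);
        // Set the class that implements the reduce function
        job.setReducerClass(MyReduce.class);
        // Set the key and value types produced by the map phase
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Set the key and value types produced by the reduce phase
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Submit the job, wait for it to finish, and report success or failure
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }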

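    // Mapper: for each input line, emit (word, 1) for every word on it.
    // The type parameters are <input key, input value, output key, output value>.
    public static class MyMap extends Mapper<LongWritable, Text, Text, IntWritable> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the line on whitespace and skip the empty token a blank line produces
            String[] words = value.toString().trim().split("\\s+");
            for (String word : words) {
                if (!word.isEmpty()) {
                    context.write(new Text(word), new IntWritable(1));
                }
            }
        }
    }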

    // Reducer: add up the 1s collected for each word and emit (word, total).
    public static class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text k2, Iterable<IntWritable> v2s, Context context)
                throws IOException, InterruptedException {
            int count = 0;
            for (IntWritable cn : v2s) {
                count += cn.get();
            }
            context.write(k2, new IntWritable(count));
        }
    }
}
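What happens between MyMap and MyReduce is not visible in the listing: Hadoop groups the values emitted by the mappers by key, sorted by key, and hands each key to the reducer together with all of its values. The plain-Java sketch below imitates that flow on a tiny input so the three steps can be followed without a cluster. The class `MapReduceSketch` and its methods are invented for this illustration and are not Hadoop APIs. The sketch also skips empty tokens, which a blank line or leading spaces would otherwise produce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapReduceSketch {

    // "Map" step: one line in, its words out (each word stands for a (word, 1) pair).
    public static List<String> map(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) { // "".split(...) yields a single empty token
                words.add(token);
            }
        }
        return words;
    }

    // "Shuffle" step: collect the 1s emitted for each word, sorted by word.
    public static Map<String, List<Integer>> shuffle(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String word : map(line)) {
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        return grouped;
    }

    // "Reduce" step: sum the values collected for one word.
    public static int reduce(List<Integer> values) {
        int count = 0;
        for (int v : values) {
            count += v;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello hadoop", "", "  hello  world");
        // Print each word, the values grouped for it, and the reduced total
        shuffle(lines).forEach((word, ones) ->
                System.out.println(word + "\t" + ones + "\t" + reduce(ones)));
    }
}
```

Here hello is grouped with two values and reduces to 2, while hadoop and world each reduce to 1; the blank line contributes nothing.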

Create a folder of your own to hold the JAR.

Use Eclipse to package the project as a JAR:

Right-click the project and choose Export.