Setting Up a Hadoop MapReduce Development Environment with Maven and Eclipse

Latest recommended article published 2020-12-15 21:52:32

半臻（火白）

Category: Big Data. Tags: mapreduce, hdfs, hadoop

Article link: https://blog.csdn.net/qq_35556504/article/details/109821326

Program download

Link: https://pan.baidu.com/s/1DPwPQLAO5cXdMoNcEgEttQ
Extraction code: crz1

Setup steps

Install the JDK
Configure Maven
Install Eclipse
Configure Maven in Eclipse
Create a Maven project and edit its pom.xml
Write WordCount and run the code

Detailed setup steps

Install the JDK
1. Install Java 1.8.
2. A step-by-step installation guide is available here, so the details are not repeated: https://www.cnblogs.com/maoning/p/10701349.html
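
Once the JDK is installed, a quick way to confirm which runtime is actually being picked up is to print the version from Java itself. The following is a minimal sketch, not part of the original project; the class name `JavaVersionCheck` and the helper `isJava8` are made up for illustration, and the check assumes the `1.8.x` version string format used by Java 8.

```java
// Minimal sketch: reports the running Java version and whether it is Java 8.
// The class and method names are hypothetical, chosen for this example only.
class JavaVersionCheck {

    // Java 8 reports versions such as "1.8.0_271"; later releases report "9", "11.0.2", and so on.
    static boolean isJava8(String version) {
        return version != null && version.startsWith("1.8");
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        String javaHome = System.getenv("JAVA_HOME"); // null when the variable is not set
        System.out.println("java.version = " + version);
        System.out.println("JAVA_HOME    = " + javaHome);
        System.out.println(isJava8(version) ? "Java 8 detected" : "Not a Java 8 runtime");
    }
}
```

If `JAVA_HOME` prints as `null`, the environment variable from step 2 has probably not been set for the current session.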
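Configure Maven
- Download Maven and extract the archive.
- Edit conf/settings.xml (Notepad is enough) and add the following mirror inside the <mirrors> element:
```
<mirror>
    <id>alimaven</id>
    <mirrorOf>central</mirrorOf>
    <name>aliyun maven</name>
    <url>http://maven.aliyun.com/nexus/content/repositories/central/</url>
</mirror>
```
- Save the file.
- Create a folder to serve as the local repository. Any name works as long as you remember the path, for example D:\Hadoop\resp.
- Edit settings.xml again in the same way and point the <localRepository> element at that folder, for example <localRepository>D:\Hadoop\resp</localRepository>.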
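
A typo in settings.xml tends to surface only later, as a failed dependency download. As a rough sanity check, the file can be parsed with the JDK's built-in XML parser to print the local repository and mirror URLs that Maven should see. This is a sketch under assumptions: the class `SettingsCheck` and its methods are hypothetical, and the default path `conf/settings.xml` is relative to wherever Maven was extracted.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration: extracts <localRepository> and mirror URLs from settings.xml.
class SettingsCheck {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("settings.xml is not well-formed XML", e);
        }
    }

    // Returns the configured local repository, or null if the element is absent (or only commented out).
    static String localRepository(String xml) {
        NodeList nodes = parse(xml).getElementsByTagName("localRepository");
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Returns the <url> of every <mirror> element, in document order.
    static List<String> mirrorUrls(String xml) {
        List<String> urls = new ArrayList<>();
        NodeList mirrors = parse(xml).getElementsByTagName("mirror");
        for (int i = 0; i < mirrors.getLength(); i++) {
            NodeList url = ((Element) mirrors.item(i)).getElementsByTagName("url");
            if (url.getLength() > 0) {
                urls.add(url.item(0).getTextContent().trim());
            }
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        // Assumed default location; pass the real path as the first argument.
        String path = args.length > 0 ? args[0] : "conf/settings.xml";
        String xml = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        System.out.println("localRepository = " + localRepository(xml));
        System.out.println("mirror URLs     = " + mirrorUrls(xml));
    }
}
```

If `localRepository` prints as `null`, the element is most likely still inside the XML comment that ships with the default settings.xml.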
Install Eclipse
- The download is a portable build with no installer: extract it and run eclipse.exe.
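Configure Maven in Eclipse
1. Open Window -> Preferences.
2. Select Maven -> Installations and click Add, then click Directory and choose the folder where Maven was extracted.
  
  Click Finish when done.
  Click Apply and Close.
3. Select Maven -> User Settings, click Browse, and select the settings.xml file in Maven's conf directory.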
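Create a Maven project and edit its pom.xml
1. File -> New -> Maven Project
2. Click Next.
  
  Select the maven-archetype-quickstart archetype and click Next.
  Enter a Group Id and an Artifact Id, then click Finish.
3. Open pom.xml and paste in the dependencies listed in the appendix.
- Save the file and wait for the JARs to finish downloading.
- A progress bar appears in the lower-right corner of Eclipse.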
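Write the program and run it
- The code is listed in the appendix.
- Run the code.
- Check the output to confirm that the directories were created.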

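Setting up Hadoop on Windows and the Eclipse plugin

Install Hadoop on Windows

Extract the Hadoop archive.
Add Hadoop to the environment variables by defining HADOOP_HOME and HADOOP_USER_NAME.
Open hadoop-env.cmd in the hadoop\etc\hadoop directory, replace %JAVA_HOME% with the installation path of the Java JDK, and save.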
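
Environment variables set through the Windows dialog are only visible to processes started afterwards, so it can help to confirm that Eclipse (or a fresh terminal) really sees them. The sketch below is one possible check; the class `HadoopEnvCheck` and its `missing` helper are hypothetical names, and the `etc/hadoop/hadoop-env.cmd` location is taken from the step above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration: reports which of the variables named in this section are unset.
class HadoopEnvCheck {

    static final String[] REQUIRED = {"HADOOP_HOME", "HADOOP_USER_NAME"};

    // Returns the names of required variables that are absent or blank in the given environment.
    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> missing = missing(System.getenv());
        System.out.println(missing.isEmpty() ? "All variables are set" : "Missing: " + missing);

        String hadoopHome = System.getenv("HADOOP_HOME");
        if (hadoopHome != null) {
            // The script edited in the last step should exist under HADOOP_HOME.
            File envScript = new File(hadoopHome, "etc/hadoop/hadoop-env.cmd");
            System.out.println(envScript + (envScript.isFile() ? " found" : " not found"));
        }
    }
}
```

If a variable is reported as missing even though it was defined, restarting Eclipse or the terminal usually makes it visible.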

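Install the Eclipse plugin

Extract the plugin archive hadoop-eclipse-plugin.zip.
Copy every file from the plugin's bin directory into the bin directory of Hadoop 2.7.3.
Copy the .dll file from that bin directory into C:\Windows\System32.
Copy the plugin JAR into the plugins folder of the Eclipse installation directory.
Delete platform.xml. To locate it, search for the file name (Ctrl+F) starting from the Eclipse root directory.
Once the file is deleted, Eclipse rescans the plugins directory and installs the plugin.
Open Window -> Show View -> Other -> MapReduce Tools.
Right-click in the view and create a new Hadoop location.
The DFS icon then appears.
To run a MapReduce program, right-click it -> Run As -> Run on Hadoop.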

Appendix

pom.xml configuration

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>jdk.tools</groupId>
    <artifactId>jdk.tools</artifactId>
    <version>1.8</version>
    <scope>system</scope>
    <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
</dependency>
<dependency>
    <groupId>commons-lang</groupId>
    <artifactId>commons-lang</artifactId>
    <version>2.6</version>
</dependency>

Java test code

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class Test {
	public static void main(String[] args) throws URISyntaxException, IOException {
		// NameNode address; replace the IP with that of your own virtual machine
		URI uri = new URI("hdfs://192.168.43.100:9000");
		// Create the configuration
		Configuration conf = new Configuration();
		// Get the HDFS client; try-with-resources closes it when the block ends
		try (FileSystem fs = FileSystem.get(uri, conf)) {
			// Create the directories
			fs.mkdirs(new Path("/hf002"));
			fs.mkdirs(new Path("/hf002/hello"));
			// List the directory contents
			FileStatus[] listStatus = fs.listStatus(new Path("/hf002"));

			for (FileStatus fileStatus : listStatus) {
				System.out.println(fileStatus.getPath());
			}
			System.out.println("Created successfully");

			// To print the contents of a file, uncomment the lines below and import
			// org.apache.hadoop.fs.FSDataInputStream, java.io.BufferedReader and java.io.InputStreamReader
//			FSDataInputStream in = fs.open(new Path("/hf002/hello.txt"));
//			BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(in, "utf-8"));
//			String line = null;
//			while ((line = bufferedReader.readLine()) != null) {
//				System.out.println(line);
//			}
		}
	}
}
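
When the test program hangs or fails with a connection error, the cause is often that the NameNode address is unreachable from the Windows host rather than anything in the Hadoop code. A plain socket probe can separate the two cases before involving Hadoop at all. This is a minimal sketch using only the JDK; the class `NameNodeProbe` and its methods are hypothetical, and the URI is the same example address used in the test code, which must include an explicit port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;

// Hypothetical helper for illustration: checks whether a TCP connection to the NameNode can be opened.
class NameNodeProbe {

    static String host(String hdfsUri) {
        return URI.create(hdfsUri).getHost();
    }

    // Returns -1 when the URI carries no explicit port.
    static int port(String hdfsUri) {
        return URI.create(hdfsUri).getPort();
    }

    // Tries to open a TCP connection within the timeout; the URI must include a port.
    static boolean canConnect(String hdfsUri, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host(hdfsUri), port(hdfsUri)), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Same example address as the test code; replace it with your own virtual machine's IP.
        String uri = args.length > 0 ? args[0] : "hdfs://192.168.43.100:9000";
        boolean reachable = canConnect(uri, 3000);
        System.out.println(host(uri) + ":" + port(uri) + (reachable ? " is reachable" : " is NOT reachable"));
    }
}
```

A successful probe only shows that something is listening on that port; it does not prove the service is HDFS or that permissions are set up correctly.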

MapReduce code

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * WordCount2.java
 *
 * Word count is one of the most commonly used MapReduce programs, typically
 * applied to word-frequency analysis and word-cloud generation.
 */
public class WordCount2 {

		// Mapper takes four type parameters, i.e. two key-value pairs: the first two are the types of the
		// records read from the input file, the last two are the types emitted by the map phase.
		/**
		 * Example input lines:
		 *
		 * 111  222  333
		 * 123  234  456
		 * aa    bb   c
		 *
		 * Each word is emitted as a (word, 1) pair: <111,1>  <222,1>  <333,1>  <123,1> ...
		 */
		public static class wordcount1Mapper extends Mapper<LongWritable, Text, Text, IntWritable> {

			private final static IntWritable one = new IntWritable(1);
			private Text outputKey = new Text();

			@Override
			protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
					throws IOException, InterruptedException {

				// \\s+ matches one or more whitespace characters
				String[] words = value.toString().split("\\s+");
				for (String word : words) {
					// Skip the empty token produced by leading whitespace or a blank line
					if (word.isEmpty()) {
						continue;
					}
					outputKey.set(word);
					context.write(outputKey, one);
				}
			}
		}

		public static class wordcount1Reducer extends Reducer<Text, IntWritable, Text, IntWritable> {

			private IntWritable result = new IntWritable();

			// <aa,1>  <bb,1>  <cc,1> <aa,1>  <bb,1> <aa,1>
			// <aa,1 1 1> <bb,1 1> <cc,1>
			@Override
			protected void reduce(Text key, Iterable<IntWritable> values,
					Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {

				int sum = 0;
				for (IntWritable value : values) {
					sum+=value.get();
				}

				result.set(sum);
				
				context.write(key, result);
			}
		}

		public static void main(String[] args) {

			try {
				Configuration conf = new Configuration();
				// NameNode address; replace the IP with that of your own cluster
				conf.set("fs.defaultFS", "hdfs://192.168.43.100:9000");

				Job job = Job.getInstance(conf, "wordcount1");
				job.setJarByClass(WordCount2.class);

				// Set the mapper and reducer for this job
				job.setMapperClass(wordcount1Mapper.class);
				job.setReducerClass(wordcount1Reducer.class);
				// Set the key and value types of the job output; the map output uses the same types by default
				job.setOutputKeyClass(Text.class);
				job.setOutputValueClass(IntWritable.class);
				// Set how the map phase reads the input file
				job.setInputFormatClass(TextInputFormat.class);

				FileInputFormat.addInputPath(job, new Path("/hf002/cat.txt"));
				Path outputPath = new Path("/wordcount");

				// Delete any previous output, because the job fails if the output directory already exists
				FileSystem.get(conf).delete(outputPath, true);
				FileOutputFormat.setOutputPath(job, outputPath);
				// Exit with 0 on success and 1 on failure
				System.exit(job.waitForCompletion(true) ? 0 : 1);

			} catch (Exception e) {
				e.printStackTrace();
			}

		}
}
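
To see what the mapper and reducer above compute without a running cluster, the same three phases can be imitated in plain Java. The sketch below is an in-memory simplification, not Hadoop code: the class `LocalWordCount` and its methods are hypothetical, a `TreeMap` stands in for the sorted shuffle, and the input is the example from the reducer comments.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory imitation of the map -> shuffle -> reduce flow of the WordCount job (illustration only).
class LocalWordCount {

    // Map phase: emit one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group the values by key; the TreeMap keeps keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // <aa,1> <bb,1> <cc,1> <aa,1> <bb,1> <aa,1> becomes <aa,1 1 1> <bb,1 1> <cc,1>
        System.out.println(wordCount(Arrays.asList("aa bb cc aa", "bb aa"))); // prints {aa=3, bb=2, cc=1}
    }
}
```

In the real job, Hadoop performs the grouping step between the map and reduce tasks, which is why the reducer receives an `Iterable<IntWritable>` per key.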