Getting Started with Big Data (5): Developing Hadoop Programs with IntelliJ IDEA

归来少年Plus

Category: Big Data

Article link: https://blog.csdn.net/weixin_41709748/article/details/106019696

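1. Configure the Hadoop environment variables locally
Add a system variable named HADOOP_HOME whose value is the directory into which the hadoop-2.6.0.rar archive was extracted.
Append %HADOOP_HOME%/bin to the value of the system variable PATH.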
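
Before moving on, the two variables can be sanity-checked from Java. The following is a minimal sketch, not part of Hadoop: the class name HadoopEnvCheck and its check method are hypothetical helpers introduced here, and the sketch only verifies that HADOOP_HOME is set, that it contains a bin directory, and that this directory appears on PATH.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Hadoop class): checks the variables from step 1.
class HadoopEnvCheck {

    // Returns human-readable problems; an empty list means the setup looks plausible.
    static List<String> check(Map<String, String> env) {
        List<String> problems = new ArrayList<>();
        String home = env.get("HADOOP_HOME");
        if (home == null || home.trim().isEmpty()) {
            problems.add("HADOOP_HOME is not set");
            return problems;
        }
        File bin = new File(home, "bin");
        if (!bin.isDirectory()) {
            problems.add("No bin directory under HADOOP_HOME: " + bin.getPath());
        }
        // Windows usually spells the variable "Path", so match the name case-insensitively.
        String path = "";
        for (Map.Entry<String, String> e : env.entrySet()) {
            if (e.getKey().equalsIgnoreCase("PATH")) {
                path = e.getValue();
            }
        }
        boolean onPath = false;
        for (String entry : path.split(File.pathSeparator)) {
            if (new File(entry).equals(bin)) {
                onPath = true;
            }
        }
        if (!onPath) {
            problems.add("PATH does not contain " + bin.getPath());
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = check(System.getenv());
        if (problems.isEmpty()) {
            System.out.println("HADOOP_HOME and PATH look fine");
        } else {
            for (String p : problems) {
                System.out.println(p);
            }
        }
    }
}
```

Environment variables are read when a process starts, so IDEA may need to be restarted after changing them before this check (or the Hadoop client) sees the new values.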
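2. Create a new Maven project
Open IDEA and choose "File" → "New" → "Project". Select Maven on the left, tick "Create from archetype" at the top, pick org.apache.maven.archetypes:maven-archetype-quickstart from the list below, and click "Next". Once the project has been created, add a new directory named resources under src/main in the Project pane.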

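3. From the Hadoop installation directory on the remote cluster, hadoop/hadoop-2.7.7/etc/hadoop, copy the two files core-site.xml and hdfs-site.xml with Xftp or another SFTP client and place them in the src/main/resources directory created above (drag and drop works). Then put the downloaded log4j.properties file in src/main/resources as well; without it, no log output is shown.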

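4. Bring in the pom file
Overwrite the project's own pom.xml with the pom.xml below (dragging it in is enough), and change the version numbers in it (JDK, Hadoop, and so on) to the versions used in your own environment.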

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.neu</groupId>
  <artifactId>MapreduceTrain</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <hadoop.version>2.7.3</hadoop.version> <!-- change to match your cluster's Hadoop version -->
    <jdkLevel>1.8</jdkLevel>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>cn.hutool</groupId>
      <artifactId>hutool-all</artifactId>
      <version>4.1.7</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <build>
    <resources>
    </resources>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.2</version>
        <configuration>
          <source>${jdkLevel}</source>
          <target>${jdkLevel}</target>
          <showDeprecation>true</showDeprecation>
          <showWarnings>true</showWarnings>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.4.3</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <transformers>
                <transformer
                        implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <mainClass>org.example.WordCount</mainClass>
                </transformer>
              </transformers>
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

5. Write the WordCount program
The WordCount program is as follows:

package org.example;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;


public class WordCount {
	// Mapper: emits (word, 1) for every whitespace-separated token of an input line.
	public static class Map extends Mapper<Object,Text,Text,IntWritable>{
		private static final IntWritable one = new IntWritable(1);
		private final Text word = new Text();
		@Override
		public void map(Object key,Text value,Context context) throws IOException,InterruptedException{
			StringTokenizer st = new StringTokenizer(value.toString());
			while(st.hasMoreTokens()){
				word.set(st.nextToken());
				context.write(word, one);
			}
		}
	}
	
	// Reducer (also used as combiner): sums the counts collected for each word.
	public static class Reduce extends Reducer<Text,IntWritable,Text,IntWritable>{
		private final IntWritable result = new IntWritable();
		@Override
		public void reduce(Text key,Iterable<IntWritable> values,Context context) throws IOException,InterruptedException{
			int sum = 0;
			for(IntWritable val:values){
				sum += val.get();
			}
			result.set(sum);
			context.write(key, result);
		}
	}
	
	public static void main(String[] args) throws Exception{
		// User name presented to HDFS; replace it with a user that has access on your cluster.
		System.setProperty("HADOOP_USER_NAME", "CNDCJEKINS01");
		Configuration conf = new Configuration();
		// Parse generic options (-D, -fs, ...) first so that they are applied to conf.
		String[] otherArgs = new GenericOptionsParser(conf,args).getRemainingArgs();
		if(otherArgs.length != 2){
			System.err.println("Usage: WordCount <in> <out>");
			System.exit(2);
		}
		// Remove a leftover output directory; the job fails if the output path already exists.
		FileSystem fs = FileSystem.get(conf);
		Path outPath = new Path(otherArgs[1]);
		if(fs.exists(outPath)) {
			fs.delete(outPath, true);
		}
		// Job.getInstance replaces the deprecated Job constructor.
		Job job = Job.getInstance(conf,"word count");
		job.setJarByClass(WordCount.class);
		job.setMapperClass(Map.class);
		job.setCombinerClass(Reduce.class);
		job.setReducerClass(Reduce.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
		FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}
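
To see what the job should produce before involving a cluster, the same tokenize-and-sum logic can be run on a few strings in plain Java. This is only a sketch: LocalWordCount and its count method are hypothetical names, no Hadoop classes are involved, and a TreeMap stands in for the sorted key order that a single reducer would normally give for plain ASCII words.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local stand-in for the WordCount job (no Hadoop involved).
class LocalWordCount {

    // Splits each line the way the mapper does and sums the counts the way the reducer does.
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer st = new StringTokenizer(line);
            while (st.hasMoreTokens()) {
                counts.merge(st.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(Arrays.asList("hello world", "hello hadoop"));
        // Print in the word<TAB>count layout used by the job's text output.
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For the two sample lines this prints hadoop 1, hello 2 and world 1, one pair per line, which is the shape of result to expect later in part-r-00000.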

6. Configure the program arguments (input and output paths)
WordCount.java reads two argument values, so they have to be configured:

FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

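Click "Edit Configurations" in the upper-right corner of IDEA.
In Main class, enter the fully qualified name of the WordCount class; in Program arguments, enter the program's two arguments, the input path and the output path.
For example, my Program arguments are: "/user/hadoop/input_wordcount" "/user/hadoop/output/temp". The output path must not already exist when the job is submitted. The main method deletes /user/hadoop/output/temp first if it is present, so do not point the output path at data you want to keep.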
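7. Run the program
Click Run. If the error org.apache.hadoop.security.AccessControlException: Permission denied: user=… appears, add the line System.setProperty("HADOOP_USER_NAME", "root"); as the first statement of the main method. Here root is the user name under which to access HDFS on the remote Hadoop virtual machine; use whichever user applies in your own setup.

[Screenshot: log of a successful run]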
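
Why setting the property helps can be sketched in a few lines. As far as I understand Hadoop's simple (non-Kerberos) authentication, the client takes the user name from the HADOOP_USER_NAME environment variable if present, then from the system property of the same name, and otherwise falls back to the local operating-system user, which on a Windows workstation rarely matches an HDFS user. The class below is a hypothetical illustration of that lookup order, not Hadoop's actual code.

```java
// Hypothetical illustration of the assumed lookup order; not Hadoop's implementation.
class HadoopUserSketch {

    // envValue: HADOOP_USER_NAME environment variable, propValue: system property,
    // osUser: the local login name used when neither of the two is set.
    static String resolve(String envValue, String propValue, String osUser) {
        if (envValue != null) {
            return envValue;
        }
        if (propValue != null) {
            return propValue;
        }
        return osUser;
    }

    public static void main(String[] args) {
        String user = resolve(
                System.getenv("HADOOP_USER_NAME"),
                System.getProperty("HADOOP_USER_NAME"),
                System.getProperty("user.name"));
        System.out.println("HDFS requests would be made as: " + user);
    }
}
```

Under that assumption, the system property only has an effect when the environment variable is not set, and it has to be set before the first FileSystem or Job call, which is why it belongs on the first line of main.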
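8. Check the output files
Connect to the virtual machine hosting the Hadoop cluster with XShell or another terminal emulator and inspect the results of the run.
1) List the input files

hadoop fs -ls /user/hadoop/input_wordcount

2) List and view the generated files

hadoop fs -ls /user/hadoop/output/temp

hadoop fs -cat /user/hadoop/output/temp/part-r-00000
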
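
With the default text output format, each line of part-r-00000 holds a word and its count separated by a tab. If the result needs to be processed further, a small parser like the sketch below may be enough. WordCountOutputParser is a hypothetical helper and the sample lines in main are made up for illustration; they are not output from the run described here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns word<TAB>count lines into a map, keeping file order.
class WordCountOutputParser {

    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines, e.g. a trailing newline
            }
            int tab = line.indexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("Expected word<TAB>count but got: " + line);
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the same layout as part-r-00000.
        List<String> sample = Arrays.asList("hadoop\t1", "hello\t2", "world\t1");
        System.out.println(parse(sample));
    }
}
```

The lines could come from the output of the hadoop fs -cat command saved to a local file and read with java.nio.file.Files.readAllLines.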
9. Problems encountered along the way
Running the program failed with the following error:

Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z

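1) Fix in IntelliJ: create a local copy of the NativeIO class, change the return value of one method, and let the local copy override the NativeIO class from the library, as described in the following steps. First press Shift twice, search for NativeIO, and choose "Download Sources" to fetch the source code.

2) Next, recreate the NativeIO class under the project's java directory, in the same package as the original (org.apache.hadoop.io.nativeio) so that it takes precedence over the library class. Select the whole NativeIO source with Ctrl+A and paste it into the newly created class.
3) In the new NativeIO class, find the method that returns the result of access0 and change that statement to return true.

With that, running the WordCount program again no longer produces the error.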
