HDFS API Operations (Part 2: Case 1)

This article walks through a WordCount example of working with files on HDFS: creating directories and uploading files in Xshell, writing and packaging the Java code in Idea, uploading the jar to the Hadoop cluster and running the job, and finally checking the results in the web UI.

Contents

I. Preparation

II. Xshell

III. Idea

IV. Packaging

V. Upload

VI. View the Results


Requirement: count how many times each word appears in a file.

I. Preparation

hello.txt

Once when I was six years old I saw a magnificent picture in a book called True Stories from Nature, about the primeval forest.
It was a picture of a boa constrictor in the act of swallowing an animal.
Here is a copy of the drawing: In the book it said: "Boa constrictors swallow their prey whole, without chewing it.
After that they are not able to move, and they sleep through the six months that they need for digestion."
And after some work with a coloured pencil I succeeded in making my first drawing.
My Drawing Number One.
It looked like this: I showed my masterpiece to the grown-ups, and asked them whether the drawing frightened them.
But they answered: "Frighten?
Why should anyone be frightened by a hat?"
My drawing was not a picture of a hat.
It was a picture of a boa constrictor digesting an elephant.
But since the grown-ups were not able to understand it, I made another drawing.
I drew the inside of the boa constrictor, so that the grown-ups could see it clearly.
They always need to have things explained.

II. Xshell

1. Change to the software directory

cd /opt/software

2. Create the testData directory

mkdir testData

3. Change to the testData directory

cd testData

4. Create the mapreduce directory

mkdir mapreduce

5. Change to the mapreduce directory

cd mapreduce/

6. Upload the hello.txt file with the rz command

7. Use the ll command to confirm the upload succeeded
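If rz is not available on the server (it comes from the lrzsz package), the file can also be copied from the local machine with scp. A minimal sketch, assuming the root user and the cluster IP used later in this article:

scp hello.txt root@192.168.67.110:/opt/software/testData/mapreduce/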

III. Idea

Create the project.

Create three classes: WordCountMapper, WordCountReduce, and WordCount.

Add the following to pom.xml:

  <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.9.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
            <version>2.9.2</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>org.hadoop.WordCount</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
        </plugins>
    </build>

 

WordCountMapper.java

package org.hadoop; // package chosen to match the mainClass org.hadoop.WordCount declared in pom.xml

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // Split the incoming line (value) on spaces to get the individual words
        String[] words = value.toString().split(" ");
        // Emit each word with a count of 1
        for (int i = 0; i < words.length; i++) {
            String word = words[i];

            Text text = new Text();
            text.set(word);
            context.write(text, new IntWritable(1));
        }
    }
}
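Splitting on a single space keeps punctuation attached to words and produces empty tokens when spaces repeat. A small optional tweak, not part of the original code, is to split on a whitespace regex instead:

String[] words = value.toString().split("\\s+"); // split on any run of whitespace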

 

WordCountReduce.java

package org.hadoop;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Running total for this word
        int sum = 0;
        // Add up all the 1s emitted by the mappers for this key
        for (IntWritable intWritable : values) {
            sum = sum + intWritable.get();
        }

        // Write the word and its final count
        context.write(key, new IntWritable(sum));
    }
}

 

WordCount.java

package org.hadoop; // must match the mainClass set in the jar manifest

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // All arguments except the last one are input paths
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        // The last argument is the output path
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
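Because WordCountReduce only sums counts, it can optionally also be registered as a combiner so partial sums are computed on the map side before the shuffle. This line is not in the original driver:

job.setCombinerClass(WordCountReduce.class); // pre-aggregate counts on the map side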

 

IV. Packaging

4.1 Click package under Maven in the right-hand panel of Idea

 

Find the generated jar at the local path shown in the build output
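The jar can also be built from a terminal in the project root, which is equivalent to clicking package (assuming Maven is installed locally):

mvn clean package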

 

4.2 Drag the jar directly into the Xshell window

Use the ll command to confirm the transfer succeeded

 

4.3 Start the Hadoop cluster

start-dfs.sh

Start YARN

start-yarn.sh
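To confirm the daemons are up, jps should list processes such as NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager (the exact set depends on how the cluster is laid out):

jps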

V. Upload

Use hello.txt to test the upload.

Create the input directory

hdfs dfs -mkdir /input

Upload the file

hdfs dfs -put hello.txt  /input

Check the upload

hdfs dfs -ls /input

 

Submit HDFSAPITest-1.0-SNAPSHOT.jar (substitute your own jar name)

hadoop jar HDFSAPITest-1.0-SNAPSHOT.jar  /input /output
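MapReduce refuses to start if the output directory already exists, so remove /output before re-running the job:

hdfs dfs -rm -r /output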

VI. View the Results

hdfs dfs -cat /output/part-r-00000
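Each line of part-r-00000 is a word followed by a tab and the number of times it appeared; the actual words and counts depend on the contents of hello.txt:

<word>	<count>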

 

 

View in the web UI (the NameNode UI, which listens on port 50070 by default in Hadoop 2.x)

http://192.168.67.110:50070/

 
