HDFS implementation:
1. First, create a Java project.
2. Create a lib folder and add the jar packages (provided as an attachment). Specific paths:
\hadoop-2.6.0\share\hadoop\common — the three jars there;
\hadoop-2.6.0\share\hadoop\common\lib — all jars;
\hadoop-2.6.0\share\hadoop\hdfs — the three jars there
3. Test code:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.junit.Before;
import org.junit.Test;

public class HdfsDemo {
    FileSystem fileSystem = null;

    // Runs before each @Test method to initialize the FileSystem handle
    @Before
    public void init() throws IOException, InterruptedException, URISyntaxException {
        // Connect to HDFS as the "root" user
        fileSystem = FileSystem.get(new URI("hdfs://192.168.1.168:9000"), new Configuration(), "root");
    }

    @Test
    public void testUpload() throws IllegalArgumentException, IOException {
        // Open a file on the local file system and get an input stream
        InputStream inputStream = new FileInputStream("G:\\UU\\FXFK.rar");
        // Create a file on HDFS and get an output stream
        OutputStream outputStream = fileSystem.create(new Path("/fxfk.rar"));
        // Copy input -> output; the last argument closes both streams when done
        IOUtils.copyBytes(inputStream, outputStream, 4096, true);
    }

    @Test
    public void downLoad() throws IllegalArgumentException, IOException {
        // Download /fxfk.rar to the local disk; the first argument (true) deletes the source on HDFS afterwards
        fileSystem.copyToLocalFile(true, new Path("/fxfk.rar"), new Path("G:/f.rar"));
    }

    @Test
    public void delTest() throws IllegalArgumentException, IOException {
        // The second argument enables recursive deletion; if it were false, a non-empty directory could not be deleted
        boolean flag = fileSystem.delete(new Path("/dirs"), true);
        System.out.println(flag);
    }

    @Test
    public void testMkdir() throws IllegalArgumentException, IOException {
        // Create a directory
        boolean flag = fileSystem.mkdirs(new Path("/dirs"));
        System.out.println(flag);
    }

    public static void main(String[] args) throws IOException, URISyntaxException {
        // Read a file from HDFS and write it to the local disk
        FileSystem fileSystem = FileSystem.get(new URI("hdfs://192.168.1.168:9000"), new Configuration());
        InputStream inputStream = fileSystem.open(new Path("/had"));
        OutputStream outputStream = new FileOutputStream("G://hda2.6");
        IOUtils.copyBytes(inputStream, outputStream, 4096, true);
    }
}
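For reference, the same fileSystem handle can also list what is stored on HDFS. The following test method is a minimal sketch that could be added to HdfsDemo (it is not part of the original demo and additionally needs import org.apache.hadoop.fs.FileStatus):

    @Test
    public void testList() throws IllegalArgumentException, IOException {
        // List the entries directly under the root directory
        FileStatus[] statuses = fileSystem.listStatus(new Path("/"));
        for (FileStatus status : statuses) {
            // Print a marker for directories plus the full path of each entry
            System.out.println((status.isDirectory() ? "d " : "- ") + status.getPath());
        }
    }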
MapReduce implementation:
MapReduce word-count example
Summary of steps:
First, copy the hadoop-2.6.0\share\hadoop\mapreduce\*.jar packages into the project
* 1. Analyze the concrete business logic and decide on the format of the input and output data
* 2. Define a class that extends Mapper and override the map method, implementing the business logic and emitting the new key/value pairs
* 3. Define another class that extends Reducer and implement the reduce method
* 4. Wire the custom mapper and reducer classes together through a Job object
1. The custom WCMapper:
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Long and String do implement the JDK serialization interface, but JDK serialization
// is relatively slow, so Hadoop provides its own serialization types:
// Long -> LongWritable, String -> Text
public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value,
            Mapper<LongWritable, Text, Text, LongWritable>.Context context)
            throws IOException, InterruptedException {
        // Receive a line of input (v1)
        String line = value.toString();
        // Split the line into words
        String[] words = line.split(" ");
        // Loop over the words
        for (String word : words) {
            // Each occurrence counts as 1; emit (word, 1)
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
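As the comments in WCMapper note, Hadoop's Writable types simply wrap the plain Java types. The following standalone snippet is only an illustration of that conversion (it is not part of the word-count code):

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class WritableDemo {
    public static void main(String[] args) {
        Text word = new Text("hello");            // wraps a String
        LongWritable one = new LongWritable(1L);  // wraps a long
        String s = word.toString();               // back to a String
        long n = one.get();                       // back to a long
        System.out.println(s + " -> " + n);
    }
}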
2. The custom WCReducer class:
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    // Note: v2s is of type Iterable<LongWritable>;
    // after the shuffle phase, each key is associated with a collection of values like {v2, ...}
    @Override
    protected void reduce(Text k2, Iterable<LongWritable> v2s,
            Reducer<Text, LongWritable, Text, LongWritable>.Context context)
            throws IOException, InterruptedException {
        // Define a counter for the received values
        long count = 0;
        // Loop over v2s and sum the counts
        for (LongWritable longWritable : v2s) {
            count += longWritable.get();
        }
        // Emit (k3, v3)
        context.write(k2, new LongWritable(count));
    }
}
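For example, if the input file contains the single line "hello world hello", the mappers emit (hello,1), (world,1), (hello,1); after the shuffle the reducer is called once per key with (hello, {1,1}) and (world, {1}) and writes "hello 2" and "world 1".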
3. The WordCount class:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Summary of steps:
 * 1. Analyze the concrete business logic and decide on the format of the input and output data
 * 2. Define a class that extends Mapper, override the map method with the business logic, and emit the new key/value pairs
 * 3. Define another class that extends Reducer and implement the reduce method
 * 4. Wire the custom mapper and reducer classes together through a Job object
 * @author xin
 */
public class WordCount {
    public static void main(String[] args) throws Exception {
        // Build the job object
        Job job = Job.getInstance(new Configuration());
        // Note: the class containing the main method
        job.setJarByClass(WordCount.class);
        // Mapper settings
        job.setMapperClass(WCMapper.class);
        // Type of the key (k2) emitted by the mapper
        job.setMapOutputKeyClass(Text.class);
        // Type of the value (v2) emitted by the mapper
        job.setMapOutputValueClass(LongWritable.class);
        // Input file on HDFS to process
        FileInputFormat.setInputPaths(job, new Path("/words.txt"));
        // Reducer settings
        job.setReducerClass(WCReducer.class);
        // Type of the key (k3) emitted by the reducer
        job.setOutputKeyClass(Text.class);
        // Type of the value (v3) emitted by the reducer
        job.setOutputValueClass(LongWritable.class);
        // Output directory on HDFS for the results
        FileOutputFormat.setOutputPath(job, new Path("/wcout601"));
        // Submit the job and print progress while it runs
        job.waitForCompletion(true);
    }
}
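The input and output paths above are hard-coded. A common variant is to take them from the command line so the same jar can be reused for different files; the sketch below is only an illustration (the class name WordCountArgs is made up, and only the two path lines differ from the original):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountArgs {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(WordCountArgs.class);
        job.setMapperClass(WCMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setReducerClass(WCReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        // Input and output paths are taken from the command line instead of being hard-coded
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Exit with a non-zero status if the job fails
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}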
Note: In Eclipse, export the project as a plain jar with WordCount as the main class, then copy wc.jar to the virtual machine.
Run wc.jar with the following command (because it is a plain jar, it has to be run with the hadoop jar command):
hadoop jar wc.jar    (plain jar)
java -jar wc.jar     (runnable jar)
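Once the job finishes, the result can be inspected on HDFS with, for example, hdfs dfs -cat /wcout601/part-r-00000 (part-r-00000 is the default output file name when a single reducer is used). Also note that the output directory (/wcout601 here) must not already exist when the job is submitted, otherwise FileOutputFormat rejects it.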