Summary of MapReduce Custom Component Examples

Required dependencies:

(Two Maven packaging plugins are included so the project can be packaged into a jar and run on the Hadoop cluster.)

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.0-mr1-cdh5.14.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.6.0-cdh5.14.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.6.0-cdh5.14.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.6.0-cdh5.14.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/junit/junit -->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.testng</groupId>
        <artifactId>testng</artifactId>
        <version>RELEASE</version>
        <scope>test</scope>
    </dependency>
</dependencies>
<build>
    <plugins>
        <!-- Compile for Java 8 -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.0</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
                <encoding>UTF-8</encoding>
            </configuration>
        </plugin>
        <!-- Bundle dependencies into a single runnable (shaded) jar at package time -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.4.3</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <minimizeJar>true</minimizeJar>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

1. Custom Partitioner Example

Requirement: based on the sixth column of each line, save records whose value is above 15 and records whose value is 15 or below into two separate output files.
Approach: both (k2, v2) and (k3, v3) are (Text, NullWritable). A custom Partitioner extracts the sixth column from k2 and, with 15 as the boundary, routes each record to one of two partitions.
The code is as follows:
Driver class:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyPartitionerMain extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(super.getConf(), "MyPartitioner");
        job.setJarByClass(MyPartitionerMain.class);
        // Input: one line of text per record
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.addInputPath(job, new Path("hdfs://node01:8020/test/input"));
        job.setMapperClass(MyPartitionerMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(NullWritable.class);
        // Route each record to one of two partitions; the reducer count must match
        job.setPartitionerClass(MyPartitioner.class);
        job.setNumReduceTasks(2);
        job.setReducerClass(MyPartitionerReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path("hdfs://node01:8020/test/output01"));
        boolean b = job.waitForCompletion(true);
        return b ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int run = ToolRunner.run(new Configuration(), new MyPartitionerMain(), args);
        System.exit(run);
    }
}

Note: to use a custom partitioner the job must run on the cluster, so the project has to be packaged into a jar; the number of reduce tasks must also be set explicitly (here 2, one per partition).
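
For reference, a typical build-and-submit sequence looks like this (the jar file name and the main class's package are hypothetical and depend on your own project):

mvn clean package
hadoop jar mapreduce-demo-1.0-SNAPSHOT.jar com.example.MyPartitionerMain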

Custom Mapper:
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyPartitionerMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Forward the whole line as the key; the partitioner does the real work
        context.write(value, NullWritable.get());
    }
}

The mapper and reducer here need no special logic; each simply forwards the record unchanged.

Custom Reducer:
import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyPartitionerReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
    @Override
    protected void reduce(Text key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
        // Identity reduce: each distinct line is written once to its partition's file
        context.write(key, NullWritable.get());
    }
}
Custom Partitioner:
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class MyPartitioner extends Partitioner<Text, NullWritable> {
    @Override
    public int getPartition(Text text, NullWritable nullWritable, int numReduceTasks) {
        // The sixth tab-separated field decides the partition
        String s = text.toString().split("\t")[5];
        if (Long.parseLong(s) > 15) {
            return 1;  // above 15
        } else {
            return 0;  // 15 or below
        }
    }
}
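
As a quick sanity check, the routing logic can be exercised locally. The sketch below uses hypothetical tab-separated sample lines in which only the numeric sixth field matters:

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;

public class MyPartitionerCheck {
    public static void main(String[] args) {
        MyPartitioner partitioner = new MyPartitioner();
        // Hypothetical records; the sixth tab-separated field is the numeric column
        Text low  = new Text("f1\tf2\tf3\tf4\tf5\t12\tf7");
        Text high = new Text("f1\tf2\tf3\tf4\tf5\t20\tf7");
        System.out.println(partitioner.getPartition(low,  NullWritable.get(), 2)); // prints 0
        System.out.println(partitioner.getPartition(high, NullWritable.get(), 2)); // prints 1
    }
}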

2. Custom Sort Example

Given a data file, sort the first column lexicographically; when the first column is equal, sort the second column in ascending numeric order.

Sample data format:
a 1
a 7
b 8
b 10
a 5
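
Under these rules, the expected output ordering is:
a 1
a 5
a 7
b 8
b 10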

Approach: define a custom type that implements the WritableComparable interface, override compareTo and the serialization/deserialization methods, and use this type as k2 so the shuffle phase performs the custom sort.
The custom type is as follows:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import lombok.Data;
import org.apache.hadoop.io.WritableComparable;

@Data  // Lombok generates getters, setters, equals/hashCode and toString
public class MySortWritable implements WritableComparable<MySortWritable> {
    private String first;
    private Long second;

    @Override
    public int compareTo(MySortWritable o) {
        // Order by the first column; break ties with the second column, ascending
        int cmp = this.first.compareTo(o.first);
        return cmp != 0 ? cmp : this.second.compareTo(o.second);
    }

    // Serialization
    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeUTF(this.first);
        dataOutput.writeLong(this.second);
    }

    // Deserialization: fields must be read in the same order they were written
    @Override
    public void readFields(DataInput dataInput) throws IOException {
        this.first = dataInput.readUTF();
        this.second = dataInput.readLong();
    }
}
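
The ordering can be verified locally with a small sketch (not part of the original job; it relies on the setters and getters Lombok's @Data generates):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MySortWritableCheck {
    public static void main(String[] args) {
        // The sample records from above
        String[][] rows = {{"a","1"},{"a","7"},{"b","8"},{"b","10"},{"a","5"}};
        List<MySortWritable> list = new ArrayList<>();
        for (String[] row : rows) {
            MySortWritable w = new MySortWritable();
            w.setFirst(row[0]);
            w.setSecond(Long.parseLong(row[1]));
            list.add(w);
        }
        Collections.sort(list);  // uses compareTo
        for (MySortWritable w : list) {
            System.out.println(w.getFirst() + " " + w.getSecond());
        }
        // Expected: a 1, a 5, a 7, b 8, b 10
    }
}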
Custom Mapper:
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MySortMapper extends Mapper<LongWritable, Text, MySortWritable, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split the line once and build the composite sort key from the two columns
        String[] fields = value.toString().split("\t");
        MySortWritable mySortWritable = new MySortWritable();
        mySortWritable.setFirst(fields[0]);
        mySortWritable.setSecond(Long.parseLong(fields[1]));
        context.write(mySortWritable, value);
    }
}
Custom Reducer:
import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MySortReducer extends Reducer<MySortWritable, Text, Text, NullWritable> {
    @Override
    protected void reduce(MySortWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // Keys arrive already sorted; write every original line, keeping duplicates
        for (Text value : values) {
            context.write(value, NullWritable.get());
        }
    }
}
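
The original article does not include a driver for the sort job. Here is a minimal sketch modeled on the partitioner driver above; the HDFS input and output paths are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MySortMain extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(super.getConf(), "MySort");
        job.setJarByClass(MySortMain.class);
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.addInputPath(job, new Path("hdfs://node01:8020/test/sort_input"));
        // The composite key drives the custom sort during the shuffle
        job.setMapperClass(MySortMapper.class);
        job.setMapOutputKeyClass(MySortWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setReducerClass(MySortReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path("hdfs://node01:8020/test/sort_output"));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MySortMain(), args));
    }
}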