1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
|
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.MultipleOutputFormat;
import org.apache.hadoop.util.Progressable;
/**
 * Splits reducer output into per-letter files: a key whose first character is
 * an ASCII letter goes to "&lt;letter&gt;.txt", every other key goes to "other.txt".
 *
 * <p>Register it on the job with
 * {@code conf.setOutputFormat(MultipleOutputFormatTest.class)}.
 */
public class MultipleOutputFormatTest extends MultipleOutputFormat<Text, IntWritable> {

    /**
     * Chooses the leaf file name for one key/value pair.
     *
     * <p>Bug fix: the original declared this with a {@code Configuration} third
     * parameter, which does not match the framework hook
     * {@code generateFileNameForKeyValue(K, V, String)} — so it never overrode
     * anything and the per-letter split never ran.
     *
     * @param key   record key; only its first character is inspected
     * @param value record value (not used in the naming decision)
     * @param name  default leaf name proposed by the framework (not used)
     * @return "a.txt".."z.txt" for keys starting with a letter, else "other.txt"
     */
    @Override
    protected String generateFileNameForKeyValue(Text key, IntWritable value, String name) {
        String s = key.toString();
        // Guard: an empty key would make charAt(0) throw StringIndexOutOfBoundsException.
        if (s.isEmpty()) {
            return "other.txt";
        }
        // Locale-independent lowercasing of the single routing character.
        char c = Character.toLowerCase(s.charAt(0));
        if (c >= 'a' && c <= 'z') {
            return c + ".txt";
        }
        return "other.txt";
    }

    /**
     * Creates the writer that actually produces each underlying file.
     *
     * <p>The original stub returned {@code null}, which crashes with an NPE on
     * the first record written. Delegating to the standard
     * {@link TextOutputFormat} emits the usual "key&lt;TAB&gt;value" text lines.
     *
     * @throws IOException if the underlying writer cannot be created
     */
    @Override
    protected RecordWriter<Text, IntWritable> getBaseRecordWriter(
            FileSystem fs, JobConf job, String name, Progressable progress)
            throws IOException {
        return new TextOutputFormat<Text, IntWritable>().getRecordWriter(fs, job, name, progress);
    }
}
|
教程中说只需重写 generateFileNameForKeyValue 方法就能实现按 key 分文件输出的效果；但在实践中，由于 MultipleOutputFormat 是抽象类，还必须实现它的另一个抽象方法 getBaseRecordWriter。
getBaseRecordWriter 负责创建真正写出每个文件的 RecordWriter，通常直接委托给 TextOutputFormat 即可。
最后通过 conf.setOutputFormat(MultipleOutputFormatTest.class) 把这个输出格式设置到作业配置上。
本文转自 拖鞋崽 51CTO博客,原文链接:http://blog.51cto.com/1992mrwang/1206459