Converting ordinary files to Hadoop serialized files (SequenceFiles) locally

chenchenrao

Article link: https://blog.csdn.net/chenchenrao/article/details/38733565

Recently a project migration required converting files from one format to another.

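There are three options. The first is to convert the files to Hadoop serialized files (SequenceFiles) on the local machine. The second is to call the Hadoop API locally and write the converted files straight to HDFS (provided, of course, that the local machine can connect to the HDFS environment). The third is to process the files with Hadoop MapReduce (MR), in which case the output is naturally on HDFS, as required.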

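Until now I had always used the third approach, generating the Hadoop serialized files with MR. This time I tried the first two, so I am recording them here as a supplement.

The code is posted below.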

1. Converting files to Hadoop serialized files locally

Use the Hadoop API on the local machine to generate Hadoop serialized files [Windows 7 environment]

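The advantage is that the raw files do not have to be uploaded to HDFS, and running against local data makes testing easy, with little hassle. That is just my own view.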

The Java code is as follows:

// Requires imports of java.io.IOException, org.apache.hadoop.conf.Configuration and
// org.apache.hadoop.io.SequenceFile.Writer. IFileSystem and the unqualified
// FileSystem, Path and SequenceFile.Reader appear to belong to the source
// system's own API, not to org.apache.hadoop.
public static void main(String[] args) throws Exception {
    String input = "C:\\Users\\xxx\\Downloads\\test";
    String outputDir = "d:\\20140818\\part-0000";
    org.apache.hadoop.fs.Path output = new org.apache.hadoop.fs.Path(outputDir);

    Class outkeyClass = org.apache.hadoop.io.LongWritable.class;
    Class outvalueClass = xxx.data.Info.class;

    // Reusable holders for the records read from the source files
    LongWritable key = new LongWritable();
    Info value = new Info();

    // Point Hadoop at the local file system instead of a cluster
    Configuration conf = new Configuration();
    conf.set("mapred.job.tracker", "local");
    conf.set("fs.default.name", "file:///");
    org.apache.hadoop.fs.FileSystem hdfs = org.apache.hadoop.fs.FileSystem.get(conf);

    Writer writer = org.apache.hadoop.io.SequenceFile.createWriter(hdfs, conf, output, outkeyClass, outvalueClass);

    IFileSystem opfs = FileSystem.getNamed("local"); // the File interface could be adapted here

    org.apache.hadoop.io.LongWritable outputkey = new org.apache.hadoop.io.LongWritable();
    xxx.data.Info outputvalue = new xxx.data.Info();
    try {
        for (Path subFile : opfs.listPaths(new Path(input))) { // the File interface could be adapted here
            System.out.println(outputDir + subFile.getName());
            SequenceFile.Reader reader = new SequenceFile.Reader(opfs, subFile);
            int count = 0;
            try {
                while (reader.next(key, value)) {
                    try {
                        // Copy the key and value read from opfs into the Hadoop types
                        if (key instanceof LongWritable) {
                            outputkey.set(((LongWritable) key).get());
                        }

                        if (value instanceof Info) {
                            outputvalue.setUserId(((Info) value).getUserId());
                            ...
                        }

                        writer.append(outputkey, outputvalue);
                        count++;
                        if (count > 2000) { // stop after 2001 records per input file
                            break;
                        }
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                reader.close();
            }
        }
    } finally {
        // Close the shared writer once, after every input file has been processed;
        // closing it inside the loop would break the append for the second file.
        writer.close();
    }
}
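
The value class handed to createWriter (xxx.data.Info above) has to be something Hadoop can serialize. With Hadoop's default serialization settings that means implementing org.apache.hadoop.io.Writable, whose two methods are write(DataOutput) and readFields(DataInput). The sketch below imitates that contract using only the JDK so it runs without Hadoop on the classpath. InfoSketch, its nickname field and the toBytes/fromBytes helpers are hypothetical names made up for illustration; only the userId accessor pair comes from the article's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a value class such as xxx.data.Info. A real class
// would declare "implements org.apache.hadoop.io.Writable"; write and
// readFields below have the same signatures as that interface's methods.
class InfoSketch {
    private long userId;            // mirrors getUserId/setUserId in the article
    private String nickname = "";   // invented field, for illustration only

    public long getUserId() { return userId; }
    public void setUserId(long userId) { this.userId = userId; }
    public String getNickname() { return nickname; }
    public void setNickname(String nickname) { this.nickname = nickname; }

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeLong(userId);
        out.writeUTF(nickname);
    }

    // Read the fields back in exactly the order they were written.
    public void readFields(DataInput in) throws IOException {
        userId = in.readLong();
        nickname = in.readUTF();
    }

    // Helper: serialize one record into a byte array.
    public static byte[] toBytes(InfoSketch info) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            info.write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper: rebuild a record from its serialized bytes.
    public static InfoSketch fromBytes(byte[] bytes) {
        try {
            InfoSketch info = new InfoSketch();
            info.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return info;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InfoSketch original = new InfoSketch();
        original.setUserId(42L);
        original.setNickname("demo");
        InfoSketch copy = fromBytes(toBytes(original));
        System.out.println(copy.getUserId() + " " + copy.getNickname());
    }
}
```

Reusing one outputvalue object across writer.append calls, as the article's loop does, works with this kind of class because each append serializes the current field values at call time.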

2. Calling the Hadoop API from the local machine to write files directly to HDFS [requires that the local machine can ping the HDFS cluster]

// Requires imports of java.io.IOException, java.net.URI,
// org.apache.hadoop.conf.Configuration and org.apache.hadoop.io.SequenceFile.Writer.
// IFileSystem and the unqualified FileSystem, Path and SequenceFile.Reader appear
// to belong to the source system's own API, not to org.apache.hadoop.
public static void main(String[] args) throws Exception {
    String input = "/Info/2014-08-21/";

    Class outkeyClass = org.apache.hadoop.io.LongWritable.class;
    Class outvalueClass = xxx.data.Info.class;

    // Reusable holders for the records read from the source files
    LongWritable key = new LongWritable();
    Info value = new Info();

    Configuration conf = new Configuration();

    URI uri = URI.create("hdfs://hadoop-namenode-1896:9000"); // URI of the target HDFS
    org.apache.hadoop.fs.FileSystem hdfs = org.apache.hadoop.fs.FileSystem.get(uri, conf);

    IFileSystem opfs = FileSystem.getNamed("local"); // the File interface could be adapted here

    org.apache.hadoop.io.LongWritable outputkey = new org.apache.hadoop.io.LongWritable();
    xxx.data.Info outputvalue = new xxx.data.Info();

    String outputDir = "/home/buzztemp/blackholeuserinfo/20140820/"; // target directory on HDFS
    for (Path subFile : opfs.listPaths(new Path(input))) { // the File interface could be adapted here
        String outputfilename = outputDir + subFile.getName();
        System.out.println(outputfilename);
        SequenceFile.Reader reader = new SequenceFile.Reader(opfs, subFile);
        org.apache.hadoop.fs.Path output = new org.apache.hadoop.fs.Path(outputfilename);
        // One output SequenceFile on HDFS per input file
        Writer writer = org.apache.hadoop.io.SequenceFile.createWriter(hdfs, conf, output, outkeyClass, outvalueClass);

        try {
            while (reader.next(key, value)) {
                try {
                    // Copy the key and value read from opfs into the Hadoop types
                    if (key instanceof LongWritable) {
                        outputkey.set(((LongWritable) key).get());
                    }

                    if (value instanceof Info) {
                        outputvalue.setUserId(((Info) value).getUserId());
                        ...
                    }

                    writer.append(outputkey, outputvalue);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
            System.out.println("outputkey:" + outputkey.get() + " outputvalue:" + outputvalue.toString());
        } finally {
            // Close each resource separately so a failure in one does not leak the other
            try {
                writer.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
            try {
                reader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
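
The comments in both listings note that the input listing could be adapted to the File interface. A minimal sketch of that adaptation, using only the JDK's java.nio.file API, is given here. It lists the regular files directly under an input directory and derives each output name the same way the loop above does (outputDir + file name). LocalListingSketch, listInputFiles and outputNameFor are hypothetical names introduced for illustration, and reading the records from each file is left out because it depends on the source format.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: a JDK-only replacement for opfs.listPaths(...).
class LocalListingSketch {

    // Returns the regular files directly under inputDir, sorted by path.
    public static List<Path> listInputFiles(Path inputDir) {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inputDir)) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) { // skip subdirectories
                    files.add(entry);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(files);
        return files;
    }

    // Builds the output name as the article's loop does: outputDir + file name.
    public static String outputNameFor(String outputDir, Path inputFile) {
        return outputDir + inputFile.getFileName().toString();
    }

    public static void main(String[] args) {
        // The input directory defaults to the current directory (an assumption).
        Path inputDir = Paths.get(args.length > 0 ? args[0] : ".");
        String outputDir = "/home/buzztemp/blackholeuserinfo/20140820/";
        for (Path file : listInputFiles(inputDir)) {
            System.out.println(file + " -> " + outputNameFor(outputDir, file));
        }
    }
}
```

Each returned name can then be wrapped in an org.apache.hadoop.fs.Path and passed to createWriter, as in the listing above.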

3. Processing the files with Hadoop MR [this one is straightforward, so it is omitted]

Upload the raw files to HDFS; after MR has processed them, the output is written to HDFS.