Serialization and Deserialization with Java, Hadoop, and Avro

zxzLife

Article link: https://blog.csdn.net/weixin_41122339/article/details/82468116

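1. What are serialization and deserialization?

Serialization converts object data held in memory into a byte stream that can be saved to a file on disk.

Deserialization loads that serialized data back into memory and rebuilds the object. In both directions the data is a stream of bytes, which is also why serialized objects can be sent over a network.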

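2. This post introduces three ways to implement serialization

Java serialization

Hadoop serialization

Avro serialization, written by Doug Cutting, the creator of Hadoop

Another very convenient serialization technology is Google's protobuf. It is not covered here, but it is worth looking into on your own.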

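3. Java serialization

To use Java serialization, the class must implement the Serializable interface. It is a marker interface: it declares no methods and only flags that instances of the class may be serialized. The actual work is done by two classes in the java.io package: ObjectOutputStream.writeObject() serializes an object and ObjectInputStream.readObject() deserializes it. The code is below: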

//Javabean

import java.io.Serializable;

public class Person  implements Serializable{
	private int id;
	private String name;
	private String age;
	private String sex;
	public int getId() {
		return id;
	}
	public void setId(int id) {
		this.id = id;
	}
	public String getName() {
		return name;
	}
	public void setName(String name) {
		this.name = name;
	}
	public String getAge() {
		return age;
	}
	public void setAge(String age) {
		this.age = age;
	}
	public String getSex() {
		return sex;
	}
	public void setSex(String sex) {
		this.sex = sex;
	}
	@Override
	public String toString() {
		return "Person [id=" + id + ", name=" + name + ", age=" + age + ", sex=" + sex + "]";
	}
}

----------------------------------------------------------------------------------
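// Java serialization round trip (the wrapper class name is arbitrary)
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

import org.junit.Test;

public class JavaSerializationTest {
	@Test
	public void javaSeri() throws Exception {
		Person p = new Person();
		p.setId(1);
		p.setName("tom");
		p.setAge("14");
		p.setSex("男");
		// Serialize: write the object into an in-memory byte array
		ByteArrayOutputStream baos = new ByteArrayOutputStream();
		try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
			oos.writeObject(p);
		}
		// Deserialize: rebuild a Person from the same bytes
		ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
		try (ObjectInputStream ois = new ObjectInputStream(bais)) {
			Person pp = (Person) ois.readObject();
			System.out.println(pp);
		}
	}
}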

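4. Hadoop serialization

Anyone working with Hadoop ends up using its own serialization format, Writable. It is more compact and faster than Java serialization, and Hadoop ships Writable wrappers for many common data types.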
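The size difference can be illustrated with the JDK alone. The sketch below is not Hadoop code: it assumes a hypothetical Point class and compares ObjectOutputStream output with a Writable-style encoding that writes only the field values through DataOutputStream, which is the same kind of stream a Writable writes to.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SizeComparison {
    // Hypothetical value class, used only for this comparison.
    static final class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        final String name;

        Point(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Java serialization: stream header and class description plus field values.
    static byte[] javaBytes(Point p) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
                oos.writeObject(p);
            }
            return baos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writable-style encoding: field values only, in a fixed order.
    static byte[] dataBytes(Point p) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            try (DataOutputStream dos = new DataOutputStream(baos)) {
                dos.writeInt(p.id);    // 4 bytes
                dos.writeUTF(p.name);  // 2-byte length prefix + encoded characters
            }
            return baos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Point p = new Point(1, "tom");
        System.out.println("ObjectOutputStream: " + javaBytes(p).length + " bytes");
        System.out.println("DataOutputStream:   " + dataBytes(p).length + " bytes"); // 9 bytes
    }
}
```

The field-only encoding is smaller because the reader is expected to know the class and field order already, so none of that metadata travels with the data.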

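1. The Writable interface

Writable defines two methods: write(DataOutput) writes the object's state to a binary DataOutput stream, and readFields(DataInput) reads the state back from a binary DataInput stream. Both are implemented in the PersonWritble example later in this section.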
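To show the contract without needing Hadoop on the classpath, here is a minimal sketch that uses a stand-in interface. SimpleWritable and IdName are hypothetical names for this illustration; only the two method signatures mirror org.apache.hadoop.io.Writable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class WritableSketch {
    // Stand-in with the same two methods as Hadoop's Writable (assumption: not the real interface).
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical type with two fields.
    static final class IdName implements SimpleWritable {
        int id;
        String name = "";

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(id);
            out.writeUTF(name);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Fields must be read in exactly the order they were written.
            id = in.readInt();
            name = in.readUTF();
        }
    }

    // Serializes src to bytes, then fills a fresh instance from those bytes.
    static IdName roundTrip(IdName src) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            src.write(new DataOutputStream(baos));
            IdName copy = new IdName();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(baos.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        IdName src = new IdName();
        src.id = 7;
        src.name = "cat";
        IdName copy = roundTrip(src);
        System.out.println(copy.id + " " + copy.name);
    }
}
```

Note that readFields fills an existing instance instead of returning a new one, which lets one object be reused across many records.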

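2. Data types wrapped by Writable

Java primitive types and their Writable counterparts (for example, int maps to IntWritable, long to LongWritable, and boolean to BooleanWritable).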

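Higher-level types:


 1. Text: the counterpart of Java's String
   Unlike String, it is mutable
   It is backed by a byte array
   It uses UTF-8 encoding
   charAt(int); // returns the Unicode code point (an int) at the given byte position, not a char
   find(String); // returns the position of the match, similar to String.indexOf(), but as a byte offset
 2. NullWritable
    A placeholder
    A singleton (eagerly initialized)
    Writes and reads no bytes when serialized or deserialized
 3. ObjectWritable
	Handles primitive types, String, and arrays of these.
	Arbitrary custom classes are not supported unless they implement Writable
 4. MapWritable: the counterpart of a Java Map
 5. ArrayWritable: the counterpart of a Java array or List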
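Because Text stores UTF-8 bytes, positions in it are byte offsets, which differ from String char indexes as soon as the text contains non-ASCII characters. The sketch below uses only the JDK to show the difference; Utf8Offsets and byteOffsetOf are hypothetical helpers for illustration, not part of Hadoop.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Offsets {
    // Byte offset of the first occurrence of `what` in the UTF-8 encoding of `s`, or -1.
    static int byteOffsetOf(String s, String what) {
        int charIndex = s.indexOf(what);
        if (charIndex < 0) {
            return -1;
        }
        // Count the UTF-8 bytes that precede the match.
        return s.substring(0, charIndex).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = "caf\u00e9 au lait"; // the e-acute takes 2 bytes in UTF-8
        System.out.println("char index:  " + s.indexOf("au"));       // 5
        System.out.println("byte offset: " + byteOffsetOf(s, "au")); // 6
    }
}
```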

To implement Hadoop serialization, a class implements the Writable interface and its two methods, readFields and write.

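// Person is the class defined above
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class PersonWritble implements Writable {
	// Start with an empty Person so readFields() also works on a fresh instance
	private Person p = new Person();

	public Person getP() {
		return p;
	}

	public void setP(Person p) {
		this.p = p;
	}

	// Write the fields in a fixed order
	@Override
	public void write(DataOutput out) throws IOException {
		out.writeInt(p.getId());
		out.writeUTF(p.getName());
		out.writeUTF(p.getAge());
		out.writeUTF(p.getSex());
	}

	// Read the fields back in the same order they were written
	@Override
	public void readFields(DataInput in) throws IOException {
		p.setId(in.readInt());
		p.setName(in.readUTF());
		p.setAge(in.readUTF());
		p.setSex(in.readUTF());
	}

	@Override
	public String toString() {
		return p.toString();
	}
}
----------------------------------------------------------------------------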
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import org.junit.Before;
import org.junit.Test;

public class hadoopSerializtble {
	PersonWritble pw = new PersonWritble();

	@Before
	public void test1() throws Exception {
		// Hadoop serialization: runs before the test and writes the file
		Person p = new Person();
		p.setId(1);
		p.setName("cat");
		p.setAge("15");
		p.setSex("男");
		pw.setP(p);
		// try-with-resources closes the stream so the bytes reach the file
		try (DataOutputStream dos = new DataOutputStream(new FileOutputStream("E:/a.txt"))) {
			pw.write(dos);
		}
	}

	@Test
	public void test2() throws IOException {
		// Deserialization: read the fields back from the file
		try (DataInputStream dis = new DataInputStream(new FileInputStream("E:/a.txt"))) {
			pw.readFields(dis);
		}
		System.out.println(pw.toString());
	}
}

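5. Avro serialization

Avro is a language-independent data serialization system written by Doug Cutting, the creator of Hadoop. Its main goal is to address the biggest shortcoming of Writable: the lack of language portability. Avro data can be processed by many languages (C, C++, C#, Java, PHP, Python, and Ruby). This also makes the data more durable: it can outlive the language originally used to read and write it.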

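1. Avro overview:

A data serialization system

Rich data structures

A compact, fast, binary data format

A container file format for storing persistent data

Remote procedure calls (RPC)

Simple integration with dynamic languages

Schemas defined in JSON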
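To make "compact binary format" concrete, the sketch below hand-encodes one record the way the Avro specification describes binary encoding: an int or long is zig-zag encoded and written as a variable-length integer, a string is its UTF-8 byte length followed by the bytes, and a record is simply its fields in schema order. It assumes a record with a string field followed by an int field (the same shape as the Employee schema used later in this post). It is an illustration only, not the Avro library, and it produces just the record body, not a container file with its header.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class AvroEncodingSketch {
    // Zig-zag maps small negative and positive values to small unsigned values,
    // which are then written 7 bits at a time, lowest bits first.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // high bit set: more bytes follow
            z >>>= 7;
        }
        out.write((int) z);
    }

    // A string is its UTF-8 length (encoded as a long) followed by the bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // A record is the concatenation of its fields in schema order; no names or tags.
    static byte[] encodeEmployee(String name, int age) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        writeLong(out, age);
        return out.toByteArray();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encodeEmployee("tom", 11))); // 06 74 6f 6d 16
    }
}
```

The record takes 5 bytes because field names and types live in the schema, not in the data, which is why a reader always needs the schema to decode it.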

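2. Avro workflow

Write a schema file (.avsc) ------(compile the .avsc file)-------> generated Java source -------> use the Java class

3. Add the dependencies to the Maven pom file

(I did not show the pom file for the Hadoop serialization example above; I assume you are already familiar with it.)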

	<dependencies>
		<dependency>
			<groupId>junit</groupId>
			<artifactId>junit</artifactId>
			<version>4.10</version>
		</dependency>
		<dependency>
			<groupId>org.apache.avro</groupId>
			<artifactId>avro</artifactId>
			<version>1.8.0</version>
		</dependency>
	</dependencies>

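4. Avro primitive types

null, boolean, int, long, float, double, bytes, string

5. Avro complex types

record, enum, array, map, union, fixed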

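6. First approach: serialization with generated classes

a) Create the schema (H:\zxz\emp.txt)

{
    "type" : "record",
    "namespace" : "Tutorialspoint",
    "name" : "Employee",
    "fields" : [
        {"name" : "Name", "type" : "string"},
        {"name" : "age", "type" : "int"}
    ]
}
The schema is written in JSON, using the Avro primitive and complex types covered above.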

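b) Compile the schema (from a cmd prompt)

cmd>cd H:/zxz

cmd>java -jar avro-tools-1.8.0.jar compile schema emp.txt out (java -jar runs the Avro tools compiler on emp.txt; the "out" directory will then contain the generated Java file we need)

Note: compiling also requires two additional jar files, which can be downloaded from the Apache website; I will not go into the details here.

c) Add the generated Java file to the Maven project. If you are not using Maven, copy the jar files mentioned above into the lib directory of a plain Java project.

The generated Java source: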

/**
 * Autogenerated by Avro
 * 
 * DO NOT EDIT DIRECTLY
 */
package Tutorialspoint;  
@SuppressWarnings("all")
@org.apache.avro.specific.AvroGenerated
public class Employee extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
  private static final long serialVersionUID = -8873171083721622992L;
  public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"Employee\",\"namespace\":\"Tutorialspoint\",\"fields\":[{\"name\":\"Name\",\"type\":\"string\"},{\"name\":\"age\",\"type\":\"int\"}]}");
  public static org.apache.avro.Schema getClassSchema() { return SCHEMA$; }
  @Deprecated public java.lang.CharSequence Name;
  @Deprecated public int age;

  /**
   * Default constructor.  Note that this does not initialize fields
   * to their default values from the schema.  If that is desired then
   * one should use <code>newBuilder()</code>. 
   */
  public Employee() {}

  /**
   * All-args constructor.
   */
  public Employee(java.lang.CharSequence Name, java.lang.Integer age) {
    this.Name = Name;
    this.age = age;
  }

  public org.apache.avro.Schema getSchema() { return SCHEMA$; }
  // Used by DatumWriter.  Applications should not call. 
  public java.lang.Object get(int field$) {
    switch (field$) {
    case 0: return Name;
    case 1: return age;
    default: throw new org.apache.avro.AvroRuntimeException("Bad index");
    }
  }
  // Used by DatumReader.  Applications should not call. 
  @SuppressWarnings(value="unchecked")
  public void put(int field$, java.lang.Object value$) {
    switch (field$) {
    case 0: Name = (java.lang.CharSequence)value$; break;
    case 1: age = (java.lang.Integer)value$; break;
    default: throw new org.apache.avro.AvroRuntimeException("Bad index");
    }
  }

  /**
   * Gets the value of the 'Name' field.
   */
  public java.lang.CharSequence getName() {
    return Name;
  }

  /**
   * Sets the value of the 'Name' field.
   * @param value the value to set.
   */
  public void setName(java.lang.CharSequence value) {
    this.Name = value;
  }

  /**
   * Gets the value of the 'age' field.
   */
  public java.lang.Integer getAge() {
    return age;
  }

  /**
   * Sets the value of the 'age' field.
   * @param value the value to set.
   */
  public void setAge(java.lang.Integer value) {
    this.age = value;
  }

  /**
   * Creates a new Employee RecordBuilder.
   * @return A new Employee RecordBuilder
   */
  public static Tutorialspoint.Employee.Builder newBuilder() {
    return new Tutorialspoint.Employee.Builder();
  }
  
  /**
   * Creates a new Employee RecordBuilder by copying an existing Builder.
   * @param other The existing builder to copy.
   * @return A new Employee RecordBuilder
   */
  public static Tutorialspoint.Employee.Builder newBuilder(Tutorialspoint.Employee.Builder other) {
    return new Tutorialspoint.Employee.Builder(other);
  }
  
  /**
   * Creates a new Employee RecordBuilder by copying an existing Employee instance.
   * @param other The existing instance to copy.
   * @return A new Employee RecordBuilder
   */
  public static Tutorialspoint.Employee.Builder newBuilder(Tutorialspoint.Employee other) {
    return new Tutorialspoint.Employee.Builder(other);
  }
  
  /**
   * RecordBuilder for Employee instances.
   */
  public static class Builder extends org.apache.avro.specific.SpecificRecordBuilderBase<Employee>
    implements org.apache.avro.data.RecordBuilder<Employee> {

    private java.lang.CharSequence Name;
    private int age;

    /** Creates a new Builder */
    private Builder() {
      super(Tutorialspoint.Employee.SCHEMA$);
    }
    
    /**
     * Creates a Builder by copying an existing Builder.
     * @param other The existing Builder to copy.
     */
    private Builder(Tutorialspoint.Employee.Builder other) {
      super(other);
      if (isValidValue(fields()[0], other.Name)) {
        this.Name = data().deepCopy(fields()[0].schema(), other.Name);
        fieldSetFlags()[0] = true;
      }
      if (isValidValue(fields()[1], other.age)) {
        this.age = data().deepCopy(fields()[1].schema(), other.age);
        fieldSetFlags()[1] = true;
      }
    }
    
    /**
     * Creates a Builder by copying an existing Employee instance
     * @param other The existing instance to copy.
     */
    private Builder(Tutorialspoint.Employee other) {
            super(Tutorialspoint.Employee.SCHEMA$);
      if (isValidValue(fields()[0], other.Name)) {
        this.Name = data().deepCopy(fields()[0].schema(), other.Name);
        fieldSetFlags()[0] = true;
      }
      if (isValidValue(fields()[1], other.age)) {
        this.age = data().deepCopy(fields()[1].schema(), other.age);
        fieldSetFlags()[1] = true;
      }
    }

    /**
      * Gets the value of the 'Name' field.
      * @return The value.
      */
    public java.lang.CharSequence getName() {
      return Name;
    }

    /**
      * Sets the value of the 'Name' field.
      * @param value The value of 'Name'.
      * @return This builder.
      */
    public Tutorialspoint.Employee.Builder setName(java.lang.CharSequence value) {
      validate(fields()[0], value);
      this.Name = value;
      fieldSetFlags()[0] = true;
      return this; 
    }

    /**
      * Checks whether the 'Name' field has been set.
      * @return True if the 'Name' field has been set, false otherwise.
      */
    public boolean hasName() {
      return fieldSetFlags()[0];
    }


    /**
      * Clears the value of the 'Name' field.
      * @return This builder.
      */
    public Tutorialspoint.Employee.Builder clearName() {
      Name = null;
      fieldSetFlags()[0] = false;
      return this;
    }

    /**
      * Gets the value of the 'age' field.
      * @return The value.
      */
    public java.lang.Integer getAge() {
      return age;
    }

    /**
      * Sets the value of the 'age' field.
      * @param value The value of 'age'.
      * @return This builder.
      */
    public Tutorialspoint.Employee.Builder setAge(int value) {
      validate(fields()[1], value);
      this.age = value;
      fieldSetFlags()[1] = true;
      return this; 
    }

    /**
      * Checks whether the 'age' field has been set.
      * @return True if the 'age' field has been set, false otherwise.
      */
    public boolean hasAge() {
      return fieldSetFlags()[1];
    }


    /**
      * Clears the value of the 'age' field.
      * @return This builder.
      */
    public Tutorialspoint.Employee.Builder clearAge() {
      fieldSetFlags()[1] = false;
      return this;
    }

    @Override
    public Employee build() {
      try {
        Employee record = new Employee();
        record.Name = fieldSetFlags()[0] ? this.Name : (java.lang.CharSequence) defaultValue(fields()[0]);
        record.age = fieldSetFlags()[1] ? this.age : (java.lang.Integer) defaultValue(fields()[1]);
        return record;
      } catch (Exception e) {
        throw new org.apache.avro.AvroRuntimeException(e);
      }
    }
  }

  private static final org.apache.avro.io.DatumWriter
    WRITER$ = new org.apache.avro.specific.SpecificDatumWriter(SCHEMA$);  

  @Override 
  public void writeExternal(java.io.ObjectOutput out)
    throws java.io.IOException {
    WRITER$.write(this, org.apache.avro.specific.SpecificData.getEncoder(out));
  }

  private static final org.apache.avro.io.DatumReader
    READER$ = new org.apache.avro.specific.SpecificDatumReader(SCHEMA$);  

  @Override 
  public void readExternal(java.io.ObjectInput in)
    throws java.io.IOException {
    READER$.read(this, org.apache.avro.specific.SpecificData.getDecoder(in));
  }

}

Code for serialization and deserialization:


import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.reflect.ReflectDatumReader;
import org.apache.avro.reflect.ReflectDatumWriter;
import org.junit.Test;

import Tutorialspoint.Employee;

public class AvroTest {
	// Run this test first: it creates the file that testUnSerial reads
	@Test
	public void testSerial() throws IOException {
		Employee e = new Employee();
		e.setName("tom");
		e.setAge(1);
		// Create the datum writer and the container-file writer
		DatumWriter<Employee> dw = new ReflectDatumWriter<Employee>(Employee.class);
		DataFileWriter<Employee> dfw = new DataFileWriter<Employee>(dw);
		dfw.create(e.getSchema(), new File("H:/zxz/Seria.avro"));
		dfw.append(e);
		dfw.close();
	}

	@Test
	public void testUnSerial() throws IOException {
		// Create the datum reader and the container-file reader
		DatumReader<Employee> dr = new ReflectDatumReader<Employee>(Employee.class);
		DataFileReader<Employee> dfr = new DataFileReader<Employee>(new File("H:/zxz/Seria.avro"), dr);
		while (dfr.hasNext()) {
			Employee e = dfr.next();
			System.out.println(e.getName() + "   " + e.getAge());
		}
		dfr.close();
	}
}

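Second approach: Avro (de)serialization without the compile step. The schema is parsed at run time and the program is written directly against generic records.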

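import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.junit.Test;

// The wrapper class name is arbitrary
public class AvroGenericTest {
	// Serialization
	@Test
	public void AvroSerialNoCompile() throws IOException {
		// Build a Schema object from the schema file
		Schema schema = new Schema.Parser().parse(new File("H:/zxz/emp.txt"));
		// Create a generic record and set its fields by name
		GenericRecord e1 = new GenericData.Record(schema);
		e1.put("Name", "tom");
		e1.put("age", 11);
		// Write the record; GenericDatumWriter is the writer that matches GenericRecord
		DatumWriter<GenericRecord> dw = new GenericDatumWriter<>(schema);
		DataFileWriter<GenericRecord> dfw = new DataFileWriter<>(dw);
		dfw.create(schema, new File("H:/zxz/NoSeria.avro"));
		dfw.append(e1);
		dfw.close();
	}

	// Deserialization
	@Test
	public void AvroUnSerialNoCompile() throws IOException {
		// Build a Schema object from the schema file
		Schema schema = new Schema.Parser().parse(new File("H:/zxz/emp.txt"));

		DatumReader<GenericRecord> dr = new GenericDatumReader<>(schema);
		DataFileReader<GenericRecord> dfr = new DataFileReader<>(new File("H:/zxz/NoSeria.avro"), dr);
		while (dfr.hasNext()) {
			GenericRecord gr = dfr.next();
			System.out.println(gr.get("Name") + "   " + gr.get("age"));
		}
		dfr.close();
	}
}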
