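This article gives a rough walkthrough of implementing serialization in C++.
Serialization here means, simply put, turning the data of a C++ object (object in the broad sense: a built-in type or a user-defined type) into a char* buffer, that is, a sequence of single bytes, which makes the data easy to transmit, among other uses.
Some basic C/C++ knowledge is assumed.
Topics touched on include function overloading, operator overloading, std::allocator, lvalues and rvalues, inheritance, and templates.
Serialization: data object =====> byte array
Deserialization: byte array =====> data object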
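To make the two arrows concrete, here is a minimal sketch that round-trips a single integer through a byte array. The names toBytes and fromBytes are made up for this illustration and are not part of the article's code. The bytes are in the host's byte order, so this is only meant for the same machine or machines with the same layout.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Serialization: copy the object representation of a value into a byte array.
std::vector<char> toBytes(std::int32_t value)
{
    std::vector<char> bytes(sizeof(value));
    std::memcpy(bytes.data(), &value, sizeof(value));
    return bytes;
}

// Deserialization: rebuild the value from the byte array.
// Returns false when the array is too short to hold the value.
bool fromBytes(const std::vector<char> &bytes, std::int32_t &value)
{
    if (bytes.size() < sizeof(value)) {
        return false;
    }
    std::memcpy(&value, bytes.data(), sizeof(value));
    return true;
}
```

The rest of the article builds the same idea out to many types, with a header and type tags added so the receiver knows what it is reading.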
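First, we use one class to do the work: DataStream holds the data and is used to serialize and deserialize it. The class needs a byte array, plus an object that records information about that array. To make the byte array easier to work with, we define our own class, CharVec. The information about the array goes into a separate class, DataHeader, which is written at the start of the array so that the data is easy to parse.
Let's look at CharVec first (modeled on the example in C++ Primer). The code is as follows: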
// CharVec.h
#ifndef CHARVEC_H
#define CHARVEC_H
#include <cstddef> // size_t
#include <memory>
#include <string>
#include <utility> // std::pair
class CharVec
{
public:
CharVec();
CharVec(const CharVec &vec);
CharVec &operator =(const CharVec &vec);
~CharVec();
bool operator ==(const CharVec &vec) const;
size_t size() const;
size_t capacity() const;
char *begin() const;
char *end() const;
void push(const char *data, int len);
void push(const std::string &str);
void push(char c);
void removeFromFront(int len);
void clear();
private:
void checkAndAlloc();
void reallocate();
void free();
std::pair<char *, char *> allocAndCopy(char *begin, char *end);
private:
char *m_Elements; // first element
char *m_FirstFree; // one past the last stored element
char *m_Cap; // one past the end of the allocated memory
std::allocator<char> m_Allocator; // memory allocator
};
#endif // CHARVEC_H
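This is a simple implementation of vector<char>. It could also be written as a class template, in which case it would be a general vector.
Without further ado, look at the declaration: it declares the constructor, copy constructor, copy assignment operator, and destructor, and overloads the == operator. Anyone who knows std::vector will know the difference between size() and capacity(): size is the number of elements actually stored, and capacity is the number of elements the allocated memory can hold. Then there are three overloads of push for adding data, removeFromFront for removing data from the front of the vector, and so on.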
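The size/capacity distinction is easy to see with std::vector<char> itself. The helper below is a small sketch written for this illustration (fillAfterReserve and SizeAndCapacity are made-up names): it reserves memory, stores fewer elements than it reserved, and reports both numbers.

```cpp
#include <cstddef>
#include <vector>

struct SizeAndCapacity {
    std::size_t size;
    std::size_t capacity;
};

// Reserve room for `reserved` chars, then store `stored` of them.
SizeAndCapacity fillAfterReserve(std::size_t reserved, std::size_t stored)
{
    std::vector<char> v;
    v.reserve(reserved);               // allocates memory, stores nothing
    for (std::size_t i = 0; i < stored; ++i) {
        v.push_back('x');              // each push_back raises size() by one
    }
    return {v.size(), v.capacity()};
}
```

Calling fillAfterReserve(16, 3) gives a size of 3 and a capacity of at least 16. CharVec tracks the same two quantities with m_FirstFree and m_Cap.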
Next, the implementation:
// CharVec.cpp
#include "CharVec.h"
CharVec::CharVec() :
m_Elements(nullptr),
m_FirstFree(nullptr),
m_Cap(nullptr)
{}
CharVec::CharVec(const CharVec &vec)
{
auto newData = allocAndCopy(vec.begin(), vec.end());
m_Elements = newData.first;
m_FirstFree = newData.second;
m_Cap = newData.second;
}
CharVec &CharVec::operator =(const CharVec &vec)
{
auto newData = allocAndCopy(vec.begin(), vec.end());
free();
m_Elements = newData.first;
m_FirstFree = newData.second;
m_Cap = newData.second;
return *this;
}
CharVec::~CharVec()
{
free();
}
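bool CharVec::operator ==(const CharVec &vec) const
{
// Compare contents, not addresses: two buffers are equal when they hold the same bytes.
if (size() != vec.size()) {
return false;
}
for (size_t i = 0; i != size(); ++i) {
if (m_Elements[i] != vec.m_Elements[i]) {
return false;
}
}
return true;
}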
size_t CharVec::size() const
{
return m_FirstFree - m_Elements;
}
size_t CharVec::capacity() const
{
return m_Cap - m_Elements;
}
char *CharVec::begin() const
{
return m_Elements;
}
char *CharVec::end() const
{
return m_FirstFree;
}
void CharVec::push(const char *data, int len)
{
if (len <= 0) {
return ;
}
for (int i = 0; i < len; ++i) {
push(data[i]);
}
}
void CharVec::push(const std::string &str)
{
push(str.c_str(), str.size());
}
void CharVec::push(char c)
{
checkAndAlloc();
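// allocator_traits is used because allocator::construct was removed in C++20.
std::allocator_traits<std::allocator<char>>::construct(m_Allocator, m_FirstFree++, c);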
}
void CharVec::removeFromFront(int len)
{
if (len <= 0 || static_cast<size_t>(len) > size()) {
return ;
}
// A block obtained from allocate() can only be released as a whole,
// so shift the remaining bytes to the front instead of freeing a prefix.
char *src = m_Elements + len;
char *dst = m_Elements;
while (src != m_FirstFree) {
*dst++ = *src++;
}
m_FirstFree = dst;
}
void CharVec::clear()
{
free();
m_Elements = nullptr;
m_FirstFree = nullptr;
m_Cap = nullptr;
}
void CharVec::checkAndAlloc()
{
if (size() == capacity()) {
reallocate();
}
}
void CharVec::reallocate()
{
auto newCapacity = size() ? 2 * size() : 1;
auto newData = m_Allocator.allocate(newCapacity);
auto dest = newData;
auto ele = m_Elements;
for (size_t i = 0; i != size(); ++i) {
std::allocator_traits<std::allocator<char>>::construct(m_Allocator, dest++, std::move(*ele++));
}
free();
m_Elements = newData;
m_FirstFree = dest;
m_Cap = m_Elements + newCapacity;
}
void CharVec::free()
{
if (m_Elements) {
for (auto p = m_FirstFree; p != m_Elements;) {
std::allocator_traits<std::allocator<char>>::destroy(m_Allocator, --p);
}
m_Allocator.deallocate(m_Elements, m_Cap - m_Elements);
}
}
std::pair<char *, char *> CharVec::allocAndCopy(char *begin, char *end)
{
auto startPos = m_Allocator.allocate(end - begin);
return {startPos, std::uninitialized_copy(begin, end, startPos)};
}
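Start with push. Every overload ends up calling push(char), which first calls checkAndAlloc. That function checks whether size equals capacity and, if so, calls reallocate. reallocate allocates a new block twice the current size (or one element if the vector is empty), transfers the existing data into it, and then frees the old block. The transfer uses std::move; for char this is the same as a copy, but in a templated version it would avoid copying elements that are expensive to copy. The memory comes from std::allocator, which obtains memory much like new or malloc, but keeps allocating memory separate from constructing objects, so storage can be reserved first and elements constructed in it only when they are added. That covers push. removeFromFront discards the first len bytes, starting at m_Elements.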
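The split between allocating memory and constructing objects is easier to see with an element type that has a real constructor. The following is a sketch, not the article's code: buildInRawStorage is a made-up name, and it goes through std::allocator_traits because the construct and destroy members of std::allocator were deprecated in C++17 and removed in C++20.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Builds `count` copies of `text` in raw storage and returns their total length.
std::size_t buildInRawStorage(std::size_t count, const std::string &text)
{
    using Alloc = std::allocator<std::string>;
    using Traits = std::allocator_traits<Alloc>;

    Alloc alloc;
    std::string *first = Traits::allocate(alloc, count);  // memory only, no objects yet
    std::string *last = first;
    std::size_t total = 0;

    for (std::size_t i = 0; i < count; ++i) {
        Traits::construct(alloc, last, text);  // now an object is created in place
        total += last->size();
        ++last;
    }
    while (last != first) {
        Traits::destroy(alloc, --last);        // end each object's lifetime first
    }
    Traits::deallocate(alloc, first, count);   // then return the whole block at once
    return total;
}
```

Note that deallocate receives the same pointer and count that allocate was called with; a block cannot be handed back in pieces. The sketch is not exception-safe: if a constructor throws, the block leaks.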
Next, the declaration of the DataHeader class:
// DataHeader.h
#ifndef DATAHEADER_H
#define DATAHEADER_H
struct DataHeader
{
DataHeader(int id = 0);
bool operator==(const DataHeader &header);
void reset();
const static int s_HeaderLen = 3 * sizeof(int);
int m_Id;
int m_HeaderLen;
int m_TotalLen;
};
#endif // DATAHEADER_H
The declaration is simple: it just stores the id, the header length, and the total length, so there is not much to say about it.
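As a rough picture of how three ints become the first 12 bytes of the buffer, here is a sketch using a stand-in struct. MiniHeader, packHeader, and unpackHeader are hypothetical names for this illustration; the field order (total length, header length, id) follows the writeHeader function shown later in the article.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for DataHeader, holding only the three int fields.
struct MiniHeader {
    int totalLen;
    int headerLen;
    int id;
};

// Writes the fields back to back: total length, header length, id.
std::vector<char> packHeader(const MiniHeader &h)
{
    std::vector<char> buf(3 * sizeof(int));
    std::memcpy(buf.data(), &h.totalLen, sizeof(int));
    std::memcpy(buf.data() + sizeof(int), &h.headerLen, sizeof(int));
    std::memcpy(buf.data() + 2 * sizeof(int), &h.id, sizeof(int));
    return buf;
}

// Reads the fields back in the same order; fails if the buffer is too short.
bool unpackHeader(const char *data, std::size_t len, MiniHeader &h)
{
    if (data == nullptr || len < 3 * sizeof(int)) {
        return false;
    }
    std::memcpy(&h.totalLen, data, sizeof(int));
    std::memcpy(&h.headerLen, data + sizeof(int), sizeof(int));
    std::memcpy(&h.id, data + 2 * sizeof(int), sizeof(int));
    return true;
}
```

Copying each field with memcpy avoids depending on struct padding and works even when the incoming buffer is not aligned for int.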
Now for the main event, DataStream. First the declaration:
// DataStream.h
#ifndef DATASTREAM_H
#define DATASTREAM_H
#include <cstdint> // int64_t
#include <cstring> // memcpy
#include <list>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <vector>
#include "DataHeader.h"
#include "CharVec.h"
class CustomTypeInterface;
class DataStream
{
public:
DataStream(std::unique_ptr<DataHeader> *header = nullptr);
DataStream(const DataStream &stream);
DataStream& operator =(const DataStream &stream);
enum class DataType : char {
UnKnown,
Boolean,
Char,
WChar,
Int,
UInt,
Int64,
Double,
String,
WString,
Vector,
List,
Map,
Set,
CustomType,
};
bool operator == (const DataStream &stream) const;
// Sizes of the data stored in the byte array
int totalSize() const { return m_Header->m_TotalLen; }
int headerSize() const { return m_Header->m_HeaderLen; }
int dataSize() const {return m_Header->m_TotalLen - m_Header->m_HeaderLen;}
void clear();
// write
void writeHeader();
void writeData(const char *data, int len);
DataStream& operator<<(char val);
void writeVal(char val);
DataStream& operator<<(wchar_t val);
void writeVal(wchar_t val);
DataStream& operator <<(bool val);
void writeVal(bool val);
DataStream& operator <<(int val);
void writeVal(int val);
DataStream& operator <<(unsigned int val);
void writeVal(unsigned int val);
DataStream& operator <<(int64_t val);
void writeVal(int64_t val);
DataStream& operator <<(double val);
void writeVal(double val);
DataStream& operator <<(const std::string &val);
void writeVal(const std::string &val);
DataStream& operator <<(const std::wstring &val);
void writeVal(const std::wstring &val);
DataStream& operator <<(CustomTypeInterface *val);
void writeVal(CustomTypeInterface *val);
template<typename T>
DataStream& operator <<(const std::vector<T>& val);
template<typename T>
void writeVal(const std::vector<T>& val);
template<typename T>
DataStream& operator <<(const std::list<T>& val);
template<typename T>
void writeVal(const std::list<T>& val);
template<typename T1, typename T2>
DataStream& operator <<(const std::map<T1, T2>& val);
template<typename T1, typename T2>
void writeVal(const std::map<T1, T2>& val);
template<typename T>
DataStream& operator <<(const std::set<T>& val);
template<typename T>
void writeVal(const std::set<T>& val);
// read
void readHeader(const char *data);
template<typename T>
bool readData(T *val);
bool operator>>(char &val);
bool readVal(char &val);
bool operator>>(wchar_t& val);
bool readVal(wchar_t &val);
bool operator>>(bool &val);
bool readVal(bool &val);
bool operator>>(int &val);
bool readVal(int &val);
bool operator>>(unsigned int &val);
bool readVal(unsigned int &val);
bool operator>>(int64_t &val);
bool readVal(int64_t &val);
bool operator>>(double &val);
bool readVal(double &val);
bool operator>>(std::string &val);
bool readVal(std::string &val);
bool operator>>(std::wstring &val);
bool readVal(std::wstring &val);
bool operator>>(CustomTypeInterface *val);
bool readVal(CustomTypeInterface *val);
template<typename T>
bool operator>>(std::vector<T> &val);
template<typename T>
bool readVal(std::vector<T> &val);
template<typename T>
bool operator>>(std::list<T> &val);
template<typename T>
bool readVal(std::list<T> &val);
template<typename T1, typename T2>
bool operator>>(std::map<T1, T2> &val);
template<typename T1, typename T2>
bool readVal(std::map<T1, T2> &val);
template<typename T>
bool operator>>(std::set<T> &val);
template<typename T>
bool readVal(std::set<T> &val);
// Serialize and Deserialize
int Serialize(char *buf) const;
bool Deserialize(const char *buf, int len);
private:
std::unique_ptr<DataHeader> m_Header;
CharVec m_DataBuffer;
bool m_IsFirstWrite;
};
#endif // DATASTREAM_H
I have pasted all of the declarations here; a brief walkthrough follows.
The supported types are:
enum class DataType : char {
UnKnown,
Boolean,
Char,
WChar,
Int,
UInt,
Int64,
Double,
String,
WString,
Vector,
List,
Map,
Set,
CustomType,
};
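Readers can extend this list with their own types, though I think these are enough. The ": char" after the enum name means that the underlying type of the enumeration is char, so each enumerator is stored in one byte. Writing to and reading from the class each come in two forms: stream operators (<< and >>) and functions (writeVal and readVal). Finally there are the two key functions Serialize and Deserialize, which, as the names suggest, perform serialization and deserialization.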
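The effect of the ": char" underlying type can be checked at compile time. The enum below is a cut-down, hypothetical copy of DataType (Tag, toByte, and fromByte are names invented for this sketch).

```cpp
#include <type_traits>

// Hypothetical tag enum with the same shape as DataStream::DataType.
enum class Tag : char {
    Unknown,
    Boolean,
    Char,
};

static_assert(std::is_same<std::underlying_type<Tag>::type, char>::value,
              "Tag is stored as a char");
static_assert(sizeof(Tag) == 1, "so a tag takes exactly one byte in the buffer");

// An enum class never converts implicitly, so the cast is explicit both ways.
char toByte(Tag tag) { return static_cast<char>(tag); }
Tag fromByte(char byte) { return static_cast<Tag>(byte); }
```

This is why the implementation can write a type tag with sizeof(char) and compare a char read from the buffer against (char)DataType::Char.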
Next, the implementation. There is quite a lot of code, so I will pick out a few examples:
#include "DataStream.h"
#include "CustomTypeInterface.h"
#include <cstring> // memcpy
DataStream::DataStream(std::unique_ptr<DataHeader> *header) :
m_IsFirstWrite(true)
{
if (header == nullptr) {
m_Header.reset(new DataHeader);
}
else {
m_Header.reset(header->release());
}
}
DataStream::DataStream(const DataStream &stream)
{
operator =(stream);
}
DataStream &DataStream::operator =(const DataStream &stream)
{
if (&stream == this) {
return *this;
}
m_Header.reset(new DataHeader);
*m_Header = *stream.m_Header;
m_DataBuffer = stream.m_DataBuffer;
m_IsFirstWrite = stream.m_IsFirstWrite;
return *this;
}
bool DataStream::operator ==(const DataStream &stream) const
{
if (&stream == this) {
return true;
}
// Compare the header contents; comparing the unique_ptr addresses would never match.
if (*m_Header == *stream.m_Header &&
m_DataBuffer == stream.m_DataBuffer) {
return true;
}
return false;
}
void DataStream::clear()
{
m_IsFirstWrite = true;
m_DataBuffer.clear();
m_Header->reset();
}
void DataStream::writeHeader()
{
int headerLen = DataHeader::s_HeaderLen;
writeData((char *)&(m_Header->m_TotalLen), sizeof(int));
writeData((char *)&headerLen, sizeof(int));
writeData((char *)&m_Header->m_Id, sizeof(int));
m_Header->m_HeaderLen = headerLen;
}
void DataStream::writeData(const char *data, int len)
{
if (len == 0) {
return ;
}
if (m_IsFirstWrite) {
m_IsFirstWrite = false;
writeHeader();
}
m_DataBuffer.push(data, len);
m_Header->m_TotalLen += len;
memcpy(m_DataBuffer.begin(), &m_Header->m_TotalLen, sizeof(int));
}
void DataStream::writeVal(char val)
{
char type = (char)DataType::Char;
writeData((char *)&(type), sizeof(char));
writeData(&val, sizeof(char));
}
void DataStream::writeVal(const std::string &val)
{
char type = (char)DataType::String;
writeData((char *)&(type), sizeof(char));
int size = val.size();
writeVal(size);
writeData(val.c_str(), size);
}
void DataStream::writeVal(CustomTypeInterface *val)
{
val->serialize(*this, (char)DataType::CustomType);
}
void DataStream::readHeader(const char *data)
{
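// Copy field by field: data may not be aligned for direct int access.
memcpy(&m_Header->m_TotalLen, data, sizeof(int));
memcpy(&m_Header->m_HeaderLen, data + sizeof(int), sizeof(int));
memcpy(&m_Header->m_Id, data + 2 * sizeof(int), sizeof(int));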
m_Header->m_TotalLen -= m_Header->m_HeaderLen;
m_Header->m_HeaderLen = 0;
}
bool DataStream::readVal(char &val)
{
char type = 0;
if (readData(&type) && type == (char)DataType::Char) {
return readData(&val);
}
return false;
}
bool DataStream::readVal(std::string &val)
{
char type = 0;
if (readData(&type) && type == (char)DataType::String) {
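int len = 0;
// Reject a missing or implausible length instead of reporting success.
if (!readVal(len) || len < 0 || len > (int)m_DataBuffer.size()) {
return false;
}
val.assign(m_DataBuffer.begin(), len);
m_DataBuffer.removeFromFront(len);
m_Header->m_TotalLen -= len;
return true;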
}
return false;
}
bool DataStream::readVal(CustomTypeInterface *val)
{
return val->deserialize(*this, (char)DataType::CustomType);
}
int DataStream::Serialize(char *buf) const
{
int totalLen = m_Header->m_TotalLen;
int size = m_DataBuffer.size();
if (size <= 0 || totalLen == 0 || size != totalLen) {
return 0;
}
memcpy(buf, m_DataBuffer.begin(), totalLen);
return totalLen;
}
bool DataStream::Deserialize(const char *buf, int len)
{
// The buffer must hold at least a complete header.
if (buf == nullptr || len < DataHeader::s_HeaderLen) {
return false;
}
readHeader(buf);
m_DataBuffer.clear();
m_DataBuffer.push(buf + DataHeader::s_HeaderLen, len - DataHeader::s_HeaderLen);
return true;
}
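The stream operators call the corresponding readVal and writeVal functions.
Take the simplest case, writeVal(char): it writes the type tag first and then the data. The actual writing is done by writeData. On the first write it writes the header first, which also updates totalLen; then it writes the data and updates totalLen again. totalLen is the first field in the buffer, so it can be refreshed in place after every write, and this is where CharVec's begin() comes in handy. Writing a string works the same way, with the length written between the type tag and the characters.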
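The tag-then-payload layout can be sketched on its own, leaving out the header and totalLen bookkeeping. Everything here is hypothetical: the tag values, the helper names, and the use of std::vector<char> in place of CharVec are choices made for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical one-byte tags; the values are arbitrary for this sketch.
const char kCharTag = 1;
const char kIntTag = 2;
const char kStringTag = 3;

// Appends len raw bytes to the end of buf.
void appendBytes(std::vector<char> &buf, const void *data, std::size_t len)
{
    const char *p = static_cast<const char *>(data);
    buf.insert(buf.end(), p, p + len);
}

// char: tag byte, then the character itself.
void writeChar(std::vector<char> &buf, char val)
{
    buf.push_back(kCharTag);
    buf.push_back(val);
}

// int: tag byte, then the int's bytes in host order.
void writeInt(std::vector<char> &buf, int val)
{
    buf.push_back(kIntTag);
    appendBytes(buf, &val, sizeof(val));
}

// string: tag byte, then the length as a tagged int, then the characters.
void writeString(std::vector<char> &buf, const std::string &val)
{
    buf.push_back(kStringTag);
    writeInt(buf, static_cast<int>(val.size()));
    appendBytes(buf, val.data(), val.size());
}
```

Because the length goes through writeInt, it carries its own tag, which mirrors how writeVal(const std::string &) calls writeVal(size) in the article.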
Now for reading a char: read the type tag first, then the data. The details are in readData; since it is a function template, it lives in the header file. The implementation:
template<typename T>
bool DataStream::readData(T *val)
{
int size = m_DataBuffer.size();
int count = sizeof(T);
if (size < count) {
return false;
}
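// Copy the bytes rather than casting the buffer to T*, which would be an
// unaligned, aliasing-unsafe read; memcpy is declared in <cstring>.
memcpy(val, m_DataBuffer.begin(), count);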
m_DataBuffer.removeFromFront(count);
m_Header->m_TotalLen -= count;
return true;
}
It first copies the value out of the front of the data buffer, removes those bytes, and then updates totalLen.
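The same read-from-the-front step can be tried in isolation. This is a sketch over std::vector<char> rather than CharVec, and popFront is a made-up name; the static_assert is an extra guard that the article's readData does not have.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Copies a T out of the front of buf and removes those bytes.
// Returns false, leaving buf untouched, when fewer than sizeof(T) bytes remain.
template <typename T>
bool popFront(std::vector<char> &buf, T &val)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "popFront copies raw bytes");
    if (buf.size() < sizeof(T)) {
        return false;
    }
    std::memcpy(&val, buf.data(), sizeof(T));  // works for any alignment of buf.data()
    buf.erase(buf.begin(),
              buf.begin() + static_cast<std::ptrdiff_t>(sizeof(T)));
    return true;
}
```

Erasing from the front of a vector shifts every remaining byte, so this costs time proportional to the buffer size on each read. That is fine for a sketch; a read offset would avoid the shifting.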
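That is nearly everything. But wait, is something missing? Right, user-defined types have not been covered yet: if you define your own struct, how do you transmit it? The approach is to define an interface class for user-defined structs.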
class CustomTypeInterface
{
public:
virtual ~CustomTypeInterface() = default;
virtual void serialize(DataStream &stream, char type) const = 0;
virtual bool deserialize(DataStream &stream, char type) = 0;
};
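serialize and deserialize are not called by the user, nor written by hand by the user; they are there for DataStream to call. What they do is simply write the struct's members to the DataStream (or read them back). But how do they know which members the struct has? We use a macro for that.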
#define SerializeAndDeserialize(className, params) \
void serialize(DataStream &stream, char type) const override \
{ \
InOperator in(stream, type, #className); \
in * params; \
} \
\
bool deserialize(DataStream &stream, char type) override \
{ \
OutOperator out(stream, type, #className); \
out * params; \
return out.isSuccess(); \
}
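The macro implements both functions. It takes the class name and the members, which allows selective serialization: some members can be serialized and others left out, which is a need worth supporting. Two helper classes then do the reading and writing: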
class InOperator
{
public:
explicit InOperator(DataStream &stream,
char type,
const std::string &className) :
m_InStream(stream) {
m_InStream.writeData(&type, sizeof(char));
m_InStream << className;
}
InOperator(const InOperator &) = delete;
InOperator& operator=(const InOperator &) = delete;
template<typename T>
InOperator& operator *(const T &param) {
m_InStream << param;
return *this;
}
private:
DataStream &m_InStream;
};
class OutOperator
{
public:
explicit OutOperator(DataStream &stream,
char type,
const std::string &className);
OutOperator(const OutOperator &) = delete;
OutOperator& operator=(const OutOperator &) = delete;
template<typename T>
OutOperator& operator* (T &param) {
if (!m_IsSuccess) {
return *this;
}
if (!(m_OutStream >> param)) {
m_IsSuccess = false;
}
return *this;
}
bool isSuccess() const {
return m_IsSuccess;
}
private:
DataStream &m_OutStream;
bool m_IsSuccess;
};
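One of these classes handles input into the stream and the other output from it, and both overload the * operator, so the members are passed like this: SerializeAndDeserialize(name, m_M1 * m_M2);
Inside the overloaded * operator each member is written to or read from the stream. This is a little hard to follow; the point is that the member list has to be captured somehow. The macro captures it where the class is declared, and the generated functions then use those members whenever the object is written or read.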
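The trick is easier to see with the stream stripped away. In this sketch, MiniWriter, MINI_SERIALIZE, and Point are all hypothetical names standing in for InOperator, SerializeAndDeserialize, and a user struct; instead of writing bytes, the writer just records each member as text.

```cpp
#include <string>
#include <vector>

// Hypothetical collector standing in for InOperator: operator* records each
// member and returns *this, so "w * a * b" visits a, then b.
class MiniWriter
{
public:
    MiniWriter &operator*(int val)
    {
        m_Fields.push_back(std::to_string(val));
        return *this;
    }
    MiniWriter &operator*(bool val)
    {
        m_Fields.push_back(val ? "true" : "false");
        return *this;
    }
    const std::vector<std::string> &fields() const { return m_Fields; }

private:
    std::vector<std::string> m_Fields;
};

// Hypothetical macro in the style of SerializeAndDeserialize: `params` is pasted
// after "w *" inside a member function, where the member names are in scope.
#define MINI_SERIALIZE(className, params)                            \
    std::vector<std::string> describe() const                        \
    {                                                                \
        MiniWriter w;                                                \
        w * params;                                                  \
        std::vector<std::string> out(1, #className);                 \
        out.insert(out.end(), w.fields().begin(), w.fields().end()); \
        return out;                                                  \
    }

struct Point {
    MINI_SERIALIZE(Point, m_X * m_Flag)
    int m_X = 3;
    bool m_Flag = true;
};
```

Because * is left-associative, w * m_X * m_Flag parses as (w * m_X) * m_Flag, so the two members are never multiplied with each other. For a default-constructed Point, describe() returns "Point", "3", "true".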
That completes the explanation. Let's test it.
#include <iostream>
#include "DataStream.h"
#include "CustomTypeInterface.h"
class Test : public CustomTypeInterface
{
public:
SerializeAndDeserialize(Test, m_A * m_B);
public:
int m_A;
bool m_B;
};
int main(int argc, char *argv[])
{
char c1 = 'c';
Test t;
t.m_A = 1;
t.m_B = false;
DataStream stream;
stream.writeVal(c1);
stream.writeVal(&t);
int size = stream.totalSize();
char *data = new char[size];
stream.Serialize(data);
DataStream stream2;
stream2.Deserialize(data, size);
char c2;
Test t2;
stream2.readVal(c2);
stream2.readVal(&t2);
std::cout << c2 << t2.m_A << t2.m_B;
delete[] data; // release the buffer allocated with new[]
return 0;
}
Running it prints the values read back from stream2, which should match c1 and the fields of t.
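That's it. Comments and discussion are welcome.
This was written with reference to C++ Primer and code by a senior engineer at my company, and is recorded here mainly as notes for myself.
Code: https://github.com/zhangdexin/Serialization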