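This post mainly traces the code path for "serialization: writing", touching on a few other things along the way. The example is the addressbook sample that ships with protobuf.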
For now I mainly care about two points: how fields are managed and how the protocol is managed.
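My approach to reading the code is this: since I don't know the full set of requirements behind it, I first work out what each module does, then reason forward from its behavior, and finally connect all the modules together.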
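Note that, as a reader, I know little about the environment, requirements and history this code came from, so I gloss over some details and may raise questions of my own, but I don't compare or judge the methods inside. The goal is to present them as they are.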
I won't draw flowcharts; I'm too lazy. This isn't anything formal anyway, just a running log.
1. Field management: ordinary fields
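Each member variable gets several kinds of accessors, and some setters come with several overloads. For example, given
message Person {
required string name = 1;
}
the generated accessors are: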
inline bool has_name() const;
inline void clear_name();
inline const ::std::string& name() const;
inline void set_name(const ::std::string& value);
inline void set_name(const char* value);
inline void set_name(const char* value, size_t size);
plus three functions for the field's presence bit:
inline bool Person::has_name() const {
return (_has_bits_[0] & 0x00000001u) != 0;
}
inline void Person::set_has_name() {
_has_bits_[0] |= 0x00000001u;
}
inline void Person::clear_has_name() {
_has_bits_[0] &= ~0x00000001u;
}
The presence bits are stored in:
::google::protobuf::uint32 _has_bits_[(4 + 31) / 32];
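The array size (4 + 31) / 32 reserves one bit per field, rounded up to whole 32-bit words; Person in addressbook.proto has 4 fields, so a single uint32 is enough. set_name() and clear_name() call the corresponding presence-bit functions.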
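To make the bookkeeping concrete, here is a hand-written sketch that mimics the generated pattern for one string field: every setter overload stores the value and sets bit 0 of _has_bits_, and clear_name() resets both. PersonSketch and kFieldCount are hypothetical names; this is only the shape of the generated code under the one-bit-per-field assumption, not the generated code itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a generated message with 4 fields; only `name` is modeled.
class PersonSketch {
 public:
  static const int kFieldCount = 4;  // name, id, email, phone in addressbook.proto

  bool has_name() const { return (_has_bits_[0] & 0x00000001u) != 0; }
  const std::string& name() const { return name_; }

  // Every overload marks the field as present before storing the value.
  void set_name(const std::string& value) { set_has_name(); name_ = value; }
  void set_name(const char* value) { set_has_name(); name_.assign(value); }
  void set_name(const char* value, std::size_t size) { set_has_name(); name_.assign(value, size); }

  // Clearing resets both the value and its presence bit.
  void clear_name() { name_.clear(); clear_has_name(); }

 private:
  void set_has_name() { _has_bits_[0] |= 0x00000001u; }
  void clear_has_name() { _has_bits_[0] &= ~0x00000001u; }

  std::string name_;
  // One bit per field, rounded up to whole 32-bit words: (4 + 31) / 32 == 1.
  std::uint32_t _has_bits_[(kFieldCount + 31) / 32] = {};
};
```

With a fifth to 32nd field the array would still be one word; the 33rd field would add a second uint32.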
Because a value is bound to the tag in "xxx = tag", a tag must never be reused if you want to stay forward and backward compatible.
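On the wire, the tag number is combined with a 3-bit wire type into a key, (field_number << 3) | wire_type, so the number, not the field name, identifies a field. A small sketch of that computation; the enum and function names are my own, for illustration only:

```cpp
#include <cstdint>

// Wire types from the protobuf encoding (deprecated group types 3 and 4 omitted).
enum WireType : std::uint32_t {
  kVarint = 0,           // int32, int64, uint32, bool, enum, ...
  kFixed64 = 1,          // fixed64, sfixed64, double
  kLengthDelimited = 2,  // string, bytes, embedded messages
  kFixed32 = 5,          // fixed32, sfixed32, float
};

// Builds the key that precedes every field on the wire.
constexpr std::uint32_t MakeKey(std::uint32_t field_number, WireType type) {
  return (field_number << 3) | type;
}

// A reader splits the key back apart to decide which field it is looking at.
constexpr std::uint32_t FieldNumberOf(std::uint32_t key) { return key >> 3; }
constexpr std::uint32_t WireTypeOf(std::uint32_t key) { return key & 0x7; }
```

Since a reader only ever sees the key, data written under a retired number would be decoded into whatever new field later reuses that number.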
On the serialization side, Message declares entry points such as:
bool SerializeToFileDescriptor(int file_descriptor) const;
bool SerializePartialToFileDescriptor(int file_descriptor) const;
bool SerializeToOstream(ostream* output) const;
bool SerializePartialToOstream(ostream* output) const;
bool Message::SerializeToOstream(ostream* output) const {
{
io::OstreamOutputStream zero_copy_output(output);
if (!SerializeToZeroCopyStream(&zero_copy_output)) return false;
}
return output->good();
}
bool MessageLite::SerializeToZeroCopyStream(
io::ZeroCopyOutputStream* output) const {
io::CodedOutputStream encoder(output);
return SerializeToCodedStream(&encoder);
}
bool MessageLite::SerializeToCodedStream(io::CodedOutputStream* output) const {
GOOGLE_DCHECK(IsInitialized()) << InitializationErrorMessage("serialize", *this);
return SerializePartialToCodedStream(output);
}
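Most of these come in a Partial and a non-Partial variant. The non-Partial one only adds an IsInitialized() check (a GOOGLE_DCHECK, so debug builds only) and then delegates, so every path ends up in SerializePartialToCodedStream. The call chain is SerializeToOstream → SerializeToZeroCopyStream → SerializeToCodedStream → SerializePartialToCodedStream, whose body is: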
bool MessageLite::SerializePartialToCodedStream(
io::CodedOutputStream* output) const {
const int size = ByteSize(); // Force size to be cached.
uint8* buffer = output->GetDirectBufferForNBytesAndAdvance(size);
if (buffer != NULL) {
uint8* end = SerializeWithCachedSizesToArray(buffer);
if (end - buffer != size) {
ByteSizeConsistencyError(size, ByteSize(), end - buffer);
}
return true;
} else {
int original_byte_count = output->ByteCount();
SerializeWithCachedSizes(output);
if (output->HadError()) {
return false;
}
int final_byte_count = output->ByteCount();
if (final_byte_count - original_byte_count != size) {
ByteSizeConsistencyError(size, ByteSize(),
final_byte_count - original_byte_count);
}
return true;
}
}
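1) There are two ways to write: SerializeWithCachedSizesToArray, used when the CodedOutputStream can hand out size contiguous bytes at once, and SerializeWithCachedSizes, which writes through the stream otherwise. Either way, the number of bytes actually written is checked against the size computed up front by ByteSize().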
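The decision can be modeled without protobuf. In this sketch ToySink, ToyMessage and SerializePartial are hypothetical: the sink only returns a contiguous block when its current chunk has enough room (the same contract as GetDirectBufferForNBytesAndAdvance), and otherwise the message falls back to writing byte by byte. The one-byte length assumes a name shorter than 128 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical sink with one fixed-size chunk that is flushed when full.
class ToySink {
 public:
  explicit ToySink(std::size_t chunk_size) : chunk_(chunk_size) { assert(chunk_size > 0); }

  // Either `size` contiguous bytes in the current chunk, or nullptr.
  std::uint8_t* GetDirectBufferForNBytesAndAdvance(std::size_t size) {
    if (chunk_.size() - used_ < size) return nullptr;
    std::uint8_t* p = chunk_.data() + used_;
    used_ += size;
    return p;
  }
  void WriteByte(std::uint8_t b) {  // slow path: one byte at a time
    if (used_ == chunk_.size()) Flush();
    chunk_[used_++] = b;
  }
  void Flush() {
    out_.append(reinterpret_cast<const char*>(chunk_.data()), used_);
    used_ = 0;
  }
  std::size_t ByteCount() const { return out_.size() + used_; }
  const std::string& out() const { return out_; }

 private:
  std::vector<std::uint8_t> chunk_;
  std::size_t used_ = 0;
  std::string out_;
};

// Hypothetical message with one string field (name = 1); assumes name.size() < 128.
struct ToyMessage {
  std::string name;
  std::size_t ByteSize() const { return 2 + name.size(); }  // key + length + bytes
  std::uint8_t* SerializeToArray(std::uint8_t* target) const {
    *target++ = 0x0A;  // key: field 1, wire type 2
    *target++ = static_cast<std::uint8_t>(name.size());
    std::memcpy(target, name.data(), name.size());
    return target + name.size();
  }
  void SerializeToSink(ToySink* sink) const {
    sink->WriteByte(0x0A);
    sink->WriteByte(static_cast<std::uint8_t>(name.size()));
    for (char c : name) sink->WriteByte(static_cast<std::uint8_t>(c));
  }
};

// Compute the size first, then take the contiguous fast path if possible.
bool SerializePartial(const ToyMessage& m, ToySink* sink, bool* used_fast_path) {
  const std::size_t size = m.ByteSize();
  if (std::uint8_t* buffer = sink->GetDirectBufferForNBytesAndAdvance(size)) {
    *used_fast_path = true;
    return static_cast<std::size_t>(m.SerializeToArray(buffer) - buffer) == size;
  }
  *used_fast_path = false;
  const std::size_t before = sink->ByteCount();
  m.SerializeToSink(sink);
  return sink->ByteCount() - before == size;
}
```

Both paths must produce identical bytes; the fast path simply avoids a per-byte bounds check.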
class LIBPROTOBUF_EXPORT CodedOutputStream {
public:
// Create an CodedOutputStream that writes to the given ZeroCopyOutputStream.
explicit CodedOutputStream(ZeroCopyOutputStream* output);
// Skips a number of bytes, leaving the bytes unmodified in the underlying
// buffer. Returns false if an underlying write error occurs. This is
// mainly useful with GetDirectBufferPointer().
bool Skip(int count);
// Sets *data to point directly at the unwritten part of the
// CodedOutputStream's underlying buffer, and *size to the size of that
// buffer, but does not advance the stream's current position. This will
// always either produce a non-empty buffer or return false. If the caller
// writes any data to this buffer, it should then call Skip() to skip over
// the consumed bytes. This may be useful for implementing external fast
// serialization routines for types of data not covered by the
// CodedOutputStream interface.
bool GetDirectBufferPointer(void** data, int* size);
// If there are at least "size" bytes available in the current buffer,
// returns a pointer directly into the buffer and advances over these bytes.
// The caller may then write directly into this buffer (e.g. using the
// *ToArray static methods) rather than go through CodedOutputStream. If
// there are not enough bytes available, returns NULL. The return pointer is
// invalidated as soon as any other non-const method of CodedOutputStream
// is called.
inline uint8* GetDirectBufferForNBytesAndAdvance(int size);
// Write raw bytes, copying them from the given buffer.
void WriteRaw(const void* buffer, int size);
// Like WriteRaw() but writing directly to the target array.
// This is _not_ inlined, as the compiler often optimizes memcpy into inline
// copy loops. Since this gets called by every field with string or bytes
// type, inlining may lead to a significant amount of code bloat, with only a
// minor performance gain.
static uint8* WriteRawToArray(const void* buffer, int size, uint8* target);
// Equivalent to WriteRaw(str.data(), str.size()).
void WriteString(const string& str);
// Like WriteString() but writing directly to the target array.
static uint8* WriteStringToArray(const string& str, uint8* target);
// Write a 32-bit little-endian integer.
void WriteLittleEndian32(uint32 value);
// Returns the total number of bytes written since this object was created.
inline int ByteCount() const;
// Returns true if there was an underlying I/O error since this object was
// created.
bool HadError() const { return had_error_; }
private:
GOOGLE_DISALLOW_EVIL_CONSTRUCTORS(CodedOutputStream);
ZeroCopyOutputStream* output_;
uint8* buffer_;
int buffer_size_;
int total_bytes_; // Sum of sizes of all buffers seen so far.
bool had_error_; // Whether an error occurred during output.
// Advance the buffer by a given number of bytes.
void Advance(int amount);
// Called when the buffer runs out to request more data. Implies an
// Advance(buffer_size_).
bool Refresh();
};
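This class does the following:
1) It holds a ZeroCopyOutputStream (output_).
2) It holds a raw uint8* buffer_; all the Write functions operate on it, and that is the model the whole class is built around.
3) buffer_ and the ZeroCopyOutputStream are bridged by Refresh().
4) Refresh() obtains a new buffer_ by calling output_->Next(), and Next() is necessarily virtual, since ZeroCopyOutputStream is an abstract interface.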
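Points 2) to 4) can be reduced to a small model. In this sketch ZeroCopyOut stands in for ZeroCopyOutputStream (only Next and BackUp), StringZeroCopyOut is a made-up implementation that grows a std::string one block per Next(), and ToyCodedOut keeps a raw buffer_/buffer_size_ pair that it refills through the virtual Next(), handing the unused tail back with BackUp() when destroyed. All names and the block size are assumptions for illustration, not protobuf's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Stand-in for ZeroCopyOutputStream: the stream owns the memory it hands out.
class ZeroCopyOut {
 public:
  virtual ~ZeroCopyOut() {}
  virtual bool Next(void** data, int* size) = 0;  // hand out a fresh writable block
  virtual void BackUp(int count) = 0;             // take back the unused tail
};

// Hypothetical implementation that grows a std::string by `block` bytes per Next().
class StringZeroCopyOut : public ZeroCopyOut {
 public:
  StringZeroCopyOut(std::string* target, int block) : target_(target), block_(block) {}
  bool Next(void** data, int* size) override {
    const std::size_t old_size = target_->size();
    target_->resize(old_size + block_);
    *data = &(*target_)[old_size];  // earlier pointers may be invalidated here
    *size = block_;
    return true;
  }
  void BackUp(int count) override { target_->resize(target_->size() - count); }

 private:
  std::string* target_;
  int block_;
};

// Toy CodedOutputStream: buffer_ always points into memory from output_->Next().
class ToyCodedOut {
 public:
  explicit ToyCodedOut(ZeroCopyOut* output) : output_(output) { assert(output != nullptr); }
  ~ToyCodedOut() {
    if (buffer_size_ > 0) output_->BackUp(buffer_size_);  // return what was not written
  }
  void WriteRaw(const void* data, int size) {
    const std::uint8_t* src = static_cast<const std::uint8_t*>(data);
    while (size > buffer_size_) {  // fill the rest of this block, then get a new one
      if (buffer_size_ > 0) std::memcpy(buffer_, src, buffer_size_);
      src += buffer_size_;
      size -= buffer_size_;
      if (!Refresh()) return;
    }
    if (size > 0) std::memcpy(buffer_, src, size);
    Advance(size);
  }
  bool HadError() const { return had_error_; }

 private:
  void Advance(int amount) { buffer_ += amount; buffer_size_ -= amount; }
  bool Refresh() {  // the old block is fully used; ask the stream for the next one
    void* block = nullptr;
    if (!output_->Next(&block, &buffer_size_)) {
      had_error_ = true;
      buffer_ = nullptr;
      buffer_size_ = 0;
      return false;
    }
    buffer_ = static_cast<std::uint8_t*>(block);
    return true;
  }

  ZeroCopyOut* output_;
  std::uint8_t* buffer_ = nullptr;
  int buffer_size_ = 0;
  bool had_error_ = false;
};
```

Because buffer_ points into memory the stream already owns, each byte is copied exactly once, straight into its final place.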
The XXXOutputStream classes are structured like this; taking OstreamOutputStream as an example, the simplified source is:
class LIBPROTOBUF_EXPORT OstreamOutputStream : public ZeroCopyOutputStream {
public:
// Creates a stream that writes to the given C++ ostream.
// If a block_size is given, it specifies the size of the buffers
// that should be returned by Next(). Otherwise, a reasonable default
// is used.
explicit OstreamOutputStream(ostream* stream, int block_size = -1);
~OstreamOutputStream();
// implements ZeroCopyOutputStream ---------------------------------
bool Next(void** data, int* size);
void BackUp(int count);
int64 ByteCount() const;
private:
class LIBPROTOBUF_EXPORT CopyingOstreamOutputStream : public CopyingOutputStream {
public:
CopyingOstreamOutputStream(ostream* output);
~CopyingOstreamOutputStream();
// implements CopyingOutputStream --------------------------------
bool Write(const void* buffer, int size);
private:
// The stream.
ostream* output_;
GOOGLE_DISALLOW_EVIL_CONSTRUCTORS(CopyingOstreamOutputStream);
};
CopyingOstreamOutputStream copying_output_;
CopyingOutputStreamAdaptor impl_;
GOOGLE_DISALLOW_EVIL_CONSTRUCTORS(OstreamOutputStream);
};
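1) OstreamOutputStream itself inherits from ZeroCopyOutputStream.
2) It has a nested class, CopyingOstreamOutputStream, which inherits from CopyingOutputStream.
3) It has two members: copying_output_ and impl_ (a CopyingOutputStreamAdaptor).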
Let's first look at how OstreamOutputStream interacts with copying_output_ and impl_.
// implements ZeroCopyOutputStream ---------------------------------
bool Next(void** data, int* size);
void BackUp(int count);
int64 ByteCount() const;
bool OstreamOutputStream::Next(void** data, int* size) {
return impl_.Next(data, size);
}
void OstreamOutputStream::BackUp(int count) {
impl_.BackUp(count);
}
int64 OstreamOutputStream::ByteCount() const {
return impl_.ByteCount();
}
copying_output_ exists only to construct impl_:
OstreamOutputStream::OstreamOutputStream(ostream* output, int block_size)
: copying_output_(output),
impl_(&copying_output_, block_size) {
}
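So OstreamOutputStream and impl_ (a CopyingOutputStreamAdaptor) both implement the ZeroCopyOutputStream interface, but the real work happens in impl_; OstreamOutputStream only fixes the interface. copying_output_, by contrast, derives from CopyingOutputStream.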
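Next, follow CopyingOutputStreamAdaptor:
1) It holds scoped_array<uint8> buffer_ and CopyingOutputStream* copying_stream_.
2) Most of its code revolves around buffer_: bookkeeping fields, the current position, writing, and so on.
3) buffer_ and copying_stream_ interact mainly through the virtual function Write, for example: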
if (copying_stream_->Write(buffer_.get(), buffer_used_)) {
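4) buffer_ is one contiguous block whose size is passed in from outside (the block_size argument, with a default when none is given).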
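The division of labor among these classes can be sketched as follows. The names end in "Sketch" on purpose: the bodies are my own simplified guesses, not the library source, and the ZeroCopyOutputStream base class is left out to keep it short. CopyingOutSketch only knows how to Write a finished block somewhere; AdaptorSketch owns the contiguous buffer, hands out the free part through Next, takes back the unused tail through BackUp, and calls Write when the buffer is full or on destruction; StringOutSketch plays OstreamOutputStream and just forwards. The 8192-byte default is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Role of CopyingOutputStream: the one thing a new destination must implement.
class CopyingOutSketch {
 public:
  virtual ~CopyingOutSketch() {}
  virtual bool Write(const void* buffer, int size) = 0;  // push bytes to the destination
};

// Hypothetical destination, playing the part of CopyingOstreamOutputStream.
class StringCopyingOut : public CopyingOutSketch {
 public:
  explicit StringCopyingOut(std::string* target) : target_(target) {}
  bool Write(const void* buffer, int size) override {
    target_->append(static_cast<const char*>(buffer), size);
    return true;
  }

 private:
  std::string* target_;
};

// Role of CopyingOutputStreamAdaptor: Next/BackUp/ByteCount written once.
class AdaptorSketch {
 public:
  AdaptorSketch(CopyingOutSketch* stream, int block_size)
      : copying_stream_(stream),
        buffer_size_(block_size > 0 ? block_size : 8192),
        buffer_(new std::uint8_t[buffer_size_]) {
    assert(stream != nullptr);
  }
  ~AdaptorSketch() { WriteBuffer(); }

  bool Next(void** data, int* size) {
    if (buffer_used_ == buffer_size_ && !WriteBuffer()) return false;
    *data = buffer_.get() + buffer_used_;
    *size = buffer_size_ - buffer_used_;
    buffer_used_ = buffer_size_;  // the caller now owns the rest of the buffer
    return true;
  }
  void BackUp(int count) { buffer_used_ -= count; }
  std::int64_t ByteCount() const { return position_ + buffer_used_; }

 private:
  bool WriteBuffer() {  // hand the filled part of buffer_ to the destination
    if (buffer_used_ == 0) return true;
    if (!copying_stream_->Write(buffer_.get(), buffer_used_)) return false;
    position_ += buffer_used_;
    buffer_used_ = 0;
    return true;
  }

  CopyingOutSketch* copying_stream_;
  int buffer_size_;
  std::unique_ptr<std::uint8_t[]> buffer_;  // scoped_array<uint8> in the original
  int buffer_used_ = 0;
  std::int64_t position_ = 0;
};

// Role of OstreamOutputStream: owns the destination and forwards to impl_.
class StringOutSketch {
 public:
  explicit StringOutSketch(std::string* target, int block_size = -1)
      : copying_output_(target), impl_(&copying_output_, block_size) {}
  bool Next(void** data, int* size) { return impl_.Next(data, size); }
  void BackUp(int count) { impl_.BackUp(count); }
  std::int64_t ByteCount() const { return impl_.ByteCount(); }

 private:
  StringCopyingOut copying_output_;  // declared first, so it outlives impl_
  AdaptorSketch impl_;
};
```

A different destination would only need its own CopyingOutSketch subclass; the buffering logic stays shared.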
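That covers roughly what each major module does; now let's tie them together.
A user-defined message inherits from google::protobuf::Message. To serialize it to some medium, you write something like: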
std::fstream output(filename.c_str(), std::ios::out | std::ios::trunc | std::ios::binary);
addressbook.SerializeToOstream(&output);
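In SerializeToXXX, the XXX target can also be user-defined.
The destination is wrapped in an I/O stream class, for example FileOutputStream or OstreamOutputStream (we follow the latter). Both implement the ZeroCopyOutputStream interface, which requires these three functions: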
bool Next(void** data, int* size);
void BackUp(int count);
int64 ByteCount() const;
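To make implementing these three functions convenient and uniform, the library only asks the user to override the step that actually pushes data out. That step is abstracted as the CopyingOutputStream class, which has a single function, bool Write(const void* buffer, int size); it writes the contents of a buffer to the third-party destination.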
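Next, BackUp and ByteCount can then be implemented once and reused, which is what CopyingOutputStreamAdaptor does.
It inherits from ZeroCopyOutputStream, mainly so that it is bound by the Next/BackUp/ByteCount interface. OstreamOutputStream, which owns it as impl_, implements its own Next, BackUp and ByteCount simply by forwarding to CopyingOutputStreamAdaptor.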
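(At first the relationship among OstreamOutputStream, CopyingOutputStream and CopyingOutputStreamAdaptor confused me a bit, but once it was sorted out the layering turned out to be quite clean.)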
CopyingOutputStreamAdaptor maintains scoped_array<uint8> buffer_ and calls CopyingOutputStream's Write to flush the buffered data to the destination.
OK, with OstreamOutputStream in place, the next layer is CodedOutputStream.
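CodedOutputStream serves two kinds of callers: one goes through ZeroCopyOutputStream* output_, i.e. the OstreamOutputStream assembled above; the other uses static functions such as WriteRawToArray and WriteStringToArray, which third parties can call directly on a raw array.
CodedOutputStream's uint8* buffer_ is not memory it allocates itself: its value comes straight from output_ (via Next() inside Refresh()), so bytes are written directly into the underlying stream's buffer. That is why the interface is called ZeroCopyOutputStream.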
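Finally, MessageLite::SerializePartialToCodedStream decides whether to call the virtual function SerializeWithCachedSizesToArray or SerializeWithCachedSizes. (The base-class SerializeWithCachedSizesToArray falls back to SerializeWithCachedSizes; code generated with the default optimize_for = SPEED overrides it to write into the array directly.)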
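Inside the virtual function SerializeWithCachedSizesToArray, which receives a uint8* target, the message's tags and values are written in order as tag | length | value; the length part only appears for length-delimited types such as string, bytes and embedded messages.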
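The layout is easy to reproduce by hand. This sketch encodes a Person with name = 1 (string) and id = 2 (int32) from addressbook.proto; WriteVarint and EncodePerson are hypothetical helper names. The varint field id is written as key followed by value, with no length.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Appends `value` as a base-128 varint: 7 bits per byte, low bits first,
// high bit set on every byte except the last.
void WriteVarint(std::uint64_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Writes the fields in field-number order.
std::string EncodePerson(const std::string& name, std::int32_t id) {
  std::string out;
  WriteVarint((1 << 3) | 2, &out);  // key: field 1, wire type 2 (length-delimited)
  WriteVarint(name.size(), &out);   // length
  out += name;                      // value
  WriteVarint((2 << 3) | 0, &out);  // key: field 2, wire type 0 (varint)
  // int32 is sign-extended to 64 bits, so a negative id takes 10 bytes.
  WriteVarint(static_cast<std::uint64_t>(id), &out);
  return out;
}
```

For name "Bob" and id 150 this gives 0A 03 42 6F 62 10 96 01; decoding the last two bytes back yields 0x16 + (0x01 << 7) = 22 + 128 = 150.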
4. Serialization: reading
The code structure mirrors the write side; the main thing to look at is the final MergePartialFromCodedStream function.
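Conceptually, MergePartialFromCodedStream loops over keys, dispatches on the field number, and skips what it does not recognize, which is what lets old readers tolerate newly added tags. Below is a hand-written sketch of such a loop for the same two fields; PersonFields, ReadVarint and DecodePerson are hypothetical names, it is not the generated code, and it only handles varint and length-delimited wire types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

struct PersonFields {
  bool has_name = false;
  std::string name;
  bool has_id = false;
  std::int32_t id = 0;
};

// Reads a base-128 varint starting at *pos; returns false on truncated input.
bool ReadVarint(const std::string& in, std::size_t* pos, std::uint64_t* value) {
  *value = 0;
  for (int shift = 0; shift < 64 && *pos < in.size(); shift += 7) {
    const std::uint8_t byte = static_cast<std::uint8_t>(in[(*pos)++]);
    *value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return true;
  }
  return false;
}

// Read a key, switch on the field number, skip unknown fields.
bool DecodePerson(const std::string& in, PersonFields* p) {
  std::size_t pos = 0;
  while (pos < in.size()) {
    std::uint64_t key = 0, v = 0;
    if (!ReadVarint(in, &pos, &key)) return false;
    const std::uint64_t field = key >> 3, type = key & 7;
    if (type == 0) {  // varint
      if (!ReadVarint(in, &pos, &v)) return false;
      if (field == 2) { p->id = static_cast<std::int32_t>(v); p->has_id = true; }
    } else if (type == 2) {  // length-delimited
      if (!ReadVarint(in, &pos, &v) || v > in.size() - pos) return false;
      if (field == 1) { p->name.assign(in, pos, v); p->has_name = true; }
      pos += v;  // unknown length-delimited fields are skipped here
    } else {
      return false;  // other wire types are not handled in this sketch
    }
  }
  return true;
}
```

Field 3 (email in addressbook.proto) is unknown to this sketch, so its bytes are stepped over rather than rejected.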