Preface
Architecture
Apache Thrift API client/server architecture
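Thrift includes a complete stack for creating clients and servers.[7] The top layer is code generated from the Thrift definition file. From this file, client and processor code is generated for each service. The generated code also creates data structures other than the built-in types, and these are sent as results. The protocol and transport layers are part of the runtime library. With Thrift, you can define a service and then change its protocol and transport without recompiling the code. Besides the client part, Thrift includes server infrastructure that ties protocols and transports together, such as blocking, non-blocking, and multi-threaded servers. The underlying I/O part of the stack is implemented differently for each language.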
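Thrift supports a number of protocols:[7]
- TBinaryProtocol: a straightforward binary format that is simple but not optimized for space efficiency. Faster to process than a text protocol, but harder to debug.
- TCompactProtocol: a more compact binary format that is typically just as efficient to process.
- TDebugProtocol: a human-readable text format intended to aid debugging.
- TDenseProtocol: similar to TCompactProtocol, but strips the meta information from what is transmitted.
- TJSONProtocol: uses JSON to encode the data.
- TSimpleJSONProtocol: a write-only protocol that Thrift itself cannot parse, because it drops metadata when encoding as JSON. Suitable for parsing by scripting languages.[8]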
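The supported transports are:
- TFileTransport: writes to a file.
- TFramedTransport: required when using a non-blocking server. It sends data in frames, where each frame is preceded by its length.
- TMemoryTransport: uses memory for I/O. (The Java implementation uses a simple ByteArrayOutputStream.)
- TSocket: uses blocking socket I/O for transport.
- TZlibTransport: performs compression using zlib. Used in conjunction with another transport.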
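The framing idea behind TFramedTransport can be sketched in a few lines. This is a minimal standalone illustration, not the library's implementation: it assumes a 4-byte big-endian length prefix, and the helper names `frameMessage` and `unframeMessage` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Prepend a 4-byte big-endian length to the payload, producing one frame.
// (Assumption for this sketch: the length prefix is 4 bytes, big-endian.)
std::vector<uint8_t> frameMessage(const std::string& payload) {
  const uint32_t len = static_cast<uint32_t>(payload.size());
  std::vector<uint8_t> frame;
  frame.reserve(4 + payload.size());
  frame.push_back(static_cast<uint8_t>((len >> 24) & 0xFF));
  frame.push_back(static_cast<uint8_t>((len >> 16) & 0xFF));
  frame.push_back(static_cast<uint8_t>((len >> 8) & 0xFF));
  frame.push_back(static_cast<uint8_t>(len & 0xFF));
  frame.insert(frame.end(), payload.begin(), payload.end());
  return frame;
}

// Read one frame starting at `offset` and advance `offset` past it.
// Throws if the stream ends before the header or the body is complete.
std::string unframeMessage(const std::vector<uint8_t>& stream, std::size_t& offset) {
  if (offset > stream.size() || stream.size() - offset < 4) {
    throw std::runtime_error("truncated frame header");
  }
  const uint32_t len = (static_cast<uint32_t>(stream[offset]) << 24) |
                       (static_cast<uint32_t>(stream[offset + 1]) << 16) |
                       (static_cast<uint32_t>(stream[offset + 2]) << 8) |
                       static_cast<uint32_t>(stream[offset + 3]);
  offset += 4;
  if (stream.size() - offset < len) {
    throw std::runtime_error("truncated frame body");
  }
  const auto first = stream.begin() + static_cast<std::ptrdiff_t>(offset);
  std::string payload(first, first + static_cast<std::ptrdiff_t>(len));
  offset += len;
  return payload;
}
```

Because every frame announces its size up front, a non-blocking reader knows exactly how many bytes to wait for before handing a complete message to the protocol layer.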
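Thrift also provides a number of servers, including:
- TNonblockingServer: a multi-threaded server using non-blocking I/O (the Java implementation uses NIO channels). TFramedTransport must be used with this server.
- TSimpleServer: a single-threaded server using standard blocking I/O. Useful for testing.
- TThreadPoolServer: a multi-threaded server using standard blocking I/O.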
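Advantages
Some stated advantages of Thrift include:
- Cross-language serialization has lower overhead than alternatives such as SOAP, because it uses a binary format.
- It has a lean and clean library, with no framework to code against and no XML configuration files.
- The language bindings feel natural. For example, Java uses java.util.ArrayList<String>; C++ uses std::vector<std::string>.
- The application-level wire format and the serialization-level wire format are cleanly separated. Each can be modified independently.
- The predefined serialization styles include binary, HTTP-friendly, and compact binary.
- It doubles as cross-language file serialization.
- The protocol supports soft versioning. Thrift does not require a centralized, explicit versioning mechanism such as major-version/minor-version numbers. Loosely coupled teams can freely evolve their RPC calls.
- There are no build dependencies or non-standard software, and no mix of incompatible software licenses.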
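Soft versioning can be illustrated with a toy example. The sketch below assumes that compatibility rests on numeric field ids and on readers skipping ids they do not recognize; the wire format (a list of id/value pairs) and the names `WireRecord`, `PhoneV1`, and `readPhoneV1` are invented for illustration and are not Thrift's actual encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Toy wire format for this sketch: a list of (field id, string value) pairs.
using WireRecord = std::vector<std::pair<int16_t, std::string>>;

// What a "version 1" client knows about: only field ids 1 and 2.
struct PhoneV1 {
  std::string id;
  std::string number;
};

// A version 1 reader: it picks out the ids it knows and skips the rest,
// so records written by a newer schema can still be read.
PhoneV1 readPhoneV1(const WireRecord& wire) {
  PhoneV1 phone;
  for (const auto& field : wire) {
    switch (field.first) {
      case 1:
        phone.id = field.second;
        break;
      case 2:
        phone.number = field.second;
        break;
      default:
        break;  // unknown field id: ignore it instead of failing
    }
  }
  return phone;
}
```

A writer that later adds a field with id 3 does not break this reader, which is why no global version number has to be negotiated between teams.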
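Creating a Thrift service
Thrift is written in C++, but it can generate code for many languages. To create a Thrift service, you write Thrift files that describe it, generate code for the target language, and then write code to start the server and to call it from a client. Here is a code example of such a description file: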
enum PhoneType {
  HOME,
  WORK,
  MOBILE,
  OTHER
}
struct Phone {
  1: i32 id,
  2: string number,
  3: PhoneType type
}
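To make the definition above more concrete, here is a hand-written C++ approximation of the two types. It is not the output of the Thrift compiler, whose generated code has a different shape and also carries serialization methods; the enum values assume the IDL default of numbering from 0, and `toString` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hand-written mirror of the IDL enum (not compiler output).
// Assumption: values without explicit numbers are assigned starting at 0.
enum class PhoneType : int32_t { HOME = 0, WORK = 1, MOBILE = 2, OTHER = 3 };

// Hand-written mirror of the IDL struct; comments give the IDL field ids.
struct Phone {
  int32_t id = 0;                    // 1: i32 id
  std::string number;                // 2: string number
  PhoneType type = PhoneType::HOME;  // 3: PhoneType type
};

// Hypothetical helper that returns the enum constant's name.
const char* toString(PhoneType type) {
  switch (type) {
    case PhoneType::HOME:
      return "HOME";
    case PhoneType::WORK:
      return "WORK";
    case PhoneType::MOBILE:
      return "MOBILE";
    case PhoneType::OTHER:
      return "OTHER";
  }
  return "UNKNOWN";
}
```

The field ids (1, 2, 3) do not appear in the C++ struct itself; they matter on the wire, where they identify each field independently of its name.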
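Main text
Today we mainly analyze TCompactProtocol and TMemoryBuffer.
I. Analysis of the TMemoryBuffer class
1. The resetBuffer method
To allocate memory, resetBuffer constructs a temporary TMemoryBuffer; its constructor calls initCommon, which calls std::malloc when it has to allocate the buffer itself.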
class TMemoryBuffer : public TVirtualTransport<TMemoryBuffer, TBufferBase> {
private:
// Common initialization done by all constructors.
void initCommon(uint8_t* buf, uint32_t size, bool owner, uint32_t wPos) {
if (buf == NULL && size != 0) {
assert(owner);
buf = (uint8_t*)std::malloc(size);
if (buf == NULL) {
throw std::bad_alloc();
}
}
buffer_ = buf;
bufferSize_ = size;
rBase_ = buffer_;
rBound_ = buffer_ + wPos;
// TODO(dreiss): Investigate NULL-ing this if !owner.
wBase_ = buffer_ + wPos;
wBound_ = buffer_ + bufferSize_;
owner_ = owner;
// rBound_ is really an artifact. In principle, it should always be
// equal to wBase_. We update it in a few places (computeRead, etc.).
}
public:
static const uint32_t defaultSize = 1024;
/**
* This enum specifies how a TMemoryBuffer should treat
* memory passed to it via constructors or resetBuffer.
*
* OBSERVE:
* TMemoryBuffer will simply store a pointer to the memory.
* It is the caller's responsibility to ensure that the pointer
* remains valid for the lifetime of the TMemoryBuffer,
* and that it is properly cleaned up.
* Note that no data can be written to observed buffers.
*
* COPY:
* TMemoryBuffer will make an internal copy of the buffer.
* The caller has no responsibilities.
*
* TAKE_OWNERSHIP:
* TMemoryBuffer will become the "owner" of the buffer,
* and will be responsible for freeing it.
* The memory must have been allocated with malloc.
*/
enum MemoryPolicy { OBSERVE = 1, COPY = 2, TAKE_OWNERSHIP = 3 };
/**
* Construct a TMemoryBuffer with a default-sized buffer,
* owned by the TMemoryBuffer object.
*/
TMemoryBuffer() { initCommon(NULL, defaultSize, true, 0); }
/**
* Construct a TMemoryBuffer with a buffer of a specified size,
* owned by the TMemoryBuffer object.
*
* @param sz The initial size of the buffer.
*/
TMemoryBuffer(uint32_t sz) { initCommon(NULL, sz, true, 0); }
/**
* Construct a TMemoryBuffer with buf as its initial contents.
*
* @param buf The initial contents of the buffer.
* Note that, while buf is a non-const pointer,
* TMemoryBuffer will not write to it if policy == OBSERVE,
* so it is safe to const_cast<uint8_t*>(whatever).
* @param sz The size of @c buf.
* @param policy See @link MemoryPolicy @endlink .
*/
TMemoryBuffer(uint8_t* buf, uint32_t sz, MemoryPolicy policy = OBSERVE) {
if (buf == NULL && sz != 0) {
throw TTransportException(TTransportException::BAD_ARGS,
"TMemoryBuffer given null buffer with non-zero size.");
}
switch (policy) {
case OBSERVE:
case TAKE_OWNERSHIP:
initCommon(buf, sz, policy == TAKE_OWNERSHIP, sz);
break;
case COPY:
initCommon(NULL, sz, true, 0);
this->write(buf, sz);
break;
default:
throw TTransportException(TTransportException::BAD_ARGS,
"Invalid MemoryPolicy for TMemoryBuffer");
}
}
~TMemoryBuffer() {
if (owner_) {
std::free(buffer_);
}
}
bool isOpen() { return true; }
bool peek() { return (rBase_ < wBase_); }
void open() {}
void close() {}
// TODO(dreiss): Make bufPtr const.
void getBuffer(uint8_t** bufPtr, uint32_t* sz) {
*bufPtr = rBase_;
*sz = static_cast<uint32_t>(wBase_ - rBase_);
}
std::string getBufferAsString() {
if (buffer_ == NULL) {
return "";
}
uint8_t* buf;
uint32_t sz;
getBuffer(&buf, &sz);
return std::string((char*)buf, (std::string::size_type)sz);
}
void appendBufferToString(std::string& str) {
if (buffer_ == NULL) {
return;
}
uint8_t* buf;
uint32_t sz;
getBuffer(&buf, &sz);
str.append((char*)buf, sz);
}
void resetBuffer() {
rBase_ = buffer_;
rBound_ = buffer_;
wBase_ = buffer_;
// It isn't safe to write into a buffer we don't own.
if (!owner_) {
wBound_ = wBase_;
bufferSize_ = 0;
}
}
/// See constructor documentation.
void resetBuffer(uint8_t* buf, uint32_t sz, MemoryPolicy policy = OBSERVE) {
// Use a variant of the copy-and-swap trick for assignment operators.
// This is sub-optimal in terms of performance for two reasons:
// 1/ The constructing and swapping of the (small) values
// in the temporary object takes some time, and is not necessary.
// 2/ If policy == COPY, we allocate the new buffer before
// freeing the old one, precluding the possibility of
// reusing that memory.
// I doubt that either of these problems could be optimized away,
// but the second is probably not a common case, and the first is minor.
// I don't expect resetBuffer to be a common operation, so I'm willing to
// bite the performance bullet to make the method this simple.
// Construct the new buffer.
TMemoryBuffer new_buffer(buf, sz, policy);
// Move it into ourself.
this->swap(new_buffer);
// Our old self gets destroyed.
}
/// See constructor documentation.
void resetBuffer(uint32_t sz) {
// Construct the new buffer.
TMemoryBuffer new_buffer(sz);
// Move it into ourself.
this->swap(new_buffer);
// Our old self gets destroyed.
}
std::string readAsString(uint32_t len) {
std::string str;
(void)readAppendToString(str, len);
return str;
}
uint32_t readAppendToString(std::string& str, uint32_t len);
// return number of bytes read
uint32_t readEnd() {
// This cast should be safe, because buffer_'s size is a uint32_t
uint32_t bytes = static_cast<uint32_t>(rBase_ - buffer_);
if (rBase_ == wBase_) {
resetBuffer();
}
return bytes;
}
// Return number of bytes written
uint32_t writeEnd() {
// This cast should be safe, because buffer_'s size is a uint32_t
return static_cast<uint32_t>(wBase_ - buffer_);
}
uint32_t available_read() const {
// Remember, wBase_ is the real rBound_.
return static_cast<uint32_t>(wBase_ - rBase_);
}
uint32_t available_write() const { return static_cast<uint32_t>(wBound_ - wBase_); }
// Returns a pointer to where the client can write data to append to
// the TMemoryBuffer, and ensures the buffer is big enough to accommodate a
// write of the provided length. The returned pointer is very convenient for
// passing to read(), recv(), or similar. You must call wroteBytes() as soon
// as data is written or the buffer will not be aware that data has changed.
uint8_t* getWritePtr(uint32_t len) {
ensureCanWrite(len);
return wBase_;
}
// Informs the buffer that the client has written 'len' bytes into storage
// that had been provided by getWritePtr().
void wroteBytes(uint32_t len);
/*
* TVirtualTransport provides a default implementation of readAll().
* We want to use the TBufferBase version instead.
*/
uint32_t readAll(uint8_t* buf, uint32_t len) { return TBufferBase::readAll(buf, len); }
protected:
void swap(TMemoryBuffer& that) {
using std::swap;
swap(buffer_, that.buffer_);
swap(bufferSize_, that.bufferSize_);
swap(rBase_, that.rBase_);
swap(rBound_, that.rBound_);
swap(wBase_, that.wBase_);
swap(wBound_, that.wBound_);
swap(owner_, that.owner_);
}
// Make sure there's at least 'len' bytes available for writing.
void ensureCanWrite(uint32_t len);
// Compute the position and available data for reading.
void computeRead(uint32_t len, uint8_t** out_start, uint32_t* out_give);
uint32_t readSlow(uint8_t* buf, uint32_t len);
void writeSlow(const uint8_t* buf, uint32_t len);
const uint8_t* borrowSlow(uint8_t* buf, uint32_t* len);
// Data buffer
uint8_t* buffer_;
// Allocated buffer size
uint32_t bufferSize_;
// Is this object the owner of the buffer?
bool owner_;
// Don't forget to update constructors, initCommon, and swap if
// you add new members.
};
}
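The three memory policies and the copy-and-swap style of resetBuffer shown in the listing can be reproduced in a much smaller standalone class. The following is a simplified sketch, not the Thrift class: `MiniBuffer`, `Policy`, and `ownsMemory` are hypothetical names, and the read/write cursors of the real TMemoryBuffer are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>
#include <utility>

enum class Policy { Observe, Copy, TakeOwnership };

class MiniBuffer {
public:
  MiniBuffer(uint8_t* buf, uint32_t size, Policy policy) : size_(size) {
    if (policy == Policy::Copy) {
      // Copy: allocate private storage and duplicate the caller's bytes.
      if (size != 0) {
        data_ = static_cast<uint8_t*>(std::malloc(size));
        if (data_ == nullptr) throw std::bad_alloc();
        std::memcpy(data_, buf, size);
      }
      owner_ = true;
    } else {
      // Observe / TakeOwnership: keep the caller's pointer as-is.
      data_ = buf;
      owner_ = (policy == Policy::TakeOwnership);
    }
  }
  ~MiniBuffer() {
    if (owner_) std::free(data_);  // only free memory this object owns
  }
  MiniBuffer(const MiniBuffer&) = delete;
  MiniBuffer& operator=(const MiniBuffer&) = delete;

  // Copy-and-swap style reset: build the new state in a temporary, swap it
  // in, and let the temporary's destructor release the old storage.
  void reset(uint8_t* buf, uint32_t size, Policy policy) {
    MiniBuffer fresh(buf, size, policy);
    swap(fresh);
  }

  const uint8_t* data() const { return data_; }
  uint32_t size() const { return size_; }
  bool ownsMemory() const { return owner_; }

private:
  void swap(MiniBuffer& other) {
    std::swap(data_, other.data_);
    std::swap(size_, other.size_);
    std::swap(owner_, other.owner_);
  }

  uint8_t* data_ = nullptr;
  uint32_t size_ = 0;
  bool owner_ = false;
};
```

As in the original, the new buffer is allocated before the old one is freed, which costs a little memory during the reset but keeps the method short and leaves the object unchanged if the allocation throws.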
II. Analysis of the TCompactProtocol class
template <class Transport_>
class TCompactProtocolT : public TVirtualProtocol<TCompactProtocolT<Transport_> > {
protected:
static const int8_t PROTOCOL_ID = (int8_t)0x82u;
static const int8_t VERSION_N = 1;
static const int8_t VERSION_MASK = 0x1f; // 0001 1111
static const int8_t TYPE_MASK = (int8_t)0xE0u; // 1110 0000
static const int8_t TYPE_BITS = 0x07; // 0000 0111
static const int32_t TYPE_SHIFT_AMOUNT = 5;
Transport_* trans_;
/**
* (Writing) If we encounter a boolean field begin, save the TField here
* so it can have the value incorporated.
*/
struct {
const char* name;
TType fieldType;
int16_t fieldId;
} booleanField_;
/**
* (Reading) If we read a field header, and it's a boolean field, save
* the boolean value here so that readBool can use it.
*/
struct {
bool hasBoolValue;
bool boolValue;
} boolValue_;
/**
* Used to keep track of the last field for the current and previous structs,
* so we can do the delta stuff.
*/
std::stack<int16_t> lastField_;
int16_t lastFieldId_;
public:
TCompactProtocolT(boost::shared_ptr<Transport_> trans)
: TVirtualProtocol<TCompactProtocolT<Transport_> >(trans),
trans_(trans.get()),
lastFieldId_(0),
string_limit_(0),
string_buf_(NULL),
string_buf_size_(0),
container_limit_(0) {
booleanField_.name = NULL;
boolValue_.hasBoolValue = false;
}
TCompactProtocolT(boost::shared_ptr<Transport_> trans,
int32_t string_limit,
int32_t container_limit)
: TVirtualProtocol<TCompactProtocolT<Transport_> >(trans),
trans_(trans.get()),
lastFieldId_(0),
string_limit_(string_limit),
string_buf_(NULL),
string_buf_size_(0),
container_limit_(container_limit) {
booleanField_.name = NULL;
boolValue_.hasBoolValue = false;
}
~TCompactProtocolT() { free(string_buf_); }
/**
* Writing functions
*/
virtual uint32_t writeMessageBegin(const std::string& name,
const TMessageType messageType,
const int32_t seqid);
uint32_t writeStructBegin(const char* name);
uint32_t writeStructEnd();
uint32_t writeFieldBegin(const char* name, const TType fieldType, const int16_t fieldId);
uint32_t writeFieldStop();
uint32_t writeListBegin(const TType elemType, const uint32_t size);
uint32_t writeSetBegin(const TType elemType, const uint32_t size);
virtual uint32_t writeMapBegin(const TType keyType, const TType valType, const uint32_t size);
uint32_t writeBool(const bool value);
uint32_t writeByte(const int8_t byte);
uint32_t writeI16(const int16_t i16);
uint32_t writeI32(const int32_t i32);
uint32_t writeI64(const int64_t i64);
uint32_t writeDouble(const double dub);
uint32_t writeString(const std::string& str);
uint32_t writeBinary(const std::string& str);
/**
* These methods are called by structs, but don't actually have any wire
* output or purpose
*/
virtual uint32_t writeMessageEnd() { return 0; }
uint32_t writeMapEnd() { return 0; }
uint32_t writeListEnd() { return 0; }
uint32_t writeSetEnd() { return 0; }
uint32_t writeFieldEnd() { return 0; }
protected:
int32_t writeFieldBeginInternal(const char* name,
const TType fieldType,
const int16_t fieldId,
int8_t typeOverride);
uint32_t writeCollectionBegin(const TType elemType, int32_t size);
uint32_t writeVarint32(uint32_t n);
uint32_t writeVarint64(uint64_t n);
uint64_t i64ToZigzag(const int64_t l);
uint32_t i32ToZigzag(const int32_t n);
inline int8_t getCompactType(const TType ttype);
public:
uint32_t readMessageBegin(std::string& name, TMessageType& messageType, int32_t& seqid);
uint32_t readStructBegin(std::string& name);
uint32_t readStructEnd();
uint32_t readFieldBegin(std::string& name, TType& fieldType, int16_t& fieldId);
uint32_t readMapBegin(TType& keyType, TType& valType, uint32_t& size);
uint32_t readListBegin(TType& elemType, uint32_t& size);
uint32_t readSetBegin(TType& elemType, uint32_t& size);
uint32_t readBool(bool& value);
// Provide the default readBool() implementation for std::vector<bool>
using TVirtualProtocol<TCompactProtocolT<Transport_> >::readBool;
uint32_t readByte(int8_t& byte);
uint32_t readI16(int16_t& i16);
uint32_t readI32(int32_t& i32);
uint32_t readI64(int64_t& i64);
uint32_t readDouble(double& dub);
uint32_t readString(std::string& str);
uint32_t readBinary(std::string& str);
/*
*These methods are here for the struct to call, but don't have any wire
* encoding.
*/
uint32_t readMessageEnd() { return 0; }
uint32_t readFieldEnd() { return 0; }
uint32_t readMapEnd() { return 0; }
uint32_t readListEnd() { return 0; }
uint32_t readSetEnd() { return 0; }
protected:
uint32_t readVarint32(int32_t& i32);
uint32_t readVarint64(int64_t& i64);
int32_t zigzagToI32(uint32_t n);
int64_t zigzagToI64(uint64_t n);
TType getTType(int8_t type);
// Buffer for reading strings, save for the lifetime of the protocol to
// avoid memory churn allocating memory on every string read
int32_t string_limit_;
uint8_t* string_buf_;
int32_t string_buf_size_;
int32_t container_limit_;
};
typedef TCompactProtocolT<TTransport> TCompactProtocol;
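The class declares i32ToZigzag, zigzagToI32, writeVarint32, and readVarint32 without showing their bodies. The standalone sketch below shows what these techniques usually look like; it is an illustration under assumptions, not the library's code. The function `appendVarint32` and the vector-based signatures are hypothetical, and `packVersionAndType` is a guess at how the VERSION_N, VERSION_MASK, TYPE_MASK, and TYPE_SHIFT_AMOUNT constants from the listing combine into one header byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// ZigZag maps signed values to unsigned ones so that small magnitudes stay
// small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
uint32_t i32ToZigzag(int32_t n) {
  return (static_cast<uint32_t>(n) << 1) ^ (n < 0 ? 0xFFFFFFFFu : 0u);
}

int32_t zigzagToI32(uint32_t n) {
  return static_cast<int32_t>((n >> 1) ^ (0u - (n & 1u)));
}

// Varint: 7 payload bits per byte, low-order group first; the high bit of
// each byte says whether another byte follows.
void appendVarint32(std::vector<uint8_t>& out, uint32_t n) {
  while (n >= 0x80u) {
    out.push_back(static_cast<uint8_t>((n & 0x7Fu) | 0x80u));
    n >>= 7;
  }
  out.push_back(static_cast<uint8_t>(n));
}

// Decode a varint starting at `pos` and advance `pos` past it.
// Returns false if the input is truncated or longer than 5 bytes.
bool readVarint32(const std::vector<uint8_t>& in, std::size_t& pos, uint32_t& value) {
  uint32_t result = 0;
  for (int shift = 0; shift < 35; shift += 7) {
    if (pos >= in.size()) return false;
    const uint8_t byte = in[pos++];
    result |= static_cast<uint32_t>(byte & 0x7Fu) << shift;
    if ((byte & 0x80u) == 0) {
      value = result;
      return true;
    }
  }
  return false;
}

// Assumed packing of the second message-header byte: the version in the low
// 5 bits and the message type in the high 3 bits, using the constants from
// the class above.
uint8_t packVersionAndType(uint8_t messageType) {
  const uint8_t kVersionN = 1;
  const uint8_t kVersionMask = 0x1f;
  const uint8_t kTypeMask = 0xE0;
  const int kTypeShiftAmount = 5;
  return static_cast<uint8_t>((kVersionN & kVersionMask) |
                              ((messageType << kTypeShiftAmount) & kTypeMask));
}
```

Combining the two encodings is what makes the format compact: a small negative i32 such as -1 becomes the ZigZag value 1 and then a single varint byte, instead of four fixed bytes.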