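Using the C++ string class inevitably introduces unnecessary copies, and enough of them will hurt performance. Many high-performance C++ frameworks therefore use a StringPiece class as a lightweight wrapper around string: it holds only a pointer to the target string (plus its length) and avoids the extra copies. Common implementations include:
- muduo, which actually uses pcre's StringPiece implementation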
- Chromium
Chromium's source comments also explain why StringPiece was introduced, mainly to avoid copies:
// You can use StringPiece as a function or method parameter. A StringPiece
// parameter can receive a double-quoted string literal argument, a "const
// char*" argument, a string argument, or a StringPiece argument with no data
// copying. Systematic use of StringPiece for arguments reduces data
// copies and strlen() calls.
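The point of the comment above can be sketched with a trimmed-down stand-in for StringPiece. The class below keeps only two constructors and two accessors, and CountChar and DemoCountChar are hypothetical helpers written for this illustration; none of it is copied from Chromium.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Trimmed-down stand-in for StringPiece: only what this example needs.
class StringPiece {
 public:
  StringPiece(const char *str)
      : ptr_(str), length_(static_cast<int>(std::strlen(str))) {}
  StringPiece(const std::string &str)
      : ptr_(str.data()), length_(static_cast<int>(str.size())) {}
  const char *data() const { return ptr_; }
  int size() const { return length_; }

 private:
  const char *ptr_;
  int length_;
};

// Hypothetical helper: counts occurrences of `c` without copying the text.
int CountChar(StringPiece text, char c) {
  int n = 0;
  for (int i = 0; i < text.size(); ++i) {
    if (text.data()[i] == c) ++n;
  }
  return n;
}

// One signature accepts a string literal, a const char*, and a std::string.
int DemoCountChar() {
  const char *cstr = "a,b";
  std::string s = "a,b,c,d";
  assert(CountChar("", ',') == 0);
  return CountChar("x,y,z", ',') + CountChar(cstr, ',') + CountChar(s, ',');
}
```

Had CountChar taken a const std::string &, the literal and the const char * arguments would each have built a temporary std::string first; with the view, only a pointer and a length are passed.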
The pcre source also notes a caveat about using StringPiece:
// ------------------------------------------------------------------
// Functions used to create STL containers that use StringPiece
// Remember that a StringPiece's lifetime had better be less than
// that of the underlying string or char*. If it is not, then you
// cannot safely store a StringPiece into an STL container
// ------------------------------------------------------------------
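The lifetime caveat can be sketched roughly as follows. View and MakeView are hypothetical names for a pointer-plus-length pair, used here instead of the real class, and the unsafe pattern is left in a comment so that nothing in the example actually touches a dangling pointer.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in: a pointer and a length, with no ownership.
struct View {
  const char *ptr;
  int length;
};

View MakeView(const std::string &s) {
  return View{s.data(), static_cast<int>(s.size())};
}

// Safe: `owner` outlives the view that points into it.
int SafeUse() {
  std::string owner = "hello";
  View v = MakeView(owner);
  assert(v.ptr == owner.data());  // the view aliases the owner's buffer
  return v.length;                // owner is still alive here
}

// Unsafe pattern, shown only as a comment:
//   View v = MakeView(std::string("temporary"));
//   // The temporary std::string is destroyed at the end of that statement,
//   // so v.ptr dangles from this point on. Storing such a view in an STL
//   // container has the same problem.
```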
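Below we use the muduo source as an example to walk through the implementation of StringPiece.
Constructors
Because StringPiece does not control the lifetime of the string, the caller must ensure that the string it points to stays valid for the whole lifetime of the StringPiece.
Note that the head pointer ptr_ is declared as const char *, so the characters it points to cannot be modified through it (the pointer itself can still be re-pointed, as clear() and set() do).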
class StringPiece {
 private:
  const char *ptr_;
  int length_;

 public:
  // We provide non-explicit singleton constructors so users can pass
  // in a "const char*" or a "string" wherever a "StringPiece" is
  // expected.
  StringPiece() : ptr_(NULL), length_(0) {}
  StringPiece(const char *str)
      : ptr_(str), length_(static_cast<int>(strlen(ptr_))) {}
  StringPiece(const unsigned char *str)
      : ptr_(reinterpret_cast<const char *>(str)),
        length_(static_cast<int>(strlen(ptr_))) {}
  StringPiece(const std::string &str)
      : ptr_(str.data()), length_(static_cast<int>(str.size())) {}
  StringPiece(const char *offset, int len) : ptr_(offset), length_(len) {}
  ...
};
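As a rough illustration of how these constructors differ, the sketch below re-declares a reduced copy of the class (the data() and size() accessors are assumed here because the listing above elides the rest of the class) and adds two hypothetical demo functions. It shows that the (offset, len) constructor can view a substring in place, and that the const char * constructor stops at the first NUL byte while the std::string constructor does not.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Reduced copy of the class for illustration; accessors are assumed.
class StringPiece {
 public:
  StringPiece() : ptr_(NULL), length_(0) {}
  StringPiece(const char *str)
      : ptr_(str), length_(static_cast<int>(std::strlen(str))) {}
  StringPiece(const std::string &str)
      : ptr_(str.data()), length_(static_cast<int>(str.size())) {}
  StringPiece(const char *offset, int len) : ptr_(offset), length_(len) {}
  const char *data() const { return ptr_; }
  int size() const { return length_; }

 private:
  const char *ptr_;
  int length_;
};

// Views the path inside a request line without copying it.
std::string DemoSubstringView() {
  std::string line = "GET /index.html HTTP/1.1";
  StringPiece path(line.data() + 4, 11);   // points at "/index.html"
  assert(path.data() == line.data() + 4);  // same buffer, no copy so far
  return std::string(path.data(), path.size());  // copy only to return
}

// The std::string constructor keeps embedded NUL bytes; strlen() does not.
int DemoEmbeddedNul() {
  std::string s("ab\0cd", 5);
  return StringPiece(s).size() - StringPiece(s.c_str()).size();
}
```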
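Operations
Because StringPiece does not own the string, these operations only adjust the pointer and the length and never copy or free character data, so they run in constant time (except set(const char *str), which calls strlen and is linear in the string length).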
void clear() {
  ptr_ = NULL;
  length_ = 0;
}
void set(const char *buffer, int len) {
  ptr_ = buffer;
  length_ = len;
}
void set(const char *str) {
  ptr_ = str;
  length_ = static_cast<int>(strlen(str));
}
void set(const void *buffer, int len) {
  ptr_ = reinterpret_cast<const char *>(buffer);
  length_ = len;
}
char operator[](int i) const { return ptr_[i]; }
void remove_prefix(int n) {
  ptr_ += n;
  length_ -= n;
}
void remove_suffix(int n) { length_ -= n; }
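One way these operations might be combined is a whitespace trim that never copies a character. The class below is a reduced stand-in with assumed data() and size() accessors, and Trim and DemoTrim are hypothetical helpers. Note that remove_prefix and remove_suffix do no bounds checking, so the caller has to keep n within the current length, which the loops below do by testing size() first.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Reduced stand-in for illustration; accessors are assumed.
class StringPiece {
 public:
  StringPiece(const char *offset, int len) : ptr_(offset), length_(len) {}
  const char *data() const { return ptr_; }
  int size() const { return length_; }
  char operator[](int i) const { return ptr_[i]; }
  void remove_prefix(int n) {
    ptr_ += n;
    length_ -= n;
  }
  void remove_suffix(int n) { length_ -= n; }

 private:
  const char *ptr_;
  int length_;
};

// Hypothetical helper: strips leading and trailing spaces from the view.
// Only the pointer and the length change; the underlying bytes are untouched.
StringPiece Trim(StringPiece s) {
  while (s.size() > 0 && s[0] == ' ') s.remove_prefix(1);
  while (s.size() > 0 && s[s.size() - 1] == ' ') s.remove_suffix(1);
  return s;
}

std::string DemoTrim() {
  const char *raw = "  key=value  ";
  StringPiece t = Trim(StringPiece(raw, static_cast<int>(std::strlen(raw))));
  assert(t.size() == 9);
  assert(t.data() == raw + 2);  // still points into the original buffer
  return std::string(t.data(), t.size());
}
```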
Also, if you want to use StringPiece as the key of a hash map, remember to provide a custom hash function; see Chromium's implementation for reference.
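A minimal sketch of such a hash function is shown below, assuming C++11's std::unordered_map as the hash map. StringPieceHash is a hypothetical functor using a simple multiplicative hash chosen for brevity; it is not the hash Chromium uses. The class is again a reduced stand-in with assumed accessors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <unordered_map>

// Reduced stand-in for illustration; accessors are assumed.
class StringPiece {
 public:
  StringPiece(const char *offset, int len) : ptr_(offset), length_(len) {}
  const char *data() const { return ptr_; }
  int size() const { return length_; }
  bool operator==(const StringPiece &x) const {
    return length_ == x.length_ && std::memcmp(ptr_, x.ptr_, length_) == 0;
  }

 private:
  const char *ptr_;
  int length_;
};

// Hypothetical hash functor: hashes the viewed bytes, not the pointer value,
// so two views over equal text in different buffers land in the same bucket.
struct StringPieceHash {
  std::size_t operator()(const StringPiece &s) const {
    std::size_t h = 0;
    for (int i = 0; i < s.size(); ++i) {
      h = h * 131 + static_cast<unsigned char>(s.data()[i]);
    }
    return h;
  }
};

int DemoHashMap() {
  std::string storage = "alpha beta alpha";  // must outlive the map's keys
  std::unordered_map<StringPiece, int, StringPieceHash> counts;
  ++counts[StringPiece(storage.data(), 5)];       // "alpha"
  ++counts[StringPiece(storage.data() + 6, 4)];   // "beta"
  ++counts[StringPiece(storage.data() + 11, 5)];  // "alpha" again
  assert(counts.size() == 2);
  return counts[StringPiece("alpha", 5)];
}
```

The keys stored in the map are views, so the lifetime rule still applies: the buffer they point into has to stay alive for as long as the map does.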
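String comparison
Comparisons are lexicographic by byte. For == and !=, the lengths are compared first, and memcmp is called only if they match, scanning up to the first byte that differs.
For the other operators, memcmp compares over the shorter of the two lengths; the operator then decides the result, falling back to a length comparison when the common prefix is equal.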
bool operator==(const StringPiece &x) const {
  return ((length_ == x.length_) && (memcmp(ptr_, x.ptr_, length_) == 0));
}
bool operator!=(const StringPiece &x) const { return !(*this == x); }
#define STRINGPIECE_BINARY_PREDICATE(cmp, auxcmp)                              \
  bool operator cmp(const StringPiece &x) const {                              \
    int r = memcmp(ptr_, x.ptr_, length_ < x.length_ ? length_ : x.length_);   \
    return ((r auxcmp 0) || ((r == 0) && (length_ cmp x.length_)));            \
  }
STRINGPIECE_BINARY_PREDICATE(<, < );
STRINGPIECE_BINARY_PREDICATE(<=, < );
STRINGPIECE_BINARY_PREDICATE(>=, > );
STRINGPIECE_BINARY_PREDICATE(>, > );
#undef STRINGPIECE_BINARY_PREDICATE
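To make the macro easier to follow, the sketch below writes out by hand what STRINGPIECE_BINARY_PREDICATE(<, <) expands to and uses it to sort a few views. The class is a reduced stand-in with assumed accessors, and DemoSort is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Reduced stand-in for illustration; accessors are assumed.
class StringPiece {
 public:
  StringPiece(const char *str)
      : ptr_(str), length_(static_cast<int>(std::strlen(str))) {}
  const char *data() const { return ptr_; }
  int size() const { return length_; }

  // Hand-expanded form of STRINGPIECE_BINARY_PREDICATE(<, <).
  bool operator<(const StringPiece &x) const {
    int r = std::memcmp(ptr_, x.ptr_,
                        length_ < x.length_ ? length_ : x.length_);
    // Either the common prefix already decides the order, or the prefixes
    // are equal and the shorter string sorts first.
    return ((r < 0) || ((r == 0) && (length_ < x.length_)));
  }

 private:
  const char *ptr_;
  int length_;
};

// Sorts views with operator< and joins them with commas.
std::string DemoSort() {
  std::vector<StringPiece> v = {"pear", "apple", "app"};
  assert(StringPiece("abc") < StringPiece("abcd"));  // prefix sorts first
  std::sort(v.begin(), v.end());
  std::string out;
  for (std::size_t i = 0; i < v.size(); ++i) {
    if (i != 0) out += ',';
    out.append(v[i].data(), v[i].size());
  }
  return out;
}
```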
Traits
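Because StringPiece holds only a pointer and a length, it can be treated as a POD type with trivial constructors, so the following type traits specialization can be defined to let the STL pick more efficient algorithms.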
#ifdef HAVE_TYPE_TRAITS
// This makes vector<StringPiece> really fast for some STL implementations
template <>
struct __type_traits<base::StringPiece> {
  typedef __true_type has_trivial_default_constructor;
  typedef __true_type has_trivial_copy_constructor;
  typedef __true_type has_trivial_assignment_operator;
  typedef __true_type has_trivial_destructor;
  typedef __true_type is_POD_type;
};
#endif
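With a C++11 standard library, the same properties can be checked through <type_traits> instead of being declared by hand. The sketch below uses a reduced stand-in for the class and a hypothetical DemoBytewiseCopy function, and assumes a C++11 compiler. One difference is worth noting: the standard traits report the default constructor as non-trivial because it is user-provided, whereas the __type_traits specialization above simply asserts it.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Reduced stand-in for illustration; accessors are assumed.
class StringPiece {
 public:
  StringPiece() : ptr_(nullptr), length_(0) {}
  StringPiece(const char *offset, int len) : ptr_(offset), length_(len) {}
  const char *data() const { return ptr_; }
  int size() const { return length_; }

 private:
  const char *ptr_;
  int length_;
};

// Copying, assigning and destroying need nothing beyond a bytewise copy.
static_assert(std::is_trivially_copyable<StringPiece>::value,
              "pointer + length can be copied bytewise");
static_assert(std::is_trivially_destructible<StringPiece>::value,
              "nothing to release: the view owns no memory");
// The user-provided default constructor is not trivial in the standard sense.
static_assert(!std::is_trivially_default_constructible<StringPiece>::value,
              "default constructor is user-provided");

// Because the type is trivially copyable, a whole array can be moved with
// memcpy, which is the kind of shortcut a container may take internally.
int DemoBytewiseCopy() {
  StringPiece src[2] = {StringPiece("ab", 2), StringPiece("cde", 3)};
  StringPiece dst[2];
  std::memcpy(static_cast<void *>(dst), src, sizeof(src));
  assert(dst[0].data() == src[0].data());
  return dst[0].size() + dst[1].size();
}
```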