ThreadCachedInt Overview
ThreadCachedInt is an integer counter class suited to write-heavy, read-light workloads. It uses ThreadLocalPtr to keep one integer per thread, so a write only modifies the writing thread's local value; when the full value is needed, ThreadLocalPtr's all-threads traversal sums up every thread's local integer. Writes are therefore much faster than on a plain atomic integer, which makes the class a good fit for statistics, e.g. counting total connections or connections in use.
On performance, the official comment says: "Higher performance (up to 10x) atomic increment using thread caching."
ThreadCachedInt Composition
ThreadCachedInt consists of three member variables:
class ThreadCachedInt {
  std::atomic<IntT> target_;
  std::atomic<uint32_t> cacheSize_;
  ThreadLocalPtr<IntCache, Tag, AccessModeStrict>
      cache_; // Must be last for dtor ordering
};
struct IntCache {
  ThreadCachedInt* parent_;
  mutable std::atomic<IntT> val_; // additions/subtractions modify val_ first; it is periodically flushed to parent_->target_
  mutable uint32_t numUpdates_;   // updates since the last flush; once it exceeds cacheSize_, val_ is synced to parent_->target_
  std::atomic<bool> reset_;       // when ThreadCachedInt sets reset_ to true, val_ is treated as cleared to 0
};
cache_ is per-thread: each thread gets its own IntCache, and every operation through cache_ works on the calling thread's local copy.
Reading and Writing ThreadCachedInt
Writes overload the += -= ++ -- operators, so callers use them directly. The overloaded operators call the increment function, which in turn calls cache->increment(inc):
void increment(IntT inc) {
  auto cache = cache_.get();
  if (UNLIKELY(cache == nullptr)) {
    cache = new IntCache(*this);
    cache_.reset(cache);
  }
  cache->increment(inc);
}
IntCache::increment first adds to val_; since the cache is only ever touched by its owning thread, there is no concurrent-write concern. numUpdates_ counts the modifications, and once it exceeds cacheSize_ (specified when constructing ThreadCachedInt) the cache calls flush(), which atomically adds val_ into parent_->target_.
target_ is therefore not a real-time value, and readFast() simply returns it directly.
void increment(IntT inc) {
  if (LIKELY(!reset_.load(std::memory_order_acquire))) {
    // This thread is the only writer to val_, so it's fine to do
    // a relaxed load and do the addition non-atomically.
    val_.store(
        val_.load(std::memory_order_relaxed) + inc,
        std::memory_order_release);
  } else {
    val_.store(inc, std::memory_order_relaxed);
    reset_.store(false, std::memory_order_release);
  }
  ++numUpdates_;
  if (UNLIKELY(
          numUpdates_ >
          parent_->cacheSize_.load(std::memory_order_acquire))) {
    flush();
  }
}
void flush() const {
  parent_->target_.fetch_add(val_, std::memory_order_release);
  val_.store(0, std::memory_order_release);
  numUpdates_ = 0;
}
ThreadCachedInt has two read methods. readFast() quickly returns the value of target_; readFull() uses folly ThreadLocal's all-threads-access magic to traverse every thread's cache, adding each thread's val_ to the readFast() value before returning:
IntT readFast() const {
  return target_.load(std::memory_order_relaxed);
}
// Reads the current value plus all the cached increments. Requires grabbing
// a lock, so this is significantly slower than readFast().
IntT readFull() const {
  // This could race with thread destruction and so the access lock should be
  // acquired before reading the current value
  const auto accessor = cache_.accessAllThreads();
  IntT ret = readFast();
  for (const auto& cache : accessor) {
    if (!cache.reset_.load(std::memory_order_acquire)) {
      ret += cache.val_.load(std::memory_order_relaxed);
    }
  }
  return ret;
}
set() writes the value directly: it marks every thread's cache as reset, so their cached increments are discarded rather than flushed, and then stores the new value into target_:
void set(IntT newVal) {
  for (auto& cache : cache_.accessAllThreads()) {
    cache.reset_.store(true, std::memory_order_release);
  }
  target_.store(newVal, std::memory_order_release);
}