文件
Fence.h
封装x86/64系统架构内存屏障指令
关于内存屏障内容涉及较多,还需深入理解
Util.h
cpu线程绑定、编译器屏障、rdpmc
TimeTrace.h
线程安全类TimeTrace
通过环形buffer记录event,如下:
120 struct Event {
121 uint64_t timestamp; // Time when a particular event occurred.
122 const char* format; // Format string describing the event.
123 // NULL means that this entry is unused.
124 uint32_t arg0; // Argument that may be referenced by format
125 // when printing out this event.
126 uint32_t arg1; // Argument that may be referenced by format
127 // when printing out this event.
128 uint32_t arg2; // Argument that may be referenced by format
129 // when printing out this event.
130 uint32_t arg3; // Argument that may be referenced by format
131 // when printing out this event.
132 };
记录实现:
125 void TimeTrace::Buffer::record(uint64_t timestamp, const char* format,
126 uint32_t arg0, uint32_t arg1, uint32_t arg2, uint32_t arg3)
127 {
128 if (activeReaders > 0) {
129 return;
130 }
131
132 Event* event = &events[nextIndex];
133 nextIndex = (nextIndex + 1) & BUFFER_MASK;
134
135 // There used to be code here for prefetching the next few events,
136 // in order to minimize cache misses on the array of events. However,
137 // performance measurements indicate that this actually slows things
138 // down by 2ns per invocation.
139 // prefetch(event+1, NUM_PREFETCH*sizeof(Event));
140
141 event->timestamp = timestamp;
142 event->format = format;
143 event->arg0 = arg0;
144 event->arg1 = arg1;
145 event->arg2 = arg2;
146 event->arg3 = arg3;
147 }
注意:
访问原子成员变量,如果大于0表示读介入,禁止写操作,但如果写入event时,消费线程在读,那会不会造成读到的event不是最新的呢?
从代码上看,消费线程从activeReaders++到读取event有很多代码要走且读取前对format进行非空判断,理论上没问题,那会不会存在生成线程被抢占或中断等导致非最新呢?
std::atomic<int> activeReaders;
输出内容到文件、或stdout、或字符串实现:
TimeTrace::printInternal(std::vector<TimeTrace::Buffer*>* buffers, string* s)
1.首先,遍历buffers,buffers->at(i)->activeReaders.fetch_and(1);
2.遍历buffers再遍历event,输出事件,格式为,
fprintf(output, "%8.1f ns (+%6.1f ns): %s", ns, ns - prevTime,message);,这里ns - prevTime (这个差值表示循环里上次调用snprintf+fprintf)
3.最后,遍历buffers,buffers->at(i)->activeReaders.fetch_and(-1);
Log.h
格式化参数类型
88 enum ParamType : int32_t {
89 // Indicates that there is a problem with the parameter
90 INVALID = -6,
91
92 // Indicates a dynamic width (i.e. the '*' in %*.d)
93 DYNAMIC_WIDTH = -5,
94
95 // Indicates dynamic precision (i.e. the '*' in %.*d)
96 DYNAMIC_PRECISION = -4,
97
98 // Indicates that the parameter is not a string type (i.e. %d, %lf)
99 NON_STRING = -3,
100
101 // Indicates the parameter is a string and has a dynamic precision
102 // (i.e. '%.*s' )
103 STRING_WITH_DYNAMIC_PRECISION = -2,
104
105 // Indicates a string with no precision specified (i.e. '%s' )
106 STRING_WITH_NO_PRECISION = -1,
107
108 // All non-negative values indicate a string with a precision equal to its
109 // enum value casted as an int32_t
110 STRING = 0
111 };
举例说明:
54 char randomString[] = "Hello World";
55 NANO_LOG(NOTICE, "A string, pointer, number, and float: '%s', %p, %d, %f",
56 randomString,
57 &randomString,
58 512,
59 3.14159);
60
(const std::array<NanoLogInternal::ParamType, 4> &) @0x428890: {_M_elems = {
NanoLogInternal::STRING_WITH_NO_PRECISION, NanoLogInternal::NON_STRING,
NanoLogInternal::NON_STRING, NanoLogInternal::NON_STRING}}
61 // Even with width and length specifiers
62 NANO_LOG(NOTICE, "Shortend String: '%5s' and shortend float %0.2lf",
63 randomString,
64 3.14159);
(const std::array<NanoLogInternal::ParamType, 2> &) @0x4288a0: {_M_elems = {
NanoLogInternal::STRING_WITH_NO_PRECISION, NanoLogInternal::NON_STRING}}
压缩数据实现:
1.参数描述,消费位置、字节数、stagingBuffer id、stagingBuffer列表回环标记即回到第一个stagingBuffer、格式化参数类型、压缩条数
307 Log::Encoder::encodeLogMsgs(char *from,
308 uint64_t nbytes,
309 uint32_t bufferId,
310 bool newPass,
311 std::vector<StaticLogInfo> dictionary,
312 uint64_t *numEventsCompressed)
2.标识该写入内容所属的stagingBuffer,创建BufferExtent,类型为EntryType::BUFFER_EXTENT
218 if (!encodeBufferExtentStart(bufferId, newPass))
219 return 0;
419 bool
420 Log::Encoder::encodeBufferExtentStart(uint32_t bufferId, bool newPass)
423 char *writePosStart = writePos;
424 if (sizeof(BufferExtent) + sizeof(bufferId) >
425 static_cast<size_t>(endOfBuffer - writePos))
426 return false;
427
428 BufferExtent *tc = reinterpret_cast<BufferExtent*>(writePos);
429 writePos += sizeof(BufferExtent);
430
431 tc->entryType = EntryType::BUFFER_EXTENT;
432 tc->wrapAround = newPass;
433
434 if (bufferId < (1<<4)) {
435 tc->isShort = true;
436 tc->threadIdOrPackNibble = 0x0F & bufferId;
437 } else {
438 tc->isShort = false;
439 tc->threadIdOrPackNibble = 0x0F & BufferUtils::pack<uint32_t>(
440 &writePos, bufferId);
441 }
442
443 tc->length = downCast<uint32_t>(writePos - writePosStart);
444 currentExtentSize = &(tc->length);
445 lastBufferIdEncoded = bufferId;
446
447 return true;
注意:
currentExtentSize,表示该 stagingBuffer已压缩的字节数
3.写入压缩数据,每条数据log头部是结构体UncompressedEntry,后面是格式化参数,头部占用16个字节
while (remaining > 0) {
369 uint32_t maxCompressedSize = downCast<uint32_t>(2*entry->entrySize
370 + sizeof(Log::UncompressedEntry));
371 if (maxCompressedSize > (endOfBuffer - writePos))
372 break;
373
374 compressLogHeader(entry, &writePos, lastTimestamp);
375 lastTimestamp = entry->timestamp;
377 StaticLogInfo &info = dictionary.at(entry->fmtId);
382 char *argData = entry->argData;
383 info.compressionFunction(info.numNibbles, info.paramTypes,
384 &argData, &writePos);
385
386 remaining -= entry->entrySize;
387 from += entry->entrySize;
388
389 ++numEventsProcessed;
390 }
4.更新已写入的StagingBuffer的长度,返回已处理的字节数和压缩的条数
392 assert(currentExtentSize);
393 uint32_t currentSize;
394 std::memcpy(¤tSize, currentExtentSize, sizeof(uint32_t));
395 currentSize += downCast<uint32_t>(writePos - bufferStart);
396 std::memcpy(currentExtentSize, ¤tSize, sizeof(uint32_t));
397
398 if (numEventsCompressed)
399 *numEventsCompressed += numEventsProcessed;
400
401 return nbytes - remaining;
Cycles.h
基于rdtsc,封装了一套,时间相关操作
如:
1. sleep
2. 通过cycles获取耗时
3. 通过耗时获取cycles
测试了rdtsc和gettimeofday性能,结果是
//100W loop
0.000704
0.001773
//CPU info
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 158
model name : Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
stepping : 13
microcode : 0xb8
cpu MHz : 3599.780
cache size : 16384 KB
physical id : 0
siblings : 16
core id : 0
cpu cores : 8
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch intel_pt ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear spec_ctrl intel_stibp flush_l1d arch_capabilities
bogomips : 7200.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
gettimeofday较rdtsc高出1.5倍的样子
关于linux时钟涉及较多,还需深入理解
RuntimeLogger.h
类RuntimeLogger,会生成单例,static RuntimeLogger nanoLogSingleton
管理重要数据:
char *compressingBuffer;与char *outputDoubleBuffer;是个双bufferI(使用posix_memalign,通过512对齐进行分配,这里对齐对该块内存的优化需要深入研究)
static __thread StagingBuffer *stagingBuffer;
std::vector<StagingBuffer *> threadBuffers;
std::vector<StaticLogInfo> invocationSites;
消费线程:
RuntimeLogger::compressionThreadMain()
1.线程while循环前,初始化Encoder,填充结构体Checkpoint 到compressingBuffer,类型CHECKPOINT
Log::Encoder encoder(compressingBuffer, NanoLogConfig::OUTPUT_BUFFER_SIZE);
insertCheckpoint(&writePos, endOfBuffer, writeDictionary)
47 Log::insertCheckpoint(char **out, char *outLimit, bool writeDictionary) {
48 if (static_cast<uint64_t>(outLimit - *out) < sizeof(Checkpoint))
49 return false;
50
51 Checkpoint *ck = reinterpret_cast<Checkpoint*>(*out);
52 *out += sizeof(Checkpoint);
53
54 ck->entryType = Log::EntryType::CHECKPOINT;
55 ck->rdtsc = PerfUtils::Cycles::rdtsc();
56 ck->unixTime = std::time(nullptr);
57 ck->cyclesPerSecond = PerfUtils::Cycles::getCyclesPerSec();
58 ck->newMetadataBytes = ck->totalMetadataEntries = 0;
2. 遍历静态格式化字符串,对encoder进行写字典数据
396 {
397 std::unique_lock<std::mutex> lock(bufferMutex);
398 size_t i = lastStagingBufferChecked;
399
400 // Output new dictionary entries, if necessary
401 if (nextInvocationIndexToBePersisted < invocationSites.size())
402 {
403 std::unique_lock<std::mutex> lock (registrationMutex);
404 encoder.encodeNewDictionaryEntries(
405 nextInvocationIndexToBePersisted,
406 invocationSites);
407
408 // update our shadow copy
409 for (uint64_t i = shadowStaticInfo.size();
410 i < nextInvocationIndexToBePersisted; ++i)
411 {
412 shadowStaticInfo.push_back(invocationSites.at(i));
413 }
414 }
注意:
这里使用shadowStaticInfo,作为影子再后面使用,这是防止容器在其他线程被修改而不用加锁的方法
具体实现 :
生成字典类型(LOG_MSGS_OR_DIC)数据DictionaryFragment ,然后添加静态格式化串到结构体CompressedLogInfo
140 uint32_t
141 Log::Encoder::encodeNewDictionaryEntries(uint32_t& currentPosition,
142 std::vector<StaticLogInfo> allMetadata)
143 {
144 char *bufferStart = writePos;
145
146 if (sizeof(DictionaryFragment) >=
147 static_cast<uint32_t>(endOfBuffer - writePos))
148 return 0;
149
150 DictionaryFragment *df = reinterpret_cast<DictionaryFragment*>(writePos);
151 writePos += sizeof(DictionaryFragment);
152 df->entryType = EntryType::LOG_MSGS_OR_DIC;
153
154 while (currentPosition < allMetadata.size()) {
155 StaticLogInfo &curr = allMetadata.at(currentPosition);
156 size_t filenameLength = strlen(curr.filename) + 1;
157 size_t formatLength = strlen(curr.formatString) + 1;
158 size_t nextDictSize = sizeof(CompressedLogInfo)
159 + filenameLength
160 + formatLength;
161
162 // Not enough space, break out!
163 if (nextDictSize >= static_cast<uint32_t>(endOfBuffer - writePos))
164 break;
165
166 CompressedLogInfo *cli = reinterpret_cast<CompressedLogInfo*>(writePos);
167 writePos += sizeof(CompressedLogInfo);
168
169 cli->severity = curr.severity;
170 cli->linenum = curr.lineNum;
171 cli->filenameLength = static_cast<uint16_t>(filenameLength);
172 cli->formatStringLength = static_cast<uint16_t>(formatLength);
173
174 memcpy(writePos, curr.filename, filenameLength);
175 memcpy(writePos + filenameLength, curr.formatString, formatLength);
176 writePos += filenameLength + formatLength;
177 ++currentPosition;
178 }
179
180 df->newMetadataBytes = 0x3FFFFFFF & static_cast<uint32_t>(
181 writePos - bufferStart);
182 df->totalMetadataEntries = currentPosition;
183 return df->newMetadataBytes;
184 }
3.循环,从threadBuffers获取第一个StagingBuffer,调用peek获取可消费数据大小peekBytes ,然后通过数组stagingBufferPeekDist统计指标,写buffer每增长5%在相应槽位累加
418 while (!outputBufferFull && !threadBuffers.empty())
419 {
420 uint64_t peekBytes = 0;
421 StagingBuffer *sb = threadBuffers[i];
422 char *peekPosition = sb->peek(&peekBytes);
425 if (peekBytes > 0) {
426 uint64_t start = PerfUtils::Cycles::rdtsc();
427 lock.unlock();
429 // Record metrics on the peek size
430 size_t sizeOfDist = Util::arraySize(stagingBufferPeekDist);
431 size_t distIndex = (sizeOfDist*peekBytes)/
432 NanoLogConfig::STAGING_BUFFER_SIZE;
433 ++(stagingBufferPeekDist[distIndex]);
4.通过peekBytes计算最小压缩字节数,调用encodeLogMsgs压缩数据,如果返回0表示满了,结束循环while (!outputBufferFull && !threadBuffers.empty())
437 uint32_t remaining = downCast<uint32_t>(peekBytes);
438 while (remaining > 0) {
439 long bytesToEncode = std::min(
440 NanoLogConfig::RELEASE_THRESHOLD,
441 remaining);
450 long bytesRead = encoder.encodeLogMsgs(
451 peekPosition + (peekBytes - remaining),
452 bytesToEncode,
453 sb->getId(),
454 wrapAround,
455 shadowStaticInfo,
456 &logsProcessed);
460 if (bytesRead == 0) {
461 lastStagingBufferChecked = i;
462 outputBufferFull = true;
463 break;
464 }
这里函数 encodeLogMsgs具体实现查看上文Log.h
5.没返回0的情况,更新消费偏移
466 wrapAround = false;
467 remaining -= downCast<uint32_t>(bytesRead);
468 sb->consume(bytesRead);
469 totalBytesRead += bytesRead;
470 bytesConsumedThisIteration += bytesRead;
这里注意
加入读屏障,保证对consumerPos 赋值时,consumerPos 是读取的最新值
373 inline void
374 consume(uint64_t nbytes) {
375 Fence::lfence(); // Make sure consumer reads finish before bump
376 consumerPos += nbytes;
377 }
6.第一个StagingBuffer消费完了,继续消费下一个,如果所有StagingBuffer都完成,结束循环while (!outputBufferFull && !threadBuffers.empty())
497 i = (i + 1) % threadBuffers.size();
498
499 if (i == 0)
500 wrapAround = true;
501
502 // Completed a full pass through the buffers
503 if (i == lastStagingBufferChecked)
504 break;
7.获取已压缩的数据大小,填充aioCb,调用aio_write,设置hasOutstandingOperation=true,异步写文件,然后swap buffer,设置outputBufferFull=false
594 ssize_t bytesToWrite = encoder.getEncodedBytes();
595 if (bytesToWrite == 0)
596 continue;
597
598 // Pad the output if necessary
599 if (NanoLogConfig::FILE_PARAMS & O_DIRECT) {
600 ssize_t bytesOver = bytesToWrite % 512;
601
602 if (bytesOver != 0) {
603 memset(compressingBuffer, 0, 512 - bytesOver);
604 bytesToWrite = bytesToWrite + 512 - bytesOver;
605 padBytesWritten += (512 - bytesOver);
606 }
607 }
608
609 aioCb.aio_fildes = outputFd;
610 aioCb.aio_buf = compressingBuffer;
611 aioCb.aio_nbytes = bytesToWrite;
612 totalBytesWritten += bytesToWrite;
613
614 cyclesAtLastAIOStart = PerfUtils::Cycles::rdtsc();
615 if (aio_write(&aioCb) == -1)
616 fprintf(stderr, "Error at aio_write(): %s\n", strerror(errno));
617
618 hasOutstandingOperation = true;
619
620 // Swap buffers
621 encoder.swapBuffer(outputDoubleBuffer,
622 NanoLogConfig::OUTPUT_BUFFER_SIZE);
623 std::swap(outputDoubleBuffer, compressingBuffer);
624 outputBufferFull = false;
以上就是消费数据,然后写入文件的过程
该线程还有几个细节要处理
细节1:调用sync函数后,条件变量workAdded唤醒消费线程,再通过条件变量hintSyncCompleted等待,并设置syncStatus = SYNC_REQUESTED,直到消费线程调用 hintSyncCompleted.notify_one()
所以该处有两个用处:
一个压缩缓冲区为空,不需写文件,休眠一定时间或被函数sync唤醒
另一个是唤醒sync
510 // If there's no data to output, go to sleep.
511 if (encoder.getEncodedBytes() == 0) {
512 std::unique_lock<std::mutex> lock(condMutex);
513
514 // If a sync was requested, we should make at least 1 more
515 // pass to make sure we got everything up to the sync point.
516 if (syncStatus == SYNC_REQUESTED) {
517 syncStatus = PERFORMING_SECOND_PASS;
518 continue;
519 }
520
521 if (syncStatus == PERFORMING_SECOND_PASS) {
522 syncStatus = (hasOutstandingOperation) ? WAITING_ON_AIO
523 : SYNC_COMPLETED;
524 }
525
526 if (syncStatus == SYNC_COMPLETED) {
527 hintSyncCompleted.notify_one();
528 }
529
530 cyclesActive += PerfUtils::Cycles::rdtsc() - cyclesAwakeStart;
531 workAdded.wait_for(lock, std::chrono::microseconds(
532 NanoLogConfig::POLL_INTERVAL_NO_WORK_US));
533
534 cyclesAwakeStart = PerfUtils::Cycles::rdtsc();
535 }
细节2:hasOutstandingOperation标识异步写文件
如果处理中,如果log缓冲区满了,一直等待aio完成
如果此次写入压缩数据为0,休眠
否则,aio完成,切换状态为syncStatus = SYNC_COMPLETED;,并唤醒hintSyncCompleted.notify_one();
537 if (hasOutstandingOperation) {
538 if (aio_error(&aioCb) == EINPROGRESS) {
539 const struct aiocb *const aiocb_list[] = {&aioCb};
540 if (outputBufferFull) {
541 // If the output buffer is full and we're not done,
542 // wait for completion
543 cyclesActive += PerfUtils::Cycles::rdtsc() - cyclesAwakeStart;
544 int err = aio_suspend(aiocb_list, 1, NULL);
545 cyclesAwakeStart = PerfUtils::Cycles::rdtsc();
546 if (err != 0)
547 perror("LogCompressor's Posix AIO "
548 "suspend operation failed");
549 } else {
550 // If there's no new data, go to sleep.
551 if (bytesConsumedThisIteration == 0 &&
552 NanoLogConfig::POLL_INTERVAL_DURING_IO_US > 0)
553 {
554 std::unique_lock<std::mutex> lock(condMutex);
555 cyclesActive += PerfUtils::Cycles::rdtsc() -
556 cyclesAwakeStart;
557 workAdded.wait_for(lock, std::chrono::microseconds(
558 NanoLogConfig::POLL_INTERVAL_DURING_IO_US));
559 cyclesAwakeStart = PerfUtils::Cycles::rdtsc();
560 }
561
562 if (aio_error(&aioCb) == EINPROGRESS)
563 continue;
564 }
565 }
566
567 // Finishing up the IO
568 int err = aio_error(&aioCb);
569 ssize_t ret = aio_return(&aioCb);
570
571 if (err != 0) {
572 fprintf(stderr, "LogCompressor's POSIX AIO failed"
573 " with %d: %s\r\n", err, strerror(err));
574 } else if (ret < 0) {
575 perror("LogCompressor's Posix AIO Write failed");
576 }
577 ++numAioWritesCompleted;
578 hasOutstandingOperation = false;
579 cyclesDiskIO_upperBound += (start - cyclesAtLastAIOStart);
580
581 // We've completed an AIO, check if we need to notify
582 if (syncStatus == WAITING_ON_AIO) {
583 std::unique_lock<std::mutex> lock(nanoLogSingleton.condMutex);
584 if (syncStatus == WAITING_ON_AIO) {
585 syncStatus = SYNC_COMPLETED;
586 hintSyncCompleted.notify_one();
587 }
588 }
589 }
使用
预分配:
NanoLog::preallocate();
最终实现
172 inline void
173 ensureStagingBufferAllocated() {
174 if (stagingBuffer == nullptr) {
175 std::unique_lock<std::mutex> guard(bufferMutex);
176 uint32_t bufferId = nextBufferId++;
177
178 // Unlocked for the expensive StagingBuffer allocation
179 guard.unlock();
180 stagingBuffer = new StagingBuffer(bufferId);
181 guard.lock();
182
183 threadBuffers.push_back(stagingBuffer);
184 }
185 }
主要分配线程存储StagingBuffer,然后加入到vector
这个锁粒度很巧妙
函数preallocate只分配了当前线程,因此,最好对每个线程处理函数都调用一次
记录日志:
注意:写空间不足会阻塞,对于高可用服务要注意,需要修改reserveSpaceInternal的参数
NanoLogCpp17.h
1073 #define NANO_LOG(severity, format, ...) do { \
1074 constexpr int numNibbles = NanoLogInternal::getNumNibblesNeeded(format); \
1075 constexpr int nParams = NanoLogInternal::countFmtParams(format); \
1076 \
1077 /*** Very Important*** These must be 'static' so that we can save pointers
1078 * to these variables and have them persist beyond the invocation.
1079 * The static logId is used to forever associate this local scope (tied
1080 * to an expansion of #NANO_LOG) with an id and the paramTypes array is
1081 * used by the compression function, which is invoked in another thread
1082 * at a much later time. */ \
1083 static constexpr std::array<NanoLogInternal::ParamType, nParams> paramTypes = \
1084 NanoLogInternal::analyzeFormatString<nParams>(format); \
1085 static int logId = NanoLogInternal::UNASSIGNED_LOGID; \
1086 \
1087 if (NanoLog::severity > NanoLog::getLogLevel()) \
1088 break; \
1089 \
1090 /* Triggers the GNU printf checker by passing it into a no-op function.
1091 * Trick: This call is surrounded by an if false so that the VA_ARGS don't
1092 * evaluate for cases like '++i'.*/ \
1093 if (false) { NanoLogInternal::checkFormat(format, ##__VA_ARGS__); } /*NOLINT(cppcoregui
delines-pro-type-vararg, hicpp-vararg)*/\
1094 \
1095 NanoLogInternal::log(logId, __FILE__, __LINE__, NanoLog::severity, format, \
1096 numNibbles, paramTypes, ##__VA_ARGS__); \
1097 } while(0)
这里循环时一直调用函数进行判断,初始看以为对性能有影响,后发现使用时有标识constexpr ,编译期运算,解析格式化字符串中非字符串或精度相关参数个数
330 template<size_t N>
331 constexpr int
332 getNumNibblesNeeded(const char (&fmt)[N])
333 {
334 int numNibbles = 0;
335 for (int i = 0; i < countFmtParams(fmt); ++i) {
336 ParamType t = getParamInfo(fmt, i);
337 if (t == NON_STRING || t == DYNAMIC_PRECISION || t == DYNAMIC_WIDTH)
338 ++numNibbles;
339 }
340
341 return numNibbles;
342 }
函数countFmtParams,获取格式化参数 个数
这里注意:
paramTypes ,logId 都是静态变量,很重要,都只会初始化一次
paramTypes保存的数据具体可以看上文Log.h示例
最终调用:
994 template<long unsigned int N, int M, typename... Ts>
995 inline void
996 log(int &logId,
997 const char *filename,
998 const int linenum,
999 const LogLevel severity,
1000 const char (&format)[M],
1001 const int numNibbles,
1002 const std::array<ParamType, N>& paramTypes,
1003 Ts... args)
1.把参数复制到info,获取logId,因logId是静态变量,因此只会生成一次,然后添加到vector,只会添加一次
1008 if (logId == UNASSIGNED_LOGID) {
1009 const ParamType *array = paramTypes.data();
1010 StaticLogInfo info(&compress<Ts...>,
1011 filename,
1012 linenum,
1013 severity,
1014 format,
1015 sizeof...(Ts),
1016 numNibbles,
1017 array);
1018
1019 RuntimeLogger::registerInvocationSite(info, logId);
1020 }
51 inline void
52 registerInvocationSite_internal(int &logId, StaticLogInfo info) {
53 // TODO(syang0) Make this into a spin lock
54 std::lock_guard<std::mutex>
55 lock(nanoLogSingleton.registrationMutex);
56
57 if (logId != UNASSIGNED_LOGID)
58 return;
59
60 logId = static_cast<int32_t>(invocationSites.size());
61 invocationSites.push_back(info);
}
这里注意:
paramTypes变量具体内容是什么,可看上面Log.h
2.调用getArgSizes,获取需要分配的空间大小
1024 size_t stringSizes[N + 1] = {}; //HACK: Zero length arrays are not allowed
1025 size_t allocSize = getArgSizes(paramTypes, previousPrecision,
1026 stringSizes, args...) + sizeof(UncompressedEntry);
1027
1028 char *writePos = NanoLogInternal::RuntimeLogger::reserveAlloc(allocSize);
举例:
54 char randomString[] = "Hello World";
62 NANO_LOG(NOTICE, "Shortend String: '%5s' and shortend float %0.2lf",
63 randomString,
64 3.14159);
(gdb) p stringSizes
$4 = {11, 0, 0}
(gdb) p allocSize
$5 = 39
这里看到stringSizes数组存储字符串参数长度,数组大小等于参数个数+1,目前只存储randomString
allocSize大小39,是,11+4+8(lf)+sizeof(UncompressedEntry)(16)
11+4的由来,通过函数getArgSize获取stringBytes + sizeof(uint32_t),用于向buffer写入数据时,先写入字符串长度,再写入字符串,具体可用看函数store_argument实现
3.调用reserveAlloc获取可以写入数据的位置
104 static inline char *
105 reserveAlloc(size_t nbytes) {
106 if (stagingBuffer == nullptr)
107 nanoLogSingleton.ensureStagingBufferAllocated();
108
109 // NOLINTNEXTLINE(clang-analyzer-core.CallAndMessage)
110 return stagingBuffer->reserveProducerSpace(nbytes);
111 }
333 inline char *
334 reserveProducerSpace(size_t nbytes) {
335 ++numAllocations;
336
337 // Fast in-line path
338 if (nbytes < minFreeSpace)
339 return producerPos;
340
341 // Slow allocation
342 return reserveSpaceInternal(nbytes);
343 }
这里注意:
reserveSpaceInternal的实现,目的是针对环形buffer,循环使用问题,更新producerPos、minFreeSpace,并且这里会while阻塞直到有可用空间为止
4.写入数据
1031 UncompressedEntry *ue = new(writePos) UncompressedEntry();
1032 writePos += sizeof(UncompressedEntry);
1033
1034 store_arguments(paramTypes, stringSizes, &writePos, args...);
1035
1036 ue->fmtId = logId;
1037 ue->timestamp = timestamp;
1038 ue->entrySize = downCast<uint32_t>(allocSize);
1039
1040 #ifdef ENABLE_DEBUG_PRINTING
1041 printf("\r\nRecording %d:'%s' of size %u\r\n",
1042 logId, info.formatString, ue->entrySize);
1043 #endif
1044
注意:
先写入结构体UncompressedEntry,再写入数据 ,allocSize是格式化中参数所需的字节数
5.计算使用字节数,更新写入缓冲区偏移
1045 assert(allocSize == downCast<uint32_t>((writePos - originalWritePos)));
1046 NanoLogInternal::RuntimeLogger::finishAlloc(allocSize);
352 inline void
353 finishReservation(size_t nbytes) {
354 assert(nbytes < minFreeSpace);
355 assert(producerPos + nbytes <
356 storage + NanoLogConfig::STAGING_BUFFER_SIZE);
357
358 Fence::sfence(); // Ensures producer finishes writes before bump
359 minFreeSpace -= nbytes;
360 producerPos += nbytes;
361 }
这里注意:
使用内存屏障,保证写数据完成后,再更新偏移,防止读线程读到错误数据
为什么要使用内存屏障,因为consumerPos和producerPos这两个变量都不是原子类型,,操作上涉及到多线程,虽然每个变量都在自己的线程进行修改,但会涉及读操作,为了保证程序正确性,所以才使用
其他
template<class T, size_t N>
constexpr int getl(T (&fmt)[N]) {
return N;
}
这里即使不加constexpr也不会有影响
测试1:
把getl,与strlen、sizeof做了个性能比较
如:getl("fewafjoiewjfjfjefjefjewifjiejfiejfierjgjfijdiejiejf");
1000W循环结果:
0.018162(strlen=sizeof)
0.015995
会有少许提升,但时不时会突然暴增并达到0.018162(原因未知)
测试2:
如:char buf2[30];getl(buf2);
测试结果仍有少许提升
0.017336
0.018183(sizeof)
测试3:
对于constexpr 也做了两种测试
如果函数由constexpr 修饰,但包含运行内容会编译报错
int F5() {
return 2 + 2;
}
constexpr int F2() {
return 2 + 2;
}
int F3() {
constexpr int a = F2();
}
int F4() {
const int a = F5();
}
针对F3、与F4调用进行比较,一千万次循环
测试结果快了大约一倍
0.022771(constexpr)
0.045756
压缩缓冲区内容:
63 * *****************
64 * * Checkpoint *
65 * * ------------- *
66 * * Dictionary *
67 * * ------------- *
68 * * Buffer Extent *
69 * * (LogMsgs) *
70 * * --------------*
71 * * .... *
1.结构体DictionaryFragment,包含了元数据大小,类型 EntryType::LOG_MSGS_OR_DIC,里面还存储了CompressedLogInfo(行号,文件名,格式化串等),注意,当元数据增加时,会增量追加写入到压缩缓冲区
2.结构体BufferExtent,类型为EntryType::BUFFER_EXTENT,标识所属StagingBuffer,以及该StagingBuffer已经压缩的字节数
3.构造时最开始写入,结构体Checkpoint,类型为EntryType::CHECKPOINT,属性
309 struct Checkpoint {
310 // Byte representation of an EntryType::CHECKPOINT
311 uint64_t entryType:2;
312
313 // rdtsc() time that corresponds with the unixTime below
314 uint64_t rdtsc;
315
316 // std::time() that corresponds with the rdtsc() above
317 time_t unixTime;
318
319 // Conversion factor between cycles returned by rdtsc() and 1 second
320 double cyclesPerSecond;
321
322 // Number of bytes following this checkpoint that are used to encode
323 // metadata for new log messages.
324 uint32_t newMetadataBytes;
325
326 // Number of unique log messages that the runtime has provided
327 // metadata for thus far. This should represent the maximum formatId
328 // that can appear in the log until the next checkpoint. It is assumed
329 // that the reader/writer will number the log messages sequentially
330 // in the log file.
331 uint32_t totalMetadataEntries;
332
333 };
解析文件解析时间戳:
通过记录的tcs,和checkpoint.rdtsc计算差值再加上
通过checkpoint存储的时间戳ck->unixTime = std::time(nullptr);
1350 secondsSinceCheckpoint = PerfUtils::Cycles::toSeconds(
1351 nextLogTimestamp - checkpoint.rdtsc,
1352 checkpoint.cyclesPerSecond);
1353 int64_t wholeSeconds = static_cast<int64_t>(secondsSinceCheckpoint);
1354 nanos = 1.0e9 * (secondsSinceCheckpoint
1355 - static_cast<double>(wholeSeconds));
1356
1357 // If the timestamp occurred before the checkpoint, we may have to
1358 // adjust the times so that nanos remains positive.
1359 if (nanos < 0.0) {
1360 wholeSeconds--;
1361 nanos += 1.0e9;
1362 }
1363
1364 std::time_t absTime = wholeSeconds + checkpoint.unixTime;
1365 std::tm *tm = localtime(&absTime);
1366 strftime(timeString, sizeof(timeString), "%Y-%m-%d %H:%M:%S", tm);
总结
1.注释丰富、c++代码质量很高,值得学习
2.细节很多,背后的知识需要深入研究
3.整体看,做了很多优化
4.如:线程局部存储buffer,tsc使用,类似redis的数据压缩增大写磁盘吞吐
写入时只写参数类型和参数数据(省去了sprintf操作,即格式化字符串操作,写入内存数据可能更大)
为了减少队列生产者和消费者多线程造成队列头尾指针判断时的cache miss。生产者端会缓存消费者线程在环形队列中的尾的位置。只有当空间不足时才重新读取一次队尾位置。这样就不会每次写入都可能发生cache miss,而是只有写满一次环形队列时,才会发生一次cache miss
编写多线程低延迟系统时要合理规划结构体的内存布局,按线程对变量的读写情况规划变量定义顺序,尽可能按线程访问顺序定义结构体顺序,同一线程的放一起,不同线程间会改变的变量中间加空间进行分开,避免因false sharing导致cache miss
5.适用于高性能场景,具体还需压测后再看效果