NanoLog源码

NanoLog是一个高效、轻量级的日志库,源码中涵盖了内存屏障、线程安全、时间测量和日志压缩等功能。文章详细剖析了其关键组件,包括Fence.h、Util.h、TimeTrace.h、Log.h、Cycles.h和RuntimeLogger.h,涉及内存屏障、CPU绑定、日志格式化、压缩策略等。通过对NanoLog的使用、预分配、记录日志及最终调用的介绍,展示了其在多线程环境下的优化策略,如线程局部存储、循环缓冲区和异步写入。文章还讨论了性能测试,以及如何减少cache miss和提高日志写入效率。
摘要由CSDN通过智能技术生成

文件

Fence.h

封装x86/64系统架构内存屏障指令

关于内存屏障内容涉及较多,还需深入理解

Util.h

cpu线程绑定、编译器屏障、rdpmc

TimeTrace.h

线程安全类TimeTrace

通过环形buffer记录event,如下:

120     struct Event {
121       uint64_t timestamp;        // Time when a particular event occurred.
122       const char* format;        // Format string describing the event.
123                                  // NULL means that this entry is unused.
124       uint32_t arg0;             // Argument that may be referenced by format
125                                  // when printing out this event.
126       uint32_t arg1;             // Argument that may be referenced by format
127                                  // when printing out this event.  
128       uint32_t arg2;             // Argument that may be referenced by format
129                                  // when printing out this event.
130       uint32_t arg3;             // Argument that may be referenced by format
131                                  // when printing out this event.
132     };

记录实现:

125 void TimeTrace::Buffer::record(uint64_t timestamp, const char* format,
126         uint32_t arg0, uint32_t arg1, uint32_t arg2, uint32_t arg3)
127 {
128     if (activeReaders > 0) {
129         return;
130     }
131 
132     Event* event = &events[nextIndex];
133     nextIndex = (nextIndex + 1) & BUFFER_MASK;
134 
135     // There used to be code here for prefetching the next few events,
136     // in order to minimize cache misses on the array of events. However,
137     // performance measurements indicate that this actually slows things
138     // down by 2ns per invocation.
139     // prefetch(event+1, NUM_PREFETCH*sizeof(Event));
140 
141     event->timestamp = timestamp;
142     event->format = format;
143     event->arg0 = arg0;
144     event->arg1 = arg1;
145     event->arg2 = arg2;
146     event->arg3 = arg3;
147 }

注意:

访问原子成员变量,如果大于0表示读介入,禁止写操作,但如果写入event时,消费线程在读,那会不会造成读到的event不是最新的呢?

从代码上看,消费线程从activeReaders++到读取event有很多代码要走且读取前对format进行非空判断,理论上没问题,那会不会存在生成线程被抢占或中断等导致非最新呢?

std::atomic<int> activeReaders;

输出内容到文件、或stdout、或字符串实现:

TimeTrace::printInternal(std::vector<TimeTrace::Buffer*>* buffers, string* s)

1.首先,遍历buffers,buffers->at(i)->activeReaders.fetch_and(1);

2.遍历buffers再遍历event,输出事件,格式为,

  fprintf(output, "%8.1f ns (+%6.1f ns): %s", ns, ns - prevTime,message);,这里ns - prevTime (这个差值表示循环里上次调用snprintf+fprintf)

3.最后,遍历buffers,buffers->at(i)->activeReaders.fetch_and(-1);

Log.h

格式化参数类型

 88 enum ParamType : int32_t {
 89     // Indicates that there is a problem with the parameter
 90     INVALID = -6,
 91 
 92     // Indicates a dynamic width (i.e. the '*' in  %*.d)
 93     DYNAMIC_WIDTH = -5,
 94 
 95     // Indicates dynamic precision (i.e. the '*' in %.*d)
 96     DYNAMIC_PRECISION = -4,
 97 
 98     // Indicates that the parameter is not a string type (i.e. %d, %lf)
 99     NON_STRING = -3,
100 
101     // Indicates the parameter is a string and has a dynamic precision
102     // (i.e. '%.*s' )
103     STRING_WITH_DYNAMIC_PRECISION = -2,
104 
105     // Indicates a string with no precision specified (i.e. '%s' )
106     STRING_WITH_NO_PRECISION = -1,
107 
108     // All non-negative values indicate a string with a precision equal to its
109     // enum value casted as an int32_t
110     STRING = 0
111 };

举例说明:

 54     char randomString[] = "Hello World";
 55     NANO_LOG(NOTICE, "A string, pointer, number, and float: '%s', %p, %d, %f",
 56                         randomString,
 57                         &randomString,
 58                         512,
 59                         3.14159);
 60 

(const std::array<NanoLogInternal::ParamType, 4> &) @0x428890: {_M_elems = {
    NanoLogInternal::STRING_WITH_NO_PRECISION, NanoLogInternal::NON_STRING, 
    NanoLogInternal::NON_STRING, NanoLogInternal::NON_STRING}}


 61     // Even with width and length specifiers
 62     NANO_LOG(NOTICE, "Shortend String: '%5s' and shortend float %0.2lf",
 63                      randomString,
 64                      3.14159);


(const std::array<NanoLogInternal::ParamType, 2> &) @0x4288a0: {_M_elems = {
    NanoLogInternal::STRING_WITH_NO_PRECISION, NanoLogInternal::NON_STRING}}

压缩数据实现:

1.参数描述,消费位置、字节数、stagingBuffer id、stagingBuffer列表回环标记即回到第一个stagingBuffer、格式化参数类型、压缩条数

 307 Log::Encoder::encodeLogMsgs(char *from,
 308                             uint64_t nbytes,
 309                             uint32_t bufferId,
 310                             bool newPass,
 311                             std::vector<StaticLogInfo> dictionary,
 312                             uint64_t *numEventsCompressed)

2.标识该写入内容所属的stagingBuffer,创建BufferExtent,类型为EntryType::BUFFER_EXTENT 

 218     if (!encodeBufferExtentStart(bufferId, newPass))
 219         return 0;

 419 bool
 420 Log::Encoder::encodeBufferExtentStart(uint32_t bufferId, bool newPass)
 423     char *writePosStart = writePos;
 424     if (sizeof(BufferExtent) + sizeof(bufferId) >
 425             static_cast<size_t>(endOfBuffer - writePos))
 426         return false;
 427 
 428     BufferExtent *tc = reinterpret_cast<BufferExtent*>(writePos);
 429     writePos += sizeof(BufferExtent);
 430 
 431     tc->entryType = EntryType::BUFFER_EXTENT;
 432     tc->wrapAround = newPass;
 433 
 434     if (bufferId < (1<<4)) {
 435         tc->isShort = true;
 436         tc->threadIdOrPackNibble = 0x0F & bufferId;
 437     } else {
 438         tc->isShort = false;
 439         tc->threadIdOrPackNibble = 0x0F & BufferUtils::pack<uint32_t>(
 440                                                         &writePos, bufferId);
 441     }
 442 
 443     tc->length = downCast<uint32_t>(writePos - writePosStart);
 444     currentExtentSize = &(tc->length);
 445     lastBufferIdEncoded = bufferId;
 446 
 447     return true;

注意:

currentExtentSize,表示该 stagingBuffer已压缩的字节数

3.写入压缩数据,每条数据log头部是结构体UncompressedEntry,后面是格式化参数,头部占用16个字节

while (remaining > 0) {
 369         uint32_t maxCompressedSize = downCast<uint32_t>(2*entry->entrySize
 370                                               + sizeof(Log::UncompressedEntry));
 371         if (maxCompressedSize > (endOfBuffer - writePos))
 372             break;
 373 
 374         compressLogHeader(entry, &writePos, lastTimestamp);
 375         lastTimestamp = entry->timestamp;
 377         StaticLogInfo &info = dictionary.at(entry->fmtId);
 382         char *argData = entry->argData;
 383         info.compressionFunction(info.numNibbles, info.paramTypes,
 384                                         &argData, &writePos);
 385 
 386         remaining -= entry->entrySize;
 387         from += entry->entrySize;
 388 
 389         ++numEventsProcessed;
 390     }

4.更新已写入的StagingBuffer的长度,返回已处理的字节数和压缩的条数

 392     assert(currentExtentSize);
 393     uint32_t currentSize;
 394     std::memcpy(&currentSize, currentExtentSize, sizeof(uint32_t));
 395     currentSize += downCast<uint32_t>(writePos - bufferStart);
 396     std::memcpy(currentExtentSize, &currentSize, sizeof(uint32_t));
 397 
 398     if (numEventsCompressed)
 399         *numEventsCompressed += numEventsProcessed;
 400 
 401     return nbytes - remaining;

Cycles.h

基于rdtsc,封装了一套,时间相关操作

如:

1. sleep

2. 通过cycles获取耗时

3. 通过耗时获取cycles

测试了rdtsc和gettimeofday性能,结果是

//100W loop
0.000704
0.001773

//CPU info
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 158
model name      : Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
stepping        : 13
microcode       : 0xb8
cpu MHz         : 3599.780
cache size      : 16384 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch intel_pt ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear spec_ctrl intel_stibp flush_l1d arch_capabilities
bogomips        : 7200.00
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

gettimeofday较rdtsc高出1.5倍的样子

关于linux时钟涉及较多,还需深入理解

RuntimeLogger.h

类RuntimeLogger,会生成单例,static RuntimeLogger nanoLogSingleton

管理重要数据:

char *compressingBuffer;与char *outputDoubleBuffer;是个双bufferI(使用posix_memalign,通过512对齐进行分配,这里对齐对该块内存的优化需要深入研究)

static __thread StagingBuffer *stagingBuffer;

std::vector<StagingBuffer *> threadBuffers;

std::vector<StaticLogInfo> invocationSites;

消费线程:

RuntimeLogger::compressionThreadMain()

1.线程while循环前,初始化Encoder,填充结构体Checkpoint 到compressingBuffer,类型CHECKPOINT

Log::Encoder encoder(compressingBuffer, NanoLogConfig::OUTPUT_BUFFER_SIZE);


insertCheckpoint(&writePos, endOfBuffer, writeDictionary)

  47 Log::insertCheckpoint(char **out, char *outLimit, bool writeDictionary) {
  48     if (static_cast<uint64_t>(outLimit - *out) < sizeof(Checkpoint))
  49         return false;
  50         
  51     Checkpoint *ck = reinterpret_cast<Checkpoint*>(*out);
  52     *out += sizeof(Checkpoint);
  53 
  54     ck->entryType = Log::EntryType::CHECKPOINT;
  55     ck->rdtsc = PerfUtils::Cycles::rdtsc();
  56     ck->unixTime = std::time(nullptr);
  57     ck->cyclesPerSecond = PerfUtils::Cycles::getCyclesPerSec();
  58     ck->newMetadataBytes = ck->totalMetadataEntries = 0;

2. 遍历静态格式化字符串,对encoder进行写字典数据

396         {
397             std::unique_lock<std::mutex> lock(bufferMutex);
398             size_t i = lastStagingBufferChecked;
399 
400             // Output new dictionary entries, if necessary
401             if (nextInvocationIndexToBePersisted < invocationSites.size())
402             {
403                 std::unique_lock<std::mutex> lock (registrationMutex);
404                 encoder.encodeNewDictionaryEntries(
405                                                nextInvocationIndexToBePersisted,
406                                                invocationSites);
407 
408                 // update our shadow copy
409                 for (uint64_t i = shadowStaticInfo.size();
410                                     i < nextInvocationIndexToBePersisted; ++i)
411                 {
412                     shadowStaticInfo.push_back(invocationSites.at(i));
413                 }
414             }


注意:

这里使用shadowStaticInfo,作为影子再后面使用,这是防止容器在其他线程被修改而不用加锁的方法

具体实现 :

生成字典类型(LOG_MSGS_OR_DIC)数据DictionaryFragment ,然后添加静态格式化串到结构体CompressedLogInfo 

 140 uint32_t
 141 Log::Encoder::encodeNewDictionaryEntries(uint32_t& currentPosition,
 142                                         std::vector<StaticLogInfo> allMetadata)
 143 {
 144     char *bufferStart = writePos;
 145                                         
 146     if (sizeof(DictionaryFragment) >=
 147                                 static_cast<uint32_t>(endOfBuffer - writePos))
 148         return 0;
 149     
 150     DictionaryFragment *df = reinterpret_cast<DictionaryFragment*>(writePos);
 151     writePos += sizeof(DictionaryFragment);
 152     df->entryType = EntryType::LOG_MSGS_OR_DIC;
 153     
 154     while (currentPosition < allMetadata.size()) {
 155         StaticLogInfo &curr = allMetadata.at(currentPosition);
 156         size_t filenameLength = strlen(curr.filename) + 1;
 157         size_t formatLength = strlen(curr.formatString) + 1;
 158         size_t nextDictSize = sizeof(CompressedLogInfo)
 159                                     + filenameLength
 160                                     + formatLength;
 161         
 162         // Not enough space, break out!
 163         if (nextDictSize >= static_cast<uint32_t>(endOfBuffer - writePos))
 164             break;
 165         
 166         CompressedLogInfo *cli = reinterpret_cast<CompressedLogInfo*>(writePos);
 167         writePos += sizeof(CompressedLogInfo);
 168 
 169         cli->severity = curr.severity;
 170         cli->linenum = curr.lineNum;
 171         cli->filenameLength = static_cast<uint16_t>(filenameLength);
 172         cli->formatStringLength = static_cast<uint16_t>(formatLength);
 173 
 174         memcpy(writePos, curr.filename, filenameLength);
 175         memcpy(writePos + filenameLength, curr.formatString, formatLength);
 176         writePos += filenameLength + formatLength;
 177         ++currentPosition;
 178     }
 179 
 180     df->newMetadataBytes = 0x3FFFFFFF & static_cast<uint32_t>(
 181                                                         writePos - bufferStart);
 182     df->totalMetadataEntries = currentPosition;
 183     return df->newMetadataBytes;
 184 }

3.循环,从threadBuffers获取第一个StagingBuffer,调用peek获取可消费数据大小peekBytes ,然后通过数组stagingBufferPeekDist统计指标,写buffer每增长5%在相应槽位累加

418             while (!outputBufferFull && !threadBuffers.empty())
419             {
420                 uint64_t peekBytes = 0;
421                 StagingBuffer *sb = threadBuffers[i];
422                 char *peekPosition = sb->peek(&peekBytes);
425                 if (peekBytes > 0) {
426                     uint64_t start = PerfUtils::Cycles::rdtsc();
427                     lock.unlock();
429                     // Record metrics on the peek size
430                     size_t sizeOfDist = Util::arraySize(stagingBufferPeekDist);
431                     size_t distIndex = (sizeOfDist*peekBytes)/
432                                             NanoLogConfig::STAGING_BUFFER_SIZE;
433                     ++(stagingBufferPeekDist[distIndex]);

4.通过peekBytes计算最小压缩字节数,调用encodeLogMsgs压缩数据,如果返回0表示满了,结束循环while (!outputBufferFull && !threadBuffers.empty())

437                     uint32_t remaining = downCast<uint32_t>(peekBytes);
438                     while (remaining > 0) {
439                         long bytesToEncode = std::min(
440                                 NanoLogConfig::RELEASE_THRESHOLD,
441                                 remaining);

450                         long bytesRead = encoder.encodeLogMsgs(
451                                 peekPosition + (peekBytes - remaining),
452                                 bytesToEncode,
453                                 sb->getId(),
454                                 wrapAround,
455                                 shadowStaticInfo,
456                                 &logsProcessed);


460                         if (bytesRead == 0) {
461                             lastStagingBufferChecked = i;
462                             outputBufferFull = true;
463                             break;
464                         }

这里函数 encodeLogMsgs具体实现查看上文Log.h

5.没返回0的情况,更新消费偏移 

466                         wrapAround = false;
467                         remaining -= downCast<uint32_t>(bytesRead);
468                         sb->consume(bytesRead);
469                         totalBytesRead += bytesRead;
470                         bytesConsumedThisIteration += bytesRead;

这里注意

加入读屏障,保证对consumerPos 赋值时,consumerPos 是读取的最新值

373             inline void
374             consume(uint64_t nbytes) {
375                 Fence::lfence(); // Make sure consumer reads finish before bump
376                 consumerPos += nbytes;
377             }

6.第一个StagingBuffer消费完了,继续消费下一个,如果所有StagingBuffer都完成,结束循环while (!outputBufferFull && !threadBuffers.empty())

497                 i = (i + 1) % threadBuffers.size();
498                     
499                 if (i == 0)
500                     wrapAround = true;
501 
502                 // Completed a full pass through the buffers
503                 if (i == lastStagingBufferChecked)
504                     break;

7.获取已压缩的数据大小,填充aioCb,调用aio_write,设置hasOutstandingOperation=true,异步写文件,然后swap buffer,设置outputBufferFull=false

594         ssize_t bytesToWrite = encoder.getEncodedBytes();
595         if (bytesToWrite == 0)
596             continue;
597 
598         // Pad the output if necessary
599         if (NanoLogConfig::FILE_PARAMS & O_DIRECT) {
600             ssize_t bytesOver = bytesToWrite % 512;
601 
602             if (bytesOver != 0) {
603                 memset(compressingBuffer, 0, 512 - bytesOver);
604                 bytesToWrite = bytesToWrite + 512 - bytesOver;
605                 padBytesWritten += (512 - bytesOver);
606             }
607         }
608 
609         aioCb.aio_fildes = outputFd;
610         aioCb.aio_buf = compressingBuffer;
611         aioCb.aio_nbytes = bytesToWrite;
612         totalBytesWritten += bytesToWrite;
613 
614         cyclesAtLastAIOStart = PerfUtils::Cycles::rdtsc();
615         if (aio_write(&aioCb) == -1)
616             fprintf(stderr, "Error at aio_write(): %s\n", strerror(errno));
617 
618         hasOutstandingOperation = true;
619 
620         // Swap buffers
621         encoder.swapBuffer(outputDoubleBuffer,
622                            NanoLogConfig::OUTPUT_BUFFER_SIZE);
623         std::swap(outputDoubleBuffer, compressingBuffer);
624         outputBufferFull = false;

以上就是消费数据,然后写入文件的过程

该线程还有几个细节要处理

细节1:调用sync函数后,条件变量workAdded唤醒消费线程,再通过条件变量hintSyncCompleted等待,并设置syncStatus = SYNC_REQUESTED,直到消费线程调用 hintSyncCompleted.notify_one()

所以该处有两个用处:

一个压缩缓冲区为空,不需写文件,休眠一定时间或被函数sync唤醒

另一个是唤醒sync

510         // If there's no data to output, go to sleep.
511         if (encoder.getEncodedBytes() == 0) {
512             std::unique_lock<std::mutex> lock(condMutex);
513                 
514             // If a sync was requested, we should make at least 1 more
515             // pass to make sure we got everything up to the sync point.
516             if (syncStatus == SYNC_REQUESTED) {
517                 syncStatus = PERFORMING_SECOND_PASS;
518                 continue;
519             }
520 
521             if (syncStatus == PERFORMING_SECOND_PASS) {
522                 syncStatus = (hasOutstandingOperation) ? WAITING_ON_AIO
523                                                       : SYNC_COMPLETED;
524             }   
525                 
526             if (syncStatus == SYNC_COMPLETED) {
527                 hintSyncCompleted.notify_one();
528             }
529                 
530             cyclesActive += PerfUtils::Cycles::rdtsc() - cyclesAwakeStart;
531             workAdded.wait_for(lock, std::chrono::microseconds(
532                     NanoLogConfig::POLL_INTERVAL_NO_WORK_US));
533             
534             cyclesAwakeStart = PerfUtils::Cycles::rdtsc();
535         } 

细节2:hasOutstandingOperation标识异步写文件

如果处理中,如果log缓冲区满了,一直等待aio完成

                     如果此次写入压缩数据为0,休眠

否则,aio完成,切换状态为syncStatus = SYNC_COMPLETED;,并唤醒hintSyncCompleted.notify_one();

537         if (hasOutstandingOperation) {
538             if (aio_error(&aioCb) == EINPROGRESS) {
539                 const struct aiocb *const aiocb_list[] = {&aioCb};
540                 if (outputBufferFull) {
541                     // If the output buffer is full and we're not done,
542                     // wait for completion
543                     cyclesActive += PerfUtils::Cycles::rdtsc() - cyclesAwakeStart;
544                     int err = aio_suspend(aiocb_list, 1, NULL);
545                     cyclesAwakeStart = PerfUtils::Cycles::rdtsc();
546                     if (err != 0)
547                         perror("LogCompressor's Posix AIO "
548                                        "suspend operation failed");
549                 } else {        
550                     // If there's no new data, go to sleep.
551                     if (bytesConsumedThisIteration == 0 &&
552                         NanoLogConfig::POLL_INTERVAL_DURING_IO_US > 0)
553                     {
554                         std::unique_lock<std::mutex> lock(condMutex);
555                         cyclesActive += PerfUtils::Cycles::rdtsc() -
556                                        cyclesAwakeStart;
557                         workAdded.wait_for(lock, std::chrono::microseconds(
558                                 NanoLogConfig::POLL_INTERVAL_DURING_IO_US));
559                         cyclesAwakeStart = PerfUtils::Cycles::rdtsc();
560                     }
561 
562                     if (aio_error(&aioCb) == EINPROGRESS)
563                         continue;
564                 }       
565             }
566 
567             // Finishing up the IO
568             int err = aio_error(&aioCb);
569             ssize_t ret = aio_return(&aioCb);
570 
571             if (err != 0) {
572                 fprintf(stderr, "LogCompressor's POSIX AIO failed"
573                         " with %d: %s\r\n", err, strerror(err));
574             } else if (ret < 0) {
575                 perror("LogCompressor's Posix AIO Write failed");
576             }
577             ++numAioWritesCompleted;
578             hasOutstandingOperation = false;
579             cyclesDiskIO_upperBound += (start - cyclesAtLastAIOStart);
580 
581             // We've completed an AIO, check if we need to notify
582             if (syncStatus == WAITING_ON_AIO) {
583                 std::unique_lock<std::mutex> lock(nanoLogSingleton.condMutex);
584                 if (syncStatus == WAITING_ON_AIO) {
585                     syncStatus = SYNC_COMPLETED;
586                     hintSyncCompleted.notify_one();
587                 }
588             }
589         }

使用

预分配:

NanoLog::preallocate();

最终实现

172         inline void
173         ensureStagingBufferAllocated() {
174             if (stagingBuffer == nullptr) {
175                 std::unique_lock<std::mutex> guard(bufferMutex);
176                 uint32_t bufferId = nextBufferId++;
177 
178                 // Unlocked for the expensive StagingBuffer allocation
179                 guard.unlock();
180                 stagingBuffer = new StagingBuffer(bufferId);
181                 guard.lock();
182 
183                 threadBuffers.push_back(stagingBuffer);
184             }
185         }

主要分配线程存储StagingBuffer,然后加入到vector

这个锁粒度很巧妙

函数preallocate只分配了当前线程,因此,最好对每个线程处理函数都调用一次

记录日志:

注意:写空间不足会阻塞,对于高可用服务要注意,需要修改reserveSpaceInternal的参数

NanoLogCpp17.h

1073 #define NANO_LOG(severity, format, ...) do { \
1074     constexpr int numNibbles = NanoLogInternal::getNumNibblesNeeded(format); \
1075     constexpr int nParams = NanoLogInternal::countFmtParams(format); \
1076     \
1077     /*** Very Important*** These must be 'static' so that we can save pointers
1078      * to these variables and have them persist beyond the invocation.
1079      * The static logId is used to forever associate this local scope (tied
1080      * to an expansion of #NANO_LOG) with an id and the paramTypes array is
1081      * used by the compression function, which is invoked in another thread
1082      * at a much later time. */ \
1083     static constexpr std::array<NanoLogInternal::ParamType, nParams> paramTypes = \
1084                                 NanoLogInternal::analyzeFormatString<nParams>(format); \
1085     static int logId = NanoLogInternal::UNASSIGNED_LOGID; \
1086     \
1087     if (NanoLog::severity > NanoLog::getLogLevel()) \
1088         break; \
1089     \
1090     /* Triggers the GNU printf checker by passing it into a no-op function.
1091      * Trick: This call is surrounded by an if false so that the VA_ARGS don't
1092      * evaluate for cases like '++i'.*/ \
1093     if (false) { NanoLogInternal::checkFormat(format, ##__VA_ARGS__); } /*NOLINT(cppcoregui
     delines-pro-type-vararg, hicpp-vararg)*/\
1094     \
1095     NanoLogInternal::log(logId, __FILE__, __LINE__, NanoLog::severity, format, \
1096                             numNibbles, paramTypes, ##__VA_ARGS__); \
1097 } while(0)

这里循环时一直调用函数进行判断,初始看以为对性能有影响,后发现使用时有标识constexpr ,编译期运算,解析格式化字符串中非字符串或精度相关参数个数

 330 template<size_t N>
 331 constexpr int
 332 getNumNibblesNeeded(const char (&fmt)[N])
 333 {
 334     int numNibbles = 0;
 335     for (int i = 0; i < countFmtParams(fmt); ++i) {
 336         ParamType t = getParamInfo(fmt, i);
 337         if (t == NON_STRING || t == DYNAMIC_PRECISION || t == DYNAMIC_WIDTH)
 338             ++numNibbles;
 339     }
 340 
 341     return numNibbles;
 342 }

函数countFmtParams,获取格式化参数 个数

这里注意:

paramTypes ,logId 都是静态变量,很重要,都只会初始化一次

paramTypes保存的数据具体可以看上文Log.h示例

最终调用:

 994 template<long unsigned int N, int M, typename... Ts>
 995 inline void
 996 log(int &logId,
 997     const char *filename,
 998     const int linenum,
 999     const LogLevel severity,
1000     const char (&format)[M],
1001     const int numNibbles,
1002     const std::array<ParamType, N>& paramTypes,
1003     Ts... args)

 1.把参数复制到info,获取logId,因logId是静态变量,因此只会生成一次,然后添加到vector,只会添加一次

1008     if (logId == UNASSIGNED_LOGID) {
1009         const ParamType *array = paramTypes.data();
1010         StaticLogInfo info(&compress<Ts...>,
1011                         filename,
1012                         linenum,
1013                         severity,
1014                         format,
1015                         sizeof...(Ts),
1016                         numNibbles,
1017                         array);
1018 
1019         RuntimeLogger::registerInvocationSite(info, logId);
1020     } 


 51         inline void
 52         registerInvocationSite_internal(int &logId, StaticLogInfo info) {
 53             // TODO(syang0) Make this into a spin lock
 54             std::lock_guard<std::mutex>
 55                                     lock(nanoLogSingleton.registrationMutex);
 56 
 57             if (logId != UNASSIGNED_LOGID)
 58                 return;
 59 
 60             logId = static_cast<int32_t>(invocationSites.size());
 61             invocationSites.push_back(info);
}

这里注意:

paramTypes变量具体内容是什么,可看上面Log.h

2.调用getArgSizes,获取需要分配的空间大小


1024     size_t stringSizes[N + 1] = {}; //HACK: Zero length arrays are not allowed
1025     size_t allocSize = getArgSizes(paramTypes, previousPrecision,
1026                             stringSizes, args...) + sizeof(UncompressedEntry);
1027 
1028     char *writePos = NanoLogInternal::RuntimeLogger::reserveAlloc(allocSize);

举例:

 54     char randomString[] = "Hello World";
 62     NANO_LOG(NOTICE, "Shortend String: '%5s' and shortend float %0.2lf",
 63                      randomString,
 64                      3.14159);

(gdb) p stringSizes
$4 = {11, 0, 0}
(gdb) p allocSize
$5 = 39

这里看到stringSizes数组存储字符串参数长度,数组大小等于参数个数+1,目前只存储randomString

allocSize大小39,是,11+4+8(lf)+sizeof(UncompressedEntry)(16)

11+4的由来,通过函数getArgSize获取stringBytes + sizeof(uint32_t),用于向buffer写入数据时,先写入字符串长度,再写入字符串,具体可用看函数store_argument实现

3.调用reserveAlloc获取可以写入数据的位置

104         static inline char *
105         reserveAlloc(size_t nbytes) {
106             if (stagingBuffer == nullptr)
107                 nanoLogSingleton.ensureStagingBufferAllocated();
108 
109             // NOLINTNEXTLINE(clang-analyzer-core.CallAndMessage)
110             return stagingBuffer->reserveProducerSpace(nbytes);
111         }


333             inline char *
334             reserveProducerSpace(size_t nbytes) {
335                 ++numAllocations;
336 
337                 // Fast in-line path
338                 if (nbytes < minFreeSpace)
339                     return producerPos;
340 
341                 // Slow allocation
342                 return reserveSpaceInternal(nbytes);
343             }

这里注意: 

reserveSpaceInternal的实现,目的是针对环形buffer,循环使用问题,更新producerPos、minFreeSpace,并且这里会while阻塞直到有可用空间为止

4.写入数据

1031     UncompressedEntry *ue = new(writePos) UncompressedEntry();
1032     writePos += sizeof(UncompressedEntry);
1033 
1034     store_arguments(paramTypes, stringSizes, &writePos, args...);
1035 
1036     ue->fmtId = logId;
1037     ue->timestamp = timestamp;
1038     ue->entrySize = downCast<uint32_t>(allocSize);
1039 
1040 #ifdef ENABLE_DEBUG_PRINTING
1041     printf("\r\nRecording %d:'%s' of size %u\r\n",
1042                         logId, info.formatString, ue->entrySize);
1043 #endif
1044 

注意:

先写入结构体UncompressedEntry,再写入数据 ,allocSize是格式化中参数所需的字节数

5.计算使用字节数,更新写入缓冲区偏移

1045     assert(allocSize == downCast<uint32_t>((writePos - originalWritePos)));
1046     NanoLogInternal::RuntimeLogger::finishAlloc(allocSize);

352             inline void
353             finishReservation(size_t nbytes) {
354                 assert(nbytes < minFreeSpace);
355                 assert(producerPos + nbytes <
356                        storage + NanoLogConfig::STAGING_BUFFER_SIZE);
357 
358                 Fence::sfence(); // Ensures producer finishes writes before bump
359                 minFreeSpace -= nbytes;
360                 producerPos += nbytes;
361             }

这里注意:

使用内存屏障,保证写数据完成后,再更新偏移,防止读线程读到错误数据

为什么要使用内存屏障,因为consumerPos和producerPos这两个变量都不是原子类型,,操作上涉及到多线程,虽然每个变量都在自己的线程进行修改,但会涉及读操作,为了保证程序正确性,所以才使用


 

其他

template<class T, size_t N>
constexpr int getl(T (&fmt)[N]) {
    return N;
}

这里即使不加constexpr也不会有影响

测试1:

把getl,与strlen、sizeof做了个性能比较

如:getl("fewafjoiewjfjfjefjefjewifjiejfiejfierjgjfijdiejiejf");

1000W循环结果:

0.018162(strlen=sizeof)

0.015995

会有少许提升,但时不时会突然暴增并达到0.018162(原因未知)

测试2:

如:char buf2[30];getl(buf2);

测试结果仍有少许提升

0.017336
0.018183(sizeof)

测试3:

对于constexpr 也做了两种测试

如果函数由constexpr 修饰,但包含运行内容会编译报错

int F5() {
    return 2 + 2;
}

constexpr int F2() {
    return 2 + 2;
}

int F3() {
    constexpr int a = F2();
}
int F4() {
    const int a = F5();
}

针对F3、与F4调用进行比较,一千万次循环

测试结果快了大约一倍

0.022771(constexpr)

0.045756

压缩缓冲区内容:

 63  * *****************
 64  * *  Checkpoint   *
 65  * * ------------- * 
 66  * *  Dictionary   *
 67  * * ------------- *
 68  * * Buffer Extent *
 69  * *   (LogMsgs)   *
 70  * * --------------*
 71  * *     ....      *

1.结构体DictionaryFragment,包含了元数据大小,类型 EntryType::LOG_MSGS_OR_DIC,里面还存储了CompressedLogInfo(行号,文件名,格式化串等),注意,当元数据增加时,会增量追加写入到压缩缓冲区

2.结构体BufferExtent,类型为EntryType::BUFFER_EXTENT,标识所属StagingBuffer,以及该StagingBuffer已经压缩的字节数

3.构造时最开始写入,结构体Checkpoint,类型为EntryType::CHECKPOINT,属性

309     struct Checkpoint {
310         // Byte representation of an EntryType::CHECKPOINT
311         uint64_t entryType:2;
312 
313         // rdtsc() time that corresponds with the unixTime below
314         uint64_t rdtsc;
315 
316         // std::time() that corresponds with the rdtsc() above
317         time_t unixTime;
318     
319         // Conversion factor between cycles returned by rdtsc() and 1 second
320         double cyclesPerSecond;
321         
322         // Number of bytes following this checkpoint that are used to encode
323         // metadata for new log messages.
324         uint32_t newMetadataBytes;
325 
326         // Number of unique log messages that the runtime has provided
327         // metadata for thus far. This should represent the maximum formatId
328         // that can appear in the log until the next checkpoint. It is assumed
329         // that the reader/writer will number the log messages sequentially
330         // in the log file.
331         uint32_t totalMetadataEntries;
332 
333     };

解析文件解析时间戳:

通过记录的tcs,和checkpoint.rdtsc计算差值再加上

通过checkpoint存储的时间戳ck->unixTime = std::time(nullptr);

1350         secondsSinceCheckpoint = PerfUtils::Cycles::toSeconds(
1351                                             nextLogTimestamp - checkpoint.rdtsc,
1352                                             checkpoint.cyclesPerSecond);
1353         int64_t wholeSeconds = static_cast<int64_t>(secondsSinceCheckpoint);
1354         nanos = 1.0e9 * (secondsSinceCheckpoint
1355                                 - static_cast<double>(wholeSeconds));
1356 
1357         // If the timestamp occurred before the checkpoint, we may have to
1358         // adjust the times so that nanos remains positive.
1359         if (nanos < 0.0) {
1360             wholeSeconds--;
1361             nanos += 1.0e9;
1362         }
1363 
1364         std::time_t absTime = wholeSeconds + checkpoint.unixTime;
1365         std::tm *tm = localtime(&absTime);
1366         strftime(timeString, sizeof(timeString), "%Y-%m-%d %H:%M:%S", tm);

总结

1.注释丰富、c++代码质量很高,值得学习

2.细节很多,背后的知识需要深入研究

3.整体看,做了很多优化

4.如:线程局部存储buffer,tsc使用,类似redis的数据压缩增大写磁盘吞吐

写入时只写参数类型和参数数据(省去了sprintf操作,即格式化字符串操作,写入内存数据可能更大)

为了减少队列生产者和消费者多线程造成队列头尾指针判断时的cache miss。生产者端会缓存消费者线程在环形队列中的尾的位置。只有当空间不足时才重新读取一次队尾位置。这样就不会每次写入都可能发生cache miss,而是只有写满一次环形队列时,才会发生一次cache miss

编写多线程低延迟系统时要合理规划结构体的内存布局,按线程对变量的读写情况规划变量定义顺序,尽可能按线程访问顺序定义结构体顺序,同一线程的放一起,不同线程间会改变的变量中间加空间进行分开,避免因false sharing导致cache miss

5.适用于高性能场景,具体还需压测后再看效果

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

MyObject-C

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值