Robust C++: Initialization and Restarts

Introduction

In many C++ programs, the main function #includes the world and utterly lacks structure. This article describes how to initialize a system in a structured manner. It then discusses how to evolve the design to support recovery from serious errors (usually corrupted memory) by quickly reinitializing a subset of the system instead of having to reboot its executable.

Using the Code

The code in this article is taken from the Robust Services Core (RSC), a large repository that provides a framework for developing robust C++ applications. RSC's software is organized into static libraries, each in its own namespace. Much of the code excerpted in this article comes from the namespace NodeBase in the nb directory. NodeBase contains about 50K lines of code that provide the framework's base classes.

Although RSC is targeted at Windows, it has an abstraction layer that should allow it to be ported to other platforms with modest effort. The Windows targets (in *.win.cpp files) currently comprise about 3K lines of code.

If you don't want to use RSC, you can copy and modify its source code to meet your needs, subject to the terms of its GPL-3.0 license.


RSC contains many details that are not relevant to this article, so the code that we look at will be excerpted from the relevant classes and functions, but with irrelevant details removed. Many of these details are nonetheless important and need to be considered if your approach is to copy and modify RSC software.

In most cases, RSC defines each class in a .h of the same name and implements it in a .cpp of the same name. You should therefore be able to easily find the full version of each class.

Initializing the System

We'll start by looking at how RSC initializes when the system boots up.


Module

Each Module subclass represents a set of interrelated source code files that provides some logical capability. Each of these subclasses is responsible for:

  • specifying the other modules on which it depends

  • initializing the set of source code files that it represents when the executable is launched


Each Module subclass currently corresponds 1-to-1 with a static library. This has worked well and is therefore unlikely to change. Dependencies between static libraries must be defined before building an executable, so it's easy to apply the same dependencies among modules. And since no static library is very large, each module can easily initialize the static library to which it belongs.

A module specifies its dependencies in its constructor and initializes its static library in its Startup function. Here is the outline of a typical module:

class SomeModule : public Module
{
   friend class Singleton< SomeModule >;

   SomeModule() : Module()
   {
      //  Modules 1 to N are the ones on which this module depends.
      //  Creating their singletons ensures that they will exist in
      //  the module registry when the system initializes. Because
      //  each module creates the modules on which it depends before
      //  it adds itself to the registry, the registry will contain
      //  modules in the (partial) order of their dependencies.
      Singleton< Module1 >::Instance();
      //  ...
      Singleton< ModuleN >::Instance();
      Singleton< ModuleRegistry >::Instance()->BindModule(*this);
   }

   ~SomeModule() = default;
   void Startup() override;  // details are specific to each module
};

If each module's constructor instantiates the modules on which it depends, how are leaf modules created? The answer is that main creates them. The code for main will appear soon.

ModuleRegistry

The singleton ModuleRegistry appeared in the last line of the above constructor. It contains all of the system's modules, sorted by their dependencies (a partial ordering). ModuleRegistry also has a Startup function that initializes the system by invoking Startup on each module.
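The ordering guarantee can be illustrated with a stripped-down sketch. The Singleton, Module, and ModuleRegistry below are simplified stand-ins written for this example rather than RSC's classes, and ModuleA, ModuleB, and StartupLog are hypothetical names. The sketch assumes a single-threaded startup.

```cpp
#include <cassert>
#include <string>
#include <vector>

//  Simplified stand-in for RSC's Singleton template (an assumption:
//  a function-local static is enough for a single-threaded sketch).
template<class T> class Singleton
{
public:
   static T* Instance()
   {
      static T instance;
      return &instance;
   }
};

class Module
{
public:
   virtual ~Module() = default;
   virtual void Startup() = 0;
};

//  Holds modules in the order in which they registered.
class ModuleRegistry
{
public:
   void BindModule(Module& m) { modules_.push_back(&m); }
   void Startup() { for(auto m : modules_) m->Startup(); }
private:
   std::vector<Module*> modules_;
};

//  Records the order in which Startup functions ran (hypothetical).
inline std::vector<std::string>& StartupLog()
{
   static std::vector<std::string> log;
   return log;
}

//  ModuleA depends on nothing.
class ModuleA : public Module
{
public:
   ModuleA() { Singleton<ModuleRegistry>::Instance()->BindModule(*this); }
   void Startup() override { StartupLog().push_back("A"); }
};

//  ModuleB depends on ModuleA, so it creates ModuleA before it
//  registers itself.
class ModuleB : public Module
{
public:
   ModuleB()
   {
      Singleton<ModuleA>::Instance();
      Singleton<ModuleRegistry>::Instance()->BindModule(*this);
   }
   void Startup() override { StartupLog().push_back("B"); }
};
```

Creating only the leaf, Singleton<ModuleB>::Instance(), registers ModuleA first and ModuleB second, so the registry's Startup function initializes them in dependency order.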

Thread, RootThread, and InitThread

In RSC, each thread derives from the base class Thread, which encapsulates a native thread and provides a variety of functions related to things like exception handling, scheduling, and inter-thread communication.

The first thread that RSC creates is RootThread, which wraps the native thread that the C++ run-time system created to run main. RootThread simply brings the system up to the point where it can create the next thread. That thread, InitThread, is responsible for initializing most of the system. Once initialization is complete, InitThread acts as a watchdog to ensure that threads are being scheduled, and RootThread acts as a watchdog to ensure that InitThread is running.

main()

After it echoes and saves any command line arguments, main simply instantiates leaf modules. RSC currently has 15 static libraries and, therefore, 15 modules. Modules that are instantiated transitively are commented out:

main_t main(int argc, char* argv[])
{
   //  Echo and save the arguments.  MainArgs is a simple class
   //  that saves and provides access to the arguments.
   std::cout << "ENTERING main(int argc, char* argv[])" << CRLF;
   std::cout << "  argc: " << argc << CRLF;

   for(auto i = 0; i < argc; ++i)
   {
      string arg(argv[i]);
      std::cout << "  argv[" << i << "]: " << arg << CRLF;
   }

   std::cout << std::flush;

   //  Instantiate the desired modules.
// Singleton< NbModule >::Instance();
// Singleton< NtModule >::Instance();
   Singleton< CtModule >::Instance();
// Singleton< NwModule >::Instance();
// Singleton< SbModule >::Instance();
// Singleton< StModule >::Instance();
// Singleton< MbModule >::Instance();
// Singleton< CbModule >::Instance();
// Singleton< PbModule >::Instance();
   Singleton< OnModule >::Instance();
   Singleton< CnModule >::Instance();
   Singleton< RnModule >::Instance();
   Singleton< SnModule >::Instance();
   Singleton< AnModule >::Instance();
// Singleton< DipModule >::Instance();  // usually omitted

   return RootThread::Main();
}

Once the system has initialized, entering the >modules command on the CLI displays the following, which is the order in which the modules were invoked to initialize their static libraries:


  this : 003B0660
  // stuff deleted
  modules [ModuleId]
    size     : 14
    // stuff deleted
    registry : 003B06A0
      [1]: 003B0640 NodeBase.NbModule
      [2]: 003B0E88 NodeTools.NtModule
      [3]: 003B0620 CodeTools.CtModule
      [4]: 003B0F08 NetworkBase.NwModule
      [5]: 003B0EE8 SessionBase.SbModule
      [6]: 003B0EC8 ControlNode.CnModule
      [7]: 003B0F68 SessionTools.StModule
      [8]: 003B0F88 MediaBase.MbModule
      [9]: 003B0F48 CallBase.CbModule
      [10]: 003B0F28 PotsBase.PbModule
      [11]: 003B0EA8 OperationsNode.OnModule
      [12]: 003B0FA8 RoutingNode.RnModule
      [13]: 003B0FC8 ServiceNode.SnModule
      [14]: 003B0FF0 AccessNode.AnModule

If an application built on RSC does not require a particular static library, the instantiation of its module can be commented out, and the linker will exclude all of that library's code from the executable.


main is the only code implemented outside a static library. It resides in the rsc directory, whose only source code file is main.cpp. All other software, whether part of the framework or an application, resides in a static library.

RootThread::Main

The last thing that main did was invoke RootThread::Main, which is a static function because RootThread has not yet been instantiated. Its job is to create the things that are needed to actually instantiate RootThread:

main_t RootThread::Main()
{
   //  This loop is hypothetical because our Enter function (invoked
   //  through Thread::EnterThread and Thread::Start) never returns.
   while(true)
   {
      //  Load symbol information.
      //  ...

      //  Create the POSIX signals.  They are needed now so that
      //  RootThread can register for signals when it is wrapped.
      //  ...

      //  Create the log buffer, which is used to log the progress
      //  of initialization.
      Singleton< LogBufferRegistry >::Instance();

      //  Wrap the root thread and enter it.
      auto root = Singleton< RootThread >::Instance();
      Thread::EnterThread(root);
   }
}

Invoking Thread::EnterThread leads to the invocation of RootThread::Enter, which implements RootThread's thread loop. RootThread::Enter creates InitThread, whose first task is to finish initializing the system. RootThread then goes to sleep, running a watchdog timer that is cancelled when InitThread interrupts RootThread to tell it that the system has been initialized. If the timer expires, the system failed to initialize: it is embarrassingly dead on arrival, so RootThread exits.
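The watchdog handshake between the two threads can be sketched with standard C++ primitives. This is a minimal illustration, assuming std::thread and std::condition_variable in place of RSC's own Thread class and interrupt mechanism. Watchdog and SuperviseInit are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

//  A timer that a supervised thread cancels when its work is done.
class Watchdog
{
public:
   //  Invoked by the supervised thread to report success.
   void Cancel()
   {
      {
         std::lock_guard<std::mutex> lock(mutex_);
         done_ = true;
      }
      cond_.notify_one();
   }

   //  Blocks until cancelled or until TIMEOUT elapses.  Returns true
   //  if cancelled in time and false if the timer expired.
   bool Wait(std::chrono::milliseconds timeout)
   {
      std::unique_lock<std::mutex> lock(mutex_);
      return cond_.wait_for(lock, timeout, [this] { return done_; });
   }
private:
   std::mutex mutex_;
   std::condition_variable cond_;
   bool done_ = false;
};

//  Plays the role of the root thread: runs INIT on another thread
//  and reports whether it finished before the timer expired.
template<typename Init>
bool SuperviseInit(Init init, std::chrono::milliseconds timeout)
{
   Watchdog dog;
   std::thread worker([&] { init(); dog.Cancel(); });
   bool ok = dog.Wait(timeout);

   //  The sketch joins the worker so that it can return cleanly.
   //  A real supervisor would exit or recreate the worker instead.
   worker.join();
   return ok;
}
```

A caller would treat a false result the way RootThread treats an expired timer: initialization failed, so the process should exit rather than continue in an unknown state.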

ModuleRegistry::Startup

To finish initializing the system, InitThread invokes ModuleRegistry::Startup. This function invokes each module's Startup function. It also records how long it took to initialize each module, code that has been deleted for clarity:

void ModuleRegistry::Startup()
{
   for(auto m = modules_.First(); m != nullptr; modules_.Next(m))
   {
      m->Startup();
   }
}

Once this function is finished, something very similar to this will have appeared on the console:


[Image 1: console output logged as each module initializes]

A Module::Startup Function

Module Startup functions aren't particularly interesting. One of RSC's design principles is that objects needed to process user requests should be created during system initialization, so as to provide predictable latency once the system is in service. Here is the Startup code for NbModule, which initializes the namespace NodeBase:

void NbModule::Startup()
{
   //  Create/start singletons.  Some of these already exist as a
   //  result of creating RootThread, but their Startup functions
   //  must be invoked.
   Singleton< PosixSignalRegistry >::Instance()->Startup();
   Singleton< LogBufferRegistry >::Instance()->Startup();
   Singleton< StatisticsRegistry >::Instance()->Startup();
   Singleton< AlarmRegistry >::Instance()->Startup();
   Singleton< LogGroupRegistry >::Instance()->Startup();
   Singleton< CfgParmRegistry >::Instance()->Startup();
   Singleton< DaemonRegistry >::Instance()->Startup();
   Singleton< ObjectPoolRegistry >::Instance()->Startup();
   Singleton< ThreadRegistry >::Instance()->Startup();
   Singleton< ThreadAdmin >::Instance()->Startup();
   Singleton< MsgBufferPool >::Instance()->Startup();
   Singleton< ClassRegistry >::Instance()->Startup();
   Singleton< Element >::Instance()->Startup();
   Singleton< CliRegistry >::Instance()->Startup();
   Singleton< SymbolRegistry >::Instance()->Startup();
   Singleton< NbIncrement >::Instance()->Startup();

   //  Create/start threads.
   Singleton< FileThread >::Instance()->Startup();
   Singleton< CoutThread >::Instance()->Startup();
   Singleton< CinThread >::Instance()->Startup();
   Singleton< ObjectPoolAudit >::Instance()->Startup();
   Singleton< StatisticsThread >::Instance()->Startup();
   Singleton< LogThread >::Instance()->Startup();
   Singleton< CliThread >::Instance()->Startup();
}

Restarting the System

So far, we have an initialization framework with the following characteristics:


  • a structured and layered approach to initialization

  • a simple main that only needs to create leaf modules

  • ease of excluding a static library from the build by not instantiating the module that initializes it


We will now enhance this framework so that we can reinitialize the system to recover from serious errors. The article Robust C++: Safety Net describes how to recover from errors in an individual thread. But sometimes a system gets into a state where the types of errors described in that article recur. In such a situation, more drastic action is required. Quite often, some data has been corrupted, and fixing it will restore the system to health. A partial reinitialization of the system, short of a complete reboot, can often do exactly that.

If we can initialize the system in a layered manner, we should also be able to shut it down in a layered manner. We can define Shutdown functions to complement the Startup functions that we've already seen. However, we only want to perform a partial shutdown, followed by a partial startup to recreate the things that the shutdown phase destroyed. If we can do that, we will have achieved a partial reinitialization.

But what, exactly, should we destroy and recreate? Some things are easily recreated. Other things will take much longer, during which time the system will be unavailable. It is therefore best to use a flexible strategy. If the system is in trouble, start by reinitializing what can be recreated quickly. If that doesn't fix the problem, broaden the scope of what gets reinitialized, and so on. Eventually, we'll have to give up and reboot.

Our restart (reinitialization) strategy therefore escalates. RSC supports three levels of restart whose scopes are less than a full reboot. When the system gets into trouble, it tries to recover by initiating the restart with the narrowest scope. But if it soon gets into trouble again, it increases the scope of the next restart:

  • A warm restart destroys temporary data and also exits and recreates as many threads as possible. Any user request currently being processed is lost and must be resubmitted.

  • A cold restart also destroys dynamic data, which is data that changes while processing user requests. All sessions, for example, are lost and must be reinitiated.

  • A reload restart also destroys data that is relatively static, such as configuration data that user requests rarely modify. This data is usually loaded from disk or over the network, two examples being an in-memory database of user profiles and another of images that are included in server-to-client HTTP messages.

Startup and Shutdown functions therefore need a parameter that specifies what type of restart is occurring:

enum RestartLevel
{
   RestartNil,     // in service (not restarting)
   RestartWarm,    // deleting MemTemporary and exiting threads
   RestartCold,    // warm + deleting MemDynamic (user sessions)
   RestartReload,  // cold + deleting MemPersistent & MemProtected (config data)
   RestartReboot,  // exiting and restarting executable
   RestartExit,    // exiting without restarting
   RestartLevel_N  // number of restart levels
};
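The escalation strategy can be sketched as a small policy class. The article does not say how RSC decides that trouble has recurred "soon", so the fixed time window below is an assumption, and RestartEscalator and NextLevel are hypothetical names.

```cpp
#include <cassert>
#include <chrono>

enum RestartLevel
{
   RestartNil, RestartWarm, RestartCold, RestartReload,
   RestartReboot, RestartExit, RestartLevel_N
};

//  Hypothetical policy: a restart requested within WINDOW of the
//  previous one escalates by one level, up to a reboot.  Otherwise
//  the sequence begins again with a warm restart.
class RestartEscalator
{
public:
   using Clock = std::chrono::steady_clock;

   explicit RestartEscalator(Clock::duration window) : window_(window) { }

   //  Returns the level to use for a restart requested at time NOW.
   RestartLevel NextLevel(Clock::time_point now)
   {
      bool soon = (last_ != RestartNil) && (now - when_ < window_);

      if(!soon)
         last_ = RestartWarm;
      else if(last_ < RestartReboot)
         last_ = static_cast<RestartLevel>(last_ + 1);

      when_ = now;
      return last_;
   }
private:
   Clock::duration window_;
   Clock::time_point when_{};
   RestartLevel last_ = RestartNil;
};
```

Passing the time in as a parameter, rather than reading the clock inside NextLevel, keeps the policy easy to test.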

Initiating a Restart

A restart occurs as follows:


  1. The code which decides that a restart is required invokes Restart::Initiate.


  2. Restart::Initiate throws an ElementException.


  3. Thread::Start catches the ElementException and invokes InitThread::InitiateRestart.


  4. InitThread::InitiateRestart interrupts RootThread to tell it that a restart is about to begin and then interrupts itself to initiate the restart.


  5. When InitThread is interrupted, it invokes ModuleRegistry::Restart to manage the restart. This function contains a state machine that steps through the shutdown and startup phases by invoking ModuleRegistry::Shutdown (described below) and ModuleRegistry::Startup (already described).

  6. When RootThread is interrupted, it starts a watchdog timer. When the restart is completed, InitThread interrupts RootThread, which cancels the timer. If the timer expires, RootThread forces InitThread to exit and recreates it. When InitThread is reentered, it invokes ModuleRegistry::Restart again, which escalates the restart to the next level.
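The first three steps rely on an exception to unwind the requesting thread's stack up to its entry function. The sketch below shows only that pattern. The ElementException here is a hypothetical stand-in that carries just the restart level, and InitiateRestart and RunThreadBody are illustrative names rather than RSC functions.

```cpp
#include <cassert>

enum RestartLevel
{
   RestartNil, RestartWarm, RestartCold, RestartReload,
   RestartReboot, RestartExit, RestartLevel_N
};

//  Hypothetical stand-in for RSC's ElementException.
struct ElementException
{
   RestartLevel level;
};

//  Steps 1 and 2: code that decides a restart is needed throws
//  instead of returning to its caller.
[[noreturn]] inline void InitiateRestart(RestartLevel level)
{
   throw ElementException{level};
}

//  Step 3: the thread's entry function catches the exception at the
//  bottom of the stack.  It returns the requested level so that the
//  caller can hand it to whatever manages the restart.
template<typename Body> RestartLevel RunThreadBody(Body body)
{
   try
   {
      body();
   }
   catch(const ElementException& ex)
   {
      return ex.level;
   }

   return RestartNil;
}
```

Throwing has the advantage that the request can be made from any depth in the call stack without every intermediate function having to propagate a return code.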

Deleting Objects During a Restart

Because the goal of a restart is to reinitialize a subset of the system as quickly as possible, RSC takes a drastic approach. Rather than delete objects one at a time, it simply frees the heap from which they were allocated. In a system with tens of thousands of sessions, for example, this dramatically reduces the time required for a cold restart. The drawback is that it adds some complexity because each type of memory requires its own heap:

MemoryType     Base Class  Attributes
-------------  ----------  ----------
MemTemporary   Temporary   does not survive any restart
MemDynamic     Dynamic     survives warm restarts but not cold or reload restarts
MemPersistent  Persistent  survives warm and cold restarts but not reload restarts
MemProtected   Protected   write-protected; survives warm and cold restarts but not reload restarts
MemPermanent   Permanent   survives all restarts (this is a wrapper for the C++ default heap)
MemImmutable   Immutable   write-protected; survives all restarts (similar to C++ global const data)

To use a given MemoryType, a class derives from the corresponding class in the Base Class column. How this works is described later.
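The benefit of freeing objects en masse can be seen in a toy bump allocator. This is not how RSC's heaps are implemented. Arena, Session, and CreateSessions are hypothetical, and the sketch only demonstrates that discarding a whole region takes constant time no matter how many objects it holds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

//  Hypothetical bump allocator standing in for one per-type heap.
class Arena
{
public:
   explicit Arena(std::size_t capacity)
      : base_(static_cast<char*>(std::malloc(capacity))), capacity_(capacity)
   {
      if(base_ == nullptr) throw std::bad_alloc();
   }

   ~Arena() { std::free(base_); }
   Arena(const Arena&) = delete;
   Arena& operator=(const Arena&) = delete;

   //  Carves SIZE bytes, rounded up for alignment, from the region.
   void* Alloc(std::size_t size)
   {
      constexpr std::size_t align = alignof(std::max_align_t);
      std::size_t rounded = (size + align - 1) / align * align;
      if(rounded > capacity_ - used_) throw std::bad_alloc();
      void* addr = base_ + used_;
      used_ += rounded;
      return addr;
   }

   //  Discards every object at once.  No destructors run, which is
   //  why objects must not own memory outside the arena.
   void Reset() { used_ = 0; }

   std::size_t Used() const { return used_; }
private:
   char* base_;
   std::size_t capacity_;
   std::size_t used_ = 0;
};

struct Session
{
   explicit Session(int i) : id(i) { }
   int id;
};

//  Creates N sessions in ARENA using placement new.
inline void CreateSessions(Arena& arena, int n)
{
   for(int i = 0; i < n; ++i)
   {
      new (arena.Alloc(sizeof(Session))) Session(i);
   }
}
```

After CreateSessions(arena, 10000), a single call to arena.Reset() reclaims all of the sessions without visiting any of them.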

A Module::Shutdown Function

A module's Shutdown function closely resembles its Startup function. It invokes Shutdown on objects within its static library, but in the reverse of the order in which it invoked their Startup functions. Here is the Shutdown function for NbModule, which is (more or less) a mirror image of its Startup function that appeared earlier:

void NbModule::Shutdown(RestartLevel level)
{
   Singleton< NbIncrement >::Instance()->Shutdown(level);
   Singleton< SymbolRegistry >::Instance()->Shutdown(level);
   Singleton< CliRegistry >::Instance()->Shutdown(level);
   Singleton< Element >::Instance()->Shutdown(level);
   Singleton< ClassRegistry >::Instance()->Shutdown(level);
   Singleton< ThreadAdmin >::Instance()->Shutdown(level);
   Singleton< ThreadRegistry >::Instance()->Shutdown(level);
   Singleton< ObjectPoolRegistry >::Instance()->Shutdown(level);
   Singleton< DaemonRegistry >::Instance()->Shutdown(level);
   Singleton< CfgParmRegistry >::Instance()->Shutdown(level);
   Singleton< LogGroupRegistry >::Instance()->Shutdown(level);
   Singleton< AlarmRegistry >::Instance()->Shutdown(level);
   Singleton< StatisticsRegistry >::Instance()->Shutdown(level);
   Singleton< LogBufferRegistry >::Instance()->Shutdown(level);
   Singleton< PosixSignalRegistry >::Instance()->Shutdown(level);

   Singleton< TraceBuffer >::Instance()->Shutdown(level);
}

Given that a restart frees one or more heaps rather than expecting objects on those heaps to be deleted, what is the purpose of a Shutdown function? The answer is that an object which survives the restart might have pointers to objects that will be destroyed or recreated. Its Shutdown function might therefore need to clear these pointers.
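A minimal sketch of that situation follows. SessionRegistry and Session are hypothetical: the registry is assumed to survive cold restarts, while the session it points to is assumed to live in MemDynamic, which a cold restart frees.

```cpp
#include <cassert>

enum RestartLevel
{
   RestartNil, RestartWarm, RestartCold, RestartReload,
   RestartReboot, RestartExit, RestartLevel_N
};

//  Assumed to be allocated from MemDynamic.
struct Session
{
   int id;
};

//  Assumed to survive warm and cold restarts.
class SessionRegistry
{
public:
   void SetActive(Session* s) { active_ = s; }
   Session* Active() const { return active_; }

   //  Sessions vanish when MemDynamic is freed, so the pointer must
   //  be cleared for a cold restart or anything more severe.  A warm
   //  restart leaves the session intact, so the pointer is kept.
   void Shutdown(RestartLevel level)
   {
      if(level >= RestartCold) active_ = nullptr;
   }
private:
   Session* active_ = nullptr;
};
```

Without the Shutdown function, the registry would be left holding a dangling pointer after a cold restart.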

NbModule's Startup function created a number of threads, so how come its Shutdown function doesn't shut them down? The reason is that ModuleRegistry::Shutdown handles this earlier in the restart.

ModuleRegistry::Shutdown

This function first allows a subset of threads to run for a while so that they can generate any pending logs. It then notifies all threads of the restart, counting how many of them are willing to exit, and then schedules them until they have exited. Finally, it shuts down all modules in the reverse of the order in which their Startup functions were invoked. As with ModuleRegistry::Startup, code that logs the progress of the restart has been deleted for clarity:

void ModuleRegistry::Shutdown(RestartLevel level)
{
   if(level >= RestartReload)
   {
      //  ...
   }

   Duration delay(25, mSECS);

   //  Schedule a subset of the factions so that pending logs will be output.
   for(size_t tries = 120, idle = 0; (tries > 0) && (idle <= 8); --tries)
   {
      //  ...
      if(Thread::SwitchContext() != nullptr)
         idle = 0;
      else
         ++idle;
   }

   //  Notify all threads of the restart.
   auto reg = Singleton< ThreadRegistry >::Instance();
   auto before = reg->Threads().size();
   auto planned = reg->Restarting(level);
   size_t actual = 0;

   //  Schedule threads until the planned number have exited. If some
   //  fail to exit, RootThread will time out and escalate the restart.
   while(actual < planned)
   {
      //  ...
      actual = before - reg->Threads().size();
   }

   //  Modules must be shut down in reverse order of their initialization.
   for(auto m = modules_.Last(); m != nullptr; modules_.Prev(m))
   {
      m->Shutdown(level);
   }
}

Shutting Down a Thread

ModuleRegistry::Shutdown (via ThreadRegistry) invokes Thread::Restarting to see if a thread is willing to exit during the restart. This function, in turn, invokes the virtual function ExitOnRestart:

bool Thread::Restarting(RestartLevel level)
{
   //  If the thread is willing to exit, signal it. ModuleRegistry::Shutdown
   //  will momentarily schedule it so that it can exit.
   if(ExitOnRestart(level))
   {
      Raise(SIGCLOSE);
      return true;
   }

   //  Unless this is RootThread or InitThread, mark it as a survivor. This
   //  causes various functions to force it to sleep until the restart ends.
   if(faction_ < SystemFaction) priv_->action_ = SleepThread;
   return false;
}

The default implementation of ExitOnRestart is:


bool Thread::ExitOnRestart(RestartLevel level) const
{
   //  RootThread and InitThread run during a restart. A thread blocked on
   //  stream input, such as CinThread, cannot be forced to exit because C++
   //  has no mechanism for interrupting it.
   if(faction_ >= SystemFaction) return false;
   if(priv_->blocked_ == BlockedOnConsole) return false;
   return true;
}

A thread that is willing to exit receives the signal SIGCLOSE. Before it delivers this signal, Thread::Raise invokes the virtual function Unblock on the thread in case it is currently blocked. For example, each instance of UdpIoThread receives UDP packets on an IP port. Because pending user requests are supposed to survive warm restarts, UdpIoThread overrides ExitOnRestart to return false during a warm restart. During other types of restarts, it returns true, and its override of Unblock frees its socket so that its call to recvfrom will immediately return, allowing it to exit.
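The two overrides described for UdpIoThread can be sketched as follows. The Thread class here is a bare stand-in with only the two virtual functions, PacketReceiver is a hypothetical subclass modelled on that description, and a boolean flag stands in for the socket. NotifyOfRestart is also hypothetical and omits the signal itself.

```cpp
#include <cassert>

enum RestartLevel
{
   RestartNil, RestartWarm, RestartCold, RestartReload,
   RestartReboot, RestartExit, RestartLevel_N
};

//  Bare stand-in for RSC's Thread class.
class Thread
{
public:
   virtual ~Thread() = default;
   virtual bool ExitOnRestart(RestartLevel) const { return true; }
   virtual void Unblock() { }
};

//  Hypothetical I/O thread that must keep running during warm
//  restarts so that pending requests are not lost.
class PacketReceiver : public Thread
{
public:
   bool ExitOnRestart(RestartLevel level) const override
   {
      return (level >= RestartCold);
   }

   //  Stands in for freeing the socket so that a blocking receive
   //  returns immediately.
   void Unblock() override { socketOpen_ = false; }

   bool SocketOpen() const { return socketOpen_; }
private:
   bool socketOpen_ = true;
};

//  Follows the described sequence: ask whether the thread will exit
//  and, if so, unblock it before it is signalled.
inline bool NotifyOfRestart(Thread& thread, RestartLevel level)
{
   if(!thread.ExitOnRestart(level)) return false;
   thread.Unblock();
   return true;
}
```

With these overrides, a warm restart leaves the receiver and its socket alone, whereas a cold restart releases the socket so that the thread can exit.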

Supporting Memory Types

This section discusses what is needed to support a MemoryType, each of which has its own persistence and protection characteristics.



Each MemoryType requires its own heap so that all of its objects can be deleted en masse by simply freeing that heap during the appropriate types of restart. Heap management, at least for the default heap, is platform specific, so RSC defines the class SysHeap to act as a wrapper for platform-specific heap functions.

To support write-protected memory on Windows, RSC had to implement its own heap, because a Windows heap, for some undisclosed reason, soon fails if it is write-protected. Consequently, there is now a base class, Heap, with two subclasses: the previously mentioned SysHeap, and RSC's NbHeap, which is implemented using buddy allocation. The heaps that support MemProtected and MemImmutable use NbHeap.

The interface Memory.h is used to allocate and free the various types of memory. Its primary functions are similar to malloc and free, with the various heaps being private to Memory.cpp:

//  Allocates a memory segment of SIZE of the specified TYPE.  The
//  first version throws an AllocationException on failure, whereas
//  the second version returns nullptr.
void* Alloc(size_t size, MemoryType type);
void* Alloc(size_t size, MemoryType type, const std::nothrow_t&);

//  Deallocates the memory segment returned by Alloc.
void Free(void* addr, MemoryType type);

Base Classes

A class whose objects can be allocated dynamically derives from one of the classes mentioned previously, such as Dynamic. If it doesn't do so, its objects are allocated from the default heap, which is equivalent to deriving from Permanent.

The base classes that support the various memory types simply override operator new and operator delete to use the appropriate heap. For example:

void* Dynamic::operator new(size_t size)
{
   return Memory::Alloc(size, MemDynamic);
}

void* Dynamic::operator new[](size_t size)
{
   return Memory::Alloc(size, MemDynamic);
}

void Dynamic::operator delete(void* addr)
{
   Memory::Free(addr, MemDynamic);
}

void Dynamic::operator delete[](void* addr)
{
   Memory::Free(addr, MemDynamic);
}
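The following self-contained sketch shows how a subclass picks up its base class's heap without any code of its own. The Memory functions here are stand-ins that use malloc and simply count live blocks per memory type. InUse is a hypothetical helper, std::bad_alloc replaces RSC's AllocationException, and Session is an illustrative class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

enum MemoryType
{
   MemTemporary, MemDynamic, MemPersistent, MemProtected,
   MemPermanent, MemImmutable, MemoryType_N
};

namespace Memory
{
   //  Hypothetical helper: the number of live blocks of each type.
   inline std::size_t& InUse(MemoryType type)
   {
      static std::size_t counts[MemoryType_N] = { };
      return counts[type];
   }

   //  Stand-in for the real function, which would select a heap.
   inline void* Alloc(std::size_t size, MemoryType type)
   {
      void* addr = std::malloc(size);
      if(addr == nullptr) throw std::bad_alloc();
      ++InUse(type);
      return addr;
   }

   inline void Free(void* addr, MemoryType type)
   {
      if(addr == nullptr) return;
      --InUse(type);
      std::free(addr);
   }
}

//  Base class for objects that reside in MemDynamic.
class Dynamic
{
public:
   virtual ~Dynamic() = default;

   static void* operator new(std::size_t size)
   {
      return Memory::Alloc(size, MemDynamic);
   }

   static void operator delete(void* addr)
   {
      Memory::Free(addr, MemDynamic);
   }
};

//  Inherits the operators, so "new Session" uses MemDynamic.
class Session : public Dynamic
{
public:
   int id = 0;
};
```

Because the operators are inherited, changing the memory type of a whole class hierarchy only requires changing its base class.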

Allocators

A class with a std::string member wants the string to allocate memory from the same heap that is used for objects of that class. If the string instead allocates memory from the default heap, a restart will leak memory when the object's heap is freed. Although the restart will free the memory used by the string object itself, its destructor is not invoked, so the memory that it allocated to hold its characters will leak.

RSC therefore provides a C++ allocator for each MemoryType so that a class whose objects are not allocated on the default heap can use classes from the standard library. These allocators are defined in Allocators.h and are used to define STL classes that allocate memory from the desired heap. For example:

typedef std::char_traits<char> CharTraits;
typedef std::basic_string<char, CharTraits, DynamicAllocator<char>> DynamicStr;

A class derived from Dynamic then uses DynamicStr to declare what would normally have been a std::string member.
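A minimal allocator that satisfies the standard library's requirements might look like the sketch below. It is not the code in Allocators.h: this DynamicAllocator uses malloc and a byte counter, and DynamicBytes is a hypothetical helper, so the example only shows that the string's character storage flows through the allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

//  Hypothetical stand-in for the dynamic heap: bytes outstanding.
inline std::size_t& DynamicBytes()
{
   static std::size_t bytes = 0;
   return bytes;
}

//  Minimal allocator.  std::allocator_traits supplies the rest.
template<typename T> struct DynamicAllocator
{
   using value_type = T;

   DynamicAllocator() = default;
   template<typename U>
   DynamicAllocator(const DynamicAllocator<U>&) noexcept { }

   T* allocate(std::size_t n)
   {
      auto addr = std::malloc(n * sizeof(T));
      if(addr == nullptr) throw std::bad_alloc();
      DynamicBytes() += n * sizeof(T);
      return static_cast<T*>(addr);
   }

   void deallocate(T* addr, std::size_t n) noexcept
   {
      DynamicBytes() -= n * sizeof(T);
      std::free(addr);
   }
};

//  All instances share one heap, so they always compare equal.
template<typename T, typename U>
bool operator==(const DynamicAllocator<T>&, const DynamicAllocator<U>&) noexcept
{
   return true;
}

template<typename T, typename U>
bool operator!=(const DynamicAllocator<T>&, const DynamicAllocator<U>&) noexcept
{
   return false;
}

typedef std::char_traits<char> CharTraits;
typedef std::basic_string<char, CharTraits, DynamicAllocator<char>> DynamicStr;
```

Note that a short string may be stored inside the string object itself, in which case the allocator is never invoked. Either way, its characters end up in the same heap as the owning object.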


Write-Protecting Data

The table of memory types noted that MemProtected is write-protected. The rationale for this is that data which is only deleted during a reload restart is expensive to recreate, because it must be loaded from disk or over the network. The data also changes far less frequently than other data. It is therefore prudent but not cost-prohibitive to protect it from trampling.

During system initialization, MemProtected is unprotected. Just before it starts to handle user requests, the system write-protects MemProtected. Applications must then explicitly unprotect and reprotect it in order to modify data whose memory was allocated from its heap. Only during a reload restart is it again unprotected, while recreating this data.

A second type of write-protected memory, MemImmutable, is defined for the same reason. It contains critical data that should never change, such as the Module subclasses and ModuleRegistry. Once the system has initialized, it is permanently write-protected so that it cannot be trampled.

When the system is in service, protected memory must be unprotected before it can be modified. Forgetting to do this causes an exception that is almost identical to the one caused by a bad pointer. Because the root causes of these exceptions are very different, RSC distinguishes them by using a proprietary POSIX signal, SIGWRITE, to denote writing to protected memory, rather than the usual SIGSEGV that denotes a bad pointer.

After protected memory has been modified, say to insert a new subscriber profile, it must be immediately reprotected. The stack object FunctionGuard is used for this purpose. Its constructor unprotects memory and, when it goes out of scope, its destructor automatically reprotects it:

FunctionGuard guard(Guard_MemUnprotect);

// change data located in MemProtected

return;  // MemProtected is automatically reprotected

There is also a less frequently used Guard_ImmUnprotect for modifying MemImmutable. The FunctionGuard constructor invokes a private Thread function that eventually unprotects the memory in question. The function is defined by Thread because each thread has an unprotection counter for both MemProtected and MemImmutable. This allows unprotection events to be nested and a thread's current memory protection attributes to be restored when it is scheduled in.
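
The nesting behavior described above can be sketched with a per-thread counter. This is only an illustration: the names ScopedUnprotect, unprotectCount, and writable are hypothetical, and the protection state is simulated with a flag rather than by changing real page attributes.

```cpp
#include <cassert>

namespace sketch
{
//  Per-thread nesting depth of unprotection requests (hypothetical).
thread_local int unprotectCount = 0;

//  Simulated protection state. A real system would change the
//  memory's page attributes instead of setting a flag.
thread_local bool writable = false;

//  Unprotects memory on construction and reprotects it on destruction,
//  but only when the outermost guard on this thread is involved.
class ScopedUnprotect
{
public:
   ScopedUnprotect()
   {
      if(++unprotectCount == 1) writable = true;
   }

   ~ScopedUnprotect()
   {
      if(--unprotectCount == 0) writable = false;
   }

   ScopedUnprotect(const ScopedUnprotect&) = delete;
   ScopedUnprotect& operator=(const ScopedUnprotect&) = delete;
};
}
```

Because only the transitions between 0 and 1 change the protection state, a function that creates a guard can safely call another function that creates its own guard. The inner guard's destructor leaves the memory writable for the rest of the outer function.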

Designing a Class that Mixes Memory Types

Not all classes will be satisfied with using a single MemoryType. RSC's configuration parameters, for example, derive from Protected, but its statistics derive from Dynamic. Some classes want to include members that support both of these capabilities.

Another example is a subscriber profile, which would usually derive from Protected. But it might also track a subscriber's state, which changes too frequently to be placed in write-protected memory and would therefore reside outside the profile, perhaps in Persistent memory.

Here are some guidelines for designing classes with mixed memory types:


  1. If a class embeds another class directly, rather than allocating it through a pointer, the embedded class resides in the same MemoryType as its owner. If the embedded class allocates memory of its own, however, that memory must also come from its owner's MemoryType. This was previously discussed in conjunction with strings.

  2. If a class wants to write-protect most of its data but also has data that changes too frequently for that, it should use the PIMPL idiom to allocate its more dynamic data in a struct that usually has the same persistence. That is, a class derived from Protected puts its dynamic data in a struct derived from Persistent, and a class derived from Immutable puts its dynamic data in a struct derived from Permanent. This way, the primary class and its associated dynamic data either survive a restart or get destroyed together (see Note 1).

  3. If a class needs to include a class with different persistence, it should manage it through a unique_ptr and override the Shutdown and Startup functions discussed earlier:

    • If the class owns an object of lesser persistence, its Shutdown function invokes unique_ptr::release to clear the pointer to that object if the restart will destroy it. When its Startup function notices the nullptr, it reallocates the object.

    • If the class owns an object of greater persistence, its Shutdown function may invoke unique_ptr::reset to prevent a memory leak during a restart that destroys the owner. But if the owner will be able to find the object again, its Shutdown function doesn't need to do anything. When the owner is recreated during the restart's startup phase, its constructor must not blindly create the object of greater persistence. Instead, it must first try to find it, usually in a registry of such objects. This is the more likely scenario; the object was designed to survive the restart, so it should be allowed to do so.
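
The guidelines for owning objects of different persistence can be sketched as follows. Everything here is hypothetical and simplified: SubscriberProfile, SubscriberDynamic, BillingRecord, and BillingRegistry are invented names, all memory comes from the default heap, and Shutdown is told through a bool whether the restart will free the heap that holds the dynamic data, which in RSC would be determined by the restart in progress.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

//  Data of lesser persistence than its owner (hypothetical).
struct SubscriberDynamic { int callsInProgress = 0; };

//  Data of greater persistence than its owner (hypothetical).
struct BillingRecord { int unitsUsed = 0; };

//  Hypothetical registry in which objects of greater persistence
//  can be found again after their previous owner was destroyed.
std::map< std::string, std::unique_ptr< BillingRecord > >& BillingRegistry()
{
   static std::map< std::string, std::unique_ptr< BillingRecord > > registry;
   return registry;
}

class SubscriberProfile
{
public:
   explicit SubscriberProfile(const std::string& id) :
      dyn_(new SubscriberDynamic), billing_(nullptr)
   {
      //  Look for a surviving object before creating a new one.
      auto& slot = BillingRegistry()[id];
      if(slot == nullptr) slot.reset(new BillingRecord);
      billing_ = slot.get();
   }

   //  If the restart will free the heap that holds dyn_, just drop
   //  the pointer. Deleting the object would be wasted effort.
   void Shutdown(bool dynHeapWillBeFreed)
   {
      if(dynHeapWillBeFreed) (void) dyn_.release();
   }

   //  Reallocate the dynamic data if the restart destroyed it.
   void Startup()
   {
      if(dyn_ == nullptr) dyn_.reset(new SubscriberDynamic);
   }

   SubscriberDynamic* Dyn() const { return dyn_.get(); }
   BillingRecord* Billing() const { return billing_; }
private:
   std::unique_ptr< SubscriberDynamic > dyn_;  // lesser persistence
   BillingRecord* billing_;                    // found via the registry
};
```

In this sketch the registry owns the object of greater persistence, so the profile needs no Shutdown logic for it. Note that release only makes sense when the heap really is freed wholesale. On the default heap, as used here, the released object must still be deleted by someone, or it leaks.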


Writing Shutdown and Startup Functions

There are a few functions that many Shutdown and Startup functions use. Base::MemType returns the type of memory that a class uses, and Restart::ClearsMemory and Restart::Release use its result:

//  Types of memory (defined in SysTypes.h).
enum MemoryType
{
   MemNull,        // nil value
   MemTemporary,   // does not survive restarts
   MemDynamic,     // survives warm restarts
   MemPersistent,  // survives warm and cold restarts
   MemProtected,   // survives warm and cold restarts; write-protected
   MemPermanent,   // survives all restarts (default process heap)
   MemImmutable,   // survives all restarts; write-protected
   MemoryType_N    // number of memory types
};

//  Returns the type of memory used by the object (overridden by
//  Temporary, Dynamic, Persistent, Protected, Permanent, and Immutable).
virtual MemoryType MemType() const;

//  Returns true if the heap for memory of TYPE will be freed and
//  reallocated during any restart that is currently in progress.
static bool ClearsMemory(MemoryType type);

//  Invokes obj.release() and returns true if OBJ's heap will be freed
//  during any restart that is currently in progress.
template< class T > static bool Release(std::unique_ptr< T >& obj)
{
   auto type = (obj == nullptr ? MemNull : obj->MemType());
   if(!ClearsMemory(type)) return false;
   obj.release();
   return true;
}
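
The comments on MemoryType suggest how ClearsMemory could decide its result. The following is a sketch under stated assumptions, not RSC's implementation: the RestartLevel enum is hypothetical, and the restart level is passed as an argument, whereas the function excerpted above takes only the memory type and consults the restart that is in progress.

```cpp
#include <cassert>

//  Types of memory, repeated from the excerpt above.
enum MemoryType
{
   MemNull,        // nil value
   MemTemporary,   // does not survive restarts
   MemDynamic,     // survives warm restarts
   MemPersistent,  // survives warm and cold restarts
   MemProtected,   // survives warm and cold restarts; write-protected
   MemPermanent,   // survives all restarts (default process heap)
   MemImmutable,   // survives all restarts; write-protected
   MemoryType_N    // number of memory types
};

//  Hypothetical restart levels, in order of increasing severity.
enum RestartLevel
{
   RestartNone,    // no restart in progress
   RestartWarm,
   RestartCold,
   RestartReload
};

//  Returns true if a restart of LEVEL frees the heap for memory of TYPE.
bool ClearsMemory(MemoryType type, RestartLevel level)
{
   switch(type)
   {
   case MemTemporary:
      return (level >= RestartWarm);
   case MemDynamic:
      return (level >= RestartCold);
   case MemPersistent:
   case MemProtected:
      return (level >= RestartReload);
   default:
      //  MemNull has no heap; MemPermanent and MemImmutable
      //  survive all restarts.
      return false;
   }
}
```

Treating MemNull as never cleared means that a function like Release does nothing, and returns false, when its pointer is already nullptr.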

Traces of the Code in Action

RSC's output directory contains console transcripts (*.console.txt), log files (*.log.txt), and function traces (*.trace.txt) of the following:

  • system initialization, in the files init.*


  • a warm restart, in the files warm* (warm1.* and warm2.* are pre- and post-restart, respectively)

  • a cold restart, in the files cold* (cold1.* and cold2.* are pre- and post-restart, respectively)

  • a reload restart, in the files reload* (reload1.* and reload2.* are pre- and post-restart, respectively)

The restarts were initiated using the CLI's >restart command.


Notes

1 RSC uses the PIMPL idiom in this way in several places: just look for any member named dyn_.

Translated from: https://www.codeproject.com/Articles/5254138/Robust-Cplusplus-Initialization-and-Restarts
