Graceful Degradation via System-wide Customization for Distributed Embedded Systems

what is Graceful degradation

why do graceful degradation in distributed embedded systems


  1. Why target distributed embedded systems

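    In the author's view, the graceful degradation approach proposed here can apply to many kinds of computing systems. This work, however, considers only distributed embedded systems, because they have three relevant characteristics: distribution, general compute ability, and optimization requirement.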


    ①distribution

    A distributed system is able to survive many types of failures with computing and network resources intact, as opposed to a centralized system where a single failure may crash the sole CPU.


    ②general compute ability

    “Smart” sensors (and actuators) use micro-controllers to interface the sensor to the rest of the system. However, the micro-controller can also execute general purpose software components; it might be called upon to do so if a higher utility component is forced to be rehosted due to a hardware failure.

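    General compute ability usually shows up in “smart” sensors/actuators. These components use micro-controllers to interface with the rest of the system. The micro-controller controls the sensors and actuators and converts the raw data received from or sent to a sensor into a usable numeric form. These micro-controllers can also execute some general purpose software components.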

    ③optimization requirement

    Marketing pressures force designers to add features to systems that do not result in changes to core functionality; they merely add system optimizations.

    An automobile has a large portion of its electronic systems devoted to passenger entertainment, fuel economy, emission management, and advanced stability and control algorithms. Yet the basic mission of the automobile remains to convey passengers along a roadway in a safe manner.


  2. Why use this method for graceful degradation

    The only widely used hardware techniques are based on redundant hardware – triply (or more) redundant modules or a near cousin such as hot spares.

    Unfortunately, both hardware and software redundancy techniques are quite expensive, as they require extra hardware or the production of multiple versions of the software.


    A gracefully degrading system does not inherently demand more hardware than a non-gracefully degrading one.

    A naive method will examine and design the system for each of the 2^n configurations of an n component system.

    The goal is to automatically collapse the number of configurations to a set which can be handled by human designers or, alternatively, to find algorithms that automatically use available software components to maximize the functionality of whatever hardware is operational.


proposed a framework

The framework involves three algorithms working together, each algorithm responsible for one of the three aspects of system-wide customization: the feature model, the software repository and the allocation to hardware.

In short, the framework combines three algorithms, one per aspect: feature model, software repository, and allocation to hardware.

  1. The Feature Selection algorithm operates on the feature model, and is responsible for choosing system features to implement.

    A feature is a mechanism to accomplish a system function, or desirable system behavior.


  2. The Adapter Selection algorithm chooses software components to fulfill the requirements of the features chosen by the feature selection algorithm.


    In many cases, a large and flexible library of components will not be available as such. However, we make use of a library that often will be available: software components from related product models.


  3. The final portion of the framework is the Adapter Allocation algorithm, which determines a feasible placement of software components on the micro-controllers of the system.






how to implement



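Basic hardware passes raw data to a driver adapter through its I/O interfaces.

An algorithm adapter mainly passes mid-level interface information to the other nodes on the network that want to receive it. These adapters are the most flexible part of the system; a configuration change usually means swapping these adapters.

A null adapter only forwards a sensor's raw data onto the network, which then carries it to a driver adapter on another PE.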



Three elements must be represented in the model: the network, processing elements, and adapters. Beyond that, the flexibility present in the Product Family Architecture must be represented.

  1. processing elements

    It consists of a resource vector, of arbitrary length n. Each element of the vector is a consumable resource, such as RAM, Flash Memory, or I/O channels.

    A PE's resource vector consists of consumable resources, including RAM, Flash memory, and I/O channels.

    Sensors and actuators are physically connected to particular processing elements, and require computational resources for proper execution. Such resources are considered to be pre-allocated and not included in the PE’s resource vector.

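    However, sensors and actuators are physically attached to particular PEs, so the PE's resource vector excludes the resources already reserved for those sensors and actuators.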

  2. Network

    Our model of the network is a simple resource vector, for example bandwidth.

  3. configuration

    A well-formed configuration can be represented by a unique DFG arranged to show the interconnections among sensors, adapters and actuators.

    Note that interior vertices are all adapters and exterior vertices are sensors and actuators



    Each adapter's resource specification takes the form of a requirements vector, where each element of the vector indicates the amount of a particular type of resource that the adapter requires.

    When an adapter is allocated to a PE it consumes computational resources; its requirements vector has the same form as the PE's resource vector.

    Edges are labeled with a requirement vector, in the same dimensionality and element order as the network resource vector.

    Edges represent communication (data transfer) among adapters, sensors, and actuators, so their requirement vectors have the same form as the network's resource vector.
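The vector comparison described above can be sketched in a few lines. This is a minimal illustration, not the thesis's implementation: the class name `ProcessingElement`, the helper names `fits` and `allocate`, the resource order (RAM, Flash, I/O channels), and all numbers are assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed element order for this sketch: (RAM bytes, Flash bytes, I/O channels).
@dataclass
class ProcessingElement:
    name: str
    free: tuple  # remaining consumable resources, one entry per resource type

def fits(requirement, available):
    """True if every element of the requirement vector is covered."""
    if len(requirement) != len(available):
        raise ValueError("vectors must have the same dimensionality")
    return all(r <= a for r, a in zip(requirement, available))

def allocate(pe, requirement):
    """Subtract an adapter's requirement vector from the PE's free resources."""
    if not fits(requirement, pe.free):
        return False
    pe.free = tuple(a - r for r, a in zip(requirement, pe.free))
    return True

# Hypothetical PE and two hypothetical adapters.
pe = ProcessingElement("door_module", free=(2048, 16384, 4))
ok_first = allocate(pe, (1024, 8192, 1))
ok_second = allocate(pe, (2048, 1024, 1))  # rejected: not enough RAM left
```

The same `fits` check would apply to an edge's requirement vector against the network resource vector, since both share dimensionality and element order.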

Fault model

  1. Omission failure: a processing element or software component fails to generate an output.

  2. Timing failure: an otherwise correct response is generated either too early or too late.

  3. Value failure: the value of a response is incorrect.

  4. Crash failure: a processing element fails to generate outputs.

  5. Byzantine failure: a processing failure results in arbitrary, even malicious, behavior.

For the most part, we consider only crash failures. It could be that a timing, omission or value failure which was detected by the system would result in a PE being shut down and cause a system-wide customization.

problem definition


  1. A hardware failure occurs. We assume the driver has enough system functionality remaining to pull the car to the side of the road.

  2. A customization manager is connected. The connection is either via a remote connection such as OnStar™ or to a laptop carried in a tow truck.

  3. The customization manager polls the system to discover what hardware is available.

  4. The system-wide customization algorithm is executed. The output is a list of adapters to be downloaded to particular hardware.

  5. Adapters are downloaded and installed. The adapters are taken from the adapter repository, either on a remote server or a CD-ROM in the laptop. Note that the adapter repository might contain adapters that were not necessarily available when the automobile was constructed.

  6. The driver reboots the auto and goes on his way.

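After a hardware failure, the working assumption is that the remaining functionality still lets the car pull over to the side of the road. The customization manager then connects, either remotely or through a laptop, and checks which hardware in the car is still usable. The reconfiguration algorithm runs against that hardware and outputs a list of adapters that fit it. The adapters are then downloaded and installed (from a remote server or a CD), and a reboot completes the configuration.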

Problem Models


Each choice of possible hardware components can be viewed as a single vertex of a dense lattice that represents a fine-grained product family architecture (PFA)


The system’s configuration state falls toward the bottom of the lattice as components are broken, and rises when they are repaired or replaced.

In terms of the PFA lattice, the system-wide customization problem may be expressed as the process of choosing the software configuration for a particular vertex (i.e., the one representing available hardware) that maximizes the utility of the system.


MUSH model

As this model Maps Utility, Hardware and Software, we call it the MUSH model


For the system composed of $h$ hardware components, the y–axis has $2^h$ distinct allowable values. Likewise, the x–axis has $2^s$ possibilities for the $s$ software components. The z–axis on the graph is the system utility.

The maximal utility projection graph is drawn in the x–z plane: for each value on the x–axis, the largest system utility is chosen from all of the y values and plotted.
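As a small illustration of that projection, the sketch below collapses a table of (x, y) → utility values into one maximum per x value. The function name `maximal_utility_projection` and the numbers are hypothetical; only the "largest utility over all y for each x" rule comes from the text.

```python
def maximal_utility_projection(utility):
    """Map each x value to the largest utility found over all y values.

    `utility` maps (x, y) index pairs to a system utility z.
    """
    projection = {}
    for (x, _y), z in utility.items():
        if x not in projection or z > projection[x]:
            projection[x] = z
    return projection

# Made-up utilities for three x values and two y values.
utility = {
    (0, 0): 0.0, (0, 1): 0.0,
    (1, 0): 0.0, (1, 1): 3.0,
    (2, 0): 1.0, (2, 1): 5.0,
}
projection = maximal_utility_projection(utility)
```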



For each vertex in a hardware-only lattice there are many possible DFGs. If the DFGs for each such configuration were merged, the resulting graph would be an alternate representation of all the valid configurations available in the lattice.

In order to merge DFGs, a notational element must be introduced to allow for choices between different components. The choice element allows data flow from at most one of its inputs through to the output.




If given the PFA graph of Figure 5.6, a customization manager would not have any guidance about how to optimize functionality of the system.

Which is more important, location display or path planning? Of the available location display options, which should be used?

The algorithm can only attempt to maximize quantitative values, so functionality must be represented quantitatively.

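Merging the different products through data elements yields one very large PFA. The PFA shows the relationships among sensors, adapters, and actuators, but it does not by itself reveal the optimal system functionality. To let an optimization algorithm work on system functionality, that functionality must therefore be expressed quantitatively.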

Each such different means of accomplishing the function is a feature.


This research did not attempt to solve the feature problem, but does require a low-complexity feature representation.

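Using feature classes avoids the conflict of placing features with the same function into one configuration. The particular adapters in the PFA that belong to a class are then called features.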

Each feature is an adapter that has been given a utility value to represent its desirability. The overall utility of a configuration then is the sum of the utility of all the features of a configuration.

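A feature is thus an adapter annotated with a utility value. There are also zero sized features that need no resources, for example the zero sized dead reckoner feature that handles GPS input in the navigation system; its corresponding adapter is the null adapter.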


Inputs to the problem are the PFA graph, which provides all the alternatives, and a description of the available hardware. The goal is to generate a valid configuration of adapters to the processing elements (PE) and message traffic to network elements (NE).


Feature selection

More formally, a PFA graph includes a set of feature classes $\{C_0, C_1, \ldots, C_m\}$, where each class $C_k$ contains some number of features $\{F_{k,0}, F_{k,1}, \ldots, F_{k,n}\}$. For some $a \ge 0$, classes $\{C_0, \ldots, C_{a-1}\}$ are critical; the remainder $\{C_a, \ldots, C_m\}$ are non-critical. Each feature has a utility $u(F_{k,i})$ indicating its desirability in the system. Note that the features are sorted in their classes by utility, so that $u(F_{k,i}) \ge u(F_{k,j})\ \forall\, i \le j$.

Feature selection is a combinatorial optimization problem and may need several iterations. Each invocation of the algorithm returns a set of features $\{F_{0,i_0}, F_{1,i_1}, F_{2,i_2}, \ldots, F_{j,i_j}\}$ where $a \le j \le m$. The total utility $U_{tot}$, the sum of the utilities of the selected features, is the optimization metric.


Generate a list of all possible combinations. Each combination on the list will then have its Utot calculated and used as the basis to sort the list.

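For example, the three feature classes in car navigation (dead reckoner, turn calculation, map) contain 5, 2, and 8 features respectively. That gives 6 x 3 x 8 = 144 combinations, because two of the classes are non-critical and may also be left out entirely. In practice one class (dead reckoner) is not marked critical, yet the critical features cannot work without it, so the count is 5 x 3 x 8 = 120.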
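The enumerate-then-sort step can be sketched as follows. This is a rough illustration rather than the thesis's COMB_ALL code: the function name `enumerate_feature_sets`, the use of `None` for "no feature from this non-critical class", and the utility values are all assumptions; only the class sizes (5, 2, 8) come from the navigation example.

```python
from itertools import product

def enumerate_feature_sets(classes):
    """Return every feature combination with its total utility, best first.

    `classes` is a list of (utilities, critical) pairs. A non-critical class
    may also contribute no feature, represented here by None.
    """
    options = []
    for utilities, critical in classes:
        indices = list(range(len(utilities)))
        options.append(indices if critical else indices + [None])
    combos = []
    for choice in product(*options):
        u_tot = sum(classes[k][0][i] for k, i in enumerate(choice) if i is not None)
        combos.append((u_tot, choice))
    # Sort by total utility only, highest first.
    combos.sort(key=lambda item: item[0], reverse=True)
    return combos

# Class sizes follow the navigation example; the utilities are made up.
dead_reckoner = ([5, 4, 3, 2, 1], False)
turn_calculation = ([2, 1], False)
map_display = ([8, 7, 6, 5, 4, 3, 2, 1], True)

ranked = enumerate_feature_sets([dead_reckoner, turn_calculation, map_display])

# Treating the dead reckoner as effectively required removes its "none" option.
ranked_required = enumerate_feature_sets(
    [(dead_reckoner[0], True), turn_calculation, map_display]
)
```

With these inputs the two lists have 144 and 120 entries, matching the counts in the example.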

As a rough estimate, consider a system with $f$ features in $c$ classes. If the features are uniformly distributed among the classes, each class would hold $\frac{f}{c}$ features. Making a conservative assumption that all classes are critical, the total number of combinations would then be $(\frac{f}{c})^c$. Since excessively huge feature classes will probably not be supported by management, the number of feature classes can be approximated as $c \approx 2\sqrt{f}$, so each class holds about $\frac{f}{2\sqrt{f}} = \frac{\sqrt{f}}{2}$ features. The total number of combinations, as a function of the number of features, is thus approximately $(\frac{\sqrt{f}}{2})^{2\sqrt{f}}$.

The brute-force algorithm walks through every feature combination to find the one with the greatest utility.

The case study in Chapter 8 has 50 features in 18 classes; with the COMB_ALL algorithm there are roughly $2^{33}$ combinations.




where $m$ is the number of feature classes, $n_i$ is the number of features in feature class $i$ and $\nu_i$ is the criticality of feature class $i$, with a value of 0 for critical classes and 1 for non-critical ones.

First group all features by class, then sort the features within each class by utility from highest to lowest.

Without further information, the feature which has the smallest utility increase over the next feature of that class (i.e., choose $F_{k,0}$ from the feature class $C_k$ for which $u(F_{k,0}) - u(F_{k,1})$ is smallest) should be the one to be discarded, resulting in the highest $U_{tot}$ of the available options.
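The "smallest utility drop" rule can be illustrated with a short sketch. It assumes each class's utilities are already sorted from highest to lowest and only considers stepping down to the next feature in the same class; dropping a non-critical class altogether is left out. The function name `next_degradation` and the numbers are hypothetical.

```python
def next_degradation(selection, utilities):
    """Pick the class whose current feature is cheapest to step down from.

    `selection[k]` is the index of the feature currently chosen in class k;
    `utilities[k]` lists that class's utilities, sorted highest first.
    Returns (class index, utility lost), or None if no class can step down.
    """
    best = None
    for k, i in enumerate(selection):
        if i + 1 >= len(utilities[k]):
            continue  # no lower-utility feature left in this class
        loss = utilities[k][i] - utilities[k][i + 1]
        if best is None or loss < best[1]:
            best = (k, loss)
    return best

# Made-up utilities for three classes; every class starts at its best feature.
utilities = [[10, 4], [6, 5, 1], [3]]
selection = [0, 0, 0]

first_step = next_degradation(selection, utilities)
selection[first_step[0]] += 1          # discard that feature, use the next one
second_step = next_degradation(selection, utilities)
```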


The Feature Selection algorithm must choose a feature set under three different conditions: initial, adapter selection failure and adapter allocation failure.

The adapter selection algorithm fails only because adapters cannot be chosen to fulfill all of the dependencies of a feature.

Upon adapter allocation failure, the packing state achieved on each attempt could be examined to attempt to discover the core reason behind the failure.


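The Feature Selection algorithm chooses a feature set. The set must include at least the critical features, and the features in it must satisfy the constraints among them.

The algorithm runs in three situations: initially, after an adapter selection failure, and after an adapter allocation failure. On the first run no further information is available, so COMB_ALL picks an arbitrary combination while COMB_SHORT picks the one with the greatest utility; the result is passed to the next phase. When a later phase fails, control returns to this algorithm. COMB_ALL then tries another combination until one passes the constraints of all three phases, and finally ranks all passing feature sets by total utility and takes the highest. COMB_SHORT instead discards one feature to form a new combination, and stops at the first feature set that satisfies the constraints of all three phases.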
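The retry behaviour across the three phases can be sketched as a simple loop. This is an assumed structure for illustration only: `customize`, `select_adapters`, and `allocate_adapters` are hypothetical names, a return value of `None` stands in for a phase failure, and the toy phase functions below exist only to make the example run.

```python
def customize(feature_sets, select_adapters, allocate_adapters):
    """Try candidate feature sets in order until all phases succeed.

    Returns (features, adapters, placement) for the first feature set that
    passes adapter selection and adapter allocation, or None if none does.
    """
    for features in feature_sets:
        adapters = select_adapters(features)
        if adapters is None:       # adapter selection failure
            continue
        placement = allocate_adapters(adapters)
        if placement is None:      # adapter allocation failure
            continue
        return features, adapters, placement
    return None

# Toy stand-ins for the later phases (purely illustrative).
def select_adapters(features):
    # Pretend no adapter can satisfy the "turn_voice" feature.
    return None if "turn_voice" in features else list(features)

def allocate_adapters(adapters):
    # Pretend the remaining hardware can host at most one adapter.
    return {name: "pe0" for name in adapters} if len(adapters) <= 1 else None

# Candidates ordered from highest to lowest total utility.
candidates = [("gps_map", "turn_voice"), ("gps_map",), ()]
result = customize(candidates, select_adapters, allocate_adapters)
```

Feeding the candidates in descending utility order and stopping at the first success mirrors the COMB_SHORT behaviour described above; collecting every success and ranking them afterwards would be closer to COMB_ALL.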

  1. COMB_ALL brute force

    Each combination was then tested to ensure the proper input and output dependencies were met:

    ①Other adapters existed to generate each input, and no other adapter was included that generated the same outputs.

    ②Additional tests ensured consumers existed for each output data element.
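These dependency tests can be sketched as a single check over a candidate combination. The data layout is an assumption made for the example: each adapter is a pair of input and output data-element names, `sources` are data elements supplied by sensors, and `sinks` are data elements consumed by actuators. The function name and all element names are hypothetical.

```python
def check_dependencies(adapters, sources, sinks):
    """Validate the input/output dependencies of a set of adapters.

    `adapters` maps an adapter name to (inputs, outputs), both sets of
    data-element names.
    """
    produced = {}
    for name, (_inputs, outputs) in adapters.items():
        for out in outputs:
            if out in produced:
                return False  # two adapters generate the same output
            produced[out] = name
    available = set(produced) | set(sources)
    consumed = set(sinks)
    for inputs, _outputs in adapters.values():
        if not set(inputs) <= available:
            return False      # an input has no producer
        consumed |= set(inputs)
    # Every produced data element needs at least one consumer.
    return set(produced) <= consumed

valid = {
    "gps_driver": ({"gps_raw"}, {"position"}),
    "display": ({"position"}, {"screen_cmd"}),
}
duplicate = dict(valid, reckoner=({"wheel_raw"}, {"position"}))

ok = check_dependencies(valid, {"gps_raw"}, {"screen_cmd"})
dup = check_dependencies(duplicate, {"gps_raw", "wheel_raw"}, {"screen_cmd"})
```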







adapter selection

The selection of adapters to implement the feature set forms the core of the system-wide customization algorithm. It is during this phase that the relationships and dependencies between adapters, as expressed in the PFA graph, are incorporated into a solution.

Given: a PFA graph $P$, a set of features $F$, and the state of all sensors/actuators (i.e., working or not working).

Find: an allocation graph $A$, a subgraph of $P$, that describes a minimalist valid configuration. Ideally, find $A$ such that the probability it will be allocatable is maximized.

A is minimalist in the sense that the removal of any vertex in V or edge in E would make A no longer a valid configuration.

A simple adapter count proved to be a suitable heuristic.


adapter number size best

adapter allocation

The algorithm described in this chapter exploits the fixed nature of hardware interface software components by examining the other adapters that it might call or be called by.


The adapters are joined in an allocation graph $A(V, E)$, whose vertices $V$ are the adapters and whose edges $E_{i,j}$ represent communication between $adapter_i$ and $adapter_j$.

Each $adapter_i$ is labeled with its processing requirements, $p(i)$. Processing requirements are often a list of multiple independent values such as CPU cycles, RAM, or I/O channels.

Edges are labeled with communication requirements $c(i)$, usually representing bandwidth.
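To make the labeled graph concrete, the sketch below places adapters on PEs with a plain first-fit pass and then sums the bandwidth of edges that cross PEs. First-fit is a generic bin-packing heuristic substituted here for illustration; it is not the allocation algorithm from the thesis. The function names, the two-element requirement vectors, and all numbers are assumptions.

```python
def first_fit_allocate(adapters, pes):
    """Place each adapter on the first PE with enough free resources.

    `adapters` maps adapter name -> processing requirement vector p(i);
    `pes` maps PE name -> resource vector of the same dimensionality.
    Returns {adapter: pe}, or None if some adapter fits nowhere.
    """
    free = {pe: list(capacity) for pe, capacity in pes.items()}
    placement = {}
    for name, req in adapters.items():
        for pe, avail in free.items():
            if all(r <= a for r, a in zip(req, avail)):
                free[pe] = [a - r for r, a in zip(req, avail)]
                placement[name] = pe
                break
        else:
            return None  # allocation failure
    return placement

def network_load(edges, placement):
    """Sum the bandwidth of edges whose endpoints sit on different PEs."""
    return sum(bw for (i, j), bw in edges.items() if placement[i] != placement[j])

# Hypothetical vertices, labeled with (RAM units, I/O channels).
adapters = {"gps_driver": (2, 1), "reckoner": (3, 0), "display": (2, 1)}
# Hypothetical edges, labeled with a bandwidth requirement.
edges = {("gps_driver", "reckoner"): 10, ("reckoner", "display"): 5}
pes = {"pe0": (4, 1), "pe1": (6, 2)}

placement = first_fit_allocate(adapters, pes)
load = network_load(edges, placement)
```

Only edges that cross between PEs consume network bandwidth in this sketch, so a placement is feasible when `load` stays within the network resource vector.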



