Graceful Degradation via System-wide Customization for Distributed Embedded Systems


An analysis of Nace's PhD thesis, Graceful Degradation via System-wide Customization for Distributed Embedded Systems

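The thesis is analyzed mainly from four angles: what it is and why it is worth doing; what work was done and how it was done; how well the experiments show that work performs; and, finally, what shortcomings remain and what could be improved.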

what is Graceful degradation

why do graceful degradation in distributed embedded systems

The motivation can be considered from the following angles.

  1. Why choose distributed embedded systems as the target

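    In the author's view, the graceful degradation method proposed in the thesis applies to many kinds of computing systems, but the thesis considers only distributed embedded systems, because they have three characteristics: distribution, general compute ability, and optimization requirement.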

    ①distribution

    A distributed system is able to survive many types of failures with computing and network resources intact, as opposed to a centralized system where a single failure may crash the sole CPU.

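    Unlike a centralized system, a distributed system can survive a failure with its remaining computing and network resources still intact; in a centralized system, a single failure can crash the only CPU.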

    ②general compute ability

    “smart” sensors (and actuators) use micro-controllers to interface the sensor to the rest of the system. But, the micro-controller can execute general purpose software components; it might be called upon to do so if a higher utility component is forced to be rehosted due to a hardware failure.

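    General compute ability typically shows up in "smart" sensors/actuators. These components use micro-controllers to interface with the rest of the system. The micro-controller controls the sensors and actuators and converts the raw data a sensor receives or emits into usable numeric values. The same micro-controllers can also execute general-purpose software components.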

    ③optimization requirement

    Marketing pressures force designers to add features to systems that do not result in changes to core functionality; they merely add system optimizations.

    An automobile has a large portion of its electronic systems devoted to passenger entertainment, fuel economy, emission management, and advanced stability and control algorithms. The basic mission of the automobile: to convey passengers along a roadway in a safe manner.

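    Such systems carry many non-core, non-critical features. These features do not affect the core functionality; they only optimize the system. In an emergency, the computing resources previously allocated to non-core features can be reallocated to critical components.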

  2. Why use this method for graceful degradation

    The only widely used hardware techniques are based on redundant hardware – triply (or more) redundant modules or a near cousin such as hot spares.

    Unfortunately, both hardware and software redundancy techniques are quite expensive, as they require extra hardware or the production of multiple versions of the software.

    The traditional approach relies on hardware techniques, i.e., adding redundant hardware, but this is costly.

    A gracefully degrading system does not inherently demand more hardware than a non-gracefully degrading one.

    A naive method will examine and design the system for each of the 2^n configurations of an n component system.

    The aim is to automatically collapse the number of configurations to a set which can be handled by human designers or, alternately, to find algorithms to automatically use available software components to maximize functionality of whatever hardware is operational.

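    Compared with a system that does not degrade gracefully, a system using this graceful degradation method requires no additional hardware.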

proposed a framework

The framework involves three algorithms working together, each algorithm responsible for one of the three aspects of system-wide customization: the feature model, the software repository and the allocation to hardware.

The framework is made up of three algorithms, each addressing one aspect: the feature model, the software repository, and the allocation to hardware.

  1. The Feature Selection algorithm operates on the feature model, and is responsible for choosing system features to implement.

    A feature is a mechanism to accomplish a system function, or desirable system behavior.

    The feature selection algorithm chooses features to implement system functions. A single function may be implemented by several different features.

  2. The Adapter Selection algorithm chooses software components to fulfill the requirements of the features chosen by the feature selection algorithm.

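    The adapter selection algorithm chooses adapters (software components) that satisfy the requirements of the selected features. (For why they are called adapters, see Section 4.1.2 of the thesis.)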

    In many cases, a large and flexible library of components will not be available as such. However, we make use of a library that often will be available: software components from related product models.

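    A particular product usually does not have that many components to choose from, so components are drawn from related product models. Together, these related products form the product family architecture (PFA).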

  3. The final portion of the framework is the Adapter Allocation algorithm, which determines a feasible placement of software components on the micro-controllers of the system.

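    The adapter allocation algorithm determines how to place adapters (software components) onto the micro-controllers of the system (which could also be called processors).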

    [Figure: PFA lattice and PFA graph (image not available)]

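    The lattice diagram contains all possible product family architectures: each vertex of the lattice represents one product, and the connecting lines represent relationships between products.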

    The PFA graph combines the data flow graph of each product with the relationships among products.

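    The first stage selects features based on the system feature model, the second stage selects adapters that fulfill the selected features, and the third stage allocates those adapters to the processing elements.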
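The three stages and the feedback between them can be sketched as a simple loop. This is a minimal sketch, not the thesis's implementation: it assumes stage 1 is represented by an already ranked sequence of candidate feature sets, and the names `customize`, `select_adapters` and `allocate` are hypothetical stand-ins for the adapter selection and adapter allocation stages, each returning `None` on failure.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Optional, Set, Tuple

def customize(
    feature_sets: Iterable[FrozenSet[str]],
    select_adapters: Callable[[FrozenSet[str]], Optional[Set[str]]],
    allocate: Callable[[Set[str]], Optional[Dict[str, str]]],
) -> Optional[Tuple[FrozenSet[str], Set[str], Dict[str, str]]]:
    """Hypothetical driver: try feature sets best-first until one survives
    adapter selection and adapter allocation."""
    for features in feature_sets:             # stage 1: feature selection (pre-ranked)
        adapters = select_adapters(features)  # stage 2: adapter selection
        if adapters is None:
            continue                          # selection failure -> next feature set
        placement = allocate(adapters)        # stage 3: adapter allocation
        if placement is None:
            continue                          # allocation failure -> next feature set
        return features, adapters, placement
    return None                               # no feature set can be realized

# Toy usage: the high-resolution map does not fit on the remaining hardware.
candidates = [frozenset({"gps", "map_hi"}), frozenset({"gps", "map_lo"})]
result = customize(
    candidates,
    select_adapters=lambda fs: {f + "_adapter" for f in fs},
    allocate=lambda ads: None if "map_hi_adapter" in ads else {a: "PE0" for a in ads},
)
print(sorted(result[0]))  # ['gps', 'map_lo']
```

In the thesis the feedback is richer, since feature selection reacts differently to adapter selection failures and adapter allocation failures; this loop only shows the control flow.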

how to implement

[Figure: abstract view of a processing element (image not available)]

This is an abstract view of a processing element (PE).

The basic hardware exchanges raw data with the driver adapter through an I/O interface.

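The algorithm adapter mainly passes information from the mid-level interface to other nodes on the network that want to receive it. These adapters are the most flexible part of the system; a change of configuration usually means swapping these adapters.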

The null adapter only forwards the sensor's raw data onto the network, which carries it to a driver adapter on another PE.

The adapters discussed in the thesis include all three kinds above.

Modeling

Three elements must be represented in the model: the network, processing elements, and adapters. Beyond that, the flexibility present in the Product Family Architecture must be represented.

  1. processing elements

    It consists of a resource vector, of arbitrary length n. Each element of the vector is a consumable resource, such as RAM, Flash Memory, or I/O channels.

    A PE's resource vector consists of consumable resources such as RAM, Flash Memory, and I/O channels.

    Sensors and actuators are physically connected to particular processing elements, and require computational resources for proper execution. Such resources are considered to be pre-allocated and not included in the PE’s resource vector.

    Because sensors and actuators are physically connected to particular PEs, the resources they need are pre-allocated and excluded from the PE's resource vector.

  2. Network

    Our model of the network is a simple resource vector: bandwidth.

  3. configuration

    A well-formed configuration can be represented by a unique DFG arranged to show the interconnections among sensors, adapters and actuators.

    Note that interior vertices are all adapters and exterior vertices are sensors and actuators.

[Figure: configuration DFG connecting sensors, adapters and actuators (image not available)]

Sensors and actuators are already physically connected to particular PEs, so they do not need to be allocated; only adapters are allocated to PEs.

The specification takes the form of a requirements vector, where each element of the vector indicates the amount of a particular type of resource that the adapter requires.

When an adapter is allocated to a PE, it consumes computing resources; its requirement vector has the same form as the PE's resource vector.

Edges are labeled with a requirement vector, in the same dimensionality and element order as the network resource vector.

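Edges represent communication (data transfer) among adapters, sensors and actuators, so their requirement vectors have the same form as the network resource vector.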
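A set of adapters fits on a PE only if the element-wise sum of their requirement vectors stays within the PE's resource vector, and the same check applies to edge requirements against the network resource vector. A minimal sketch, assuming plain numeric lists in a fixed element order; the function name `fits` and the example dimensions (RAM in KB, Flash in KB, I/O channels) are illustrative assumptions.

```python
from typing import List, Sequence

def fits(capacity: Sequence[float], demands: List[Sequence[float]]) -> bool:
    """True if the element-wise sum of the requirement vectors stays within capacity.

    All vectors must share the same dimensionality and element order."""
    totals = [0.0] * len(capacity)
    for demand in demands:
        if len(demand) != len(capacity):
            raise ValueError("requirement vector does not match resource vector")
        for k, amount in enumerate(demand):
            totals[k] += amount
    return all(total <= cap for total, cap in zip(totals, capacity))

# Hypothetical PE: [RAM KB, Flash KB, I/O channels]
pe = [64, 256, 4]
print(fits(pe, [[16, 64, 1], [32, 100, 2]]))               # True
print(fits(pe, [[16, 64, 1], [32, 100, 2], [20, 10, 0]]))  # False: RAM 68 > 64
# Network resource vector [bandwidth]; edge requirements use the same order.
print(fits([500], [[200], [250]]))                         # True
```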

Fault model

  1. Omission failure: a processing element or software component fails to generate an output.

  2. Timing failure: an otherwise correct response is generated either too early or too late.

  3. Value failure: the value of a response is incorrect.

  4. Crash failure: a processing element fails to generate outputs.

  5. Byzantine failure: a processing failure results in arbitrary, even malicious, behavior.

For the most part, we consider only crash failures. It could be that a timing, omission or value failure which was detected by the system would result in a PE being shut down and cause a system-wide customization.

problem definition

An automotive example with reconfiguration

  1. A hardware failure occurs. We assume the driver has enough system functionality remaining to pull the car to the side of the road.

  2. A customization manager is connected. The connection is either via a remote connection such as OnStar™ or to a laptop carried in a tow truck.

  3. The customization manager polls the system to discover what hardware is available.

  4. The system-wide customization algorithm is executed. The output is a list of adapters to be downloaded to particular hardware.

  5. Adapters are downloaded and installed. The adapters are taken from the adapter repository, either on a remote server or a CD-ROM in the laptop. Note that the adapter repository might contain adapters that were not necessarily available when the automobile was constructed.

  6. The driver reboots the auto and goes on his way.

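When the car's hardware fails, the assumption is that the remaining functionality is still enough to pull the car to the side of the road. A customization manager then connects to the car, either remotely or via a laptop. The customization manager checks which hardware in the car is still available and runs the reconfiguration algorithm for that hardware, which outputs a list of adapters suited to it. The adapters are then downloaded and installed (from a remote server or a CD-ROM). Finally, the system is rebooted to complete the reconfiguration.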

Problem Models

Lattice

Each choice of possible hardware components can be viewed as a single vertex of a dense lattice that represents a fine-grained product family architecture (PFA).

[Figure: PFA lattice (image not available)]

The system’s configuration state falls toward the bottom of the lattice as components are broken, and rises when they are repaired or replaced.

In terms of the PFA lattice, the system-wide customization problem may be expressed as the process of choosing the software configuration for a particular vertex (i.e., the one representing available hardware) that maximizes the utility of the system.
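The lattice view can be made concrete with a small sketch: each vertex is the set of hardware components still working, so n components give 2^n vertices, a crash failure moves the system down one level, and customization picks the best software configuration feasible at the current vertex. The component names, configurations and utilities are hypothetical, and reducing each software configuration to the set of hardware it needs is an assumption made for illustration.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional

def lattice(components: Iterable[str]) -> List[FrozenSet[str]]:
    """Every vertex of the hardware lattice: each subset of working components."""
    items = sorted(components)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def fail(vertex: FrozenSet[str], component: str) -> FrozenSet[str]:
    """A crash failure moves the system one level down the lattice."""
    return vertex - {component}

def best_config(vertex: FrozenSet[str],
                needs: Dict[str, FrozenSet[str]],
                utility: Callable[[str], float]) -> Optional[str]:
    """Highest-utility software configuration whose hardware needs are met at `vertex`."""
    feasible = [cfg for cfg, hw in needs.items() if hw <= vertex]
    return max(feasible, key=utility, default=None)

hardware = {"ecu1", "ecu2", "gps"}
needs = {"gps_nav": frozenset({"ecu1", "gps"}),
         "dead_reckoning": frozenset({"ecu1"}),
         "no_nav": frozenset()}
utility = {"gps_nav": 10, "dead_reckoning": 6, "no_nav": 0}.get

top = frozenset(hardware)
print(len(lattice(hardware)))                         # 8 == 2**3 vertices
print(best_config(top, needs, utility))               # gps_nav
print(best_config(fail(top, "gps"), needs, utility))  # dead_reckoning
```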


MUSH model

As this model Maps Utility, Hardware and Software, we call it the MUSH model.

[Figure: the MUSH model (image not available)]

For the system composed of h hardware components, the y–axis has $2^h$ distinct allowable values. Likewise, the x–axis has $2^s$ possibilities for the s software components. The z–axis on the graph is the system utility.

The maximal utility projection graph, shown here in the x–z plane. For each value on the x–axis, the largest system utility is chosen from all of the y values and plotted.

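The maximal utility projection onto the x–z plane shows, for each software configuration (each value of x), the largest system utility achievable over all hardware configurations (all values of y).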
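The projection is a maximum taken over y for each x. A minimal sketch, assuming the MUSH surface is given as a table `utility[x][y]` over software configurations x and hardware configurations y, with `None` marking pairs that do not form a valid configuration; that encoding and the toy values are assumptions.

```python
from typing import Dict, Optional

def max_utility_projection(
    utility: Dict[str, Dict[str, Optional[float]]]
) -> Dict[str, Optional[float]]:
    """For each x (software configuration), the largest utility over all y
    (hardware configurations); None if x is valid on no hardware configuration."""
    projection: Dict[str, Optional[float]] = {}
    for x, row in utility.items():
        valid = [u for u in row.values() if u is not None]
        projection[x] = max(valid) if valid else None
    return projection

surface = {
    "s0": {"h0": 1, "h1": 3},
    "s1": {"h0": None, "h1": 5},
    "s2": {"h0": None, "h1": None},
}
print(max_utility_projection(surface))  # {'s0': 3, 's1': 5, 's2': None}
```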

PFAS

For each vertex in a hardware-only lattice there are many possible DFGs. If the DFGs for each such configuration were merged, the resulting graph would be an alternate representation of all the valid configurations available in the lattice.

In order to merge DFGs, a notational element must be introduced to allow for choices between different components. The choice element allows data flow from at most one of its inputs through to the output.

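When merging different DFGs into one PFA graph, a notational element is needed that allows a choice among components. For example, an ordinary edge allows only one path from input to output, whereas the choice element D in the figure below allows two alternative inputs to reach C.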
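One way to see what the choice element provides is to expand a merged graph back into the individual DFGs it stands for. A minimal sketch, assuming each choice element is recorded only as a list of alternative inputs; because a choice passes data from at most one input, a "no input" option (`None`) is included by default. The representation, the function name and the input names `A`, `B` are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence

def expand_choices(choices: Dict[str, Sequence[str]],
                   allow_empty: bool = True) -> List[Dict[str, Optional[str]]]:
    """Each result picks at most one input per choice element, i.e. one DFG."""
    names = sorted(choices)
    options = [list(choices[n]) + ([None] if allow_empty else []) for n in names]
    return [dict(zip(names, pick)) for pick in product(*options)]

print(expand_choices({"D": ["A", "B"]}, allow_empty=False))
# [{'D': 'A'}, {'D': 'B'}]
print(len(expand_choices({"D": ["A", "B"], "E": ["F", "G"]})))  # 9 = 3 * 3
```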

[Figures: PFA graphs with choice elements (images not available)]

If given the PFA graph of Figure 5.6, a customization manager would not have any guidance about how to optimize functionality of the system.

Which is more important, location display or path planning? Of the available location display options, which should be used?

The algorithm can only attempt to maximize quantitative values, so functionality must be represented quantitatively.

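Data elements are used to merge the different products into one large PFA. This PFA shows the relationships among sensors, adapters and actuators, but it does not indicate which configuration gives the best system functionality. To optimize system functionality algorithmically, the functionality must therefore be represented quantitatively.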

Each such different means of accomplishing the function is a feature.

However, labeling every feature in the PFA with a utility value is a hard problem, because every possible combination of features would need to be labeled.

This research did not attempt to solve the feature problem, but does require a low-complexity feature representation.

Feature classes prevent the conflict of placing features with the same function into one configuration. The particular adapters in the PFA that play this role are called features.

Each feature is an adapter that has been given a utility value to represent its desirability. The overall utility of a configuration then is the sum of the utility of all the features of a configuration.

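A feature is thus an adapter labeled with a utility value. There are also zero-sized features that need no resources; for example, the zero-sized dead reckoner feature that processes GPS input in the navigation system corresponds to the Null adapter.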

Algorithm framework

Inputs to the problem are the PFA graph, which provides all the alternatives, and a description of the available hardware. The goal is to generate a valid configuration of adapters to the processing elements (PE) and message traffic to network elements (NE).

In short, the input is the PFA graph and the output is a configuration that fits the available PEs and NEs.

Feature selection


More formally, a PFA graph includes a set of feature classes, $\{C_0, C_1, \ldots, C_m\}$, where each class $C_k$ contains some number of features $\{F_{k,0}, F_{k,1}, \ldots, F_{k,n}\}$. For some $a \ge 0$, classes $\{C_0, \ldots, C_{a-1}\}$ are critical and the remainder $\{C_a, \ldots, C_m\}$ are non-critical. Each feature has a utility $u(F_{k,i})$ indicating its desirability in the system. Note that the features are sorted in their classes by utility, so that $u(F_{k,i}) \ge u(F_{k,j}), \forall i \le j$.

Each invocation of the algorithm will return a set of features $\{F_{0,i_0}, F_{1,i_1}, F_{2,i_2}, \ldots, F_{j,i_j}\}$, where $a \le j \le m$. The total utility $U_{tot}$ is the optimization metric.

COMB_ALL

Generate a list of all possible combinations. Each combination on the list will then have its Utot calculated and used as the basis to sort the list.

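For example, the three feature classes of the car navigation system (dead reckoner, turn calculation, map) have 5, 2 and 8 features respectively. Because two of the classes are non-critical and may be left out, there are 6 × 3 × 8 = 144 combinations in total. In practice, however, one of those non-critical classes (dead reckoner) cannot be omitted, because the critical features do not work without it, so the count is 5 × 3 × 8 = 120.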
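COMB_ALL can be sketched as a Cartesian product over feature classes, where a non-critical class may also contribute nothing, followed by a sort on $U_{tot}$, the sum of the chosen features' utilities. A minimal sketch with placeholder feature names and utilities; modeling the dead reckoner dependency by marking that class critical is an assumption, and the dependency checks applied to each combination are omitted.

```python
from itertools import product
from math import prod
from typing import Dict, List, Sequence, Set, Tuple

Feature = Tuple[str, float]  # (name, utility)

def comb_all(classes: Dict[str, Sequence[Feature]],
             critical: Set[str]) -> List[Tuple[float, Tuple[str, ...]]]:
    """All feature combinations, best U_tot first; non-critical classes may be empty."""
    names = sorted(classes)
    options = [list(classes[n]) + ([] if n in critical else [None]) for n in names]
    combos = []
    for pick in product(*options):
        chosen = [f for f in pick if f is not None]
        u_tot = sum(u for _, u in chosen)  # U_tot = sum of feature utilities
        combos.append((u_tot, tuple(name for name, _ in chosen)))
    combos.sort(key=lambda c: c[0], reverse=True)
    return combos

def comb_count(classes: Dict[str, Sequence[Feature]], critical: Set[str]) -> int:
    """prod(n_i + nu_i), with nu_i = 1 for non-critical classes and 0 otherwise."""
    return prod(len(fs) + (0 if n in critical else 1) for n, fs in classes.items())

nav = {
    "dead_reckoner": [(f"dr{i}", 5 - i) for i in range(5)],
    "turn_calculation": [(f"tc{i}", 2 - i) for i in range(2)],
    "map": [(f"map{i}", 8 - i) for i in range(8)],
}
print(comb_count(nav, {"map"}))                   # 144 = 6 * 3 * 8
print(comb_count(nav, {"map", "dead_reckoner"}))  # 120 = 5 * 3 * 8
print(comb_all(nav, {"map"})[0])                  # (15, ('dr0', 'map0', 'tc0'))
```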

As a rough estimate, consider a system with $f$ features in $c$ classes. If the features are uniformly distributed among the classes, each class would hold $\frac{f}{c}$ features. Making a conservative assumption that all classes are critical, the total number of combinations would then be $(\frac{f}{c})^c$. Since excessively huge feature classes will probably not be supported by management, the number of feature classes can be approximated as $c \approx 2\sqrt{f}$. Since $\frac{f}{2\sqrt{f}} = \frac{\sqrt{f}}{2}$, the total number of combinations, as a function of the number of features, is thus approximately $(\frac{\sqrt{f}}{2})^{2\sqrt{f}}$.
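Substituting $c = 2\sqrt{f}$ into $(\frac{f}{c})^c$ gives $(\frac{\sqrt{f}}{2})^{2\sqrt{f}}$, and both forms can be evaluated in log2 terms. A minimal sketch under the same assumptions as the estimate (uniform, all-critical classes); the helper names are hypothetical, and these rough figures need not match the count for a real system, whose class sizes are uneven and which includes non-critical classes.

```python
from math import log2, sqrt

def log2_combinations_uniform(f: float, c: float) -> float:
    """log2 of (f/c)^c: f features spread evenly over c all-critical classes."""
    return c * log2(f / c)

def log2_combinations_estimate(f: float) -> float:
    """log2 of (sqrt(f)/2)^(2*sqrt(f)), i.e. the uniform case with c = 2*sqrt(f)."""
    return 2 * sqrt(f) * log2(sqrt(f) / 2)

f = 50
print(round(log2_combinations_estimate(f), 2))     # 25.77
print(round(log2_combinations_uniform(f, 18), 2))  # 26.53 (18 classes)
```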

The brute-force algorithm enumerates every feature combination to find the one with the highest utility.

The case study in Chapter 8 has 50 features in 18 classes; under COMB_ALL this comes to roughly $2^{33}$ combinations.

COMB_SHORT

[Figures: COMB_SHORT (images not available)]

where $m$ is the number of feature classes, $n_i$ is the number of features in feature class $i$ and $\nu_i$ is the criticality of feature class $i$, with a value of 0 for critical classes and 1 for non-critical ones.

First, group all features by class, then sort the features in each class by utility in descending order.

Without further information, the feature which has the smallest utility increase over the next feature of that class (i.e., choose $F_{k,0}$ for the feature class $C_k$ for which $U(F_{k,0}) - U(F_{k,1})$ is smallest) should be the one to be discarded, resulting in the highest $U_{tot}$ of the available options.

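When no other information constrains the choice, the first (highest-utility) feature of each class is chosen, and together these form the feature set passed to the second stage. Each time the algorithm is invoked again, it discards the current first feature of one class and passes the remaining features on to the second stage. The class chosen is the one whose current feature exceeds the next feature in that class by the smallest utility margin; discarding that feature keeps the total utility of the new combination as high as possible.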
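The discard rule can be sketched as a generator that yields successive feature sets, best first, and stops when nothing more can be discarded. A minimal sketch, assuming features are (name, utility) pairs sorted by descending utility within each class; how a non-critical class is finally dropped (treated here as stepping down to a utility of 0) and the tie-breaking order are assumptions, not details taken from the thesis.

```python
from typing import Dict, Iterator, Sequence, Set, Tuple

Feature = Tuple[str, float]  # (name, utility), sorted by descending utility per class

def comb_short(classes: Dict[str, Sequence[Feature]],
               critical: Set[str]) -> Iterator[Dict[str, str]]:
    """Yield feature sets in COMB_SHORT order: start from the head of every class,
    then repeatedly discard the head whose lead over its successor is smallest."""
    head = {k: 0 for k in classes}  # index of the current feature in each class
    while True:
        yield {k: classes[k][i][0] for k, i in head.items() if i < len(classes[k])}
        best_class, best_gap = None, None
        for k, i in head.items():
            feats = classes[k]
            if i >= len(feats):
                continue                       # class already dropped
            if i + 1 < len(feats):
                gap = feats[i][1] - feats[i + 1][1]
            elif k not in critical:
                gap = feats[i][1]              # assumption: dropping the class costs its utility
            else:
                continue                       # a critical class must keep one feature
            if best_gap is None or gap < best_gap:
                best_class, best_gap = k, gap
        if best_class is None:
            return                             # nothing left to discard
        head[best_class] += 1

classes = {"map": [("m_hi", 8), ("m_lo", 5)], "turn": [("t_hi", 3), ("t_lo", 2)]}
for feature_set in comb_short(classes, critical={"map"}):
    print(feature_set)
# {'map': 'm_hi', 'turn': 't_hi'}
# {'map': 'm_hi', 'turn': 't_lo'}
# {'map': 'm_hi'}
# {'map': 'm_lo'}
```

In the framework, the caller would stop consuming the generator at the first feature set that passes adapter selection and adapter allocation.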

The Feature Selection algorithm must choose a feature set under three different conditions: initial, adapter selection failure and adapter allocation failure.

The adapter selection algorithm fails only because adapters cannot be chosen to fulfill all of the dependencies of a feature.

Upon adapter allocation failure, the packing state achieved on each attempt could be examined to attempt to discover the core reason behind the failure.

evaluation

The feature selection algorithm must choose a feature set. The set must include at least the critical features, and the features must satisfy the constraints among them.

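The algorithm runs under three conditions: the initial run, after an adapter selection failure, and after an adapter allocation failure. On the initial run there is no other information, so COMB_ALL may pick any combination, while COMB_SHORT picks the one with the highest utility; the result is passed to the next stage. When a later stage fails, control returns to this algorithm. COMB_ALL then tries other combinations until it finds ones that pass the constraints of all three stages, ranks the passing feature sets by total utility and takes the highest. COMB_SHORT instead discards one feature to form a new combination, and the first feature set that satisfies the constraints of all three stages becomes the final feature set.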

  1. COMB_ALL brute force

    Each combination was then tested to ensure the proper input and output dependencies were met — other adapters existed to generate each input and no other adapter was included that generated the same outputs.

    Every configuration must be checked for validity.

    ①other adapters existed to generate each input and no other adapter was included that generated the same outputs.

    ②Additional tests ensured consumers existed for each output data element.

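    Three things are checked: first, that each feature (i.e., adapter) has a source for each of its inputs; second, that no two features produce the same output; and third, that each feature's outputs are consumed as inputs elsewhere.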

  2. COMB_SHORT

  3. SHORT_ORACLE

  4. ORACLE

  5. SHORT_FEEDBACK

    BEST
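The three validity checks applied in COMB_ALL can be expressed over the data elements each adapter consumes and produces. A minimal sketch, assuming each adapter is described only by its input and output data-element names, and that sensors and actuators are given as the data they produce and consume; all names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Adapter = Tuple[Set[str], Set[str]]  # (inputs, outputs) as data-element names

def is_valid(adapters: Dict[str, Adapter],
             sensor_outputs: Iterable[str] = (),
             actuator_inputs: Iterable[str] = ()) -> bool:
    outputs = Counter(o for _, outs in adapters.values() for o in outs)
    available = set(outputs) | set(sensor_outputs)
    needed = {i for ins, _ in adapters.values() for i in ins}
    # check 1: every input is produced by an adapter or a working sensor
    inputs_produced = needed <= available
    # check 2: no data element is produced by two adapters
    no_duplicates = all(n == 1 for n in outputs.values())
    # check 3: every adapter output is consumed by an adapter or an actuator
    outputs_consumed = set(outputs) <= needed | set(actuator_inputs)
    return inputs_produced and no_duplicates and outputs_consumed

nav = {"gps_filter": ({"gps_raw"}, {"position"}),
       "display": ({"position"}, {"screen"})}
print(is_valid(nav, sensor_outputs={"gps_raw"}, actuator_inputs={"screen"}))  # True
print(is_valid(nav, actuator_inputs={"screen"}))  # False: nothing produces gps_raw
dup = dict(nav, dead_reckoner=({"wheel_raw"}, {"position"}))
print(is_valid(dup, sensor_outputs={"gps_raw", "wheel_raw"},
               actuator_inputs={"screen"}))       # False: position produced twice
```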

adapter selection

The selection of adapters to implement the feature set forms the core of the system-wide customization algorithm. It is during this phase that the relationships and dependencies between adapters, as expressed in the PFA graph, are incorporated into a solution.

Given: a PFA graph $P$, a set of features $F$ and the state of all sensors/actuators (i.e. working, not working)

Find: an allocation graph $A$, which is a subgraph of $P$, that describes a minimalist valid configuration. Ideally, find $A$ such that the probability it will be allocatable is maximized.

A is minimalist in the sense that the removal of any vertex in V or edge in E would make A no longer a valid configuration.

A simple adapter count proved to be a suitable heuristic.

[Figure: adapter selection (image not available)]

adapter number size best
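The thesis reports that simple adapter count served as a suitable heuristic for how likely a selected configuration is to be allocatable. A minimal sketch of that idea, assuming a PFA fragment reduced to adapters with input and output data elements: it searches exhaustively for the valid closure of the required features that uses the fewest adapters. This brute-force search is illustrative, not the thesis's selection algorithm, and the adapter names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Adapter = Tuple[Set[str], Set[str]]  # (inputs, outputs) as data-element names

def select_adapters(required: Set[str],
                    adapters: Dict[str, Adapter],
                    sensor_data: Set[str]) -> Optional[Set[str]]:
    """Smallest adapter set that contains `required` and produces every input
    it needs from other adapters or working sensors (exhaustive search)."""
    providers: Dict[str, List[str]] = {}
    for name, (_, outs) in adapters.items():
        for data in outs:
            providers.setdefault(data, []).append(name)

    best: Optional[Set[str]] = None

    def expand(chosen: Set[str], pending: List[str]) -> None:
        nonlocal best
        if best is not None and len(chosen) >= len(best):
            return                               # cannot beat the best found so far
        if not pending:
            best = set(chosen)                   # every input is satisfied
            return
        data, rest = pending[0], pending[1:]
        if data in sensor_data or any(data in adapters[a][1] for a in chosen):
            expand(chosen, rest)                 # already available
            return
        for name in providers.get(data, []):     # branch over alternative producers
            expand(chosen | {name}, rest + sorted(adapters[name][0]))

    expand(set(required), sorted({i for a in required for i in adapters[a][0]}))
    return best

pfa = {
    "display": ({"position"}, {"screen"}),
    "gps_filter": ({"gps_raw"}, {"position"}),
    "dead_reckoner": ({"wheel_speed", "heading"}, {"position"}),
    "odometry": ({"wheel_raw"}, {"wheel_speed"}),
}
print(sorted(select_adapters({"display"}, pfa, {"gps_raw", "wheel_raw", "heading"})))
# ['display', 'gps_filter']
print(sorted(select_adapters({"display"}, pfa, {"wheel_raw", "heading"})))
# ['dead_reckoner', 'display', 'odometry']
```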

adapter allocation

The algorithm described in this chapter exploits the fixed nature of hardware interface software components by examining the other adapters that it might call or be called by.

model

The adapters are joined in an allocation graph, $A(V, E)$, whose vertices are the adapters and whose edges $E_{i,j}$ represent communication between $adapter_i$ and $adapter_j$.

Each $adapter_i$ is labeled with its processing requirements, $p(i)$. Processing requirements are often a list of multiple independent values such as CPU cycles, RAM, or I/O channels.

Edges are labeled with communication requirements $c(i)$, usually representing bandwidth.
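A greedy allocation over this model can be sketched as follows. It is a minimal sketch, not the thesis's algorithm: hardware interface adapters are pinned to the PE their device is wired to, the remaining adapters are placed first-fit in decreasing order of total requirement, each adapter prefers PEs already hosting one of its neighbours in the allocation graph, and finally the bandwidth of edges that cross PEs is checked against the network. Ranking adapters by the sum of their requirement vector, the single scalar network capacity, and all names are assumptions.

```python
from typing import Dict, List, Optional, Sequence, Tuple

def allocate(adapters: Dict[str, Sequence[float]],  # p(i): requirement vector per adapter
             pinned: Dict[str, str],                # hardware interface adapter -> its PE
             pes: Dict[str, Sequence[float]],       # PE -> resource vector
             edges: Dict[Tuple[str, str], float],   # (i, j) -> bandwidth needed
             bandwidth: float) -> Optional[Dict[str, str]]:
    free: Dict[str, List[float]] = {pe: list(cap) for pe, cap in pes.items()}
    placement: Dict[str, str] = {}

    def take(pe: str, need: Sequence[float]) -> bool:
        """Reserve `need` on `pe` if every resource element still fits."""
        if all(n <= f for n, f in zip(need, free[pe])):
            for k, n in enumerate(need):
                free[pe][k] -= n
            return True
        return False

    for name, pe in pinned.items():                  # fixed adapters go first
        if not take(pe, adapters[name]):
            return None
        placement[name] = pe

    rest = sorted((a for a in adapters if a not in placement),
                  key=lambda a: sum(adapters[a]), reverse=True)
    for name in rest:
        near = {placement[j] for i, j in edges if i == name and j in placement}
        near |= {placement[i] for i, j in edges if j == name and i in placement}
        for pe in sorted(free, key=lambda p: p not in near):  # neighbours' PEs first
            if take(pe, adapters[name]):
                placement[name] = pe
                break
        else:
            return None                              # no PE has room for this adapter

    traffic = sum(c for (i, j), c in edges.items() if placement[i] != placement[j])
    return placement if traffic <= bandwidth else None

adapters = {"gps_drv": [10, 1], "display_drv": [10, 1], "filter": [30, 0], "planner": [50, 0]}
pinned = {"gps_drv": "PE0", "display_drv": "PE1"}
pes = {"PE0": [64, 1], "PE1": [64, 1]}               # [RAM KB, I/O channels]
edges = {("gps_drv", "filter"): 5, ("filter", "display_drv"): 5, ("filter", "planner"): 5}
print(allocate(adapters, pinned, pes, edges, bandwidth=10))
# {'gps_drv': 'PE0', 'display_drv': 'PE1', 'planner': 'PE0', 'filter': 'PE1'}
print(allocate(adapters, pinned, pes, edges, bandwidth=9))  # None
```

Edges here connect adapters only; sensors and actuators are represented by their pinned driver adapters, which is a simplification of the thesis's model.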

[Figure: allocation graph (image not available)]

evaluation

