A walkthrough of the congestion source code

lx_shudong

Article link: https://blog.csdn.net/lx_shudong/article/details/54344348

Based on a study of the ATS 6.1.1 code

I. Configuration file parameters

congestion.config

Configuration format

primary_destination=value secondary_specifier=value tag=value

<1> primary_destination: the rule on this line applies to the requests that match it. It is required and may appear only once per line. The supported options are:

dest_domain: A requested domain name.

dest_host: A requested hostname.

dest_ip: A requested IP address.

url_regex: A regular expression (regex) to be found in a URL.

<2> secondary_specifier: optional. Several different specifiers may be given on one line, but the same specifier must not be repeated.

port: A requested URL port or range of ports.

prefix: A prefix in the path part of a URL.

<3> tag: any tag that is not configured takes its default value.

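max_connection_failures

Default: 5. The maximum number of connection failures allowed within the fail window described below before Traffic Server marks the origin server as congested.

In other words, once the number of failed connections to the origin reaches this value within one window, the origin is marked as congested.

fail_window

Default: 120 seconds. The time period during which the maximum number of connection failures can occur before Traffic Server marks the origin server as congested.

This is the length of the window over which max_connection_failures is counted.

proxy_retry_interval

Default: 10 seconds. The number of seconds that Traffic Server waits before contacting a congested origin server again.

This is the interval between connection retries while the origin is congested.

client_wait_interval

Default: 300 seconds. The number of seconds that the client is advised to wait before retrying the congested origin server.

When the origin is congested, this tells the client how long to wait before retrying.

wait_interval_alpha

Default: 30 seconds. The upper limit for a random number that is added to the wait interval.

This option is used together with client_wait_interval to compute the retry time reported to the client. Its main purpose is to spread client retries apart so that many clients do not come back at the same moment.

The client retry time is computed as: client_wait_interval + (random number % wait_interval_alpha) + the time remaining until Traffic Server next retries the congested origin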
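The formula above can be written out as a small function. This is only a sketch: RetryConfig and its field names are hypothetical and simply mirror the congestion.config tags, and the random number is passed in as a parameter so that the arithmetic is easy to check by hand.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical holder for the three tags involved; the field names mirror
// the congestion.config tags, not the actual ATS members.
struct RetryConfig {
  int proxy_retry_interval = 10;  // seconds
  int client_wait_interval = 300; // seconds
  int wait_interval_alpha  = 30;  // seconds
};

// Seconds the client is advised to wait before retrying.
// now and last_failure are in seconds; rnd is any non-negative random number.
inline int client_retry_after(const RetryConfig &cfg, long now, long last_failure, int rnd)
{
  // Time left before the proxy itself retries the congested origin (never negative).
  long remaining = std::max(0L, last_failure + cfg.proxy_retry_interval - now);
  // Random jitter in [0, wait_interval_alpha) to spread client retries apart.
  int jitter = cfg.wait_interval_alpha > 0 ? rnd % cfg.wait_interval_alpha : 0;
  return cfg.client_wait_interval + jitter + static_cast<int>(remaining);
}
```

With the default values, a failure 4 seconds ago and a random number of 45 give 300 + 15 + 6 = 321 seconds.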

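live_os_conn_timeout

Default: 60 seconds. The connection timeout to the live (uncongested) origin server.

If a client stops a request before the timeout occurs, then Traffic Server does not record a connection failure.

This is the request timeout used while the origin is not congested.

live_os_conn_retries

Default: 2. The maximum number of retries allowed to the live (uncongested) origin server.

This is the number of connection retries a request is allowed while the origin is not congested.

dead_os_conn_timeout

Default: 15 seconds. The connection timeout to the congested origin server.

This is the request timeout used while the origin is congested.

dead_os_conn_retries

Default: 1. The maximum number of retries allowed to the congested origin server.

This is the number of retries a request is allowed while the origin is congested.

A note on timeouts and retries: whether or not the origin is congested, these tags only limit the number of requests; requests can still be sent to the origin.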

max_connection

Default: -1. The maximum number of connections allowed from Traffic Server to the origin server.

Once the number of connections to the origin exceeds this value, the origin is treated as congested.

error_page

Default: "congestion#retryAfter". The error page sent to the client when a server is congested. You must enclose the value in quotes.

This is the error page returned to the client.

congestion_scheme

Default: "per_ip". Specifies if Traffic Server applies the rule on a per-host ("per_host") or per-IP basis ("per_ip"). You must enclose the value in quotes.

For example, if the server www.host1.com has two IP addresses and you use the tag value "per_ip", then each IP address has its own number of connection failures and is marked as congested independently. If you use the tag value "per_host" and the server www.host1.com is marked as congested, then both IP addresses are marked as congested.

Only the two values per_host and per_ip are allowed. per_host tracks origin congestion by hostname, and per_ip tracks it by IP address. When the origin is served by several servers, per_ip is recommended.
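Putting the three parts together, a pair of rules might look like the following. This is an illustrative sketch built only from the format and tags described above; the domain, address, and values are made up and not taken from any real deployment.

```
dest_domain=example.com port=80 max_connection_failures=3 fail_window=60 congestion_scheme="per_host"
dest_ip=192.0.2.10 max_connection=200
```

The first line would mark example.com as congested after 3 connection failures within 60 seconds, tracked per hostname; the second would limit the number of connections to a single origin IP. Tags that are not listed fall back to their defaults.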

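Notes:

Congestion control falls into two categories:

<1> one based on the current maximum number of connections

<2> one based on connection failures (failed or timed-out connection attempts, including retries)

records.config

This switch must be turned on for congestion control to take effect:

CONFIG proxy.config.http.congestion_control.enabled INT 1

The default values of the tags in congestion.config can be changed with settings of the following form:

CONFIG proxy.config.http.congestion_control.default.tag INT|STRING value

Viewing the counts of requests rejected because of congestion

Number of requests rejected because the origin was congested due to failed or timed-out responses:

./traffic_line -r proxy.process.congestion.congested_on_conn_failures

Number of requests rejected because the connection count exceeded the maximum:

./traffic_line -r proxy.process.congestion.congested_on_max_connection

Notes: these statistics are aggregated over all domains.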

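II. Data structures

Configuration module

Like the cache.config configuration module, this module requires every request to be matched against all of the entries configured in congestion.config.

The implementation details can be discussed separately if there is interest.

Domain data storage module

Class overview:

CongestionControlRecord: holds the congestion.config configuration; one object corresponds to one rule.

CongestionMatcherTable: stores all of the CongestionControlRecord configuration entries.

FailHistory: stores information about failed connections to the origin.

CongestionEntry: holds the congestion state, the current connection count, failed connection retries, and related information. Each CongestionEntry keeps a pointer to one CongestionControlRecord (a many-to-one relationship), and each CongestionEntry has exactly one FailHistory.

CongestionDB: stores all CongestionEntry objects in a two-level hash table.

Each class of requests is stored in one CongestionEntry object.

1. Every CongestionEntry object corresponds to a key, which is computed from:

the origin hostname or IP, the port, and the prefix (the URL prefix)

Requests for which these three are identical are treated as one class and share a single CongestionEntry object through a reference counter.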
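To make the idea of "one key per class of requests" concrete, here is a minimal sketch. It is not the ATS make_key routine: key_material and make_key_sketch are hypothetical names, and std::hash is used only to keep the example short.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical helper: joins the three inputs with a separator that cannot
// appear in a hostname, so ("ab", "c") and ("a", "bc") stay distinct.
inline std::string key_material(const std::string &host_or_ip, int port, const std::string &prefix)
{
  return host_or_ip + '\n' + std::to_string(port) + '\n' + prefix;
}

// Hypothetical stand-in for the key computation: requests with the same
// host/IP, port, and prefix always map to the same key.
inline uint64_t make_key_sketch(const std::string &host_or_ip, int port, const std::string &prefix)
{
  return std::hash<std::string>{}(key_material(host_or_ip, port, prefix));
}
```

Two requests to www.host1.com on port 80 with the same prefix produce the same key and would therefore share one entry, while a request to port 8080 is built from different key material.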

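2. The CongestionDB hash table is organized as follows:

i. Each CongestionEntry object is stored in a node.

ii. All CongestionEntry objects are stored through this hash structure in the CongestionDB object that the theCongestionDB pointer refers to.

iii. The MTHashTable table has a fixed length of MT_HASHTABLE_PARTITIONS elements, and each element has its own lock.

iv. IMTHashTable resolves collisions by chaining and can grow dynamically. When the average chain length of an IMTHashTable exceeds MT_HASHTABLE_MAX_CHAIN_AVG_LEN, that table is resized once; expired CongestionEntry nodes are deleted before the resize.

v. Every lookup or insertion of a CongestionEntry must first acquire the corresponding mutex in MTHashTable.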
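The layout described in points i to v can be sketched roughly as follows. This is a simplified illustration, not the MTHashTable code: PartitionedTable and SketchEntry are hypothetical, the partition count of 64 is an assumed value, and std::unordered_map stands in for the chained, growable inner table.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>

// Hypothetical entry type; only one field is needed for the sketch.
struct SketchEntry {
  int num_connections = 0;
};

// Fixed number of partitions, each with its own mutex and its own inner table.
class PartitionedTable
{
public:
  // Stands in for MT_HASHTABLE_PARTITIONS; 64 is an assumed value.
  static constexpr std::size_t PARTITIONS = 64;

  // Return the entry for key, creating it if absent. The partition lock is
  // held for the whole lookup-or-insert, as point v requires.
  std::shared_ptr<SketchEntry> lookup_or_insert(uint64_t key)
  {
    Partition &p = parts_[key % PARTITIONS];
    std::lock_guard<std::mutex> guard(p.lock);
    auto &slot = p.table[key];
    if (!slot) {
      slot = std::make_shared<SketchEntry>();
    }
    return slot;
  }

private:
  struct Partition {
    std::mutex lock;
    std::unordered_map<uint64_t, std::shared_ptr<SketchEntry>> table;
  };
  std::array<Partition, PARTITIONS> parts_;
};
```

Keys that fall into different partitions never contend for the same lock, while all requests for one key serialize on a single mutex.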

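III. Function processing flow

(1) Initialization

At startup, congestion.config is loaded through CongestionMatcherTable::reconfigure, which:

1. parses all of the configuration into the CongestionMatcherTable object that the CongestionMatcher pointer refers to

2. instantiates the CongestionDB object through revalidateCongestionDB, initializes the hash table, and saves it in the theCongestionDB pointer

(2) Congestion lookup flow

1. Every request that goes to the origin calls HttpSM::do_congestion_control_lookup, which uses the get_congest_entry function to obtain the request's t_state.pCongestionEntry.

2. The HTTP request is first passed to CongestionControlled to find the matching congestion.config rule, a CongestionControlRecord.

3. The origin hostname or IP, the port, and the matched CongestionControlRecord are then passed to make_key to compute the key.

4. Using the key, the corresponding MTHashTable lock is acquired first, the CongestionEntry is looked up or inserted, and the resulting CongestionEntry object is returned for the congestion decision.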

(3) Related variables and functions

Congestion-control members of t_state

// congestion control

CongestionEntry *pCongestionEntry;

StateMachineAction_t congest_saved_next_action; // saved next state; takes one of HttpTransact::STATE_MACHINE_ACTION_UNDEFINED, HttpTransact::ORIGIN_SERVER_OPEN and

// HttpTransact::ORIGIN_SERVER_RAW_OPEN, and is used mainly for state transitions while congested

int congestion_control_crat; // 'client retry after'; mainly affects what is written to the access log

int congestion_congested_or_failed; // whether this request was congested or failed; mainly used for t_state.pCongestionEntry->go_alive(), i.e. resetting m_congested to 0

int congestion_connection_opened; // whether congestion control is active for this request; mainly used for pCongestionEntry->connection_closed(), i.e. updating m_num_connections

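FailHistory class

Member variables

long start; // time of the first event in the statistics window

int bin_len; // (fail_window + CONG_HIST_ENTRIES) / CONG_HIST_ENTRIES, i.e. the time span covered by each bins[i]

int length; // bin_len * CONG_HIST_ENTRIES, i.e. the total time span covered by all elements of the ring array bins

cong_hist_t bins[CONG_HIST_ENTRIES]; // ring array; bins[i] holds the number of failure events in its time span

int cur_index; // starting position within the ring array bins

Note: the failure counts and their time spans are all kept in the ring array bins; each bins[i] holds the number of failure events that occurred during one bin_len period.

long last_event; // time at which the most recent failure event was registered

int events; // total number of failure events in the current interval

Call order of the member functions

HttpTransact::handle_parent_died --> CongestionEntry::failed_at --> FailHistory::regist_event --> FailHistory::init_event

This class implements congestion control through the max_connection_failures and fail_window parameters.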
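The ring-array idea behind FailHistory can be illustrated with a simplified sliding-window counter. This is a sketch written for this note, not the ATS implementation: FailWindow is a hypothetical class, it indexes bins by absolute bin number rather than using start and cur_index, and the value 17 for the number of bins is an assumption standing in for CONG_HIST_ENTRIES. Only the bin_len formula is taken from the description above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

class FailWindow
{
public:
  static constexpr int ENTRIES = 17; // assumed stand-in for CONG_HIST_ENTRIES

  explicit FailWindow(int fail_window)
    : bin_len_(std::max(1, (fail_window + ENTRIES) / ENTRIES))
  {
    bins_.fill(0);
  }

  // Record n failures at time t (seconds, t >= 0) and return the number of
  // failures still inside the window.
  int register_event(long t, int n = 1)
  {
    long bin = t / bin_len_; // absolute number of the bin this event falls into
    if (events_ == 0 || bin - newest_ >= ENTRIES) {
      // Nothing recorded, or the whole window has expired: start over.
      bins_.fill(0);
      events_ = 0;
      newest_ = bin;
    } else if (bin <= newest_ - ENTRIES) {
      return events_; // event is older than the window, ignore it
    }
    // Slide the window forward, dropping the counts of bins that expire.
    while (newest_ < bin) {
      ++newest_;
      int &slot = bins_[newest_ % ENTRIES];
      events_ -= slot;
      slot = 0;
    }
    bins_[bin % ENTRIES] += n;
    events_ += n;
    return events_;
  }

  int bin_len() const { return bin_len_; }

private:
  int bin_len_;
  long newest_ = 0; // absolute number of the newest bin in the window
  int events_ = 0;  // failures currently inside the window
  std::array<int, ENTRIES> bins_;
};
```

With fail_window = 120 each bin covers 8 seconds. An origin would be marked congested once the returned count reaches max_connection_failures.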

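CongestionEntry class

Member variables

// State -- connection failures

FailHistory m_history; // record of this CongestionEntry's failed origin connections over a period of time

ink_hrtime m_last_congested; // time of the last failure event, assigned by m_last_congested = m_history.last_event; used for log output

volatile int m_congested; // 0 | 1, the current congestion flag; mainly the congestion switch tied to m_history

int m_stat_congested_conn_failures; // total number of connections rejected through the m_congested switch

volatile int m_M_congested; // congestion switch for the case where the connection count exceeds the limit

ink_hrtime m_last_M_congested; // time at which congestion was last triggered by exceeding the connection limit

// State -- concurrent connections

int m_num_connections; // current total number of connections to the origin

int m_stat_congested_max_conn; // tied to the m_M_congested switch: total number of connections rejected because the connection limit was exceeded

// Reference count

int m_ref_count; // all requests share the CongestionEntry object through this reference counter; the object is deleted only when the counter reaches 0

Note: clearing m_congested is controlled jointly by m_congested, m_M_congested, and whether the origin is down.

Member functions

virtual RD_Type data_type(void); // mainly used in rule lookup, CongestionControlRecord::UpdateMatch

inline bool proxy_retry(ink_hrtime t); // decides whether a retry is allowed, based on the current time, m_history.last_event, and proxy_retry_interval

inline int client_retry_after(); // retry time returned to the client; computed from proxy_retry_interval, m_history.last_event, the current time, client_wait_interval, and wait_interval_alpha

inline int connect_timeout(); // returns dead_os_conn_timeout or live_os_conn_timeout depending on m_congested

inline int connect_retries(); // returns dead_os_conn_retries or live_os_conn_retries depending on m_congested

void stat_inc_F(); // increments m_stat_congested_conn_failures; called from HttpSM::do_http_server_open and HttpTransact::handle_server_died

void stat_inc_M(); // increments m_stat_congested_max_conn; called only from HttpSM::do_http_server_open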
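The selection logic in proxy_retry, connect_timeout, and connect_retries is simple enough to sketch. The types below are hypothetical: RuleSketch holds only the tags involved, with the default values listed in section I, and EntrySketch uses plain seconds instead of ink_hrtime.

```cpp
#include <cassert>

// Hypothetical subset of a CongestionControlRecord, using the documented defaults.
struct RuleSketch {
  int proxy_retry_interval = 10;
  int live_os_conn_timeout = 60;
  int live_os_conn_retries = 2;
  int dead_os_conn_timeout = 15;
  int dead_os_conn_retries = 1;
};

struct EntrySketch {
  RuleSketch rule;
  bool congested  = false; // stands in for m_congested
  long last_event = 0;     // stands in for m_history.last_event, in seconds

  // A congested origin may be tried again once proxy_retry_interval has
  // elapsed since the last failure.
  bool proxy_retry(long now) const { return now >= last_event + rule.proxy_retry_interval; }

  // Congested origins get the shorter "dead" timeout and fewer retries.
  int connect_timeout() const { return congested ? rule.dead_os_conn_timeout : rule.live_os_conn_timeout; }
  int connect_retries() const { return congested ? rule.dead_os_conn_retries : rule.live_os_conn_retries; }
};
```

The effect is that a congested origin is probed only occasionally, and each probe gives up quickly.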

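IV. Use cases

This congestion module applies congestion control in two situations: too many connections to the origin, and failed or timed-out origin responses.

Drawbacks:

1. When a request is matched against the configuration file, every entry has to be checked (whether host, ip, domain, or regex), much as with cache.config.

Proposed fixes:

1> Change the matching logic so that the entry is returned as soon as a match is found.

2> Give host, ip, domain, and regex a matching priority.

2. Every request has to take the mutex of a hash bucket once to obtain its CongestionEntry. There are many of these locks, which spreads different domains across them, but the lock for a given domain key is still exclusive, so when one domain under one rule sees high request concurrency, lock acquisition affects performance.

Proposed fix: after a long period of traffic the set of domains being accessed becomes stable, so the hash table ends up being mostly read. The following approach could be considered:

1> Replace the mutex with a read-write lock, and take the read lock first to look up the CongestionEntry.

2> If it is not found, take the write lock and insert a new CongestionEntry into the hash table.

3> When the hash table is resized, do not delete expired CongestionEntry objects; keep them in the hash table permanently.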
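The read-write lock proposal could look roughly like this. It is a sketch of the idea only, assuming C++17 and std::shared_mutex: RwTable and CongEntry are hypothetical names, a single table is shown instead of one per partition, and entries are never removed, in line with step 3>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical entry type.
struct CongEntry {
  int num_connections = 0;
};

class RwTable
{
public:
  std::shared_ptr<CongEntry> lookup_or_insert(uint64_t key)
  {
    {
      // Step 1>: the common case takes only the shared (read) lock.
      std::shared_lock<std::shared_mutex> read(lock_);
      auto it = table_.find(key);
      if (it != table_.end()) {
        return it->second;
      }
    }
    // Step 2>: not found, so take the exclusive (write) lock and insert.
    // operator[] re-checks, because another thread may have inserted the
    // entry between releasing the read lock and acquiring the write lock.
    std::unique_lock<std::shared_mutex> write(lock_);
    auto &slot = table_[key];
    if (!slot) {
      slot = std::make_shared<CongEntry>();
    }
    return slot;
  }

  std::size_t size() const
  {
    std::shared_lock<std::shared_mutex> read(lock_);
    return table_.size();
  }

private:
  mutable std::shared_mutex lock_;
  std::unordered_map<uint64_t, std::shared_ptr<CongEntry>> table_;
};
```

Once the set of keys has stabilized, almost every call returns from the read-locked path, so requests for the same domain no longer block each other on the lookup. The trade-off of step 3> is that the table only grows.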