Improve API Gateway Throttling

Author:  Zhan, James.

What is throttling?

Throttling is a flow control feature that limits access to a resource to a certain number of times within a time window. Once the upper limit, or threshold, is reached, further access to the resource is rejected. A ban list can be used to record such rejected access, so that within the time window, subsequent access to the same resource is also denied. Throttling provides privilege-based access control and shields resources against DDoS attacks.

 

In the API gateway, throttling is primarily used to limit API access based on subscription tier, e.g. the gold tier allows 1000 API accesses per time window. Throttling can be single-tier (e.g. per user, per IP address) or multi-tier (per user per API endpoint). In both cases, a throttling rule can be viewed as a tuple (key, upper limit). For the multi-tier case, the key can be viewed as a path, e.g. user ID + "/" + IP address.
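To make the rule and the composed key concrete, here is a minimal Java sketch; the class, field, and method names are illustrative and not taken from the gateway code:

```java
// Minimal sketch of a throttling rule as a (key, upper limit) tuple over a time window.
// All names here are illustrative, not taken from the gateway implementation.
public final class ThrottleRule {
    private final String key;        // e.g. "user42" or the multi-tier path "user42/10.0.0.5"
    private final int limit;         // maximum requests allowed per time window
    private final long windowMillis; // time window size in milliseconds

    public ThrottleRule(String key, int limit, long windowMillis) {
        this.key = key;
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    // Multi-tier key composed as a path, e.g. user ID + "/" + IP address.
    public static String multiTierKey(String userId, String ipAddress) {
        return userId + "/" + ipAddress;
    }
}
```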

 

From a business perspective, throttling provides different service level agreements to different types of customers. Server resources are allocated based on the importance of the customer: larger or more important customers are granted tiers with higher limits and enjoy a larger quota to support their business. For StubHub, API resources are protected and used more efficiently.

 

Throttling within a VM

Counter-based implementation

The timeline is divided into a series of time windows. A counter is maintained for each time window. When a new request comes in, the start of the current time window is determined and the counter is incremented. The request is denied if the counter reaches the limit, and the throttle key is added to the ban list. The counter is reset at the end of the current time window.

 

Refer to the following diagram

The time window approach is quite simple to implement: a map is sufficient to capture the throttling information, and this simplicity implies low calculation overhead.
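A minimal sketch of this counter-based approach, using the ConcurrentHashMap plus AtomicInteger combination discussed under scalability below; class and method names are illustrative:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of counter-based throttling inside a single VM: one counter per
// (throttle key, window start) pair. Names are illustrative.
public class CounterThrottle {
    private final int limit;
    private final long windowMillis;
    // Map from "throttleKey:windowStart" to the counter for that window.
    private final ConcurrentHashMap<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    public CounterThrottle(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public boolean allow(String throttleKey) {
        long windowStart = (System.currentTimeMillis() / windowMillis) * windowMillis;
        String mapKey = throttleKey + ":" + windowStart;
        AtomicInteger counter = counters.computeIfAbsent(mapKey, k -> new AtomicInteger());
        // Deny once the counter reaches the limit; a full implementation would also
        // add the key to the ban list and evict counters from expired windows.
        return counter.incrementAndGet() <= limit;
    }
}
```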

However, it is subject to a problem known as the access spike. Consider the following scenario:

Throttling criterion: 1000 requests per minute

Current time window: 15:00 – 15:01

For the first 30 seconds of the current time window, there are no requests

For the last 30 seconds of the current time window, there are 999 requests. All of them pass throttling

Now the next time window starts: 15:01 – 15:02

For the first 30 seconds of this time window, there are 999 requests. All of them pass throttling

Now if you consider the one-minute span from 15:00:30 to 15:01:30, it accepts 1998 requests, almost double the throttling limit!

 

Given the above explanation, the choice of time window size is an important aspect to consider:

If the time window is too large, the access spike problem manifests itself more.

If the time window is too small, the time spent on the throttling calculation becomes non-trivial compared to the window itself and accuracy is compromised. Currently we normally use per-second throttling, which is usually a good balance between accuracy and the access spike issue.

 

Queue-based implementation

An alternative is the queue-based approach. For each key, a sorted queue is maintained to record request times. When a new request comes in, we look back through the queue and count all request times that fall between the current time minus the time window size and the current time, to get the total number of requests. This total is compared against the throttling limit to allow or deny access.

In the background, a housekeeping thread cleans up the queues by removing request times earlier than the current time minus the time window size.

 

Refer to the following diagram

The queue-based implementation is not subject to the access spike mentioned above, but the calculation cost is higher due to the need to iterate through the queue, and the queue needs to be locked for multi-threaded access. This impacts performance and throughput.
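A minimal sketch of the queue-based approach, here using a concurrent skip list keyed by timestamp as a stand-in for the sorted queue (names are illustrative, and locking is simplified):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of queue-based (sliding window) throttling inside a single VM. Per key,
// a sorted map from timestamp to request count stands in for the sorted queue.
public class QueueThrottle {
    private final int limit;
    private final long windowMillis;
    private final ConcurrentHashMap<String, ConcurrentSkipListMap<Long, AtomicInteger>> queues =
            new ConcurrentHashMap<>();

    public QueueThrottle(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public boolean allow(String throttleKey) {
        long now = System.currentTimeMillis();
        ConcurrentSkipListMap<Long, AtomicInteger> queue =
                queues.computeIfAbsent(throttleKey, k -> new ConcurrentSkipListMap<>());
        // Count the requests recorded within (now - window, now].
        int total = queue.tailMap(now - windowMillis, false).values().stream()
                .mapToInt(AtomicInteger::get).sum();
        if (total >= limit) {
            return false; // deny; a full implementation would also update the ban list
        }
        // Note: count-then-record is not atomic here; the real implementation locks the
        // queue for multi-threaded access, which is exactly the cost mentioned above.
        queue.computeIfAbsent(now, t -> new AtomicInteger()).incrementAndGet();
        return true;
    }

    // Housekeeping: drop request times older than the sliding window.
    public void cleanup() {
        long cutoff = System.currentTimeMillis() - windowMillis;
        queues.values().forEach(q -> q.headMap(cutoff, true).clear());
    }
}
```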

 

Scalability considerations:

Within a single VM, scalability issues mainly arise from concurrency and lock contention between threads. Carefully chosen data structures can reduce lock usage and improve performance. For the counter-based approach, a ConcurrentHashMap is used to implement the key-to-counter mapping, and Java's AtomicInteger is used to implement the counter. For the queue-based approach, a concurrent skip list (ConcurrentSkipListMap) is used to implement the sorted queue.

 

Throttling in a cluster environment

This simple problem becomes more intriguing in a cluster environment. API requests may be handled by different members of a cluster, and the total number of requests should not exceed the throttling limit. Currently the WSO2 API gateway uses a peer-to-peer cluster synchronization approach, where throttling data on one cluster member is asynchronously replicated to the other members.

 

This approach is flawed for the following reasons:

A large number of messages are exchanged among cluster members: on receiving each throttling request, the data is synchronized to all cluster members. Given an average of M throttling requests on each node and N cluster members, the total number of messages is on the order of M*N.

 

To make the situation worse, the throttling data is stored in the Axis2 message context and the whole context is replicated. Under peak load, the number of throttle keys stored on each node will be huge and the throttle data will have a large memory footprint, which makes serializing/deserializing and transferring the message across the network expensive.

 

Throttling accuracy is not guaranteed.

This is mainly due to the network transfer latency of the throttling data. Consider the following scenario:

10 requests are allowed per minute

Node 1 has received 5 requests and node 2 has received 4 requests, and both nodes are synchronized, so the counters on node 1 and node 2 are both set to 9

Now node 1 receives the 6th request for the current time window

Node 1 replicates its throttling data to node 2

Node 2 receives its 5th request for the current time window; however, this happens before the latest state of node 1 has been replicated to node 2, so node 2 checks its local counter (still 9) and allows the request

In total, 11 requests are accepted within the time window, which exceeds the throttling limit!

 

The proposed solution is to use a centralized throttle server to handle access requests for the whole cluster. Compared with the synchronization approach, it sends only one message to the throttle server for each request, resulting in much less network overhead.

 

From the implementation perspective, we need a high-performance centralized key-value store: the key is the concatenation of the throttle key and the start time of the current time window, and the value is the counter. The ban list is also kept in the store.
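For example, the store key for the current window could be built as follows; the separator character and millisecond alignment are illustrative assumptions, not details from the actual implementation:

```java
final class ThrottleKeys {
    // Sketch: build the key-value store key for the current time window.
    // The ":" separator and millisecond alignment are illustrative assumptions.
    static String storeKey(String throttleKey, long windowMillis) {
        long windowStart = (System.currentTimeMillis() / windowMillis) * windowMillis;
        return throttleKey + ":" + windowStart; // e.g. "user42/10.0.0.5:1385452800000"
    }
}
```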

We consider memcached and Redis as candidate key-value stores.

Memcached has better read performance and slightly better write performance compared to Redis, especially for highly concurrent access. Redis provides cluster synchronization support and more flexible data structures. So in our case both a Redis throttle and a memcached throttle are implemented, but the memcached throttle is preferred.

Refer to http://blog.sina.com.cn/s/blog_72995dcc01018qkf.html and http://iyunlin.com/thread/200319 for various comparisons between memcached and Redis.

 

CAS (compare-and-swap) is another feature necessary to ensure the accuracy of the counter. A node that receives a throttle request retrieves the counter from the server and compares it with the throttle limit. Before the counter is incremented and written back to the server, it may have been updated by another node. CAS allows us to detect such data contention and avoid writing a stale value; in that case the client is responsible for retrieving the counter again and retrying. Luckily, both Redis and memcached support CAS operations.
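A sketch of the resulting check-and-increment loop is shown below. The CasStore interface is a hypothetical abstraction over the memcached/Redis client (gets with a version token, cas, and add), not the actual client API used by the gateway:

```java
// Hypothetical abstraction over the memcached/Redis client used for throttling;
// not the actual client API.
interface CasStore {
    // Current value plus a CAS token/version, or null if the key does not exist.
    VersionedValue gets(String key);
    // Writes newValue only if the stored version still matches; false means contention.
    boolean cas(String key, long version, long newValue);
    // Creates the key with an initial value and expiration (seconds) only if absent.
    boolean add(String key, long initialValue, int expireSeconds);
}

record VersionedValue(long value, long version) {}

public final class CentralThrottle {

    // Check the counter against the limit and increment it with a CAS retry loop.
    static boolean allow(CasStore store, String storeKey, int limit, int windowSeconds) {
        while (true) {
            VersionedValue current = store.gets(storeKey);
            if (current == null) {
                // First request in this window: create the counter with an expiration
                // equal to the window length (key expiration is discussed below).
                if (store.add(storeKey, 1L, windowSeconds)) {
                    return true;
                }
                continue; // another node created the key concurrently, retry
            }
            if (current.value() >= limit) {
                return false; // over the limit; the key would also go onto the ban list
            }
            if (store.cas(storeKey, current.version(), current.value() + 1)) {
                return true;
            }
            // CAS failed because another node updated the counter first: retry.
        }
    }
}
```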

 

Counter reset at the end of the time window is handled by key expiration. Each key's expiration time is set to the time window length, so there is no need to explicitly remove keys from the store at the end of each time window. Note that the smallest expiration time for memcached is 1 second, which implies that we cannot do accurate throttling at the millisecond level.

 

A single centralized throttle server may become a bottleneck under heavy load if all throttling requests are handled by it. Ideally, the centralized throttle can be a cluster too, and throttle requests can be distributed among the cluster. So which server should handle a particular throttle request? We use hash partitioning: calculate a hash value of the throttle key (the murmur hash algorithm is used) and divide it by the number of throttle servers; the remainder is the index of the server. Alternatively, consistent hashing can be used. This ensures that the same throttle key always hits the same server, and we do not have to worry about distributing the key-value store among servers.

A few limitations: we have not considered data replication and backup, as memcached does not support that, so if one throttle server goes down, the data stored on it is lost and we do not automatically fall back to another server. Also, load distribution is not even, as some keys may be accessed more frequently than others, implying that the corresponding server will take more load.
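A minimal sketch of the partitioning step, here using Guava's murmur3 implementation; the choice of Guava is an assumption, and the actual code may use a different murmur hash library:

```java
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;

final class ThrottlePartitioner {
    // Sketch: pick the throttle server responsible for a throttle key via hash partitioning.
    // Guava's murmur3_32 is assumed here as the murmur hash implementation.
    static int serverIndex(String throttleKey, int numServers) {
        int hash = Hashing.murmur3_32().hashString(throttleKey, StandardCharsets.UTF_8).asInt();
        // floorMod keeps the index valid even when the hash is negative.
        return Math.floorMod(hash, numServers);
    }
}
```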

 

Further optimization

During implementation, we made some observations that can improve performance further:

The ban list can be stored on the API gateway servers as well as on the central throttle server. This key affinity saves network bandwidth and is effective against DDoS attacks: when a key is rejected, it reaches the central throttle server only the first time and is then added to the gateway server's local ban list. Subsequent throttle requests for the same key are rejected directly by the gateway server and never reach the central throttle server.
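A sketch of the gateway-local ban list check; the data structure and expiry handling are simplified assumptions:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: gateway-local ban list consulted before calling the central throttle server.
// Entries lapse with the current time window; the structure is a simplification.
public class LocalBanList {
    // Banned throttle key -> time window end (millis) after which the ban lapses.
    private final Map<String, Long> banned = new ConcurrentHashMap<>();

    public boolean isBanned(String throttleKey) {
        Long until = banned.get(throttleKey);
        if (until == null) {
            return false;
        }
        if (System.currentTimeMillis() >= until) {
            banned.remove(throttleKey); // window is over, the ban no longer applies
            return false;
        }
        return true; // reject locally, no round trip to the central server
    }

    public void ban(String throttleKey, long windowEndMillis) {
        banned.put(throttleKey, windowEndMillis);
    }
}
```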

 

The counter can be split between the central throttle server and the API gateway servers.

Say the throttle limit is 1000 requests per minute with 20 API gateway servers in a cluster; we could allocate a quota of 40 as the local throttle limit for each gateway server, so that the first 40 throttle requests are handled locally. Note that this is a heuristic approach on the premise that requests for a particular throttle key are distributed evenly across all gateway cluster nodes, and it is meaningful only if the throttle limit is fairly large.
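A sketch of the local quota idea; the central-server client interface is hypothetical, and the bookkeeping needed to reconcile local quotas with the cluster-wide limit is omitted:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client for the central throttle server.
interface CentralThrottleClient {
    boolean allow(String windowKey);
}

// Sketch: serve the first `localQuota` requests of a window from a local counter and
// fall back to the central throttle server only once the quota is used up.
// The central limit would need to account for the locally granted quota
// (e.g. 1000 - 20 * 40); that bookkeeping is omitted here.
public final class LocalQuotaThrottle {
    private final int localQuota; // e.g. 40 when the limit is 1000 across 20 gateways
    private final ConcurrentHashMap<String, AtomicInteger> localCounters = new ConcurrentHashMap<>();
    private final CentralThrottleClient central;

    public LocalQuotaThrottle(int localQuota, CentralThrottleClient central) {
        this.localQuota = localQuota;
        this.central = central;
    }

    public boolean allow(String windowKey) {
        AtomicInteger local = localCounters.computeIfAbsent(windowKey, k -> new AtomicInteger());
        if (local.incrementAndGet() <= localQuota) {
            return true; // within the local quota: no network round trip
        }
        return central.allow(windowKey); // beyond the quota: ask the central throttle server
    }
}
```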

CAS can be expensive. We observe that under high concurrency, CAS can easily fail and need to be retried many times. Each CAS attempt sends an additional request to the central throttle server and degrades performance drastically, so we should reduce CAS usage as much as possible. Consider the following scenario:

Throttle limit = 1000 requests per minute, with 20 API gateway servers in a cluster

The first 980 requests are safe without CAS; a simple atomic increment operation will do, because even if all 20 gateway servers increment concurrently, the counter cannot exceed the limit. Contention only matters when we are near the throttle limit, so we use CAS only when the counter >= 980. To further reduce contention, we can sleep for a short random time before each CAS attempt.
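A sketch of this optimization is shown below. It reuses the hypothetical CasStore and VersionedValue types from the earlier CAS sketch, assumed here to be extended with a plain incr operation; the backoff bounds are illustrative:

```java
import java.util.concurrent.ThreadLocalRandom;

// Extends the hypothetical CasStore from the earlier sketch with a plain atomic increment
// that creates the key (with the given expiration) if it does not exist yet.
interface CounterStore extends CasStore {
    void incr(String key, long delta, int expireSeconds);
}

public final class NearLimitThrottle {

    // Use a cheap increment while far from the limit; switch to CAS only near the limit.
    static boolean allow(CounterStore store, String storeKey,
                         int limit, int numGatewayServers, int windowSeconds)
            throws InterruptedException {
        int casThreshold = limit - numGatewayServers; // e.g. 1000 - 20 = 980

        VersionedValue current = store.gets(storeKey);
        long count = (current == null) ? 0 : current.value();
        if (count < casThreshold) {
            // Far from the limit: even if every gateway server increments concurrently,
            // the counter cannot exceed the limit, so a plain increment is sufficient.
            store.incr(storeKey, 1, windowSeconds);
            return true;
        }

        // Near the limit: fall back to CAS, sleeping a short random time before each attempt.
        while (true) {
            Thread.sleep(ThreadLocalRandom.current().nextLong(1, 5));
            current = store.gets(storeKey);
            if (current == null) {
                if (store.add(storeKey, 1L, windowSeconds)) {
                    return true; // first request of a fresh window
                }
                continue;
            }
            if (current.value() >= limit) {
                return false; // limit reached for this window
            }
            if (store.cas(storeKey, current.version(), current.value() + 1)) {
                return true;
            }
            // CAS lost to another node: loop and retry.
        }
    }
}
```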
