7.5 Rate Limiting
Control the rate of incoming requests to prevent overload or starvation. This pattern is based on Rate Limiting by Microsoft [114]. Burns [22, p. 54] mentions this as a denial-of-service defence.
It is sometimes considered part of an API Gateway [3, p. 163; 20, p. 262].
7.5.1 Context
This pattern can be implemented either server-side or client-side.
Server-side – several clients are using a single endpoint. Overuse by one client can lead to decreased quality of service or even starvation for the others, or to increased costs for the provider. This can range from accidental overuse to misuse, up to coordinated malicious efforts such as DDoS1 or Yo-yo2 attacks.
Client-side – a client wants to preemptively avoid errors when communicating with an endpoint that is either rate-limited or has limited capacity, and wants to reduce resource usage or improve how the situation is communicated to users.
7.5.2 Solution
To protect an endpoint, implement a stateful component that keeps track of the number of requests from each client over a specific time period (see fig. 18). If a client exceeds their quota, the component denies any further requests in that time window and signals this appropriately back to the client. It can also keep the client updated on their current usage so that they can adjust their behaviour accordingly, but it may also withhold that information to prevent coordinated attacks.
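One minimal way to realise such a stateful component is a fixed-window counter. The sketch below is illustrative only (class and parameter names are not taken from the pattern sources): it counts requests per client per window, denies requests over the quota, and can report the remaining allowance to the client.

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Counts requests per client in fixed time windows (a minimal sketch)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (client id, window index) -> count

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        key = (client_id, int(now // self.window))
        if self.counts[key] >= self.limit:
            return False  # quota exhausted: deny until the window rolls over
        self.counts[key] += 1
        return True

    def remaining(self, client_id, now=None):
        """Optionally report the client's remaining quota in this window."""
        now = time.monotonic() if now is None else now
        key = (client_id, int(now // self.window))
        return max(0, self.limit - self.counts[key])
```

Whether `remaining` is exposed to clients is a design choice: it helps well-behaved clients back off, but, as noted above, it can also be withheld to give attackers less information.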
Several approaches can be used to implement rate limiting, ranging from simple fixed or sliding window counters to classic algorithms like token or leaky buckets [116]. The best approach will depend on the specific requirements (see [117, pp. 1504–1505] for a brief overview).
Requests over the limit can be either dropped or queued to be processed later. There can also be several tiers of access. Unauthenticated users should be rate-limited more strictly, as they can be harder to identify and are easier to scale with attacks [22, p. 54].
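Of the classic algorithms mentioned above, the token bucket is a common choice because it tolerates short bursts while bounding the average rate. A minimal sketch (names and parameters are illustrative) might look like this:

```python
import time

class TokenBucket:
    """Token bucket sketch: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill proportionally to the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # consume one token for this request
            return True
        return False  # bucket empty: drop the request or queue it for later
```

The `capacity` parameter controls the permitted burst size, while `rate` bounds the sustained throughput; a denied request can be dropped or queued, as described above.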
7.5.3 Potential issues
Adding a rate limiter increases complexity and may introduce overhead, latency, or bursts, depending on the load and implementation. Optimising for throughput may require specialised hardware [117, p. 1505]. Faulty implementations may become a bottleneck or be bypassed by attackers.
Configuration needs to be carefully tuned to balance throughput and fairness.
7.5.4 Example
Some of ExampleEshop’s users (or perhaps competitors) employ bots to scrape the site for price changes. This puts a significant load on the system and degrades the performance for legitimate users. To prevent this, the system uses rate limiting to restrict the number of requests a single IP address can make in a given time frame. To reduce false positives, the system tracks the number of requests made by logged-in users separately from anonymous users.
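The separate tracking of logged-in and anonymous users amounts to choosing a different bucket key and quota per request. A hedged sketch of that decision (the function, field names, and quota values are hypothetical, not part of ExampleEshop):

```python
# Hypothetical quotas: anonymous clients are limited more strictly per IP,
# while authenticated users get a higher quota keyed by their user id.
ANON_LIMIT = 30    # requests per minute per IP address
USER_LIMIT = 300   # requests per minute per logged-in user

def client_key_and_limit(request):
    """Pick the rate-limit bucket and quota for a request (sketch)."""
    if request.get("user_id") is not None:
        return ("user", request["user_id"]), USER_LIMIT
    return ("ip", request["remote_ip"]), ANON_LIMIT
```

Keying authenticated traffic by user id rather than IP reduces false positives for legitimate users behind shared addresses, while the stricter anonymous tier targets the scraping bots.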
7.5.5 Related patterns
As this pattern serves a protective role and may need to be applied in several places, it might be desirable to share a common implementation via Offload to Gateway or an Ambassador.
A client can use a Queue to store requests for later processing, the usage estimates included in request replies to plan Retries, or a Circuit Breaker to fail fast until the limit is lifted.
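On the client side, planning a Retry typically means reading the server's rate-limit signal from the reply. A sketch of that decision logic, assuming the server responds with HTTP 429 and the standard `Retry-After` header (the function name is illustrative):

```python
def retry_delay(status, headers, attempt):
    """Return how long to wait before retrying, or None if no retry is needed.

    Honours the server's `Retry-After` hint when present; otherwise falls
    back to simple exponential backoff based on the attempt number.
    """
    if status != 429:
        return None  # not rate-limited: no retry delay required
    try:
        return float(headers.get("Retry-After"))
    except (TypeError, ValueError):
        return float(2 ** attempt)  # no usable hint: exponential backoff
```

A Circuit Breaker, as mentioned above, could use the same signal to open immediately and fail fast instead of sleeping and retrying.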