Distributed Application Architecture Patterns

8.1 Circuit Breaker

Isolate failure with controlled recovery

This pattern is based on Circuit Breaker by Nygard [12, p. 95] and further elaborations by Newman [3, p. 401], Richardson [20, p. 77, 130], Microsoft [131] and Deenadayalan [132].

A circuit breaker is an electrical safety device that protects circuits from overcurrent [133]. The pattern stretches this metaphor slightly, but this work keeps the name because it gives a good intuition for the pattern's function and may ease communication with people already familiar with the concept, even though the terminology implied by the metaphor can be misleading for software developers.

8.1.1 Context

A service communicates with an external resource. When that resource fails, it is more beneficial to fail immediately until the failure is resolved than to keep waiting for a response that might eventually arrive.

8.1.2 Solution

Introduce a stateful proxy with the following states (see fig. 20).

  1. Closed – this is the default state. The proxy passes along requests but keeps track of failures.

  2. Open¹ – enough failures have occurred to trip the breaker, “opening” the circuit. The proxy now immediately returns an error for every request.

  3. Half-open² – the cooldown timer has run out, and the breaker starts routing requests through again. However, on the first failure, it trips back open and resets the timer.
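
The three states can be sketched as a small state machine. This is a minimal illustration, not a production implementation. The names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold` and `cooldown_seconds` are chosen here for illustration. Two behaviours are assumptions that the description above leaves open: failures are counted consecutively, and a single success in the half-open state closes the breaker again.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the resource."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock  # injectable so the cooldown can be tested without sleeping
        self.state = State.CLOSED
        self._failures = 0
        self._opened_at = 0.0

    def call(self, operation):
        """Run `operation` through the breaker, failing fast while it is open."""
        if self.state is State.OPEN:
            if self._clock() - self._opened_at < self.cooldown_seconds:
                raise CircuitOpenError("circuit is open")
            # Cooldown has run out: start routing requests through again.
            self.state = State.HALF_OPEN
        try:
            result = operation()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self._failures += 1
        # In half-open, the first failure trips the breaker again.
        if self.state is State.HALF_OPEN or self._failures >= self.failure_threshold:
            self._trip()

    def _on_success(self):
        # Assumption: one success is enough to return to the closed state.
        self._failures = 0
        self.state = State.CLOSED

    def _trip(self):
        self.state = State.OPEN
        self._opened_at = self._clock()  # (re)start the cooldown timer
        self._failures = 0
```

A caller wraps each remote call, for example `breaker.call(lambda: client.get(url))`, where `client` stands for whatever hypothetical client the service uses, and handles `CircuitOpenError` as an immediate failure.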

Figure 20: Circuit Breaker

Manual operation of the breaker might also be desirable in certain use cases.

8.1.3 Potential issues

While the overhead that circuit breakers introduce is minimal, they can still impact high-throughput services, and they add complexity. The breaker also needs to be tuned specifically for each service. A poorly tuned breaker might trip on merely transient failures, remain open far longer than the underlying service is down (prolonging downtime), or switch between the open and half-open states too frequently, reducing reliability.
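
One common way to make a breaker less sensitive to transient failures is to trip on the failure rate over a window of recent calls rather than on a raw failure count. The sketch below illustrates that idea only; it is not taken from the cited sources. The class name `FailureRateWindow` and the parameters `window_size`, `min_calls` and `failure_rate_threshold` are hypothetical, and suitable values depend on the individual service.

```python
from collections import deque


class FailureRateWindow:
    """Tracks the outcomes of the most recent calls (count-based sliding window)."""

    def __init__(self, window_size=20, min_calls=10, failure_rate_threshold=0.5):
        self.min_calls = min_calls
        self.failure_rate_threshold = failure_rate_threshold
        # Only the last `window_size` outcomes are kept; True marks a failure.
        self._outcomes = deque(maxlen=window_size)

    def record(self, failed):
        self._outcomes.append(bool(failed))

    def failure_rate(self):
        if not self._outcomes:
            return 0.0
        return sum(self._outcomes) / len(self._outcomes)

    def should_trip(self):
        # Too few samples: a single transient failure must not trip the breaker.
        if len(self._outcomes) < self.min_calls:
            return False
        return self.failure_rate() >= self.failure_rate_threshold
```

A larger window and a higher `min_calls` make the breaker calmer but slower to react. A smaller window reacts faster at the cost of tripping on short bursts of errors.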

Circuit breakers should only be used for remote resources, as they may add too much overhead for local operations [131].

8.1.4 Example

ExampleEshop uses circuit breakers between its bulkheads (see § 7.1). If a bulkhead fails, the circuit breaker trips and isolates the failure. This allows the rest of the system to fall back to a degraded mode, such as using older, cached data or providing a degraded experience, without having to wait for timeouts, which can be critical for business operations³.
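
The fallback to older, cached data could look roughly like the following sketch. It does not show how ExampleEshop is actually implemented. `load_with_fallback`, `fetch_via_breaker` and `CircuitOpenError` are hypothetical names, and the cache is a plain dictionary for brevity.

```python
class CircuitOpenError(Exception):
    """Hypothetical error that a breaker raises while it is open."""


def load_with_fallback(key, fetch_via_breaker, cache):
    """Return (value, degraded); serve stale cached data when the circuit is open."""
    try:
        value = fetch_via_breaker(key)
    except CircuitOpenError:
        if key in cache:
            return cache[key], True  # stale, but available without waiting for a timeout
        raise  # nothing cached: surface the failure to the caller
    cache[key] = value  # refresh the cache on every successful call
    return value, False
```

The `degraded` flag lets the caller decide how to present stale data, for instance by marking prices as possibly outdated.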

This pattern has an interesting interaction with Retry (see § 7.3). If retry logic is placed after the circuit breaker, it may delay the failure for too long, letting many more requests come through and fail before the breaker trips. Retry logic can still be useful before the circuit breaker, e.g. to protect against transient network failures, but a poorly matched combination of configurations can cause the circuit breaker to trip too often.
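
The effect of the ordering can be illustrated with a small simulation. It assumes that "after the breaker" means the retries run between the breaker and the resource, so the breaker sees only the final outcome, and that "before the breaker" means every retry attempt passes through the breaker individually. `CountingBreaker`, `with_retry` and `simulate` are hypothetical helpers, and the breaker is deliberately simplified: it never recovers.

```python
class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call."""


class CountingBreaker:
    """Deliberately minimal: trips after `threshold` failures and never recovers."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.failures = 0

    def call(self, operation):
        if self.failures >= self.threshold:
            raise CircuitOpenError("circuit is open")
        try:
            return operation()
        except Exception:
            self.failures += 1
            raise


def with_retry(operation, attempts):
    """Call `operation` up to `attempts` times (attempts >= 1), without backoff."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except CircuitOpenError:
            raise  # retrying against an open circuit is pointless
        except Exception as error:
            last_error = error
    raise last_error


def simulate(retry_after_breaker, requests=5, attempts=3, threshold=3):
    """Return how many calls reach a permanently failing resource."""
    resource_calls = 0

    def resource():
        nonlocal resource_calls
        resource_calls += 1
        raise ConnectionError("resource is down")

    breaker = CountingBreaker(threshold)
    for _ in range(requests):
        try:
            if retry_after_breaker:
                # The breaker sees one failure per request, after all retries.
                breaker.call(lambda: with_retry(resource, attempts))
            else:
                # Every retry attempt is counted by the breaker.
                with_retry(lambda: breaker.call(resource), attempts)
        except (ConnectionError, CircuitOpenError):
            pass
    return resource_calls
```

With these defaults, `simulate(True)` lets nine calls reach the failing resource before the breaker trips, while `simulate(False)` lets only three through but trips during the very first request. This is the behaviour that makes a breaker behind retries trip on short transient outages.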

As a rule of thumb, the circuit breaker is better in interactive situations, where failing fast might be more important than getting a response. Retrying, on the other hand, is better for long-running operations, where the service might recover in time.

Instead of retrying, Competing Consumers can be used to re-route requests to a healthy backup service (see § 6.1). Alternatively, as a compromise, instead of pushing live requests through the breaker, the breaker can actively check the state of the underlying service, akin to Health Monitoring (see § 7.4), and close itself once the service is healthy again. In the same spirit, the circuit breaker is also an important source of information that can be used for alarms and analysis.
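
A breaker that closes itself based on an active health check, and reports its state changes for alarms and analysis, might be sketched as follows. All names (`ProbingBreaker`, `health_check`, `on_state_change`, `probe`) are assumptions made for this illustration. The sketch also assumes that `probe` is invoked periodically by some scheduler or background task that is not shown.

```python
class CircuitOpenError(Exception):
    """Raised while the breaker is open and rejects calls."""


class ProbingBreaker:
    def __init__(self, failure_threshold, health_check, on_state_change=None):
        self.failure_threshold = failure_threshold
        self._health_check = health_check  # callable returning True when the service is healthy
        self._on_state_change = on_state_change or (lambda state: None)
        self._failures = 0
        self.is_open = False

    def call(self, operation):
        if self.is_open:
            raise CircuitOpenError("circuit is open")
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._set_open(True)
            raise
        self._failures = 0
        return result

    def probe(self):
        """Check the service directly instead of risking a real request."""
        if not self.is_open:
            return
        try:
            healthy = bool(self._health_check())
        except Exception:
            healthy = False  # a failing health check counts as unhealthy
        if healthy:
            self._failures = 0
            self._set_open(False)

    def _set_open(self, is_open):
        self.is_open = is_open
        # State changes are the natural hook for alarms, metrics and analysis.
        self._on_state_change("open" if is_open else "closed")
```

Because no user request is used as a probe, no caller has to pay for testing whether the service has recovered. The price is an additional periodic check against the service.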

The external service might employ Rate Limiting (see § 7.5), which needs to be taken into account when designing the breaker.

If the system employs Bulkheads (see § 7.1), the circuit breaker can be used to close off a failing bulkhead and proactively enter a restricted mode of operation.

If the implementation needs to be shared among multiple services of incompatible technologies, the circuit breaker can be implemented as part of an Ambassador (see § 5.2) or Offload to Gateway (see § 5.3).

8.1.6 Further reading


  1. For readers coming from software development, this terminology can be very confusing. This work considered using different names, because an “open” proxy has exactly the opposite connotation to the one intended. Unfortunately, renaming the states would completely break the metaphor and add confusion to discussions involving people already familiar with this naming scheme.

  2. Again, the naming convention is a bit misleading. The breaker is practically closed but will trip open much more easily.

  3. Dieulot compiled an interesting list of sources showing how even small changes in response times lead to measurable changes in user behaviour [134].