Distributed Application Architecture Patterns

6.3 Scatter–Gather

Asynchronously distribute workloads and aggregate results

This pattern is based on Scatter-Gather by Hohpe et al. [4, p. 267, 84] and Deenadayalan [85], and Scatter/Gather by Burns [22, p. 73].

See also Message Broker.

6.3.1 Context

At least one of the following conditions needs to hold.

  1. Many external services need to be queried

  2. The problem is embarrassingly parallel¹

  3. The operation needs to be run on a whole partitioned data store [22, p. 73]

  4. An operation needs to happen with the lowest possible latency

6.3.2 Solution

This pattern has two possible implementations, depending on the use case.

  1. The publisher sends out the same task using Publisher–Subscriber to multiple different processors

  2. The publisher independently sends out a different task to multiple identical processors

Each processor then processes its task and sends its result back to a central aggregator, which combines the results (see fig. 12).

Figure 12: Scatter–Gather

Unless this operation is part of a larger workflow, the aggregator need not be a separate entity from the publisher.
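The first variant, in which the same task is scattered to several processors, can be sketched with Python's asyncio. The `process` coroutine and its simulated latency are hypothetical stand-ins for network calls to independent services; here the publisher also plays the aggregator role:

```python
import asyncio

# Hypothetical processor: in a real system this would be a network call
# to an independent service; here it is simulated with a short sleep.
async def process(processor_id: int, task: str) -> str:
    await asyncio.sleep(0.01 * processor_id)  # simulate variable latency
    return f"{task}:result-from-{processor_id}"

async def scatter_gather(task: str, num_processors: int) -> list[str]:
    # Scatter: send the same task to every processor concurrently.
    results = await asyncio.gather(
        *(process(i, task) for i in range(num_processors))
    )
    # Gather: the caller acts as the aggregator and combines the results.
    return sorted(results)

print(asyncio.run(scatter_gather("query", 3)))
```

`asyncio.gather` preserves the order of the input coroutines, so the aggregation step can rely on a stable result ordering even though the processors finish at different times.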

6.3.3 Potential issues

Depending on the use case, any single processor can become a bottleneck: its failure may force retries that delay the whole operation. An undetected error in a processor can lead to incomplete results or to the failure of the entire operation.
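One common mitigation for a slow or failed processor is to impose a deadline on the gather step and accept partial results. A minimal sketch, again using asyncio and a hypothetical `process` coroutine in which one processor is deliberately slow:

```python
import asyncio

async def process(processor_id: int) -> str:
    # Processor 2 is pathologically slow, simulating a stuck or failed node.
    await asyncio.sleep(10 if processor_id == 2 else 0.01)
    return f"result-from-{processor_id}"

async def gather_with_deadline(num: int, timeout: float) -> list[str]:
    tasks = [asyncio.create_task(process(i)) for i in range(num)]
    # Wait at most `timeout` seconds, then take whatever has finished.
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for t in pending:
        t.cancel()  # give up on stragglers instead of blocking forever
    return sorted(t.result() for t in done)

print(asyncio.run(gather_with_deadline(3, timeout=1.0)))
```

Whether partial results are acceptable is use-case dependent: for a search query they usually are, while for an operation that must cover every partition they are not.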

Due to network overhead, the achievable parallelism of a single operation may be lower than its theoretical maximum (see § 3.1.1).

See also § 4.2.3.

6.3.4 Example

To improve search capacity, ExampleEshop partitions its search index into multiple data stores. When a user searches for a product, the search service sends the query to all shards in parallel using Publisher–Subscriber (see § 4.2). The search service then aggregates the received results and returns them to the user.
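The shard-and-merge step of this example can be sketched as follows. The shard contents, the `query_shard` helper, and the relevance scores are all hypothetical; each shard is assumed to return its hits already sorted by descending score, so the aggregator only has to merge the per-shard rankings:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each holds a slice of the search index and
# returns matches sorted by descending relevance score.
SHARDS = [
    [("laptop pro", 0.9), ("laptop bag", 0.4)],
    [("laptop stand", 0.7)],
    [("laptop sleeve", 0.6), ("laptop charger", 0.3)],
]

def query_shard(shard: list) -> list:
    return shard  # stand-in for a network call to one partition

def search(top_k: int) -> list[str]:
    # Scatter: query all shards in parallel.
    with ThreadPoolExecutor() as pool:
        per_shard = list(pool.map(query_shard, SHARDS))
    # Gather: merge the pre-sorted per-shard rankings, keep the best hits.
    merged = heapq.merge(*per_shard, key=lambda hit: -hit[1])
    return [name for name, _ in merged][:top_k]

print(search(3))
```

`heapq.merge` exploits the fact that each shard's result list is already sorted, so the aggregator never needs to re-sort the full combined result set.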


  1. “Embarrassingly parallel” is a term used to describe problems that can be trivially divided into smaller tasks that can be solved independently of each other, which is the ideal scenario for parallelism [86].