6.3 Scatter–Gather
Asynchronously distribute workloads and aggregate results
This pattern is based on Scatter-Gather by Hohpe et al. [4, p. 267, 84] and Deenadayalan [85], and Scatter/Gather by Burns [22, p. 73].
See also Message Broker.
6.3.1 Context
At least one of the following conditions needs to hold.
Many external services need to be queried
The problem is embarrassingly parallel1
The operation needs to be run on a whole partitioned data store [22, p. 73]
An operation needs to happen with the lowest possible latency
6.3.2 Solution
This pattern has two possible implementations, depending on the use case.
The publisher sends out the same task using Publisher–Subscriber to multiple different processors
The publisher independently sends out a different task to multiple identical processors
Each processor then processes its task and sends its result back to a central aggregator, which combines the results (see fig. 12).
Unless this operation is part of an overall workflow, the aggregator does not need to be a different entity from the publisher.
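As an illustration, the following sketch shows the second variant in Python using asyncio. The names scatter_gather and process_partition, the number of processors, and the simulated work are assumptions made for this example and are not taken from the referenced sources.

import asyncio


async def process_partition(partition_id: int, task: str) -> str:
    # Stand-in for one processor working on its share of the overall task.
    await asyncio.sleep(0.1)  # simulated work / network round trip
    return f"partition {partition_id}: processed {task!r}"


async def scatter_gather(task: str, partitions: range) -> list[str]:
    # Scatter: send an independent task to each of the identical processors.
    pending = [process_partition(p, task) for p in partitions]
    # Gather: the aggregator combines the results once every processor replies.
    return await asyncio.gather(*pending)


if __name__ == "__main__":
    print(asyncio.run(scatter_gather("reindex products", range(4))))

In a real deployment the processors would typically run as separate services reached through a message broker rather than as coroutines in a single process; the sketch only illustrates the scatter and gather steps themselves.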
6.3.3 Potential issues
Depending on the use case, each processor can become a bottleneck, and a processor failure may force retries that delay the whole operation. An unnoticed error in a processor can lead to incomplete results or the failure of the entire operation.
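The following sketch shows one way an aggregator might guard against slow or failing processors, assuming the asyncio style of the previous sketch. The timeout value, the flaky_processor helper, and the simulated failure rate are illustrative assumptions, not prescriptions.

import asyncio
import random


async def flaky_processor(i: int) -> str:
    # Simulated processor that sometimes takes too long or fails outright.
    await asyncio.sleep(random.uniform(0.0, 0.3))
    if random.random() < 0.2:
        raise RuntimeError(f"processor {i} crashed")
    return f"result from processor {i}"


async def gather_with_deadline(n: int, timeout: float = 0.2) -> list[str]:
    # Wrap each processor call in a timeout so a single slow processor cannot
    # delay the whole operation indefinitely.
    calls = [asyncio.wait_for(flaky_processor(i), timeout) for i in range(n)]
    # return_exceptions=True keeps one failure from cancelling the rest, but
    # the aggregator must then check for incomplete results explicitly.
    results = await asyncio.gather(*calls, return_exceptions=True)
    ok = [r for r in results if not isinstance(r, BaseException)]
    if len(ok) < n:
        # Decide here whether to retry, return partial results, or fail.
        print(f"warning: only {len(ok)} of {n} processors answered in time")
    return ok


if __name__ == "__main__":
    print(asyncio.run(gather_with_deadline(8)))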
Due to network overhead, the achievable parallelism of a single operation may be lower than its theoretical maximum (see § 3.1.1).
See also § 4.2.3.
6.3.4 Example
To improve search capacity, ExampleEshop partitions its search index into multiple data stores. When a user searches for a product, the search service sends the query to all shards in parallel using Publisher–Subscriber (see § 4.2). The search service then aggregates the received results and returns them to the user.
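A minimal sketch of this search flow, again in Python with asyncio, might look as follows. The shard contents, the query_shard helper, and the relevance scores are invented for illustration and do not describe ExampleEshop's actual implementation.

import asyncio

# Each "shard" is one partition of the product index, here just an in-memory
# list of (product name, relevance score) pairs.
SHARDS = [
    [("red shoes", 0.9), ("red socks", 0.4)],
    [("blue shoes", 0.7)],
    [("running shoes", 0.8), ("shoe polish", 0.3)],
]


async def query_shard(shard: list[tuple[str, float]], query: str) -> list[tuple[str, float]]:
    await asyncio.sleep(0.05)  # simulated network hop to the shard
    return [(name, score) for name, score in shard if query in name]


async def search(query: str) -> list[str]:
    # Scatter: send the same query to every shard in parallel...
    per_shard = await asyncio.gather(*(query_shard(s, query) for s in SHARDS))
    # ...then gather: flatten, rank by relevance, and return product names.
    merged = [hit for hits in per_shard for hit in hits]
    merged.sort(key=lambda hit: hit[1], reverse=True)
    return [name for name, _ in merged]


if __name__ == "__main__":
    print(asyncio.run(search("shoes")))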
6.3.5 Related patterns
This pattern combines Publisher–Subscriber for sending out tasks and Queue-Based Load Levelling for collecting the results
It is a natural fit for data stores that use Partitioning when implementing operations over the whole data store
Use Queue-Based Load Levelling to decouple the processors from the aggregator