[erlang-questions] Classification of concurrency patterns

Wed May 12 12:35:09 CEST 2010

Hello Joe,

On Wed, May 12, 2010 at 09:31, Joe Armstrong <erlang@REDACTED> wrote:
> I'm interested in trying to classify and name a number of different
> parallelization strategies.

> 1) Divide and conquer concurrency
> 2) Pipeline concurrency
> 3) Map reduce concurrency
> 4) Identical Job concurrency
> 5) Grid concurrency
> 6) Job queue concurrency

IMHO, #1, #3 and #4 are the same thing, differently parametrized. Do
you think it is important to get them separate identities?

Regarding #5, a more general description would involve a grid that is
not restricted to two dimensions. I think I remember seeing papers
about 3D grids.

#6 can be seen as a generalization of #2, where in the pipeline the
"next" worker is hardcoded and the structure is linear. A pipeline may
or may not use buffering queues between stages.

The way I see it, there are two main categories:
* dividing the data set into smaller chunks and processing them in
parallel (#1,3,4)
* dividing the work into specialized chunks and letting the data flow
to next chunk (#2,6)

These two can also be combined in several ways, giving for example #5
(usually each processor has its own chunk of input data, but data can
flow to neighboring processors). Or each processing unit in #1 could
involve a pipeline.

It depends on the purpose of defining these categories whether a
shorter, more general list or a more detailed one is required.

best regards,
Vlad

> 1) Divide and conquer concurrency
>
> Divide the problem into K jobs which can be performed in parallel.
> Perform the jobs
> Recombine the results
>
> 2) Pipeline concurrency
>
> Divide the problem into K jobs that can be performed in a pipeline.
> The output of the first job must be the input of the second and so on.
> Run all the steps in parallel
>
> 3) Map reduce concurrency
>
>    This is similar to 1). Only the recombination step involves
> merging results that
>   have the same Key.
>
> 4) Identical Job concurrency
>
>    Here the problem does not have to be split into K parts, we
> already have N identical
>    jobs to do (identical means jobs that will consume similar resources)
>
>   There is no recombination step. There are N results.
>
> 5) Grid concurrency
>
>    Divide the problem into N*M identical jobs that can be solved on a
> NxM grid of processors.
>    Each processor can only talk to its direct neighbors
>
> 6) Job queue concurrency
>
>    Divide the problem into a set of named workers (who do jobs). Each
> worker has an input
>    job queue. Workers consume jobs from their job queues and send the
> result to other workers.
>
>    Note:  [1] Paul Morrison calls this flow based programming