[erlang-questions] Parallel List Comprehension

Sat Oct 14 23:02:26 CEST 2006

Hi Jay,

I really like your open-cdr coded list idea.  Prolog has a similar
construct, the difference-list.  Difference-lists are efficient, and
often lead to elegant implementations.

Prolog doesn't limit this to lists, nor restrict it to the final
element, but allows any kind of incomplete data structure.  Perhaps we
could generalize your proposal to also allow tuples of the form {foo,
openCdrFlag, {bar, baz}} where we can nconc a list or tuple in place
of the openCdrFlag.  (In this case we should pick a more general name
than openCdrFlag).

Wayne

---
Wayne Vucenic
No Bugs Software
Ruby, C#, and Erlang Agile Contract Programming in Silicon Valley

On 10/13/06, Jay Nelson <jay@REDACTED> wrote:
> Sounds sort of like overkill doesn't it?  I want to be able to read in a
> GB-sized binary, chop it into a list of sub-binaries very quickly and
> then filter some subset of the 1M+ elements that result.  The final step
> would be to hang on to the 15 elements I care about, drop all references
> to the original binary and purge the excess from memory.
>
> It would be nice if my shiny new 80-core Intel chip could help me out here!
>
> Granted, if my architecture is such that it requires copying the initial
> data it might not pay off well.  But let's take the current situation
> where I can have 4, 8 or 16 schedulers and presumably can arrange that
> very large binary to be shared.  As long as my elements are big enough,
> they will also be pointing to the original shared binary.  The trick now
> is returning the resulting list.
>
> The 2nd most straight forward approach on four cores would give me (I
> only wrote it this way to show why there are 4 different results that
> are each lists -- Core1 etc represents the first element, Results1
> represents the rest of the list):
>
> [[Core1 | Results1], [Core2 | Results2], [Core3 | Results3], [Core4 |
> Results4]]
>
>
> (because the 1st most straight forward approach would give me 4 lists
> which would get appended or flattened and result in rewriting those
> millions of cons cells, possibly running out of memory in the process).
>
>
> Not particularly appealing.  I could accomplish this by slicing the list
> myself, spinning off separate processes and collecting the results.  If
> I could query for the number of schedulers it would be somewhat flexibly
> but still icky.
>
>
> Now suppose the compiler or runtime could arrange this for me very
> efficiently.  The best I could come up with is an open-cdr coded list
> that gets nconc'ed to reuse the space (hmm, that probably makes no sense
> unless you happen to be an old time lisper).
>
> Suppose the runtime determined there would be a benefit in parallelizing
> the comprehension*.  It could construct an internal "list of open ended
> elements" which was some new internal data type that cannot be generated
> any other way than by a list comprehension.  In the above example, the
> lists are illegal in the following way:
>
> every ResultsX has as its final two elements [LastElement . OpenCdrFlag]
>
> OpenCdrFlag would be a new internal atom that represents a lazy value
> that has not been supplied yet.  A list terminated by OpenCdrFlag can
> only be a member of the internal "list of open ended elements" or a
> return value.
>
>
> The recipe is like this:
>
> 1) Determine how many schedulers to use
> 2) Split the list into non-overlapping serial sections (could be
> pipelined to each scheduler and have it bite off just the head number of
> elements it needs before passing it on)
> 3) Run a list comprehension in parallel for each chunk
> 4) Return each result as an OpenCdrList
> 5) Collect the results in an internal "list of open ended elements" in
> serial order
> 6) nconc them together (replace the OpenCdrFlag with a pointer to the
> next open list and null in the last)
>
> An "open ended element" should have a header field that indicates the
> location of the OpenCdrFlag so the append is very fast.
>
>
> Similar thinking should go into the plans for binary comprehensions.
>
>
> --------
> *Hand waving here...
>
>
> Ok, initially it might be easiest to let the user decide by using [ 2*X
> ||| X <-List]
>  where the three vertical bars means do it in parallel.  This would
> allow an experimental implementation that wouldn't impact anyone.  Later
> we could try to make it more automatic.
>
> [The problem is if you have to scan the list to see how big it is
> relative to the number of processors, you might as well not parallelize
> or do the scanning in parallel and abort the comprehension if it will
> take too long.]
>
> ---------
>
>
> Any thoughts?
>
> jay
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://www.erlang.org/mailman/listinfo/erlang-questions
>