[erlang-questions] Process pool map/3 implementation

Parnell Springmeyer ixmatus@REDACTED
Sat Jul 23 20:48:25 CEST 2011


Robert,

In my ppool implementation I re-create processes; I don't recycle them. I
call it a "pool" because it avoids the alternative of breaking off "hunks"
(sublists) and spawning x processes, one for each item in the sublist. With
that approach, if one process takes longer than the others, the finished
slots sit idle until that one is done, and only then does the subdivision
start again for the next hunk.
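To make the "hunks" problem concrete, here is a minimal Erlang sketch of the subdivide-then-spawn strategy. It assumes a hypothetical module `chunked_pmap` with `map(Fun, List, N)`; this is not the plists API. Each batch of at most N items runs in parallel, but the next batch starts only after every worker in the current batch has replied, so one slow item leaves the other N - 1 slots idle.

```erlang
-module(chunked_pmap).  %% hypothetical module name, for illustration only
-export([map/3]).

%% Subdivide the list into batches of N and run each batch in parallel.
%% The next batch cannot start until the slowest item of the current one
%% has finished, which is exactly where resources sit idle.
map(Fun, List, N) when is_integer(N), N > 0 ->
    map_batches(Fun, List, N, []).

map_batches(_Fun, [], _N, Acc) ->
    lists:append(lists:reverse(Acc));
map_batches(Fun, List, N, Acc) ->
    {Batch, Rest} = split(N, List),
    Results = run_batch(Fun, Batch),  % blocks until the whole batch is done
    map_batches(Fun, Rest, N, [Results | Acc]).

%% lists:split/2 fails when N exceeds the list length, so cap N first.
split(N, List) ->
    lists:split(min(N, length(List)), List).

%% One process per item in the batch; replies are gathered in spawn order.
run_batch(Fun, Batch) ->
    Parent = self(),
    Ref = make_ref(),
    Pids = [spawn_link(fun() -> Parent ! {self(), Ref, Fun(X)} end) || X <- Batch],
    %% Pid is bound by the generator, so each receive waits for that worker.
    [receive {Pid, Ref, Result} -> Result end || Pid <- Pids].
```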

The "pool" implementation lets me delegate work in a round-robin fashion.
I don't recycle processes; instead I spawn a new process as each one
finishes, keeping at most x processes working until the job is done.
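A minimal sketch of the bounded strategy described above, assuming a hypothetical `bounded_pmap:map(Fun, List, MaxWorkers)`. It illustrates the idea only and is not the ppool code at the BitBucket link. No process is reused: a fresh worker is spawned whenever one finishes, so at most MaxWorkers run at any moment, and a slow item occupies only its own slot.

```erlang
-module(bounded_pmap).  %% hypothetical module name, for illustration only
-export([map/3]).

%% At most MaxWorkers processes run at once; a replacement is spawned
%% as soon as any worker replies. Results come back in input order.
map(Fun, List, MaxWorkers) when is_integer(MaxWorkers), MaxWorkers > 0 ->
    %% Tag each item with its position so results can be reordered later.
    Indexed = lists:zip(lists:seq(1, length(List)), List),
    loop(Fun, Indexed, MaxWorkers, 0, self(), make_ref(), []).

%% Nothing pending and nothing running: restore input order and return.
loop(_Fun, [], _Max, 0, _Parent, _Ref, Acc) ->
    [Result || {_Index, Result} <- lists:keysort(1, Acc)];
%% A slot is free and work remains: spawn another worker.
loop(Fun, [{Index, X} | Rest], Max, Running, Parent, Ref, Acc) when Running < Max ->
    spawn_link(fun() -> Parent ! {Ref, Index, Fun(X)} end),
    loop(Fun, Rest, Max, Running + 1, Parent, Ref, Acc);
%% All slots busy, or only stragglers left: wait for any one worker.
loop(Fun, Pending, Max, Running, Parent, Ref, Acc) ->
    receive
        {Ref, Index, Result} ->
            loop(Fun, Pending, Max, Running - 1, Parent, Ref, [{Index, Result} | Acc])
    end.
```

For the crawler described here, a call such as `bounded_pmap:map(fun crawl/1, Urls, 5)` would keep five crawl workers busy (`crawl/1` and `Urls` are hypothetical). Workers are started with `spawn_link`, so a crash in `Fun` also takes down the caller; a crawler that should survive individual failures would need to catch errors inside the worker instead.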

A pmap would be "bad" in my use case. Very bad. The list has 3000 items,
about 1400 of which make more than 20 HTTP requests each (the rest make
about 3 to 4). Running them all at once would completely tank the machine
and would also be irresponsible crawling. But I *do* want to do the work
in parallel, just not at that scale, so with the process pool strategy I
limit the number of concurrent crawl workers to about 5 or 6, which works
well on this machine.
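For contrast, a minimal sketch of the unbounded pmap being rejected here, loosely in the style of the pmap from Joe's book; the module name `simple_pmap` is hypothetical. Every element gets its own process and all of them are spawned before any result is collected, so a 3000-item crawl list means 3000 concurrent crawl workers.

```erlang
-module(simple_pmap).  %% hypothetical module name, for illustration only
-export([pmap/2]).

%% One process per list element, all started up front; nothing limits
%% how many run at the same time.
pmap(Fun, List) ->
    Parent = self(),
    Ref = make_ref(),
    Pids = [spawn_link(fun() -> Parent ! {self(), Ref, Fun(X)} end) || X <- List],
    %% Gather replies in input order by matching on each worker's Pid.
    [receive {Pid, Ref, Result} -> Result end || Pid <- Pids].
```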

On Fri, Jul 22, 2011 at 5:10 PM, Robert Virding <robert.virding@REDACTED> wrote:

> But a pmap SHOULD start processes for all the elements in the list in
> parallel. It is after all a 'P'map. In which case all the processes will be
> running and processing in parallel as you want. The only reason I can see
> for using a worker pool is if you actually want to LIMIT the number of
> processes running at the same time.
>
> IMAO in Erlang there are only two reasons for using worker/process pools:
>
> - you want/need to limit the number of "things" running in parallel
> - you actually do want to reuse a process for another computation, there is
> something in the application which mandates reusing processes.
>
> Otherwise it is just extra work, process creation/termination is so fast
> that there is no real gain in keeping them around to reuse.
>
>
> Robert
>
> ----- "Parnell Springmeyer" <ixmatus@REDACTED> wrote:
> > Because the list has about 3000 items in it, and for each item about
> > 20-50 HTTP requests are made, I needed a way of parallelizing the
> > operations (instead of stepping through the list one by one) but in a
> > controlled fashion and using a round robin strategy (worker pool).
> >
> > On Fri, Jul 22, 2011 at 6:10 AM, David Mercer <dmercer@REDACTED> wrote:
> >
>>
>> I was curious about that, too.  Hoping you'll get a response...
>> >
>> > > -----Original Message-----
>> > > From: erlang-questions-bounces@REDACTED [mailto:erlang-questions-
>> > > bounces@REDACTED] On Behalf Of Robert Virding
>> > > Sent: Wednesday, July 20, 2011 8:42 PM
>> > > To: Parnell Springmeyer
>> > > Cc: erlang-questions
>> > > Subject: Re: [erlang-questions] Process pool map/3 implementation
>> > >
>> > > One quick question: what was wrong with the straightforward solution of
>> > > just spawning one process for each element in the list? Did this break
>> > > or do you actually need more control?
>> > >
>> > > Robert
>> >
>> > >
>> > > ----- "Parnell Springmeyer" <ixmatus@REDACTED> wrote:
>> > >
>> > > > For a work project I have a large list (thousands of items) to
>> > > > process and at first built a "pmap" implementation as per Joe's
>> > > > book until I found the plists module (which is awesome btw).
>> > > >
>> > > > There is one glaring issue with the list -> subdivide -> spawn x
>> > > > processes for n sublist items strategy: if an item in the sublist
>> > > > takes longer than all the other items, it blocks the entire
>> > > > resource allotment until it is done.
>> > > >
>> > > > In most cases, the plists/pmap implementation works just fine
>> > > > because the items in the list probably don't take more than a few
>> > > > milliseconds to map the fun over. However, it does become an issue
>> > > > when that is not the case.
>> > > >
>> > > > So, I figured the next best strategy would be to implement a process
>> > > > pool, since it would allow slow-running processes to continue their
>> > > > work while finished processes die and new processes are spawned into
>> > > > the pool, ready for work, so none of the resources sit idle.
>> > > >
>> > > > Right now, my module isn't nearly as feature-complete as the plists
>> > > > module is; this is only a drop-in replacement for map. Please submit
>> > > > your criticisms and comments to me at this address.
>> > > >
>> > > > You may find the code on BitBucket:
>> > > > https://bitbucket.org/ixmatus/ppool
>> > > >
>> > > > --
>> > > > Parnell "ixmatus" Springmeyer (http://ixmat.us)
>> >
>> > > > _______________________________________________
>> > > > erlang-questions mailing list
>> > > > erlang-questions@REDACTED
>> > > > http://erlang.org/mailman/listinfo/erlang-questions
>> >
>> >
>>
>



-- 
Parnell "ixmatus" Springmeyer (http://ixmat.us)