[erlang-questions] ANN: gproc_pool + some performance tidbits

pablo platt pablo.platt@REDACTED
Wed Jun 5 12:34:20 CEST 2013


Thanks for the explanation.


On Wed, Jun 5, 2013 at 8:47 AM, Ulf Wiger <ulf@REDACTED> wrote:

>
> On 4 Jun 2013, at 21:30, pablo platt wrote:
>
> What's the use case for workers in the pool?
> Is it only for distributing tasks, or also for implementing a pool of DB
> connections like https://github.com/devinus/poolboy ?
>
>
>
> I believe it *is* fairly similar to poolboy, but I thought it would be
> consistent with the gproc philosophy to have a pool concept in gproc, since:
>
> - One of the things you need to do in a worker pool implementation is to
> keep track of the worker processes, and gproc is good at this
>
> - A benefit of using gproc is that you can get some
> query/debugging/monitoring capabilities for free. For example, after
> setting up my test pool (gproc_pool:setup_test_pool/3), I can use the
> following stock gproc function:
>
> 2> gproc_pool:setup_test_pool(mypool,round_robin,[]).
> add_worker(mypool, a) -> 1; Ws = [{a,1}]
> add_worker(mypool, b) -> 2; Ws = [{a,1},{b,2}]
> add_worker(mypool, c) -> 3; Ws = [{a,1},{b,2},{c,3}]
> add_worker(mypool, d) -> 4; Ws = [{a,1},{b,2},{c,3},{d,4}]
> add_worker(mypool, e) -> 5; Ws = [{a,1},{b,2},{c,3},{d,4},{e,5}]
> add_worker(mypool, f) -> 6; Ws = [{a,1},{b,2},{c,3},{d,4},{e,5},{f,6}]
> [true,true,true,true,true,true]
> 3> gproc:in
> info/1  info/2  init/1
> 3> catch gproc:info(self()).
> [{gproc,[{{n,l,[gproc_pool,mypool,1,a]},0},
>          {{n,l,[gproc_pool,mypool,2,b]},0},
>          {{n,l,[gproc_pool,mypool,3,c]},0},
>          {{n,l,[gproc_pool,mypool,4,d]},0},
>          {{n,l,[gproc_pool,mypool,5,e]},0},
>          {{n,l,[gproc_pool,mypool,6,f]},0}]},
>  {current_function,{erl_eval,do_apply,6}},
>  {initial_call,{erlang,apply,2}},
>  {status,running},
>  {message_queue_len,0},
>   …]
>
> Thus, from the 'gproc footprint' of the process, I can readily tell that
> it's a worker in the pool 'mypool' (even if I'm not familiar with the
> gproc_pool concept, I can guess from convention that the first part of the
> name is a module name).
>
> The whole idea of gproc was in fact to provide a single set of patterns
> that I saw appearing in many different places in our code, in lots of
> different implementations. So in a sense, practically everything that gproc
> provides is stuff that people have implemented before, in reasonably
> similar ways. :)  Hopefully with gproc, some user code can become simpler,
> more debuggable and a bit more uniform.
>
> Why do workers have names?
> I know I can just give them names such as 0,1,2... but I'm trying to
> understand the rationale.
>
>
> I thought it was a useful layer of abstraction.
>
> The performance of the pool is somewhat dependent on the spread of workers
> across the available slots (especially if the pool is half-full, and
> hashing or random selection is used). The workers themselves only need to
> know what to call themselves as they connect to the pool. Whoever manages
> the pool can control the positioning of each worker.
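>
> To illustrate, a minimal sketch of placing workers in specific slots.
> It assumes an add_worker/3 variant that takes an explicit slot
> position - check the gproc_pool docs for the actual API:
>
> gproc_pool:new(mypool, round_robin, [{size, 4}]),
> %% hypothetical: put worker 'a' in slot 1 and 'b' in slot 3,
> %% spreading them across the half-full pool
> gproc_pool:add_worker(mypool, a, 1),
> gproc_pool:add_worker(mypool, b, 3).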
>
>
> As always, I'm sure this functionality will be a major part of my server,
> like everything else in gproc, even if I still don't know why ;)
>
>
> Haha! This reminds me of the first design review meeting at Ericsson where
> gproc's predecessor sysproc was up for review. The chairman of the meeting
> said "I guess we'll approve it, even though I don't understand what it's
> for". :)
>
> It was a good decision, I thought…
>
> BR,
> Ulf W
>
>
>
> Thanks
>
>
>
>
> On Tue, Jun 4, 2013 at 10:24 PM, Ulf Wiger <ulf@REDACTED> wrote:
>
>>
>> On 4 Jun 2013, at 18:52, ANTHONY MOLINARO wrote:
>>
>> Hi Ulf,
>>
>> Have you done any concurrent tests?  I only ask because I've seen our own
>> pooling code (https://github.com/openx/gen_server_pool) have issues under
>> load. Now in our case it's because of a single gen_server acting as a
>> dispatch layer, which should not be the case for gproc as IIRC it uses ets
>> to provide for fast concurrent access (something also done in a novel way
>> by https://github.com/ferd/dispcount/ which I keep meaning to try out),
>> but I'd be curious to know if you've done any concurrent testing which
>> shows that.
>>
>>
>> I hadn't, but did so now.
>>
>> Spawning N clients, which run 1000 iterations each, on e.g. a round_robin
>> pool:
>>
>> N       Avg usec/iteration
>> 1                       37
>> 10                     250
>> 100                   1630
>> 1000                 18813
>>
>> Of course, this was a pretty nasty test, with all processes banging away
>> at the pool as fast as they possibly could. If you want frequent mutex
>> conflicts, that's probably as good a way as any to provoke them.
>>
>> When I insert a random sleep (0-50 ms) between each iteration, time each
>> pick request and collect the averages, 100 concurrent workers pay on
>> average 50 usec per selection. For 1000 concurrent workers, the average
>> rises to 60 usec.
>>
>> The corresponding average for the hash pool and 1000 concurrent workers
>> is 20 usec.
>>
>> (All on my Macbook Air)
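>>
>> For reference, a hedged sketch of what such a test can look like (not
>> the actual test code; the pool name and constants are made up):
>>
>> bench(N) ->
>>     Parent = self(),
>>     Pids = [spawn(fun() ->
>>                       %% time 1000 picks, sleeping 0-50 ms in between
>>                       Ts = [begin
>>                                 timer:sleep(crypto:rand_uniform(0, 51)),
>>                                 {T, _} = timer:tc(gproc_pool, pick, [mypool]),
>>                                 T
>>                             end || _ <- lists:seq(1, 1000)],
>>                       Parent ! {self(), lists:sum(Ts) / length(Ts)}
>>                   end) || _ <- lists:seq(1, N)],
>>     %% collect each client's average pick time in microseconds
>>     [receive {P, Avg} -> Avg end || P <- Pids].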
>>
>>
>> I think the number of pool implementations in Erlang has probably finally
>> surpassed the number of JSON parsers ;)
>>
>>
>> Well, that tends to happen with fun and reasonably well-bounded problems.
>> ;)
>>
>> BR,
>> Ulf W
>>
>>
>> -Anthony
>>
>> On Jun 4, 2013, at 2:18 AM, Ulf Wiger <ulf@REDACTED> wrote:
>>
>>
>> I pushed a new gproc component called gproc_pool the other day.
>>
>> The main idea, apart from wanting to see how well it would work, was that
>> I wanted to be able to register servers with gproc and then have an
>> efficient way of pooling between them. A benefit of using gproc throughout
>> is that the registration objects serve as a 'footprint' for each process -
>> by listing the gproc entities for each process, you can tell a lot about
>> its purpose.
>>
>> The way gproc_pool works is that:
>> 1. You define a pool by naming it and optionally specifying its size
>>    (gproc_pool:new(Pool) | gproc_pool:new(Pool, Type, Options))
>> 2. You add worker names to the pool
>>    (gproc_pool:add_worker(Pool, Name))
>> 3. Your servers each connect to a given name
>>    (gproc_pool:connect_worker(Pool, Name))
>> 4. Users pick a worker for each request
>>    (gproc_pool:pick(Pool))
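>>
>> A minimal sketch of that flow (the pool and worker names are made up,
>> and gproc:send/2 is assumed as one way to dispatch to the picked name):
>>
>> gproc_pool:new(mypool),
>> [gproc_pool:add_worker(mypool, W) || W <- [a, b, c]],
>> %% inside each worker process, e.g. the one calling itself 'a':
>> %%   gproc_pool:connect_worker(mypool, a),
>> %% a client then picks a worker name and sends it a request:
>> Name = gproc_pool:pick(mypool),
>> gproc:send(Name, {request, self(), ping}).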
>>
>> My little test code indicates that the different load-balancing
>> strategies perform a bit differently:
>>
>> (https://github.com/uwiger/gproc/blob/master/src/gproc_pool.erl#L843)
>>
>> (Create a pool, add 6 workers and iterate 100k times,
>> incrementing a gproc counter for each iteration.)
>>
>> 3> gproc_pool:test(100000,round_robin,[]).
>> worker stats (848):
>> [{a,16667},{b,16667},{c,16667},{d,16667},{e,16666},{f,16666}]
>> {2801884,ok}
>> 4> gproc_pool:test(100000,hash,[]).
>> worker stats (848):
>> [{a,16744},{b,16716},{c,16548},{d,16594},{e,16749},{f,16649}]
>> {1891517,ok}
>> 5> gproc_pool:test(100000,random,[]).
>> worker stats (848):
>> [{a,16565},{b,16542},{c,16613},{d,16872},{e,16727},{f,16681}]
>> {3701011,ok}
>> 6> gproc_pool:test(100000,direct,[]).
>> worker stats (848):
>> [{a,16667},{b,16667},{c,16667},{d,16667},{e,16666},{f,16666}]
>> {1766639,ok}
>> 11> gproc_pool:test(100000,claim,[]).
>> worker stats (848):
>> [{a,100000},{b,0},{c,0},{d,0},{e,0},{f,0}]
>> {7569425,ok}
>>
>>
>> The worker stats show how evenly the workers were selected, and the
>> {Time, ok} comes from timer:tc/3, i.e. Time/100000 is the per-iteration
>> cost (e.g. 2801884/100000, about 28 usec, for round_robin):
>>
>> round_robin: 28 usec (maintain a 'current' counter, modulo Size)
>> hash:        19 usec (gproc_pool:pick(Pool, Val), hash on Val)
>> random:      37 usec (pick a random worker, using crypto:rand_uniform/2)
>> direct:      18 usec (gproc_pool:pick(Pool, N), where N modulo Size
>>              selects the worker)
>> claim:       76 usec (claim the first available worker, apply a fun,
>>              then release it)
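>>
>> For the two value-based strategies, the calls look like this (a sketch;
>> the pool names and routing keys are made up):
>>
>> %% hash pool: the same Val always picks the same worker
>> Name1 = gproc_pool:pick(hash_pool, UserId),
>> %% direct pool: N modulo the pool size selects the slot
>> Name2 = gproc_pool:pick(direct_pool, Seq).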
>>
>> I think the per-selection cost is acceptable as-is, but could perhaps be
>> improved (esp. the 'random' strategy is surprisingly expensive). All the
>> selection work is done in the caller's process, BTW - no communication with
>> the gproc or gproc_pool servers (except for admin tasks).
>>
>> The 'claim' strategy is also surprisingly expensive. I believe it's
>> because I'm using gproc:select/3 to find the first free worker. Note also
>> that it results in an extremely uneven distribution. (That's obviously
>> because the test run claims the first available worker and then releases
>> it before iterating - it's always going to select the first worker.)
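>>
>> In code, a claim looks roughly like this (a sketch; assuming claim/2
>> takes a fun(Name, Pid) and returns {true, Result} or false):
>>
>> case gproc_pool:claim(mypool, fun(_Name, Pid) ->
>>                                   gen_server:call(Pid, request)
>>                               end) of
>>     {true, Reply} -> Reply;
>>     false         -> no_worker_free
>> end.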
>>
>> https://github.com/uwiger/gproc/blob/master/doc/gproc_pool.md
>>
>> Feedback welcome, be it with performance tips, usability tips, or other.
>>
>> BR,
>> Ulf W
>>
>> Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
>> http://feuerlabs.com
>>
>>
>
> Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
> http://feuerlabs.com