[erlang-questions] ANN: gproc_pool + some performance tidbits

Ulf Wiger ulf@REDACTED
Wed Jun 5 07:47:32 CEST 2013


On 4 Jun 2013, at 21:30, pablo platt wrote:

> What's the use case for workers in the pool?
> Is it only for distributing a task or also for implementing a pool of DB connections like https://github.com/devinus/poolboy ?


I believe it *is* fairly similar to poolboy, but I thought it would be consistent with the gproc philosophy to have a pool concept in gproc, since:

- One of the things you need to do in a worker pool implementation is to keep track of the worker processes, and gproc is good at this

- A benefit of using gproc is that you can get some query/debugging/monitoring capabilities for free. For example, after setting up my test pool (gproc_pool:setup_test_pool/3), I can use the following stock gproc function:

2> gproc_pool:setup_test_pool(mypool,round_robin,[]).
add_worker(mypool, a) -> 1; Ws = [{a,1}]
add_worker(mypool, b) -> 2; Ws = [{a,1},{b,2}]
add_worker(mypool, c) -> 3; Ws = [{a,1},{b,2},{c,3}]
add_worker(mypool, d) -> 4; Ws = [{a,1},{b,2},{c,3},{d,4}]
add_worker(mypool, e) -> 5; Ws = [{a,1},{b,2},{c,3},{d,4},{e,5}]
add_worker(mypool, f) -> 6; Ws = [{a,1},{b,2},{c,3},{d,4},{e,5},{f,6}]
[true,true,true,true,true,true]
3> catch gproc:info(self()).
[{gproc,[{{n,l,[gproc_pool,mypool,1,a]},0},
         {{n,l,[gproc_pool,mypool,2,b]},0},
         {{n,l,[gproc_pool,mypool,3,c]},0},
         {{n,l,[gproc_pool,mypool,4,d]},0},
         {{n,l,[gproc_pool,mypool,5,e]},0},
         {{n,l,[gproc_pool,mypool,6,f]},0}]},
 {current_function,{erl_eval,do_apply,6}},
 {initial_call,{erlang,apply,2}},
 {status,running},
 {message_queue_len,0},
  …]

Thus, from the 'gproc footprint' of the process, I can readily tell that it's a worker in the pool 'mypool' (even if I'm not familiar with the gproc_pool concept, I can guess from convention that the first part of the name is a module name).
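The same footprint can also be read programmatically. Below is a minimal sketch. It assumes only the {n,l,[gproc_pool,Pool,Pos,Name]} key convention visible in the transcript above. The module footprint and its function pool_memberships/1 are hypothetical and not part of gproc. The function takes the proplist returned by gproc:info/1:

```erlang
-module(footprint).
-export([pool_memberships/1]).

%% Hypothetical helper, NOT part of gproc.
%% Info is the proplist returned by gproc:info(Pid), as in the transcript.
%% Registrations whose key follows the {n,l,[gproc_pool,Pool,Pos,Name]}
%% convention are returned as {Pool, Pos, Name}. Every other gproc entry is
%% skipped, because a list-comprehension generator drops elements that do
%% not match its pattern.
pool_memberships(Info) ->
    Entries = proplists:get_value(gproc, Info, []),
    [{Pool, Pos, Name}
     || {{n, l, [gproc_pool, Pool, Pos, Name]}, _Value} <- Entries].
```

Fed the proplist from the gproc:info(self()) call above, it would return {mypool,1,a} through {mypool,6,f}, because the shell process holds all six worker names.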

The whole idea of gproc was in fact to provide a single implementation of patterns that I saw appearing in many different places in our code, each time implemented separately. So in a sense, practically everything that gproc provides is stuff that people have implemented before, in reasonably similar ways. :)  Hopefully with gproc, some user code can become simpler, more debuggable and a bit more uniform.

> Why do workers have names?
> I know I can just give them names such as 0,1,2... but I'm trying to understand the rationale.

I thought it was a useful layer of abstraction.

The performance of the pool depends somewhat on how the workers are spread across the available slots (especially if the pool is half-full and hashing or random selection is used). The workers themselves only need to know which name to use when they connect to the pool; whoever manages the pool controls the position of each worker.
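To illustrate why placement matters, consider a hypothetical fallback policy. This is an assumption made for this sketch, not gproc_pool's documented behaviour: a request that hashes to an empty slot is served by the next occupied slot, wrapping around at the end. The module slot_spread is illustrative only.

```erlang
-module(slot_spread).
-export([owners/2, owner/3]).

%% Illustrative model, NOT gproc_pool code.
%% Workers is a list of {Name, Position}; Size is the number of slots.
%% ASSUMPTION: a request hashed to an empty slot is served by the next
%% occupied slot, probing forward and wrapping from Size back to 1.
owner(Slot, Workers, Size) ->
    probe(Slot, Workers, Size, Size).

%% Left counts how many slots remain to be examined.
probe(_Slot, _Workers, _Size, 0) ->
    none;                                   % no occupied slot at all
probe(Slot, Workers, Size, Left) ->
    case lists:keyfind(Slot, 2, Workers) of
        {Name, Slot} -> Name;
        false -> probe((Slot rem Size) + 1, Workers, Size, Left - 1)
    end.

%% The worker that ends up serving each slot 1..Size.
owners(Workers, Size) ->
    [owner(S, Workers, Size) || S <- lists:seq(1, Size)].
```

Take six slots and three workers. Clustering them at positions 1, 2 and 3 gives [a,b,c,a,a,a], so a serves half of all uniformly hashed requests. Spreading them over 1, 3 and 5 gives [a,b,b,c,c,a], two slots each. Under this policy, the manager's control over positions is what keeps the load even.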


> As always, I'm sure this functionality will be a major part of my server like everything else in gproc,
> even if I still don't know why ;)

Haha! This reminds me of the first design review meeting at Ericsson where gproc's predecessor sysproc was up for review. The chairman of the meeting said "I guess we'll approve it, even though I don't understand what it's for". :)

It was a good decision, I thought…

BR,
Ulf W


> 
> Thanks
> 
> On Tue, Jun 4, 2013 at 10:24 PM, Ulf Wiger <ulf@REDACTED> wrote:
> 
> On 4 Jun 2013, at 18:52, ANTHONY MOLINARO wrote:
> 
>> Hi Ulf,
>> 
>> Have you done any concurrent tests?  I only ask because I've seen our own pooling code (https://github.com/openx/gen_server_pool) have issues under load.  Now in our case
>> it's because of a single gen_server acting as a dispatch layer, which should not be the
>> case for gproc as IIRC it uses ets to provide for fast concurrent access (something also
>> done in a novel way by https://github.com/ferd/dispcount/ which I keep meaning to try
>> out), but I'd be curious to know if you've done any concurrent testing which shows that.
> 
> I hadn't, but did so now.
> 
> Spawning N clients, which run 1000 iterations each, on e.g. a round_robin pool:
> 
>    N  Avg usec/iteration
>    1                  37
>   10                 250
>  100                1630
> 1000               18813
> 
> Of course, this was a pretty nasty test, with all processes banging away at the pool as fast as they possibly could. If you want frequent mutex conflicts, that's probably as good a way as any to provoke them.
> 
> When I insert a random sleep (0-50 ms) between iterations, time each pick request and collect the averages, 100 concurrent clients pay on average 50 usec per selection. With 1000 concurrent clients, the average rises to 60 usec.
> 
> The corresponding average for the hash pool with 1000 concurrent clients is 20 usec.
> 
> (All on my Macbook Air)
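The shape of this test can be sketched as follows. This is a rough, self-contained stand-in, not the code behind the figures above. The "pick" is a hypothetical round-robin counter over six slots in a public ETS table, updated atomically with ets:update_counter/3. Each of the N clients times its own loop with timer:tc/1.

```erlang
-module(pick_bench).
-export([run/2]).

%% Stand-in for the concurrency test: N clients each perform Iters picks
%% as fast as they can. The pick is a round-robin ETS counter (an
%% ASSUMPTION for this sketch; it is not gproc_pool's code).
%% Returns the average microseconds per iteration across all clients.

-define(SIZE, 6).

run(N, Iters) ->
    Tab = ets:new(pool, [public, set, {write_concurrency, true}]),
    true = ets:insert(Tab, {rr, 0}),
    Parent = self(),
    Pids = [spawn_link(fun() -> Parent ! {self(), client(Tab, Iters)} end)
            || _ <- lists:seq(1, N)],
    %% Collect each client's total time, matching on its pid.
    Times = [receive {Pid, T} -> T end || Pid <- Pids],
    ets:delete(Tab),
    lists:sum(Times) / (N * Iters).

%% Time Iters consecutive picks in the calling (client) process.
client(Tab, Iters) ->
    {Usec, ok} = timer:tc(fun() -> loop(Tab, Iters) end),
    Usec.

loop(_Tab, 0) -> ok;
loop(Tab, K) ->
    %% Increment the shared counter, wrapping back to 1 after ?SIZE.
    _Slot = ets:update_counter(Tab, rr, {2, 1, ?SIZE, 1}),
    loop(Tab, K - 1).
```

For example, pick_bench:run(100, 1000) mirrors the N = 100 row. The absolute numbers will depend on the machine and on how closely the stand-in matches gproc_pool.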
> 
> 
>> I think the number of pool implementations in erlang has probably finally surpassed
>> the number of json parsers ;)
> 
> Well, that tends to happen with fun and reasonably well-bounded problems. ;)
> 
> BR,
> Ulf W
> 
>> 
>> -Anthony
>> 
>> On Jun 4, 2013, at 2:18 AM, Ulf Wiger <ulf@REDACTED> wrote:
>> 
>>> 
>>> I pushed a new gproc component called gproc_pool the other day.
>>> 
>>> The main idea, apart from wanting to see how well it would work, was that I wanted to be able to register servers with gproc and then have an efficient way of pooling between them. A benefit of using gproc throughout is that the registration objects serve as a 'footprint' for each process - by listing the gproc entities for each process, you can tell a lot about its purpose.
>>> 
>>> The way gproc_pool works is that:
>>> 1. You define a pool by naming it and optionally specifying its type and size
>>>    (gproc_pool:new(Pool) or gproc_pool:new(Pool, Type, Options))
>>> 2. You add worker names to the pool
>>>    (gproc_pool:add_worker(Pool, Name))
>>> 3. Your servers each connect to a given name
>>>    (gproc_pool:connect_worker(Pool, Name))
>>> 4. Users pick a worker for each request
>>>    (gproc_pool:pick(Pool))
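To make the four steps concrete, here is a toy, single-node model of the same lifecycle. The module toy_pool is hypothetical. Its function names mirror the gproc_pool calls listed above, but the bodies are a plain ETS sketch with no process monitoring, no worker removal and round-robin picking only. Its return values are not claimed to match gproc_pool's.

```erlang
-module(toy_pool).
-export([new/2, add_worker/2, connect_worker/2, pick/1]).

%% Toy model, NOT gproc_pool. One named public ETS table per pool holds
%% {size, Size}, {rr, Counter} and one {{slot, Pos}, Name, PidOrUndefined}
%% row per worker name. The table dies with the process that created it.

%% Step 1: define the pool.
new(Pool, Size) ->
    Pool = ets:new(Pool, [named_table, public, set]),
    true = ets:insert(Pool, [{size, Size}, {rr, 0}]),
    ok.

%% Step 2: reserve the lowest free position for Name; returns the position.
add_worker(Pool, Name) ->
    [{size, Size}] = ets:lookup(Pool, size),
    Taken = [P || {{slot, P}, _, _} <- ets:tab2list(Pool)],
    [Pos | _] = lists:seq(1, Size) -- Taken,   % badmatch if the pool is full
    true = ets:insert(Pool, {{slot, Pos}, Name, undefined}),
    Pos.

%% Step 3: the calling process attaches itself to the slot reserved for Name.
connect_worker(Pool, Name) ->
    [[Pos]] = ets:match(Pool, {{slot, '$1'}, Name, '_'}),
    ets:update_element(Pool, {slot, Pos}, {3, self()}).

%% Step 4: round-robin over positions; false if that slot has no process.
pick(Pool) ->
    [{size, Size}] = ets:lookup(Pool, size),
    Pos = ets:update_counter(Pool, rr, {2, 1, Size, 1}),
    case ets:lookup(Pool, {slot, Pos}) of
        [{_, Name, Pid}] when is_pid(Pid) -> {Name, Pid};
        _ -> false
    end.
```

Note how the worker only supplies its name in step 3, while step 2 decides its position.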
>>> 
>>> My little test code indicates that the different load-balancing strategies perform a bit differently:
>>> 
>>> (https://github.com/uwiger/gproc/blob/master/src/gproc_pool.erl#L843)
>>> 
>>> (Create a pool, add 6 workers and iterate 100k times, 
>>> incrementing a gproc counter for each iteration.)
>>> 
>>> 3> gproc_pool:test(100000,round_robin,[]).
>>> worker stats (848):
>>> [{a,16667},{b,16667},{c,16667},{d,16667},{e,16666},{f,16666}]
>>> {2801884,ok}
>>> 4> gproc_pool:test(100000,hash,[]).       
>>> worker stats (848):
>>> [{a,16744},{b,16716},{c,16548},{d,16594},{e,16749},{f,16649}]
>>> {1891517,ok}
>>> 5> gproc_pool:test(100000,random,[]).
>>> worker stats (848):
>>> [{a,16565},{b,16542},{c,16613},{d,16872},{e,16727},{f,16681}]
>>> {3701011,ok}
>>> 6> gproc_pool:test(100000,direct,[]).
>>> worker stats (848):
>>> [{a,16667},{b,16667},{c,16667},{d,16667},{e,16666},{f,16666}]
>>> {1766639,ok}
>>> 11> gproc_pool:test(100000,claim,[]).
>>> worker stats (848):
>>> [{a,100000},{b,0},{c,0},{d,0},{e,0},{f,0}]
>>> {7569425,ok}
>>> 
>>> 
>>> The worker stats show how evenly the workers were selected, and the {Time, ok} tuple comes from timer:tc/3.
>>> Time is in microseconds, so Time/100000 is the per-iteration cost:
>>> 
>>> round_robin: 28 usec (maintain a 'current' counter, modulo Size)
>>> hash:  19 usec (gproc_pool:pick(Pool, Val), hash on Val)
>>> random: 37 usec (pick a random worker, using crypto:rand_uniform/2)
>>> direct: 18 usec (gproc_pool:pick(Pool, N), where N modulo Size selects worker)
>>> claim: 76 usec (claim the first available worker, apply a fun, then release)
>>> 
>>> I think the per-selection cost is acceptable as-is, but could perhaps be improved (esp. the 'random' strategy is surprisingly expensive). All the selection work is done in the caller's process, BTW - no communication with the gproc or gproc_pool servers (except for admin tasks).
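The selection rules above can be sketched as pure index computations over slots 1..Size. The only shared state, the round-robin counter, sits in a public ETS table that the caller updates directly, without going through a server. Several details are assumptions of this sketch:

- the module and function names are hypothetical;
- the table layout is invented for the example;
- pick_direct/2 treats N as 1-based;
- rand:uniform/1 stands in for the crypto:rand_uniform/2 call mentioned in the post.

```erlang
-module(pick_strategies).
-export([init/0, pick_round_robin/2, pick_hash/2, pick_random/1,
         pick_direct/2]).

%% Sketch only, NOT gproc_pool's implementation. Every function returns a
%% slot number in 1..Size and runs entirely in the calling process.

init() ->
    Tab = ets:new(strategies, [public, set]),
    true = ets:insert(Tab, {rr, 0}),
    Tab.

%% 'current' counter: increment and wrap from Size back to 1 in one
%% atomic ETS operation.
pick_round_robin(Tab, Size) ->
    ets:update_counter(Tab, rr, {2, 1, Size, 1}).

%% The same Val always maps to the same slot.
pick_hash(Val, Size) ->
    erlang:phash2(Val, Size) + 1.

%% Uniformly random slot (rand:uniform/1 replaces crypto:rand_uniform/2).
pick_random(Size) ->
    rand:uniform(Size).

%% Caller supplies N (ASSUMED >= 1); N modulo Size selects the slot.
pick_direct(N, Size) ->
    ((N - 1) rem Size) + 1.
```

ets:update_counter/3 with a {Pos, Incr, Threshold, SetValue} operation increments and wraps in a single atomic step. Concurrent callers therefore never need a round-trip to a server to agree on the current position.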
>>> 
>>> The 'claim' strategy is also surprisingly expensive. I believe it's because I'm using gproc:select/3 to find the first free worker. Note also that it results in an extremely uneven distribution. That's obviously because the test run claims the first available worker and then releases it before the next iteration, so it's always going to select the first worker.
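The claim pattern can be sketched as follows: scan the slots in order, atomically claim the first free one, apply a fun, then release. In this hypothetical claim_sketch module, ets:insert_new/2 serves as the atomic test-and-set. That is a swapped-in technique, not the gproc:select/3 approach described above.

```erlang
-module(claim_sketch).
-export([new/1, claim/2]).

%% Sketch only, NOT gproc_pool. A slot is "claimed" while a
%% {{claimed, Slot}, Pid} row exists in the table.

new(Size) ->
    Tab = ets:new(claims, [public, set]),
    true = ets:insert(Tab, {size, Size}),
    Tab.

%% Returns {true, Fun(Slot)} for the first free slot, or false if all
%% slots are currently claimed.
claim(Tab, Fun) ->
    [{size, Size}] = ets:lookup(Tab, size),
    try_claim(Tab, Fun, 1, Size).

try_claim(_Tab, _Fun, Slot, Size) when Slot > Size ->
    false;                                   % every slot is busy
try_claim(Tab, Fun, Slot, Size) ->
    %% insert_new/2 fails if the key exists, so only one caller wins.
    case ets:insert_new(Tab, {{claimed, Slot}, self()}) of
        true ->
            try {true, Fun(Slot)}
            after
                ets:delete(Tab, {claimed, Slot})   % always release
            end;
        false ->
            try_claim(Tab, Fun, Slot + 1, Size)
    end.
```

Calling claim/2 repeatedly from one process always yields slot 1, which reproduces the skew in the stats above. Only a claim made while slot 1 is still held, whether concurrent or nested, moves on to slot 2.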
>>> 
>>> https://github.com/uwiger/gproc/blob/master/doc/gproc_pool.md
>>> 
>>> Feedback welcome, be it with performance tips, usability tips, or other.
>>> 
>>> BR,
>>> Ulf W
>>> 
>>> Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
>>> http://feuerlabs.com
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> erlang-questions mailing list
>>> erlang-questions@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-questions
>> 
> 
