[erlang-questions] ANN: gproc_pool + some performance tidbits
pablo platt
Tue Jun 4 21:30:47 CEST 2013
What's the use case for workers in the pool?
Is it only for distributing a task or also for implementing a pool of DB
connections like https://github.com/devinus/poolboy ?
Why do workers have names?
I know I can just give them names such as 0,1,2... but I'm trying to
understand the rationale.
As always, I'm sure this functionality will be a major part in my server
like everything else in gproc,
even if I still don't know why ;)
On Tue, Jun 4, 2013 at 10:24 PM, Ulf Wiger <ulf@REDACTED> wrote:
> On 4 Jun 2013, at 18:52, ANTHONY MOLINARO wrote:
> Hi Ulf,
> Have you done any concurrent tests? I only ask because I've seen our own
> pooling code (https://github.com/openx/gen_server_pool) have issues under
> load. Now in our case it's because of a single gen_server acting as a
> dispatch layer, which should not be the case for gproc as IIRC it uses ets
> to provide for fast concurrent access (something also done in a novel way
> by https://github.com/ferd/dispcount/ which I keep meaning to try out), but
> I'd be curious to know if you've done any concurrent testing which shows
> that.
> I hadn't, but did so now.
> Spawning N clients, which run 1000 iterations each, on e.g. a round_robin
> pool:
>    N    Avg usec/iteration
>    1        37
>   10       250
>  100      1630
> 1000     18813
> Of course, this was a pretty nasty test, with all processes banging away
> at the pool as fast as they possibly could. If you want frequent mutex
> conflicts, that's probably as good a way as any to provoke them.
> When I insert a random sleep (0-50 ms) between each iteration, time each
> pick request and collect the averages, 100 concurrent workers pay on
> average 50 usec per selection. For 1000 concurrent workers, the average
> rises to 60 usec.
> The corresponding average for the hash pool and 1000 concurrent workers is
> 20 usec.
> (All on my Macbook Air)
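A measurement of the kind described above (N concurrent clients, an optional random sleep between iterations, each call timed individually) could be sketched roughly as follows. This is not the test code from the post: the module name bench_sketch, the function run/4 and its parameters are assumptions made for illustration, and the operation to time is passed in as a fun, for example fun() -> gproc_pool:pick(Pool) end.

```erlang
-module(bench_sketch).
-export([run/4]).

%% Hypothetical helper, not part of gproc.
%% Spawn N clients; each calls Fun() Iters times, sleeping a random
%% 0..MaxSleepMs milliseconds before every call and timing only the call.
%% Returns the mean cost per call in microseconds across all clients.
run(N, Iters, MaxSleepMs, Fun) when N > 0, Iters > 0, MaxSleepMs >= 0 ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() ->
                               Parent ! {Ref, client(Iters, MaxSleepMs, Fun, 0)}
                           end),
                Ref
            end || _ <- lists:seq(1, N)],
    %% Collect the accumulated time from every client.
    Total = lists:sum([receive {Ref, T} -> T end || Ref <- Refs]),
    Total / (N * Iters).

%% Accumulate the time spent inside Fun over Iters calls.
client(0, _MaxSleepMs, _Fun, Acc) ->
    Acc;
client(I, MaxSleepMs, Fun, Acc) ->
    timer:sleep(rand:uniform(MaxSleepMs + 1) - 1),
    {T, _Result} = timer:tc(Fun),
    client(I - 1, MaxSleepMs, Fun, Acc + T).
```

Calling bench_sketch:run(1000, 1000, 50, fun() -> gproc_pool:pick(Pool) end) would approximate the second measurement; passing 0 as MaxSleepMs removes the sleep and makes all clients contend as hard as they can.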
> I think the number of pool implementations in erlang has probably finally
> surpassed the number of json parsers ;)
> Well, that tends to happen with fun and reasonably well-bounded problems. ;)
> BR,
> Ulf W
> -Anthony
> On Jun 4, 2013, at 2:18 AM, Ulf Wiger <ulf@REDACTED> wrote:
> I pushed a new gproc component called gproc_pool the other day.
> The main idea, apart from wanting to see how well it would work, was that
> I wanted to be able to register servers with gproc and then have an
> efficient way of pooling between them. A benefit of using gproc throughout
> is that the registration objects serve as a 'footprint' for each process -
> by listing the gproc entities for each process, you can tell a lot about
> its purpose.
> The way gproc_pool works is that:
> 1. You define a pool, by naming it, and optionally specifying its size
> (gproc_pool:new(Pool) | gproc_pool:new(Pool, Type, Options))
> 2. You add worker names to the pool
> (gproc_pool:add_worker(Pool, Name))
> 3. Your servers each connect to a given name
> (gproc_pool:connect_worker(Pool, Name))
> 4. Users pick a worker for each request (gproc_pool:pick(Pool))
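The four steps above might look like this in code. It is a minimal sketch rather than code from the post: the module name pool_sketch, the pool name my_pool, the worker names and the use of bare spawned processes in place of real servers are all assumptions for illustration. It also assumes the gproc application is already running.

```erlang
-module(pool_sketch).
-export([setup/0, worker_loop/0]).

%% Hypothetical pool and worker names, chosen only for this example.
-define(POOL, my_pool).
-define(NAMES, [a, b, c]).

setup() ->
    %% 1. Define the pool (round_robin type, default options).
    gproc_pool:new(?POOL, round_robin, []),
    %% 2. Add the worker names to the pool.
    [gproc_pool:add_worker(?POOL, Name) || Name <- ?NAMES],
    %% 3. Each worker process connects itself to one of the names.
    Parent = self(),
    Pids = [spawn(fun() ->
                      gproc_pool:connect_worker(?POOL, Name),
                      Parent ! {connected, self()},
                      worker_loop()
                  end) || Name <- ?NAMES],
    %% Wait until every worker has connected before picking.
    [receive {connected, Pid} -> ok end || Pid <- Pids],
    %% 4. A user picks a worker for a request.
    gproc_pool:pick(?POOL).

%% Stand-in for a real server: idle until told to stop.
worker_loop() ->
    receive
        stop -> ok;
        _Other -> worker_loop()
    end.
```

As far as I understand the API, pick/1 returns the gproc name of the selected worker (or false if none is available), which can then be resolved with gproc:where/1 or used directly as a {via, gproc, Name} address.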
> My little test code indicates that the different load-balancing strategies
> perform a bit differently:
> (https://github.com/uwiger/gproc/blob/master/src/gproc_pool.erl#L843)
> (Create a pool, add 6 workers and iterate 100k times,
> incrementing a gproc counter for each iteration.)
> 3> gproc_pool:test(100000,round_robin,[]).
> worker stats (848):
> [{a,16667},{b,16667},{c,16667},{d,16667},{e,16666},{f,16666}]
> {2801884,ok}
> 4> gproc_pool:test(100000,hash,[]).
> worker stats (848):
> [{a,16744},{b,16716},{c,16548},{d,16594},{e,16749},{f,16649}]
> {1891517,ok}
> 5> gproc_pool:test(100000,random,[]).
> worker stats (848):
> [{a,16565},{b,16542},{c,16613},{d,16872},{e,16727},{f,16681}]
> {3701011,ok}
> 6> gproc_pool:test(100000,direct,[]).
> worker stats (848):
> [{a,16667},{b,16667},{c,16667},{d,16667},{e,16666},{f,16666}]
> {1766639,ok}
> 11> gproc_pool:test(100000,claim,[]).
> worker stats (848):
> [{a,100000},{b,0},{c,0},{d,0},{e,0},{f,0}]
> {7569425,ok}
> The worker stats show how evenly the workers were selected,
> and the {Time, ok} comes from timer:tc/3, i.e. Time/100000 is the
> per-iteration cost:
> round_robin: 28 usec (maintain a 'current' counter, modulo Size)
> hash:        19 usec (gproc_pool:pick(Pool, Val), hash on Val)
> random:      37 usec (pick a random worker, using crypto:rand_uniform/2)
> direct:      18 usec (gproc_pool:pick(Pool, N), where N modulo Size selects worker)
> claim:       76 usec (claim the first available worker, apply a fun, then release)
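The per-iteration figures follow directly from the {Time, ok} tuples in the shell output, since timer:tc reports microseconds. A small sketch of the arithmetic, using the totals quoted above (the module name cost_sketch is made up for this example):

```erlang
-module(cost_sketch).
-export([per_iteration/0]).

%% Divide each total (microseconds, copied from the shell output above)
%% by the iteration count and round to the nearest microsecond.
per_iteration() ->
    Iters = 100000,
    Totals = [{round_robin, 2801884},
              {hash,        1891517},
              {random,      3701011},
              {direct,      1766639},
              {claim,       7569425}],
    [{Strategy, round(Time / Iters)} || {Strategy, Time} <- Totals].
%% Returns [{round_robin,28},{hash,19},{random,37},{direct,18},{claim,76}]
```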
> I think the per-selection cost is acceptable as-is, but could perhaps be
> improved (esp. the 'random' strategy is surprisingly expensive). All the
> selection work is done in the caller's process, BTW - no communication with
> the gproc or gproc_pool servers (except for admin tasks).
> The 'claim' strategy is also surprisingly expensive. I believe it's
> because I'm using gproc:select/3 to find the first free worker. (Note also
> that it results in an extremely uneven distribution. That's obviously
> because the test run claims the first available worker and then releases it
> before iterating - it's always going to select the first worker.)
> https://github.com/uwiger/gproc/blob/master/doc/gproc_pool.md
> Feedback welcome, be it with performance tips, usability tips, or other.
> BR,
> Ulf W
> Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
> http://feuerlabs.com
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions