[erlang-questions] Slow when using chained gen_server:call's, redesign or optimize?

Fri Feb 3 11:44:15 CET 2012

Hi List,

I'm a bit late replying to this thread, only reading it now. Probably too
late to help the original poster but here's my contribution anyway.

Matthew, hope you don't mind me suggesting a minor improvement to your code
snippet, just to demonstrate a larger point. Instead of:

handle_call({long_operation,Data},From,State) ->
    spawn(fun() ->
            Rsp = do_lengthy_operation(Data),
            gen_server:reply(From,Rsp)
          end),
    {noreply, State};

I prefer getting the calling process to do as much work as possible.
Here a process is spawned to do the work and then die, all while the
calling process waits. I think it's preferable to get the calling process
to do the work itself: it avoids the spawn, and any crashes in
do_lengthy_operation are arguably easier to debug. E.g.:

%% Interface function: runs in the calling process.
long_operation(Data) ->
    State = gen_server:call(?MODULE, get_state),
    do_lengthy_operation(Data,State).

%% The server only hands out its state.
handle_call(get_state,_From,State) ->
    {reply,State,State};
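A fuller, self-contained sketch of the same pattern might look like this. The module name `caller_does_work`, `start_link/1` and the body of `do_lengthy_operation/2` are hypothetical placeholders; only the gen_server behaviour and `gen_server:call/2` come from OTP. The server's only job is to hand out a snapshot of its state, so the expensive step never occupies its mailbox.

```erlang
%% Hypothetical module illustrating "let the caller do the work".
-module(caller_does_work).
-behaviour(gen_server).

-export([start_link/1, long_operation/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link(InitialState) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, InitialState, []).

%% Interface function: everything here runs in the calling process.
long_operation(Data) ->
    %% Short, serialised step: fetch a snapshot of the server's state.
    State = gen_server:call(?MODULE, get_state),
    %% Long step: runs concurrently with other callers; an exception here
    %% is raised in the caller, not in the server.
    do_lengthy_operation(Data, State).

%% Hypothetical placeholder for the real work.
do_lengthy_operation(Data, State) ->
    {Data, State}.

init(InitialState) ->
    {ok, InitialState}.

handle_call(get_state, _From, State) ->
    {reply, State, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

The trade-off is that the work runs against a snapshot. If it has to see, or atomically update, the very latest state, it is work that requires mutual exclusion and belongs back in handle_call.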

Normally when I'm writing a new gen_server I get the calling process to do
as much work as possible and try to get the handle_calls/casts to only do
the work that requires mutual exclusion. I normally store what I can in a
protected, named_table ETS table so that interface functions that need this
data can read it without going through the gen_server, but I do all updates
in a handle_call/cast.
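A minimal sketch of that ETS arrangement, assuming a simple key/value store (the module `ets_store` and the functions `lookup/1` and `insert/2` are hypothetical names): the gen_server creates and owns a `protected` `named_table`, reads call `ets:lookup/2` directly in the caller, and writes go through `gen_server:call/2` because only the owning process may write to a protected table.

```erlang
%% Hypothetical module: reads bypass the server, writes are serialised.
-module(ets_store).
-behaviour(gen_server).

-export([start_link/0, lookup/1, insert/2]).
-export([init/1, handle_call/3, handle_cast/2]).

-define(TAB, ets_store_tab).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Read path: runs in the caller and never touches the server's mailbox.
lookup(Key) ->
    case ets:lookup(?TAB, Key) of
        [{Key, Value}] -> {ok, Value};
        [] -> not_found
    end.

%% Write path: goes through the owner, the only process allowed to write.
insert(Key, Value) ->
    gen_server:call(?MODULE, {insert, Key, Value}).

init([]) ->
    %% protected: any process may read, only the owner may write.
    ?TAB = ets:new(?TAB, [named_table, protected, set,
                          {read_concurrency, true}]),
    {ok, no_state}.

handle_call({insert, Key, Value}, _From, State) ->
    true = ets:insert(?TAB, {Key, Value}),
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Note that the table is deleted when its owner dies, so a restarted server starts with an empty table unless it repopulates it.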

Once one of my gen_servers became a bottleneck in a system and I ended up
taking these steps to fix the problem. Since then I've adopted this as a
rule of thumb when writing new gen_servers, to avoid pointless bottlenecks
that I'd have to fix later on anyway.

I inherited some code with the same problem as the original poster, with
gen_server calls going five deep in one instance. Everything worked fine at
low load, but once the load crossed a threshold there was no recovery:
every management process spent most of its time waiting for other
management processes to reply while its own message queue built up.
Meanwhile the workers would time out, reconnect and add to the backlog.
The message queues just kept growing until the entire node fell over.

After making sure that the chained calls weren't required to keep the data
consistent, I moved as much work as possible to the calling process. This
way I kept each module's interface unchanged but avoided the case where
each process in the chain was tied up waiting for the last one to complete.
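A sketch of what flattening one link of such a chain could look like, using the caller -> cache server -> backend server shape from the original question. Everything here is hypothetical: `flat_cache`, `lookup/1`, and `backend_fetch/1`, which stands in for the next hop (e.g. a memcache client's interface function). The interface keeps its signature, but the lookup and the backend fetch run in the calling process, and the cache server only serialises writes, received asynchronously via `gen_server:cast/2`.

```erlang
%% Hypothetical cache that used to sit in the middle of a call chain.
-module(flat_cache).
-behaviour(gen_server).

-export([start_link/0, lookup/1]).
-export([init/1, handle_call/3, handle_cast/2]).

-define(TAB, flat_cache_tab).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Interface unchanged for callers; no gen_server:call on the read path.
lookup(Key) ->
    case ets:lookup(?TAB, Key) of
        [{Key, Value}] ->
            Value;
        [] ->
            Value = backend_fetch(Key),   % runs in the caller, not the server
            %% Asynchronous write-back: nobody waits on the cache server.
            gen_server:cast(?MODULE, {store, Key, Value}),
            Value
    end.

%% Hypothetical stand-in for the next server in the old chain.
backend_fetch(Key) ->
    {fetched, Key}.

init([]) ->
    ?TAB = ets:new(?TAB, [named_table, protected, set,
                          {read_concurrency, true}]),
    {ok, no_state}.

handle_call(_Request, _From, State) ->
    {reply, {error, unknown_call}, State}.

handle_cast({store, Key, Value}, State) ->
    true = ets:insert(?TAB, {Key, Value}),
    {noreply, State}.
```

Two callers that miss on the same key at the same moment will both fetch it. That duplication is only acceptable because, as described above, the chain was not needed to keep the data consistent.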

I think for newbies it's important to be aware of which process is doing
the work. I know when I started using Erlang it took a while to get my head
around the concurrency.

//TTom.

On Sat, Jan 28, 2012 at 3:18 PM, Matthew Evans <mattevans123@REDACTED> wrote:

> Of course you need to run a profiler such as fprof to see what's going
> on.
>
> Sounds like a classic head of line blocking problem. Many requests,
> possibly from processes on different schedulers/cores, all getting
> serialized on a single gen_server.
>
> The obvious, and maybe non-OTP, answer is to hold some of this state
> information in a public or protected named ETS table that your clients read
> from directly. A single gen_server can still own and write to that ETS
> table.
>
> Another obvious answer is to provide back-pressure of some kind to prevent
> clients from requesting data when the server is under load.
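One possible form of back-pressure, sketched as a hypothetical helper `bp_call:call/3`: before queuing a request, check the target's mailbox length with `erlang:process_info(Pid, message_queue_len)` and fail fast above a threshold. The threshold is an assumed tuning knob, not an OTP setting, and the check is approximate because the queue can grow between the check and the call.

```erlang
%% Hypothetical helper: refuse to add work to an already-long mailbox.
-module(bp_call).
-export([call/3]).

call(Name, Request, MaxQueue) ->
    %% whereis/1 only resolves locally registered names, so Pid is local
    %% (erlang:process_info/2 does not accept remote pids).
    case whereis(Name) of
        undefined ->
            {error, noproc};
        Pid ->
            case erlang:process_info(Pid, message_queue_len) of
                {message_queue_len, Len} when Len >= MaxQueue ->
                    {error, overloaded};
                {message_queue_len, _} ->
                    gen_server:call(Pid, Request);
                undefined ->
                    %% The process died between whereis/1 and the check.
                    {error, noproc}
            end
    end.
```

Failing fast gives the client a chance to back off or shed load instead of timing out, reconnecting and adding to the backlog.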
>
> You might find that a particular infrequent gen_server:call operation is
> taking a long time to complete, causing a message queue to suddenly grow.
> You might want to change such an operation from:
>
> handle_call({long_operation,Data},_From,State) ->
>     Rsp = do_lengthy_operation(Data),
>     {reply, Rsp, State};
>
> to:
>
> handle_call({long_operation,Data},From,State) ->
>     spawn(fun() ->
>             Rsp = do_lengthy_operation(Data),
>             gen_server:reply(From,Rsp)
>           end),
>     {noreply, State};
>
>
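A runnable sketch of the deferred-reply variant. The module name `deferred_reply` and the body of `do_lengthy_operation/1` are hypothetical; the OTP part is `gen_server:reply/2`, which takes the caller's `From` tag first and the reply second.

```erlang
%% Hypothetical module illustrating a deferred gen_server reply.
-module(deferred_reply).
-behaviour(gen_server).

-export([start_link/0, long_operation/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

long_operation(Data) ->
    gen_server:call(?MODULE, {long_operation, Data}).

init([]) ->
    {ok, no_state}.

handle_call({long_operation, Data}, From, State) ->
    %% Hand the slow work to a throwaway process and return immediately,
    %% so the server can keep serving other requests meanwhile.
    spawn(fun() ->
                  Rsp = do_lengthy_operation(Data),
                  gen_server:reply(From, Rsp)
          end),
    {noreply, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Hypothetical placeholder for the real work.
do_lengthy_operation(Data) ->
    timer:sleep(100),
    {done, Data}.
```

If the spawned process crashes before calling `gen_server:reply/2`, the caller never gets an answer and its `gen_server:call/2` exits with a timeout (5000 ms by default), which is one reason the crash can be harder to trace than when the caller does the work itself.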
> ------------------------------
> From: goddang@REDACTED
> To: erlang-questions@REDACTED
> Date: Sat, 28 Jan 2012 00:06:04 +0000
> Subject: [erlang-questions] Slow when using chained gen_server:call's,
> redesign or optimize?
>
>
> I'm creating a system where I've ended up with a lot of gen_servers that
> provide a clean interface. When I run this under load I see that the
> gen_server:calls are becoming a bottleneck.
> For instance, in a handle_request I might ask another gen_server to get
> me a cached object, then ask the database something, and so on,
> and in some cases I have
> my-gen_server->cache-gen_server->memcache-client-gen_server; as you can
> see, it stacks up to a lot of steps. I've tried to optimize by deferring
> gen_server responses and that has given a slight performance improvement,
> but not as drastic as if I, for instance, bypass one gen_server instance.
>
> Is there a better way to go about this or some smart optimization to do?
> And FYI, I use gen_server when I need to keep the state of a connection or
> something, so if the answer is to scrap or reduce the number of gen_servers
> I will need to keep those connections somewhere else.
>
> Thanks, Dang
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>