[erlang-questions] Slow when using chained gen_server:call's, redesign or optimize?

Sun Jan 29 11:08:20 CET 2012

Hi,

What i meant was that you could maybe change your current design (if
it is not like this already) so that you
do not start to process another request from an external interface
until your system has the actual resources
to do so.

On Sun, Jan 29, 2012 at 9:38 AM, God Dang <goddang@REDACTED> wrote:
> Thank you all for taking time to help me out. Some clarification, I've come
> to the conclusion that gen_server(or more precisely my design) is to blame
> after running multiple sessions with fprof, eprof and alot of timer:now_diff
> and timer:tc. For example, if I put load on the system and time
> my handle_call({long_operation,Data},From,State) I see that the execution
> time begins rise from <10ms to around ~2000ms and stabilizing there under my
> load. If I remove the last gen_server and instead implement it as a simple
> function it stabilizes around ~1000ms and removing yet another one gives me
> yet lower execution times.
>
> A word on my design, a lot of the time the only reason we use a gen_server
> is for keeping a state and more precisely a connection. For instance a
> generic db backend which keeps the pid of the db connection, a cache backend
> which keeps the memcache connection pid, my own server that keep state. and
> within my handle_call({long_operation,Data},From,State) I could be doing a
> couple of gen_server:call's and within the database gen_server I could be
> doing additional gen_server call's to the cache backend. Why we've
> constructed it as separate gen_server's is also to be able to call it from
> different parts of the system like db:do_something() or cache:get().
>
> On paper this looked like a clean and nice design but under load it starts
> to behave poorly.
>
> To answer all in order:
> # Gianfranco Alongi 2012-01-28:
>> You could use the synchronized serialization to generate push-back
>> behaviour from your system,
>> so that you do not handle a new request before it's possible - maybe
>> you are already doing this, or not.
> I don' understand what this mean. Can you please clarify.
>
> # Matthew Evans 2012-01-28
>> The obvious, and maybe non-OTP, answer is to hold some of this state
>> information in a public or protected named ETS table
>> that your clients read from directly. A single gen_server can still own
>> and write to that ETS table.
> Sounds like a smart approach.
>
>> Another obvious answer is to provide back-pressure of some kind to prevent
>> clients from requesting data when it is under load.
> I don't understand this fully.
>
>
>> You might find that a particular infrequent  gen_server:call operation is
>> taking a long time to complete causing a message queue
>> to suddenly grow. You might want to change such an operation from:
> I've done that on the first most handle_call. That's what I meant
> with deferring gen_server responses. I saw some speedups on the first
> handle_call but doing this on short lived handle_call's did see slowdowns.
>
> # Jesper Louis Andersen 2012-01-28
>> This would be my first idea. Create an ETS table being protected. Writes
>> to the table goes through the gen_server,
> I like this approach, not as "beautiful" as a pure OTP-approach, but if does
> the trick. Hey.
>
> # Jachym Holecek 2012-01-29
>> You're probably treating asynchronous things as synchronous somewhere
>> along the path, inflicting collateral damage to concurrent users?
> it's basically a get_data() function where we need to do roundtrips to db,
> other stuff to be able to return the value. Can I do this in a asynchronous
> fashion? I tend to think of get's as handle_calls and sets as hanlde_cast.
>
> Big thank you all.
>
> /dang
>
>> Date: Sat, 28 Jan 2012 18:13:25 -0500
>> From: freza@REDACTED
>> To: jesper.louis.andersen@REDACTED
>> CC: erlang-questions@REDACTED
>> Subject: Re: [erlang-questions] Slow when using chained gen_server:call's,
>> redesign or optimize?
>
>>
>> Hi,
>>
>> [Replying to multiple replies at once, all quoted text reformatted for
>> readability (seems people these days can't be bothered!?).]
>>
>> [Warning: excessively nitpicky content ahead, here and there.]
>>
>> # God Dang 2012-01-28:
>> > I'm creating a system where I've ended up with alot of gen_servers that
>> > provides a clean interface. When I run this under load I see that the
>> > gen_server:call's is becoming a bottleneck.
>>
>> You're probably treating asynchronous things as synchronous somewhere
>> along
>> the path, inflicting collateral damage to concurrent users? Synchronous
>> high-level API is fine, but if you know some operation is expensive or
>> depends on response from the outside world you should record request
>> context in ETS (row looking something like {Req_id, Timeout, From[, ...]}
>> and process requests out-of-order OR offload the actual processing to
>> short lived worker processes like Matthew Evans says OR a combination
>> of both OR somesuch.
>>
>> My point being that gen_server:call/N, by itself, is *very* fast in
>> practice, so chances are you're doing something wrong elsewhere.
>>
>> Other (unlikely) thing: you're not sending very large data structures in
>> messages, are you? That could hurt, but there are ways to address that
>> too if needed.
>>
>> # Matthew Evans 2012-01-28:
>> > Another obvious answer is to provide back-pressure of some kind to
>> > prevent
>> > clients from requesting data when it is under load.
>>
>> On external interfaces (or for global resource usage of some sort): yes, a
>> fine idea (a clear "must have", actually!); but doing this internally
>> would
>> seem excessively defensive to me, unless further justification was given.
>>
>> > You might want to change such an operation from:
>> >
>> > handle_call({long_operation,Data},From,State) ->
>> > Rsp = do_lengthy_operation(Data),
>> > {reply, Rsp, State};
>> >
>> > to:
>> >
>> > handle_call({long_operation,Data},From,State) ->
>> > spawn(fun() -> Rsp = do_lengthy_operation(Data),
>> > gen_server:reply(Rsp,From) end),
>> > {noreply, State};
>>
>> 1. Why do people bother introducing "one-shot" variables for trivial
>> expressions they could have inlined? Means less context to maintain
>> when reading the code...
>>
>> 2. Surely you meant proc_lib:spawn_link/X there, didn't you? SASL logs
>> and fault propagation are the reason. While there are exceptions to
>> this, they're extremely rare.
>>
>> 3. The order of arguments to gen_server:reply/2 is wrong.
>>
>> Regarding the general approach: yes, a fine idea too. Depending on what
>> do_lengthy_operation/1 does putting these workers under supervisor might
>> be called for.
>>
>> # Jesper Louis Andersen 2012-01-28:
>> > This would be my first idea. Create an ETS table being protected.
>> > Writes to the table goes through the gen_server,
>>
>> Yes, a fine idea too -- ETS is one of the less obvious cornerstones
>> of Erlang programming (but don't tell "purity" fascists )... One
>> detail: almost all of my ETS tables are public even when many of
>> them are really treated as private or protected, reason is to keep
>> high degree of runtime tweakability just in case (this might be a
>> bit superstitious I admit).
>>
>> > -export([write/1, read/1]).
>> >
>> > write(Obj) ->
>> > call({write, Obj}).
>> >
>> > call(M) ->
>> > gen_server:call(?SERVER, M, infinity).
>>
>> 1. Abstracting trivial functionality such as call/1 above only
>> obfuscates code for precisely zero gain.
>>
>> 2. Same goes for typing "?SERVER" instead of the actual server
>> name. Using "?MODULE" is however alright, as long as it's
>> only referred to from current module (as it should).
>>
>> 3. No infinite timeouts without very good justification! You're
>> sacrificing a good default protective measure for no good
>> reason...
>>
>> > but reads happen in the calling process of the API and does not go
>> > through the gen_server at all,
>> >
>> > read(Key) ->
>> > case ets:lookup(?TAB, Key) of
>> > [] -> not_found;
>> > [_|_] = Objects -> {ok, Objects}
>> > end.
>>
>> (2) from above also applies to "?TAB" here. More to the point, it's
>> sometimes perfectly OK to do table writes directly from caller's
>> context too, like:
>>
>> write(Item) ->
>> true = ets:insert_new(actual_table_name, Item).
>>
>> It can of course be very tricky business and needs good thinking first.
>> I bet you're aware of this, mentioning it just because it's a handy
>> trick that doesn't seem to be widely known.
>>
>> > Creating the table with {read_concurrency, true} as the option will
>> > probably speed up reads by quite a bit as well. It is probably going
>> > to be a lot faster than having all caching reads going through that
>> > single point of contention. Chances are that just breaking parts of
>> > the chain is enough to improve the performance of the system.
>>
>> Well, yes, avoiding central points of contention (such as blessed
>> processes or, you guessed it, ETS tables) is certainly good engineering
>> practice, but see first part of this email for other considerations.
>>
>> BR,
>> -- Jachym
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>