[erlang-questions] gen_server bottleneck

Max Bourinov bourinov@REDACTED
Fri Dec 14 21:12:26 CET 2012


Hi guys,

Saravanan, maybe I am missing the point, but why not create 10,000 client
and server processes? In the code you provided I don't see much process
creation...

The described load is relatively small, so you should not see any performance
issues with this amount of UDP traffic.

P.S.: I am not familiar with discrete event simulation, so maybe my view
is completely wrong. :)

Best regards,
Max




On Fri, Dec 14, 2012 at 11:21 PM, Saravanan Vijayakumaran
<sarva.v@REDACTED> wrote:

> Dear Garrett, Daniel and Olivier,
> Thanks for your responses. I will try your suggestions.
>
> @Olivier: Thanks for sending the Sim-Diasca sources and docs earlier. The
> ns-3 simulator, which is the basis for NSIME, is discrete event. Since
> Sim-Diasca is discrete time, I put it on the back burner and haven't had a
> chance to check it out.
>
> @All: Let me elaborate a little on the bottleneck I am seeing.
>
> The simulation scenario consists of 10,000 UDP echo client-server pairs,
> with a pairwise point-to-point connection between them. Each UDP echo
> client sends 10 packets to a unique UDP echo server with an inter-packet
> time of 1s. Each UDP echo server replies to every packet it receives. In
> terms of the simulation, the 10k UDP echo clients each schedule a packet
> send event using a gen_server:call on nsime_simulator. When the simulation
> is run, the send events are executed, resulting in receive events being
> scheduled at the UDP echo servers. The receive events are executed,
> resulting in reply events being scheduled. Every event that is scheduled
> is a gen_server:call on the nsime_simulator process.
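>
> For concreteness, scheduling an event looks roughly like the following.
> This is a simplified sketch rather than the actual NSIME code: the record
> fields match the ones used in parallel_run/0 below, but the time field and
> the schedule_event/4 wrapper are only illustrative.
>
> -record(nsime_event, {time, module, function, arguments}).
>
> schedule_event(Time, Module, Function, Arguments) ->
>     Event = #nsime_event{
>         time = Time,
>         module = Module,
>         function = Function,
>         arguments = Arguments
>     },
>     %% Every scheduled event is a synchronous call into the single
>     %% nsime_simulator process, which inserts it into its event queue.
>     gen_server:call(nsime_simulator, {schedule, Event}).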
>
> The parallelism I am trying to exploit right now is in the form of
> simultaneous events (events having the same timestamp). All 10k send
> events are simultaneous; the 10k receive events are simultaneous, since
> the point-to-point connection between each pair has the same delay and
> data rate; and the 10k reply events are simultaneous.
>
> The parallel execution of the simultaneous events is done using the
> following code fragment from src/nsime_simulator.erl. The gen_server:call
> returns a list of simultaneous events, and Stephen Marsh's plists module
> (https://github.com/rmies/plists) is used to execute each event (an MFA
> triple) in the list by dividing the list among 5 processes. This was for a
> quad-core machine; the plists code suggests using number of cores + 1.
> Each time an event is executed it might result in another event being
> scheduled, which is itself a gen_server:call to nsime_simulator.
>
> parallel_run() ->
>     %% Fetch the next batch of simultaneous events from the simulator.
>     case gen_server:call(?MODULE, parallel_run) of
>         {events, EventList} ->
>             %% Execute the events in parallel, splitting the list
>             %% among 5 worker processes.
>             plists:foreach(
>                 fun(Event) ->
>                     erlang:apply(
>                         Event#nsime_event.module,
>                         Event#nsime_event.function,
>                         Event#nsime_event.arguments
>                     )
>                 end,
>                 EventList,
>                 {processes, 5}
>             ),
>             ?MODULE:parallel_run();
>         none ->
>             simulation_complete
>     end.
>
>
> On a quad-core machine, the ns-3 C++ code for this example runs in 59
> seconds while occupying only one core. The NSIME code runs in 65 seconds
> while occupying about 300% out of a maximum of 400%. That doesn't sound too
> bad; another non-trivial C++ vs Erlang benchmark, one could argue. But the
> NSIME code which does not execute the simultaneous events in parallel runs
> in 79 seconds while occupying only one core. So going from one core to
> roughly three gives only a 79/65 ~ 1.2x speedup, i.e. I am seeing sublinear
> speedups.
>
> The even more surprising thing is that on a 32-core machine the
> parallel_run runs in 68 seconds when the simultaneous event list is divided
> among 32 processes. Since the 32-core and quad-core machines had different
> CPU speeds, I ran the simulation on the 32-core machine with the
> simultaneous event list divided among 5 processes, and that took 66
> seconds. So I actually saw a slowdown when I used more cores. When I
> divided the event list among 32 processes, only 30-40% of each core was
> being utilized (as seen in htop).
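>
> To double-check those utilization numbers from inside the VM rather than
> from htop, I might try sampling scheduler wall time around a run. This is
> an untested sketch; it assumes R15B01 or later (where scheduler_wall_time
> is available), and scheduler_utilization/1 is just a name I made up:
>
> scheduler_utilization(Fun) ->
>     erlang:system_flag(scheduler_wall_time, true),
>     Before = lists:sort(erlang:statistics(scheduler_wall_time)),
>     Fun(),
>     After = lists:sort(erlang:statistics(scheduler_wall_time)),
>     %% Fraction of time each scheduler was busy while Fun() ran.
>     [{Id, (A1 - A0) / (T1 - T0)}
>      || {{Id, A0, T0}, {Id, A1, T1}} <- lists:zip(Before, After)].
>
> Calling something like
> scheduler_utilization(fun nsime_simulator:parallel_run/0) should show how
> busy each scheduler really is during a run.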
>
> I hope my description makes some sense. My gut feeling is that the single
> nsime_simulator process is the bottleneck. I will try to confirm it using
> the Erlang profiling tools. I was hoping pg or pg2 could provide a solution
> by distributing the message handling workload. I could schedule events on a
> randomly chosen process in the process group and then collect the earliest
> events from all the process group members. But to choose a random process I
> may have to use pg2:get_closest_pid, which itself may become a bottleneck.
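>
> Something along these lines is what I have in mind. It is an untested
> sketch: the group name, the {schedule, Event} and earliest_events requests,
> and the function names are all made up for illustration, and the group is
> assumed to have been set up with pg2:create/1 and pg2:join/2.
>
> -define(SCHEDULER_GROUP, nsime_schedulers).
>
> %% Schedule an event on a randomly chosen group member instead of
> %% going through pg2:get_closest_pid/1.
> schedule(Event) ->
>     Members = pg2:get_members(?SCHEDULER_GROUP),
>     Pid = lists:nth(random:uniform(length(Members)), Members),
>     gen_server:call(Pid, {schedule, Event}).
>
> %% Collect the earliest pending events across all group members.
> %% Each member is expected to reply with {Time, Events} or none.
> next_simultaneous_events() ->
>     Replies = [gen_server:call(Pid, earliest_events)
>                || Pid <- pg2:get_members(?SCHEDULER_GROUP)],
>     case [R || R <- Replies, R =/= none] of
>         [] ->
>             none;
>         TimedEvents ->
>             MinTime = lists:min([T || {T, _} <- TimedEvents]),
>             {MinTime, lists:append([Es || {T, Es} <- TimedEvents,
>                                           T =:= MinTime])}
>     end.
>
> The point is that the gen_server:call traffic gets spread across the group
> instead of serializing on the single nsime_simulator process.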
>
> sarva
>
>
>
>
> On Fri, Dec 14, 2012 at 9:35 PM, Daniel Luna <daniel@REDACTED> wrote:
>
>> On 14 December 2012 09:51, Garrett Smith <g@REDACTED> wrote:
>> > On Fri, Dec 14, 2012 at 8:47 AM, Garrett Smith <g@REDACTED> wrote:
>> >> Hi Saravanan,
>> >> If you're bottlenecking on CPU (all your cores are fully utilized at
>> >> peak load), then you either need a faster machine or you'll need to
>> >> distribute your application across multiple machines.
>> >
>> > I should add there are a number of ways you can improve efficiency, short
>> > of adding hardware resources. The big win, once you understand what to
>> > target, is C ports (or NIFs).
>>
>> But long before you start looking at either NIFs or new hardware, look
>> into the complexity of the code itself. Does that expensive function
>> even need to be called in all cases, etc.?
>>
>> I guess this is my pet peeve when it comes to optimizing anything.
>>
>> First you optimize for readability (often gaining speed or at least
>> finding issues)
>> Then you measure
>> Then you optimize the hotspots you've discovered by measuring
>> *Then* you can start looking into hardware or non-Erlang solutions
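>>
>> For the measuring step, fprof is one reasonable starting point. A rough,
>> untested sketch (using the parallel_run/0 from your post as the thing
>> being profiled; substitute whatever you want to measure):
>>
>> profile_run() ->
>>     %% Run the target function under the fprof tracer,
>>     fprof:apply(nsime_simulator, parallel_run, []),
>>     %% turn the trace into profile data,
>>     fprof:profile(),
>>     %% and write a per-function analysis to a text file.
>>     fprof:analyse([{dest, "fprof.analysis"}]).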
>>
>> I've also seen situations where minor changes in the requirements made it
>> possible to speed up the code by a factor of 10, so that's also worth
>> considering.
>>
>> Cheers,
>>
>> Daniel
>>
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>

