[erlang-questions] My frustration with Erlang

Fri Sep 12 15:52:07 CEST 2008

I sell a poker server written in Erlang. It's supposed to be super- 
robust and super-scalable. I'm about to move to the next level by  
adding the missing features, e.g. tournaments and a Flash client.

I appreciate everything that the Erlang/OTP team is doing but I thought I
would vent a few of my recent frustrations with Erlang. I'm in a good  
mood after spending a day with OCaml and I have calmed down. Still,  
prepare yourself for a long rant ahead!

My development workstation is a Mac Pro with 2x2.8GHz quad-core Xeons,
12GB of memory, one 250GB drive and two 500GB drives, all 7200RPM SATA.
I use R12B3, SMP and kernel poll, i.e.

Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [smp:8] [async-threads:0] [kernel-poll:true]

My overwhelming frustration is the opacity of a running Erlang system.  
There are no decent tools for peering inside. No usable ones whatsoever!

With any other language you can profile, make changes, evaluate  
performance and make a judgement but not with Erlang.

I first wrote OpenPoker using OTP everywhere. My players, games, pots,  
limits, hands, decks, etc. were all gen_server processes. I used  
Mnesia transactions everywhere and I used them often.

Then I discovered that I cannot scale past 2-3k concurrent players  
under heavy use.

I have a test harness that launches bots which connect to the server  
and play by the script. The bots don't wait before replying to bet  
requests and so launching a few thousand bots heavily loads the server.

I don't want just a few thousand concurrent bots, though! I want at  
least 10k on a single VM and hundreds of thousands on a cluster, so I
set out to optimize my poker server.

The Erlang Efficiency Guide recommends fprof as the tool. I ran fprof
on my test harness and discovered that the result set cannot be
processed in my 12GB of memory. I only found this out after leaving
fprof running for a couple of days: the fprof data files were
approaching 100GB and my machine had become unusable due to heavy
swapping.

fprof uses ets tables to analyze the trace results and ets tables
must fit in memory.
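
For reference, a bounded fprof run might look like the sketch below. The
module name profile_sketch and the two file names are made up for
illustration; the workload is whatever zero-argument fun the caller
passes in. Keeping the traced call short keeps the trace file, and the
ets tables built from it, small.

```erlang
-module(profile_sketch).
-export([run/1]).

%% Profile one short call to Fun and write the analysis to a text file.
%% File names are hypothetical; pick whatever suits your setup.
run(Fun) when is_function(Fun, 0) ->
    TraceFile = "fprof_sketch.trace",
    %% Trace only the work done while Fun runs.
    fprof:apply(Fun, [], [{file, TraceFile}]),
    %% Turn the trace file into profile data (held in ets tables,
    %% so it has to fit in memory).
    fprof:profile([{file, TraceFile}]),
    %% Write the analysis, including totals, sorted by own time.
    fprof:analyse([{dest, "fprof_sketch.analysis"}, totals, {sort, own}]),
    fprof:stop(),
    ok.
```

Something like profile_sketch:run(fun() -> my_test:one_game() end),
where my_test:one_game/0 stands in for a single scripted game, would
then leave the analysis in fprof_sketch.analysis.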

I shortened my test run and was able to see the output of the fprof  
trace analysis. To say that it's dense would be an understatement! I  
realize that dumping out tuples is easy but aren't computers supposed
to help us humans?

The final output from fprof is still too raw for me to analyze.  
There's absolutely, positively, definitely no way to get a picture of  
a running system by reading through it. I understand that I can infer  
from the analysis that certain functions take a lot of time but what  
if there are none?

The bulk of the time in my system was taken by various OTP functions  
and processes, Mnesia and unknown functions. All I could infer from it  
is that perhaps I have too many processes.

Another thing that I inferred is that the normal method of writing  
gen_server code doesn't work for profiling.

I had to rewrite the gen_server clauses to immediately dispatch to  
functions, e.g.

handle_cast('LOGOUT', Data) ->
     handle_cast_logout(Data);

handle_cast('DISCONNECT', Data) ->
     handle_cast_disconnect(Data);

otherwise all the clauses of a gen_server are squashed together,  
regardless of the message pattern. I don't know if there's a better  
way to tackle this.

Next, I rewrote most of my gen_servers as data structures, e.g. pot,  
limit, deck, etc. A deck of cards can take a message to draw a card  
but the message can just as well be a function call. The deck  
structure will need to be modified regardless and the tuple will be  
duplicated anyway. There didn't seem to be any advantage in using a  
process here, much less a gen_server.
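
A minimal sketch of a deck as a plain data structure, assuming a deck is
just a list of {Face, Suit} tuples (the module name deck_sketch and this
card representation are illustrative, not the OpenPoker code). Drawing a
card is a function call that returns the card and the new deck, with no
message passing involved.

```erlang
-module(deck_sketch).
-export([new/0, draw/1]).

%% Build an unshuffled 52-card deck. Faces 11..14 stand for J, Q, K, A.
new() ->
    [{Face, Suit} || Suit <- [clubs, diamonds, hearts, spades],
                     Face <- lists:seq(2, 14)].

%% Take the top card. The caller keeps the returned deck as its new state.
draw([]) ->
    empty;
draw([Card | Rest]) ->
    {Card, Rest}.
```

The game process would hold the deck in its own state and thread the
returned deck through each draw.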

Next I carefully went through my Mnesia schema and split some tables
into smaller tables. I made sure that only the absolutely necessary  
tables were disk-based. I wish I could run without updating Mnesia  
tables during a game but this is impossible since player balances and  
status need to be updated when players join or leave a game, as well  
as when a game finishes.

All my hard work paid off and I was able to get close to 10K players,  
with kernel poll enabled, of course. Then I ran out of ETS tables.

I don't create ETS tables on the fly but, apparently, Mnesia does. For  
every transaction!!!

This prompted me to go through the server again and use dirty_read and
dirty_write wherever possible. I also placed balances in two separate
"counter" tables, integers to be divided by 10000 to get 4 decimal
places of precision. This is so that I could use dirty_update_counter
instead of a regular read, bump, write pattern.
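
The counter approach can be sketched roughly as follows. This assumes a
single RAM-only table named balance keyed by player, which is a
simplification for illustration; the module, table and function names
are hypothetical.

```erlang
-module(balance_sketch).
-export([init/0, to_units/1, to_amount/1, credit/2]).

%% A counter table needs exactly a key and an integer value.
-record(balance, {player, units = 0}).

%% Amounts are stored as integers scaled by 10000 (4 decimal places).
-define(SCALE, 10000).

%% Start Mnesia and create the (RAM-only) counter table if needed.
init() ->
    ok = mnesia:start(),
    case mnesia:create_table(balance,
                             [{attributes, record_info(fields, balance)}]) of
        {atomic, ok} -> ok;
        {aborted, {already_exists, balance}} -> ok
    end.

to_units(Amount) -> round(Amount * ?SCALE).

to_amount(Units) -> Units / ?SCALE.

%% Bump the balance in one step, outside any transaction.
%% Returns the new balance in scaled units.
credit(Player, Amount) ->
    mnesia:dirty_update_counter(balance, Player, to_units(Amount)).
```

One thing to keep in mind with this design: Mnesia counters never drop
below zero, so a debit larger than the balance has to be checked for
separately.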

My frustration kept increasing but I gained more concurrent players. I  
can now safely run up to 8K bots before timeouts start to appear.

These are gen_server call timeouts when requests for game information  
take longer than the default 5 seconds. I have an average of 5 players  
per game so this is not because a large number of processes are trying  
to access the game.
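
For context, the 5 seconds is only the default of gen_server:call/2.
A caller can pass its own timeout and handle the exit, roughly as in
this sketch. The module name call_sketch and the info request are
placeholders, not the real OpenPoker protocol.

```erlang
-module(call_sketch).
-export([game_info/2]).

%% Ask a game process for its state, waiting at most Timeout milliseconds.
%% The request term 'info' is a placeholder for the real message.
game_info(Game, Timeout) ->
    try
        {ok, gen_server:call(Game, info, Timeout)}
    catch
        %% The server did not answer in time.
        exit:{timeout, _} -> {error, timeout};
        %% There is no such process (any more).
        exit:{noproc, _} -> {error, noproc}
    end.
```

Raising the timeout only hides the symptom, of course, and depending on
the OTP release a late reply may still land in the caller's mailbox
after a caught timeout.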

I suppose this is a reflection of the load on the system, although CPU  
usage never goes past 300% which tells me that no more than 3 cores  
are used by Erlang.

The straw that broke my back was when stopping a bot's matching player  
gen_server by returning {stop, ... } started causing my observer  
process to receive tcp_close and exit. I could repeat this like  
clockwork. Only spawning a separate process to send the player a stop
message would fix this.

Then I changed the way I represent cards and started seeing this behavior
again, in just one of my tests. What do cards have to do with  
tcp_close? I don't know and dbg tracer is my best friend! What I know  
is what git tells me and git says cards were the only difference.

Anyway, I don't think I have fully recovered yet. I may need a weekend  
just to regain my sanity. I will try to spread the load among several  
VMs but my hunch is that my distributed 100k players target is far, far
away. I may have to keep flying blind, with only traces and
printouts to my rescue.

	Thanks for listening, Joel

--
wagerlabs.com