[erlang-questions] Erlang doesn't suck

Wed Oct 1 17:57:50 CEST 2008

Not once the pain goes away and only the results stick around.

I spent a grueling week and more refactoring OpenPoker and wishing for
static typing. I started a number of git branches, only to throw them
away a day or two later. Many were the times when I wanted to tear my
hair out and throw Erlang off my terrace and into the Atlantic Ocean.

I do have an extensive test harness, so the best approach proved to be
committing reasonably small changes as soon as all tests passed.

The end result is that OpenPoker scales MUCH better now, even on a  
single VM!

My workstation is a Mac Pro with 2 x 2.8 GHz quad-core Xeons and 14 GB of RAM.

A while ago I deliberated between Amazon EC2 and Joyent and decided to
host with the latter. For testing purposes, Joyent provided me with an
8-core x86 Solaris Nevada system with a 2 GiB memory cap that can
burst to 95% CPU utilization.

A friendly member of this forum gave me access to a Solaris/Niagara
system, but I haven't had a chance to test on it yet, so I'll only
compare my Mac and Joyent.

Here are the test results, with statistics collected every 30 seconds.
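For readers wondering where the INFO REPORT blocks below come from, here is a minimal sketch of a 30-second stats collector. It is not OpenPoker's code: the module `stats_sketch`, its messages and `count_request/2` are hypothetical, and dividing by the measured interval rather than a flat 30 seconds is an assumption. `error_logger:info_report/1` prints a list of `{Key, Value}` pairs as `Key: Value` lines under an `=INFO REPORT====` header, which is the shape of the reports shown.

```erlang
-module(stats_sketch).
-export([start/0, count_request/2, rates/3]).

-define(INTERVAL_MS, 30000).

%% Hypothetical collector: clients report each request and its size,
%% and every 30 seconds the totals are logged and reset.
start() ->
    spawn(fun() ->
                  erlang:send_after(?INTERVAL_MS, self(), report),
                  loop(0, 0, now_ms())
          end).

count_request(Collector, Bytes) ->
    Collector ! {request, Bytes},
    ok.

%% Per-second rates over the measured interval, truncated to integers.
rates(Requests, Bytes, ElapsedMs) when ElapsedMs > 0 ->
    {Requests * 1000 div ElapsedMs, Bytes * 1000 div ElapsedMs}.

loop(Requests, Bytes, Since) ->
    receive
        {request, N} ->
            loop(Requests + 1, Bytes + N, Since);
        report ->
            Now = now_ms(),
            {Rps, Bps} = rates(Requests, Bytes, max(Now - Since, 1)),
            error_logger:info_report([{requests, Requests},
                                      {bytes, Bytes},
                                      {requests_per_second, Rps},
                                      {bytes_per_second, Bps}]),
            erlang:send_after(?INTERVAL_MS, self(), report),
            %% Counters start again from zero for the next interval.
            loop(0, 0, Now)
    end.

now_ms() ->
    erlang:monotonic_time(millisecond).
```

Resetting the counters after each report is what makes `requests` a per-interval figure rather than a running total.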

My Mac
------

With SMP...

3000 games started, 15115 players
...
3000 games finished
Elapsed: 181.960007s, Average run time: 0.06065333566666666 seconds

5000 games started, 25404 players
...
5000 games finished
Elapsed: 329.646545s, Average run time: 0.065929309 seconds

7450 games started, 37890 players
...
7450 games finished
Elapsed: 554.956259s, Average run time: 0.07407317925787507 seconds

Without SMP (disabled)...

(1@REDACTED)3> mb:test(localhost, 3000, 3000).
Simulating gameplay with 3000 games...
Waiting for games to end...
50 games started, 262 players

=INFO REPORT==== 1-Oct-2008::16:00:01 ===
     requests: 26065
     bytes: 466134
     requests_per_second: 868
     bytes_per_second: 15536

1750 games started, 8786 players

=INFO REPORT==== 1-Oct-2008::16:00:31 ===
     requests: 26695
     bytes: 303955
     requests_per_second: 889
     bytes_per_second: 10128

1800 games started, 9030 players

=INFO REPORT==== 1-Oct-2008::16:01:01 ===
     requests: 38856
     bytes: 484091
     requests_per_second: 1295
     bytes_per_second: 16136

3000 games started, 15115 players
50 games finished

=INFO REPORT==== 1-Oct-2008::16:01:31 ===
     requests: 22199
     bytes: 249774
     requests_per_second: 739
     bytes_per_second: 8323
2400 games finished
3000 games finished

=INFO REPORT==== 1-Oct-2008::16:02:01 ===
     requests: 14249
     bytes: 122289
     requests_per_second: 474
     bytes_per_second: 4076

Elapsed: 121.541805s, Average run time: 0.040513935 seconds

I figured I'd leave SMP disabled since I will likely be running several
VMs anyway. My assumption is that 8 non-SMP VMs on an 8-core machine
will do much better than 8 VMs each running 8-core SMP.

Joyent Labs
-----------

(1@REDACTED)3> mb:test(localhost, 3000, 3000).
Simulating gameplay with 3000 games...
Waiting for games to end...

=INFO REPORT==== 1-Oct-2008::14:54:37 ===
     requests: 38588
     bytes: 710144
     requests_per_second: 1286
     bytes_per_second: 23670
2900 games started, 14614 players

=INFO REPORT==== 1-Oct-2008::14:55:07 ===
     requests: 40604
     bytes: 482419
     requests_per_second: 1351
     bytes_per_second: 16055

900 games finished

3000 games finished

=INFO REPORT==== 1-Oct-2008::14:55:37 ===
     requests: 48872
     bytes: 433680
     requests_per_second: 1628
     bytes_per_second: 14453

Elapsed: 68.21404s, Average run time: 0.02273801333333333 seconds

As you can see, the same 3000-game test finished almost twice as fast
at Joyent (68 s vs. 122 s), and the number of requests per second
stayed high (1286 to 1628) instead of tapering off.

How did I achieve my scalability increases?

I switched to a closure-based binary serialization approach [1] modeled
on Andrew Kennedy's Pickler Combinators. I think performance could be
better, but I'm quite content with describing packets like this:

     Pickler = record(foo, {
                        byte(),
                        record(bar, {
                                 int(),
                                 record(baz, {
                                          sshort(),
                                          list(byte(), byte())
                                         })
                                })
                       }),

[1] http://www.wagerlabs.com/erlang/pickle.erl
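To show how such combinators can fit together, here is a minimal sketch. It is not the actual pickle.erl [1]. It assumes each pickler is a pair of closures: one turns a value into iodata, the other reads a value off the front of a binary and returns the rest. The wire sizes are assumptions (`int()` as a 32-bit signed integer, `sshort()` as 16-bit signed), and so is reading `list(byte(), byte())` as a length pickler followed by an element pickler.

```erlang
-module(pickle_sketch).
-export([byte/0, int/0, sshort/0, list/2, tuple/1, record/2,
         pickle/2, unpickle/2]).

%% A pickler is {PickleFun, UnpickleFun}:
%%   PickleFun(Value) -> iodata()
%%   UnpickleFun(Binary) -> {Value, RestOfBinary}
byte() ->
    {fun(V) -> <<V:8>> end,
     fun(<<V:8, Rest/binary>>) -> {V, Rest} end}.

int() ->
    {fun(V) -> <<V:32/signed>> end,
     fun(<<V:32/signed, Rest/binary>>) -> {V, Rest} end}.

sshort() ->
    {fun(V) -> <<V:16/signed>> end,
     fun(<<V:16/signed, Rest/binary>>) -> {V, Rest} end}.

%% Length-prefixed list: LenP encodes the element count, ElemP each element.
list({LenP, LenU}, {ElemP, ElemU}) ->
    {fun(L) -> [LenP(length(L)) | [ElemP(E) || E <- L]] end,
     fun(Bin) ->
             {N, Rest} = LenU(Bin),
             unpickle_n(N, ElemU, Rest, [])
     end}.

unpickle_n(0, _Unpickle, Bin, Acc) ->
    {lists:reverse(Acc), Bin};
unpickle_n(N, Unpickle, Bin, Acc) ->
    {V, Rest} = Unpickle(Bin),
    unpickle_n(N - 1, Unpickle, Rest, [V | Acc]).

%% Fixed-size tuple, one pickler per position.
tuple(Picklers) when is_tuple(Picklers) ->
    Ps = tuple_to_list(Picklers),
    {fun(T) ->
             lists:zipwith(fun({P, _}, V) -> P(V) end, Ps, tuple_to_list(T))
     end,
     fun(Bin) ->
             {Vs, Rest} = lists:foldl(fun({_, U}, {Acc, B}) ->
                                              {V, B1} = U(B),
                                              {[V | Acc], B1}
                                      end, {[], Bin}, Ps),
             {list_to_tuple(lists:reverse(Vs)), Rest}
     end}.

%% Record: a tuple tagged with Tag. Only the fields go on the wire;
%% the tag is restored on the way back in.
record(Tag, Fields) ->
    {TP, TU} = tuple(Fields),
    {fun(R) when element(1, R) =:= Tag ->
             TP(list_to_tuple(tl(tuple_to_list(R))))
     end,
     fun(Bin) ->
             {T, Rest} = TU(Bin),
             {list_to_tuple([Tag | tuple_to_list(T)]), Rest}
     end}.

pickle({P, _}, Value) ->
    iolist_to_binary(P(Value)).

%% Insists that the whole binary was consumed.
unpickle({_, U}, Bin) ->
    {Value, <<>>} = U(Bin),
    Value.
```

With these definitions, the `Pickler` above turns `{foo, 1, {bar, 2, {baz, -3, [4, 5]}}}` into the 10-byte binary `<<1,0,0,0,2,255,253,2,4,5>>`, and `unpickle/2` returns the same term. Because Erlang records are tagged tuples, the same picklers work on `#foo{}` values directly.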

I also removed an Erlang-style deadlock from my code.

I have players and games. Erlang/OTP gen_server has both cast (fire
and forget) and call (wait for reply) mechanisms. Each of my games is a
pair of processes: a gen_fsm driving the poker game logic and a
gen_server doing the housekeeping.

My players called games and games called players, and everything might
have been fine had I not used game timers. Under high server load, I
would get into a situation where a player was waiting for a reply from
a game, a game timer fired, and the game called the player.

A deadlock would ensue and one of the calls would time out.
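The call/call cycle can be reproduced with two plain processes. In this sketch all names are hypothetical, and the hand-rolled `call/3` stands in for `gen_server:call/3`. Having the game notify the player with a cast instead of a call is one way to break the cycle. Treat it as an illustration, not as a description of how OpenPoker was changed.

```erlang
-module(deadlock_sketch).
-export([run/1]).

%% Mode = call: the game blocks on the player when its timer fires.
%% Mode = cast: the game sends a fire-and-forget notification instead.
run(Mode) when Mode =:= call; Mode =:= cast ->
    Parent = self(),
    Game = spawn(fun() -> game_loop(Mode) end),
    Player = spawn(fun() -> player(Game, Parent) end),
    Game ! {timer_fired, Player},   % the game timer fires...
    Player ! act,                   % ...just as the player calls the game
    receive
        {player_result, Result} -> Result
    end.

%% Synchronous request/reply with a timeout.
call(To, Request, Timeout) ->
    Ref = make_ref(),
    To ! {call, self(), Ref, Request},
    receive
        {reply, Ref, Reply} -> {ok, Reply}
    after Timeout ->
        timeout
    end.

player(Game, Parent) ->
    receive act -> ok end,
    %% While blocked here, the player does not serve the game's request.
    Parent ! {player_result, call(Game, bet, 500)}.

game_loop(Mode) ->
    receive
        {timer_fired, Player} when Mode =:= call ->
            %% The player never answers, so this waits out the full timeout,
            %% and meanwhile the player's bet sits unanswered in our queue.
            _ = call(Player, your_turn, 1000),
            game_loop(Mode);
        {timer_fired, Player} when Mode =:= cast ->
            Player ! {cast, your_turn},
            game_loop(Mode);
        {call, From, Ref, bet} ->
            From ! {reply, Ref, ok},
            game_loop(Mode)
    after 2000 ->
        done    % idle games exit so the demo does not leak processes
    end.
```

`run(call)` returns `timeout`, because each process holds a message from the other in its queue while waiting on the other. `run(cast)` returns `{ok, ok}`.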

I tracked down the cause by grabbing the process info of both the
player and the game upon a timeout. There was perhaps a single message
in each queue, but both processes were waiting on each other.
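Here is a minimal sketch of that kind of diagnosis, assuming the calls go through `gen_server:call/3`. `safe_call/3` and `snapshot/1` are hypothetical helpers, and they only work for local processes. `erlang:process_info/2` with `current_function` and `message_queue_len` is enough to spot two processes that are each stuck in a call while a request from the other waits in the queue.

```erlang
-module(diag_sketch).
-export([safe_call/3, snapshot/1]).

%% Hypothetical wrapper around gen_server:call/3: on timeout, log what
%% both the caller and the callee are doing, then re-raise the exit.
safe_call(Server, Request, Timeout) ->
    try
        gen_server:call(Server, Request, Timeout)
    catch
        exit:{timeout, Details}:Stack ->
            error_logger:error_report([{call_timeout, Request},
                                       {caller, snapshot(self())},
                                       {callee, snapshot(to_pid(Server))}]),
            erlang:raise(exit, {timeout, Details}, Stack)
    end.

%% Returns undefined if the process is gone or unknown.
snapshot(Pid) when is_pid(Pid) ->
    erlang:process_info(Pid, [current_function, status,
                              message_queue_len, messages]);
snapshot(_) ->
    undefined.

to_pid(Pid) when is_pid(Pid) -> Pid;
to_pid(Name) when is_atom(Name) -> whereis(Name);
to_pid(_) -> undefined.
```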

I also poked around in Erlang crash dumps but this didn't prove as  
useful.

Then I made sure that outgoing packets had the numeric game and player  
ids stuffed in them at the source.

My previous approach put game and player process ids (pids) in the
packet and had the serialization code call each process to get its
numeric id. This really limited scalability and slowed things down.
More importantly, it was another source of deadlocks, since a game
would invoke a serialization function that called back into the very
same game.
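The difference is easier to see in code. This sketch uses a hypothetical packet layout and hypothetical names. The numeric ids are filled in by the process that builds the packet, so encoding is a pure function that never calls back into a game or player.

```erlang
-module(packet_sketch).
-export([game_update/3, encode/1]).

%% Hypothetical packet layout, for illustration only.
-record(game_update, {game_id   :: non_neg_integer(),
                      player_id :: non_neg_integer(),
                      amount    :: integer()}).

%% Built inside the game process, which already knows its own numeric id
%% and the player's, so the ids are stuffed in at the source.
game_update(GameId, PlayerId, Amount) ->
    #game_update{game_id = GameId, player_id = PlayerId, amount = Amount}.

%% Pure function of the packet: no call back into a game or player
%% process, so encoding can neither block nor deadlock. The deadlock-prone
%% shape would look ids up here, e.g. gen_server:call(GamePid, id).
encode(#game_update{game_id = G, player_id = P, amount = A}) ->
    <<G:32, P:32, A:32/signed>>.
```

For example, `encode(game_update(7, 42, -5))` yields `<<0,0,0,7,0,0,0,42,255,255,255,251>>`.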

Last but not least, my benchmarking approach is a bit flawed, since I
can only start 7.5K games on my Mac and 8.6K games at Joyent. My
benchmark tool winds down once the number of finished games equals the
number of games started, disregarding the total number of games I
wanted to start in the first place. For the same reason, I never see
15K players online at the same time.
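A minimal sketch of a termination check that avoids this flaw. It compares finished games against the requested total rather than against the games started so far. The message names and `bench_sketch` are hypothetical.

```erlang
-module(bench_sketch).
-export([done/2, wait_for_games/1]).

%% Stopping when Finished =:= Started quits early whenever every game
%% started so far has finished before the rest are launched. Comparing
%% against the requested Total avoids that.
done(Finished, Total) ->
    Finished >= Total.

%% Counts {game_started, Id} and {game_finished, Id} messages until all
%% Total games have finished; returns {Started, Finished}.
wait_for_games(Total) ->
    wait(0, 0, Total).

wait(Started, Finished, Total) ->
    case done(Finished, Total) of
        true ->
            {Started, Finished};
        false ->
            receive
                {game_started, _Id}  -> wait(Started + 1, Finished, Total);
                {game_finished, _Id} -> wait(Started, Finished + 1, Total)
            end
    end.
```

Suppose the driver receives started/finished for game 1 and then for game 2. `wait_for_games(2)` returns `{2, 2}`, whereas a `Finished =:= Started` check would have stopped after the first game.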

My next step is to make sure all games are running when the starting
pistol fires, and to distribute testing across several VMs.

	Thanks, Joel

--
wagerlabs.com