[erlang-questions] Erlang doesn't suck
Joel Reymont
joelr1@REDACTED
Wed Oct 1 17:57:50 CEST 2008
Not once the pain goes away and only the results stick around.
I spent a grueling 1+ week refactoring OpenPoker and wishing for
static typing. I started a number of git branches, only to throw them
away a day or two later. Many were the times when I wished to tear my
hair out and hurl Erlang off my terrace and into the Atlantic Ocean.
I do have an extensive test harness, so the best approach proved to
be committing reasonably small changes once all tests passed.
The end result is that OpenPoker scales MUCH better now, even on a
single VM!
My workstation is a Mac Pro, 2 x 2.8 GHz quad-core Xeon with 14 GB of RAM.
A while ago I deliberated between Amazon EC2 and Joyent and decided to
host with the latter. For the purposes of testing, Joyent provided me
with an 8-core x86 Solaris Nevada system with a 2 GiB memory cap where
I can burst to 95% CPU utilization.
A friendly member of this forum gave me access to a Solaris/Niagara
system, but I haven't had a chance to test on it yet, so I'll only
compare my Mac and Joyent.
Here are the test results, statistics collected every 30 seconds.
My Mac
------
With SMP...
3000 games started, 15115 players
...
3000 games finished
Elapsed: 181.960007s, Average run time: 0.06065333566666666 seconds
5000 games started, 25404 players
...
5000 games finished
Elapsed: 329.646545s, Average run time: 0.065929309 seconds
7450 games started, 37890 players
...
7450 games finished
Elapsed: 554.956259s, Average run time: 0.07407317925787507 seconds
Without SMP...
(1@REDACTED)3> mb:test(localhost, 3000, 3000).
Simulating gameplay with 3000 games...
Waiting for games to end...
50 games started, 262 players
=INFO REPORT==== 1-Oct-2008::16:00:01 ===
requests: 26065
bytes: 466134
requests_per_second: 868
bytes_per_second: 15536
1750 games started, 8786 players
=INFO REPORT==== 1-Oct-2008::16:00:31 ===
requests: 26695
bytes: 303955
requests_per_second: 889
bytes_per_second: 10128
1800 games started, 9030 players
=INFO REPORT==== 1-Oct-2008::16:01:01 ===
requests: 38856
bytes: 484091
requests_per_second: 1295
bytes_per_second: 16136
3000 games started, 15115 players
50 games finished
=INFO REPORT==== 1-Oct-2008::16:01:31 ===
requests: 22199
bytes: 249774
requests_per_second: 739
bytes_per_second: 8323
2400 games finished
3000 games finished
=INFO REPORT==== 1-Oct-2008::16:02:01 ===
requests: 14249
bytes: 122289
requests_per_second: 474
bytes_per_second: 4076
Elapsed: 121.541805s, Average run time: 0.040513935 seconds
I figured I'd leave SMP disabled since I will likely be running a few
VMs anyway. I'm assuming that 8 non-SMP VMs on an 8-core machine are
much better than 8 VMs with 8-core SMP.
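As an aside (a generic snippet, nothing OpenPoker-specific), you can
always ask the emulator which mode a node actually ended up in:

%% Standard system_info/1 items: smp_support is true on an SMP
%% emulator and false otherwise; schedulers is 1 on a non-SMP node.
erlang:system_info(smp_support).
erlang:system_info(schedulers).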
Joyent Labs
-----------
(1@REDACTED)3> mb:test(localhost, 3000, 3000).
Simulating gameplay with 3000 games...
Waiting for games to end...
=INFO REPORT==== 1-Oct-2008::14:54:37 ===
requests: 38588
bytes: 710144
requests_per_second: 1286
bytes_per_second: 23670
2900 games started, 14614 players
=INFO REPORT==== 1-Oct-2008::14:55:07 ===
requests: 40604
bytes: 482419
requests_per_second: 1351
bytes_per_second: 16055
900 games finished
3000 games finished
=INFO REPORT==== 1-Oct-2008::14:55:37 ===
requests: 48872
bytes: 433680
requests_per_second: 1628
bytes_per_second: 14453
Elapsed: 68.21404s, Average run time: 0.02273801333333333 seconds
As you can see, the test finished roughly twice as fast at Joyent and
the number of requests per second stayed high.
How did I achieve my scalability increases?
I switched to a closure-based binary serialization approach [1] based
on Andrew Kennedy's Pickler Combinators. I think performance could be
better, but I'm quite content with describing packets like this:
Pickler = record(foo, {
            byte(),
            record(bar, {
              int(),
              record(baz, {
                sshort(),
                list(byte(), byte())
              })
            })
          }),
[1] http://www.wagerlabs.com/erlang/pickle.erl
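In case the combinator style above isn't familiar, here's a minimal
sketch of what closure-based picklers can look like. The names and
representation below are illustrative only, NOT the actual pickle.erl
API from [1]: each primitive is a {Pickle, Unpickle} pair of closures,
and combinators build bigger pairs out of smaller ones.

%% Minimal sketch of closure-based pickler combinators; illustrative
%% names only, not the pickle.erl API referenced above.
-module(pickler_sketch).
-export([byte/0, sshort/0, pair/2, pickle/2, unpickle/2]).

%% A pickler is a {Pickle, Unpickle} pair: Pickle prepends binaries to
%% an accumulator, Unpickle consumes a binary and returns {Value, Rest}.
byte() ->
    {fun(Acc, B) -> [<<B:8>> | Acc] end,
     fun(<<B:8, Rest/binary>>) -> {B, Rest} end}.

sshort() ->
    {fun(Acc, N) -> [<<N:16/signed>> | Acc] end,
     fun(<<N:16/signed, Rest/binary>>) -> {N, Rest} end}.

%% Combine two picklers into one that handles a 2-tuple.
pair({P1, U1}, {P2, U2}) ->
    {fun(Acc, {A, B}) -> P2(P1(Acc, A), B) end,
     fun(Bin) ->
             {A, Rest1} = U1(Bin),
             {B, Rest2} = U2(Rest1),
             {{A, B}, Rest2}
     end}.

pickle({P, _}, Value) ->
    list_to_binary(lists:reverse(P([], Value))).

unpickle({_, U}, Bin) ->
    element(1, U(Bin)).

E.g. pickle(pair(byte(), sshort()), {255, -1}) produces a 3-byte
binary, and unpickle/2 with the same pickler turns it back into
{255, -1}.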
I also removed an Erlang-style deadlock from my code.
I have players and games. Erlang/OTP gen_server has both cast (fire
and forget) and call (wait for reply) mechanisms. My games come in
pairs: a gen_fsm driving the poker game logic and a gen_server doing
the housekeeping.
My players called games, games called players, and everything might
have been fine if I hadn't used game timers. Under high server load I
would get into a situation where a player was expecting a game reply,
a timer fired, and the game called the player.
A deadlock would ensue and one of the calls would time out.
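For anyone who hasn't hit this before, the shape of the problem is
simply two gen_servers calling each other synchronously at the same
time. A minimal, self-contained reproduction (the module below is made
up and has nothing to do with the OpenPoker code) looks like this:

%% Two gen_server processes that call each other synchronously. Both
%% block in gen_server:call and the default 5 second call timeout
%% eventually fires; demo/0 returns the caught timeout exit.
-module(deadlock_sketch).
-behaviour(gen_server).
-export([demo/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

demo() ->
    {ok, A} = gen_server:start(?MODULE, [], []),
    {ok, B} = gen_server:start(?MODULE, [], []),
    %% A calls B; B's handle_call immediately calls back into A, which
    %% is still blocked waiting for B's reply.
    catch gen_server:call(A, {call_back, B}).

init([]) -> {ok, undefined}.

handle_call({call_back, Other}, _From, State) ->
    Reply = gen_server:call(Other, {call_back, self()}),
    {reply, Reply, State}.

handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Msg, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.

In my case one direction of the call came from a game timer firing,
which is why it only showed up under load.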
I found the reason by grabbing the process info of both the player
and the game upon a timeout and realizing that there was perhaps a
single message in each queue but both processes were waiting on each
other.
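The instrumentation amounts to catching the call timeout and logging
process_info/2 for both pids. The helper below is just a sketch (its
name and shape are made up), but the process_info/2 items are
standard:

%% Catch a gen_server:call timeout and log what both sides were doing.
-module(diag_sketch).
-export([call_game/3]).

call_game(Game, Player, Request) ->
    try
        gen_server:call(Game, Request, 5000)
    catch
        exit:{timeout, _} ->
            Items = [current_function, status,
                     message_queue_len, messages],
            error_logger:info_report(
              [{game, erlang:process_info(Game, Items)},
               {player, erlang:process_info(Player, Items)}]),
            {error, timeout}
    end.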
I also poked around in Erlang crash dumps, but that didn't prove as
useful.
Then I made sure that outgoing packets had the numeric game and player
ids stuffed in them at the source.
My previous approach had game and player process ids in the packet
and had the serialization code call each process to get its numeric
id. This really limited scalability and slowed things down. More
importantly, it was another source of deadlocks, since a game would
invoke a serialization function that called back into that very same
game.
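Roughly, the difference is this (the record name, the 'ID' call and
the notify_join() pickler are all made up; pickle/2 is in the spirit
of the sketch above):

%% Before (illustrative): the packet carried pids, so the serializer
%% had to make a synchronous call per process just to learn its id --
%% and deadlocked whenever a game serialized a packet about itself.
-record(notify_join, {game, player}).

write_before(#notify_join{game = GamePid, player = PlayerPid} = R) ->
    GID = gen_server:call(GamePid, 'ID'),
    PID = gen_server:call(PlayerPid, 'ID'),
    pickle(notify_join(), R#notify_join{game = GID, player = PID}).

%% After (illustrative): the numeric ids are stuffed into the record
%% where the packet is built, so serialization is a pure function of
%% its argument and never calls back into any process.
write_after(#notify_join{} = R) ->
    pickle(notify_join(), R).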
Last but not least, my benchmarking approach is a bit flawed, since I
can only start 7.5K games on my Mac and 8.6K games at Joyent. This is
because my benchmark tool winds down once the number of finished games
equals the number of games started, disregarding the total number of
games I wanted to start in the first place. I don't see 15K players
online at the same time for the same reason.
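In other words, the stop condition looks like the first clause below
when it should also check the target (names made up, not the actual
mb code):

-record(state, {target, started = 0, finished = 0}).

%% Current behaviour: stop as soon as every game started so far has
%% finished, even if fewer games than requested ever got started.
maybe_stop(#state{started = N, finished = N}) when N > 0 ->
    stop;
maybe_stop(_State) ->
    keep_going.

%% Intended behaviour would be closer to:
%% maybe_stop(#state{target = T, started = T, finished = T}) -> stop.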
My next step is to make sure all games run when the starting pistol
fires and to distribute testing among several VMs.
Thanks, Joel
--
wagerlabs.com