[erlang-questions] Benchmarking Erlang: Deathmatch of gb_trees, dict, ets, mnesia ... and registered names

Thu Oct 9 13:08:01 CEST 2008

I'm still trying to figure out how to optimize pid <-> integer mapping  
and I thought I'd try several approaches. My goal is to look up a  
process id as quickly as possible given an integer.

My system is a Mac Pro 2x2.8 GHz Quad Xeon, 14 GB of memory

Erlang (BEAM) emulator version 5.6.3 [source] [smp:8] [async-threads:0] [kernel-poll:false]

Here's my set of timings, code at the end of this message.
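
For readers without the follow-up messages, here is a minimal sketch of
what a populate/lookup test of this shape might look like for gb_trees.
It is not the original map1 module: the module name map_sketch, the
helper names and the use of timer:tc/3 with seconds as the printed unit
are all assumptions made for illustration.

```erlang
-module(map_sketch).
-export([test/1, populate/2, lookup/2]).

%% Hypothetical harness, NOT the original map1 code.
%% N is bounded by the VM process limit (see the +P emulator flag).
test(N) ->
    %% Spawn N idle processes so there are real pids to store.
    Pids = [spawn(fun() -> receive stop -> ok end end)
            || _ <- lists:seq(1, N)],
    Pairs = lists:zip(lists:seq(1, N), Pids),
    %% timer:tc/3 returns {Microseconds, Result}.
    {T1, Tree} = timer:tc(?MODULE, populate, [Pairs, gb_trees:empty()]),
    {T2, ok} = timer:tc(?MODULE, lookup, [N, Tree]),
    io:format("Populate: ~.4f~nLookup:   ~.4f~n",
              [T1 / 1.0e6, T2 / 1.0e6]),
    lists:foreach(fun(P) -> P ! stop end, Pids),
    ok.

%% Insert every {Id, Pid} pair into the tree.
populate(Pairs, Tree) ->
    lists:foldl(fun({Id, Pid}, T) -> gb_trees:insert(Id, Pid, T) end,
                Tree, Pairs).

%% Look up every id from N down to 1 and check that a pid comes back.
lookup(0, _Tree) ->
    ok;
lookup(N, Tree) ->
    {value, Pid} = gb_trees:lookup(N, Tree),
    true = is_pid(Pid),
    lookup(N - 1, Tree).
```

Calling map_sketch:test(10000) prints two timings in the same layout as
the transcripts below, though the numbers will not be comparable with
the ones reported here.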

%% gb_trees

1> map1:test(10000).
Populate: 0.0972
Lookup:   0.0912
ok
2> map1:test(100000).
Populate: 0.8737
Lookup:   5.0007
ok
3> map1:test(1000000).
Populate: 9.9215
Lookup:   5.0010
ok

%% dict

1> map2:test(10000).
Populate: 0.1035
Lookup:   0.0730
ok
2> map2:test(100000).
Populate: 1.0407
Lookup:   1.2715
ok
3> map2:test(1000000).
Populate: 10.5010
Lookup:   5.0010
ok

%% ets

4> map3:test(10000).
Populate: 0.1140
Lookup:   0.0448
ok
5> map3:test(100000).
Populate: 1.3435
Lookup:   0.4669
ok
6> map3:test(1000000).
Populate: 11.6472
Lookup:   5.0860
ok

Dict seems to be the winner for 100K values. What's particularly  
interesting to me is that gb_trees, dict and ets take about the  
same time to look up 1 million values. Is there an explanation?
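
To make the comparison concrete, this is a small sketch of the same
integer -> pid lookup expressed against each of the three structures.
The module name lookup_sketch, the key 42 and the table name pids are
placeholders, and this is not the code that produced the timings above.

```erlang
-module(lookup_sketch).
-export([demo/0]).

%% Store one {Id, Pid} mapping in each structure and read it back.
demo() ->
    Id = 42,
    Pid = self(),
    %% gb_trees: insert/3 fails if the key already exists.
    Tree = gb_trees:insert(Id, Pid, gb_trees:empty()),
    {value, Pid} = gb_trees:lookup(Id, Tree),
    %% dict: store/3 overwrites, find/2 returns {ok, Value} | error.
    Dict = dict:store(Id, Pid, dict:new()),
    {ok, Pid} = dict:find(Id, Dict),
    %% ets: objects are tuples, keyed on the first element by default.
    Tab = ets:new(pids, [set]),
    true = ets:insert(Tab, {Id, Pid}),
    [{Id, Pid}] = ets:lookup(Tab, Id),
    true = ets:delete(Tab),
    ok.
```

One difference worth keeping in mind when reading the numbers: gb_trees
and dict values live on the calling process's heap, while ets:lookup/2
copies the stored tuple out of the table on every call.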

map4, map5 and map6 test mnesia ram_copies, disc_only_copies and  
ram_copies in a 2-node setup. There seems to be no overhead compared  
to ets, though. Can't explain this either.
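
For reference, a minimal single-node sketch of a ram_copies table used
for this kind of mapping is shown here. It assumes dirty operations, a
table called pidmap and a module called mnesia_sketch; none of these
come from the original map4 code, which may well have used transactions
instead.

```erlang
-module(mnesia_sketch).
-export([demo/0]).

%% Hypothetical example: one ram_copies table on the local node only.
demo() ->
    ok = mnesia:start(),
    case mnesia:create_table(pidmap, [{ram_copies, [node()]},
                                      {attributes, [id, pid]}]) of
        {atomic, ok} -> ok;
        %% Tolerate a second call in the same VM.
        {aborted, {already_exists, pidmap}} -> ok
    end,
    Pid = self(),
    %% Records are tuples {Table, Key, Value}; dirty ops skip transactions.
    ok = mnesia:dirty_write({pidmap, 1, Pid}),
    [{pidmap, 1, Pid}] = mnesia:dirty_read(pidmap, 1),
    ok.
```

A ram_copies table is backed by ets, so dirty reads on the local node
would be expected to cost little more than a plain ets lookup, whereas
reads inside mnesia:transaction/1 add locking overhead.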

The absolutely bizarre and surprising discovery is the timings for  
registered names. On a hunch, I thought I'd try to register a process  
under its id. This way I could just send messages to that id from any  
node in the cluster, without having to go through a separate  
translation process.
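
The idea can be sketched as follows. This is an illustration only and
not the map7 code: it assumes the global name registry, because
global:register_name/2 accepts any term as a name, whereas the built-in
register/2 only accepts atoms, so an integer id would first have to be
turned into an atom. The module name reg_sketch and the message
protocol are made up.

```erlang
-module(reg_sketch).
-export([start/1, send/2]).

%% Spawn a process and register it cluster-wide under an integer id.
start(Id) when is_integer(Id) ->
    Pid = spawn(fun loop/0),
    %% global:register_name/2 returns yes | no.
    yes = global:register_name(Id, Pid),
    Pid.

%% Send Msg to whichever process is registered under Id, if any.
send(Id, Msg) ->
    case global:whereis_name(Id) of
        undefined ->
            {error, not_registered};
        Pid ->
            Pid ! Msg,
            ok
    end.

%% Trivial server loop used only for the demonstration.
loop() ->
    receive
        {ping, From} -> From ! pong, loop();
        stop -> ok
    end.
```

With this in place, reg_sketch:send(12345, {ping, self()}) reaches the
process started with reg_sketch:start(12345) from any connected node,
with no separate lookup table to maintain.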

%% registered names

(2@REDACTED)2> map7:test(10000).
Populate: 0.8526
Lookup:   0.0040
ok
(2@REDACTED)1> map7:test(100000).
Populate: 8.8507
Lookup:   0.0605
ok
(2@REDACTED)3> map7:test(1000000).
Populate: 94.6030
Lookup:   0.8558

It seems that registering processes under their integer id is the way  
to go. Am I right in my conclusion? Are there any pitfalls with going  
this route?

Between players and games, I would most likely have <100K ids  
registered at any given time.

Code in follow-up messages, registered names first, then gb_trees,  
dict, ets and mnesia.

	Thanks, Joel

--
wagerlabs.com