[erlang-questions] dict vs. ETS choice

Fred Hebert mononcqc@REDACTED
Fri Jul 11 16:28:55 CEST 2014


On 07/11, Wes James wrote:
> Have you heard of or looked at RIAK?
> 
> http://basho.com/riak/
> 
> -wes
> 

Well that sounds a lot like bringing a tank to a dodgeball game.

Let's say, for example, that you're going to store 10 million keys, to make
the tens of thousands of values look small in comparison. These keys are
mostly read-only, so it's likely easy to just keep them local on each node.

Assuming 10 simple key-value pairs per proplist, you may say that
you're gonna have at most 10kb per row in the table as a very conservative
estimate.

At 10 million rows of 10kb, you're gonna have roughly 95GB of data to
store. That's the kind of stuff that easily sits on a regular laptop and
can be made to work fine with any good old regular SQL storage, or most
single-node DBs that are not DETS.

If you dig further and figure out that you average about 1kb of data per
row, you can now fit the entire thing in memory on some servers or virtual
instances. Have 100,000 keys instead of 10 million, and you now fit
everything under 100MB of RAM.
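
To double-check the arithmetic above, here is a minimal sketch in Erlang.
The module name size_estimate and its functions are made up for
illustration, and it assumes "kb" means 1024 bytes and "GB"/"MB" mean
binary units, which is what makes the 95GB figure come out.

```erlang
-module(size_estimate).
-export([total_bytes/2, to_gib/1, to_mib/1, report/0]).

%% Total bytes for Rows rows of KiBPerRow each.
%% Assumption: 1kb is taken as 1024 bytes.
total_bytes(Rows, KiBPerRow) ->
    Rows * KiBPerRow * 1024.

%% Convert bytes to binary gigabytes and megabytes (floats).
to_gib(Bytes) -> Bytes / (1024 * 1024 * 1024).
to_mib(Bytes) -> Bytes / (1024 * 1024).

%% The three scenarios discussed in the text.
report() ->
    [{ten_million_rows_at_10kb_in_gb, to_gib(total_bytes(10000000, 10))},
     {ten_million_rows_at_1kb_in_gb, to_gib(total_bytes(10000000, 1))},
     {hundred_thousand_rows_at_1kb_in_mb, to_mib(total_bytes(100000, 1))}].
```

Calling size_estimate:report() should give roughly 95.4, 9.5 and 97.7
for the three cases, ignoring any per-term overhead the VM adds on top.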

You don't really need to go look at Riak when the question, for all
reasonable purposes, likely doesn't even need distribution to begin
with.

I'd personally look at what Richard O'Keefe and Jesper mentioned as
selection criteria for this question.

A note about the heavy size of log messages for dicts and other data
structures on failures: check out the 'format_status' callback offered
by gen_server:

http://www.erlang.org/doc/man/gen_server.html#Module:format_status-2
 
Module:format_status(Opt, [PDict, State]) -> Status
 Opt = normal | terminate
 PDict = [{Key, Value}]

 This callback is optional, so callback modules need not export it. The
 gen_server module provides a default implementation of this function
 that returns the callback module state.

 This function is called by a gen_server process when:

 - One of sys:get_status/1,2 is invoked to get the gen_server status.
   Opt is set to the atom normal for this case.
 - The gen_server terminates abnormally and logs an error. Opt is
   set to the atom terminate for this case.

 This function is useful for customising the form and appearance
 of the gen_server status for these cases. A callback module
 wishing to customise the sys:get_status/1,2 return value as
 well as how its status appears in termination error logs
 exports an instance of format_status/2 that returns a term
 describing the current status of the gen_server.

 [...]

 One use for this function is to return compact alternative state
 representations to avoid having large state terms printed in logfiles.
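
As a rough sketch of that last point, here is a gen_server that keeps a
dict as its state and only reports the dict's size when asked for its
status or when it crashes. The module name kv_cache, its store/lookup
API and the dict_summary tuple are all hypothetical, picked for the
example; only the format_status/2 callback shape comes from the docs
quoted above. Recent OTP releases, as far as I know, deprecate
format_status/2 in favour of format_status/1, so check the docs for the
version you run.

```erlang
-module(kv_cache).
-behaviour(gen_server).

%% Hypothetical public API for the example.
-export([start_link/0, store/2, lookup/1]).
%% gen_server callbacks, including the optional format_status/2.
-export([init/1, handle_call/3, handle_cast/2, format_status/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

store(Key, Value) -> gen_server:call(?MODULE, {store, Key, Value}).
lookup(Key) -> gen_server:call(?MODULE, {lookup, Key}).

%% The whole server state is a single dict.
init([]) -> {ok, dict:new()}.

handle_call({store, K, V}, _From, Dict) ->
    {reply, ok, dict:store(K, V, Dict)};
handle_call({lookup, K}, _From, Dict) ->
    {reply, dict:find(K, Dict), Dict}.

handle_cast(_Msg, Dict) -> {noreply, Dict}.

%% Opt is normal for sys:get_status/1,2 and terminate for crash logs.
%% In both cases only a compact summary is returned, never the full dict.
format_status(normal, [_PDict, Dict]) ->
    [{data, [{"State", summary(Dict)}]}];
format_status(terminate, [_PDict, Dict]) ->
    summary(Dict).

%% Assumption: the number of entries is enough for debugging purposes.
summary(Dict) ->
    {dict_summary, [{size, dict:size(Dict)}]}.
```

With this in place, sys:get_status(kv_cache) and the error report on an
abnormal exit should show the small summary tuple where the full dict
would otherwise be printed.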
