[erlang-questions] idiomatic large data sets; shared and unshared

Felix Gallo <>
Sat Jul 6 03:54:07 CEST 2013


In my continuing quest to learn erlang, I'm trying to implement a 1000x1000
grid into which items may be placed or removed, with perhaps a millon items
in placement at once, spread fairly evenly across the grid.  No grid cell
will hold more than a thousand things.  In the future, this grid may be
shared across multiple processes, but for now I'm content to just try to
get it working for one.

In the mutable world, you might structure it as a hashtable of hashes, or
hashtable of sets etc., and use the coordinate tuple {x,y} as a key into
the hash table, then mutate the list.  addtocell(Coordinate, Value) and
removefromcell(Coordinate, Value) are very small, easy, and extremely fast
(sub-second for a million iterations of addtocell()).

In erlang world, the naive approach of attempting to use a list of lists
(and then a list of sets), and lists:keytake() worked for small grid
dimensions (10x10) but ran for several minutes before I killed it when I
moved up in grid size and tried to insert a million things, with no
indication when it would ever finish.

The slightly less naive approach of using ets worked for larger grid
dimensions, but was also highly nonperformant (60s for a million iterations
of addtocell()).

The problem sort of has me stumped.  I don't have the natural intuition of
what I can trade away for more speed in order to get to the same order of
magnitude as the mutable solutions.

It may be that I'm doing one of those fabled things you are not ever
supposed to do in Erlang, but on the other hand, it seems like network
switches have scenarios under which you might have to access randomly into
a shared large matrix.  And then there's Riak, which appears to be at least
moderately performant and which appears to be built around the idea of
permitting random write access into a shared datastore of indeterminate
arity.

So knowing that the mutable solution is essentially always going to win
just because it doesn't have to do copies, is it possible to get erlang
into the same ballpark here?  Or is the idiomatic solution to write a NIF?

F.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20130705/e7304aef/attachment.html>


More information about the erlang-questions mailing list