[erlang-questions] idiomatic large data sets; shared and unshared

Felix Gallo felixgallo@REDACTED
Mon Jul 8 17:02:35 CEST 2013

Thanks much to Chandru for the simplification and to Jesper for the helpful
and thoughtful insights.  It turned out that I tricked myself in three
different ways when trying to set up the ets grid at first.  For future
readers here were the fundamental flaws in my reasoning:

1.  My test ran a million iterations of ets:insert({random:uniform(N),
random:uniform(M), P}).  Although my target was eventually N = M = 1000, I
ran it with N = M = 10 in order to be able to print out the grid and
visually spot check it for uniformity.  ets is good at many things but
inserting into a set or a bag (and not a duplicate bag) of size 100000 is
not one of them.

2.  set and bag both prevent duplicates from being inserted, which
obviously, in hindsight, is super bad when the data structure is a linked
list, because the entire list has to be traversed every time.  For the
scenario in which you have control over whether duplication happens in the
first place, going to duplicate_bag is a massive performance improvement.

3.  random:uniform/1 is moderately fast but under profiling turned out to
be a significant part of the performance.  Since the locations of bucketed
entities wouldn't be random in practice but rather client-provided, I
switched to using 'N rem M' to better assess ets speed.


On Mon, Jul 8, 2013 at 7:38 AM, Jesper Louis Andersen <
jesper.louis.andersen@REDACTED> wrote:

> On Sat, Jul 6, 2013 at 3:54 AM, Felix Gallo <felixgallo@REDACTED> wrote:
>> In the mutable world, you might structure it as a hashtable of hashes, or
>> hashtable of sets etc., and use the coordinate tuple {x,y} as a key into
>> the hash table, then mutate the list.  addtocell(Coordinate, Value) and
>> removefromcell(Coordinate, Value) are very small, easy, and extremely fast
>> (sub-second for a million iterations of addtocell()).
> In a game, you will get into trouble with this representation alone. You
> need to have locational information about what are in the cells. In other
> words, given a thing in a cell, you must know, quickly what are adjacent or
> close to that cell in order to quickly asses what to do and so on.
> The trick in a game is spatial separation. In other words, you break the
> world into regions and then you handle each region by its own, gaining
> locality. The same is true in Erlang. You need to break your big array into
> a spatial structure, probably controlled by a process tree which handles
> each area of the whole world. In turn, this enables parallel execution and
> it also simplifies local decisions which can be made process-local.
> Take a look into stuff like K,d-trees, octrees and so on for ideas on how
> to approach the problem in a more general fasion.
> A blind grid can be made in ETS: {{X, Y}, Content}. But this runs the risk
> of having to copy Content to and from a process on each access and it is
> also not safe from races on insertion. Think is, this is not a trivial
> problem.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20130708/138f94e9/attachment.htm>

More information about the erlang-questions mailing list