[erlang-questions] idiomatic large data sets; shared and unshared

Chandru chandrashekhar.mullaparthi@REDACTED
Sat Jul 6 14:55:44 CEST 2013


On 6 July 2013 02:54, Felix Gallo <felixgallo@REDACTED> wrote:

>
> In my continuing quest to learn erlang, I'm trying to implement a
> 1000x1000 grid into which items may be placed or removed, with perhaps a
> millon items in placement at once, spread fairly evenly across the grid.
>  No grid cell will hold more than a thousand things.  In the future, this
> grid may be shared across multiple processes, but for now I'm content to
> just try to get it working for one.
>
> In the mutable world, you might structure it as a hashtable of hashes, or
> hashtable of sets etc., and use the coordinate tuple {x,y} as a key into
> the hash table, then mutate the list.  addtocell(Coordinate, Value) and
> removefromcell(Coordinate, Value) are very small, easy, and extremely fast
> (sub-second for a million iterations of addtocell()).
>
> In erlang world, the naive approach of attempting to use a list of lists
> (and then a list of sets), and lists:keytake() worked for small grid
> dimensions (10x10) but ran for several minutes before I killed it when I
> moved up in grid size and tried to insert a million things, with no
> indication when it would ever finish.
>
> The slightly less naive approach of using ets worked for larger grid
> dimensions, but was also highly nonperformant (60s for a million iterations
> of addtocell()).
>
>
If I use the cell-id (Row-index, Col-index) as the key for each row in an
ETS table, I am able to insert a million cells in about 1.25 seconds. The
timing is the same whether the ets table is of type set or bag. Have I
misunderstood your problem?

16> grid:init_grid(set).
<0.81.0>
17> grid:add_cells(1000, 1000).
1252287
18> ets:info(grid, size).
1000000
19>
19> grid ! stop.
stop
20>
20>
20> grid:init_grid(bag).
<0.86.0>
21> grid:add_cells(1000, 1000).
1248606
22> grid:add_cells(1000, 1000).
1247905
23> grid:add_cells(1000, 1000).
1345457
24> ets:info(grid, size).
3000000

%%------------------------ grid.erl --------------------
-module(grid).
-compile(export_all).

init_grid(Type) ->
    proc_lib:spawn(?MODULE, grid_owner, [Type]).

grid_owner(Type) ->
    ets:new(grid, [named_table, public, Type]),
    register(grid, self()),
    receive
        stop ->
            ok
    end.

add_cells(Num_rows, Num_cols) ->
    Start = now(),
    add_to_cell(Num_rows, Num_cols),
    End = now(),
    timer:now_diff(End, Start).

add_to_cell(0, _) ->
    ok;
add_to_cell(Row_num, Num_cols) ->
    add_to_row(Row_num, Num_cols),
    add_to_cell(Row_num - 1, Num_cols).

add_to_row(_, 0) ->
    ok;
add_to_row(Row_num, Col_num) ->
    ets:insert(grid, {{Row_num, Col_num}, os:timestamp()}),
    add_to_row(Row_num, Col_num - 1).
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20130706/6b9a3c8f/attachment.htm>


More information about the erlang-questions mailing list