[erlang-questions] Needed: Great big ordered int set.

Jachym Holecek freza@REDACTED
Mon Jul 8 20:21:17 CEST 2013


# Alex Arnon 2013-07-08:
> I'll probably end up doing just that, but was hoping I could resolve the
> thing in-process.

You could experiment with packing your fixed-width integers into binaries. Keep
integers sorted so you can bsearch on given binary chunk. Then build some index
on top of these, say and ordered_set ETS. If you're concerned about memory you
can also leverage in-chunk ordering for some simple but (probably) efficient
compression scheme (this will complicate lookup a bit, but also means higher
data density -- have a look at literature around column-oriented datastores).

Insert accumulates suitable number of entries on-heap, then builds up another
chunk and inserts into toplevel index -- ETS entry could be {Min_item, Bin}.
Lookup finds chunk (see ets:next/X, ets:prev/X) and searches in it. Foreach
and fold-merge are sweet because you can easily process N chunks in parallel
if you like (note fold-merge as opposed to inherenly sequential fold!). Range
queries are obvious. Random inserts would be pretty expensive I think.

Note I am here assuming that binaries are kept as off-heap objects even when
inserted in ETS. I believe that's the case, but haven't checked -- there are
viable alternatives (trees of processes were mentioned recently) so no big
deal even if I'm wrong.

So no: there probably isn't a ready-made structure for this in Erlang, but it
doesn't sound terribly hard to build something yourself -- you've mentioned
so many handy assumptions you can make that it almost looks easy. It's not
the kind of problem I would personally use an enterprise relational database
for... :-)

BR,
	-- Jachym



More information about the erlang-questions mailing list