Handling huge amounts of data

Vlad Dumitrescu vlad_dumitrescu@REDACTED
Thu Jun 5 08:40:32 CEST 2003


> What are you doing with this data?  Does it have any
> regularity that can be captured in a function?  How
> much manipulation?  Would a binary work for you
> instead of lists?

Well, the data is a list of between 500,000 and 1,500,000 chunks, each 27
bytes long and stored either as a list of bytes or as a binary. I need to
compare each chunk with all the others in a non-trivial way (some 1000
similar tests) and select only those that pass the test.
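
To make the shape of the problem concrete, here is a minimal sketch of the
all-pairs pass over one flat binary holding the chunks back to back (at most
1,500,000 x 27 = 40,500,000 bytes). The module name, the function names and
the Pred fun are hypothetical; Pred stands in for the real test, which is not
shown here, and "pass" is assumed to mean that Pred holds against every other
chunk. This only changes the memory layout; the work is still quadratic.

```erlang
-module(chunks).
-export([select/2, chunk_at/2]).

-define(CHUNK_SIZE, 27).

%% Return the I-th (0-based) 27-byte chunk of Bin.
chunk_at(Bin, I) ->
    Skip = I * ?CHUNK_SIZE,
    <<_:Skip/binary, Chunk:?CHUNK_SIZE/binary, _/binary>> = Bin,
    Chunk.

%% Return the indices of the chunks for which Pred(Chunk, Other)
%% holds against every other chunk (assumed meaning of "pass").
select(Bin, Pred) when byte_size(Bin) rem ?CHUNK_SIZE =:= 0 ->
    N = byte_size(Bin) div ?CHUNK_SIZE,
    [I || I <- lists:seq(0, N - 1), passes(Bin, I, 0, N, Pred)].

%% Walk J over all chunk indices, skipping I itself, and stop at
%% the first failed comparison.
passes(_Bin, _I, N, N, _Pred) ->
    true;
passes(Bin, I, I, N, Pred) ->
    passes(Bin, I, I + 1, N, Pred);
passes(Bin, I, J, N, Pred) ->
    Pred(chunk_at(Bin, I), chunk_at(Bin, J))
        andalso passes(Bin, I, J + 1, N, Pred).
```

For example, chunks:select(Bin, fun(A, B) -> A =/= B end) would return the
indices of the chunks that occur only once in Bin.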

The problem I see is that the search is global, which means I can't use a
"divide and conquer" strategy [*]. There is no good locality of data access
either.
[*] More precisely, in the worst case "d&c" will fall back to "search all"
after much work.

I tried several ways of storing the data in memory: as a list of tuples, a
list of binaries, and an ets table. In all cases the VM becomes slower and
slower until it breaks.

I am trying to find something to do in Erlang so that I can present great
results and maybe open the door to further applications. Our product is a Web
application that uses J2EE, and it would make a great target, but this is the
third implementation in 3 years (don't ask why) and I feel nobody will accept
another one if this works...

regards,
Vlad


