[erlang-questions] The 2 GB limit

Valentin Micic <>
Tue Nov 24 17:49:13 CET 2009


If you assume the data size (as in record size) to be constant, then the
file size is directly proportional to the number of records.

OTOH, if you take the file size as a constant (as in 2 GB), then the number
of records is inversely proportional to the data size (as in record size).
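The code below puts rough numbers on the second case. It is a minimal Python sketch that assumes a hard cap of 2 GB (taken here as 2 * 1024^3 bytes) and assumes every byte of the file holds record data. Real dets files also carry headers, slot tables and free-space bookkeeping, so these figures are upper bounds, not measured dets capacities. `max_records` is a hypothetical helper.

```python
# Sketch: with a fixed file-size cap, the number of records that fit is
# inversely proportional to the record size.
# ASSUMPTION: every byte of the file holds record data (no dets headers,
# slot tables or free-list overhead), so results are upper bounds only.

FILE_CAP_BYTES = 2 * 1024**3  # assumed value for the 2 GB limit


def max_records(record_size_bytes: int, file_cap: int = FILE_CAP_BYTES) -> int:
    """Upper bound on how many fixed-size records fit under file_cap."""
    if record_size_bytes <= 0:
        raise ValueError("record size must be positive")
    return file_cap // record_size_bytes


if __name__ == "__main__":
    # Doubling the record size roughly halves the number of records.
    for size in (100, 200, 1000):
        print(f"{size}-byte records: at most {max_records(size)} records")
```

Even at 1000 bytes per record, the cap still allows on the order of two million records under these assumptions.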

I've used the term "data density" (for lack of a better word) to
express the level of distribution of data among dets slots: the higher
the number of records per slot, the higher the data density. In this
context, the value of data density is very much a function of the
hashing algorithm used to map a particular key value to a given slot.
Assuming that the hashing algorithm maps keys (relatively) evenly, the
"data density" ends up being a function of the number of records in
the dets file.
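A small simulation can illustrate this argument under its stated assumption of an even hash. This Python sketch assumes a fixed, hypothetical number of slots and uses SHA-1 as a stand-in hash. It does not model how dets actually lays out or resizes its hash table. The sketch reports the average number of records per slot and the count in the fullest slot as the number of records grows.

```python
import hashlib
from collections import Counter

NUM_SLOTS = 1024  # hypothetical, fixed slot count (not a dets parameter)


def slot_for(key: str, num_slots: int) -> int:
    # Stable hash of the key reduced to a slot index. SHA-1 is only a
    # stand-in here, not the hash dets actually uses.
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slots


def data_density(num_records: int, num_slots: int) -> tuple[float, int]:
    """Return (average, maximum) records per slot for num_records keys."""
    counts = Counter(slot_for(f"key-{i}", num_slots) for i in range(num_records))
    return num_records / num_slots, max(counts.values())


if __name__ == "__main__":
    for n in (10_000, 100_000):
        avg, worst = data_density(n, NUM_SLOTS)
        print(f"{n} records: avg {avg:.1f} per slot, fullest slot {worst}")
```

With the slot count held fixed, ten times as many records gives ten times the average per slot. The fullest slot is always at least as full as the average.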

What I was trying to tell you is that updates on a dets file that
contains a high number of records (and hence more records per dets
slot, and therefore a higher "data density") may be quite
processing-intensive, because the probability of not being able to fit
the resulting data within the memory allocated to a particular slot is
higher with a higher "data density".

With this in mind, I was asking whether increasing the dets file size
limit beyond 2 GB would make any sense, because a bigger file
inevitably leads to a higher "data density". It is far healthier to
distribute the data across multiple dets files and keep the "data
density" per file at a lower level for the same amount of data.
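The alternative suggested here is to split one large table into several smaller ones. It can be sketched as routing each key to one of N partitions by hash, roughly in the spirit of the mnesia fragmentation mentioned in the quoted message (though this is not its actual algorithm). In this hypothetical Python sketch, plain dicts stand in for separate dets files, and `PartitionedStore` is an illustrative name, not an existing API.

```python
import zlib


class PartitionedStore:
    """Hypothetical sketch: spread key/value records over several
    independent stores (plain dicts standing in for separate dets files),
    so each store holds roughly 1/num_parts of the data."""

    def __init__(self, num_parts: int):
        if num_parts < 1:
            raise ValueError("need at least one partition")
        self.parts = [dict() for _ in range(num_parts)]

    def _part(self, key: str) -> dict:
        # crc32 is stable across runs, unlike Python's salted built-in hash().
        return self.parts[zlib.crc32(key.encode("utf-8")) % len(self.parts)]

    def insert(self, key: str, value) -> None:
        self._part(key)[key] = value

    def lookup(self, key: str, default=None):
        return self._part(key).get(key, default)

    def sizes(self) -> list[int]:
        return [len(p) for p in self.parts]


if __name__ == "__main__":
    store = PartitionedStore(8)
    for i in range(80_000):
        store.insert(f"k{i}", i)
    print(store.sizes())  # each partition holds a fraction of the records
    print(store.lookup("k42"))
```

A plain "hash mod N" scheme remaps most keys when N changes, so adding partitions later means rewriting most of the data. Schemes such as linear hashing exist to make that kind of growth incremental.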

My apologies for the confusion.

V/


-----Original Message-----
From: Igor Ribeiro Sucupira [mailto:] 
Sent: 24 November 2009 05:02 PM
To: Valentin Micic
Cc: erlang-questions
Subject: Re: [erlang-questions] The 2 GB limit

On Tue, Nov 24, 2009 at 12:47 PM, Valentin Micic <> wrote:
> I must express my skepticism regarding dets performance when a file
> reaches such a level of "data density"

Hello.

Since I don't know very much about the inner workings of dets files,
maybe I am confused by the terminology: wouldn't data density be
inversely proportional to file size and directly proportional to data
size?

Thank you.
Igor.

> (assuming that 2 GB translates to at least a
> few million records). As much as I believe that reading from such a big
> file may be relatively fast, I think that updating the data would be
> disproportionately slower. In my experience, the only way to maintain
> good performance with sizable data sets (say, 50-150 million records)
> in dets is to distribute data over a number of dets files (similar to
> what mnesia fragmentation does).
> So, if my skepticism is justifiable, the question is not how difficult
> it is to increase the dets file size limit, but whether there would be
> any point in doing it.
>
> V/
>
> -----Original Message-----
> From:  [mailto:] On
> Behalf Of Igor Ribeiro Sucupira
> Sent: 24 November 2009 04:24 PM
> To: erlang-questions
> Subject: [erlang-questions] The 2 GB limit
>
> Hi.
>
> Is there any plan (or work in progress) for removing the 2 GB size
> limit of dets files?
>
> And, until this is accomplished, do you think it would be easier
> to raise that limit to 4 GB? Assuming the issue is related to
> 32-bit addressing, I'm guessing (just guessing) that dealing with 4 GB
> files should not be difficult. Am I wrong? Why?
>
> Thanks.
> Igor.
>
> --
> "The secret of joy in work is contained in one word - excellence. To
> know how to do something well is to enjoy it." - Pearl S. Buck.
>
> ________________________________________________________________
> erlang-questions mailing list. See http://www.erlang.org/faq.html
> erlang-questions (at) erlang.org
>
>





