[erlang-questions] The 2 GB limit

Valentin Micic v@REDACTED
Tue Nov 24 21:15:48 CET 2009


Yes, you can tune the number of slots -- I did try it once, but for
whatever reason it made no difference in my case (when the table grew
to around 400,000 records, performance deteriorated rapidly on inserts
and updates). It could be that the problem was elsewhere and that I
did not persevere in optimizing it -- it was so much easier to settle
for mnesia fragmentation, which, combined with an appropriate +A
value, made a substantial performance difference with
disc_only_copies tables (dets).
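
For what it's worth, the shape of that setup is roughly the following
(a minimal sketch from memory, untested here; the record, the fragment
count and the +A value are only illustrative):

    %% Start the VM with a larger async thread pool, e.g.:
    %%   erl +A 64 ...
    %% Then create a fragmented table whose fragments are
    %% disc_only_copies, i.e. one dets file per fragment
    %% (record and counts are illustrative):
    -record(kv, {key, value}).

    create_table() ->
        mnesia:create_table(kv,
            [{attributes, record_info(fields, kv)},
             {frag_properties, [{n_fragments, 32},
                                {n_disc_only_copies, 1},
                                {node_pool, [node()]}]}]).

    %% All access has to go through mnesia_frag, so that each key is
    %% hashed to the right fragment:
    write(Key, Value) ->
        mnesia:activity(transaction,
                        fun() -> mnesia:write(#kv{key = Key, value = Value}) end,
                        [], mnesia_frag).

Each fragment is then a separate dets file that stays well below the
2 GB limit.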

I will be very glad if you arrive at a different conclusion in your
min_no_slots testing.

V/

-----Original Message-----
From: Igor Ribeiro Sucupira [mailto:igorrs@REDACTED] 
Sent: 24 November 2009 08:24 PM
To: Valentin Micic
Cc: erlang-questions
Subject: Re: [erlang-questions] The 2 GB limit

Hello, Valentin.

I took a quick look at dets_v9.erl and I understand what you mean: as
you insert more records, you will have more and more data in each
slot.

But, from the documentation
(http://erlang.org/doc/man/dets.html), it seems you can tune the
number of slots (see the min_no_slots and max_no_slots options).
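
For instance, something like this should let you pre-size the hash
table when the file is created (an untested sketch; the name, path and
slot count are only illustrative -- I would set min_no_slots near the
number of records you expect):

    %% Ask dets to pre-allocate the slot structure for roughly the
    %% number of records we expect, so it does not have to grow it
    %% while the table is being loaded (values are illustrative):
    {ok, T} = dets:open_file(my_table,
                             [{file, "/data/my_table.dets"},
                              {type, set},
                              {min_no_slots, 10000000}]),
    ok = dets:insert(T, {some_key, some_value}),
    ok = dets:close(T).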

Today you have to create enough fragments so that none of them will
approach the 2 GB limit. If that limit were raised to, say, 4 GB, you
could create your dets files in advance with more slots, thus being
able to store more data without losing performance (and using fewer
fragments).
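
To put rough numbers on that (assuming the records hash evenly): with
100 GB of data and a 2 GB cap you need at least 50 fragments, and in
practice more, to keep each file comfortably below the limit; with a
4 GB cap, half as many files would do.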

I have never tried to fine-tune the number of slots in dets files (I
have only seen it in the docs), so maybe what I am suggesting causes
other performance problems (they do say something about it in the
documentation, but I'll have to read the source code to understand
it).

Thank you.
Igor.

On Tue, Nov 24, 2009 at 2:49 PM, Valentin Micic <v@REDACTED>
wrote:
> If you assume data size (as in record size) to be constant, then file
> size is directly proportional to the number of records.
>
> OTOH, if you take the file size as a constant (as in 2 GB), then the
> number of records is inversely proportional to the data size (as in
> record size).
>
> I've used the term "data density" (for lack of a better word) to
> express the level of distribution of data among dets slots -- the
> higher the number of records per slot, the higher the data density.
> In this context, the data density is very much a function of the
> hashing algorithm used to map a particular key value to a given slot.
> Assuming the hashing algorithm maps keys (relatively) evenly, the
> "data density" ends up being a function of the number of records in
> the dets file.
>
> What I was trying to tell you is that updates on a dets table that
> contains a high number of records (and hence more records per dets
> slot, therefore higher "data density") may be quite
> processing-intensive, as the probability of not being able to fit the
> resulting data within the memory allocated to a particular slot is
> higher with a higher "data density".
>
> With this in mind, I've been asking whether increasing the dets file
> size beyond 2 GB would make any sense (because a bigger file
> inevitably leads to a higher "data density") -- it is far healthier
> to distribute the data across multiple dets files and keep the "data
> density" per file at a lower level for the same amount of data.
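>
> To put rough numbers on it (purely illustrative, and assuming the
> hash spreads keys evenly): 400,000 records over 50,000 slots is about
> 8 records per slot; grow the same file to 4,000,000 records without
> adding slots and you are at about 80 per slot, so each update works
> against a much larger bucket and is far more likely to overflow the
> space allocated to it.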
>
> My apologies for the confusion.
>
> V/
>
>
> -----Original Message-----
> From: Igor Ribeiro Sucupira [mailto:igorrs@REDACTED]
> Sent: 24 November 2009 05:02 PM
> To: Valentin Micic
> Cc: erlang-questions
> Subject: Re: [erlang-questions] The 2 GB limit
>
> On Tue, Nov 24, 2009 at 12:47 PM, Valentin Micic <v@REDACTED>
> wrote:
>> I must express my skepticism regarding dets performance when a file
>> reaches such a level of "data density"
>
> Hello.
>
> Since I don't know very much about the inner workings of dets files,
> maybe I am confused by the terminology: wouldn't data density be
> inversely proportional to file size and directly proportional to data
> size?
>
> Thank you.
> Igor.
>
>> (assuming that 2 GB translates to at least a few million records).
>> As much as I believe that reading from such a big file may be
>> relatively fast, I think that updating the data would be
>> disproportionately slower. In my experience, the only way to
>> maintain good performance with sizable data sets (say, 50-150
>> million records) in dets is to distribute the data over a number of
>> dets files (similar to what mnesia fragmentation does).
>> So, if my skepticism is justifiable, the question is not how
>> difficult it would be to increase the dets file size limit, but
>> whether there would be any point in doing it.
>>
>> V/
>>
>> -----Original Message-----
>> From: erlang-questions@REDACTED [mailto:erlang-questions@REDACTED] On
>> Behalf Of Igor Ribeiro Sucupira
>> Sent: 24 November 2009 04:24 PM
>> To: erlang-questions
>> Subject: [erlang-questions] The 2 GB limit
>>
>> Hi.
>>
>> Is there any plan (or work in progress) for removing the 2 GB size
>> limit of dets files?
>>
>> And, until that is accomplished, do you think it could be easier to
>> raise that limit to 4 GB? Assuming the issue is related to 32-bit
>> addressing, I'm guessing (just guessing) that dealing with 4 GB
>> files should not be difficult. Am I wrong? Why?
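>>
>> (To spell out the arithmetic behind that guess: a signed 32-bit file
>> offset tops out at 2^31 - 1 = 2,147,483,647 bytes, which is
>> presumably where the 2 GB limit comes from; treating the same 32
>> bits as unsigned would reach 2^32 bytes = 4 GB, which is why the
>> smaller jump might be cheap.)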
>>
>> Thanks.
>> Igor.
>>
>> --
>> "The secret of joy in work is contained in one word - excellence. To
>> know how to do something well is to enjoy it." - Pearl S. Buck.
>>
>> ________________________________________________________________
>> erlang-questions mailing list. See http://www.erlang.org/faq.html
>> erlang-questions (at) erlang.org
>>
>>
>
>
>
> --
> "The secret of joy in work is contained in one word - excellence. To
> know how to do something well is to enjoy it." - Pearl S. Buck.
>
>



-- 
"The secret of joy in work is contained in one word - excellence. To
know how to do something well is to enjoy it." - Pearl S. Buck.


