File sharing software

Wed May 3 01:20:14 CEST 2006

Richard,

Thanks for the reply. My response is below.

On May 2, 2006, at 5:34 PM, Richard A. O'Keefe wrote:

> Yariv Sadan <yariv@REDACTED> wrote:
> 	Some issues that I didn't find very reassuring, however, were
> 	mnesia's rather limited table size (2 GB according to
> 	http://www.erlang.org/doc/doc-5.4.13/lib/stdlib-1.13.12/doc/html/index.html)
>
> I presume this is a reference to the following sentence in the
> documentation for the 'dets' module:
>
>     The size of Dets files cannot exceed 2 GB.
>
> But what about the very next sentence?
>
>     If larger tables are needed, Mnesia's table fragmentation can be
>     used.
>
> Being a bear of very little brain, I thought this meant that while
> *dets* was limited to 2GB for a table, *mnesia* wasn't, and when I
> look at the Mnesia manual, section 1.5.3 seems to say that very thing:
>
>     1.5.3 Table Fragmentation
>
>     The Concept
>
>     A concept of table fragmentation has been introduced in order to
>     cope with very large tables.  The idea is to split a table into
>     several more manageable fragments.  Each fragment is implemented as
>     a first class Mnesia table and may be replicated, have indices etc.
>     as any other table.  But the tables may neither have local content
>     nor have the snmp connection activated.  ...
>
> I must be getting senile.  Try as I might, I cannot make this mean
> anything other than "mnesia can handle logical tables bigger than 2GB".

Yes, I did see that Mnesia can fragment Dets tables that must grow  
over 2GB, but this solution just wasn't very appealing to me because  
I'd rather use a storage engine that "just works" without my having  
to manage table fragmentation as the data grows. Fragmentation just  
sounded like it would be too fragile and require too much  
maintenance, but then again I don't have experience with Mnesia so I  
may very well be wrong on this point. (However, I did post a question  
about these issues to the mailing list and the only answers I got  
were that Mnesia was indeed not very good at handling large blobs.)
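
To make the maintenance concern concrete, here is a rough Python sketch (not Mnesia code) of what fragmenting a large table implies: how many 2 GB fragments a given data set would need, and how a key could be mapped to one of them. The function names and the CRC32 hash are illustrative assumptions only; Mnesia's own fragmentation uses its own hashing scheme.

```python
import zlib

# 2 GB ceiling per Dets file, as quoted from the dets documentation above.
DETS_LIMIT_BYTES = 2 * 1024**3


def fragments_needed(total_bytes, limit=DETS_LIMIT_BYTES):
    """Smallest number of fragments such that none has to exceed `limit`,
    assuming the data spreads perfectly evenly (real data will not)."""
    # Integer ceiling division avoids floating-point rounding.
    return max(1, -(-total_bytes // limit))


def fragment_for(key, n_fragments):
    """Map a key to a fragment index in [0, n_fragments).
    CRC32 is a stand-in hash chosen for this sketch, not what Mnesia uses."""
    return zlib.crc32(key.encode("utf-8")) % n_fragments


if __name__ == "__main__":
    one_terabyte = 1024**4
    n = fragments_needed(one_terabyte)
    print("fragments needed for 1 TB:", n)
    print("fragment for 'song.mp3':", fragment_for("song.mp3", n))
```

Under these assumptions a single terabyte already needs hundreds of fragments, and with a naive modulo scheme like this one, changing the fragment count moves most keys to a different fragment. That is the kind of upkeep I was hoping to avoid.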

Too bad you didn't address my other concern, which is that, according  
to the older discussions, large dets tables take a considerable  
amount of time to fix in the event of a crash. If you're planning on  
running a multi-terabyte production environment with a large number  
of users, this isn't a characteristic you want in your storage engine.

>
> 	Clearly, the size of the Erlang runtime made Erlang too heavy for the
> 	client, which needed to be as light as possible. I would have only
> 	considered Erlang if the runtime were no more than a few hundred KB.
>
> These days, that's a really strict limitation.
> To take a few examples lying around on my disc:
>
>     The Freely Distributable Math Library (fdlibm) compiles to a
>     binary (libm.a) that is 189 kB.
>
>     /usr/lib/libc.a is 1.84 MB (so this would apparently rule out
>     using C...)
>
>     The Perl-Compatible Regular Expression library (PCRE)
>     has a /usr/local/lib/libpcre.a that is 172 kB, and that's
>     *without* the "90kB" of extra tables needed to support Unicode
>     character properties (I decided I didn't want that yet).
>     *With* that, we'd be looking at 260 kB.
>
>     The GNU character encoding library, /usr/local/lib/libiconv.so,
>     is 1.28 MB
>
>     Can we think of a nice small language?  How about a language that
>     has _very_ similar data structures to Erlang, but is a much smaller
>     language.  Scheme!  That's it!  Well, /usr/local/lib/libgambc.so.1.1
>     is 1.46MB.
>
>     One thing Scheme and Erlang share is arbitrary precision integers.
>     So how big is the GNU mp library?  267 kB.
>
> If efficient bignum support is 267 kB, and if trig functions &c take
> 189 kB, and Erlang has bignums and trig functions (which it does),
> then there is NO WAY that an Erlang runtime can fit in "no more than
> a few hundred kB".
>

The tradeoff between using a high-level language and size/memory/performance
isn't always easy to weigh, and the decision of which
language to use depends on many variables such as the application's  
functionality, target audience, distribution channel, and the  
programmer's personal taste. As I said, my personal inclination is to  
stick with C/C++ and to use as few libraries as possible but I didn't  
suggest that's a universal truth. I actually think a stripped down  
version of Erlang would be pretty cool :)

Yariv