[erlang-questions] VM & BEAM Specs : idea for improving the lists support

ok <>
Wed Aug 8 07:50:25 CEST 2007

I wrote:
>> Only if someone is daft enough to store the whole thing as a
>> simple list of characters.  There are much better ways, in the
>> language and using the implementation that we now have.

On 7 Aug 2007, at 5:57 am, David Mercer asked:
> For newbies among us, are you referring to storing it as a binary, or
> something else?

First possibility: don't store the whole thing.  Work on it a chunk
at a time.  (Think "SAX" rather than "DOM".)  Processing stuff in
chunks often lets you produce summaries as you go instead of waiting
until everything has been read.
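
A minimal sketch of the chunk-at-a-time idea, assuming the summary
wanted is just a count of newline bytes.  The module name, function
names and the 64 KB chunk size are illustrative choices, not anything
from the thread.  At most one chunk is live at any moment.

```erlang
-module(chunk_count).
-export([count_newlines/1]).

%% Count newline bytes in a file, reading one 64 KB chunk at a time
%% so the whole file never has to sit in memory.
count_newlines(Path) ->
    {ok, Fd} = file:open(Path, [read, raw, binary]),
    try
        loop(Fd, 0)
    after
        file:close(Fd)
    end.

%% Read chunks until eof, folding each one into the running total.
%% A read error is not handled here and crashes with case_clause.
loop(Fd, Acc) ->
    case file:read(Fd, 65536) of
        {ok, Chunk} -> loop(Fd, Acc + count(Chunk, 0));
        eof -> Acc
    end.

%% Walk one chunk byte by byte.
count(<<$\n, Rest/binary>>, N) -> count(Rest, N + 1);
count(<<_, Rest/binary>>, N) -> count(Rest, N);
count(<<>>, N) -> N.
```

Any summary that can be updated incrementally (counts, checksums,
word frequencies) fits the same loop.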

Second possibility: use binaries (carefully).  With the <<"string">>
syntax this is readable, and in effect gives you precisely the byte
strings that other languages give you.  (We could do with more string
functions supporting this form, but that's another issue.)
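
As a small illustration of working with byte strings directly, here
is a sketch of a prefix test written with binary pattern matching.
strip_prefix/2 is a hypothetical helper, not a library function, and
byte_size/1 is a BIF from releases later than this message.

```erlang
-module(bin_demo).
-export([strip_prefix/2]).

%% If Bin starts with Prefix, return the remainder; otherwise nomatch.
%% Prefix is already bound, so the pattern compares bytes rather than
%% binding a new variable.
strip_prefix(Prefix, Bin) when is_binary(Prefix), is_binary(Bin) ->
    Size = byte_size(Prefix),
    case Bin of
        <<Prefix:Size/binary, Rest/binary>> -> {ok, Rest};
        _ -> nomatch
    end.
```

For example, strip_prefix(<<"GET ">>, <<"GET /index.html">>) yields
{ok, <<"/index.html">>}.  The remainder is a sub-binary referring to
the original data, which is one reason for the "carefully" above:
keeping a small piece can keep a large binary alive.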

Third possibility: represent structured text as tree structures with
only the "free" stuff as strings, which might themselves be binaries.
This can be so much easier to generate and manipulate that it really
isn't funny.  As a rule, "strings" are useful for input and output,
but not for processing.
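
One way this might look, assuming a deliberately tiny node format:
a node is either a binary of free text or a {Tag, Children} tuple.
That format and the render/1 function are made up for illustration,
and no escaping of the text is attempted.

```erlang
-module(tree_demo).
-export([render/1]).

%% Turn a tree into an iolist only at output time.
%% Free text is passed through untouched; elements wrap their children.
render(Text) when is_binary(Text) ->
    Text;
render({Tag, Children}) when is_atom(Tag), is_list(Children) ->
    Name = atom_to_binary(Tag, utf8),
    [<<"<">>, Name, <<">">>,
     [render(C) || C <- Children],
     <<"</">>, Name, <<">">>].
```

With this, iolist_to_binary(tree_demo:render({p, [<<"Hello, ">>,
{b, [<<"world">>]}]})) gives <<"<p>Hello, <b>world</b></p>">>.  All
manipulation happens on tuples and lists, and the result is an iolist
that can be handed to a port or socket without being flattened first.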

Fourth possibility: hold large chunks of "free" text as data in DETS or
Mnesia, and just keep keys pointing to these chunks in memory.  Only
pull a chunk in when you have an immediate use for it.  Once a potential
customer was visiting Quintus and they expressed a concern about working
with hundreds of megabytes of text in Prolog:  I went off to a terminal
and 90 minutes later had a library module that held "Character Large
Objects" in a file and in Prolog only kept {offset, length} references.
Doing it with DETS should be even easier.
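
A rough sketch of the DETS variant, assuming each chunk is a binary
stored under a caller-chosen key.  The module, the function names and
the table name clob_store are hypothetical; only the dets calls are
from OTP.

```erlang
-module(clob_demo).
-export([open/1, store/3, fetch/2, close/1]).

%% Open (or create) a disk table; returns {ok, Tab} on success.
open(File) ->
    dets:open_file(clob_store, [{file, File}, {type, set}]).

%% Write one chunk of text to disk under Key.
store(Tab, Key, Text) when is_binary(Text) ->
    dets:insert(Tab, {Key, Text}).

%% Pull a chunk into memory only when it is actually needed.
fetch(Tab, Key) ->
    case dets:lookup(Tab, Key) of
        [{Key, Text}] -> {ok, Text};
        [] -> not_found
    end.

close(Tab) ->
    dets:close(Tab).
```

The program keeps only the keys in memory and calls fetch/2 when it
has an immediate use for a chunk.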

Fifth possibility: build a dictionary of words and represent text as a
list of word numbers.  (Look up "spaceless word encoding"; a good
description can be found in the book "Managing Gigabytes".)
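
A simplified sketch of the dictionary idea, assuming the text has
already been split into words.  This is plain word numbering, not the
spaceless encoding itself, and it uses maps, which arrived in Erlang
long after this message.  The module and function names are
illustrative.

```erlang
-module(word_ids).
-export([encode/2, decode/2]).

%% Replace each word by its number, extending the dictionary when a
%% new word is seen.  Returns {Ids, UpdatedDict}.
encode(Words, Dict0) ->
    lists:mapfoldl(
      fun(Word, Dict) ->
              case maps:find(Word, Dict) of
                  {ok, Id} ->
                      {Id, Dict};
                  error ->
                      NewId = maps:size(Dict),
                      {NewId, maps:put(Word, NewId, Dict)}
              end
      end, Dict0, Words).

%% Invert the dictionary and map the numbers back to words.
decode(Ids, Dict) ->
    Rev = maps:fold(fun(Word, Id, Acc) -> maps:put(Id, Word, Acc) end,
                    #{}, Dict),
    [maps:get(Id, Rev) || Id <- Ids].
```

For example, encode([<<"to">>, <<"be">>, <<"or">>, <<"not">>,
<<"to">>, <<"be">>], #{}) returns the list [0,1,2,3,0,1] together
with a four-entry dictionary.  Each distinct word is stored once,
however often it occurs.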

Sixth possibility: oh the heck with it; it's _easy_ to think these up.
