[erlang-questions] lists, binaries or something else?

CGS cgsmcmlxxv@REDACTED
Thu Jul 12 18:06:18 CEST 2012


Hi Joe,

The main problem is finding out which strings are read-only and which are
read-write. That requires an algorithm of its own (costing processing time
and extra space; I don't know yet how negligible they are), because I don't
know in advance which strings will be used more frequently and which less
frequently. The second problem is that I would like to minimize hard disk
usage, that is, to keep as much information as possible in RAM, but without
slowing down the overall process. I know, I am an idealist. :)

I also thought about working with lists and keeping them as binaries when I
am not using them but, as I said before, that leaves a lot of garbage to
collect. It can be collected immediately after invoking list_to_binary/1,
left to the GC to reclaim naturally when memory runs short, or collected at
chosen moments (either at regular time intervals or from a scheduler
triggered by application usage). I am afraid that all of these may be quite
inefficient, but they may still be faster than processing binaries
directly. I have no idea yet, which is why I am asking here for opinions.
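
To make the first option concrete, here is a minimal sketch of "convert,
then collect immediately", with rough timings so the cost can be measured
rather than guessed. The module name pack_demo, the function names and the
shape of the returned stats are made up for illustration. It assumes the
string holds only byte values (0..255), since list_to_binary/1 rejects
anything else.

```erlang
-module(pack_demo).
-export([pack/1, run/0]).

%% Convert a string (a list of bytes 0..255) to a binary, then force a
%% garbage collection of the calling process. Returns {Binary, Stats}.
%% Note: the reported memory is the process's own memory; large binaries
%% live off the process heap and are not included in it.
pack(Str) ->
    {memory, Before} = erlang:process_info(self(), memory),
    %% timer:tc/3 returns {Microseconds, Result}.
    {ConvMicros, Bin} = timer:tc(erlang, list_to_binary, [Str]),
    {GcMicros, true} = timer:tc(erlang, garbage_collect, []),
    {memory, After} = erlang:process_info(self(), memory),
    Stats = [{conv_us, ConvMicros}, {gc_us, GcMicros},
             {mem_before, Before}, {mem_after, After}],
    {Bin, Stats}.

%% Hypothetical driver: pack a 1 M character string and print the stats.
run() ->
    {Bin, Stats} = pack(lists:duplicate(1000000, $a)),
    io:format("~p bytes packed, stats: ~p~n", [byte_size(Bin), Stats]).
```

The list is only reclaimed if nothing else still refers to it, so the
caller must not keep the original string around after calling pack/1.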

Nevertheless, I hadn't thought of splitting the strings into two
categories: read-only and read-write. That is definitely something I should
take into account.
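
One way such a split might look, as a sketch only: the read-only part is
kept as a binary and the writable part as a list. This assumes that writes
happen only at the end of the string, which is my simplification and not
something stated in the thread. The module and function names are
hypothetical.

```erlang
-module(split_string).
-export([new/1, append/2, freeze/1, to_binary/1]).

%% A string is {ReadOnly, RevTail}: ReadOnly is a binary, RevTail is the
%% writable tail kept as a reversed list so that appending is cheap.
new(ReadOnly) when is_binary(ReadOnly) ->
    {ReadOnly, []}.

%% Append a list of bytes (0..255) to the writable tail.
%% lists:reverse/2 reverses Chars and puts the result in front of RevTail.
append({ReadOnly, RevTail}, Chars) when is_list(Chars) ->
    {ReadOnly, lists:reverse(Chars, RevTail)}.

%% Fold the writable tail into the read-only binary once editing is done.
freeze({ReadOnly, RevTail}) ->
    Tail = list_to_binary(lists:reverse(RevTail)),
    {<<ReadOnly/binary, Tail/binary>>, []}.

%% Flatten to a single binary, for example before writing it out.
to_binary(S) ->
    {Bin, []} = freeze(S),
    Bin.
```

For example, new(<<"abc">>) followed by append/2 with "de" and then "f"
gives <<"abcdef">> from to_binary/1, while only the short tail was ever
held as a list.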

Thanks a lot for your thoughts and shared experience.

Cheers,
CGS




On Thu, Jul 12, 2012 at 5:17 PM, Joe Armstrong <erlang@REDACTED> wrote:

> As you point out list processing is faster than binary processing.
>
> I'd keep things as lists as long as possible until you run into memory
> problems. If you plot the number of strings against response times (or
> whatever) you should see a sudden decrease in performance when you start
> paging. At that point you have too much in memory - You could turn the
> oldest strings into binaries to save space.
>
> I generally keep strings as lists when I'm working on them and turn them
> into binaries when I'm finished - sometimes even compressed binaries.
>
> Then it depends on the access patterns on the strings - random read-write
> access is horrible. If you can split them into a read-only part and a
> write-part, you could keep the read-only bit as a binary and the writable
> bit as a list.
>
> It's worth spending a lot of effort to save a single disk access. Then it
> depends what you do with your strings. If you have a solid state disk and
> want read only access to the strings then you could store them on disk -
> or at least arrange so that the constant parts of the strings are on disk
> and the variable parts in memory. SSDs are about 10 times slower than RAM
> for reading and usually have multiple controllers so can be very fast -
> but you need to think a bit first.
>
> I'd start with a few measurements, try to stress the system and see where
> things go wrong. Plot the results - it's usually easy to see when things
> go wrong.
>
> Cheers
>
> /Joe
>
>
>
> On Thu, Jul 12, 2012 at 2:46 PM, CGS <cgsmcmlxxv@REDACTED> wrote:
> > Hi,
> >
> > I am trying to find a balance in between processing speed and RAM
> > consumption for sets of large strings (over 1 M characters per string).
> > To construct such lists is much faster than constructing its binary
> > counterpart. On the other hand, lists are using more RAM than binaries,
> > and that reduces the number of strings I can hold in memory (unless I
> > transform the lists in binaries and call GC after that, but that slows
> > down the processing time). Has anyone had this problem before? What was
> > the solution? Thoughts?
> >
> > A middle way in between lists and binaries is using tuples, but handling
> > them is not as easy as in the case of lists or binaries, especially at
> > variable tuple size. Therefore, working with tuples seems not a good
> > solution. But I might be wrong, so, if anyone used tuples in an efficient
> > way for this case, please, let me know.
> >
> > Any thought would be very much appreciated. Thank you.
> >
> > CGS