<div class="gmail_quote"><br></div><div class="gmail_quote">Hi Joe,</div><div class="gmail_quote"><br></div><div class="gmail_quote">The main problem is finding out which strings are read-only and which are read-write. That requires an algorithm of its own (with processing time and extra space whose cost I cannot estimate at the moment), because I don't know in advance which strings will be used more frequently and which less. The second problem is that I would like to minimize hard disk usage, that is, to keep as much information as possible in RAM without slowing down the overall process. I know, I am an idealist. :)</div>

<div class="gmail_quote"><br></div><div class="gmail_quote">I also thought about working with lists and keeping them as binaries when I am not using them but, as I said before, that produces a lot of garbage to collect. It can be collected immediately after invoking list_to_binary/1, left for the GC to pick up naturally when memory runs short, or collected at chosen moments (either at regular time intervals or by a scheduler driven by the application usage). I am afraid that all of these may be quite inefficient, but they may still be faster than processing binaries directly. I have no idea yet, which is why I am asking here for opinions.</div>
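<div class="gmail_quote"><br></div><div class="gmail_quote">To make the list-to-binary idea concrete, here is a rough sketch of what could be measured. It assumes every character fits in one byte (list_to_binary/1 fails on code points above 255; unicode:characters_to_binary/1 would be needed for those). The module name string_pack and its function names are made up for illustration, and erts_debug:flat_size/1 is an undocumented debugging helper, so treat the size figure as an estimate only.</div>

```erlang
-module(string_pack).
-export([pack/1, unpack/1, sizes/1, timed_pack/1]).

%% Convert an idle string (a list of byte-sized characters) to a binary.
pack(String) when is_list(String) ->
    list_to_binary(String).

%% Convert the binary back to a list when the string is needed again.
unpack(Bin) when is_binary(Bin) ->
    binary_to_list(Bin).

%% Return {BytesAsList, BytesAsBinary} for the same string.
%% flat_size/1 reports heap words, so multiply by the word size in bytes.
sizes(String) when is_list(String) ->
    ListBytes = erts_debug:flat_size(String) * erlang:system_info(wordsize),
    {ListBytes, byte_size(pack(String))}.

%% Time the conversion in microseconds, then force a collection of the
%% calling process's heap. The original list is only reclaimed if nothing
%% references it any more (the caller may still hold it).
timed_pack(String) when is_list(String) ->
    {Micros, Bin} = timer:tc(fun() -> pack(String) end),
    true = erlang:garbage_collect(),
    {Micros, Bin}.
```

<div class="gmail_quote">Calling string_pack:sizes/1 and string_pack:timed_pack/1 on strings of realistic length should show how much RAM the binary form saves and what the conversion costs in time, which is the trade-off I am unsure about.</div>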

<div class="gmail_quote"><br></div><div class="gmail_quote">Nevertheless, I hadn't thought of splitting the strings into two categories: read-only and read-write. That is definitely something I should take into account.</div>
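<div class="gmail_quote"><br></div><div class="gmail_quote">One possible shape for such a split, as a sketch only: the read-only part is kept as a binary and the writable part as a list. The module name split_string, the {Frozen, RevTail} tuple layout, and the choice to store the writable part reversed (so appending stays cheap) are my own assumptions, not something taken from the thread. It also assumes byte-sized characters and that writes are appends at the end.</div>

```erlang
-module(split_string).
-export([new/1, append/2, freeze/1, to_binary/1]).

%% A split string is {Frozen, RevTail}:
%%   Frozen  - the read-only part, stored compactly as a binary
%%   RevTail - the writable part, a list kept in reverse order

%% Start from an existing read-only binary with an empty writable part.
new(ReadOnly) when is_binary(ReadOnly) ->
    {ReadOnly, []}.

%% Append characters to the writable part only.
%% lists:reverse/2 reverses Chars onto the front of the reversed tail,
%% so the cost depends on length(Chars), not on the whole string.
append({Frozen, RevTail}, Chars) when is_list(Chars) ->
    {Frozen, lists:reverse(Chars, RevTail)}.

%% Produce the whole string as one binary.
to_binary({Frozen, RevTail}) ->
    Tail = list_to_binary(lists:reverse(RevTail)),
    <<Frozen/binary, Tail/binary>>.

%% Fold the writable part into the read-only binary once editing is done.
freeze(Split) ->
    {to_binary(Split), []}.
```

<div class="gmail_quote">For example, starting from split_string:new(&lt;&lt;"abc"&gt;&gt;) and appending "de" and then "f", split_string:to_binary/1 gives back the bytes of "abcdef". Whether this pays off depends on how large the writable part grows before it is frozen.</div>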

<div class="gmail_quote"><br></div><div class="gmail_quote">Thanks a lot for your thoughts and shared experience.</div><div class="gmail_quote"><br></div><div class="gmail_quote">Cheers,</div><div class="gmail_quote">CGS</div>

<div class="gmail_quote"><br></div><div class="gmail_quote"><br></div><div class="gmail_quote"><br></div><div class="gmail_quote"><br></div><div class="gmail_quote">On Thu, Jul 12, 2012 at 5:17 PM, Joe Armstrong <span dir="ltr"><<a href="mailto:erlang@gmail.com" target="_blank">erlang@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">As you point out, list processing is faster than binary processing.<br>

<br>

I'd keep things as lists as long as possible until you run into memory problems.<br>

If you plot the number of strings against response times (or whatever)<br>

you should see a sudden decrease in performance when you start paging.<br>

At that point you have too much in memory - You could turn the oldest strings<br>

into binaries to save space.<br>

<br>

I generally keep strings as lists when I'm working on them and turn<br>

them into binaries<br>

when I'm finished - sometimes even compressed binaries.<br>

<br>

Then it depends on the access patterns on the strings - random<br>

read-write access is horrible.<br>

If you can split them into a read-only part and a write-part, you<br>

could keep the read-only bit<br>

as a binary and the writable bit as a list.<br>

<br>

It's worth spending a lot of effort to save a single disk access. Then<br>

it depends what you do with your strings. If you have a solid state<br>

disk and want read only access to the strings<br>

then you could store them on disk - or at least arrange so that the<br>

constant parts of the strings<br>

are on disk and the variable parts in memory. SSDs are about 10 times<br>

slower than RAM for reading and usually have multiple controllers so<br>

can be very fast - but you need to think a bit first.<br>

<br>

I'd start with a few measurements, try to stress the system and see<br>

where things go wrong.<br>

Plot the results - it's usually easy to see when things go wrong.<br>

<br>

Cheers<br>

<span class="HOEnZb"><font color="#888888"><br>

/Joe<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

<br>

<br>

On Thu, Jul 12, 2012 at 2:46 PM, CGS <<a href="mailto:cgsmcmlxxv@gmail.com">cgsmcmlxxv@gmail.com</a>> wrote:<br>

> Hi,<br>

><br>

> I am trying to find a balance in between processing speed and RAM<br>

> consumption for sets of large strings (over 1 M characters per string). To<br>

> construct such lists is much faster than constructing its binary<br>

> counterpart. On the other hand, lists are using more RAM than binaries, and<br>

> that reduces the number of strings I can hold in memory (unless I transform<br>

> the lists in binaries and call GC after that, but that slows down the<br>

> processing time). Has anyone had this problem before? What was the solution?<br>

> Thoughts?<br>

><br>

> A middle way in between lists and binaries is using tuples, but handling<br>

> them is not as easy as in the case of lists or binaries, especially at<br>

> variable tuple size. Therefore, working with tuples seems not a good<br>

> solution. But I might be wrong, so, if anyone used tuples in an efficient<br>

> way for this case, please, let me know.<br>

><br>

> Any thought would be very much appreciated. Thank you.<br>

><br>

> CGS<br>

><br>

><br>

</div></div><div class="HOEnZb"><div class="h5">> _______________________________________________<br>

> erlang-questions mailing list<br>

> <a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>

> <a href="http://erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a><br>

><br>

</div></div></blockquote></div><br>