[erlang-questions] lists, binaries or something else?

CGS cgsmcmlxxv@REDACTED
Fri Jul 13 17:57:17 CEST 2012


Yes, you are right. When I did the test, I thought also about that, so, I
made a third loop which should have appended the char to the binary string.
Unfortunately, I had a typo there and I ended up having the same loop for
binaries (and therefore the same numbers). Your message made me check it
again and I found that typo. I didn't find quite those ratios you got, but,
yes, they are comparable with your ratios. Thanks a lot.



On Fri, Jul 13, 2012 at 4:51 PM, Anders Nygren <anders.nygren@REDACTED>wrote:

> For some cases appending to a binary is much faster, see
> http://www.erlang.org/doc/efficiency_guide/binaryhandling.html#id65923
>
> So Your simple benchmark is not fair, You are comparing the best case
> for lists with the worst case for binaries. If You change it to
> appending instead of prepending You will get very different results.
> So as one of the first replies said "it depends".
>
> On my machine:
> For small N prepending to a list is ~2x faster than appending to a binary.
> When N=1E6 appending to a binary is ~2x faster than prepending to a list.
>
> /Anders
>
>
> On Fri, Jul 13, 2012 at 6:48 AM, CGS <cgsmcmlxxv@REDACTED> wrote:
> > Hi Bob,
> >
> > On Fri, Jul 13, 2012 at 2:32 AM, Bob Ippolito <bob@REDACTED> wrote:
> >>
> >> Sorry, but that's not enough information.
> >
> >
> > Sorry, but I didn't know what you meant by that. Here are the answers.
> >
> >>
> >>
> >> Where do these chunks come from: source code, some other process, ets?
> >
> >
> > Mostly from other processes.
> >
> >>
> >> How many chunks are in a string?
> >
> >
> > That is computed from the string size and chunk size (to be decided
> later).
> >
> >>
> >> Do you compose strings out of other strings, or just these chunks?
> >
> >
> > Just chunks inserted into existing/new strings.
> >
> >>
> >> Are you constructing them from tail to head like you would a list?
> >
> >
> > Unfortunately, not all the time.
> >
> >>
> >> Is the string constructed all at once, or over some time?
> >
> >
> > If you mean by that the string will be fully given in the same message or
> > whatever by other processes, the answer is no. Over some time may be the
> > answer, but with the remark that I have no idea what means "over some
> time"
> > as period of time (can get chunks one after the other for the same
> string or
> > for different strings).
> >
> >>
> >>
> >> It sounds like you have some code that's doing something with these
> >> strings, because you say that it's faster to use lists than binaries.
> Maybe
> >> if you post whatever it is you used to determine that, someone can help
> you
> >> find a better algorithm and/or data structure for that benchmark.
> >
> >
> > That is simple. For benchmark I just took two simple loops (only for
> > insertion):
> >
> > -export([loop_list/1, loop_binary/2]).
> >
> > loop_list(0) -> [];
> > loop_list(N) -> [107 | loop_list(N)].
> >
> > loop_binary(0,B) -> B;
> > loop_binary(N,B) -> loop_binary(N-1,<<107,B/binary>>).
> >
> > If you go in powers of tens and measure the execution time, you will see
> the
> > difference (you will also notice the drop in efficiency for binaries when
> > you need fragmentation for the same string, which is not visible in the
> case
> > of lists - or at least I couldn't notice). That is not a fully conclusive
> > test for my case, but it is quite a conclusive test for processing speed
> in
> > the two cases (simple expansion of the string by adding a char at the
> > beginning of the string), in which, for a 10^5 chars string, the list
> gains
> > at least one order of magnitude in processing time than its binary
> > counterpart (1) (2) (3).
> >
> > (1) Even if you change the list loop to have an accumulator in which the
> > list exists, there is still one order of magnitude difference.
> > (2) The results are specific for the machine on which the tests are done.
> > (3) That is just an example.
> >
> > CGS
> >
> >
> >>
> >>
> >>
> >> On Thursday, July 12, 2012, CGS wrote:
> >>>
> >>> Unfortunately, the project is still in the planning stage, so, no real
> >>> code was written yet. Nevertheless, I plan some open source projects
> for
> >>> some parts of the project.
> >>>
> >>> About each string, it is constructed from chunks of fixed size,
> usually,
> >>> much smaller than the string itself, hopefully.
> >>>
> >>>
> >>>
> >>> On Thu, Jul 12, 2012 at 7:29 PM, Bob Ippolito <bob@REDACTED> wrote:
> >>>
> >>> It would be helpful if you were a lot more specific about how these
> >>> strings are constructed and what the strings mean. Maybe if you shared
> some
> >>> of the code, you'd get better guidance.
> >>>
> >>>
> >>> On Thu, Jul 12, 2012 at 9:06 AM, CGS <cgsmcmlxxv@REDACTED> wrote:
> >>>
> >>>
> >>> Hi Joe,
> >>>
> >>> The main problem is to find out which strings are read-only and which
> >>> strings are read-write, and that requires an algorithm for itself
> >>> (processing time and extra space - I don't know how negligible are at
> this
> >>> moment) as I don't know from before which string will be used more
> >>> frequently and which less frequently. The second problem is I would
> like to
> >>> minimize the harddisk usage, so, to try to store as much information as
> >>> possible in RAM, but without slowing down the overall process. I know,
> I am
> >>> an idealist. :)
> >>>
> >>> I thought also about working with lists and keep them as binaries when
> I
> >>> don't use them, but, as I said before, that implies a lot of garbage to
> >>> collect which either can be collected immediately after invoking
> >>> list_to_binary/1, either allowing GC to appear naturally when there is
> >>> insufficient memory, or to invoke it at certain moments (either at
> regular
> >>> interval of time or based on a scheduler triggered by the application
> >>> usage). I am afraid that all may be quite inefficient, but they may
> work
> >>> faster than processing binaries directly. That I have no idea yet.
> That's
> >>> why I am asking here for opinions.
> >>>
> >>> Nevertheless, I didn't think of trying to split the strings in two
> >>> categories: read-only and read-write. That definitely is something I
> should
> >>> take into account.
> >>>
> >>> Thanks a lot for your thoughts and shared experience.
> >>>
> >>> Cheers,
> >>> CGS
> >>>
> >>>
> >>>
> >>>
> >>> On Thu, Jul 12, 2012 at 5:17 PM, Joe Armstrong <erlang@REDACTED>
> wrote:
> >>>
> >>> As you point out list processing is faster than binary processing.
> >>>
> >>> I'd keep things as lists as long as possible until you run into memory
> >>> problems
> >>> If you plot the number of strings against response times (or whatever)
> >>> you should see a sudden decrease in performance when you start paging.
> >>> At that point you have too much in memory - You could turn the oldest
> >>> strings
> >>> into binaries to save space.
> >>>
> >>> I generally keep string as lists when I'm working on them and turn
> >>> them into binaries
> >>> when I'm finished - sometimes even compressed binaries.
> >>>
> >>> Then it depends on the access patterns on the strings - random
> >>> read-write access is horrible
> >>> if you can split them into a read-only part and a write-part, you
> >>> could keep the read-only bit
> >>> as a binary and the writable bit as a list.
> >>>
> >>> It's worth spending a lot of effort to save a single disk access. Then
> >>> it depends what you do with your strings. If you have a solid state
> >>> disk and want read only access to the strings
> >>> then you could store them on disk - or at least arrange so that the
> >>> constant parts of the strings
> >>> are on disk and the variable parts in memory. SSDs are about 10 times
> >>> slower than RAM for reading and usually have multiple controllers so
> >>> can be very fast - but you need to think a bit first.
> >>>
> >>> I'd start with a few measurements, try to stress the system and see
> >>> where things go wrong.
> >>> Plot the results - it's usually easy to see when things go wrong.
> >>>
> >>> Cheers
> >>>
> >>> /Joe
> >>>
> >>>
> >>>
> >>> On Thu, Jul 12, 2012 at 2:46 PM, CGS <cgsmcmlxxv@REDACTED> wrote:
> >>> > Hi,
> >>> >
> >>> > I am trying to find a balance in between processing speed and RAM
> >>> > consumption for sets
> >
> >
> >
> > _______________________________________________
> > erlang-questions mailing list
> > erlang-questions@REDACTED
> > http://erlang.org/mailman/listinfo/erlang-questions
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20120713/05c91974/attachment.htm>


More information about the erlang-questions mailing list