[erlang-questions] string concatenation efficiency

Mon Feb 1 17:01:30 CET 2016

On Sat, Jan 30, 2016 at 5:04 PM, Khitai Pang <khitai.pang@REDACTED> wrote:
> Thank you all for your answers.
>
> The concatenated string is a redis key, it will be part of an erlang message
> that will be sent to a redis client erlang process.   I tried sending the
> iolist ["Item:{",ItemId,"}"], and it works fine. I think this is the most
> efficient way.

It *is* the most efficient way - but remember that whatever you send, some
other program has to receive. The amount of work done in transporting the
message and delivering it to the receiver will far outweigh the cost of
building the message prior to sending it.

The rule of thumb is that parsing is *always* slower than serialising
messages. Why? In parsing you must take a decision at every byte. In
serialising a message you take a decision at every node in a parse tree.

So you should not be worrying about what you send - but you should worry about
how it will be received.

Building the message from variables into an I/O list or with "++"
has a trivial cost compared to all the machine cycles taken up
transporting the message.
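
To make the two ways of building the key concrete, here is a minimal sketch.
The module name redis_key and both function names are made up for
illustration; only the key format comes from this thread.

```erlang
-module(redis_key).
-export([key_iolist/1, key_flat/1]).

%% Build the key as an I/O list: nothing is copied, the pieces are
%% simply collected in a small nested list.
key_iolist(ItemId) ->
    ["Item:{", ItemId, "}"].

%% Build the key as one flat string: ++ copies its left operand,
%% so ItemId and the "Item:{" prefix are both copied once.
key_flat(ItemId) ->
    "Item:{" ++ ItemId ++ "}".
```

Both produce the same bytes once flattened, for example with
iolist_to_binary/1, so the receiver cannot tell which one was used.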

Even measuring the cost of different methods won't tell you much.
You'll find out that A (I/O lists) is faster or slower than B (++), but all the
real time is taken up in X and Y (which you didn't know about).
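
If you want to see the A-versus-B numbers anyway, a rough sketch using
timer:tc/1 could look like this. The module name key_bench, the helper
repeat/2 and the sample UUID are assumptions made up for the example, and the
result says nothing about the time spent in transport or in the receiver.

```erlang
-module(key_bench).
-export([run/1]).

%% Time N constructions of the key with each method.
%% Returns the elapsed wall-clock times in microseconds.
run(N) ->
    ItemId = "123e4567-e89b-12d3-a456-426614174000",  % made-up sample UUID
    {IoUs, ok} = timer:tc(fun() ->
        repeat(N, fun() -> ["Item:{", ItemId, "}"] end)
    end),
    {CatUs, ok} = timer:tc(fun() ->
        repeat(N, fun() -> "Item:{" ++ ItemId ++ "}" end)
    end),
    [{iolist_us, IoUs}, {concat_us, CatUs}].

%% Call F N times, discarding the results.
repeat(0, _F) -> ok;
repeat(N, F) when N > 0 ->
    _ = F(),
    repeat(N - 1, F).
```

Calling key_bench:run(1000000) gives two numbers to compare, but only an
end-to-end measurement tells you whether the whole system is fast enough.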

People happily send kilobytes of encrypted data in XML, when single
unencrypted bytes might have sufficed.

The only thing you should worry about is end-to-end efficiency.

So:

  1) Make everything
  2) Measure
  3) Fast enough?
      Yes
          Happy
      No
          Can I wait ten years?
          No
              measure to find the bit that takes the most time
              fix the worst bit - go to 2)
          Yes
              wait ten years - go to 2)

Patience is a virtue - the programs I wrote 20 years ago now run
thousands of times faster - not because I'm a better programmer but
because we have GHz clocks now (they were MHz 20 years ago).

You want to write clear, short, elegant code that will still run in 20 years' time.

Given the choice between clear code that is slow but fast-enough and
fast code that is unclear, I always choose the clear code.

The key point is fast-enough.

Cheers

/Joe

>
>
> Thanks
> Khitai
>
>
> On 2016/1/29 9:24, Richard A. O'Keefe wrote:
>>
>>
>> On 28/01/16 8:22 pm, Khitai Pang wrote:
>>>
>>> For string concatenation, which one of the following is the most
>>> efficient?
>>>
>>> 1)
>>> string:join(["Item:{", ItemID, "}"], "")
>>>
>>> 2)
>>> lists:append(["Item:{", ItemID, "}"])
>>>
>>> 3)
>>> "Item:{" ++ ItemID ++ "}"
>>>
>>> Here ItemID is a UUID.
>>
>> For something this size, it hardly matters.
>>
>> The definition of lists:append/1 is
>>    append([E])     -> E;
>>    append([H|T]) -> H ++ append(T);
>>    append([])      -> [].
>> so lists:append([A,B,C]) -> A ++ lists:append([B,C])
>>                                     -> A ++ B ++ lists:append([C])
>>                                     -> A ++ B ++ C.
>> Clearly, there is no way for this to be more efficient than
>> writing A ++ B ++ C directly.
>>
>> The definition of string:join/2 is
>>    join([], Sep) when is_list(Sep) -> [];
>>    join([H|T], Sep) -> H ++ lists:append([Sep ++ X || X <- T]).
>> -- at least in 18.1 -- which rather startled me because I was expecting
>>    join([E], Sep) when is_list(Sep) -> E;
>>    join([H|T], Sep) -> H ++ Sep ++ join(T, Sep);
>>    join([], Sep) when is_list(Sep) -> [].
>> Clearly, there is no way that either version of join/2 can be faster than
>> append/1.  In fact
>>  append/1 is measurably faster than
>>  the revised join/2, which is measurably faster than
>>  the original join/2,
>> but you have to push the sizes ridiculously high to make these
>> measurements.
>>
>> Really, I suggest you write whichever makes your intentions clearest
>> to human readers, get the program going, then measure it.  I would
>> be surprised if this particular issue made any significant difference.
>>
>> But why are you using strings at all?
>> Strings are an *interface* data type; you process them when you receive
>> data from an external source and you generate them (or rather you
>> generate iolists) when you are sending data to an external source, but
>> for internal processing you usually want some sort of tree.
>>
>> Seeing "Item:{$ItemID}" suggests that maybe you *are* generating output
>> for some external process, but in that case, perhaps you should just be
>> sending along the iolist ["Item:{",ItemId,"}"] and *not* concatenating the
>> pieces.
>>
>>
>>
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions