[erlang-questions] strip end off of string

Fri May 7 07:30:55 CEST 2010

On May 7, 2010, at 1:20 AM, Wes James wrote:

> On Wed, May 5, 2010 at 8:08 PM, Richard O'Keefe <ok@REDACTED>  
> wrote:
>>
>
> <snip>
>
> Thanks for your comments on the above :)
>
>> t2(String) ->
>>   t2a(lists:reverse(String)).
>>
>
> The thought just came to me.  Does reverse actually reverse all the
> data

Yes.
> in string and work on it or does it just work from the end going
> backward.

It can't.
Let's make three strings.

    X = "bit",
	% X  : CONS, $a, ->X'
	% X' : CONS, $b, ->X"
	% X" : CONS, $c, []
    Y = "or" ++ X,
	% Y  : CONS, $o, ->Y'
	% Y' : CONS, $r, ->X
    Z = "rab" ++ X,
	% Z  : CONS, $r, ->Z'
	% Z' : CONS, $a, ->Z"
	% Z" : CONS, $b, ->X

The comment lines are supposed to show you an "assembly code level"
view of the actual data structure.  (Whether the CONS tags are in
the *records* or on the *pointers* is immaterial.  Normal C practice
is to put the tags in the records; normal Lisp implementation
practice is to put the tags on the pointers.)  So Z is a pointer
to a record holding an $r and a pointer to a record holding an $a
and a pointer to a record holding a $b and a pointer which is the
*same* pointer as X.  The three records of "bit" are *shared* by
both Y and Z.  So if you "worked from the end going backward",
(a) in the absence of back pointers, and they *are* absent,
     how could you do it at all?
(b) when you get to the "b" of "bit", how do you tell whether to
     continue with the "r" of "orbit", or with the "b" of "rabbit",
     or to stop?

If you have a BSD system, like a Mac, "man queue" will tell you
about <sys/queue.h>, which has SLIST, STAILQ, LIST, and TAILQ;
the one that resembles Erlang is SLIST.  Look in <sys/queue.h>
for the code.

The bottom line is that lists can *only* be traversed from front
to back, because there are *only* forward pointers.

>  It seems there would be a performance hit on long "String"s
> if reverse actually reversed all the items in to another string then
> worked on it.  Do you know how reverse actually does this?

Look at the code in lib/stdlib/src/lists.erl.
lists:reverse/1 is now a built-in function, but it _used_ to be

reverse(Xs) ->
     reverse(Xs, []).

reverse([H|T], Y) ->
     reverse(T, [H|Y]);
reverse([], X) ->
     X.

If you want to tell whether a string ends with something,
like "</TD>", for example, you could write

ends_with_TD("</TD>") ->
     true;
ends_with_TD([_|S]) ->
     ends_with_TD(S);
ends_with_TD([]) ->
     false.

One of the things I keep telling people is "STRINGS ARE WRONG".
For many purposes, a list of tokens is superior to a string.
(For most other purposes, some kind of tree is even better.
I've had spectacular speedups in C from using trees of
fragments instead of vectors of characters.)