<div style="white-space:pre-wrap">I am probably repeating what someone else has already said in some other similar thread. <br><br>The confusion between strings and [integer()] would have been greatly reduced if char() existed: $a wouldn't have to be syntactic sugar for 97 but would actually be "character a". You would have to explicitly convert char() -> integer() and vice versa. This is how strings are implemented in ML and Haskell. <br><br>Regarding character encoding: inside Erlang, Unicode could always be assumed, and conversion between different character encodings could be done on I/O. <br><br>/emil<br></div><br><div class="gmail_quote"><div dir="ltr">On Fri, 18 Mar 2016 at 00:51, Richard A. O'Keefe <<a href="mailto:ok@cs.otago.ac.nz">ok@cs.otago.ac.nz</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

<br>

On 17/03/16 11:53 pm, Steve Davis wrote:<br>

> > ROK said:<br>

> > Yawn.<br>

> (What am I doing trying to argue with ROK??? Am I MAD?)<br>

><br>

> 1) Why is it people rant about "string handling" in Erlang?<br>

<br>

Because it is not the same as Java.<br>

><br>

> 2) Principle of least surprise:<br>

> 1> [H|T] = [22,87,65,84,33].<br>

> [22,87,65,84,33]<br>

> 2> H.<br>

> 22<br>

> 3> T.<br>

> "WAT!"<br>

This is a legitimate complaint, but it confuses two things.<br>

There is *STRING HANDLING*, which is fine, and<br>

there is *LIST PRINTING*, which causes the confusion.<br>
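<br>
To illustrate the distinction in Erlang itself, here is a minimal sketch. The module and function names (list_printing, as_list/1, as_guess/1) are made up for this example; only io_lib:format/2 and lists:flatten/1 are standard library calls. The same list is formatted twice: ~w never guesses, while ~p applies the "looks printable" heuristic.<br>

```erlang
-module(list_printing).
-export([as_list/1, as_guess/1]).

%% Hypothetical helper: ~w writes the term with standard syntax,
%% so a list of integers always comes out as a list of integers.
as_list(Term) ->
    lists:flatten(io_lib:format("~w", [Term])).

%% Hypothetical helper: ~p pretty-prints, and switches to string
%% notation when every element of the list looks like a printable character.
as_guess(Term) ->
    lists:flatten(io_lib:format("~p", [Term])).
```

With these, as_list([87,65,84,33]) returns the text [87,65,84,33] and as_guess([87,65,84,33]) returns the text "WAT!" (quotes included), although the data is identical in both calls. The list [22,87,65,84,33] comes out as a list of integers either way, because 22 is not a printable character.<br>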

<br>

For comparison, DEC-10 Prolog, PDP-11 Prolog, C-Prolog, and Quintus Prolog<br>

all did STRING HANDLING as lists of character codes, but<br>

all did LIST PRINTING without ever converting lists of numbers to strings.<br>

The answer was that there was a library procedure to print a list of<br>

integers as a string and you could call that whenever you wanted to,<br>

such as in a user-defined pretty-printing procedure.  Here's a transcript<br>

from SICStus Prolog:<br>

| ?- write([65,66,67]).<br>

[65,66,67]<br>

yes<br>

| ?- write("ABC").<br>

[65,66,67]<br>

yes<br>

<br>

The heuristic used by the debugger in some Prologs was that a list of<br>

integers between 32 and 126 inclusive was printed as a string; that<br>

broke down with Latin 1, and broke harder with Unicode.  The simple<br>

behaviour mandated by the standard that lists of integers print as<br>

lists of integers confuses people once, then they learn that string<br>

quotes are an input notation, not an output notation, and if they want<br>

string notation in output, they have to call a special procedure to get it.<br>

<br>

The ISO Prolog committee introduced a horrible alternative which the<br>

DEC-10 Prolog designers had experienced in some Lisp systems and<br>

learned to hate: flip a switch and "ABC" is read as ['A','B','C']. The<br>

principal reason given for that was that the output was semi-readable.<br>

One of my arguments against it was that this required every Prolog<br>

system to be able to hold 17*2**16 atoms, and I knew for a fact that<br>

many would struggle to do so.  The retort was "they must be changed<br>

to make a special case for one-character atoms".  Oh well, no silver<br>

bullet.<br>

<br>

That does serve as a reminder, though, that using [a,b,c] instead of<br>

[$a,$b,$c] is *possible* in Erlang.<br>
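<br>
A rough sketch of what that could look like, assuming hypothetical helpers to_atoms/1 and from_atoms/1 (they are not part of any library; list_to_atom/1, atom_to_list/1 and lists:append/1 are standard). Note that every distinct character becomes an entry in the atom table, which is bounded and not garbage collected, so this is an illustration rather than a recommendation.<br>

```erlang
-module(char_atoms).
-export([to_atoms/1, from_atoms/1]).

%% Hypothetical helper: turn a list of character codes into a list
%% of one-character atoms, e.g. "abc" becomes [a,b,c].
to_atoms(String) ->
    [list_to_atom([C]) || C <- String].

%% Hypothetical helper: the inverse, concatenating the atom names
%% back into a list of character codes.
from_atoms(Atoms) ->
    lists:append([atom_to_list(A) || A <- Atoms]).
```

A list such as [a,b,c] is never mistaken for a string when printed, which is the semi-readable output the alternative was meant to buy.<br>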

<br>

Just to repeat the basic point: the printing of (some) integer lists as<br>

strings is SEPARABLE from the issue of how strings are represented and<br>

processed; that could be changed without anything else in the language<br>

changing.<br>

><br>

> 3) A codec should be perfectly reversible i.e. X = encode(decode(X)).<br>

> Without tagging, merely parsing out a string as a list is not<br>

> perfectly reversible.<br>

Here you are making a demand that very few other programming languages<br>

can support.  For example, take JavaScript.  "\u0041" is read as "A",<br>

and you are not going to get "\u0041" back from "A".  You're not even<br>

going to get "\x41" back from it, even though "\x41" == "A".<br>

<br>

Or take Erlang, where<br>

1> 'foo bar'.<br>

'foo bar'<br>

2> 'foobar'.<br>

foobar<br>

with the same kind of thing happening in Prolog.<br>

<br>

And of COURSE reading [1 /* one */, 2 /* deux */, 4 /* kvar */]<br>

in JavaScript preserves the comments so that re-encoding the<br>

data structure restores the input perfectly.  </sarc><br>

<br>

Or for that matter consider floating point numbers, where<br>

even the languages that produce the best possible conversions<br>

cannot promise that encode(decode(x)) == x.<br>

<br>

No, I'm sorry, this "perfectly reversible codec" requirement sets up<br>

a standard that NO programming language I'm aware of satisfies.<br>

It is, in fact, a straw man.  What you *can* ask, and what some<br>

language designers and implementers strive to give you, is<br>

     decode(encode(decode(x))) == decode(x).<br>
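<br>
As a sketch of that weaker property in Erlang, take "decode" to mean parsing source text into a term and "encode" to mean writing a term back as text. The module name roundtrip and the functions decode/1, encode/1 and stable/1 are invented for this example; erl_scan:string/1, erl_parse:parse_term/1 and io_lib:format/2 are standard library calls.<br>

```erlang
-module(roundtrip).
-export([decode/1, encode/1, stable/1]).

%% decode: source text to term. The appended "." terminates the
%% term for the parser; Text is assumed to hold one valid term.
decode(Text) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Text ++ "."),
    {ok, Term} = erl_parse:parse_term(Tokens),
    Term.

%% encode: term to text, using ~w so that no string guessing is involved.
encode(Term) ->
    lists:flatten(io_lib:format("~w", [Term])).

%% The property that can reasonably be asked for:
%% decode(encode(decode(X))) equals decode(X).
stable(Text) ->
    decode(encode(decode(Text))) =:= decode(Text).
```

Here encode(decode("'foobar'")) gives foobar without the quotes, and encode(decode("[1, 2, 4]")) gives [1,2,4] without the spaces, so the text is not restored; stable/1 nevertheless returns true for both inputs.<br>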

<br>

But to repeat the point made earlier, the way that lists of plausible<br>

character codes are printed is SEPARABLE from the way strings are<br>

represented and handled and in an ancestral language is SEPARATE.<br>

><br>

> 4) What is the right way to implement the function is_string(List)<br>

> correctly?<br>

><br>

> *ducks*<br>

<br>

That really is a "have you stopped beating your wife, answer yes or no"<br>

sort of question.<br>

<br>

It depends on the semantics you *want* it to have.  The Quintus<br>

library didn't provide any such predicate, but it did provide<br>

<br>

plausible_chars(Term)<br>

  when Term is a sequence of integers satisfying<br>

  is_graphic(C) or is_space(C),<br>

  possibly ending with a tail that is a variable or<br>

  a variable bound by numbervars/3.<br>
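<br>
A minimal Erlang approximation of such a predicate might look like the sketch below. It is not the Quintus code: the ASCII-only ranges used for "graphic" and "space", and the choice to accept the empty list, are assumptions made for the example, and the variable-tail case is dropped because Erlang terms have no unbound variables. The standard library offers io_lib:printable_list/1 in a similar spirit.<br>

```erlang
-module(plausible).
-export([plausible_chars/1]).

%% Sketch: true when Term is a proper list whose elements are all
%% integers that look like graphic or whitespace characters.
%% Accepting [] is an assumption of this example.
plausible_chars([]) ->
    true;
plausible_chars([C | T]) when is_integer(C) ->
    plausible_char(C) andalso plausible_chars(T);
plausible_chars(_) ->
    false.

%% Assumed ASCII-only stand-ins for is_graphic/1 and is_space/1.
plausible_char(C) when C >= 33, C =< 126 -> true;
plausible_char(C) when C =:= $\s; C =:= $\t; C =:= $\n; C =:= $\r -> true;
plausible_char(_) -> false.
```

plausible_chars("WAT!") is true and plausible_chars([22,87,65,84,33]) is false; neither answer says whether the programmer meant the list to be text.<br>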

<br>

Notice the careful choice of name:  not IS (certainly) a string,<br>

but is a PLAUSIBLE list of characters.<br>

<br>

It was good enough for paying customers to be happy with the<br>

module it was part of (which was the one offering the<br>

non-usual portray_chars(Term) command).<br>

<br>

One of the representations Quintus used for strings (again, a<br>

library feature, not a core language feature) was in Erlang<br>

notation {external_string,FileName,Offset,Length}, an idea<br>

that struck the customer I developed it for as a great<br>

innovation, when I'd simply stolen it from Smalltalk!<br>

<br>

The thing is that STRINGS ARE WRONG for most things,<br>

however represented.  For example, when Java changed<br>

the representation of String so that slicing became a<br>

costly operation, I laughed, because I had my own representation<br>

of strings that provided O(1) concatenation as well as cheap<br>

slicing.  (Think Erlang iolists and you won't be far wrong.)<br>
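<br>
For readers who have not met iolists, here is a small sketch of the concatenation half of that idea (it does not show cheap slicing). The module name iolist_demo and the helpers concat/2 and flatten/1 are invented for the example; iolist_to_binary/1 and iolist_size/1 are built-in functions.<br>

```erlang
-module(iolist_demo).
-export([concat/2, flatten/1]).

%% Hypothetical helper: "concatenate" by wrapping both parts in a new
%% list. Neither argument is copied or traversed, so the cost does
%% not depend on how long the parts are.
concat(A, B) ->
    [A, B].

%% Hypothetical helper: pay the copying cost once, at the point where
%% a flat binary is finally needed.
flatten(IoList) ->
    iolist_to_binary(IoList).
```

For example, flatten(concat(concat("foo", list_to_binary("bar")), "baz")) yields a binary holding foobarbaz, and iolist_size/1 reports 9 for the nested structure without flattening it first.<br>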

The Pop2 language developed and used at Edinburgh<br>

represented file names as lists, e.g., [/dev/null] was in<br>

Erlang notation ['/',dev,'/',null].  This made file name<br>

manipulation easier than representing them as strings.<br>

Any time there is internal structure, any time there is scope<br>

for sharing substructure, any time you need to process<br>

the parts of a string, strings are wrong.<br>
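<br>
The file name case can be sketched with Erlang's own filename module, which splits a path into a list of components and joins it back. The module name path_parts and the helper replace_last/2 are hypothetical; filename:split/1, filename:join/1 and lists:droplast/1 are standard library calls. The sketch assumes a non-empty path.<br>

```erlang
-module(path_parts).
-export([replace_last/2]).

%% Hypothetical helper: replace the final component of a path by
%% working on the list of components instead of on the flat string.
%% Assumes Path is non-empty (lists:droplast/1 fails on []).
replace_last(Path, NewName) ->
    Parts = filename:split(Path),
    filename:join(lists:droplast(Parts) ++ [NewName]).
```

On a Unix-style path, replace_last("/dev/null", "zero") returns "/dev/zero", with no searching for separators in the string itself.<br>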

<br>

The PERL lesson is that regular expressions are a fantastic<br>

tool for doing the wrong thing quite simply.<br>

<br>

<br>

<br>

_______________________________________________<br>

erlang-questions mailing list<br>

<a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a><br>

<a href="http://erlang.org/mailman/listinfo/erlang-questions" rel="noreferrer" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a><br>

</blockquote></div>