[erlang-questions] Erlang Syntax and "Patterns" (Again)

Emil Holmstrom emil@REDACTED
Fri Mar 18 17:30:02 CET 2016


I am probably repeating what someone else has already said in some other
similar thread.

The confusion between strings and [integer()] would have been greatly
reduced if a char() type existed. $a would not have to be syntactic sugar
for 97 but would actually be "character a". You would have to convert
explicitly from char() to integer() and vice versa. This is how characters
work in ML and Haskell, where char is a type distinct from int.

Regarding character encoding: inside Erlang, Unicode could always be
assumed; conversion between different character encodings could be done on
I/O.
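
A minimal sketch of what "convert on I/O" could look like with the existing
unicode module. The module name encoding_boundary and its two functions are
made up for illustration; only the unicode:characters_to_* calls are real.

```erlang
-module(encoding_boundary).
-export([from_latin1/1, to_utf8/1]).

%% Input side: decode Latin-1 bytes into a list of Unicode code points.
from_latin1(Bytes) when is_binary(Bytes) ->
    unicode:characters_to_list(Bytes, latin1).

%% Output side: encode a list of code points as UTF-8 bytes.
to_utf8(Chars) when is_list(Chars) ->
    unicode:characters_to_binary(Chars, unicode, utf8).
```

With this, from_latin1(<<99,97,102,233>>) gives [99,97,102,233], and
to_utf8/1 turns that list into <<99,97,102,195,169>>. Everything in between
only ever sees code points.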

/emil

On Fri, 18 Mar 2016 at 00:51, Richard A. O'Keefe <ok@REDACTED> wrote:

>
>
> On 17/03/16 11:53 pm, Steve Davis wrote:
> > > ROK said:
> > > Yawn.
> > (What am I doing trying to argue with ROK??? Am I MAD?)
> >
> > 1) Why is it people rant about "string handling" in Erlang?
>
> Because it is not the same as Java.
> >
> > 2) Principle of least surprise:
> > 1> [H|T] = [22,87,65,84,33].
> > [22,87,65,84,33]
> > 2> H.
> > 22
> > 3> T.
> > "WAT!"
> This is a legitimate complaint, but it confuses two things.
> There is *STRING HANDLING*, which is fine, and
> there is *LIST PRINTING*, which causes the confusion.
>
> For comparison, DEC-10 Prolog, PDP-11 Prolog, C-Prolog, and Quintus Prolog
> all did STRING HANDLING as lists of character codes, but
> all did LIST PRINTING without ever converting lists of numbers to strings.
> The answer was that there was a library procedure to print a list of
> integers as a string and you could call that whenever you wanted to,
> such as in a user-defined pretty-printing procedure.  Here's a transcript
> from SICStus Prolog:
> | ?- write([65,66,67]).
> [65,66,67]
> yes
> | ?- write("ABC").
> [65,66,67]
> yes
>
> The heuristic used by the debugger in some Prologs was that a list of
> integers between 32 and 126 inclusive was printed as a string; that
> broke down with Latin 1, and broke harder with Unicode.  The simple
> behaviour mandated by the standard that lists of integers print as
> lists of integers confuses people once, then they learn that string
> quotes are an input notation, not an output notation, and if they want
> string notation in output, they have to call a special procedure to get it.
>
> The ISO Prolog committee introduced a horrible alternative which the
> DEC-10 Prolog designers had experienced in some Lisp systems and
> learned to hate: flip a switch and "ABC" is read as ['A','B','C']. The
> principal reason given for that was that the output was semi-readable.
> One of my arguments against it was that this required every Prolog
> system to be able to hold 17*2**16 atoms, and I knew for a fact that
> many would struggle to do so.  The retort was "they must be changed
> to make a special case for one-character atoms".  Oh well, no silver
> bullet.
>
> That does serve as a reminder, though, that using [a,b,c] instead of
> [$a,$b,$c] is *possible* in Erlang.
>
> Just to repeat the basic point: the printing of (some) integer lists as
> strings is SEPARABLE from the issue of how strings are represented and
> processed; that could be changed without anything else in the language
> changing.
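
To illustrate the separation in today's Erlang, here is a small sketch. The
module name list_printing and its function names are hypothetical; the ~w and
~p control sequences of io_lib:format/2 are the standard ones.

```erlang
-module(list_printing).
-export([as_list/1, as_term/1]).

%% ~w always writes a list of integers as a list of integers.
as_list(Term) ->
    lists:flatten(io_lib:format("~w", [Term])).

%% ~p pretty-prints, and shows a list of printable characters as a string.
as_term(Term) ->
    lists:flatten(io_lib:format("~p", [Term])).
```

The same list [87,65,84,33] comes out as [87,65,84,33] from as_list/1 and as
"WAT!" from as_term/1; the data is identical, only the printing differs. In
the shell, shell:strings(false) turns off the string heuristic for printed
results.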
> >
> > 3) A codec should be perfectly reversible i.e. X = encode(decode(X)).
> > Without tagging, merely parsing out a string as a list is not
> > perfectly reversible.
> Here you are making a demand that very few other programming languages
> can support.  For example, take JavaScript.  "\u0041" is read as "A",
> and you are not going to get "\u0041" back from "A".  You're not even
> going to get "\x41" back from it, even though "\x41" == "A".
>
> Or take Erlang, where
> 1> 'foo bar'.
> 'foo bar'
> 2> 'foobar'.
> foobar
> with the same kind of thing happening in Prolog.
>
> And of COURSE reading [1 /* one */, 2 /* deux */, 4 /* kvar */]
> in JavaScript preserves the comments so that re-encoding the
> data structure restores the input perfectly.  </sarc>
>
> Or for that matter consider floating point numbers, where
> even the languages that produce the best possible conversions
> cannot promise that encode(decode(x)) == x.
>
> No, I'm sorry, this "perfectly reversible codec" requirement sets up
> a standard that NO programming language I'm aware of satisfies.
> It is, in fact, a straw man.  What you *can* ask, and what some
> language designers and implementers strive to give you, is
>      decode(encode(decode(x))) == decode(x).
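
A sketch of that weaker property, using Erlang term syntax as the "codec".
The module name roundtrip and the choice of erl_scan/erl_parse for decoding
and ~p for encoding are my own assumptions for the example, not anything
from the thread.

```erlang
-module(roundtrip).
-export([decode/1, encode/1]).

%% decode: parse the source text of one Erlang term (must end with a full stop).
decode(Text) ->
    {ok, Tokens, _End} = erl_scan:string(Text),
    {ok, Term} = erl_parse:parse_term(Tokens),
    Term.

%% encode: print a term back as source text, full stop included.
encode(Term) ->
    lists:flatten(io_lib:format("~p.", [Term])).
```

Here encode(decode("'foobar'.")) gives "foobar.", which is not the input
text, so the codec is not perfectly reversible. But
decode(encode(decode(X))) =:= decode(X) still holds for that input.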
>
> But to repeat the point made earlier, the way that lists of plausible
> character codes are printed is SEPARABLE from the way strings are
> represented and handled and in an ancestral language is SEPARATE.
> >
> > 4) What is the right way to implement the function is_string(List)
> > correctly?
> >
> > *ducks*
>
> That really is a "have you stopped beating your wife, answer yes or no"
> sort of question.
>
> It depends on the semantics you *want* it to have.  The Quintus
> library didn't provide any such predicate, but it did provide
>
> plausible_chars(Term)
>   when Term is a sequence of integers satisfying
>   is_graphic(C) or is_space(C),
>   possibly ending with a tail that is a variable or
>   a variable bound by numbervars/3.
>
> Notice the careful choice of name:  not IS (certainly) a string,
> but is a PLAUSIBLE list of characters.
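
A rough Erlang analogue, as a sketch only. The module name plausible and the
ASCII ranges used for is_graphic/1 and is_space/1 are my assumptions; the
Quintus definitions may well have differed. Erlang terms have no unbound
variables, so the open-tail case does not arise.

```erlang
-module(plausible).
-export([plausible_chars/1]).

%% True for a proper list whose every element is a graphic ASCII
%% character (33..126) or common whitespace; false for anything else.
plausible_chars([]) -> true;
plausible_chars([C | Cs]) when is_integer(C) ->
    (is_graphic(C) orelse is_space(C)) andalso plausible_chars(Cs);
plausible_chars(_) -> false.

is_graphic(C) -> C >= 33 andalso C =< 126.

is_space(C) -> lists:member(C, [$\s, $\t, $\n, $\r]).
```

plausible_chars("WAT!") is true and plausible_chars([22,87,65,84,33]) is
false. Note that the empty list also counts as plausible, which is exactly
why "plausible" is the right word. The standard library has a similar test
in io_lib:printable_list/1.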
>
> It was good enough for paying customers to be happy with the
> module it was part of (which was the one offering the
> non-usual portray_chars(Term) command).
>
> One of the representations Quintus used for strings (again, a
> library feature, not a core language feature) was in Erlang
> notation {external_string,FileName,Offset,Length}, an idea
> that struck the customer I developed it for as a great
> innovation, when I'd simply stolen it from Smalltalk!
>
> The thing is that STRINGS ARE WRONG for most things,
> however represented.  For example, when Java changed
> the representation of String so that slicing became a
> costly operation, I laughed, because I had my own representation
> of strings that provided O(1) concatenation as well as cheap
> slicing.  (Think Erlang iolists and you won't be far wrong.)
> The Pop2 language developed and used at Edinburgh
> represented file names as lists, e.g., [/dev/null] was in
> Erlang notation ['/',dev,'/',null].  This made file name
> manipulation easier than representing them as strings.
> Any time there is internal structure, any time there is scope
> for sharing substructure, any time you need to process
> the parts of a string, strings are wrong.
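
For anyone who has not met iolists, a tiny sketch of the idea. The module
name iolist_demo and the function greeting/1 are hypothetical;
iolist_to_binary/1 and iolist_size/1 are the usual BIFs.

```erlang
-module(iolist_demo).
-export([greeting/1]).

%% Concatenate by nesting: building the outer list copies no characters,
%% whatever the length of Name. Binaries, strings and single characters
%% can be mixed freely.
greeting(Name) ->
    [<<"Hello, ">>, Name, $!].
```

iolist_to_binary(iolist_demo:greeting("world")) gives <<"Hello, world!">>,
and the flattening happens once, at the point of output, rather than on
every concatenation.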
>
> The PERL lesson is that regular expressions are a fantastic
> tool for doing the wrong thing quite simply.