[erlang-questions] Erlang Syntax and "Patterns" (Again)
Hynek Vychodil
vychodil.hynek@REDACTED
Fri Mar 18 18:15:22 CET 2016
A superlative suggestion sir, with only two minor drawbacks: one, Erlang is
a dynamically typed language and two, Erlang is a dynamically typed language. I
know that technically that’s only one drawback, but I thought it was such a
big one it was worth mentioning twice.
Hynek
On Fri, Mar 18, 2016 at 5:30 PM, Emil Holmstrom <emil@REDACTED> wrote:
> I am probably repeating what someone else has already said in some other
> similar thread.
>
> The confusion between strings and [integer()] would have been greatly
> reduced if char() existed, $a wouldn't have to be syntactic sugar for 97
> but would actually be "character a". You would have to explicitly convert
> char() -> integer() and vice versa. This is how strings are implemented in
> ML and Haskell.
>
> Regarding character encoding: inside Erlang Unicode could always be
> assumed; conversion between different character encodings could be done on
> I/O.
>
> /emil
>
> On Fri, 18 Mar 2016 at 00:51, Richard A. O'Keefe <ok@REDACTED>
> wrote:
>
>>
>>
>> On 17/03/16 11:53 pm, Steve Davis wrote:
>> > > ROK said:
>> > > Yawn.
>> > (What am I doing trying to argue with ROK??? Am I MAD?)
>> >
>> > 1) Why is it people rant about "string handling" in Erlang?
>>
>> Because it is not the same as Java.
>> >
>> > 2) Principle of least surprise:
>> > 1> [H|T] = [22,87,65,84,33].
>> > [22,87,65,84,33]
>> > 2> H.
>> > 22
>> > 3> T.
>> > "WAT!"
>> This is a legitimate complaint, but it confuses two things.
>> There is *STRING HANDLING*, which is fine, and
>> there is *LIST PRINTING*, which causes the confusion.
>>
>> For comparison, DEC-10 Prolog, PDP-11 Prolog, C-Prolog, and Quintus Prolog
>> all did STRING HANDLING as lists of character codes, but
>> all did LIST PRINTING without ever converting lists of numbers to strings.
>> The answer was that there was a library procedure to print a list of
>> integers as a string and you could call that whenever you wanted to,
>> such as in a user-defined pretty-printing procedure. Here's a transcript
>> from SICStus Prolog:
>> | ?- write([65,66,67]).
>> [65,66,67]
>> yes
>> | ?- write("ABC").
>> [65,66,67]
>> yes
>>
>> The heuristic used by the debugger in some Prologs was that a list of
>> integers between 32 and 126 inclusive was printed as a string; that
>> broke down with Latin 1, and broke harder with Unicode. The simple
>> behaviour mandated by the standard that lists of integers print as
>> lists of integers confuses people once, then they learn that string
>> quotes are an input notation, not an output notation, and if they want
>> string notation in output, they have to call a special procedure to get
>> it.
>>
>> The ISO Prolog committee introduced a horrible alternative which the
>> DEC-10 Prolog designers had experienced in some Lisp systems and
>> learned to hate: flip a switch and "ABC" is read as ['A','B','C']. The
>> principal reason given for that was that the output was semi-readable.
>> One of my arguments against it was that this required every Prolog
>> system to be able to hold 17*2**16 atoms, and I knew for a fact that
>> many would struggle to do so. The retort was "they must be changed
>> to make a special case for one-character atoms". Oh well, no silver
>> bullet.
>>
>> That does serve as a reminder, though, that using [a,b,c] instead of
>> [$a,$b,$c] is *possible* in Erlang.
>>
>> Just to repeat the basic point: the printing of (some) integer lists as
>> strings is SEPARABLE from the issue of how strings are represented and
>> processed; that could be changed without anything else in the language
>> changing.
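
To make the separation concrete in Erlang terms, here is a minimal sketch (the module name print_demo is made up for illustration). It formats the same list twice: once with ~w, which always writes standard term syntax, and once with ~p, which applies the "looks printable" heuristic.

```erlang
-module(print_demo).
-export([demo/0]).

%% The same list, formatted two ways. Only the control sequence differs;
%% the data is identical in both cases.
demo() ->
    L = [87,65,84,33],
    AsList   = lists:flatten(io_lib:format("~w", [L])),  % no string guessing
    AsString = lists:flatten(io_lib:format("~p", [L])),  % printable-list heuristic
    {AsList, AsString}.
```

print_demo:demo() should return {"[87,65,84,33]", "\"WAT!\""}. For values echoed by the shell itself, shell:strings(false) is the stdlib switch intended to turn the heuristic off.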
>> >
>> > 3) A codec should be perfectly reversible i.e. X = encode(decode(X)).
>> > Without tagging, merely parsing out a string as a list is not
>> > perfectly reversible.
>> Here you are making a demand that very few other programming languages
>> can support. For example, take JavaScript. "\u0041" is read as "A",
>> and you are not going to get "\u0041" back from "A". You're not even
>> going to get "\x41" back from it, even though "\x41" == "A".
>>
>> Or take Erlang, where
>> 1> 'foo bar'.
>> 'foo bar'
>> 2> 'foobar'.
>> foobar
>> with the same kind of thing happening in Prolog.
>>
>> And of COURSE reading [1 /* one */, 2 /* deux */, 4 /* kvar */]
>> in JavaScript preserves the comments so that re-encoding the
>> data structure restores the input perfectly. </sarc>
>>
>> Or for that matter consider floating point numbers, where
>> even the languages that produce the best possible conversions
>> cannot promise that encode(decode(x)) == x.
>>
>> No, I'm sorry, this "perfectly reversible codec" requirement sets up
>> a standard that NO programming language I'm aware of satisfies.
>> It is, in fact, a straw man. What you *can* ask, and what some
>> language designers and implementers strive to give you, is
>> decode(encode(decode(x))) == decode(x).
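
A minimal sketch of that weaker property, assuming "decode" means reading one Erlang term from text and "encode" means writing it back with ~w. The module and function names are hypothetical; erl_scan, erl_parse and io_lib are the standard library modules.

```erlang
-module(roundtrip_demo).
-export([decode/1, encode/1, stable/1]).

%% decode/1: read one Erlang term from source text (no trailing dot needed).
decode(Text) ->
    {ok, Tokens, _End} = erl_scan:string(Text ++ "."),
    {ok, Term} = erl_parse:parse_term(Tokens),
    Term.

%% encode/1: write a term back as text in standard syntax.
encode(Term) ->
    lists:flatten(io_lib:format("~w", [Term])).

%% stable/1: the property one can reasonably ask for,
%% decode(encode(decode(X))) == decode(X).
stable(Text) ->
    decode(encode(decode(Text))) =:= decode(Text).
```

Here encode(decode("'foobar'")) gives "foobar", not the original text, so the codec is not "perfectly reversible"; stable("'foobar'") still holds.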
>>
>> But to repeat the point made earlier, the way that lists of plausible
>> character codes are printed is SEPARABLE from the way strings are
>> represented and handled, and in an ancestral language it is SEPARATE.
>> >
>> > 4) What is the right way to implement the function is_string(List)
>> > correctly?
>> >
>> > *ducks*
>>
>> That really is a "have you stopped beating your wife, answer yes or no"
>> sort of question.
>>
>> It depends on the semantics you *want* it to have. The Quintus
>> library didn't provide any such predicate, but it did provide
>>
>> plausible_chars(Term)
>> when Term is a sequence of integers satisfying
>> is_graphic(C) or is_space(C),
>> possibly ending with a tail that is a variable or
>> a variable bound by numbervars/3.
>>
>> Notice the careful choice of name: not IS (certainly) a string,
>> but is a PLAUSIBLE list of characters.
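
A rough Erlang analogue, as a sketch only. The ASCII-only definitions of is_graphic/1 and is_space/1 below are assumptions made for illustration, not the Quintus definitions, and Erlang has no unbound tails, so an improper tail simply fails.

```erlang
-module(plausible).
-export([plausible_chars/1]).

%% True when Term is a proper list whose elements all look like
%% printable ASCII or common whitespace. "Plausible", not "certainly".
plausible_chars(Term) when is_list(Term) -> all_plausible(Term);
plausible_chars(_) -> false.

all_plausible([C|T]) when is_integer(C) ->
    (is_graphic(C) orelse is_space(C)) andalso all_plausible(T);
all_plausible([]) -> true;
all_plausible(_) -> false.   % non-integer element or improper tail

%% Assumed ranges: visible ASCII, and space/tab/newline/carriage return.
is_graphic(C) -> C > 32 andalso C < 127.
is_space(C) -> C =:= $\s orelse C =:= $\t orelse C =:= $\n orelse C =:= $\r.
```

With this, plausible_chars("WAT!") is true and plausible_chars([22,87,65,84,33]) is false. The standard library's io_lib:printable_list/1 plays a similar role.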
>>
>> It was good enough for paying customers to be happy with the
>> module it was part of (which was the one offering the
>> non-usual portray_chars(Term) command).
>>
>> One of the representations Quintus used for strings (again, a
>> library feature, not a core language feature) was in Erlang
>> notation {external_string,FileName,Offset,Length}, an idea
>> that struck the customer I developed it for as a great
>> innovation, when I'd simply stolen it from Smalltalk!
>>
>> The thing is that STRINGS ARE WRONG for most things,
>> however represented. For example, when Java changed
>> the representation of String so that slicing became a
>> costly operation, I laughed, because I had my own representation
>> of strings that provided O(1) concatenation as well as cheap
>> slicing. (Think Erlang iolists and you won't be far wrong.)
>> The Pop2 language developed and used at Edinburgh
>> represented file names as lists, e.g., [/dev/null] was in
>> Erlang notation ['/',dev,'/',null]. This made file name
>> manipulation easier than representing them as strings.
>> Any time there is internal structure, any time there is scope
>> for sharing substructure, any time you need to process
>> the parts of a string, strings are wrong.
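
For readers who have not met iolists, a minimal sketch of the idea (module and function names are made up for illustration): text is built by nesting, which copies nothing, and is flattened only once at the end.

```erlang
-module(iolist_demo).
-export([wrap/1, render/1]).

%% Wrapping is constant-time: it builds a three-element list around
%% Body, which may itself be a string, a binary, or another iolist.
wrap(Body) -> ["<p>", Body, "</p>"].

%% Flatten once, at the edge of the program.
render(IoList) -> iolist_to_binary(IoList).
```

render(wrap(wrap(<<"hi">>))) yields <<"<p><p>hi</p></p>">>. In practice the iolist can usually be handed straight to an I/O call without flattening at all.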
>>
>> The PERL lesson is that regular expressions are a fantastic
>> tool for doing the wrong thing quite simply.
>>
>>
>>
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
>>