[erlang-questions] Binary string literal syntax

Sean Hinde sean.hinde@REDACTED
Wed Jun 6 16:30:46 CEST 2018


---
snip - many good and interesting reasons angels fear to tread here
---

> 
> If I had to argue for something, it would be that a good "beginner" string type would be an opaque one that inherently carries its own encoding, and cannot be pattern-matched on unless you use a 'graphemed' + normalized iodata structure. If you wanted to switch to codepoints for handling, then you could convert it to a binary or to another type. But even then this would have a weakness because you would necessarily be forced to convert from say, a utf-8 byte stream coming from a socket, onto a different format: this is exactly what is annoying people today when they just want the damn strings to use "abc" because it's shorter to write.

I would fear an even louder chorus if we created a “beginner” string type that was not useable in most contexts a beginner might want to use a string!

Apple have always gone deep on this topic and the solution in Swift is quite ok, at the cost of having to explicitly export to utf-8 to send over the network / store in a file. 

> 
> I personally think that this is a clash between correctness and convenience. Currently Erlang is not necessarily 'correct', but it at least does not steer you entirely wrong through convenience since using utf8 (the default many people want) is cumbersome. I'd personally go for a 'correct' option (strongly typed strings that require conversions between formats based on context and usage), but I fear that this thread and most complaints about syntax worry first about convenience, and I don't know that they're easy to reconcile.

As an engineering tool for dealing with protocols I would describe Erlang as having tended towards the pragmatic, which could be described as a fine line between correct and convenient.

A new notation as a shorthand for utf8 string literals combined with the power of full binary string encoding and lists as code points doesn’t seem like it would be too misleading.

Of course in this imaginary language extension we can write any kind of program we like:

~u8"utf-8 string"
~u16"utf-16 string"
~u"unicode string"
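
For comparison, here is a rough sketch of how those three forms can already be spelled in current Erlang. The mapping from the imaginary sigils to existing syntax is only an assumption about what such a shorthand might expand to, and the module name string_forms is made up for the example.

```erlang
-module(string_forms).
-export([demo/0]).

%% Builds the same text in three forms and checks that they agree.
demo() ->
    %% "..." is a list of Unicode code points (roughly the ~u"..." idea).
    Codepoints = "h\x{e9}llo",
    %% /utf8 in the bit syntax encodes each code point as UTF-8 (~u8"...").
    Utf8 = <<"h\x{e9}llo"/utf8>>,
    %% /utf16 does the same for UTF-16, big-endian by default (~u16"...").
    Utf16 = <<"h\x{e9}llo"/utf16>>,
    %% The runtime conversions in the unicode module match the literals.
    Utf8 = unicode:characters_to_binary(Codepoints),
    Utf16 = unicode:characters_to_binary(Codepoints, unicode, utf16),
    Codepoints = unicode:characters_to_list(Utf8),
    %% Code point count, UTF-8 byte count, UTF-16 byte count.
    {length(Codepoints), byte_size(Utf8), byte_size(Utf16)}.
```

Calling string_forms:demo() should give {5, 6, 10}: five code points, six bytes as UTF-8 (the accented letter takes two), ten bytes as UTF-16. So the shorthand would mostly be saving the <<"..."/utf8>> wrapping rather than adding a new capability.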

Though I am with zxq9 that any changes really ought not to make the language worse or less understandable.

Rust seems to have got in a mess here:
https://github.com/rust-lang/rust/blob/master/src/grammar/raw-string-literal-ambiguity.md

Go picked utf8 for literal strings without too much complaint.

The slightly wicked side of me would find great enjoyment in a Hacker News post proclaiming Erlang as the one true language for string processing :)

Sean




