[erlang-questions] unicode in string literals

Wed Aug 1 09:30:19 CEST 2012

Hi Richard,

First, thanks for the detailed explanation. I see I am still mixing up
some of the issues.

On Wed, Aug 1, 2012 at 3:56 AM, Richard O'Keefe <ok@REDACTED> wrote:
> On 31/07/2012, at 7:36 PM, Vlad Dumitrescu wrote:
> It's not clear to me what you mean by a 'project',

I mean a set of related code, some of it possibly third-party.

> but why should a module written by someone who wants
> comments in Māori (note the macron? Latin-4 or Unicode needed)
> use a module written by someone who wants comments in Swedish?

Maybe not in the long run, but there will be a (long) transition
period where legacy code will still be used by new code.

> The whole point of an -encoding directive is that it is something
> that syntaxtools should handle; by the time your code gets an AST
> or a token list, encodings are entirely a thing of the past.

Yes, but I am one of the guys that is going to write some of the tools
that will handle this conversion, so I do care about the details.

> SWI Prolog actually lets you change the encoding within a file,
> which sounds crazy but maybe Jan wanted the machinery to be there
> in case someone wanted ISO 2022 support.  (Because that's basically
> what 2022 *is*: switching encoding aspects on the fly.)

Are there any editors that can load/save a file with mixed encodings like that?

<...snip...>
> Converting between strings and binaries is the one place where Erlang
> source code should have any reason to care, and it does have a reason
> to care.  But you will perceive that it is the *binary* that needs to
> be associated with an encoding, not the *string*.

Right. Good explanation!
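
A minimal sketch of that point, using the standard `unicode` module. The
module name `encoding_demo` and its two functions are made up for
illustration. The string is a plain list of code points, and an encoding
only comes into play when it is turned into a binary or read back from one.

```erlang
-module(encoding_demo).
-export([encode/1, decode/1]).

%% A string is just a list of code points; an encoding only appears
%% once the string is turned into bytes.
encode(Str) ->
    {unicode:characters_to_binary(Str, unicode, utf8),
     unicode:characters_to_binary(Str, unicode, {utf16, big}),
     %% yields an error tuple if Str contains code points above 255
     unicode:characters_to_binary(Str, unicode, latin1)}.

%% The same bytes give different strings depending on which encoding
%% the reader assumes for the binary.
decode(Bin) ->
    {unicode:characters_to_list(Bin, utf8),
     unicode:characters_to_list(Bin, latin1)}.
```

With `[77, 257, 111, 114, 105]` ("Māori", where 257 is the a with macron),
`encode/1` gives `<<77,196,129,111,114,105>>` for UTF-8 and an error tuple
for Latin-1. Passing that UTF-8 binary to `decode/1` gives back the original
list when read as UTF-8, but `[77,196,129,111,114,105]` when read as Latin-1.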

I am still a little worried about two things:
- debugging a remote system that has a different locale
- reading logs created by modules that have different encodings (some
modules might be legacy and not be aware that the world is not Latin-1
anymore).
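
For the log case, one possible workaround is sketched below. It assumes
the log lines arrive as binaries with no encoding information attached.
The module `log_decode` and the function `to_string/1` are hypothetical
names. This is only a heuristic: it tries UTF-8 first and falls back to
Latin-1 when the bytes are not valid UTF-8, so Latin-1 text that happens
to form valid UTF-8 would be misread.

```erlang
-module(log_decode).
-export([to_string/1]).

%% Hypothetical helper: decode a log line whose encoding is unknown.
to_string(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin, utf8) of
        Str when is_list(Str) ->
            Str;                                    % valid UTF-8
        _ErrorOrIncomplete ->
            %% every byte is a valid Latin-1 character, so this succeeds
            unicode:characters_to_list(Bin, latin1)
    end.
```

For example, `<<"Malm", 246>>` (legacy Latin-1 bytes) is not valid UTF-8
and is decoded through the fallback, while `<<"M", 196, 129, "ori">>` is
decoded as UTF-8.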

regards,
Vlad