[erlang-questions] unicode in string literals
Richard O'Keefe
ok@REDACTED
Thu Aug 2 03:50:40 CEST 2012
On 1/08/2012, at 7:30 PM, Vlad Dumitrescu wrote:
>
>> but why should a module written by someone who wants
>> comments in Māori (note the macron? Latin-4 or Unicode needed)
>> use a module written by someone who wants comments in Swedish?
>
> Maybe not in the long run, but there will be a (long) transition
> period where legacy code will still be used by new code.
Sorry, my typing mistake here.
What I *meant* to write was "why should a [Māori] module
*NOT* use a [Swedish] one"? You were saying, or so I thought,
that there should be one project = one encoding, and I was saying
I thought that was too restrictive in practice.
>
>> The whole point of an -encoding directive is that it is something
>> that syntaxtools should handle; by the time your code gets an AST
>> or a token list, encodings are entirely a thing of the past.
>
> Yes, but I am one of the guys that is going to write some of the tools
> that will handle this conversion, so I do care about the details.
And by the time it gets to you, there won't *be* any details to care about.
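To make that concrete, here is a minimal sketch in Python of the idea that the front end decodes each file once, so everything downstream sees only Unicode text. The directive spelling (`%% encoding: NAME` in the first two lines), the Latin-1 default, and the name `decode_source` are assumptions made up for illustration, not an existing or proposed Erlang syntax.

```python
import re

# Hypothetical directive spelling, assumed for illustration only:
# a comment of the form "%% encoding: NAME" in the first two lines.
DIRECTIVE = re.compile(rb"^%+\s*encoding:\s*([A-Za-z0-9_.-]+)")

def decode_source(raw: bytes, default: str = "latin-1") -> str:
    """Return the source as Unicode text, honouring a per-file directive."""
    for line in raw.splitlines()[:2]:
        match = DIRECTIVE.match(line)
        if match:
            # The encoding name itself is plain ASCII in any supported encoding.
            return raw.decode(match.group(1).decode("ascii"))
    # No directive found: fall back to the assumed legacy default.
    return raw.decode(default)

# Two modules in one project, each with its own encoding.
maori = "%% encoding: utf-8\n%% Māori comment\n".encode("utf-8")
swedish = "%% encoding: latin-1\n%% Kommentar på svenska\n".encode("latin-1")

maori_text = decode_source(maori)
swedish_text = decode_source(swedish)
```

Both results are ordinary Unicode strings, so a tokenizer or AST builder fed from `decode_source` never needs to know which encoding each file was stored in.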
>
>> SWI Prolog actually lets you change the encoding within a file,
>> which sounds crazy but maybe Jan wanted the machinery to be there
>> in case someone wanted ISO 2022 support. (Because that's basically
>> what 2022 *is*: switching encoding aspects on the fly.)
>
> Are there any editors that can load/save a file with mixed encodings like that?
I have no idea. There are a number of editors that claim to support
ISO 2022, which does mid-stream code switching, so they could presumably
be extended to support this. See for example:
    Yutaka Kataoka, Masato Morisaki, Hiroshi Kuribayashi, and Hiroyoshi Ohara,
    "A model for input and output of multilingual text in a windowing environment",
    ACM Transactions on Information Systems (TOIS),
    Volume 10, Issue 4, Oct. 1992.
>
> I am still a little worried about two things:
> - debugging a remote system that has different locale
> - reading logs created by modules that have different encodings (some
> modules might be legacy and not be aware that the world is not Latin-1
> anymore).
Ouch. And then there are all those documents that lie about the
encoding they're using. (Web pages that claim Latin 1 but are really
CP 1252 are only one example; they do not exhaust the possibilities.)
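A small Python sketch of that Latin 1 versus CP 1252 mismatch: the bytes 0x80 to 0x9F are C1 control characters in Latin 1 but mostly printable characters (curly quotes, the euro sign) in CP 1252. The helper name `has_c1_controls` and the sample bytes are made up for illustration, and the check is only a hint that a label may be wrong, not proof.

```python
def has_c1_controls(text: str) -> bool:
    """True if the text contains C1 control characters (U+0080 to U+009F)."""
    return any("\u0080" <= ch <= "\u009f" for ch in text)

# Bytes as a CP 1252 editor would write them: curly quotes and a euro sign.
raw = b"\x93quoted\x94 \x80 5"

as_latin1 = raw.decode("latin-1")   # what the (wrong) label asks for
as_cp1252 = raw.decode("cp1252")    # what the author actually typed

# C1 controls almost never occur in real text, so their presence after
# a Latin 1 decode suggests the declared encoding is not the real one.
suspicious = has_c1_controls(as_latin1)
```

Decoding as Latin 1 never fails, which is exactly why the mislabelling goes unnoticed: every byte maps to some code point, just not the intended one.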