[erlang-questions] UTF8 and EDoc

Tue Oct 6 06:30:32 CEST 2009

2009/10/5 Tomas Abrahamsson <tomas.abrahamsson@REDACTED>:

> An option could be to adopt the way it is done in Python:
> it (re)uses the editor's encoding declaration. If it finds the text
>   -*- coding: utf-8 -*-  or  vim: set fileencoding=utf-8 :
> on the first or second line of the source file, then it sets
> the encoding for the entire source file accordingly. (It also
> understands unicode byte-order marks at the beginning
> of the file, which apparently makes life easier in editors
> on Windows.)

Yuck! Surely not every editor has this information?
If a text file needs to inform an application of its encoding, then
either
a) declare the encoding in the file itself
   (as XML does with encoding='utf-8'), or
b) be explicit when invoking the application.
I also think a default encoding as a fallback is essential,
UTF-8 being the obvious one.
The BOM (byte order mark) as the first character of a file
has not been successful.
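
To make the order of precedence concrete, here is a rough sketch in Python
(not anything the Erlang tools actually do). The function name
detect_encoding and its parameters are made up for illustration; the
pattern is a loosened form of the one in PEP 263, so that it also matches
inside a %-style comment.

```python
import codecs
import re

# Loosened from PEP 263: matches "coding: NAME" or "coding=NAME" anywhere in
# a line, so "-*- coding: utf-8 -*-" and "vim: set fileencoding=utf-8 :" both hit.
CODING_RE = re.compile(rb"coding[:=][ \t]*([-\w.]+)")

def detect_encoding(data, explicit=None, default="utf-8"):
    """Choose an encoding name for raw source bytes (hypothetical helper)."""
    if explicit:
        # Option b: the caller told the application which encoding to use.
        return explicit
    if data.startswith(codecs.BOM_UTF8):
        # A UTF-8 byte order mark; "utf-8-sig" strips it when decoding.
        return "utf-8-sig"
    for line in data.splitlines()[:2]:
        # Option a: a declaration on the first or second line of the file.
        match = CODING_RE.search(line)
        if match:
            return match.group(1).decode("ascii")
    # Nothing found: fall back to the default.
    return default
```

A call such as detect_encoding(source_bytes, explicit="latin-1") covers the
"be explicit" case; with no arguments beyond the bytes, the in-file
declaration wins, and UTF-8 is the fallback.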

>
> See http://www.python.org/peps/pep-0263.html for details.
>
> An advantage with this scheme seems to be that it fits nicely
> with editors. They already know how to handle this.

Only if you use the 'right' editor, surely?

>
> It would probably require the Erlang compiler, edoc, and other tools
> to be modified to know about source file encodings, though.

What of programmatically generated files?

>
> I suppose that with the \u-escaping, existing tools would continue
> to work without modification, but it would be more work for the
> programmer to type the text in as \u-sequences, unless editors
> already know how to do such a transformation on the fly?

Or mimic Python even more?
u"A utf-8 encoded string"
and unicode('another unicode string')

That is, a string prefix and an encoding function.
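
For comparison, a small sketch of an explicit decode step alongside the
\u-escaped form of the same text. It is written for Python 3, where every
string literal is already Unicode; the helper to_u_escapes is a made-up
name, and it only handles characters up to U+FFFF.

```python
def to_u_escapes(text):
    """Replace each non-ASCII character with a \\uXXXX escape (BMP only)."""
    return "".join(
        ch if ord(ch) < 128 else "\\u%04x" % ord(ch)
        for ch in text
    )

raw = b"sm\xc3\xb6rg\xc3\xa5s"      # UTF-8 bytes, as they would sit in a file
decoded = raw.decode("utf-8")       # explicit encoding function
escaped = to_u_escapes(decoded)     # ASCII-only form that older tools can carry
```

The escaped form survives any tool that is only ASCII-safe, at the cost of
being unreadable in the source.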

>
> If no such encoding declaration is found, Python assumes ASCII,

> but Erlang could maybe assume Latin-1.

Please move on to UTF-8. Latin-1 is so restrictive.

regards

-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk