[erlang-questions] utf8 in source files

Allan Wegan allanwegan@REDACTED
Tue Nov 9 07:10:53 CET 2010


>> Just let them use UTF-8 at will. As long as they do not use any
>> non-ASCII code points in language constructs except binary string
>> literals, they should be on the safe side anyway.
> 
> Maybe my understanding of binary string literals is wrong.
> But using non-ASCII letters in binaries is where I first realized
> that I was storing my Erlang source code in UTF-8 instead of
> Latin-1.
> 
> The following code prints "Size: 6" if encoded in Latin-1 and "Size:
> 12" if encoded in UTF-8.

It works as expected: put UTF-8 in and you get UTF-8 out; put ISO-8859-1
in and you get exactly that out.
If you want to use UTF-8 "string literals", just save the source file as
UTF-8 and use binaries for string storage. Beware that $Ä currently does
_not_ work, even when used in binaries: only the first byte of the code
point's UTF-8 representation makes it into the resulting binary (at
least on my system running R14A on Windows).
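
A minimal sketch of the size difference, independent of how the source
file itself is encoded. The module name utf8_demo and the six-letter
sample are made up for illustration (the code from the quoted mail is
not shown here), and it assumes the unicode module and the utf8 type in
the bit syntax are available in your release. All characters are written
as numeric code points, so no non-ASCII byte appears in the source.

```erlang
-module(utf8_demo).
-export([sizes/0, a_umlaut/0]).

%% Hypothetical six-letter sample, written as code points so the result
%% does not depend on the encoding of this source file.
sample() ->
    [16#E4, 16#F6, 16#FC, 16#C4, 16#D6, 16#DC].

%% Byte sizes of the sample encoded as ISO-8859-1 and as UTF-8.
sizes() ->
    Latin1 = unicode:characters_to_binary(sample(), unicode, latin1),
    Utf8 = unicode:characters_to_binary(sample(), unicode, utf8),
    {byte_size(Latin1), byte_size(Utf8)}.

%% U+00C4 through the utf8 bit-syntax type: both bytes of the UTF-8
%% representation end up in the binary.
a_umlaut() ->
    <<16#C4/utf8>>.
```

Here utf8_demo:sizes() should give {6, 12}, and utf8_demo:a_umlaut()
should give <<195,132>>. Spelling the code point as 16#C4/utf8 is one
possible way to avoid a $Ä literal in a UTF-8 encoded file.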

-- 
Allan Wegan
Jabber: allanwegan@REDACTED
ICQ:    209459114

