utf-8 or other unicode support

Carsten Schultz carsten@REDACTED
Wed Jul 20 20:20:21 CEST 2005


Hi!

ke.han wrote:
> It's my understanding that Erlang strings are not utf-8 or otherwise
> unicode.
> I need to develop a web application to support Latin and Asian languages.
> 
> For HTML pages, I'm assuming I can just code them in the proper language
> and yaws will handle things (This is a big assumption and I would like
> feedback on this).

Unfortunately not true, but not a big problem either.

Here it can be a good thing that Erlang has no string type anyway.
Strings are just lists of ints, and if your input and output routines
can deal with it, there is no reason to restrict yourself to ints <256.
You can have larger ints and simply claim that they are Unicode code
points.
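
To make that concrete, here is a minimal sketch. The module name
codepoints_demo and the helper is_latin1/1 are made up for this
illustration and are not part of yaws or OTP. It only shows that a string
literal is a plain list of integers and that such a list can hold a value
above 255 without any special treatment.

```erlang
-module(codepoints_demo).
-export([demo/0, is_latin1/1]).

%% A string literal is only syntactic sugar for a list of integers.
demo() ->
    "abc" = [97, 98, 99],
    %% 16#20AC is the code point of the euro sign, which is above 255.
    Price = [16#20AC | "10"],
    true = is_list(Price),
    3 = length(Price),
    %% The list is fine as data, but it no longer fits ISO-8859-1,
    %% so the output routine has to know what to do with it.
    false = is_latin1(Price),
    true = is_latin1("abc"),
    ok.

%% Hypothetical helper: true if every element fits into one
%% ISO-8859-1 byte.
is_latin1(Str) ->
    lists:all(fun(C) -> is_integer(C) andalso C >= 0 andalso C =< 255 end,
              Str).
```

Calling codepoints_demo:demo() should return ok; any failed match would
raise a badmatch error instead.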

Regarding output, I have hacked yaws_api.erl to have a changed
definition of htmlize_char/1:

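htmlize_char($>) ->
    <<"&gt;">>;
htmlize_char($<) ->
    <<"&lt;">>;
htmlize_char($&) ->
    <<"&amp;">>;
htmlize_char($") ->
    <<"&quot;">>;
%% Anything outside ISO-8859-1 becomes a numeric character reference,
%% e.g. 8364 is emitted as &#8364;
htmlize_char(C) when C>255 ->
    [$&,$#,integer_to_list(C),$;];
htmlize_char(X) ->
    X.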

(htmlize/1 also has to be modified slightly.)
For my application with only a few non-ISO-8859-1 characters that works
nicely.  If you have many such characters, you might want to do your own
UTF-8 encoding at that point instead of emitting numeric character
references.
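
As a rough sketch of what such a hand-written encoding could look like:
the module name utf8_sketch and the functions encode/1 and encode_char/1
are assumptions made for this example, not code from yaws. It maps each
code point to its one to four UTF-8 bytes and refuses surrogates and
values above 16#10FFFF by simply having no matching clause.

```erlang
-module(utf8_sketch).
-export([encode/1, encode_char/1]).

%% Encode a flat list of Unicode code points as a UTF-8 binary.
encode(Str) when is_list(Str) ->
    list_to_binary([encode_char(C) || C <- Str]).

%% One code point becomes a list of one to four bytes.
encode_char(C) when is_integer(C), C >= 0, C < 16#80 ->
    [C];
encode_char(C) when is_integer(C), C >= 16#80, C < 16#800 ->
    [16#C0 bor (C bsr 6),
     16#80 bor (C band 16#3F)];
%% The surrogate range 16#D800..16#DFFF is not a valid code point.
encode_char(C) when is_integer(C), C >= 16#800, C < 16#10000,
                    (C < 16#D800 orelse C > 16#DFFF) ->
    [16#E0 bor (C bsr 12),
     16#80 bor ((C bsr 6) band 16#3F),
     16#80 bor (C band 16#3F)];
encode_char(C) when is_integer(C), C >= 16#10000, C =< 16#10FFFF ->
    [16#F0 bor (C bsr 18),
     16#80 bor ((C bsr 12) band 16#3F),
     16#80 bor ((C bsr 6) band 16#3F),
     16#80 bor (C band 16#3F)].
```

With this, utf8_sketch:encode([16#20AC]) gives <<226,130,172>>, the three
bytes of the euro sign.  The page then has to be served with a utf-8
charset, and the markup characters still need escaping as before.  On an
Erlang/OTP release that ships the unicode module,
unicode:characters_to_binary/1 does the same job.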

Regards,

Carsten

-- 
Carsten Schultz (2:38, 33:47)
http://carsten.codimi.de/
PGP/GPG key on the pgp.net key servers,
fingerprint on my home page.


