[erlang-questions] utf-8 PB

Thu Jun 17 04:57:46 CEST 2010

Jean-Yves F. Barbier wrote:
> On Wed, 16 Jun 2010 18:40:26 +0200,
> Attila Rajmund Nohl <attila.r.nohl@REDACTED> wrote:
>
> At this moment, I don't know how to do that.
> But I also read about internationalization and, as far as I understood,
> the easiest way to accomplish that is to use macros and redirect to the
> directory & files according to the actual locale. Is that the right way
> to do it?
>
>   
>> As far as I know, an Erlang source file must use the latin-1 character
>> set. So if you want UTF-8 strings, read them from a file. That's
>> useful for a possible internationalization anyway.
>
You can use gettext (http://github.com/etnt/gettext) for translations. 
It is simple to use in source code, although setting up gettext itself 
is a bit more work. Usage:

* Add gettext to your project and build it.
* Add code that starts the gettext process and ensures that it loads its 
.po files (.po files are translation files that can be reloaded at any 
time).

* Include the gettext.hrl file in your erl files if they need to do 
translations.
* You can then use the TXT/1, TXT/2, STXT/2 and STXT/3 macros.
* TXT/1 and STXT/2 rely on the process-dictionary value 
(gettext_language) being set in the current process. If you retrieve 
translated texts from other processes, you should ensure that they use 
the same language (or pass your current choice along).
* TXT/2 and STXT/3 also take the language as an argument.

====================================================

Code using gettext will look like this:

io:format("Hello world", [])       becomes         ?TXT("Hello world")

while

FN = "....",
LN = "....",
?STXT("Good day $first_name$ $last_name$", [{first_name, FN}, 
{last_name, LN}]),

which allows you to reorder the ($...$) arguments of your format 
strings, replaces code like:

io:format("Good day ~s ~s", [FN,  LN]),

which can't nicely handle needs like reordering the FN and LN arguments 
(in certain languages) and which yields format strings that are hard to 
understand for translators (they don't see what "~s ~s" is if they only 
have access to the po file).
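
To illustrate why named $...$ placeholders survive reordering, here is a 
minimal sketch of such a substitution. It is not gettext's own code: the 
module name stxt_sketch and the function subst/2 are hypothetical, and 
the real STXT macro may well behave differently in the details (for 
example for unknown or unterminated placeholders).

```erlang
-module(stxt_sketch).
-export([subst/2]).

%% Hypothetical helper, not part of gettext: replaces each $name$
%% placeholder in Format with the string stored under the atom name in
%% Args. Unknown placeholders are left in place so mistakes stay visible.
subst(Format, Args) ->
    subst(Format, Args, []).

subst([], _Args, Acc) ->
    lists:flatten(lists:reverse(Acc));
subst([$$ | Rest], Args, Acc) ->
    case lists:splitwith(fun(C) -> C =/= $$ end, Rest) of
        {Name, [$$ | Tail]} ->
            %% list_to_atom/1 is fine for a sketch; real code should
            %% avoid creating atoms from untrusted input.
            Value = case lists:keyfind(list_to_atom(Name), 1, Args) of
                        {_, V} -> V;
                        false  -> "$" ++ Name ++ "$"
                    end,
            subst(Tail, Args, [Value | Acc]);
        {_, []} ->
            %% Unterminated placeholder: keep the rest verbatim.
            subst([], Args, [[$$ | Rest] | Acc])
    end;
subst([C | Rest], Args, Acc) ->
    subst(Rest, Args, [C | Acc]).
```

A translator can then turn "Good day $first_name$ $last_name$" into 
something like "$last_name$, $first_name$" without any change to the 
argument list in the source code.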

====================================================

Notes:

* Add TXT/STXT usage as early as possible. This avoids the unnecessary 
work of rewriting io:format calls later, lets you supply good $...$ 
format values from the start, and reduces the risk of adding too many / 
too few TXT/STXT usages because you're unsure whether they are needed.

* It may be useful to write a run_in_language_context/2 function that 
takes a fun (the code to run) and a language to use while this fun is 
run - so that you change and revert the process-dictionary value 
(gettext_language) without occasionally forgetting to change it back.
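
A possible sketch of such a helper follows. The module name lang_context 
is made up, and the assumption that the language is stored as a plain 
value (such as "sv") under the gettext_language key should be checked 
against the gettext version you use.

```erlang
-module(lang_context).
-export([run_in_language_context/2]).

%% Runs Fun with the process-dictionary key gettext_language set to Lang,
%% then restores whatever was there before, even if Fun throws or exits.
run_in_language_context(Fun, Lang) when is_function(Fun, 0) ->
    Previous = put(gettext_language, Lang),   % put/2 returns the old value
    try
        Fun()
    after
        restore(Previous)
    end.

%% No previous value: remove the key again instead of storing 'undefined'.
restore(undefined) -> erase(gettext_language);
restore(Previous)  -> put(gettext_language, Previous).
```

The try ... after makes sure the old value comes back even when the fun 
crashes, which is the part that is easy to forget when doing it by hand.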

* po files can be manipulated by the GNU Gettext command line tools.

* Poedit may be useful for translators when translating po files.

* If there is no translation you will simply get your original (source 
code) string back.

* You should have a make build target (or tool) that can create a 
"master" po file, that contains the strings currently used by the 
TXT/STXT macros in the source code. This "master" file can then be 
merged with current (translated) po files using the GNU gettext tools.

The "gettext_compile:parse_transform/2" function, which processes the 
Erlang parse tree looking for gettext:key2str/1 [a] calls, and the 
"-compile({parse_transform,gettext_compile})." directive in gettext.hrl 
can be helpful for this.

[a]: gettext:key2str/1 is the basis of all text macros - erlc parse 
trees have their macros expanded so you can't look directly for them.

* The format string in TXT and STXT must be a literal "...." string in 
the source code, or the gettext_compile code will fail to extract it.

* It's probably fairly easy to write a regexp that can extract the 
'TXT("......"' parts, which should be faster than having to compile all 
files to create parse trees for use by gettext_compile.
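
A rough sketch of that regexp approach, using the standard re module, 
could look like this. The module name txt_extract is hypothetical, and 
the pattern is only an approximation: it assumes the format string is 
the first macro argument, it will also match macro uses inside comments, 
and it does not decode escape sequences or handle string literals split 
over several adjacent "..." parts.

```erlang
-module(txt_extract).
-export([strings/1]).

%% Returns the literal format strings of all ?TXT("...") and ?STXT("...")
%% uses found in Source (a string or binary holding Erlang source text).
%% Escaped quotes inside the literal are tolerated but not decoded.
strings(Source) ->
    %% Regexp: \?S?TXT\(\s*"((?:[^"\\]|\\.)*)"
    Pattern = "\\?S?TXT\\(\\s*\"((?:[^\"\\\\]|\\\\.)*)\"",
    case re:run(Source, Pattern, [global, {capture, [1], list}]) of
        {match, Matches} -> [S || [S] <- Matches];
        nomatch          -> []
    end.
```

Feeding it the contents of each .erl file (for example from 
file:read_file/1) gives the list of strings to put into the "master" po 
file.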