[erlang-questions] utf-8 PB

Håkan Stenholm hokan.stenholm@REDACTED
Thu Jun 17 19:49:03 CEST 2010


Jean-Yves F. Barbier wrote:
> On Thu, 17 Jun 2010 04:57:46 +0200,
> Håkan Stenholm <hokan.stenholm@REDACTED> wrote:
>
> Thanks for this very complete howto Håkan!!
>
> Just one more clarification: the way I understand it, I'll be able to
> translate any string I want as long as I use the syntax you gave me (?)
>
> Because I've seen many projects not written in Erlang where you can
> only translate some strings (i.e. not the menu items, which can be very
> annoying if the users don't speak English at all.)
>   
The idea is to automatically extract all texts (format strings) used by 
the TXT/STXT macros (and no other strings) when building a "master" po 
file, so you have full control over which texts will be translatable. 
Put another way, only TXT/STXT macro calls whose texts exist in a 
language-specific po (translation) file will be translated.

This also implies that only texts in .erl/.hrl files will be extracted, 
as these are the only places where the macros can appear. As texts are 
extracted per file, it is also possible to limit which files are checked.


Example, a po file that translates English to Swedish will have entries 
like this:

msgid ""                           <----- meta data and encoding header
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2010-06-17 16:45+0200\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@REDACTED>\n"
"Language-Team: LANGUAGE <aa@REDACTED>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=iso-8859-1\n"
"Content-Transfer-Encoding: 8bit\n"

msgid "Hello $name$!"      <----- Text used in the source code, e.g.
                                  ?STXT("Hello $name$!", [.....]).
                                  Extracted by the gettext_compile.erl
                                  related code.
msgstr "Hej $name$!"       <----- Text (format string) actually used.
                                  The same as msgid when extracted, but
                                  then translated by a translator.
                                  If $name$ = "World", the STXT call
                                  returns "Hej World!".
                                  Note: the format string is translated
                                  before STXT processes it (i.e. before
                                  the $...$ values are substituted).
......
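To make the order of operations concrete (look up the translation first, then substitute the $...$ values), here is a minimal sketch in plain Erlang. It does not call the gettext library: the module name stxt_sketch, the map standing in for a loaded po file, and the rule that unknown $...$ keys are left untouched are all assumptions for illustration, not gettext's actual implementation.

```erlang
-module(stxt_sketch).
-export([stxt/2, translate_and_format/3]).

%% Hypothetical: Po is a map from msgid to msgstr standing in for a loaded
%% po file. The format string is translated first, then the $...$ keys are
%% substituted; a missing translation falls back to the msgid itself.
translate_and_format(MsgId, Bindings, Po) ->
    stxt(maps:get(MsgId, Po, MsgId), Bindings).

%% Replace every $Key$ in Format with the value bound to the atom Key in
%% Bindings (a proplist such as [{name, "World"}]).
stxt(Format, Bindings) ->
    subst(Format, Bindings, []).

subst([], _Bindings, Acc) ->
    lists:reverse(Acc);
subst([$$ | Rest], Bindings, Acc) ->
    case string:split(Rest, "$") of
        [Name, After] ->
            %% push the value's characters onto the reversed accumulator
            subst(After, Bindings, lists:reverse(value(Name, Bindings), Acc));
        [_NoClosingDollar] ->
            %% a lone "$" is kept as ordinary text
            lists:reverse(Acc, [$$ | Rest])
    end;
subst([C | Rest], Bindings, Acc) ->
    subst(Rest, Bindings, [C | Acc]).

%% Assumption: unknown keys are left as "$Key$" rather than raising.
value(Name, Bindings) ->
    case [V || {K, V} <- Bindings, atom_to_list(K) =:= Name] of
        [V | _] -> to_string(V);
        [] -> "$" ++ Name ++ "$"
    end.

to_string(V) when is_list(V) -> V;
to_string(V) when is_atom(V) -> atom_to_list(V);
to_string(V) when is_integer(V) -> integer_to_list(V).
```

Because the lookup happens before substitution, a translator is free to move $first_name$ and $last_name$ around in the msgstr, e.g. stxt_sketch:stxt("$last_name$, $first_name$", [{first_name, "Anna"}, {last_name, "Svensson"}]) gives "Svensson, Anna".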


Note: it is a good idea to ensure that the source code uses only ONE 
language; otherwise the msgids in the po files will be a mix of 
languages.

==============================

Warning:

I've only used gettext with languages that can be written with latin-1 
encoded characters (Swedish, Danish, Finnish, Norwegian, English). For 
languages that need more than the first 256 Unicode code points (which 
are identical to the latin-1 values with the same number), there may be 
issues.

The po file format supports various character encodings (latin-1, utf-8, 
which encodes Unicode code points as variable-length byte sequences, ...), 
but I'm not sure whether gettext parses utf-8 encoded po files properly. 
It may need code that checks the encoding (listed at the start of the po 
file, as shown above) and converts the utf-8 byte sequences to single 
integers (Unicode code points).
 
The best thing to do is to test what you get back from gettext when 
using utf-8 encoded po files.
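A minimal sketch of that kind of conversion, assuming you already have the header text and the raw msgstr bytes as a binary. The module po_charset and its functions are hypothetical, not part of gettext, and the latin-1 fallback for a header without a charset is an assumption; only the standard re and unicode modules are used. Without decoding, the single character Å (code point 197) shows up as the two utf-8 bytes 195 and 133.

```erlang
-module(po_charset).
-export([charset/1, to_codepoints/2]).

%% Return the charset named in a po header's Content-Type line, lower-cased,
%% e.g. "text/plain; charset=UTF-8" -> "utf-8".
%% Assumption: a header without a charset is treated as latin-1.
charset(Header) ->
    case re:run(Header, "charset=([A-Za-z0-9_.-]+)",
                [caseless, {capture, all_but_first, list}]) of
        {match, [Name]} -> string:lowercase(Name);
        nomatch -> "iso-8859-1"
    end.

%% Convert the raw bytes of a msgstr (a binary) to a list of Unicode
%% code points, according to the charset from the po header.
to_codepoints(Bytes, "utf-8") ->
    case unicode:characters_to_list(Bytes, utf8) of
        Chars when is_list(Chars) -> Chars;
        Error -> erlang:error({bad_utf8, Error})
    end;
to_codepoints(Bytes, "iso-8859-1") ->
    %% each latin-1 byte already equals its Unicode code point
    binary_to_list(Bytes).
```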

Note: the GNU Gettext tools can compile po files to other, non-text 
formats (.mo files), and tools like Poedit sometimes create them, but 
they aren't used (or needed) by the Erlang gettext library, as it works 
directly with the .po files.

Note: there is po file validation code in gettext to catch bugs like this 
one, where the translation has lost the $name$ placeholder:

msgid "Hello $name$!"
msgstr "Hej World!"
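As an illustration only (this is not gettext's own validation code, whose exact checks aren't covered here), a check that catches the entry above can compare the set of $...$ keys in msgid and msgstr. The module po_check and the shape of its error tuple are hypothetical.

```erlang
-module(po_check).
-export([placeholders/1, check_entry/2]).

%% Return the sorted, de-duplicated $...$ keys used in a format string.
placeholders(Str) ->
    case re:run(Str, "\\$([A-Za-z0-9_]+)\\$",
                [global, {capture, all_but_first, list}]) of
        {match, Matches} -> lists:usort([Name || [Name] <- Matches]);
        nomatch -> []
    end.

%% A translation may reorder keys, but must use the same set of keys.
check_entry(MsgId, MsgStr) ->
    case {placeholders(MsgId), placeholders(MsgStr)} of
        {Same, Same} -> ok;
        {Expected, Found} -> {error, {placeholder_mismatch, Expected, Found}}
    end.
```

Comparing sets rather than sequences keeps reordered translations such as "$last_name$ $first_name$" valid.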

> JY
>
> ...
>   
>> You can use gettext (http://github.com/etnt/gettext) for translations. 
>> Usage in the source code is simple, although setting up gettext itself 
>> is a bit more work. Usage:
>>
>> * Add gettext to your project and build it.
>> * Add code that starts the gettext process and ensures that it loads its 
>> .po files (.po files are translation files that can be reloaded at any 
>> time).
>>
>> * Include the gettext.hrl file in your erl files if they need to do 
>> translations.
>> * You can then use the TXT/1, TXT/2, STXT/2 and STXT/3 macros.
>> * TXT/1 and STXT/2 rely on the process-dictionary value 
>> (gettext_language) being set in the current process. If you retrieve 
>> translated texts from other processes, you should ensure that they use 
>> the same language (or pass your current choice along).
>> * TXT/2 and STXT/3 also take the language as an argument.
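The point about other processes follows from the process dictionary being strictly per process: a spawned process does not inherit gettext_language. A small sketch showing this, and the fix of passing the language along explicitly; the module pd_lang is hypothetical, and using a string such as "sv" as the language value is an assumption.

```erlang
-module(pd_lang).
-export([child_language/2]).

%% Spawn a process and report the gettext_language value it sees.
%% With PassAlong = false the child sees undefined, because a spawned
%% process starts with an empty process dictionary.
%% Note: this also sets gettext_language in the calling process.
child_language(PassAlong, Lang) ->
    put(gettext_language, Lang),
    Parent = self(),
    Child = spawn(fun() ->
                          case PassAlong of
                              true -> put(gettext_language, Lang);
                              false -> ok
                          end,
                          Parent ! {self(), get(gettext_language)}
                  end),
    receive
        {Child, Seen} -> Seen
    after 5000 ->
        timeout
    end.
```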
>>
>> ====================================================
>>
>> Gettext using code will look like this:
>>
>> io:format("Hello world", [])       becomes         ?TXT("Hello world")
>>
>> while
>>
>> FN = "....",
>> LN = "....",
>> ?STXT("Good day $first_name$ $last_name$",
>>       [{first_name, FN}, {last_name, LN}]),
>>
>> which allows you to reorder the format string's ($...$) arguments, 
>> replaces code like:
>>
>> io:format("Good day ~s ~s", [FN, LN]),
>>
>> which can't nicely handle needs like reordering the FN and LN arguments 
>> (in certain languages), and which yields format strings that are hard 
>> for translators to understand (they can't tell what "~s ~s" is if they 
>> only have access to the po file).
>>
>>
>> ====================================================
>>
>> Notes:
>>
>> * Add TXT/STXT usage as early as possible if you want to avoid the 
>> unnecessary work of rewriting io:format calls, supply good $...$ format 
>> values, and reduce the risk of adding too many / too few TXT/STXT 
>> usages because you're unsure whether they're needed.
>>
>> * It may be useful to write a run_in_language_context/2 function that 
>> takes a fun (the code to run) and a language to use while this fun 
>> runs, so that you change and revert the process-dictionary value 
>> (gettext_language) without occasionally forgetting to change it back.
>>
>> * po files can be manipulated by the GNU Gettext command line tools.
>>
>> * Poedit may be useful for translators.
>>
>> * If there is no translation, you will simply get your original (source 
>> code) string back.
>>
>> * You should have a make build target (or tool) that can create a 
>> "master" po file, that contains the strings currently used by the 
>> TXT/STXT macros in the source code. This "master" file can then be 
>> merged with current (translated) po files using the GNU gettext tools.
>>
>> The "gettext_compile:parse_transform/2" function, which processes the 
>> Erlang parse tree looking for gettext:key2str/1 [a] calls, and the 
>> "-compile({parse_transform,gettext_compile})." line in gettext.hrl can 
>> be helpful for this.
>>
>> [a]: gettext:key2str/1 is the basis of all the text macros. The parse 
>> trees erlc works on have their macros already expanded, so you can't 
>> look for the macros directly.
>>
>> * The format string in TXT and STXT must be a literal "...." string in 
>> the source code, or the gettext_compile code will fail to extract it.
>>
>> * It's probably fairly easy to write a regexp that can extract the 
>> 'TXT("......"' parts, which should be faster than having to compile all 
>> files to create parse trees for use by gettext_compile.
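A minimal sketch of the run_in_language_context/2 helper suggested above, assuming the language is stored under the gettext_language process-dictionary key as described, and that language values are strings such as "sv" (an assumption). The module name lang_context is hypothetical. try ... after guarantees that the previous value, or its absence, is restored even if the fun raises an exception.

```erlang
-module(lang_context).
-export([run_in_language_context/2]).

%% Run Fun with gettext_language set to Lang in the process dictionary,
%% then restore whatever was there before, even if Fun raises.
run_in_language_context(Fun, Lang) ->
    Old = put(gettext_language, Lang),
    try
        Fun()
    after
        restore(Old)
    end.

%% put/2 returns undefined when the key was not set before.
restore(undefined) -> erase(gettext_language);
restore(Old) -> put(gettext_language, Old).
```

Usage would look like lang_context:run_in_language_context(fun() -> ?TXT("Hello world") end, "sv").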
>>     
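A rough sketch of the regexp approach, which also writes the extracted texts as a minimal "master" po file that could then be merged with existing translations using the GNU gettext tools. Everything here is an assumption for illustration (the module txt_extract, the regexp, the one-line header): it only sees literal strings written directly after ?TXT( or ?STXT(, it will also match such text inside comments, and captured strings keep their Erlang escape sequences exactly as written in the source.

```erlang
-module(txt_extract).
-export([extract/1, extract_file/1, to_po/1]).

%% Matches ?TXT("...") and ?STXT("...", ...) and captures the literal
%% format string, allowing escaped characters such as \" inside it.
regexp() ->
    "\\?S?TXT\\(\\s*\"((?:[^\"\\\\]|\\\\.)*)\"".

%% Source may be a string or a binary; returns the captured strings
%% in the order they appear.
extract(Source) ->
    case re:run(Source, regexp(), [global, {capture, all_but_first, list}]) of
        {match, Matches} -> [Text || [Text] <- Matches];
        nomatch -> []
    end.

extract_file(Path) ->
    {ok, Bin} = file:read_file(Path),
    extract(Bin).

%% Build a minimal po file (as an iolist) with one empty msgstr per
%% unique text. Assumption: latin-1, matching the header shown earlier.
to_po(Texts) ->
    Header = "msgid \"\"\nmsgstr \"\"\n"
             "\"Content-Type: text/plain; charset=iso-8859-1\\n\"\n\n",
    [Header | [["msgid \"", T, "\"\nmsgstr \"\"\n\n"] || T <- lists:usort(Texts)]].
```

A build target could then call file:write_file("master.po", txt_extract:to_po(lists:append([txt_extract:extract_file(F) || F <- Files]))) for the chosen .erl/.hrl files.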
>
>
>
>   


