[erlang-questions] The importance of Basic Unicode Understanding in Erlang

Frédéric Trottier-Hébert fred.hebert@REDACTED
Tue Sep 27 20:47:54 CEST 2011

I'm replying to this with erlang-questions in CC.

On 2011-09-27, at 13:43 PM, Michael Uvarov wrote:

> Hi,
> There are two ways to solve this problem:
> use Erlang for working with the data;
> use a native (C or C++) implementation.
> The first variant is safe, but slow.
> The second variant is fast, but if we choose it we will have problems
> with concurrency and stability of our application (NIFs and drivers can
> crash the whole VM).
> Why is the Erlang implementation slow?
> Strings in Erlang are simple lists, 16 bytes per char, not stored contiguously.

You're forgetting the binary format, which lets you use a variable number of bytes per character. The bit syntax currently has utf8, utf16 and utf32 segment types for matching, and BOMs can be handled too, if I recall.
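To make the binary approach concrete, here is a minimal sketch. The module name utf_sketch and both function names are made up for illustration; the only library call is unicode:bom_to_encoding/1 from the standard unicode module, and the rest is plain bit syntax.

```erlang
-module(utf_sketch).
-export([codepoints/1, strip_bom/1]).

%% Walk a UTF-8 binary one code point at a time using the utf8
%% segment type. Each match consumes 1 to 4 bytes, depending on the
%% character. Invalid UTF-8 is not handled here: it simply fails
%% with a function_clause error.
codepoints(<<CP/utf8, Rest/binary>>) -> [CP | codepoints(Rest)];
codepoints(<<>>) -> [].

%% Detect a leading byte order mark and drop it.
%% unicode:bom_to_encoding/1 returns {Encoding, BomLength}; when no
%% BOM is present it returns {latin1, 0}, so nothing is stripped.
strip_bom(Bin) ->
    {Encoding, BomLen} = unicode:bom_to_encoding(Bin),
    <<_:BomLen/binary, Rest/binary>> = Bin,
    {Encoding, Rest}.
```

For example, utf_sketch:codepoints(<<"h", 16#E9/utf8, "llo">>) walks a 6-byte binary and yields the 5 code points [104,233,108,108,111]. The same pattern works with the utf16 and utf32 segment types, optionally with an endianness flag such as utf16-little.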

> They are slow. If you work with advanced Unicode algorithms
> (normalization, collation, splitting) and i18n (locale-dependent
> algorithms), you also need a global store for the unidata and CLDR data.
> The second way is to use NIFs and ICU. ICU is fast and well-tested. ICU
> allows multithreading; you only need to have a copy of the resource for each
> thread.
> But ICU uses UTF-16, which is not nicely formatted in the Erlang shell.
> Also, the NIF code must be very simple and well-tested.
> ICU also has an API for processing dates and formatting messages in a
> third format (the first and second are printf format and Erlang's
> io:format). But it is closer to the gettext application.
> There are a few ports of ICU for Erlang. Basho has icu4e (a NIF for basic
> functions, no locales). There is also the Starling driver. And I am
> writing my own implementation of NIFs with locales.
> -- 
> С уважением,
> Уваров Михаил.
> Best regards,
> Uvarov Michael

I must admit I'm not knowledgeable enough to comment on the rest of this post, but I find it instructive. You bring up the point of locales, which is also pretty interesting. What's a smart way to handle locales? Should they be VM-specific or process-specific?

Fred Hébert
