[erlang-questions] [eeps] EEP 35 "Binary string modules" -- locales

Fri Nov 12 15:11:44 CET 2010

Not all text is meant for human consumption.  I'd even venture so far as
to say that the overwhelming mass of program-generated text is not for
human consumption; its intended consumers are other programs.  The most
common locale programs "speak" is the default "C" (also called "POSIX")
locale.  It is complicated to solve the general problem of supporting
all human locales.  It is much simpler to just support a default locale.
Even programs intended to create/consume text for humans often have to
create/consume text in the C locale as well.

I've been told the anecdote that in the 70s a delegation of IBM compiler
engineers flew to Germany to proudly demonstrate their new optimizing
Fortran compiler, and all it did was spew gibberish and crash.  It used
the standard routines for reading/writing numbers, which in Germany
swapped commas and dots due to the then-new locale awareness of the
OS.  Since then I've been convinced that it is a good
thing to have two separate sets of functions, one small, simple, and
fast handling only the default locale and another one huge, complicated,
and not so fast trying to handle all the intricacies of as many locales
as feasible.

Therefore I'd like to see to_integer and to_float in bstring, grokking
numbers in the C locale.  to_lower and to_upper too, as long as it's
documented which characters they work on.  They wouldn't even need to
know whether the bstring was ISO 8859-1 or UTF-8 encoded, as long as
they only touch ASCII characters.
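
To make the point concrete, here is a minimal sketch of what such
C-locale functions could look like.  The module name bstring_sketch,
the return shape of to_integer (borrowed from string:to_integer/1) and
the error atom no_integer are my assumptions, not anything EEP 35
specifies; to_float is left out for brevity.

```erlang
-module(bstring_sketch).   % hypothetical module name, not part of EEP 35
-export([to_lower/1, to_upper/1, to_integer/1]).

%% Case conversion that touches only the ASCII letters.  Every byte
%% outside A-Z / a-z is copied unchanged, so the result is the same
%% kind of text (ISO 8859-1 or UTF-8) as the input.
to_lower(Bin) when is_binary(Bin) ->
    << <<(lower(C))>> || <<C>> <= Bin >>.

to_upper(Bin) when is_binary(Bin) ->
    << <<(upper(C))>> || <<C>> <= Bin >>.

lower(C) when C >= $A, C =< $Z -> C + 32;
lower(C) -> C.

upper(C) when C >= $a, C =< $z -> C - 32;
upper(C) -> C.

%% Parse a leading decimal integer in C-locale syntax: an optional
%% sign followed by the digits 0-9, with no grouping separators.
%% Returns {Integer, Rest} or {error, no_integer} (assumed shape).
to_integer(<<$-, Rest/binary>>) -> negate(digits(Rest, 0, 0));
to_integer(<<$+, Rest/binary>>) -> digits(Rest, 0, 0);
to_integer(Bin) when is_binary(Bin) -> digits(Bin, 0, 0).

%% Acc is the value so far, N the number of digits consumed.
digits(<<D, Rest/binary>>, Acc, N) when D >= $0, D =< $9 ->
    digits(Rest, Acc * 10 + (D - $0), N + 1);
digits(_, _, 0) -> {error, no_integer};
digits(Rest, Acc, _) -> {Acc, Rest}.

negate({error, _} = Error) -> Error;
negate({Int, Rest}) -> {-Int, Rest}.
```

With this sketch, to_upper(<<"Fu", 195, 159>>) (UTF-8 "Fuß") yields
<<"FU", 195, 159>>: the two bytes of ß pass through untouched, and the
same holds for the single ISO 8859-1 byte 223.  Likewise
to_integer(<<"1,5">>) stops at the comma and returns {1, <<",5">>},
whatever the locale of the machine.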

I don't think it's practical to see bstring as locale independent.
Rather bstring should be seen as operating in the default locale.  One
being able to imagine a locale dependent variant of a function should
not be grounds for omitting the function from bstring.  I can even
imagine concat(<<"Fuß">>, <<"Ball">>) being expected to result in
<<"Fussball">> in the de_DE locale.

	Christian.