[erlang-questions] Atom Unicode Support
Tue Feb 2 17:55:46 CET 2016
Yes, to_upper and to_lower would be nice to have, but I think the most
important is to_nfc(). to_nfc() and to_nfd() are the basis for Unicode
support, and for a distinct string() type, incompatible with both lists and
binaries.
About unicode atoms and nfc - are you ready to have atoms 'è' and 'è',
which are at the same time the same and different? One of them is 2 bytes
long in UTF-8, the other is 3 bytes long. Yet it is still the same
character, just in NFC and NFD forms respectively. Both variants are used in
the wild by different editors.
It may look harmless in most cases, but wait until you hash the value and
use the hash for the password check.
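To make the effect concrete, here is a small sketch in Python rather than Erlang, using only Python's standard `unicodedata` and `hashlib` modules for illustration. SHA-256 merely stands in for whatever hash a real password check would use, and `normalized_digest` is a hypothetical helper showing one possible mitigation (normalize to NFC before hashing), not an Erlang or OTP API.

```python
import hashlib
import unicodedata

# 'è' as one precomposed code point, U+00E8 (the NFC form)
nfc = "\u00e8"
# 'e' followed by COMBINING GRAVE ACCENT, U+0300 (the NFD form)
nfd = "e\u0300"

# Canonically equivalent: normalizing the NFD form yields the NFC form...
assert unicodedata.normalize("NFC", nfd) == nfc
# ...but as raw strings they are not equal.
print(nfc == nfd)  # False

# Their UTF-8 encodings differ in length: 2 bytes vs 3 bytes.
nfc_bytes = nfc.encode("utf-8")  # b'\xc3\xa8'
nfd_bytes = nfd.encode("utf-8")  # b'e\xcc\x80'
print(len(nfc_bytes), len(nfd_bytes))  # 2 3

# Hashing the raw bytes gives different digests for the "same" text.
h1 = hashlib.sha256(nfc_bytes).hexdigest()
h2 = hashlib.sha256(nfd_bytes).hexdigest()
print(h1 == h2)  # False


def normalized_digest(s: str) -> str:
    """Hypothetical helper: normalize to NFC, then hash the UTF-8 bytes."""
    return hashlib.sha256(unicodedata.normalize("NFC", s).encode("utf-8")).hexdigest()


# After normalization both inputs hash identically.
print(normalized_digest(nfc) == normalized_digest(nfd))  # True
```

The same reasoning would apply to atoms: unless the runtime (or the programmer) normalizes to one form, two visually identical atoms can compare unequal.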
I myself was born and live in a country with a non-Latin script, and I use
Unicode every day in software, but allowing Unicode atoms isn't the most
important part of supporting Unicode properly in Erlang.
2016-02-02 16:59 GMT+03:00 Dmitrii Dimandt <>:
>> On Sat, Jan 30, 2016 at 9:04 PM, José Valim
>> <> wrote:
>> > With all that said, are there any plans of supporting UTF-8 encoded
>> > atoms on Erlang R19? If the feature is desired but not planned, I would
>> > love to contribute the compiler and bytecode changes above although I
>> > will need some guidance. If that is an option, I would love to get in
>> > touch.
>> It is not planned for OTP 19. IMO, the feature is desired,
>> but it is probably too late for OTP 19.
>> Extending the BEAM format is necessary but not sufficient.
>> It is also necessary to make sure that other code in OTP
>> doesn't break.
> In order to try and derail the "omg why unicode in atoms" discussion, I
> have a more pressing question: are there plans for expanding Unicode
> support elsewhere? Hoping for at least a subset of
> https://github.com/erlang-unicode/i18n, namely length, to_lower, to_upper,
> etc. :)