[erlang-questions] ux — a Unicode library
Fri Aug 19 13:25:18 CEST 2011
I tried this library because I needed it for some work and ended up finding it very useful, although pretty confusing on many points. If I could get some information on the following, I might consider using it more:
- Why does it need to download mochiweb to deal with UTF stuff? I don't see the relationship with a web server, and it worries me that versions could clash, or that I could break it by using other versions of the same libraries;
- I don't understand why every time I start the app I get "** /usr/local/agner/packages/rebar-@REDACTED/ebin/rebar_utils.beam hides /usr/local/agner/packages/rebar-@REDACTED/ebin/rebar_utils.beam ... ** Found 82 name clashes in code paths".
I figure the app checks its dependencies, but does so at a level that is way too broad or something. I'm not sure why it does this, especially since I'd assume rebar and agner aren't necessary for ux to work once it's compiled.
- It makes some sense to load data as it goes, but it does make the library's behaviour a bit less predictable.
I also see a bunch of servers in there, but I assume they have to do with loading the data and whatnot? I'm somewhat afraid they might be serialising a bunch of calls, but I'm not really in a position to criticise the library given I don't have one to offer in exchange.
In any case, it's good news to see a library that handles multiple code points as single graphemes (so that [16#0065, 16#0301] and [16#00E9] are both equal to "é", a string of length 1), as I've had issues with that in some projects that could not calculate lengths correctly.
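To illustrate the length problem outside Erlang, here is a small Python sketch (Python's standard library, not ux). It assumes a deliberately simplified rule: a character with a nonzero canonical combining class joins the preceding cluster. Full UAX #29 segmentation covers many more cases (Hangul syllables, spacing marks, emoji sequences), which is what a library like ux is for. The helper name grapheme_length is hypothetical.

```python
import unicodedata

def grapheme_length(s: str) -> int:
    """Count user-perceived characters with a simplified rule (NOT full UAX #29):
    a combining mark is folded into the cluster of the character before it."""
    count = 0
    for ch in s:
        # Nonzero canonical combining class => combining mark; it extends the
        # previous cluster instead of starting a new one.
        if count > 0 and unicodedata.combining(ch):
            continue
        count += 1
    return count

decomposed = "\u0065\u0301"   # 'e' + COMBINING ACUTE ACCENT
precomposed = "\u00e9"        # LATIN SMALL LETTER E WITH ACUTE

print(len(decomposed), len(precomposed))                          # 2 1
print(grapheme_length(decomposed), grapheme_length(precomposed))  # 1 1
print(unicodedata.normalize("NFC", decomposed) == precomposed)    # True
```

Counting code points (len) gives 2 for the decomposed form, which is exactly the kind of wrong length described above.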
On 2011-08-19, at 03:02 AM, Dmitrii Dimandt wrote:
> Michael Uvarov, a member of the Russian Erlang community has developed a library to deal with Unicode: https://github.com/freeakk/ux
> It works with the Unicode Character Database and contains all the functions needed for that, in the module ux_string:
> - case transforms: to_lower, to_upper
> - normalization: to_nfc, to_nfd, ...
> - works with grapheme clusters according to UTR29, http://unicode.org/reports/tr29/
> - length, ...
> - works with character types: types/1
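The normalization and case operations listed above can be seen with Python's unicodedata module. This is only a stand-in: the corresponding ux_string functions are named to_nfc, to_nfd, to_upper and to_lower in the announcement, but their exact Erlang signatures are not shown in this thread, so none are assumed here.

```python
import unicodedata

s = "Ame\u0301lie"  # "Amélie" with é spelled as e + COMBINING ACUTE ACCENT

nfc = unicodedata.normalize("NFC", s)  # compose: e + U+0301 -> U+00E9
nfd = unicodedata.normalize("NFD", s)  # decompose: keep e + U+0301

print(len(s), len(nfc), len(nfd))      # 7 6 7
print(nfc == "Am\u00e9lie")            # True: equal once both are in NFC
print(nfc.upper(), nfc.lower())        # AMÉLIE amélie
```

Normalizing both sides to the same form before comparing is what makes the two spellings of "é" compare equal.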
> The Unicode Collation Algorithm (UCA) has also been implemented, in ux_uca. This module provides string sorting and comparison, as well as a function to search for a substring:
> - search
> - sort
> - compare
> - sort_key - generates a sort key that can then be compared with other keys using a simple binary comparison
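Collation and sort keys are easier to see with a toy example. The Python sketch below is not the UCA and does not use the UCA's default weight table (DUCET); it only imitates the UCA idea of comparing levels in order: base letters first, then accents, then case. The function toy_sort_key is hypothetical. A real UCA sort key flattens its levels into one sequence of weights so that two keys can be compared directly; here a Python tuple plays that role.

```python
import unicodedata

def toy_sort_key(s: str) -> tuple:
    """Hypothetical three-level key imitating UCA levels (not real UCA weights)."""
    nfd = unicodedata.normalize("NFD", s)
    bases, accents = [], []
    for ch in nfd:
        if bases and unicodedata.combining(ch):
            accents[-1] += ch                     # attach the mark to the preceding base
        else:
            bases.append(ch)
            accents.append("")
    primary = "".join(bases).casefold()           # level 1: letters, ignoring case
    secondary = tuple(accents)                    # level 2: accents, position by position
    tertiary = tuple(b.isupper() for b in bases)  # level 3: lowercase before uppercase
    return (primary, secondary, tertiary)

words = ["c\u00f4t\u00e9", "Cote", "cot\u00e9", "cote", "c\u00f4te"]
print(sorted(words, key=toy_sort_key))
# ['cote', 'Cote', 'coté', 'côte', 'côté']
```

By contrast, a plain code point sort (sorted(words)) puts "Cote" first, because "C" (U+0043) precedes "c" (U+0063), and the result also depends on whether accents are precomposed or decomposed.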
> Character data is stored in ETS tables and generated on the first call. You can try it out by running ./start-dev.sh
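The "generated on first call" approach, and the concern raised above about servers serialising calls, can be illustrated with a small lazy-loading sketch. It is written in Python, and the class LazyCategoryTable is hypothetical; it does not describe how ux's ETS tables or servers are actually implemented. The table is built once under a lock, and every later lookup reads it without taking the lock, so only the first access pays the cost.

```python
import threading
import unicodedata

class LazyCategoryTable:
    """Hypothetical code point -> general category map, built on first use."""

    def __init__(self, limit: int = 0x800):
        self._limit = limit          # only code points below this are tabulated
        self._table = None
        self._lock = threading.Lock()
        self.builds = 0              # how many times the table was generated

    def _get_table(self) -> dict:
        if self._table is None:               # fast path: no lock once built
            with self._lock:
                if self._table is None:       # re-check: another thread may have built it
                    self._table = {cp: unicodedata.category(chr(cp))
                                   for cp in range(self._limit)}
                    self.builds += 1
        return self._table

    def category(self, ch: str):
        """Return the general category, or None if ch is outside the table."""
        return self._get_table().get(ord(ch))

table = LazyCategoryTable()
print(table.builds)                                         # 0: nothing generated yet
print(table.category("\u00e9"), table.category("\u0301"))   # Ll Mn
print(table.builds)                                         # 1: generated on the first lookup
```

The trade-off is the one mentioned above: the first call is slower and its timing less predictable, while later calls are cheap and need no coordination.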