[erlang-questions] trouble with erlang or erlang is a ghetto

Loïc Hoguin essen@REDACTED
Thu Jul 28 10:39:59 CEST 2011


Hello,

>> You can output UTF8 as binary, yes. Maybe as strings too (I'm not really
>> using those so I wouldn't know). But to give an example, can you search
>> inside your UTF8 text for the word "trouvé" including all different
>> variants of the é character (perhaps even just 'e')? Byte search isn't
>> doing any good here.
> 
> It sounds like you want a unicode normalization library, I don't think
> this is really a search problem. In Python you'd do this with the
> unicodedata module. You're right that there is nothing that ships with
> Erlang for this purpose, at least not that I know of. It seems like
> this might be easy to solve in a third party library, maybe a binding
> to ICU. At least one of these probably already exists.

Well, yes. I should have picked a simpler example, such as to_upper,
which produces quite unexpected results when Unicode is handled wrongly.
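
To make both points concrete, here is a minimal sketch in Python using the
unicodedata module mentioned above. It is not Erlang and not tied to any
particular Erlang library. The helper names fold and contains are made up
for this example, and dropping every combining mark is a deliberately crude
assumption that would be too aggressive for many languages.

```python
import unicodedata


def fold(text):
    """Hypothetical helper: decompose to NFD, drop combining marks, case-fold."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def contains(haystack, needle):
    """Hypothetical helper: accent- and case-insensitive substring test."""
    return fold(needle) in fold(haystack)


# The same visible text, encoded two different ways.
precomposed = "J'ai trouv\u00e9 la solution"   # e-acute as the single code point U+00E9
decomposed = "J'ai trouve\u0301 la solution"   # 'e' followed by U+0301 combining acute

# A plain comparison (and therefore a byte search) treats them as different.
print(precomposed == decomposed)

# After folding, every variant matches, including the bare 'e'.
print(contains(precomposed, "trouve"))
print(contains(decomposed, "trouv\u00e9"))

# Upper-casing is not a per-character mapping: the sharp s becomes two letters.
print("stra\u00dfe".upper())
```

The last line is the to_upper trap: a conversion that maps one character to
exactly one character cannot produce the right result here.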

I retract my statement, though. Michael Uvarov pointed me, off-list, to
this library, which seems to be just what is needed for any kind of Unicode
string manipulation, although I haven't tested it:
  https://github.com/freeakk/ux

-- 
Loïc Hoguin
Dev:Extend


