[erlang-questions] Fwd: String encoding and character set

dda headspin@REDACTED
Wed Jan 17 02:07:04 CET 2007


Nope. Take a UTF-8 string, for instance. As an Erlang list, it gives
you no way, within the language, to safely extract one or more
characters. In, say, "유니코드는ISO엔코딩보다 훨씬
좋다." [that's Korean if you're wondering] you cannot extract the 6th
to 11th characters (ISO엔코딩) without doing more contortions than a
circus artist. A list is not a string; it's raw data left for us to
muck with.
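
To make the byte-versus-character distinction concrete, here is a
minimal sketch, not something taken from the thread. It assumes OTP 17
or later, a source file saved as UTF-8, and the unicode module, which
was added to OTP after this message was written. The module name
utf8_slice and the function demo/0 are made up for illustration.

```erlang
-module(utf8_slice).
-export([demo/0]).

%% Assumption: this file is saved as UTF-8 and compiled with OTP 17 or
%% later, where UTF-8 is the default source encoding.
demo() ->
    %% The sentence from the message, as a UTF-8 encoded binary.
    Utf8 = <<"유니코드는ISO엔코딩보다 훨씬 좋다."/utf8>>,

    %% View 1: a list of bytes. Each Hangul syllable takes three bytes,
    %% so elements 6..11 cut through syllables instead of selecting
    %% the 6th to 11th characters.
    Bytes = binary_to_list(Utf8),
    ByteSlice = lists:sublist(Bytes, 6, 6),

    %% View 2: a list of Unicode code points. Positions now line up
    %% with characters, so the same call picks out the intended text.
    CodePoints = unicode:characters_to_list(Utf8, utf8),
    CharSlice = lists:sublist(CodePoints, 6, 6),

    %% Re-encode the character slice as UTF-8 for output or storage.
    {ByteSlice, CharSlice, unicode:characters_to_binary(CharSlice)}.
```

Calling utf8_slice:demo() returns both slices. Only the code point
slice re-encodes to ISO엔코딩; the byte slice is six bytes that do not
start or end on a character boundary.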

--
dda

On 1/17/07, Robert Virding <robert.virding@REDACTED> wrote:
> We do actually, in fact we have something much much better, a list.
> Using a list you don't have to worry about encodings but can use the
> unicode value directly in the string/list. This makes all processing
> much easier. Then, when you are done, you can convert it to whatever
> encoding you want.
>
> I don't really understand why anyone would want to process data in an
> unnecessarily complex format instead of a simple one.
>
> Robert
>
> dda wrote:
> > String types – at least well-implemented ones – don't just store a
> > string, but also encoding information. They are/should be geared
> > towards pain-free manipulation of text data, and by text I mean things
> > outside ASCII-land. Encoding-aware string manipulation functions
> > don't operate on bytes, but on characters, a quite different notion.
> > We don't have this in Erlang.



