[erlang-questions] EEP 10

Per Melin per.melin@REDACTED
Thu May 15 23:45:00 CEST 2008


2008/5/15 David Mercer <dmercer@REDACTED>:
> I don't know of an "easy" way to write using the current bit syntax:
>
>        <<Ch/utf8,_/binary>> = BinString
>
> I'd have thought we would need the utf8 (and utf16) binary extensions to do
> this easily.  What is the "easy" way to do this without the proposed
> extensions?

This is off the top of my head and I haven't really tested it, but I
think it should work. I don't know if it falls within what you'd call
easy, though I think it's easy enough.

Calling the function u(BinStr) should return Ch if u/1 is:

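%% Each continuation byte is 2#10 followed by 6 payload bits.
u(<<2#11110:5, A:3, 2#10:2, B:6, 2#10:2, C:6, 2#10:2, D:6, _/binary>>) ->
    (A bsl 18) bor (B bsl 12) bor (C bsl 6) bor D;
u(<<2#1110:4, A:4, 2#10:2, B:6, 2#10:2, C:6, _/binary>>) ->
    (A bsl 12) bor (B bsl 6) bor C;
u(<<2#110:3, A:5, 2#10:2, B:6, _/binary>>) ->
    (A bsl 6) bor B;
u(<<Ch:8, _/binary>>) -> Ch.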

Though, after googling for other solutions (maybe I should have done
that first?) I realise that this would also happily process broken
UTF-8 without complaints.

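(And it would be more efficient to reverse the order of the above
function definitions, so that the common single-byte case is tried
first. The single-byte clause would then have to match
<<0:1, Ch:7, _/binary>> instead, or it would catch every byte.)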
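
For what it's worth, here is a rough sketch of a stricter variant that
takes both remarks above into account: the single-byte case comes first,
and malformed input is rejected rather than processed. The module name
utf8_sketch and the functions strict_u/1 and check/2 are made up for
illustration, and crashing with function_clause on bad input is just one
possible choice. It refuses stray or missing continuation bytes, overlong
encodings, surrogates and anything above U+10FFFF.

```erlang
-module(utf8_sketch).
-export([strict_u/1]).

%% Hypothetical helper: return the first code point of a UTF-8 binary.
%% Malformed input matches no clause and raises function_clause.

%% 1 byte: 0xxxxxxx
strict_u(<<0:1, Ch:7, _/binary>>) ->
    Ch;
%% 2 bytes: 110xxxxx 10xxxxxx
strict_u(<<2#110:3, A:5, 2#10:2, B:6, _/binary>>) ->
    check(16#80, (A bsl 6) bor B);
%% 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
strict_u(<<2#1110:4, A:4, 2#10:2, B:6, 2#10:2, C:6, _/binary>>) ->
    check(16#800, (A bsl 12) bor (B bsl 6) bor C);
%% 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
strict_u(<<2#11110:5, A:3, 2#10:2, B:6, 2#10:2, C:6, 2#10:2, D:6,
           _/binary>>) ->
    check(16#10000, (A bsl 18) bor (B bsl 12) bor (C bsl 6) bor D).

%% Min is the smallest code point that needs this many bytes, so
%% Ch < Min means an overlong encoding. Surrogates (U+D800..U+DFFF)
%% and values above U+10FFFF are not valid either.
check(Min, Ch) when Ch >= Min, Ch =< 16#10FFFF,
                    (Ch < 16#D800 orelse Ch > 16#DFFF) ->
    Ch.
```

With this, utf8_sketch:strict_u(<<16#E2,16#82,16#AC>>) should give
16#20AC (the euro sign), while the overlong <<16#C0,16#80>> fails with
function_clause.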


