Tue Oct 26 01:16:07 CEST 2021
On 10/24/21 1:55 AM, Dan Gudmundsson wrote:
> In my opinion, this should not be done. Strings, and in particular
> unicode strings, seem to be very confusing as it is, with two
> representations in OTP APIs. UTF-8 (and friends) is an encoding of
> UNICODE codepoints; you should never operate on the encoding.
The way I was thinking about io:format("~t8s~n",[[16#C2,16#A2]]). was
that adding the "8" acts as an assertion that the list data must contain
UTF-8 encoded bytes. That helps to avoid ambiguity and catch problems
early, while also allowing a list of bytes to be used in the same way as
a binary (with the addition of a single character to the io format
string). So I wasn't thinking of it as operating on the encoding, but
rather as being more specific about the string translation (t ==
translation, with 8|16|32 options to ensure that the specific translation
occurs, or that an error exception shows what the translation problem is).
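
For comparison, the same assertion can be approximated today with
existing calls. This is only a minimal sketch: the module and function
names (utf8_bytes, decode_utf8/1, format_utf8/1) are made up for
illustration and are not part of OTP. It relies only on
list_to_binary/1, unicode:characters_to_list/2 and io:format/2.

```erlang
-module(utf8_bytes).
-export([decode_utf8/1, format_utf8/1]).

%% Hypothetical helper (not an OTP function): treat a list of bytes as
%% UTF-8 and return the decoded codepoints, or raise an error.
-spec decode_utf8([byte()]) -> [char()].
decode_utf8(Bytes) ->
    %% list_to_binary/1 raises badarg if an integer is outside 0..255.
    Bin = list_to_binary(Bytes),
    case unicode:characters_to_list(Bin, utf8) of
        Chars when is_list(Chars) ->
            Chars;
        {error, _Converted, _Rest} ->
            erlang:error({invalid_utf8, Bytes});
        {incomplete, _Converted, _Rest} ->
            erlang:error({incomplete_utf8, Bytes})
    end.

%% Hypothetical helper: roughly what "~t8s" is meant to do in the
%% proposal, written with the existing "~ts" on decoded codepoints.
-spec format_utf8([byte()]) -> ok.
format_utf8(Bytes) ->
    io:format("~ts~n", [decode_utf8(Bytes)]).
```

With this, utf8_bytes:decode_utf8([16#C2,16#A2]) returns [16#A2], while
a list such as [16#FF] raises an error instead of being printed as a
Latin-1 character.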
> On Sun, Oct 24, 2021 at 10:35 AM Michael Truog <mjtruog@REDACTED> wrote:
> I was wondering if there was interest in modifying the io format
> of "~ts" to allow an integer between the t and s for forcing a
> particular unicode interpretation. That would allow a list of bytes
> to be interpreted as UTF8, to provide the same output as a binary:
> 1> io:format("~ts~n",[<<16#C2,16#A2>>]).
> 2> io:format("~t8s~n",[[16#C2,16#A2]]).
> I was also wondering if bytestring types would be added to
> Erlang/OTP, like:
> -type nonempty_bytestring() :: nonempty_list(byte()).
> -type bytestring() :: list(byte()).
> They are useful in iolists to ensure only bytes (not other integers)
> are in nested lists.
> Best Regards,