Michael Truog mjtruog@REDACTED
Tue Oct 26 01:16:07 CEST 2021

On 10/24/21 1:55 AM, Dan Gudmundsson wrote:
> In my opinion, this should not be done, strings and in particular
> unicode strings seem to be very confusing as it is with two
> representations in OTP APIs.
> UTF-8 (and friends) is an encoding of UNICODE codepoints, you should
> never operate on the encoding
The way I was thinking about io:format("~t8s~n",[[16#C2,16#A2]]). was
that adding the "8" acts as an assertion that the list data must contain
UTF-8 encoded bytes.  That helps to avoid ambiguity and catch problems
early, while also allowing a list of bytes to be used in the same way as
a binary (at the cost of a single extra character in the io format
string).  So I wasn't thinking of it as operating on the encoding, but
rather as being more specific about the string translation (t ==
translation, with 8|16|32 options to ensure that the specific translation
occurs, or that an error exception shows what the translation problem
is).
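
For comparison, roughly the same assertion can be written with existing
OTP functions.  This is only a sketch of the behaviour I have in mind for
"~t8s", not a proposed implementation; the module name, the function
names and the error terms are all made up for illustration.

```erlang
-module(bytestring_demo).
-export([utf8_bytes_to_chars/1, format_utf8_bytes/1]).

%% Hypothetical local type, mirroring the bytestring() idea.
-type bytestring() :: [byte()].

%% Decode a flat list of bytes as UTF-8 and return the code points.
%% Raises an error instead of silently printing the wrong characters.
-spec utf8_bytes_to_chars(bytestring()) -> [char()].
utf8_bytes_to_chars(Bytes) ->
    %% list_to_binary/1 fails with badarg if an element is not in 0..255.
    Bin = list_to_binary(Bytes),
    case unicode:characters_to_list(Bin, utf8) of
        Chars when is_list(Chars) ->
            Chars;
        {error, _Decoded, _Rest} ->
            erlang:error({invalid_utf8, Bytes});
        {incomplete, _Decoded, _Rest} ->
            erlang:error({incomplete_utf8, Bytes})
    end.

%% Print a list of UTF-8 bytes the way "~ts" prints a UTF-8 binary.
-spec format_utf8_bytes(bytestring()) -> ok.
format_utf8_bytes(Bytes) ->
    io:format("~ts~n", [utf8_bytes_to_chars(Bytes)]).
```

With this, bytestring_demo:format_utf8_bytes([16#C2,16#A2]) prints the
same character as the binary example, while a truncated sequence such as
[16#C2] raises {incomplete_utf8, [16#C2]} rather than printing anything.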

Best Regards,

> On Sun, Oct 24, 2021 at 10:35 AM Michael Truog <mjtruog@REDACTED> wrote:
>     I was wondering if there was interest in modifying the io
>     interpretation of "~ts" to allow an integer between the t and s
>     for forcing a particular unicode interpretation.  That would allow
>     a list of bytes to be interpreted as UTF-8, to provide the same
>     output as a binary:
>     1> io:format("~ts~n",[<<16#C2,16#A2>>]).
>     ¢
>     ok
>     2> io:format("~t8s~n",[[16#C2,16#A2]]).
>     ¢
>     ok
>     I was also wondering if bytestring types would be added to
>     Erlang/OTP, like:
>     -type nonempty_bytestring() :: nonempty_list(byte()).
>     -type bytestring() :: list(byte()).
>     They are useful in iolists to ensure only bytes (not other
>     integers) are in nested lists.
>     Best Regards,
>     Michael
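
Until such types exist in Erlang/OTP, the bytestring types from the
quoted message can be declared locally.  A minimal sketch, assuming a
hypothetical module name and a hypothetical is_bytestring/1 runtime
check (neither is part of OTP):

```erlang
-module(bytestring_types).
-export([is_bytestring/1]).
-export_type([bytestring/0, nonempty_bytestring/0]).

%% Local stand-ins for the proposed types.
-type bytestring() :: list(byte()).
-type nonempty_bytestring() :: nonempty_list(byte()).

%% Runtime check matching bytestring(): a proper list whose elements
%% are all integers in 0..255.  Improper lists and non-lists are false.
-spec is_bytestring(term()) -> boolean().
is_bytestring([]) ->
    true;
is_bytestring([B | T]) when is_integer(B), B >= 0, B =< 255 ->
    is_bytestring(T);
is_bytestring(_) ->
    false.
```

Dialyzer can use the types in specs, and the check rejects a list such
as [16#20AC], which holds a code point rather than bytes.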

More information about the erlang-questions mailing list