[erlang-questions] String versus variable in binary literal
Richard Carlsson
carlsson.richard@REDACTED
Wed May 16 14:01:00 CEST 2012
On 05/16/2012 12:29 PM, Joe Armstrong wrote:
> On Wed, May 16, 2012 at 10:56 AM, Richard Carlsson
> <carlsson.richard@REDACTED> wrote:
>> The bit syntax doesn't (currently) support encoding strings that are not
>> constant literals. This is something that should be fixed, IMO.
>
> It's a bug (or should be a bug) - try this for size:
>
>> <<1223232321111,3476824682351,18368119>>.
> <<"Wow">>
>
> Isn't that beautiful :-)
Well, yes, but there is nothing strange about it. The default size of an
integer segment is 8 bits (one byte), and numbers are silently truncated
to fit the segment size:
1> <<1223232321111:8,3476824682351:8,18368119:8>>.
<<"Wow">>
2> <<1223232321111:16,3476824682351,18368119>>.
<<"öWow">>
3> <<1223232321111:32,3476824682351:32,18368119:32>>.
<<206,83,246,87,130,230,111,111,1,24,70,119>>
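The truncation rule can be checked mechanically. A segment of Size bits
keeps only the low Size bits of the integer, which is the same as masking
with (1 bsl Size) - 1. This is a minimal sketch; the module and function
names are made up for illustration:

```erlang
-module(trunc_demo).
-export([low_bits/2, check/0]).

%% Keep only the low Size bits of N. This is what a Size-bit
%% integer segment does to a value that is too large for it.
low_bits(N, Size) ->
    N band ((1 bsl Size) - 1).

%% Returns true if the bit syntax and the explicit mask agree for
%% the numbers in the example, and if their low bytes spell "Wow".
check() ->
    Ns = [1223232321111, 3476824682351, 18368119],
    lists:all(fun(N) ->
                      <<N:8>> =:= <<(low_bits(N, 8)):8>>
                          andalso <<N:16>> =:= <<(low_bits(N, 16)):16>>
              end, Ns)
        andalso [low_bits(N, 8) || N <- Ns] =:= "Wow".
```

After c(trunc_demo) in the shell, trunc_demo:check() returns true.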
What tends to surprise people is that the default segment type is
integer, even when the value is a constant of some other type. If you
add the correct type specifier, it works:
1> << <<1,2>> >>.
** exception error: bad argument
2> << <<1,2>>/binary >>.
<<1,2>>
3> << 3.14 >>.
** exception error: bad argument
4> << 3.14/float >>.
<<64,9,30,184,81,235,133,31>>
In fact, <<"abc">> only works because it is treated as special notation
for a sequence of integer segments: << $a, $b, $c >> = <<"abc">>. But
you are not allowed to write << [$a,$b,$c] >>, even though
[$a,$b,$c] = "abc".
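Until the bit syntax accepts string variables, the usual workaround is
to build a binary from the string first and splice it in with the
/binary specifier. This is a minimal sketch, assuming a Latin-1 string
held in a variable; the module name and the "Name: " prefix are only
illustrative:

```erlang
-module(str_seg).
-export([with_list_to_binary/1, with_comprehension/1]).

%% Convert the string to a binary, then splice it in as a binary
%% segment. list_to_binary/1 raises badarg for codes above 255.
with_list_to_binary(String) ->
    <<"Name: ", (list_to_binary(String))/binary, "\r\n">>.

%% The same result with a binary comprehension: each character becomes
%% an 8-bit integer segment, just like <<$a, $b, $c>>. Codes above 255
%% are silently truncated here, as in any 8-bit integer segment.
with_comprehension(String) ->
    Bin = << <<C>> || C <- String >>,
    <<"Name: ", Bin/binary, "\r\n">>.
```

The first variant is usually preferable, since it fails loudly instead of
truncating.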
Nowadays there is some extra syntax for UTF-<N> encodings:
1> <<"åäö"/utf8>>.
<<"åäö">>
2> <<"åäö"/utf16>>.
<<0,229,0,228,0,246>>
3> <<"åäö"/utf32>>.
<<0,0,0,229,0,0,0,228,0,0,0,246>>
However, <<String/utf8>> doesn't work if String is a variable. This
could be fixed. But since <<String>> is interpreted as expecting String
to be an integer, there would still be no easy way to insert an ordinary
Latin-1 string dynamically into a binary, even if <<String/utf8>> were
made to work. I would suggest adding the type specifiers 'latin1' and
'ascii' for this purpose, where 'latin1' would accept only character
codes 0-255 (no truncation), and 'ascii' would accept only codes 0-127
(good for ensuring that HTTP headers and similar things are 7-bit
clean). While you're at it, String should be allowed to be any
chardata(), just as in the unicode module, not just a flat list of chars.
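The suggested specifiers can be approximated today with library calls.
The sketch below is a hypothetical emulation, not an existing bit-syntax
feature: utf8/1 uses unicode:characters_to_binary/3, while latin1/1 and
ascii/1 reject out-of-range codes instead of truncating them. All module
and function names are made up, and binaries nested inside the chardata
are assumed to be UTF-8 encoded, which is the unicode module's default.

```erlang
-module(chars_seg).
-export([utf8/1, latin1/1, ascii/1]).

%% Workaround for <<String/utf8>> with a variable: let the unicode
%% module encode the chardata (possibly nested lists and binaries).
utf8(Chars) ->
    case unicode:characters_to_binary(Chars, unicode, utf8) of
        Bin when is_binary(Bin) -> Bin;
        Error -> erlang:error({bad_chardata, Error})
    end.

%% Hypothetical stand-in for the proposed 'latin1' specifier:
%% codes 0..255 only, with no silent truncation.
latin1(Chars) ->
    checked(Chars, 255).

%% Hypothetical stand-in for the proposed 'ascii' specifier:
%% codes 0..127 only, e.g. for 7-bit clean HTTP headers.
ascii(Chars) ->
    checked(Chars, 127).

checked(Chars, Max) ->
    %% Flatten the chardata to a list of code points.
    List = case unicode:characters_to_list(Chars) of
               L when is_list(L) -> L;
               Error -> erlang:error({bad_chardata, Error})
           end,
    %% Fail instead of truncating if any code is out of range.
    case lists:all(fun(C) -> C =< Max end, List) of
        true -> << <<C>> || C <- List >>;
        false -> erlang:error(badarg)
    end.
```

A call site would then look like
<<"GET ", (chars_seg:ascii(Path))/binary, " HTTP/1.1\r\n">>, where Path
is any chardata() value.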
Oh, and atoms should be allowed in chardata() and iolist(). I think
that's about it.
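Allowing atoms in chardata() and iolist() can likewise be approximated by
expanding them before conversion. This is a minimal sketch; expand/1 is a
hypothetical helper, and encoding atom names as UTF-8 is an assumption:

```erlang
-module(atom_chardata).
-export([expand/1]).

%% Replace every atom in a (possibly improper) deep list with its
%% UTF-8 text, so the result can be passed to iolist_to_binary/1 or
%% unicode:characters_to_binary/1. Note that [] is not an atom.
expand(A) when is_atom(A) -> atom_to_binary(A, utf8);
expand([H | T]) -> [expand(H) | expand(T)];
expand(Other) -> Other.
```

For example, unicode:characters_to_binary(atom_chardata:expand(['GET', " /x"]))
gives <<"GET /x">>.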
/Richard