# [erlang-questions] String versus variable in binary literal

Richard Carlsson
Wed May 16 14:01:00 CEST 2012

```On 05/16/2012 12:29 PM, Joe Armstrong wrote:
> On Wed, May 16, 2012 at 10:56 AM, Richard Carlsson wrote:
>> The bit syntax doesn't (currently) support encoding strings that are not
>> constant literals. This is something that should be fixed, IMO.
>
> It's a bug (or should be a bug) - try this for size:
>
>>   <<1223232321111,3476824682351,18368119>>.
> <<"Wow">>
>
> Isn't that beautiful :-)

Well, yes, but there is nothing strange about it. The default size for
integer segments is one byte (8 bits), and numbers get truncated to fit
the desired size, keeping only the low-order bits:

1> <<1223232321111:8,3476824682351:8,18368119:8>>.
<<"Wow">>
2> <<1223232321111:16,3476824682351,18368119>>.
<<"öWow">>
3> <<1223232321111:32,3476824682351:32,18368119:32>>.
<<206,83,246,87,130,230,111,111,1,24,70,119>>
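
% A small sketch of the truncation rule above, assuming it amounts to
% keeping the low 8 bits of the value (band 255) for a byte-sized segment:
4> 1223232321111 band 255.
87
5> <<(1223232321111 band 255)>> =:= <<1223232321111>>.
true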

What tends to surprise people is that the default field type is integer,
even if the given value is a constant of some other type, but if you add
the correct type specifier it works:

1> << <<1,2>> >>.
** exception error: bad argument
2> << <<1,2>>/binary >>.
<<1,2>>
3> << 3.14 >>.
** exception error: bad argument
4> << 3.14/float >>.
<<64,9,30,184,81,235,133,31>>
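
% The same rule applies when the value is held in a variable. A minimal
% sketch; B is just a name chosen for this illustration:
5> B = <<1,2>>.
<<1,2>>
6> <<B/binary, 3>>.
<<1,2,3>>
% The default size of a float segment is 64 bits, hence 8 bytes:
7> byte_size(<<3.14/float>>).
8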

In fact, <<"abc">> works just because it's considered to be a special
notation for a number of integers: << $a, $b, $c >> = <<"abc">>. But
you're not allowed to write it as << [$a,$b,$c] >>, even though
[$a,$b,$c] = "abc".

Nowadays there is some extra syntax for UTF-<N> encodings:

1> <<"åäö"/utf8>>.
<<"Ã¥Ã¤Ã¶">>
2> <<"åäö"/utf16>>.
<<0,229,0,228,0,246>>
3> <<"åäö"/utf32>>.
<<0,0,0,229,0,0,0,228,0,0,0,246>>

However, <<String/utf8>> doesn't work if String is a variable. This
could be fixed. But since <<String>> is interpreted as expecting String
to be an integer, there is still no way to easily insert a normal
Latin-1 string dynamically in a binary, even if <<String/utf8>> is made
to work. I would suggest the addition of type specifiers 'latin1' and
'ascii' for this purpose, where 'latin1' would accept only character
codes 0-255 (no truncation), and 'ascii' would only accept codes 0-127
(good for ensuring that http headers and similar things are 7-bit
clean). While you're at it, String should be allowed to be a chardata()
just as in the unicode module, not just flat lists of chars.
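
% Until something like that exists, a dynamic string has to be converted
% explicitly. A rough sketch using existing functions only (this is not
% the proposed 'latin1'/'ascii' syntax; the value "abc" is just an example):
1> String = "abc".
"abc"
% list_to_binary/1 fails with badarg for codes above 255 (no truncation):
2> <<(list_to_binary(String))/binary, 0>>.
<<97,98,99,0>>
% UTF-8 encoding one code point at a time with a binary comprehension:
3> << <<C/utf8>> || C <- String >>.
<<"abc">>
% unicode:characters_to_binary/1 also accepts chardata(), not just flat lists:
4> unicode:characters_to_binary(["a", <<"b">>, $c]).
<<"abc">>
% A possible 7-bit check before building, e.g., an HTTP header:
5> lists:all(fun(C) -> C >= 0 andalso C < 128 end, String).
true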

Oh, and atoms should be allowed in chardata() and iolist(). I think