[erlang-bugs] Bug: erts_bld_string_n returns negative characters

Mikael Pettersson mikpe@REDACTED
Fri Jun 4 11:58:33 CEST 2010


Paul Guyot writes:
 > > I noticed a significant behaviour difference between ssl_imp new and ssl_imp old when using them with {packet, http} due to the fact that ssl_imp old decodes packets through inet driver (and the broker), while ssl_imp new decodes packets with erlang:decode_packet/3 and both do not generate the same data.
 > > 
 > > The (simplified) http packet is:
 > > <<71,69,84,32,47,230,157,177,228,186,172,32,72,84,84,80,47,49,46,49,13,10,13,10>>
 > > 
 > > With {ssl_imp old}, I get:
 > > {http_request,'GET',
 > >              {abs_path,[47,230,157,177,228,186,172]},
 > >              {1,1}}
 > > 
 > > With {ssl_imp, new}, I get:
 > > {http_request,'GET',
 > >              {abs_path,[47,-26,-99,-79,-28,-70,-84]},
 > >              {1,1}}
 > > 
 > > One can get the same result with:
 > > erlang:decode_packet(http, <<71,69,84,32,47,230,157,177,228,186,172,32,72,84,84,80,47,49,46,49,13,10,13,10>>, [{packet_size, 0}]).
 > > 
 > > erlang:decode_packet eventually calls erts_bld_string_n. Things happen line 513 of current dev branch on github :
 > > 
 > > http://github.com/erlang/otp/blob/dev/erts/emulator/beam/utils.c#L513
 > > 
 > > str[i] can be negative (> 0x7f) and therefore promoted to a small negative integer.
 > > 
 > > It seems to me that erts_bld_string_n is supposed to take ISO-8859-1 characters, for example when called from enif_make_string_len (which is therefore broken?). It should return small positive integers instead of negative ones for values > 0x7f. Line 513 should be replaced from:
 > > 	    res = CONS(*hpp, make_small(str[i]), res);
 > > to:
 > > 	    res = CONS(*hpp, make_small((const unsigned char) str[i]), res);
 > > or:
 > > 	    res = CONS(*hpp, make_small((byte) str[i]), res);
 > > 
 > > This change would mimic what happens with inet_drv. It encodes the string with ERL_DRV_STRING, which is then decoded in beam/io.c with buf_to_intlist, which goes like this:
 > > 
 > > 	tail = CONS(hp, make_small((byte)*buf), tail);

I'll just note that the Erlang VM has always been internally inconsistent
in how it processes random binary data: sometimes it uses 'unsigned char'
pointers (correct), sometimes it uses plain 'char' pointers (incorrect).
Earlier I've only seen this result in annoying GCC warnings when building
the VM, but here there's a case where the code obviously does the wrong
thing at runtime.  What's even worse is that the runtime behaviour depends
on a subtle property of the underlying machine: whether plain 'char' is
signed or unsigned.

Casting make_small()'s operand to (byte) or (unsigned char) [no 'const' needed]
is the correct short-term fix.


More information about the erlang-bugs mailing list