erts_bld_string_n returns negative characters

Paul Guyot pguyot@REDACTED
Fri Jun 4 00:29:29 CEST 2010


Hello,

I noticed a significant difference in behaviour between {ssl_imp, new} and {ssl_imp, old} when they are used with {packet, http}. {ssl_imp, old} decodes packets through the inet driver (and the broker), while {ssl_imp, new} decodes them with erlang:decode_packet/3, and the two do not produce the same data.

The (simplified) HTTP packet is:
<<71,69,84,32,47,230,157,177,228,186,172,32,72,84,84,80,47,49,46,49,13,10,13,10>>

With {ssl_imp, old}, I get:
{http_request,'GET',
              {abs_path,[47,230,157,177,228,186,172]},
              {1,1}}

With {ssl_imp, new}, I get:
{http_request,'GET',
              {abs_path,[47,-26,-99,-79,-28,-70,-84]},
              {1,1}}

One can get the same result with:
erlang:decode_packet(http, <<71,69,84,32,47,230,157,177,228,186,172,32,72,84,84,80,47,49,46,49,13,10,13,10>>, [{packet_size, 0}]).

erlang:decode_packet/3 eventually calls erts_bld_string_n. The problem is at line 513 of the current dev branch on GitHub:

http://github.com/erlang/otp/blob/dev/erts/emulator/beam/utils.c#L513

Where char is signed, str[i] is negative for any byte above 0x7f. It is then promoted to a negative int, and make_small turns it into a small negative integer.
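To illustrate the promotion described above, here is a minimal standalone sketch. It does not use any ERTS code: the helper names promote_signed, promote_unsigned and show_path_bytes are made up for this example, and signed char is used explicitly so that the effect shows up regardless of how the compiler treats plain char.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: the value an uncast str[i] has once promoted to int,
   on a platform where plain char is signed. */
static int promote_signed(unsigned char byte)
{
    /* The out-of-range conversion is implementation-defined; on the usual
       two's-complement platforms 0xE6 (230) becomes -26. */
    signed char c = (signed char) byte;
    return c;
}

/* Hypothetical helper: the value obtained with the proposed cast, 0..255. */
static int promote_unsigned(unsigned char byte)
{
    return byte;
}

/* Prints both readings for each byte of the path in the example packet. */
static void show_path_bytes(void)
{
    static const unsigned char path[] = { 47, 230, 157, 177, 228, 186, 172 };
    size_t i;

    for (i = 0; i < sizeof path; i++) {
        int s = promote_signed(path[i]);
        int u = promote_unsigned(path[i]);

        /* For bytes above 0x7f the two readings differ by exactly 256. */
        if (path[i] > 0x7f)
            assert(u - s == 256);
        printf("signed: %4d  unsigned: %4d\n", s, u);
    }
}
```

Calling show_path_bytes() should print the pairs 47/47, -26/230, -99/157 and so on, which are the two lists shown above for {ssl_imp, new} and {ssl_imp, old}.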

It seems to me that erts_bld_string_n is supposed to take ISO-8859-1 characters, for example when called from enif_make_string_len (which is therefore broken as well?). It should return small positive integers instead of negative ones for values > 0x7f. Line 513 should be changed from:
	    res = CONS(*hpp, make_small(str[i]), res);
to:
	    res = CONS(*hpp, make_small((const unsigned char) str[i]), res);
or:
	    res = CONS(*hpp, make_small((byte) str[i]), res);
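The effect of the cast can be sketched outside the emulator as follows. This is only an illustration under stated assumptions: struct cell, build_list and free_list are hypothetical stand-ins for the ERTS heap, CONS and make_small, and malloc replaces the real heap allocation. Only the prepend-with-cast pattern is taken from the line quoted above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cons cell holding a small integer. */
struct cell {
    long head;
    struct cell *tail;
};

/* Frees a list built by build_list. */
static void free_list(struct cell *list)
{
    while (list != NULL) {
        struct cell *next = list->tail;
        free(list);
        list = next;
    }
}

/* Builds a list of character codes from str[0..len-1] by prepending cells,
   walking the buffer from the end. Returns NULL for an empty buffer or on
   allocation failure. */
static struct cell *build_list(const char *str, size_t len)
{
    struct cell *res = NULL;
    size_t i = len;

    assert(str != NULL || len == 0);
    while (i > 0) {
        struct cell *c = malloc(sizeof *c);
        if (c == NULL) {
            free_list(res);
            return NULL;
        }
        i--;
        /* The cast keeps bytes above 0x7f in the range 128..255 even when
           plain char is signed. */
        c->head = (unsigned char) str[i];
        c->tail = res;
        res = c;
    }
    return res;
}
```

With the seven path bytes from the packet above as input, the heads of the resulting list would be 47, 230, 157, 177, 228, 186, 172, the same values that {ssl_imp, old} returns.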

This change would match what inet_drv does. It encodes the string with ERL_DRV_STRING, which beam/io.c then decodes with buf_to_intlist, whose loop body reads:

	tail = CONS(hp, make_small((byte)*buf), tail);

http://github.com/erlang/otp/blob/dev/erts/emulator/beam/utils.c#L2881

Regards,

Paul

PS: I realize this is not a valid HTTP packet (URIs should be encoded as 7-bit ASCII), but curl 7.20.0 sends it.


