[erlang-bugs] R12B-3: string:to_integer() sporadic failures

Lev Walkin vlm@REDACTED
Tue Sep 2 05:35:23 CEST 2008


Hi,


We all love string-to-integer conversion routines, such as
string:to_integer/1 or erlang:list_to_integer/1.

These functions serve us well and indeed provide the advertised
functionality almost every time.

However, we've noticed an oddity on our production system which,
after two weeks of jaw-dropping musings, has boiled down to a strange
idempotence violation in the string:to_integer/1 implementation.

Those of you who are impatient, please look at the attached code,
run it as nfail:test(10000), and go from there.

Here's the string:to_integer/1 function, which has the following
signature:

     string:to_integer(String) -> {Int,Rest} | {error,Reason}
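For reference, the sketch below exercises both return shapes of
string:to_integer/1 and contrasts it with erlang:list_to_integer/1, which
rejects trailing characters. The module name to_int_demo is hypothetical,
and the expected values reflect the documented behavior of a correctly
working emulator, not the R12B-3 failure described in this report.

```erlang
-module(to_int_demo).
-export([demo/0]).

%% Hypothetical demo module: compare the two conversion routines
%% on the same inputs. Each line is a match that crashes if the
%% result differs from the documented behavior.
demo() ->
    %% string:to_integer/1 parses a leading integer and returns the rest.
    {134217728, ",\n"} = string:to_integer("134217728,\n"),
    %% With no leading digits it returns an error tuple instead of raising.
    {error, no_integer} = string:to_integer("abc"),
    %% erlang:list_to_integer/1 accepts only a string that is entirely an integer...
    134217728 = list_to_integer("134217728"),
    %% ...and raises badarg when trailing characters are present.
    {'EXIT', {badarg, _}} = (catch list_to_integer("134217728,\n")),
    ok.
```

Calling to_int_demo:demo() returns ok when every match succeeds.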

Here's a typical invocation resulting in a number and the
rest of the unparsed string returned:

36> string:to_integer("134217728,\n").
{134217728,",\n"}
37>

What would happen if we did it once more?

37> string:to_integer("134217728,\n").
{134217728,",\n"}
38>

By this time we can be reasonably sure that string:to_integer/1
returns the same output given the same input; testing it twice has
shown as much. Could testing it N times result in bad behavior?
Nah, unlikely, you say.

If you haven't looked at the attached code yet, now is the time to
do so. In it, we build a list of N results of applying
string:to_integer/1, like this:

	iterate(0, Acc) -> Acc;
	iterate(N, Acc) ->
	        iterate(N - 1,
	                [case string:to_integer("134217728,\n") of
	                        {Int, _} -> Int
	                end | Acc]).

This code uses tail recursion with an accumulator list that gets
prepended N times with the integer part of the string:to_integer/1
output, undoubtedly an integer. (An implementation mapping over the
output of lists:seq/2 would do just as well.)
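The lists:seq/2 variant mentioned in parentheses could look roughly like
this. It is a sketch with a hypothetical module name (iterate_map); unlike
iterate/2 it keeps the results in call order instead of reversing them,
which makes no difference here because every element should be the same
integer.

```erlang
-module(iterate_map).
-export([iterate/1]).

%% Hypothetical alternative to iterate/2: build the list of N
%% conversion results with lists:map/2 over lists:seq/2 rather than
%% an explicit accumulator. lists:seq(1, 0) yields [], so N = 0 works.
iterate(N) ->
    lists:map(fun(_) ->
                      %% Keep only the integer part of {Int, Rest}.
                      {Int, _Rest} = string:to_integer("134217728,\n"),
                      Int
              end,
              lists:seq(1, N)).
```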

So we spawn and run this iterate/2 function, appropriately checking
that the list consists only of integers with value 134217728:

	test(N) ->
	        {Self, Ref} = {self(), make_ref()},
	        spawn_opt(fun() ->
	                        L = iterate(N, []),
	                        % Filter out non-conforming entries
	                        BadList = [X || X <- L, X =/= 134217728],
	                        BadLen = length(BadList),
	                        % Here, BadLen should always be 0!
	                        io:format("~b bad conversions: ~p~n", [BadLen, BadList]),
	                        Self ! {done, Ref, BadLen}
	                end,
	                [link, {fullsweep_after, 0}]),
	        receive {done, Ref, Len} -> Len end.

Based on the smoke tests above, this code should always result
in something like this:

	38> nfail:test(100).
	0 bad conversions: []
	0
	39>

But let's test it more thoroughly, as if we didn't quite trust that
such a simple function works every single time:

	[vlm@REDACTED:~]> erl
	Erlang (BEAM) emulator version 5.6.3 [source] [async-threads:0] [kernel-poll:false]
	
	Eshell V5.6.3  (abort with ^G)
	1> c(nfail).
	{ok,nfail}
	2> nfail:test(1).
	0 bad conversions: []
	0
	3> nfail:test(2).
	0 bad conversions: []
	0
	4> nfail:test(100).
	0 bad conversions: []
	0
	5> nfail:test(1000).
	2 bad conversions: [{134217728,",\n"},{134217728,",\n"}]
	2
	6> nfail:test(10000).
	5 bad conversions: [{134217728,",\n"},
	                    {134217728,",\n"},
	                    {134217728,",\n"},
	                    {134217728,",\n"},
	                    {134217728,",\n"}]
	5
	7>
	...
	32> nfail:test(810).
	0 bad conversions: []
	0
	33> nfail:test(811).
	2 bad conversions: [{134217728,",\n"},{134217728,",\n"}]
	2
	34> nfail:test(810).
	0 bad conversions: []
	0
	35> nfail:test(811).
	2 bad conversions: [{134217728,",\n"},{134217728,",\n"}]
	2
	36>


As you can see, string:to_integer/1 consistently produces bad entries
containing {134217728,",\n"} instead of 134217728, but only once the
number of invocations reaches roughly 1000 (811 in this particular
sequence of steps).

Perhaps this is a platform glitch? The code above was run on a
PowerPC (G4) Mac with Mac OS X 10.5 and a 32-bit R12B-3 built from
source. Here's what a Sun SPARC v9 box running Solaris 10 thinks of
the same test case:

	[vlm@REDACTED:~]> erl
	Erlang (BEAM) emulator version 5.6.3 [source] [async-threads:0] [hipe] [kernel-poll:false]
	
	Eshell V5.6.3  (abort with ^G)
	1> c(nfail).
	{ok,nfail}
	2> nfail:test(100).
	1 bad conversions: [{134217728,",\n"}]
	1
	3> nfail:test(1000).
	3 bad conversions: [{134217728,",\n"},{134217728,",\n"},{134217728,",\n"}]
	3
	4> nfail:test(10).
	0 bad conversions: []
	0
	5> nfail:test(50).
	0 bad conversions: []
	0
	6> c(nfail, [native]).
	{ok,nfail}
	7> nfail:test(500).
	4 bad conversions: [{134217728,",\n"},
	                    {134217728,",\n"},
	                    {134217728,",\n"},
	                    {134217728,",\n"}]
	4
	8>

And here's what a 64-bit Intel box (amd64, FreeBSD 6.3) thinks:

	[vlm@REDACTED ~]$ erl
	Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [async-threads:0] [hipe] [kernel-poll:false]
	
	Eshell V5.6.3  (abort with ^G)
	1> c(nfail).
	{ok,nfail}
	2> nfail:test(100).
	0 bad conversions: []
	0
	3> nfail:test(1000).
	0 bad conversions: []
	0
	4> nfail:test(10000).
	0 bad conversions: []
	0
	5> nfail:test(100000).
	0 bad conversions: []
	0
	6> nfail:test(1000000).
	0 bad conversions: []
	0
	7>

Oops, surprisingly, that went very well. Perhaps it is purely a
non-Intel chip problem? Let's try it on a non-64-bit setup, such as
a Pentium D running Microsoft Windows Vista™ in 32-bit mode:

	Erlang (BEAM) emulator version 5.6.3 [smp:2] [async-threads:0]

	Eshell V5.6.3  (abort with ^G)
	1> c('c:/tmp/nfail.erl').
	{ok,nfail}
	2> nfail:test(100).
	0 bad conversions: []
	0
	3> nfail:test(1000).
	2 bad conversions: [{134217728,",\n"},{134217728,",\n"}]
	2
	4>

Aha! See the pattern: 32-bit Erlang installations on several hardware
platforms show very similar problems. They cannot consistently
convert a string containing an integer into an integer value.
Curiously enough, the value at which Erlang starts to misbehave is
134217728, which is 2^27. Trying it with "134217727" or lower does
not trigger this idempotence problem. R11B-5 under 32-bit Windows
Vista™ does not exhibit the problem either, so it must be specific
to R12.
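The 2^27 boundary lines up with how the BEAM stores integers. Assuming the
usual layout, an immediate ("small") integer takes one machine word minus
4 tag bits, so a 32-bit emulator holds at most 2^27 - 1 = 134217727
before falling back to a heap-allocated bignum, while a 64-bit emulator
goes up to 2^59 - 1. The sketch below (module name small_int is
hypothetical) computes those ranges. If the layout assumption holds, it
would explain why 134217728 is the first failing value on the 32-bit
builds and why the amd64 run stays clean, though it does not by itself
locate the bug.

```erlang
-module(small_int).
-export([small_range/1, is_small/2, report/0]).

%% Assumption: immediate integers use a word minus 4 tag bits,
%% stored as a signed value (28 bits on 32-bit, 60 bits on 64-bit).
-define(TAG_BITS, 4).

%% Return {Min, Max} of immediate integers for a word size in bytes.
small_range(WordSize) ->
    Bits = WordSize * 8 - ?TAG_BITS,
    {-(1 bsl (Bits - 1)), (1 bsl (Bits - 1)) - 1}.

%% True when Int fits in an immediate integer for the given word size;
%% anything outside that range would be stored as a bignum on the heap.
is_small(Int, WordSize) ->
    {Min, Max} = small_range(WordSize),
    Int >= Min andalso Int =< Max.

%% Print the range for the running emulator, using the word size it
%% reports via erlang:system_info(wordsize).
report() ->
    WordSize = erlang:system_info(wordsize),
    {Min, Max} = small_range(WordSize),
    io:format("wordsize ~b: small integers span ~b..~b~n",
              [WordSize, Min, Max]).
```

For example, small_int:is_small(134217728, 4) returns false, while
small_int:is_small(134217728, 8) returns true.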

Please advise.


P.S. Thanks to Denis Titoruk and Vladimir Serov for investigating
this issue and coming up with a short test case.

-- 
Lev Walkin
vlm@REDACTED
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: nfail.erl
URL: <http://erlang.org/pipermail/erlang-bugs/attachments/20080901/860125ff/attachment.ksh>
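Since the nfail.erl attachment was scrubbed from the archive, here is a
hypothetical reconstruction assembled from the iterate/2 and test/1
snippets quoted in the message. The -module and -export lines are
assumptions; the function bodies follow the quoted code.

```erlang
-module(nfail).
-export([test/1]).

%% Build a list of N conversion results, prepending each to Acc.
iterate(0, Acc) -> Acc;
iterate(N, Acc) ->
    iterate(N - 1,
            [case string:to_integer("134217728,\n") of
                 {Int, _} -> Int
             end | Acc]).

%% Run iterate/2 in a linked process spawned with {fullsweep_after, 0},
%% so every garbage collection is a full sweep, then report how many
%% entries differ from the expected integer.
test(N) ->
    {Self, Ref} = {self(), make_ref()},
    spawn_opt(fun() ->
                      L = iterate(N, []),
                      %% Filter out non-conforming entries.
                      BadList = [X || X <- L, X =/= 134217728],
                      BadLen = length(BadList),
                      %% Here, BadLen should always be 0!
                      io:format("~b bad conversions: ~p~n", [BadLen, BadList]),
                      Self ! {done, Ref, BadLen}
              end,
              [link, {fullsweep_after, 0}]),
    receive {done, Ref, Len} -> Len end.
```

On an emulator without the bug, nfail:test(N) should print
"0 bad conversions: []" and return 0 for any N.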

