[erlang-bugs] R12B-3: string:to_integer() sporadic failures

Edwin Fine erlang-questions_efine@REDACTED
Tue Sep 2 08:07:00 CEST 2008


You did is a great piece of detective work. This really piqued my interest
and I did a bit of playing around.

It seems that the failures always occur at the same point in the sequence.
What I did is add a sequence number to each conversion, then print them out.
I conjectured that the failed values are actually all integers but
internally incorrectly represented so that the inequality test fails. I also
looked at the C code for the BIF for string:to_integer, which was ...
interesting.

The bottom line is that every time I ran the test code, the conversions
always failed at the same points (ran on Windows XP SP3, Erlang R12B-3). I
tried varying the erl parameters (turning off SMP, fiddling with memory
allocation, etc) but to no avail.

46> nfail2:test(1000000).
6 bad conversions: [{308,{134217728,",\n"}},
                    {500,{134217728,",\n"}},
                    {1313,{134217728,",\n"}},
                    {3589,{134217728,",\n"}},
                    {5631,{134217728,",\n"}},
                    {61947,{134217728,",\n"}}]

The only thing I could think of that would vary as the code ran is the
memory allocation. From lookoing at the C code, I could see that bignums
need heap allocation (a macro called HAlloc). If there was something in the
to_integer C code that messed up the process heap (by growing the heap by
too small an amount) so that every now and then, data was written a tiny bit
past the end of the heap without crashing Erlang, this could account for the
behavior. The reason the numbers are always in the same places is because
there are heap reallocations as the heap grows, and if the bug is
consistent, it will always screw up in the same places.

To test this theory, I started Erlang with the +h flag that sets the heap
for each process to an initial size, to see if delaying heap allocation
fixed the problem.

It did fix the problem, sort of, well, it postponed it, as you will see.

G:\misc>erl +h 10000
Eshell V5.6  (abort with ^G)
1> nfail2:test(10000).
0 bad conversions: []
0
2> nfail2:test(100000).
2 bad conversions: [{5463,{134217728,",\n"}},{8964,{134217728,",\n"}}]
2

Notice that the places in which the error occurs are now different.

Hope this helps.


2008/9/1 Lev Walkin <vlm@REDACTED>

>
> Hi,
>
>
> we all love string to integer conversion routines, such as
> string:to_integer/1 or erlang:list_to_integer/1.
>
> The functions serve us well and indeed provide us with advertised
> functionality almost every time.
>
> However, we've noticed some oddity on our production system which,
> after two weeks of jaw-dropping musings, has boiled down to a strange
> idempotence violation in the string:to_integer/1 implementation.
>
> To those of you impatient enough, please look into the code
> attached and try running it as nfail:test(10000) and go down from
> there.
>
> Here's that string:to_integer/1 function, having the following
> signature:
>
>    string:to_integer(String) -> {Int,Rest} | {error,Reason}
>
> Here's a typical invocation resulting in a number and the
> rest of the unparsed string returned:
>
> 36> string:to_integer("134217728,\n").
> {134217728,",\n"}
> 37>
>
> What would happen if we did it once more?
>
> 37> string:to_integer("134217728,\n").
> {134217728,",\n"}
> 38>
>
> By this time we can be reasonably sure that string:to_integer/1 will
> return the same output given the same input. We have seen that this
> is indeed the case by testing it twice. Could it be that testing
> it N times would result in a bad behavior? Nah, unlikely, you say.
>
> If you haven't looked at the attached code, it is time to do so.
> In the code, we create a list of N results of the string:to_integer/1
> application, like this:
>
>        iterate(0, Acc) -> Acc;
>        iterate(N, Acc) ->
>                iterate(N - 1,
>                        [case string:to_integer("134217728,\n") of
>                                {Int, _} -> Int
>                        end | Acc]).
>
> This code utilizes tail recursion with an accumulator list
> which gets prepended N times by string:to_integer/1 output,
> undoubtedly an integer. (Implementation with a map over lists:seq()
> output can do as well).
>
> So we spawn and run this iterate/2 function, appropriately checking
> that the list consists only of integers with value 134217728:
>
> test(N) ->
>        {Self, Ref} = {self(), make_ref()},
>        spawn_opt(fun()->
>                L = iterate(N, []),
>                % Filter out non-conforming entries
>                BadList = [X || X <- L, X =/= 134217728],
>                BadLen = length(BadList),
>                % Here, BadLen should always be 0!
>                io:format("~b bad conversions: ~p~n", [BadLen, BadList]),
>                Self ! {done, Ref, BadLen}
>                end,
>                [link,{fullsweep_after,0}]),
>        receive {done, Ref, Len} -> Len end.
>
> Based on the smoke tests above, this code should always result
> in something like this:
>
>        38> nfail:test(100).
>        0 bad conversions: []
>        0
>        39>
>
> But let's start testing it thorougly, as if not believing ourselves
> that such a simple function could possibly fail at times:
>
>        [vlm@REDACTED:~]> erl
>        Erlang (BEAM) emulator version 5.6.3 [source] [async-threads:0]
> [kernel-poll:false]
>
>        Eshell V5.6.3  (abort with ^G)
>        1> c(nfail).
>        {ok,nfail}
>        2> nfail:test(1).
>        0 bad conversions: []
>        0
>        3> nfail:test(2).
>        0 bad conversions: []
>        0
>        4> nfail:test(100).
>        0 bad conversions: []
>        0
>        5> nfail:test(1000).
>        2 bad conversions: [{134217728,",\n"},{134217728,",\n"}]
>        2
>        6> nfail:test(10000).
>        5 bad conversions: [{134217728,",\n"},
>                            {134217728,",\n"},
>                            {134217728,",\n"},
>                            {134217728,",\n"},
>                            {134217728,",\n"}]
>        5
>        7>
>        ...
>        32> nfail:test(810).
>        0 bad conversions: []
>        0
>        33> nfail:test(811).
>        2 bad conversions: [{134217728,",\n"},{134217728,",\n"}]
>        2
>        34> nfail:test(810).
>        0 bad conversions: []
>        0
>        35> nfail:test(811).
>        2 bad conversions: [{134217728,",\n"},{134217728,",\n"}]
>        2
>        36>
>
>
> As you see, string:to_integer/1 consistently generates bad entries
> which contain {134217728,",\n"} instead of 134217728, but only
> when the number of invocation reaches about 1000 (811 in this
> particular sequence of steps).
>
> Perhaps this is a platform glitch? The above code was being executed
> on a ppc (G4) Mac OS X 10.5 with R12B-3 (32-bit) built from scratch. Here's
> what Sun sparc v9 with Solaris 10 thinks about that test case:
>
>        [vlm@REDACTED:~]> erl
>        Erlang (BEAM) emulator version 5.6.3 [source] [async-threads:0]
> [hipe] [kernel-poll:false]
>
>        Eshell V5.6.3  (abort with ^G)
>        1> c(nfail).
>        {ok,nfail}
>        2> nfail:test(100).
>        1 bad conversions: [{134217728,",\n"}]
>        1
>        3> nfail:test(1000).
>        3 bad conversions:
> [{134217728,",\n"},{134217728,",\n"},{134217728,",\n"}]
>        3
>        4> nfail:test(10).
>        0 bad conversions: []
>        0
>        5> nfail:test(50).
>        0 bad conversions: []
>        0
>        6> c(nfail, [native]).
>        {ok,nfail}
>        7> nfail:test(500).
>        4 bad conversions: [{134217728,",\n"},
>                            {134217728,",\n"},
>                            {134217728,",\n"},
>                            {134217728,",\n"}]
>        4
>        8>
>
> And here's what 64-bit Intel box (am64, FreeBSD 6.3) thinks:
>
>        [vlm@REDACTED ~]$ erl
>        Erlang (BEAM) emulator version 5.6.3 [source] [64-bit]
> [async-threads:0] [hipe] [kernel-poll:false]
>
>        Eshell V5.6.3  (abort with ^G)
>        1> c(nfail).
>        {ok,nfail}
>        2> nfail:test(100).
>        0 bad conversions: []
>        0
>        3> nfail:test(1000).
>        0 bad conversions: []
>        0
>        4> nfail:test(10000).
>        0 bad conversions: []
>        0
>        5> nfail:test(100000).
>        0 bad conversions: []
>        0
>        6> nfail:test(1000000).
>        0 bad conversions: []
>        0
>        7>
>
> Oops, it went very well, suprisingly. Perhaps, it is a purely
> non-Intel chip problem? Let's try it on a non 64bit machine,
> such as Pentium D under Microsoft Windows Vista™ in 32-bit mode:
>
>        Erlang (BEAM) emulator version 5.6.3 [smp:2] [async-threads:0]
>
>        Eshell V5.6.3  (abort with ^G)
>        1> c('c:/tmp/nfail.erl').
>        {ok,nfail}
>        2> nfail:test(100).
>        0 bad conversions: []
>        0
>        3> nfail:test(1000).
>        2 bad conversions: [{134217728,",\n"},{134217728,",\n"}]
>        2
>        4>
>
> Aha! See the pattern: 32-bit Erlang installations on many hardware
> platforms are having very similar problems. They are unable to
> consistently convert a string containing an integer into an integer
> value. Incidentally enough, the integer value for which Erlang
> starts to misbehave is 134217728, which is 2^27. Trying it with
> value "134217727" or lower does not create this idempotence problem.
> Trying R11B-5 under Windows Vista™ 32-bit does not exhibit such
> problem either, so it must be a specific issue to R12.
>
> Please advise.
>
>
> P.S. Thanks to Denis Titoruk and Vladimir Serov for investigating
> this issue and coming up with a short test case.
>
> --
> Lev Walkin
> vlm@REDACTED
>
> -module(nfail).
> -export([test/1]).
>
> iterate(0, Acc) -> Acc;
> iterate(N, Acc) ->
>        iterate(N - 1,
>                [case string:to_integer("134217728,\n") of
>                        {Int, _} -> Int
>                end | Acc]).
>
> test(N) ->
>        {Self, Ref} = {self(), make_ref()},
>        spawn_opt(fun()->
>                L = iterate(N, []),
>                BadList = [X || X <- L, X =/= 134217728],
>                BadLen = length(BadList),
>                io:format("~b bad conversions: ~p~n", [BadLen, BadList]),
>                Self ! {done, Ref, BadLen}
>                end,
>                [link,{fullsweep_after,0}]),
>        receive {done, Ref, Len} -> Len end.
>
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://www.erlang.org/mailman/listinfo/erlang-bugs
>



-- 
For every expert there is an equal and opposite expert - Arthur C. Clarke
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-bugs/attachments/20080902/021f24c4/attachment.htm>


More information about the erlang-bugs mailing list