From lukas@REDACTED Mon Aug 4 10:56:13 2014
From: lukas@REDACTED (Lukas Larsson)
Date: Mon, 4 Aug 2014 10:56:13 +0200
Subject: [erlang-bugs] erl_lock_count segfault
In-Reply-To:
References:
Message-ID: <53DF4AAD.3050008@erlang.org>

Hello,

This happens when malloc returns NULL. I'll add a check so that there is a
nicer error message, but there is not much we can do when there is no
memory left.

Lukas

On 25/07/14 17:49, Louis-Philippe Gauthier wrote:
> Hi,
> Looks like under certain conditions I get a segfault when starting
> the VM when compiled with lock counter.
>
> https://gist.github.com/lpgauth/27e66f7a2104d8b5af74
>
> LP
>
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs

From Ingela.Anderton.Andin@REDACTED Mon Aug 4 11:28:42 2014
From: Ingela.Anderton.Andin@REDACTED (Ingela Anderton Andin)
Date: Mon, 4 Aug 2014 11:28:42 +0200
Subject: [erlang-bugs] inets stand_alone mode
In-Reply-To: <53D94975.1020107@gmail.com>
References: <53D94975.1020107@gmail.com>
Message-ID: <53DF524A.2020507@ericsson.com>

Hi!

On 07/30/2014 09:37 PM, Michael Truog wrote:
> Hi,
>
> I ran into a problem that exists in 17.1.2 with inets between inets mode
> and stand_alone mode with httpc:
>
> This works:
> 1> application:start(inets).
> ok
> 2> inets:start(httpc, [{profile, foobar}], inets).
> {ok,<0.53.0>}
> 3> inets:start(httpc, [{profile, foobar}], inets).
> {error,{already_started,<0.53.0>}}
>
> This is the problem (unable to use a try/catch to get the exit, since
> that is within an inets process):
> 1> inets:start(httpc, [{profile, foobar}], stand_alone).
> {ok,<0.35.0>}
> 2> inets:start(httpc, [{profile, foobar}], stand_alone).

You are trying to start the same profile twice, which will not work.
You can start a profile and then access that profile from different
processes, but you can only start a profile once. A profile should be
viewed in the same way as an incognito window in Chrome.

Regards Ingela Erlang/OTP Team - Ericsson AB

>
> =ERROR REPORT==== 30-Jul-2014::12:24:22 ===
> ** Generic server <0.35.0> terminating
> ** Last message in was {'EXIT',<0.33.0>,
>     {'EXIT',
>      {badarg,
>       [{ets,new,
>         [stand_alone_foobar__session_db,
>          [public,set,named_table,{keypos,2}]],
>         []},
>        {httpc_manager,do_init,2,
>         [{file,"httpc_manager.erl"},{line,421}]},
>        {httpc_manager,init,1,
>         [{file,"httpc_manager.erl"},{line,406}]},
>        {gen_server,init_it,6,
>         [{file,"gen_server.erl"},{line,306}]},
>        {proc_lib,init_p_do_apply,3,
>         [{file,"proc_lib.erl"},{line,239}]}]}}}
> ** When Server state == {state,[],stand_alone_foobar__handler_db,
>     {cookie_db,undefined,16402},
>     stand_alone_foobar__session_db,stand_alone_foobar,
>     {options,
>      {undefined,[]},
>      {undefined,[]},
>      0,2,5,120000,2,disabled,false,inet,default,
>      default,[]}}
> ** Reason for termination ==
> ** {'EXIT',{badarg,[{ets,new,
>         [stand_alone_foobar__session_db,
>          [public,set,named_table,{keypos,2}]],
>         []},
>        {httpc_manager,do_init,2,
>         [{file,"httpc_manager.erl"},{line,421}]},
>        {httpc_manager,init,1,
>         [{file,"httpc_manager.erl"},{line,406}]},
>        {gen_server,init_it,6,
>         [{file,"gen_server.erl"},{line,306}]},
>        {proc_lib,init_p_do_apply,3,
>         [{file,"proc_lib.erl"},{line,239}]}]}}
> ** exception exit: {'EXIT',
>      {badarg,
>       [{ets,new,
>         [stand_alone_foobar__session_db,
>          [public,set,named_table,{keypos,2}]],
>         []},
>        {httpc_manager,do_init,2,
>         [{file,"httpc_manager.erl"},{line,421}]},
>        {httpc_manager,init,1,
>         [{file,"httpc_manager.erl"},{line,406}]},
>        {gen_server,init_it,6,
>         [{file,"gen_server.erl"},{line,306}]},
>        {proc_lib,init_p_do_apply,3,
>         [{file,"proc_lib.erl"},{line,239}]}]}}
>
> I am not sure if this was a known issue, but it should be a bug, since
> it seems valid that two standalone Erlang processes might use the same
> httpc profile data in ets.
>
> Thanks,
> Michael

From lukas@REDACTED Mon Aug 4 12:09:11 2014
From: lukas@REDACTED (Lukas Larsson)
Date: Mon, 4 Aug 2014 12:09:11 +0200
Subject: [erlang-bugs] enif_make_int64 => -0 for INT64_MIN with gcc 4.9.1
In-Reply-To:
References:
Message-ID: <53DF5BC7.7010002@erlang.org>

Hello,

I've added fixes for the two places where you found bugs. I'll see if I
can find some more errors from the sanitizers.

Lukas

On 21/07/14 00:34, Tuncer Ayaz wrote:
> On Sun, Jul 20, 2014 at 11:56 PM, Tomas Abrahamsson wrote:
>> Hi, that sanitizer branch was most useful! I hope it will make it
>> into erlang. This is a run of the previous code with the sanitizers
>> merged into OTP-17.1.1:
> There are more sanitizers you can try out. UBSan is comparatively
> low-impact, but the others cause considerable slowdowns, so don't be
> surprised about that. ASan and TSan are not cheap, but easier to use
> than Valgrind. Also, you can try the same sanitizers with clang, but
> the names might differ.
>
>> Erlang/OTP 17 [erts-6.1.2] [source-cc894a7] [smp:4:4]
>> [async-threads:10] [kernel-poll:false]
>>
>> beam/utils.c:453:14: runtime error: negation of -9223372036854775808
>> cannot be represented in type 'long long int'; cast to an unsigned
>> type to negate this value to itself
>>
>> beam/big.c:1548:4: runtime error: negation of -9223372036854775808
>> cannot be represented in type 'long long int'; cast to an unsigned
>> type to negate this value to itself
> I just tried and found beam/big.c:1548:4, too.
>
>> The relevant files do indeed look like they are negating INT64_MIN.
>> (by the way, in utils.c, the erts_bld_sint64 takes the else branch,
>> as expected, but advances szp by 0.)
>>
>> Cramming in an (unsigned long long int) cast in there
>> like the sanitizer error says, actually does make it
>> print -9223372036854775808 in the erlang code,
>> and no sanitizer error is printed anymore.
>>
>> utils.c
>> 443 Eterm
>> 444 erts_bld_sint64(Uint **hpp, Uint *szp, Sint64 si64)
>> 445 {
>> 446     Eterm res = THE_NON_VALUE;
>> 447     if (IS_SSMALL(si64)) {
>> 448         if (hpp)
>> 449             res = make_small((Sint) si64);
>> 450     }
>> 451     else {
>> 452         if (szp)
>> 453             *szp += ERTS_SINT64_HEAP_SIZE(si64);
>> 454         if (hpp)
>> 455             res = erts_sint64_to_big(si64, hpp);
>> 456     }
>> 457     return res;
>> 458 }
>>
>> big.h
>> 101 #define ERTS_SINT64_HEAP_SIZE(X) \
>> 102     (IS_SSMALL((X)) \
>> 103      ? 0 \
>> 104      : ERTS_UINT64_BIG_HEAP_SIZE__((X) >= 0 ? (X) : -(X)))
>>
>> big.c
>> 1540 Eterm erts_sint64_to_big(Sint64 x, Eterm **hpp)
>> 1541 {
>> 1542     Eterm *hp = *hpp;
>> 1543     int neg;
>> 1544     if (x >= 0)
>> 1545         neg = 0;
>> 1546     else {
>> 1547         neg = 1;
>> 1548         x = -x;
>> 1549     }
>> ...
>>
>> I tried changing big.h:104 into:
>>     : ERTS_UINT64_BIG_HEAP_SIZE__((X) >= 0 ? (X) : -((unsigned
>>     long long int)X)))
>> and big.c:1548 into:
>>     x = -(unsigned long long int)x;
>>
>> This could be an interesting quick check property, to verify no
>> undefined behaviours for various programs and values.
>>
>> BRs
>> Tomas
>>
>> On Sun, Jul 20, 2014 at 10:00 PM, Tuncer Ayaz wrote:
>>> On Sun, Jul 20, 2014 at 3:27 AM, Tomas Abrahamsson wrote:
>>>> Hi,
>>>>
>>>> I upgraded gcc from version 4.8.1 to 4.9.1, recompiled Erlang/OTP-17.1.1
>>>> and my nif, and now enif_make_int64 creates -0 for INT64_MIN.
>>>>
>>>> % make test
>>>> erl -s int64nif go -s erlang halt
>>>> Erlang/OTP 17 [erts-6.1.1] [source] [smp:4:4] [async-threads:10]
>>>> [kernel-poll:false]
>>>>
>>>> -0
>>>>
>>>> When Erlang was compiled with gcc-4.8.1, it printed -9223372036854775808.
>>>> I've attached the test programs; here are the important lines:
>>>>
>>>> go() ->
>>>>     io:format("~p~n", [int64_from_nif()]).
>>>>
>>>> The NIF C-code contains:
>>>>
>>>> static ERL_NIF_TERM
>>>> int64_from_nif(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[])
>>>> {
>>>>     return enif_make_int64(env, INT64_MIN);
>>>> }
>>>>
>>>> System information:
>>>>
>>>> OS: linux, 32-bit core i5, 3.14-1-686-pae (debian unstable)
>>>> Erlang/OTP: built from scratch (both times) from the OTP-17.1.1 git tag.
>>>> gcc: initially: gcc (Debian 4.8.1-10) 4.8.1
>>>>      with this one, it prints -9223372036854775808
>>>> gcc: after upgrade: gcc (Debian 4.9.1-1) 4.9.1
>>>>      with this one, it prints -0
>>>>
>>>> The only change I made was apt-get install gcc and then
>>>> rebuilding Erlang.
>>> I can confirm this on linux amd64.
>>>
>>> 17.1.2  gcc 4.8.2 -> -9223372036854775808
>>> 17.1.2  gcc 4.9.0 -> -0
>>> 17.1.2  gcc 4.9.1 -> -0
>>>
>>> 16B03-1 gcc 4.8.2 -> -9223372036854775808
>>> 16B03-1 gcc 4.9.0 -> 0
>>> 16B03-1 gcc 4.9.1 -> 0
>>>
>>>
>>> I suppose you're not able to distill this to stand-alone C code, right?
>>>
>>> I don't think it will reveal anything, but it's an easy way to
>>> find bugs, so maybe you can try building with
>>> -fsanitize=undefined. If you want to do that, I'd suggest applying
>>> the patch available at https://github.com/erlang/otp/pull/429 and
>>> running "configure --enable-sanitizers=undefined".

From n.oxyde@REDACTED Mon Aug 4 12:28:59 2014
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Mon, 4 Aug 2014 12:28:59 +0200
Subject: [erlang-bugs] enif_make_int64 => -0 for INT64_MIN with gcc 4.9.1
In-Reply-To: <53DF5BC7.7010002@erlang.org>
References: <53DF5BC7.7010002@erlang.org>
Message-ID: <5263DD26-BFF3-429C-957A-24BE27669394@gmail.com>

There is also https://github.com/erlang/otp/pull/388

--
Anthony Ramine

On 4 Aug 2014, at 12:09, Lukas Larsson wrote:

> Hello,
>
> I've added fixes for the two places where you found bugs. I'll see if I
> can find some more errors from the sanitizers.
>
> Lukas

From louis-philippe.gauthier@REDACTED Mon Aug 4 16:02:53 2014
From: louis-philippe.gauthier@REDACTED (Louis-Philippe Gauthier)
Date: Mon, 4 Aug 2014 10:02:53 -0400
Subject: [erlang-bugs] erl_lock_count segfault
In-Reply-To: <53DF4AAD.3050008@erlang.org>
References: <53DF4AAD.3050008@erlang.org>
Message-ID:

Are you sure?
This happens on a system with 30+ GB of free memory...

On Mon, Aug 4, 2014 at 4:56 AM, Lukas Larsson wrote:

> Hello,
>
> This happens when malloc returns NULL. I'll add a check so that there is a
> nicer error message, but there is not much we can do when there is no
> memory left.
>
> Lukas
>
> On 25/07/14 17:49, Louis-Philippe Gauthier wrote:
>
> Hi,
> Looks like under certain conditions I get a segfault when starting the
> VM when compiled with lock counter.
>
> https://gist.github.com/lpgauth/27e66f7a2104d8b5af74
>
> LP

From lukas@REDACTED Mon Aug 4 16:20:34 2014
From: lukas@REDACTED (Lukas Larsson)
Date: Mon, 4 Aug 2014 16:20:34 +0200
Subject: [erlang-bugs] erl_lock_count segfault
In-Reply-To:
References: <53DF4AAD.3050008@erlang.org>
Message-ID: <53DF96B2.5060308@erlang.org>

Well, you can never be entirely sure, but it fails in this line[1] and I
don't really see that happening unless malloc returns NULL.

Do you still have the core file? Could you do a "bt full"+"info
registers"+"disassemble lcnt_thread_data_alloc" on the thread that fails?

Lukas

[1]: https://github.com/erlang/otp/blob/bbfc75ea8795d26a2fe9254f3f646e761f2ad61e/erts/emulator/beam/erl_lock_count.c#L154

On 04/08/14 16:02, Louis-Philippe Gauthier wrote:
> Are you sure? This happens on a system with 30+ GB of free memory...
>
>
> On Mon, Aug 4, 2014 at 4:56 AM, Lukas Larsson wrote:
>
>     Hello,
>
>     This happens when malloc returns NULL. I'll add a check so that
>     there is a nicer error message, but there is not much we can do
>     when there is no memory left.
>
>     Lukas
>
>     On 25/07/14 17:49, Louis-Philippe Gauthier wrote:
>>     Hi,
>>     Looks like under certain conditions I get a segfault when
>>     starting the VM when compiled with lock counter.
>>
>>     https://gist.github.com/lpgauth/27e66f7a2104d8b5af74
>>
>>     LP

From raimo+erlang-bugs@REDACTED Wed Aug 6 11:51:33 2014
From: raimo+erlang-bugs@REDACTED (Raimo Niskanen)
Date: Wed, 6 Aug 2014 11:51:33 +0200
Subject: [erlang-bugs] snmp agent inform w/AES privacy not working
In-Reply-To:
References:
Message-ID: <20140806095133.GA12357@erix.ericsson.se>

Sorry about the looong delay but I have just created a ticket for this,
now waiting to be prioritized...

/ Raimo

On Tue, Feb 25, 2014 at 11:56:32AM -0600, Daniel Goertzen wrote:
> The SNMP agent AES initialization vector calculation is definitely wrong.
> The IV is composed from the authoritative engine boots, engine time, and a
> random locally generated number. The agent is currently always using the
> *local* engine to get engine boots and engine time, which happens to be
> correct for GET, SET, and TRAP, but is wrong for INFORM.
>
> The attached patch fixes it. When composing a packet for transmission, the
> existing code collects the correct engine parameters, so this patch just
> uses those for the AES IV instead of going off and getting the wrong local
> engine params. The patch looks bigger than it really is because the order
> of packet composition had to be changed slightly.
>
> With this patch applied, I am able to send AES encrypted informs. AES
> encrypted traps also continued to work.
>
> Cheers,
> Dan.
>
>
> On Mon, Feb 24, 2014 at 4:57 PM, Daniel Goertzen wrote:
>
> > I am struggling to get SNMP informs with AES privacy working. I have no
> > problems with DES privacy on informs.
> >
> > In snmpa_usm.erl I see that the *local engine* boots and time are passed to
> > snmp_usm:aes_encrypt() which forms part of the IV....
> >
> >
> > However RFC 3826 states that the *authoritative* engine boots and time
> > should be used, and in the case of informs the authoritative engine is the
> > inform target engine, not the local engine....
> >
> > [from RFC 3826]
> >
> > 3.1.2.1. AES Encryption Key and IV
> >
> > The first 128 bits of the localized key Kul are used as the AES
> > encryption key. The 128-bit IV is obtained as the concatenation of
> > the authoritative SNMP engine's 32-bit snmpEngineBoots, the SNMP
> > engine's 32-bit snmpEngineTime, and a local 64-bit integer. The
> > 64-bit integer is initialized to a pseudo-random value at boot time.
> >
> >
> > Could this be why AES privacy is not working for informs?
> >
> > Dan.
> >

> diff --git a/lib/snmp/src/agent/snmpa_usm.erl b/lib/snmp/src/agent/snmpa_usm.erl
> index 719ea4e..0c3528a 100644
> --- a/lib/snmp/src/agent/snmpa_usm.erl
> +++ b/lib/snmp/src/agent/snmpa_usm.erl
> @@ -474,6 +474,23 @@ generate_outgoing_msg(Message, SecEngineID, SecName, SecData, SecLevel,
>              _ -> % 3.1.1a
>                  SecData
>          end,
> +    %% 3.1.6
> +    SnmpEngineID = LocalEngineID,
> +    ?vtrace("generate_outgoing_msg -> SnmpEngineID: ~p [3.1.6]",
> +            [SnmpEngineID]),
> +    {MsgAuthEngineBoots, MsgAuthEngineTime} =
> +        case snmp_misc:is_auth(SecLevel) of
> +            false when SecData =:= [] -> % not a response
> +                {0, 0};
> +            false when UserName =:= "" -> % reply (report) to discovery step 1
> +                {0, 0};
> +            true when SecEngineID =/= SnmpEngineID ->
> +                {get_engine_boots(SecEngineID),
> +                 get_engine_time(SecEngineID)};
> +            _ ->
> +                {get_local_engine_boots(SnmpEngineID),
> +                 get_local_engine_time(SnmpEngineID)}
> +        end,
>      %% 3.1.4
>      ?vtrace("generate_outgoing_msg -> [3.1.4]"
>              "~n UserName: ~p"
> @@ -482,24 +499,7 @@ generate_outgoing_msg(Message, SecEngineID, SecName, SecData, SecLevel,
>              [UserName, AuthProtocol, PrivProtocol]),
>      ScopedPduBytes = Message#message.data,
>      {ScopedPduData, MsgPrivParams} =
> -        encrypt(ScopedPduBytes, PrivProtocol, PrivKey, SecLevel),
> -    SnmpEngineID = LocalEngineID,
> -    ?vtrace("generate_outgoing_msg -> SnmpEngineID: ~p [3.1.6]",
> -            [SnmpEngineID]),
> -    %% 3.1.6
> -    {MsgAuthEngineBoots, MsgAuthEngineTime} =
> -        case snmp_misc:is_auth(SecLevel) of
> -            false when SecData =:= [] -> % not a response
> -                {0, 0};
> -            false when UserName =:= "" -> % reply (report) to discovery step 1
> -                {0, 0};
> -            true when SecEngineID =/= SnmpEngineID ->
> -                {get_engine_boots(SecEngineID),
> -                 get_engine_time(SecEngineID)};
> -            _ ->
> -                {get_local_engine_boots(SnmpEngineID),
> -                 get_local_engine_time(SnmpEngineID)}
> -        end,
> +        encrypt(ScopedPduBytes, PrivProtocol, PrivKey, SecLevel, MsgAuthEngineBoots, MsgAuthEngineTime),
>      %% 3.1.5 - 3.1.7
>      ?vtrace("generate_outgoing_msg -> [3.1.5 - 3.1.7]",[]),
>      UsmSecParams =
> @@ -560,12 +560,14 @@ generate_discovery_msg(Message,
>              end
>          end,
>      ScopedPduBytes = Message#message.data,
> +    Boots = 0,
> +    Time = 0,
>      {ScopedPduData, MsgPrivParams} =
> -        encrypt(ScopedPduBytes, PrivProtocol, PrivKey, SecLevel),
> +        encrypt(ScopedPduBytes, PrivProtocol, PrivKey, SecLevel, Boots, Time),
>      UsmSecParams =
>          #usmSecurityParameters{msgAuthoritativeEngineID = SecEngineID,
> -                               msgAuthoritativeEngineBoots = 0, % Boots,
> -                               msgAuthoritativeEngineTime = 0, % Time,
> +                               msgAuthoritativeEngineBoots = Boots,
> +                               msgAuthoritativeEngineTime = Time,
>                                 msgUserName = UserName,
>                                 msgPrivacyParameters = MsgPrivParams},
>      Message2 = Message#message{data = ScopedPduData},
> @@ -574,14 +576,14 @@ generate_discovery_msg(Message,
>
>
>  %% Ret: {ScopedPDU, MsgPrivParams} - both are already encoded as OCTET STRINGs
> -encrypt(Data, PrivProtocol, PrivKey, SecLevel) ->
> +encrypt(Data, PrivProtocol, PrivKey, SecLevel, AuthEngineBoots, AuthEngineTime) ->
>      case snmp_misc:is_priv(SecLevel) of
>          false -> % 3.1.4b
>              ?vtrace("encrypt -> 3.1.4b",[]),
>              {Data, []};
>          true -> % 3.1.4a
>              ?vtrace("encrypt -> 3.1.4a",[]),
> -            case (catch try_encrypt(PrivProtocol, PrivKey, Data)) of
> +            case (catch try_encrypt(PrivProtocol, PrivKey, Data, AuthEngineBoots, AuthEngineTime)) of
>                  {ok, ScopedPduData, MsgPrivParams} ->
>                      ?vtrace("encrypt -> encrypted - now encode tag",[]),
>                      {snmp_pdus:enc_oct_str_tag(ScopedPduData), MsgPrivParams};
> @@ -596,12 +598,12 @@ encrypt(Data, PrivProtocol, PrivKey, SecLevel) ->
>              end
>      end.
>
> -try_encrypt(?usmNoPrivProtocol, _PrivKey, _Data) -> % 3.1.2
> +try_encrypt(?usmNoPrivProtocol, _PrivKey, _Data, _AuthEngineBoots, _AuthEngineTime) -> % 3.1.2
>      error(unsupportedSecurityLevel);
> -try_encrypt(?usmDESPrivProtocol, PrivKey, Data) ->
> +try_encrypt(?usmDESPrivProtocol, PrivKey, Data, _AuthEngineBoots, _AuthEngineTime) ->
>      des_encrypt(PrivKey, Data);
> -try_encrypt(?usmAesCfb128Protocol, PrivKey, Data) ->
> -    aes_encrypt(PrivKey, Data).
> +try_encrypt(?usmAesCfb128Protocol, PrivKey, Data, AuthEngineBoots, AuthEngineTime) ->
> +    aes_encrypt(PrivKey, Data, AuthEngineBoots, AuthEngineTime).
>
>
>  authenticate_outgoing(Message, UsmSecParams,
> @@ -654,10 +656,8 @@ get_des_salt() ->
>      EngineBoots = snmp_framework_mib:get_engine_boots(),
>      [?i32(EngineBoots), ?i32(SaltInt)].
>
> -aes_encrypt(PrivKey, Data) ->
> -    EngineBoots = snmp_framework_mib:get_engine_boots(),
> -    EngineTime = snmp_framework_mib:get_engine_time(),
> -    snmp_usm:aes_encrypt(PrivKey, Data, fun get_aes_salt/0,
> +aes_encrypt(PrivKey, Data, EngineBoots, EngineTime) ->
> +    snmp_usm:aes_encrypt(PrivKey, Data, fun get_aes_salt/0,
>                           EngineBoots, EngineTime).
>
>  aes_decrypt(PrivKey, UsmSecParams, EncData) ->

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

From raimo+erlang-bugs@REDACTED Wed Aug 6 12:10:04 2014
From: raimo+erlang-bugs@REDACTED (Raimo Niskanen)
Date: Wed, 6 Aug 2014 12:10:04 +0200
Subject: [erlang-bugs] IPv6 problems in erlang's SNMP stack
In-Reply-To: <1390558255.31834.17.camel@ax-sze>
References: <1390558255.31834.17.camel@ax-sze>
Message-ID: <20140806101004.GB12357@erix.ericsson.se>

I would just like to inform you all that in the upcoming 17.2 OTP patch
release we are starting to fix snmp for IPv6.

The agent should be able to handle IPv4+IPv6, but the manager just
IPv4 or IPv6. The documentation is not updated, though...
And the code for creating configurations is not fully updated.

Dual stack manager is in the pipeline.

The new code is supposed to be backwards compatible unless you
have written a custom net_if process.

And of course there might be bugs.

Best regards
/ Raimo Niskanen

On Fri, Jan 24, 2014 at 11:10:55AM +0100, Stefan Zegenhagen wrote:
> Dear all,
>
>
> First problem: I found no easy way to pass the "inet6" socket option
> into erlang's snmpa_net_if implementation to open the UDP socket in IPv6
> mode. Such an option would be required to convince erlang's SNMP stack
> to talk IPv6.
>
> Second problem: when a socket is opened in IPv6 mode on real dual-stack
> operating systems, it is possible to query the SNMP agent via IPv4 *AND*
> IPv6 simultaneously. Sending traps to IPv6 trap receivers works as well,
> but trap receivers that are configured with IPv4 addresses always fail
> with an eafnosupport error. SNMP-TARGET-MIB explicitly allows trap
> receivers to have different IP address types, so I would expect such a
> feature to be supported.
>
> I know that it is possible (the Linux kernel sends out IPv6 packets
> addressed to "::FFFF:A.B.C.D" via IPv4). I also know that there is a
> portability problem, since operating systems may behave differently when
> it comes to dual-stack operation details (it's probably best fixed
> in the socket driver). Still, this is an important issue for us at the
> moment and I guess others will face the same problem when IPv6 spreads
> wider.
>
>
> Kind regards,
>
> --
> Dr. Stefan Zegenhagen
>
> arcutronix GmbH
> Garbsener Landstr. 10
> 30419 Hannover
> Germany
>
> Tel: +49 511 277-2734
> Fax: +49 511 277-2709
> Email: stefan.zegenhagen@REDACTED
> Web: www.arcutronix.com
>
> *Synchronize the Ethernet*
>
> General Managers: Dipl. Ing. Juergen Schroeder, Dr. Josef Gfrerer -
> Legal Form: GmbH, Registered office: Hannover, HRB 202442, Amtsgericht
> Hannover; Ust-Id: DE257551767.
>
> Please consider the environment before printing this message.

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

From rickard@REDACTED Wed Aug 6 17:15:54 2014
From: rickard@REDACTED (Rickard Green)
Date: Wed, 6 Aug 2014 17:15:54 +0200
Subject: [erlang-bugs] Weird behaviour of gen_tcp:send/2 and erlang:port_command/3 with nosuspend to the same port on R16B03
In-Reply-To:
References: <53A9B544.1080508@erlang.org>
Message-ID:

Hi!

Sorry for the late response. We've been understaffed during vacation times.

A fix for this issue can be found in the
rickard/nosuspend-bug/OTP-12082 branch in my github repo.

The fix is based on OTP-17.1 and will be included in the next
maintenance patch package (most likely as it is in the above branch).

Release note for the fix:

OTP-12082 A bug in the VM code implementing sending of signals to
          ports could cause the receiving port queue to remain in a
          busy state forever. When this state had been reached,
          processes sending command signals to the port either got
          suspended forever, or, if the nosuspend feature was used,
          always failed to send to the port. In order for this bug
          to be triggered on a port, one had to at least once
          utilize the nosuspend functionality when passing a signal
          to the port. This could be done by calling any of
          -- port_command(Port, Data, [nosuspend | Options]),
          -- erlang:send(Port, {PortOwner, {command, Data}},
             [nosuspend | Options]),
          -- erlang:send_nosuspend(Port, {PortOwner, {command,
             Data}}), or
          -- erlang:send_nosuspend(Port, {PortOwner, {command,
             Data}}, Options).

          Thanks to Vasily Demidenok for reporting the issue, and
          Sergey Kudryashov for providing a testcase.

Regards,
Rickard Green, Erlang/OTP, Ericsson AB

On Mon, Jul 28, 2014 at 11:16 AM, Vasily Demidenok wrote:
> Hello again
> A friend of mine faced the same problem. He assumes it's not a tcp
> driver related problem, but port internals changes made in the R16 release.
> Downgrading to R15B03 fixed their problem. +n options with d/s/a params also
> did not help.
>
> The app to reproduce the bug:
> https://github.com/kudryashov-sv/ErlangCPort.git
>
>
> 2014-07-05 0:47 GMT+04:00 Vasily Demidenok :
>
>> The problem remains even if only erlang:port_command/3 with nosuspend
>> option is used. (no calls for gen_tcp:send and many processes write to the
>> same socket)
>>
>>
>> 2014-06-24 21:28 GMT+04:00 Lukas Larsson :
>>
>>> Hello,
>>>
>>> I was able to reproduce your testcase after removing all the "msg"
>>> printouts and starting a couple of clients at the same time. It seems that
>>> the sockets are hitting the high_msgq_watermark limit and then as data gets
>>> flushed they are not set to run again. I'll see if I can dig out what it is
>>> that is causing this behavior.
>>>
>>> Lukas
>>>
>>> On 24/06/14 16:56, Vasily Demidenok wrote:
>>>
>>> Hello list, we faced some gen_tcp related problems after the switch from
>>> erlang R15B03 to R16B03-01
>>>
>>> The problem is as follows: when the server produces data faster than the
>>> consumer can handle, after
>>> the server's out buffers are full and the client's incoming buffers are full,
>>> the gen_tcp:send/2 call on the server side blocks forever in
>>> erts_internal:port_command/3. After this, even when the client consumes all
>>> the data and the buffers are empty, the server process remains suspended
>>> in that call
>>>
>>> This problem does not occur always, but quite often.
>>>
>>> Some details on the implementation are below. I also shrank the example to
>>> this small app so you can check the code:
>>> https://github.com/define-null/tcp_failing_ex
>>>
>>> The server is implemented in such a way that it listens on port 8899, then
>>> when a client connects to it, it spawns the main srv process and plenty of
>>> workers, which start to write to this port after the client sends some special
>>> msg. The main process is responsible for commands from the client and sends
>>> responses via gen_tcp:send/2, while the workers try to write some stream data
>>> to the client and use erlang:port_command with nosuspend. So the workers send
>>> only up-to-date data, dropping any in case the client is slow.
>>>
>>> The behaviour which we see is as follows:
>>> In the first phase the producer fills the OS and erlang driver's buffers. The
>>> consumer reads data as it arrives and the server drops data which it cannot
>>> send. So we see the buffer size growing on both sides: the out queue of the
>>> server and the in queue of the client respectively
>>>
>>> After some moment in time, I guess when the buffers are completely
>>> filled, the server tries to respond to
>>> the ping message of the client, using a gen_tcp:send/2 call. After that, it
>>> blocks there forever, even after the client consumes all the messages. The
>>> situation does not change and the srv process remains in the suspended
>>> state, while its incoming buffer begins to grow when the client sends more
>>> ping messages.
>>>
>>> Below is the output on the system with two slow clients, where for the
>>> first client the server's process is already blocked in the gen_tcp:send/2
>>> call, while the second is served well.
>>>
>>> Every 2.0s: netstat -al | grep 8899          Tue Jun 24 16:34:51 2014
>>>
>>> tcp4      36      0  localhost.8899    localhost.63263   ESTABLISHED
>>> tcp4       0      0  localhost.63263   localhost.8899    ESTABLISHED
>>> tcp4       0 130990  localhost.8899    localhost.63257   ESTABLISHED
>>> tcp4  619190      0  localhost.63257   localhost.8899    ESTABLISHED
>>> tcp4       0      0  *.8899            *.*               LISTEN
>>>
>>> This is the output for the client process from the github example, where we
>>> see that
>>> after the send operation (ping msg) no incoming msg comes any more.
>>> {{2014,6,24},{16,33,28}}: msg
>>> {{2014,6,24},{16,33,28}}: msg
>>> {{2014,6,24},{16,33,28}}: msg
>>> {{2014,6,24},{16,33,28}}: msg
>>> {{2014,6,24},{16,33,28}}: msg
>>> {{2014,6,24},{16,33,28}}: msg
>>> {{2014,6,24},{16,33,28}}: msg
>>> {{2014,6,24},{16,33,28}}: msg
>>> {{2014,6,24},{16,33,28}}: msg
>>> {{2014,6,24},{16,33,28}}: msg
>>> {{2014,6,24},{16,33,48}} before send
>>> {{2014,6,24},{16,33,48}} after send ok
>>> {{2014,6,24},{16,34,9}} before send
>>> {{2014,6,24},{16,34,9}} after send ok
>>> {{2014,6,24},{16,34,30}} before send
>>> {{2014,6,24},{16,34,30}} after send ok
>>> {{2014,6,24},{16,34,51}} before send
>>> {{2014,6,24},{16,34,51}} after send ok
>>> {{2014,6,24},{16,35,12}} before send
>>> ....
>>>
>>> Server blocked process output:
>>>
>>> {{2014,6,24},{16,33,21}}: <0.95.0> ping
>>> {{2014,6,24},{16,33,21}} bsend: <0.95.0>
>>> {{2014,6,24},{16,33,21}} asend: <0.95.0> ok
>>> {{2014,6,24},{16,33,48}}: <0.95.0> ping
>>> {{2014,6,24},{16,33,48}} bsend: <0.95.0>
>>> %% (no asend message after it)
>>>
>>> (tcp_failing_node@REDACTED)1> erlang:process_info(pid(0,95,0)).
>>> [{current_function,{erts_internal,port_command,3}},
>>> {initial_call,{proc_lib,init_p,5}},
>>>
>>> The bug is not always reproducible, but occurs quite often. The problem is
>>> that even
>>> after the server's out buffers are empty, data does not arrive at the client,
>>> and the incoming buffer grows
>>> as the client sends ping messages to the server. (So erlang:port_command/3
>>> with nosuspend always returns false
>>> when the main server process for this connection is suspended in
>>> gen_tcp:send/2)
>>>
>>> And then it only gets worse, as already mentioned
>>>
>>> Every 2.0s: netstat -al | grep 8899          Tue Jun 24 16:56:59 2014
>>>
>>> tcp4     804      0  localhost.8899    localhost.63263   ESTABLISHED
>>> tcp4       0      0  localhost.63263   localhost.8899    ESTABLISHED
>>> tcp4       0      0  localhost.8899    localhost.63257   ESTABLISHED
>>> tcp4       0      0  localhost.63257   localhost.8899    ESTABLISHED
>>> tcp4       0      0  *.8899            *.*               LISTEN
>>>
>>> We ran into this after switching to R16B03 from R15B03. I know there
>>> were some changes in port_command handling; I guess that is why we got such
>>> behaviour?

From rickard@REDACTED Fri Aug 8 00:44:37 2014
From: rickard@REDACTED (Rickard Green)
Date: Fri, 8 Aug 2014 00:44:37 +0200
Subject: [erlang-bugs] r15b03-1 SEGV in erts_port_task_schedule()
In-Reply-To: <21463.45069.249946.809608@gargle.gargle.HOWL>
References: <21462.24011.626017.696972@gargle.gargle.HOWL> <21463.45069.249946.809608@gargle.gargle.HOWL>
Message-ID:

On Tue, Jul 29, 2014 at 4:30 PM, Mikael Pettersson wrote:
> Mikael Pettersson writes:
> > This is a followup to my previous report in
> > ,
> > but it's for a different function in erl_port_task.c.
> > > > We've gotten a new SEGV with r15b03-1. This time we managed to > > capture a truncated core dump (just threads list and registers, > > no thread stacks or heap memory): > > > > Program terminated with signal 11, Segmentation fault. > > #0 enqueue_task (ptp=, > > ptqp=) > > at beam/erl_port_task.c:327 > > 327 ptp->prev = ptqp->last; > > (gdb) bt > > #0 enqueue_task (ptp=, > > ptqp=) > > at beam/erl_port_task.c:327 > > #1 erts_port_task_schedule (id=, > > id@REDACTED=, > > pthp=, > > type=, > > event=, > > event_data=) > > at beam/erl_port_task.c:615 > > (gdb) > > > > The code that faulted is > > > > 0x00000000004b8203 <+419>: mov 0x10(%r15),%rax > > 0x00000000004b8207 <+423>: mov 0x10(%rsp),%rbx > > 0x00000000004b820c <+428>: movq $0x0,0x8(%rbx) > > => 0x00000000004b8214 <+436>: mov 0x8(%rax),%rcx > > 0x00000000004b8218 <+440>: mov %rax,0x10(%rbx) > > 0x00000000004b821c <+444>: mov %rcx,(%rbx) > > > > which is enqueue_task() [line 327] as inlined in erts_port_task_schedule() > > [line 615]. At this point, %rax is zero according to gdb's registers dump. > > > > The relevant part of erts_port_task_schedule() is: > > > > ==snip== > > if (!pp->sched.taskq) > > pp->sched.taskq = port_taskq_init(port_taskq_alloc(), pp); > > > > ASSERT(ptp); > > > > ptp->type = type; > > ptp->event = event; > > ptp->event_data = event_data; > > > > set_handle(ptp, pthp); > > > > switch (type) { > > case ERTS_PORT_TASK_FREE: > > erl_exit(ERTS_ABORT_EXIT, > > "erts_port_task_schedule(): Cannot schedule free task\n"); > > break; > > case ERTS_PORT_TASK_INPUT: > > case ERTS_PORT_TASK_OUTPUT: > > case ERTS_PORT_TASK_EVENT: > > erts_smp_atomic_inc_relb(&erts_port_task_outstanding_io_tasks); > > /* Fall through... */ > > default: > > enqueue_task(pp->sched.taskq, ptp); > > break; > > } > > ==snip== > > > > The SEGV implies that pp->sched.taskq is NULL at the call to enqueue_task(). 
> > > > The erts_smp_atomic_inc_relb() and set_handle() calls do not affect *pp, > > and I don't see any aliasing between *ptp and *pp, so the assignments to > > *ptp do not affect *pp either. > > > > So for pp->sched.taskq to be NULL at the bottom it would have to be NULL > > after the call to port_taskq_init(), which implies that port_taskq_alloc() > > returned NULL. > > > > port_taskq_alloc() is generated via ERTS_SCHED_PREF_QUICK_ALLOC_IMPL; > > if one expands that it becomes: > > > > void erts_alloc_n_enomem(ErtsAlcType_t,Uint) > > __attribute__((noreturn)); > > > > static __inline__ > > void *erts_alloc(ErtsAlcType_t type, Uint size) > > { > > void *res; > > res = (*erts_allctrs[(((type) >> (0)) & (15))].alloc)( > > (((type) >> (7)) & (255)), > > erts_allctrs[(((type) >> (0)) & (15))].extra, > > size); > > if (!res) > > erts_alloc_n_enomem((((type) >> (7)) & (255)), size); > > return res; > > } > > > > static __inline__ ErtsPortTaskQueue * port_taskq_alloc(void) > > { > > ErtsPortTaskQueue *res = port_taskq_pre_alloc(); > > if (!res) > > res = erts_alloc((4564), sizeof(ErtsPortTaskQueue)); > > return res; > > } > > > > But given this code, I don't see how erts_alloc() or port_taskq_alloc() > > could ever return NULL. > > > > Which leads me to suspect that there's a concurrency bug that's > > causing *pp to be clobbered behind our backs. > > > > Ideas? > Thanks for the excellent bug-report! I've found a concurrency bug (as you suspected) that is likely to have caused the crash you got. The fix can be found in the rickard/port-emigrate-bug/OTP-12084 branch in my github repo . The fix is based on the OTP_R15B03-1 tag. I've only briefly tested the fix, but will test it more thoroughly. If further changes are needed I'll post here again. During the call to erts_check_emigration_need() from erts_port_task_schedule() we may unlock the lock on the current run-queue and then nothing prevents another thread from migrating the port. 
The bug is that we did not check if a migration had been made after the
call, but instead assumed that it either stayed on the current run-queue
or that we should emigrate it. This way the other thread could modify the
pp->sched.taskq field under another run-queue lock at the same time as
the thread calling erts_port_task_schedule() was also modifying it.

This functionality was rewritten in erts-5.10 (R16), and this bug was
automatically fixed by the rewrite, so R16 and later are not affected by
this bug.

The answers to your questions below apply to R15. Due to the rewrite of
this functionality some parts differ as of R16.

> I've studied erl_port_task.c and some related code, and think I
> understand how the locking is _supposed_ to work: Before you can
> access port->sched you have to erts_smp_runq_lock() the runq
> stored (as an atomic integer) in port->run_queue.
>
> runq = erts_port_runq(port);
>
> does this automatically, but you then end up with two variables
> that must stay coherent until you unlock.
>

Yes and this is what failed in erts_port_task_schedule().

> Q1. I see a number of places in erl_port_task.c where the code
> temporarily releases the runq lock, calls out somewhere, and then
> just locks the runq while still holding on to the original port.
> What, if anything, guarantees that these two items still belong
> together. Can't the port migrate to another runq while the runq
> was unlocked?

This depends on the current state of the port. A port can only migrate
from one run-queue to another in the following situations:

1. When a port (not port-task) is being scheduled, i.e. a port is
   enqueued into a run-queue. (emigrate path)
2. When a port is being rescheduled after execution of port-tasks. This
   is more or less the same as above. (emigrate path)
3. A port can be stolen by other run-queues while it is queued on a run-queue.
   (immigrate path and work-stealing)

That is, it either needs to be in a run-queue, or needs to be about to
be put in a run-queue in order for it to be migrated. While it is
executing tasks, it cannot be in a run-queue nor be enqueued into a
run-queue, and can therefore not be migrated.

The state changes between "waiting for tasks", "runnable" (in
run-queue), and "executing" are made atomic by locking the run-queue
lock of the run-queue that the port currently is assigned to
(pp->run_queue). When migrating a port, one needs to lock both involved
run-queue locks at the same time.

> If so, then it seems we have to reload runq (if the
> port is the object of interest) or revalidate and possibly bail out
> if they are now no longer connected.

Yes, unless we know that the port is in a state where it cannot be
migrated. The bug was however a missed revalidation.

The unlock/lock operations of the run-queue without reloading the
run-queue from pp->run_queue made in erts_port_task_execute() are
however safe. This is because the port is in an executing state and
therefore cannot be migrated. The scheduler thread that pops the port
from its run-queue for execution sets pp->sched.exe_taskq before
unlocking the run-queue lock. Other threads detect that the port is
executing since the pp->sched.exe_taskq field is set. Since it is not in
any run-queue and flagged as executing (by pp->sched.exe_taskq), it
cannot be migrated. That is, it is safe to unlock/lock the same
run-queue without reloading the assigned run-queue until
pp->sched.exe_taskq is cleared.

>
> Q2. Is erts_check_emigration_need(runq) _guaranteed_ to return a
> value that is different from the given runq?

Yes.

> Looking at the code,
> that's not obviously true. :-)

erts_check_emigration_need() reads the emigration path set up by the
check_balance() function (in erl_process.c). If there exists an
emigration path on a run-queue, it should always point to another
run-queue.
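
As an illustration of the reload-and-revalidate rule discussed in the
answers to Q1, here is a minimal, self-contained C sketch. It is not the
ERTS code: the RunQueue and Port types, the function names, and the use
of pthread mutexes and C11 atomics are all assumptions made for the
example. It only shows the general pattern of re-reading the assigned
run-queue after taking its lock, and of locking both run-queues when
migrating.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real ERTS structures. */
typedef struct RunQueue {
    pthread_mutex_t lock;
    int id;                          /* used only to get a fixed lock order */
} RunQueue;

typedef struct Port {
    _Atomic(RunQueue *) run_queue;   /* run-queue the port is assigned to */
    int ntasks;                      /* protected by the assigned run-queue's lock */
} Port;

/* Lock the run-queue the port is assigned to *right now*.
 * The assignment may change between the load and the lock, so it is
 * re-read after locking and the attempt is retried on a mismatch. */
RunQueue *port_runq_lock(Port *pp)
{
    assert(pp != NULL);
    for (;;) {
        RunQueue *rq = atomic_load(&pp->run_queue);
        pthread_mutex_lock(&rq->lock);
        if (rq == atomic_load(&pp->run_queue))
            return rq;               /* still assigned: lock and port agree */
        pthread_mutex_unlock(&rq->lock);
    }
}

/* Migrate the port while holding both run-queue locks, taken in a
 * fixed order (by id) so that two migrating threads cannot deadlock. */
void port_migrate(Port *pp, RunQueue *to)
{
    for (;;) {
        RunQueue *from = atomic_load(&pp->run_queue);
        if (from == to)
            return;
        RunQueue *first = (from->id < to->id) ? from : to;
        RunQueue *second = (first == from) ? to : from;
        pthread_mutex_lock(&first->lock);
        pthread_mutex_lock(&second->lock);
        int still_valid = (from == atomic_load(&pp->run_queue));
        if (still_valid)
            atomic_store(&pp->run_queue, to);
        pthread_mutex_unlock(&second->lock);
        pthread_mutex_unlock(&first->lock);
        if (still_valid)
            return;
    }
}

/* Add a task after temporarily dropping the lock. The old rq pointer
 * is not reused; the assignment is reloaded and revalidated instead. */
void port_add_task(Port *pp)
{
    RunQueue *rq = port_runq_lock(pp);
    pthread_mutex_unlock(&rq->lock); /* e.g. while calling out somewhere */
    rq = port_runq_lock(pp);         /* reload: the port may have moved */
    pp->ntasks++;
    pthread_mutex_unlock(&rq->lock);
}
```

In this sketch the retry loop plays the role of the revalidation that
was missing; a real scheduler would also have to track the port states
described above, which the example leaves out.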
I've taken a look at the code (end of check_balance() function) and I'm quite confident that it is ok. > If it ever returned the same runq it > was given, very bad things(TM) would happen... > Yes. > /Mikael Regards, Rickard Green, Erlang/OTP, Ericsson AB From mikpelinux@REDACTED Fri Aug 8 13:14:23 2014 From: mikpelinux@REDACTED (Mikael Pettersson) Date: Fri, 8 Aug 2014 13:14:23 +0200 Subject: [erlang-bugs] r15b03-1 SEGV in erts_port_task_schedule() In-Reply-To: References: <21462.24011.626017.696972@gargle.gargle.HOWL> <21463.45069.249946.809608@gargle.gargle.HOWL> Message-ID: <21476.45327.382601.465688@gargle.gargle.HOWL> Rickard Green writes: > On Tue, Jul 29, 2014 at 4:30 PM, Mikael Pettersson wrote: > > Mikael Pettersson writes: > > > This is a followup to my previous report in > > > , > > > but it's for a different function in erl_port_task.c. > > > > > > We've gotten a new SEGV with r15b03-1. This time we managed to > > > capture a truncated core dump (just threads list and registers, > > > no thread stacks or heap memory): > > > > > > Program terminated with signal 11, Segmentation fault. > > > #0 enqueue_task (ptp=, > > > ptqp=) > > > at beam/erl_port_task.c:327 > > > 327 ptp->prev = ptqp->last; > > > (gdb) bt > > > #0 enqueue_task (ptp=, > > > ptqp=) > > > at beam/erl_port_task.c:327 > > > #1 erts_port_task_schedule (id=, > > > id@REDACTED=, > > > pthp=, > > > type=, > > > event=, > > > event_data=) > > > at beam/erl_port_task.c:615 > > > (gdb) > > > > > > The code that faulted is > > > > > > 0x00000000004b8203 <+419>: mov 0x10(%r15),%rax > > > 0x00000000004b8207 <+423>: mov 0x10(%rsp),%rbx > > > 0x00000000004b820c <+428>: movq $0x0,0x8(%rbx) > > > => 0x00000000004b8214 <+436>: mov 0x8(%rax),%rcx > > > 0x00000000004b8218 <+440>: mov %rax,0x10(%rbx) > > > 0x00000000004b821c <+444>: mov %rcx,(%rbx) > > > > > > which is enqueue_task() [line 327] as inlined in erts_port_task_schedule() > > > [line 615]. 
At this point, %rax is zero according to gdb's registers dump. > > > > > > The relevant part of erts_port_task_schedule() is: > > > > > > ==snip== > > > if (!pp->sched.taskq) > > > pp->sched.taskq = port_taskq_init(port_taskq_alloc(), pp); > > > > > > ASSERT(ptp); > > > > > > ptp->type = type; > > > ptp->event = event; > > > ptp->event_data = event_data; > > > > > > set_handle(ptp, pthp); > > > > > > switch (type) { > > > case ERTS_PORT_TASK_FREE: > > > erl_exit(ERTS_ABORT_EXIT, > > > "erts_port_task_schedule(): Cannot schedule free task\n"); > > > break; > > > case ERTS_PORT_TASK_INPUT: > > > case ERTS_PORT_TASK_OUTPUT: > > > case ERTS_PORT_TASK_EVENT: > > > erts_smp_atomic_inc_relb(&erts_port_task_outstanding_io_tasks); > > > /* Fall through... */ > > > default: > > > enqueue_task(pp->sched.taskq, ptp); > > > break; > > > } > > > ==snip== > > > > > > The SEGV implies that pp->sched.taskq is NULL at the call to enqueue_task(). > > > > > > The erts_smp_atomic_inc_relb() and set_handle() calls do not affect *pp, > > > and I don't see any aliasing between *ptp and *pp, so the assignments to > > > *ptp do not affect *pp either. > > > > > > So for pp->sched.taskq to be NULL at the bottom it would have to be NULL > > > after the call to port_taskq_init(), which implies that port_taskq_alloc() > > > returned NULL. 
> > > > > > port_taskq_alloc() is generated via ERTS_SCHED_PREF_QUICK_ALLOC_IMPL; > > > if one expands that it becomes: > > > > > > void erts_alloc_n_enomem(ErtsAlcType_t,Uint) > > > __attribute__((noreturn)); > > > > > > static __inline__ > > > void *erts_alloc(ErtsAlcType_t type, Uint size) > > > { > > > void *res; > > > res = (*erts_allctrs[(((type) >> (0)) & (15))].alloc)( > > > (((type) >> (7)) & (255)), > > > erts_allctrs[(((type) >> (0)) & (15))].extra, > > > size); > > > if (!res) > > > erts_alloc_n_enomem((((type) >> (7)) & (255)), size); > > > return res; > > > } > > > > > > static __inline__ ErtsPortTaskQueue * port_taskq_alloc(void) > > > { > > > ErtsPortTaskQueue *res = port_taskq_pre_alloc(); > > > if (!res) > > > res = erts_alloc((4564), sizeof(ErtsPortTaskQueue)); > > > return res; > > > } > > > > > > But given this code, I don't see how erts_alloc() or port_taskq_alloc() > > > could ever return NULL. > > > > > > Which leads me to suspect that there's a concurrency bug that's > > > causing *pp to be clobbered behind our backs. > > > > > > Ideas? > > > > Thanks for the excellent bug-report! I've found a concurrency bug (as > you suspected) that is likely to have caused the crash you got. > > The fix can be found in the rickard/port-emigrate-bug/OTP-12084 branch > in my github repo > . > The fix is based on the OTP_R15B03-1 tag. I've only briefly tested the > fix, but will test it more thoroughly. If further changes are needed > I'll post here again. Thanks Rickard! The fix looks sane enough; is it safe (but possibly incomplete) to use right now, or do you want us to wait until you've done more testing? BTW, I have a debug patch in my own r15 branch which complains if it detects a mis-match when the runq lock is re-taken, and it triggered once this week when I ran mnesia's test suite. 
/Mikael From rickard@REDACTED Fri Aug 8 15:37:12 2014 From: rickard@REDACTED (Rickard Green) Date: Fri, 8 Aug 2014 15:37:12 +0200 Subject: [erlang-bugs] r15b03-1 SEGV in erts_port_task_schedule() In-Reply-To: <21476.45327.382601.465688@gargle.gargle.HOWL> References: <21462.24011.626017.696972@gargle.gargle.HOWL> <21463.45069.249946.809608@gargle.gargle.HOWL> <21476.45327.382601.465688@gargle.gargle.HOWL> Message-ID: <53E4D288.2060305@erlang.org> On 08/08/2014 01:14 PM, Mikael Pettersson wrote: > Rickard Green writes: > > On Tue, Jul 29, 2014 at 4:30 PM, Mikael Pettersson wrote: > > > Mikael Pettersson writes: > > > > This is a followup to my previous report in > > > > , > > > > but it's for a different function in erl_port_task.c. > > > > > > > > We've gotten a new SEGV with r15b03-1. This time we managed to > > > > capture a truncated core dump (just threads list and registers, > > > > no thread stacks or heap memory): > > > > > > > > Program terminated with signal 11, Segmentation fault. > > > > #0 enqueue_task (ptp=, > > > > ptqp=) > > > > at beam/erl_port_task.c:327 > > > > 327 ptp->prev = ptqp->last; > > > > (gdb) bt > > > > #0 enqueue_task (ptp=, > > > > ptqp=) > > > > at beam/erl_port_task.c:327 > > > > #1 erts_port_task_schedule (id=, > > > > id@REDACTED=, > > > > pthp=, > > > > type=, > > > > event=, > > > > event_data=) > > > > at beam/erl_port_task.c:615 > > > > (gdb) > > > > > > > > The code that faulted is > > > > > > > > 0x00000000004b8203 <+419>: mov 0x10(%r15),%rax > > > > 0x00000000004b8207 <+423>: mov 0x10(%rsp),%rbx > > > > 0x00000000004b820c <+428>: movq $0x0,0x8(%rbx) > > > > => 0x00000000004b8214 <+436>: mov 0x8(%rax),%rcx > > > > 0x00000000004b8218 <+440>: mov %rax,0x10(%rbx) > > > > 0x00000000004b821c <+444>: mov %rcx,(%rbx) > > > > > > > > which is enqueue_task() [line 327] as inlined in erts_port_task_schedule() > > > > [line 615]. At this point, %rax is zero according to gdb's registers dump. 
> > > > > > > > The relevant part of erts_port_task_schedule() is: > > > > > > > > ==snip== > > > > if (!pp->sched.taskq) > > > > pp->sched.taskq = port_taskq_init(port_taskq_alloc(), pp); > > > > > > > > ASSERT(ptp); > > > > > > > > ptp->type = type; > > > > ptp->event = event; > > > > ptp->event_data = event_data; > > > > > > > > set_handle(ptp, pthp); > > > > > > > > switch (type) { > > > > case ERTS_PORT_TASK_FREE: > > > > erl_exit(ERTS_ABORT_EXIT, > > > > "erts_port_task_schedule(): Cannot schedule free task\n"); > > > > break; > > > > case ERTS_PORT_TASK_INPUT: > > > > case ERTS_PORT_TASK_OUTPUT: > > > > case ERTS_PORT_TASK_EVENT: > > > > erts_smp_atomic_inc_relb(&erts_port_task_outstanding_io_tasks); > > > > /* Fall through... */ > > > > default: > > > > enqueue_task(pp->sched.taskq, ptp); > > > > break; > > > > } > > > > ==snip== > > > > > > > > The SEGV implies that pp->sched.taskq is NULL at the call to enqueue_task(). > > > > > > > > The erts_smp_atomic_inc_relb() and set_handle() calls do not affect *pp, > > > > and I don't see any aliasing between *ptp and *pp, so the assignments to > > > > *ptp do not affect *pp either. > > > > > > > > So for pp->sched.taskq to be NULL at the bottom it would have to be NULL > > > > after the call to port_taskq_init(), which implies that port_taskq_alloc() > > > > returned NULL. 
> > > > > > > > port_taskq_alloc() is generated via ERTS_SCHED_PREF_QUICK_ALLOC_IMPL; > > > > if one expands that it becomes: > > > > > > > > void erts_alloc_n_enomem(ErtsAlcType_t,Uint) > > > > __attribute__((noreturn)); > > > > > > > > static __inline__ > > > > void *erts_alloc(ErtsAlcType_t type, Uint size) > > > > { > > > > void *res; > > > > res = (*erts_allctrs[(((type) >> (0)) & (15))].alloc)( > > > > (((type) >> (7)) & (255)), > > > > erts_allctrs[(((type) >> (0)) & (15))].extra, > > > > size); > > > > if (!res) > > > > erts_alloc_n_enomem((((type) >> (7)) & (255)), size); > > > > return res; > > > > } > > > > > > > > static __inline__ ErtsPortTaskQueue * port_taskq_alloc(void) > > > > { > > > > ErtsPortTaskQueue *res = port_taskq_pre_alloc(); > > > > if (!res) > > > > res = erts_alloc((4564), sizeof(ErtsPortTaskQueue)); > > > > return res; > > > > } > > > > > > > > But given this code, I don't see how erts_alloc() or port_taskq_alloc() > > > > could ever return NULL. > > > > > > > > Which leads me to suspect that there's a concurrency bug that's > > > > causing *pp to be clobbered behind our backs. > > > > > > > > Ideas? > > > > > > > Thanks for the excellent bug-report! I've found a concurrency bug (as > > you suspected) that is likely to have caused the crash you got. > > > > The fix can be found in the rickard/port-emigrate-bug/OTP-12084 branch > > in my github repo > > . > > The fix is based on the OTP_R15B03-1 tag. I've only briefly tested the > > fix, but will test it more thoroughly. If further changes are needed > > I'll post here again. > > Thanks Rickard! The fix looks sane enough; is it safe (but possibly > incomplete) to use right now, or do you want us to wait until you've > done more testing? > It is safe to use. > BTW, I have a debug patch in my own r15 branch which complains if it > detects a mis-match when the runq lock is re-taken, and it triggered > once this week when I ran mnesia's test suite. > I'll do the same test. 
Please let me know if it should trigger for you with the port-emigrate-bug branch. Regards, Rickard > /Mikael > -- Rickard Green, Erlang/OTP, Ericsson AB. From rickard@REDACTED Fri Aug 8 16:08:49 2014 From: rickard@REDACTED (Rickard Green) Date: Fri, 8 Aug 2014 16:08:49 +0200 Subject: [erlang-bugs] erlang vm crash/coredump In-Reply-To: References: Message-ID: On Thu, Jul 24, 2014 at 5:03 AM, o???o <53817681@REDACTED> wrote: > Jul 23 19:11:31 imtestserver kernel: [27024.541714] TCP: TCP: Possible SYN > flooding on port 5222. Sending cookies. Check SNMP counters. > Jul 23 19:12:14 imtestserver kernel: [27067.574868] beam.smp[7347]: segfault > at 20 ip 00000000005342d4 sp 00007f343baf5d00 error 4 in > beam.smp[400000+267000] > Jul 23 19:14:22 imtestserver kernel: [27196.203390] TCP: TCP: Possible SYN > flooding on port 5222. Sending cookies. Check SNMP counters. > Jul 23 19:15:01 imtestserver CRON[7894]: (root) CMD (command -v debian-sa1 > > /dev/null && debian-sa1 1 1) > Jul 23 19:19:32 imtestserver kernel: [27505.432769] beam.smp[7643]: segfault > at 20 ip 00000000005342d4 sp 00007f2d98f76d00 error 4 in > beam.smp[400000+267000] > > ? > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs > On Mon, Jul 28, 2014 at 11:41 AM, o???o <53817681@REDACTED> wrote: > Dear, > > a bug? in erlang vm sourcecode , port_get_data_1 process arg is NULL > , I had fixed it,thanks. > > I guess that the above mail about port_get_data_1 sent to the "Weird behaviour of gen_tcp:send/2 and ..." thread belongs in this thread. Correct? A NULL argument to port_get_data_1 seems strange. Can you give more information on this? 
Regards, Rickard -- Rickard Green, Erlang/OTP, Ericsson AB From tuncer.ayaz@REDACTED Thu Aug 14 19:59:38 2014 From: tuncer.ayaz@REDACTED (Tuncer Ayaz) Date: Thu, 14 Aug 2014 19:59:38 +0200 Subject: [erlang-bugs] enif_make_int64 => -0 for INT64_MIN with gcc 4.9.1 In-Reply-To: <5263DD26-BFF3-429C-957A-24BE27669394@gmail.com> References: <53DF5BC7.7010002@erlang.org> <5263DD26-BFF3-429C-957A-24BE27669394@gmail.com> Message-ID: On Mon, Aug 4, 2014 at 12:28 PM, Anthony Ramine wrote: > There is also https://github.com/erlang/otp/pull/388 Thanks, Anthony. #388 and another fix from Lukas got merged today: https://github.com/erlang/otp/commit/a8cbf02 https://github.com/erlang/otp/commit/acf19fc https://github.com/erlang/otp/commit/9d3c229 Tomas, can you confirm that today's maint or master works for you? From tuncer.ayaz@REDACTED Thu Aug 14 20:06:05 2014 From: tuncer.ayaz@REDACTED (Tuncer Ayaz) Date: Thu, 14 Aug 2014 20:06:05 +0200 Subject: [erlang-bugs] enif_make_int64 => -0 for INT64_MIN with gcc 4.9.1 In-Reply-To: References: <53DF5BC7.7010002@erlang.org> <5263DD26-BFF3-429C-957A-24BE27669394@gmail.com> Message-ID: On Thu, Aug 14, 2014 at 7:59 PM, Tuncer Ayaz wrote: > On Mon, Aug 4, 2014 at 12:28 PM, Anthony Ramine wrote: >> There is also https://github.com/erlang/otp/pull/388 > > Thanks, Anthony. > > #388 and another fix from Lukas got merged today: > https://github.com/erlang/otp/commit/a8cbf02 > https://github.com/erlang/otp/commit/acf19fc > https://github.com/erlang/otp/commit/9d3c229 > > Tomas, can you confirm that today's maint or master works for you? 
Just started building maint and I've been seeing this: beam/bif.c:2824:5: runtime error: signed integer overflow: 429496729 * 10 cannot be represented in type 'int' From tomas.abrahamsson@REDACTED Thu Aug 14 23:28:15 2014 From: tomas.abrahamsson@REDACTED (Tomas Abrahamsson) Date: Thu, 14 Aug 2014 23:28:15 +0200 Subject: [erlang-bugs] enif_make_int64 => -0 for INT64_MIN with gcc 4.9.1 In-Reply-To: References: <53DF5BC7.7010002@erlang.org> <5263DD26-BFF3-429C-957A-24BE27669394@gmail.com> Message-ID: > Tomas, can you confirm that today's maint or master works for you? Confirming. The trouble I had initially seems to have been fixed. I've checked maint, 9de7cc7. BRs From lukas@REDACTED Fri Aug 15 10:28:02 2014 From: lukas@REDACTED (Lukas Larsson) Date: Fri, 15 Aug 2014 10:28:02 +0200 Subject: [erlang-bugs] enif_make_int64 => -0 for INT64_MIN with gcc 4.9.1 In-Reply-To: References: <53DF5BC7.7010002@erlang.org> <5263DD26-BFF3-429C-957A-24BE27669394@gmail.com> Message-ID: <53EDC492.6050809@erlang.org> On 14/08/14 20:06, Tuncer Ayaz wrote: > Just started building maint and I've been seeing this: > beam/bif.c:2824:5: runtime error: signed integer overflow: > 429496729 * 10 cannot be represented in type 'int' > Yeah I saw that one, but since we use n to determine if we are to use i, i will never be used while overflowed. 
Lukas

From mikpelinux@REDACTED Fri Aug 15 10:42:26 2014
From: mikpelinux@REDACTED (Mikael Pettersson)
Date: Fri, 15 Aug 2014 10:42:26 +0200
Subject: [erlang-bugs] enif_make_int64 => -0 for INT64_MIN with gcc 4.9.1
In-Reply-To: <53EDC492.6050809@erlang.org>
References: <53DF5BC7.7010002@erlang.org> <5263DD26-BFF3-429C-957A-24BE27669394@gmail.com> <53EDC492.6050809@erlang.org>
Message-ID: <21485.51186.785472.310413@gargle.gargle.HOWL>

Lukas Larsson writes:
 > On 14/08/14 20:06, Tuncer Ayaz wrote:
 > > Just started building maint and I've been seeing this:
 > > beam/bif.c:2824:5: runtime error: signed integer overflow:
 > > 429496729 * 10 cannot be represented in type 'int'
 > >
 > Yeah I saw that one, but since we use n to determine if we are to use i,
 > i will never be used while overflowed.

That's unfortunately the wrong way to think about signed overflow in C.
It is _never_ about whether the computed value is used or not; the mere
fact that overflow occurred causes undefined behaviour (UB). Furthermore,
the worst consequence of UB is not that the HW does something wrong, it's
that the compiler(s) can validly assume that UB doesn't occur, and transform
the code accordingly. And GCC _will_ do that unless you tell it not to.
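
To illustrate the point about signed overflow, here is a small
standalone C sketch. It is not the OTP code: the function name
parse_small_decimal and the MAX_SAFE_DIGITS limit are made up for the
example. It accumulates the digits in an unsigned type, where
wrap-around is well defined, and only converts to a signed value once
the digit count shows that the result fits.

```c
#include <assert.h>
#include <stddef.h>

/* 10^18 - 1 is below 2^63, so any number with at most 18 decimal
 * digits fits in a long long. (Hypothetical limit for this sketch.) */
#define MAX_SAFE_DIGITS 18

/* Parse an optionally signed decimal string.
 * Returns 0 and stores the value in *out on success,
 *         1 if there are too many digits for the fast path,
 *        -1 if the string is not a valid decimal number. */
int parse_small_decimal(const char *s, long long *out)
{
    unsigned long long acc = 0;   /* unsigned: overflow wraps, no UB */
    size_t ndigits = 0;
    int neg = 0;

    assert(s != NULL && out != NULL);

    if (*s == '-') {
        neg = 1;
        s++;
    } else if (*s == '+') {
        s++;
    }

    for (; *s >= '0' && *s <= '9'; s++) {
        /* May wrap modulo 2^64 for long inputs; that is defined
         * behaviour, and the wrapped value is discarded below. */
        acc = acc * 10u + (unsigned long long)(*s - '0');
        ndigits++;
    }

    if (ndigits == 0 || *s != '\0')
        return -1;
    if (ndigits > MAX_SAFE_DIGITS)
        return 1;                 /* caller would take a bignum path */

    /* acc < 10^18 here, so both conversions below are in range. */
    *out = neg ? -(long long)acc : (long long)acc;
    return 0;
}
```

The digit count decides whether the accumulated value is trusted, much
like n does in the code under discussion, but the accumulation itself
never performs a signed operation that could overflow.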
For this particular spot, the patch I'm using is:

--- otp_src_17.1/erts/emulator/beam/bif.c.~1~ 2014-06-23 21:10:57.000000000 +0200
+++ otp_src_17.1/erts/emulator/beam/bif.c 2014-07-24 16:29:42.395034763 +0200
@@ -2767,6 +2767,7 @@ BIF_RETTYPE integer_to_list_1(BIF_ALIST_
 static int do_list_to_integer(Process *p, Eterm orig_list,
                               Eterm *integer, Eterm *rest)
 {
+    Uint ufixval = 0; /* preliminary value, if it fits in a fixnum */
     Sint i = 0;
     int skip = 0;
     int neg = 0;
@@ -2821,8 +2822,8 @@ static int do_list_to_integer(Process *p
             unsigned_val(CAR(list_val(lst))) > '9') {
             break;
         }
-        i = i * 10;
-        i = i + unsigned_val(CAR(list_val(lst))) - '0';
+        ufixval *= 10;
+        ufixval += unsigned_val(CAR(list_val(lst))) - '0';
         n++;
         lst = CDR(list_val(lst));
         if (is_nil(lst)) {
@@ -2846,8 +2847,8 @@ static int do_list_to_integer(Process *p
      */
     if (n <= SMALL_DIGITS) { /* It must be small */
-        if (neg) i = -i;
-        res = make_small(i);
+        if (neg) ufixval = -ufixval;
+        res = make_small((Sint)ufixval);
     } else {
         lg2 = (n+1)*230/69+1;
         m = (lg2+D_EXP-1)/D_EXP; /* number of digits */

/Mikael

From lukas@REDACTED Fri Aug 15 11:46:52 2014
From: lukas@REDACTED (Lukas Larsson)
Date: Fri, 15 Aug 2014 11:46:52 +0200
Subject: [erlang-bugs] enif_make_int64 => -0 for INT64_MIN with gcc 4.9.1
In-Reply-To: <21485.51186.785472.310413@gargle.gargle.HOWL>
References: <53DF5BC7.7010002@erlang.org> <5263DD26-BFF3-429C-957A-24BE27669394@gmail.com> <53EDC492.6050809@erlang.org> <21485.51186.785472.310413@gargle.gargle.HOWL>
Message-ID: <53EDD70C.4010406@erlang.org>

On 15/08/14 10:42, Mikael Pettersson wrote:
> Lukas Larsson writes:
> > On 14/08/14 20:06, Tuncer Ayaz wrote:
> > > Just started building maint and I've been seeing this:
> > > beam/bif.c:2824:5: runtime error: signed integer overflow:
> > > 429496729 * 10 cannot be represented in type 'int'
> > >
> > Yeah I saw that one, but since we use n to determine if we are to use i,
> > i will never be used while overflowed.
> > That's unfortunately the wrong way to think about signed overflow in C. > It is _never_ about whether the computed value is used or not; the mere > fact that overflow occurred causes undefined behaviour (UB). Furthermore, > the worst consequence of UB is not that the HW does something wrong, it's > that the compiler(s) can validly assume that UB doesn't occur, and transform > the code accordingly. And GCC _will_ do that unless you tell it not to. Thanks for pointing this out! It would appear that when I hit UB, deamons might fly out of my nose. I'll fix the other UBs I've found as well. Lukas From jesper.louis.andersen@REDACTED Mon Aug 18 14:36:01 2014 From: jesper.louis.andersen@REDACTED (Jesper Louis Andersen) Date: Mon, 18 Aug 2014 14:36:01 +0200 Subject: [erlang-bugs] Problem with binary_to_integer/1 and list_to_integer/1 (QuickCheck test case) Message-ID: Hi, While working on transit-erlang, Isaiah Peng and I found what we believe to be a bug in the Erlang compiler: 9> A = -576460752303423488. -576460752303423488 10> B = binary_to_integer(integer_to_binary(A)). -576460752303423488 11> A == B. false 12> A. -576460752303423488 13> B. -576460752303423488 14> I expected command 11 (A == B) to return true, as the numbers are the same. But it looks like constants are not treated the same way as converted vaues for some reason and the equality test fails. This fails in the interpreter and in compiled code. It *also* fails with list_to_integer/1 and integer_to_list/1. The number is not chosen arbitrarily. It is -1 * 2^59 which is a borderline number on a 64bit machine. (OTP release 17.1). Isaiah notes that these borderline numbers are not caught by the OTP test cases. They probably should be. In the interest of full exploration, I've written a QuickCheck test case to catch the remaining trouble. It explicitly tests the borderline numbers and only finds this error. https://gist.github.com/jlouis/52b68d9d4150af3bd00c -module(integer_coding). -compile(export_all). 
-include_lib("eqc/include/eqc.hrl"). power(_N, 0) -> 1; power(N, P) -> N * power(N, P-1). perturb() -> elements([0, 1, -1, 2, -2, 3, -3, 4, -4, 5, -5]). sign() -> elements([1, -1]). nat_power() -> frequency([{1, elements([27, 28, 29, 31, 32, 33, 59, 60, 61, 63, 64, 65])}, {1, nat()}]). interesting_int() -> ?LET({K, Sign, Perturb}, {nat_power(), sign(), perturb()}, power(2, K)*Sign + Perturb). prop_binary_iso() -> ?FORALL(K, interesting_int(), begin I = binary_to_integer(integer_to_binary(K)), I == K end). prop_list_iso() -> ?FORALL(K, interesting_int(), begin I = list_to_integer(integer_to_list(K)), I == K end). all() -> eqc:module({numtests, 3000}, ?MODULE). t() -> eqc:quickcheck(eqc:testing_time(300, prop_binary_iso())). [...] Produces: 8> integer_coding:all(). prop_list_iso: ...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
..............................................................................................................................................................................................................................................................................................Failed! After 1498 tests. -576460752303423488 prop_binary_iso: ...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Failed! After 588 tests. -576460752303423488 [prop_list_iso,prop_binary_iso] 9> -- J. -------------- next part -------------- An HTML attachment was scrubbed... URL: From lukas@REDACTED Mon Aug 18 14:44:08 2014 From: lukas@REDACTED (Lukas Larsson) Date: Mon, 18 Aug 2014 14:44:08 +0200 Subject: [erlang-bugs] Problem with binary_to_integer/1 and list_to_integer/1 (QuickCheck test case) In-Reply-To: References: Message-ID: <53F1F518.3060209@erlang.org> Hello Jesper, There seems to be a bug in the generic c-code for binary/list_to_integer for that specific value where a bignum is returned when a small should have been returned. If no-one feels like submitting a patch to fix it, I'll take a look later this week. Thanks for the bug report! Lukas On 18/08/14 14:36, Jesper Louis Andersen wrote: > Hi, > > While working on transit-erlang, Isaiah Peng and I found what we > believe to be a bug in the Erlang compiler: > > 9> A = -576460752303423488. > -576460752303423488 > 10> B = binary_to_integer(integer_to_binary(A)). > -576460752303423488 > 11> A == B. 
> false > 12> A. > -576460752303423488 > 13> B. > -576460752303423488 > 14> > > I expected command 11 (A == B) to return true, as the numbers are the > same. But it looks like constants are not treated the same way as > converted vaues for some reason and the equality test fails. > > This fails in the interpreter and in compiled code. It *also* fails > with list_to_integer/1 and integer_to_list/1. The number is not > chosen arbitrarily. It is -1 * 2^59 which is a borderline number on a > 64bit machine. (OTP release 17.1). Isaiah notes that these borderline > numbers are not caught by the OTP test cases. They probably should be. > > In the interest of full exploration, I've written a QuickCheck test > case to catch the remaining trouble. It explicitly tests the > borderline numbers and only finds this error. > > https://gist.github.com/jlouis/52b68d9d4150af3bd00c
From mikpelinux@REDACTED Mon Aug 18 14:58:08 2014 From: mikpelinux@REDACTED (Mikael Pettersson) Date: Mon, 18 Aug 2014 14:58:08 +0200 Subject: [erlang-bugs] Problem with binary_to_integer/1 and list_to_integer/1 (QuickCheck test case) In-Reply-To: <53F1F518.3060209@erlang.org> References: <53F1F518.3060209@erlang.org> Message-ID: <21489.63584.121471.876869@gargle.gargle.HOWL> Lukas Larsson writes: > Hello Jesper, > > There seems to be a bug in the generic c-code for binary/list_to_integer > for that specific value where a bignum is returned when a small should > have been returned. If no-one feels like submitting a patch to fix it, > I'll take a look later this week. > > Thanks for the bug report! Using hipe_bifs:show_term/1 it's easy to determine why ==/2 fails: 1> A = -576460752303423488. -576460752303423488 2> B = binary_to_integer(integer_to_binary(A)). -576460752303423488 3> A == B. false 4> hipe_bifs:show_term(A). 0x800000000000000f -576460752303423488 true 5> hipe_bifs:show_term(B).
0x00007f4f8d4ae5fa 0x00007f4f8d4ae5f8: 0x000000000000004c 0x00007f4f8d4ae600: 0x0800000000000000 -576460752303423488 true That is, A is a fixnum but B became a bignum. That's not allowed in the VM: every integer that fits in a fixnum MUST be a fixnum. The bug also exists in R16B03-1. R15 doesn't seem to have integer_to_binary/1. > > Lukas > On 18/08/14 14:36, Jesper Louis Andersen wrote: > > Hi, > > > > While working on transit-erlang, Isaiah Peng and I found what we > > believe to be a bug in the Erlang compiler: > > > > 9> A = -576460752303423488. > > -576460752303423488 > > 10> B = binary_to_integer(integer_to_binary(A)). > > -576460752303423488 > > 11> A == B. > > false > > 12> A. > > -576460752303423488 > > 13> B. > > -576460752303423488 > > 14> > > > > I expected command 11 (A == B) to return true, as the numbers are the > > same. But it looks like constants are not treated the same way as > > converted vaues for some reason and the equality test fails. > > > > This fails in the interpreter and in compiled code. It *also* fails > > with list_to_integer/1 and integer_to_list/1. The number is not > > chosen arbitrarily. It is -1 * 2^59 which is a borderline number on a > > 64bit machine. (OTP release 17.1). Isaiah notes that these borderline > > numbers are not caught by the OTP test cases. They probably should be. > > > > In the interest of full exploration, I've written a QuickCheck test > > case to catch the remaining trouble. It explicitly tests the > > borderline numbers and only finds this error. > > > > https://gist.github.com/jlouis/52b68d9d4150af3bd00c
From mikpelinux@REDACTED Tue Aug 19 11:30:40 2014 From: mikpelinux@REDACTED (Mikael Pettersson) Date: Tue, 19 Aug 2014 11:30:40 +0200 Subject: [erlang-bugs] Problem with binary_to_integer/1 and list_to_integer/1 (QuickCheck test case) In-Reply-To: <53F1F518.3060209@erlang.org> References: <53F1F518.3060209@erlang.org> Message-ID: <21491.6464.400195.722016@gargle.gargle.HOWL> Lukas Larsson writes: > Hello Jesper, > > There seems to be a bug in the generic c-code for binary/list_to_integer > for that specific value where a bignum is returned when a small should > have been returned.
If no-one feels like submitting a patch to fix it, > I'll take a look later this week. The bug in binary_to_integer is because of a (classic) algorithm mistake in big.c:erts_chars_to_integer. That code converts in unsigned, keeping the representation as fixnum as long as possible, and only at the very last step negates if the chars started with '-'. However, for the largest permitted negative fixnum you'll have, in the last step, a positive number that's just beyond the range for fixnums, so it is represented as a bignum. The negation step just flips the sign bit, without checking if the negated value now should be a fixnum. I haven't looked at the list_to_integer case yet, but suspect it's similar. /Mikael > > Thanks for the bug report! > > Lukas > On 18/08/14 14:36, Jesper Louis Andersen wrote: > > Hi, > > > > While working on transit-erlang, Isaiah Peng and I found what we > > believe to be a bug in the Erlang compiler: > > > > 9> A = -576460752303423488. > > -576460752303423488 > > 10> B = binary_to_integer(integer_to_binary(A)). > > -576460752303423488 > > 11> A == B. > > false > > 12> A. > > -576460752303423488 > > 13> B. > > -576460752303423488 > > 14> > > > > I expected command 11 (A == B) to return true, as the numbers are the > > same. But it looks like constants are not treated the same way as > > converted vaues for some reason and the equality test fails. > > > > This fails in the interpreter and in compiled code. It *also* fails > > with list_to_integer/1 and integer_to_list/1. The number is not > > chosen arbitrarily. It is -1 * 2^59 which is a borderline number on a > > 64bit machine. (OTP release 17.1). Isaiah notes that these borderline > > numbers are not caught by the OTP test cases. They probably should be. > > > > In the interest of full exploration, I've written a QuickCheck test > > case to catch the remaining trouble. It explicitly tests the > > borderline numbers and only finds this error. 
> > > > https://gist.github.com/jlouis/52b68d9d4150af3bd00c
From mikpelinux@REDACTED Tue Aug 19 12:37:11 2014 From: mikpelinux@REDACTED (Mikael Pettersson) Date: Tue, 19 Aug 2014 12:37:11 +0200 Subject: [erlang-bugs] Problem with binary_to_integer/1 and list_to_integer/1 (QuickCheck test case) In-Reply-To: <53F1F518.3060209@erlang.org> References: <53F1F518.3060209@erlang.org> Message-ID: <21491.10455.469391.667147@gargle.gargle.HOWL> Lukas Larsson writes: > Hello Jesper, > > There seems to be a bug in the generic c-code for binary/list_to_integer > for that specific value where a bignum is returned when a small should > have been returned. If no-one feels like submitting a patch to fix it, > I'll take a look later this week. I have a patch for 17.1 which I intend to submit later today or tomorrow. It fixes both binary_to_integer and list_to_integer. /Mikael > > Thanks for the bug report!
> > Lukas > On 18/08/14 14:36, Jesper Louis Andersen wrote: > > Hi, > > > > While working on transit-erlang, Isaiah Peng and I found what we > > believe to be a bug in the Erlang compiler: > > > > 9> A = -576460752303423488. > > -576460752303423488 > > 10> B = binary_to_integer(integer_to_binary(A)). > > -576460752303423488 > > 11> A == B. > > false > > 12> A. > > -576460752303423488 > > 13> B. > > -576460752303423488 > > 14> > > > > I expected command 11 (A == B) to return true, as the numbers are the > > same. But it looks like constants are not treated the same way as > > converted vaues for some reason and the equality test fails. > > > > This fails in the interpreter and in compiled code. It *also* fails > > with list_to_integer/1 and integer_to_list/1. The number is not > > chosen arbitrarily. It is -1 * 2^59 which is a borderline number on a > > 64bit machine. (OTP release 17.1). Isaiah notes that these borderline > > numbers are not caught by the OTP test cases. They probably should be. > > > > In the interest of full exploration, I've written a QuickCheck test > > case to catch the remaining trouble. It explicitly tests the > > borderline numbers and only finds this error. > > > > https://gist.github.com/jlouis/52b68d9d4150af3bd00c
From carlsson.richard@REDACTED Fri Aug 22 14:22:35 2014 From: carlsson.richard@REDACTED (Richard Carlsson) Date: Fri, 22 Aug 2014 14:22:35 +0200 Subject: [erlang-bugs] error in documentation of erl +swct flag Message-ID: The erl documentation (http://www.erlang.org/doc/man/erl.html) currently says: *+sws very_eager|eager|medium|lazy|very_lazy* Set scheduler wake cleanup threshold. Default is medium. This flag controls how eager schedulers should be requesting wake up due to certain cleanup operations. When a lazy setting is used, more outstanding cleanup operations can be left undone while a scheduler is idling. When an eager setting is used, schedulers will more frequently be woken, potentially increasing CPU-utilization. *NOTE:* This flag may be removed or changed at any time without prior notice. *+sws default|legacy* Set scheduler wakeup strategy.
Default strategy changed in erts-5.10/OTP-R16A. This strategy was previously known as proposal in OTP-R15. The legacy strategy was used as default from R13 up to and including R15. *NOTE:* This flag may be removed or changed at any time without prior notice. I think that the first of these should be +swct (which was first mentioned in the R16B01 release notes). /Richard
From rickard@REDACTED Fri Aug 22 16:12:21 2014 From: rickard@REDACTED (Rickard Green) Date: Fri, 22 Aug 2014 16:12:21 +0200 Subject: [erlang-bugs] error in documentation of erl +swct flag In-Reply-To: References: Message-ID: On Fri, Aug 22, 2014 at 2:22 PM, Richard Carlsson wrote: > The erl documentation (http://www.erlang.org/doc/man/erl.html) currently > says: > +sws very_eager|eager|medium|lazy|very_lazy > > Set scheduler wake cleanup threshold. Default is medium. This flag controls > how eager schedulers should be requesting wake up due to certain cleanup > operations. When a lazy setting is used, more outstanding cleanup operations > can be left undone while a scheduler is idling. When an eager setting is > used, schedulers will more frequently be woken, potentially increasing > CPU-utilization. > > NOTE: This flag may be removed or changed at any time without prior notice. > > +sws default|legacy > > Set scheduler wakeup strategy. Default strategy changed in > erts-5.10/OTP-R16A. This strategy was previously known as proposal in > OTP-R15. The legacy strategy was used as default from R13 up to and > including R15. > > NOTE: This flag may be removed or changed at any time without prior notice. > > > I think that the first of these should be +swct (which was first mentioned > in the R16B01 release notes). > > /Richard
Thanks for the bug report.
I just happened to notice this last week too, so there is already a fix for it in the maint and master branches. Regards, Rickard -- Rickard Green, Erlang/OTP, Ericsson AB
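
For reference, the two flags discussed in this thread would be spelled on the command line roughly as sketched below. This is only an illustration, assuming the corrected name +swct for the wake cleanup threshold as proposed in the report. The variable names SWCT, SWS and ERL_ARGS are hypothetical, and the actual erl invocation is left commented out so that the snippet runs without an Erlang installation.

```shell
# Wake cleanup threshold: very_eager | eager | medium | lazy | very_lazy
SWCT=lazy
# Wakeup strategy: default | legacy
SWS=default

# Assemble the emulator flags (variable names are illustrative only).
ERL_ARGS="+swct $SWCT +sws $SWS"
echo "erl $ERL_ARGS"

# On a machine with Erlang/OTP installed, the emulator could then be started with:
# erl $ERL_ARGS
```

Both flags are documented as subject to removal or change without prior notice, so it is probably wise to check the erl manual page of the release in use before relying on them.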