From mikpelinux@REDACTED Mon Dec 1 10:42:50 2014
From: mikpelinux@REDACTED (Mikael Pettersson)
Date: Mon, 1 Dec 2014 10:42:50 +0100
Subject: [erlang-bugs] Minor issue: dialyzer on ARM: Compiling some key modules to native code => Illegal instruction (core dumped)
In-Reply-To:
References:
Message-ID: <21628.14362.31412.369346@gargle.gargle.HOWL>

Mattias Waldau writes:
 > Hi,
 >
 > I found the --no_native flag, so dialyzer is working nicely. I just
 > wanted to report this.
 >
 > I made build from src. Downloaded zip from github today (2014-11-27). I
 > have run the test suite, no complaints.

Following up to the mailing list.

Background: ARM processors typically support two related but different
instruction sets and execution modes: ARM (the ordinary one) and Thumb
(an alternative one which offers higher code density at the expense of
lower performance). With ARMv7, the Thumb mode has been improved and
some environments make it the default.

The compiler and linker allow ARM and Thumb code to coexist in a process
by tagging code with its mode, detecting calls between modes, and using
special instructions which allow the processor to switch mode at
procedure calls and returns.

HiPE on ARM generates ARM code, and the runtime support is also ARM.

Debugging on Mattias' system showed that

1. His C compiler defaults to generating Thumb code, not ARM.
   Therefore, his BEAM runs in Thumb mode, except when in HiPE code.

2. The crash occurs because the thread is executing HiPE code (an
   assembly-coded BIF wrapper), which is ARM, but the thread state
   specifies that it is in Thumb mode. The processor sees instruction
   encodings it doesn't recognize and faults.

I'm assuming there's an incorrect mode-switch between HiPE (ARM) and
BEAM (Thumb) somewhere, but at the moment I can't say where.

It's possible to work around this problem by forcing the VM to be
compiled in ARM mode (by adding "-marm" to CFLAGS). I prepared a
patch to do that, and it fixed the problem on Mattias' system.

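Whether a given toolchain defaults to Thumb can be checked from C itself.
The following is a minimal sketch, not the patch mentioned above: it
relies on the __arm__ and __thumb__ macros that GCC and Clang predefine
on 32-bit ARM targets, and the function names compiled_instruction_set
and vm_mode_matches_hipe are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Reports the instruction set this translation unit was compiled for.
 * GCC and Clang predefine __arm__ on 32-bit ARM targets and, in
 * addition, __thumb__ when code is being generated in Thumb mode. */
const char *compiled_instruction_set(void)
{
#if defined(__thumb__)
    return "thumb";
#elif defined(__arm__)
    return "arm";
#else
    return "not-arm";
#endif
}

/* Hypothetical helper (not part of OTP): returns 1 unless the compiler
 * defaults would put the VM in Thumb mode, which is the case that
 * conflicts with HiPE's ARM-mode code as described above. */
int vm_mode_matches_hipe(void)
{
    const char *isa = compiled_instruction_set();
    assert(isa != NULL); /* every branch above returns a string literal */
    return strcmp(isa, "thumb") != 0;
}
```

Built with the compiler's defaults on an affected system,
compiled_instruction_set() would be expected to return "thumb"; adding
-marm to CFLAGS, as in the workaround above, should change it to "arm".
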
Since changing the compilation mode may or may not be what the
user or system builder intended, I'm reluctant to do this silently.
Therefore I'm considering removing the auto-enable of HiPE on ARM.
Users will have to explicitly enable HiPE, and accept that BEAM will
be in ARM mode not Thumb.

It might be possible to detect if the C compiler defaults to Thumb
and error out if HiPE is force-enabled, and disable HiPE otherwise.
But I'm not entirely happy with this approach.

Comments?

/Mikael

From kostis@REDACTED Mon Dec 1 14:58:58 2014
From: kostis@REDACTED (Kostis Sagonas)
Date: Mon, 01 Dec 2014 14:58:58 +0100
Subject: [erlang-bugs] Minor issue: dialyzer on ARM: Compiling some key modules to native code => Illegal instruction (core dumped)
In-Reply-To: <21628.14362.31412.369346@gargle.gargle.HOWL>
References: <21628.14362.31412.369346@gargle.gargle.HOWL>
Message-ID: <547C7422.7000800@cs.ntua.gr>

On 12/01/2014 10:42 AM, Mikael Pettersson wrote:
> Background: ARM processors typically support two related but different
> instruction sets and execution modes: ARM (the ordinary one) and Thumb
> (an alternative one which offers higher code density at the expense of
> lower performance). With ARMv7, the Thumb mode has been improved and
> some environments make it the default.
> ...
> It's possible to work around this problem by forcing the VM to be
> compiled in ARM mode (by adding "-marm" to CFLAGS). I prepared a
> patch to do that, and it fixed the problem on Mattias' system.
>
> Since changing the compilation mode may or may not be what the
> user or system builder intended, I'm reluctant to do this silently.
> Therefore I'm considering removing the auto-enable of HiPE on ARM.
> Users will have to explicitly enable HiPE, and accept that BEAM will
> be in ARM mode not Thumb.

I do not have a strong opinion but, from what you are describing and
from some googling on ARM vs thumb on the internet, it seems to me that
adding -marm to the CFLAGS will result in BEAM itself executing faster
(or at least not slower than with thumb), and the main advantage of the
Thumb mode is in better memory (i-cache?) utilization. I am not sure why
some (recent?) C compilers make Thumb the default, but I guess it's
because the ARM processors are often used in embedded applications and
they primarily want to optimize for code size rather than speed. Not
sure this is the primary goal of Erlang developers who download and
install Erlang/OTP and e.g. want to run dialyzer (as Mattias) on that
machine.

So my current vote would go to add the -marm option anyway when building
BEAM, and leave HiPE enabled by default on that platform too. Erlang
developers who are primarily interested in the memory benefits of the
Thumb mode can explicitly enable it with an appropriate flag which would
also disable HiPE then, at least until the following is fixed.

It would be nice to investigate where the incorrect mode switch between
HiPE and BEAM is and eliminate this if it's not too much work.

Kostis

From ryan.havvy@REDACTED Mon Dec 1 23:19:01 2014
From: ryan.havvy@REDACTED (Ryan Scheel)
Date: Mon, 1 Dec 2014 14:19:01 -0800
Subject: [erlang-bugs] BREAK 'A' is not documented in BREAK prompt.
Message-ID:

```
[havvy@REDACTED:~/wiki/project]$ erl
Erlang/OTP 17 [erts-6.2] [source] [64-bit] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V6.2 (abort with ^G)
1>
BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
       (v)ersion (k)ill (D)b-tables (d)istribution
A
Crash dump was written to: erl_crash.dump
Crash dump requested by userAborted
```

The crash is intended as per the C file, but I could not find any
documentation on this behavior outside of code, and it should definitely
be listed in the BREAK prompt.

It's confusing that a typo (I hit capslock before hitting 'a' by
mistake) would cause a crash, and if it wasn't for asking in IRC, I'd
still think this was an implementation bug instead of the documentation
bug that it is.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rvirding@REDACTED Thu Dec 4 23:13:27 2014
From: rvirding@REDACTED (Robert Virding)
Date: Thu, 4 Dec 2014 23:13:27 +0100
Subject: [erlang-bugs] Minor issue: dialyzer on ARM: Compiling some key modules to native code => Illegal instruction (core dumped)
In-Reply-To: <547C7422.7000800@cs.ntua.gr>
References: <21628.14362.31412.369346@gargle.gargle.HOWL> <547C7422.7000800@cs.ntua.gr>
Message-ID:

Wouldn't a generally better solution be to make the no_native the
default and instead have to explicitly turn it on with a --native flag?
I have found that unless I am checking very many files it usually goes
faster to turn it off, at least when looking at the total time.

Robert

On 1 December 2014 at 14:58, Kostis Sagonas wrote:

> On 12/01/2014 10:42 AM, Mikael Pettersson wrote:
>
>> Background: ARM processors typically support two related but different
>> instruction sets and execution modes: ARM (the ordinary one) and Thumb
>> (an alternative one which offers higher code density at the expense of
>> lower performance). With ARMv7, the Thumb mode has been improved and
>> some environments make it the default.
>> ...
>> It's possible to work around this problem by forcing the VM to be
>> compiled in ARM mode (by adding "-marm" to CFLAGS). I prepared a
>> patch to do that, and it fixed the problem on Mattias' system.
>>
>> Since changing the compilation mode may or may not be what the
>> user or system builder intended, I'm reluctant to do this silently.
>> Therefore I'm considering removing the auto-enable of HiPE on ARM.
>> Users will have to explicitly enable HiPE, and accept that BEAM will
>> be in ARM mode not Thumb.
>>
>
> I do not have a strong opinion but, from what you are describing and from
> some googling on ARM vs thumb on the internet, it seems to me that adding
> -marm to the CFLAGS will result in BEAM itself executing faster (or at
> least not slower than with thumb), and the main advantage of the Thumb mode
> is in better memory (i-cache?) utilization. I am not sure why some
> (recent?) C compilers make Thumb the default, but I guess it's because the
> ARM processors are often used in embedded applications and they primarily
> want to optimize for code size rather than speed. Not sure this is the
> primary goal of Erlang developers who download and install Erlang/OTP and
> e.g. want to run dialyzer (as Mattias) on that machine.
>
> So my current vote would go to add the -marm option anyway when building
> BEAM, and leave HiPE enabled by default on that platform too. Erlang
> developers who are primarily interested in the memory benefits of the Thumb
> mode can explicitly enable it with an appropriate flag which would also
> disable HiPE then, at least until the following is fixed.
>
> It would be nice to investigate where the incorrect mode switch between
> HiPE and BEAM is and eliminate this if it's not too much work.
>
> Kostis
>
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kostis@REDACTED Thu Dec 4 23:17:13 2014
From: kostis@REDACTED (Kostis Sagonas)
Date: Fri, 05 Dec 2014 00:17:13 +0200
Subject: [erlang-bugs] Minor issue: dialyzer on ARM: Compiling some key modules to native code => Illegal instruction (core dumped)
In-Reply-To:
References: <21628.14362.31412.369346@gargle.gargle.HOWL> <547C7422.7000800@cs.ntua.gr>
Message-ID: <5480DD69.6040304@cs.ntua.gr>

On 12/05/2014 12:13 AM, Robert Virding wrote:
> Wouldn't a generally better solution be to make the no_native the
> default and instead have to explicitly turn it on with a --native flag?
> I have found that unless I am checking very many files it usually goes
> faster to turn it off, at least when looking at the total time.

The automatic compilation of Dialyzer and erl*types files to native code
is triggered only if you analyze many files (currently many >= 20).

Look into lib/dialyzer/src/dialyzer_cl.erl

Kostis

From alex@REDACTED Fri Dec 5 09:00:07 2014
From: alex@REDACTED (Alex Wilson)
Date: Fri, 05 Dec 2014 18:00:07 +1000
Subject: [erlang-bugs] High latency and CPU usage on *BSD (pull req #528)
Message-ID: <54816607.8080601@cooperi.net>

Hi all,

I submitted a pull request a bit over a month ago (#528) to fix issues
around high latency and CPU usage, especially on OpenBSD (but also a
lot of the rest of the BSD family) due to os_mon forking and shelling
out to run "uptime" and "ps".

Is it possible I could get some feedback on this? Is there a reason
it's been sitting untouched? Without the patch, Riak is borderline
unusable on OpenBSD 5.6, as are a few other libraries and apps, so I
was hoping it might get a little more attention.

Sorry to nag! I'd much rather have the issue fixed upstream than have
to put local patches into packaging on 4 platforms separately with the
next release...

-Alex

From egil@REDACTED Fri Dec 5 17:38:32 2014
From: egil@REDACTED (Björn-Egil Dahlberg)
Date: Fri, 5 Dec 2014 17:38:32 +0100
Subject: [erlang-bugs] High latency and CPU usage on *BSD (pull req #528)
In-Reply-To: <54816607.8080601@cooperi.net>
References: <54816607.8080601@cooperi.net>
Message-ID: <5481DF88.7060103@erlang.org>

It's in our backlog. It won't be in 17.4.

Once it has been reviewed and has passed the review stage, it will
probably be in 17.5 (or master, depending on how much it changes
things).

// Björn-Egil

On 2014-12-05 09:00, Alex Wilson wrote:
> Hi all,
>
> I submitted a pull request a bit over a month ago (#528) to fix issues
> around high latency and CPU usage, especially on OpenBSD (but also a
> lot of the rest of the BSD family) due to os_mon forking and shelling
> out to run "uptime" and "ps".
>
> Is it possible I could get some feedback on this? Is there a reason
> it's been sitting untouched? Without the patch, Riak is borderline
> unusable on OpenBSD 5.6, as are a few other libraries and apps, so I
> was hoping it might get a little more attention.
>
> Sorry to nag! I'd much rather have the issue fixed upstream than have
> to put local patches into packaging on 4 platforms separately with the
> next release...
>
> -Alex
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs
>
>

From sdl.web@REDACTED Sun Dec 7 03:18:33 2014
From: sdl.web@REDACTED (Leo Liu)
Date: Sun, 07 Dec 2014 10:18:33 +0800
Subject: [erlang-bugs] wx-config on centos7
Message-ID: <87mw702oee.fsf@gmail.com>

wx-config belongs to the legacy gtk2-based package wxGTK-devel 2.8.12
and is usually not installed. wxGTK3-devel provides wx-config-3.0.

Could someone fix the wx application to build correctly on centos7?
Thanks.

Leo

From mattias.waldau@REDACTED Sun Dec 7 18:49:06 2014
From: mattias.waldau@REDACTED (Mattias Waldau)
Date: Sun, 7 Dec 2014 18:49:06 +0100
Subject: [erlang-bugs] Minor issue: dialyzer on ARM: Compiling some key modules to native code => Illegal instruction (core dumped)
In-Reply-To:
References: <21628.14362.31412.369346@gargle.gargle.HOWL> <547C7422.7000800@cs.ntua.gr>
Message-ID:

Hi Robert,

I am not really sure why it is a good idea to make no_native the
default for ARM. Is this only for dialyzer?

ARM will become more common, and I must say that the performance of
this Samsung OctaCore makes me believe in ARM on servers, i.e. a typical
place for Erlang.

Thus, since we have a working native compiler for ARM, why not use it?

/mattias

On 04/12/2014 23:13, Robert Virding wrote:
> Wouldn't a generally better solution be to make the no_native the
> default and instead have to explicitly turn it on with a --native
> flag? I have found that unless I am checking very many files it usually
> goes faster to turn it off, at least when looking at the total time.
>
> Robert
>
>

From yurinvv@REDACTED Wed Dec 10 08:03:45 2014
From: yurinvv@REDACTED (Slava Yurin)
Date: Wed, 10 Dec 2014 13:03:45 +0600
Subject: [erlang-bugs] gen_tcp:send and file:sendfile
Message-ID: <720381418195025@web2m.yandex.ru>

An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: send_file_test.erl
Type: application/octet-stream
Size: 1226 bytes
Desc: not available
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: shell1.log
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: shell2.log
URL:

From me@REDACTED Wed Dec 10 17:58:40 2014
From: me@REDACTED (Vladislav Titov)
Date: Wed, 10 Dec 2014 16:58:40 +0000
Subject: [erlang-bugs] Mnesia:add_table_copy causes crashes when no replicas available
Message-ID:

Hi

This is on a fully in-RAM mnesia cluster.

When performing mnesia:add_table_copy(tab, node(), ram_copies) when no
active replicas of tab are available, it replies back correctly with
{aborted, {system_limit, tab, ...}}.

However, looking at the mnesia_gvar afterwards, the {schema,
local_tables} key lists the table in it.

This (?) then causes mnesia to shut down when a node that is listed as
having the table gets started and adopts it as an orphan. In R14 (our
target release for now):

FATAL ** Sender failed: {error, {no_exists, tab}}

Or in R17:

FATAL ** Cannot load table foo from disc: {not_loaded, storage_unknown}

This then causes mnesia to shut down.

The R14 case only happens if the node listed as actually holding the
table starts, and then re-starts. In R17 it seems to happen straight
away on first startup.

The active_replicas option for the table ends up listing the node() as
well at some point.

I've attached a repro case.

Any idea how I can work around this? Obviously checking active_replicas
before a copy would be a good idea, but it wouldn't protect against race
conditions when nodes are yo-yoing. Would cleaning up the local_tables
gvar be a good idea if system_limit happens?

thanks,
vlad
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bug_add_table_copy.escript
Type: application/octet-stream
Size: 1869 bytes
Desc: not available
URL:

From dangud@REDACTED Wed Dec 10 20:31:38 2014
From: dangud@REDACTED (Dan Gudmundsson)
Date: Wed, 10 Dec 2014 20:31:38 +0100
Subject: [erlang-bugs] Mnesia:add_table_copy causes crashes when no replicas available
In-Reply-To:
References:
Message-ID:

Thanks for the bug report, I will take a look at it.

/Dan

On Wed, Dec 10, 2014 at 5:58 PM, Vladislav Titov wrote:

> Hi
>
> This is on a fully in-RAM mnesia cluster.
>
> When performing mnesia:add_table_copy(tab, node(), ram_copies) when no
> active replicas of tab are available, it replies back correctly with
> {aborted, {system_limit, tab, ...}}.
>
> However, looking at the mnesia_gvar afterwards, the {schema, local_tables}
> key lists the table in it.
>
> This (?) then causes mnesia to shut down when a node that is listed as
> having the table gets started and adopts it as an orphan. In R14 (our
> target release for now):
>
> FATAL ** Sender failed: {error, {no_exists, tab}}
>
> Or in R17:
>
> FATAL ** Cannot load table foo from disc: {not_loaded, storage_unknown}
>
> This then causes mnesia to shut down.
>
> The R14 case only happens if the node listed as actually holding the table
> starts, and then re-starts. In R17 it seems to happen straight away on
> first startup.
>
> The active_replicas option for the table ends up listing the node() as
> well at some point.
>
> I've attached a repro case.
>
> Any idea how I can work around this? Obviously checking active_replicas
> before a copy would be a good idea, but it wouldn't protect against race
> conditions when nodes are yo-yoing. Would cleaning up the local_tables gvar
> be a good idea if system_limit happens?
>
> thanks,
> vlad
>
>
>
>
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From lukas@REDACTED Thu Dec 11 10:06:10 2014
From: lukas@REDACTED (Lukas Larsson)
Date: Thu, 11 Dec 2014 10:06:10 +0100
Subject: [erlang-bugs] gen_tcp:send and file:sendfile
In-Reply-To: <720381418195025@web2m.yandex.ru>
References: <720381418195025@web2m.yandex.ru>
Message-ID: <54895E82.4000404@erlang.org>

Hello,

Thanks for the bug report, I have seen something similar that might be
connected to this in our nightly builds. I will try to reproduce the
error.

Lukas

On 10/12/14 08:03, Slava Yurin wrote:
> Hi all.
> I have error behavior of gen_tcp:send/2 + file:sendfile/5.
> If data for send falls into buffer in port and happen file:sendfile/5
> call, then
> buffer not flushed and data from file send. After that any call
> gen_tcp:send/2
> only append data to buffer and not send anything. And if not set
> send_timeout
> option gen_tcp:send will hang after exceed size of buffer. This is my
> guess.
> I attach test file and log of it usage.
> Have reproducible error only when [{delay_send, true}, {nodelay,
> false}], but in
> real usage see same behavior without this options. Can't reproduce it on
> localhost and 1 file.
>
>
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From eric.pailleau@REDACTED Sat Dec 13 14:26:13 2014
From: eric.pailleau@REDACTED (PAILLEAU Eric)
Date: Sat, 13 Dec 2014 14:26:13 +0100
Subject: [erlang-bugs] Missing return values for mnesia:subscribe/1 in documentation
In-Reply-To: <54629058.4080105@wanadoo.fr>
References: <54629058.4080105@wanadoo.fr>
Message-ID: <548C3E75.5030501@wanadoo.fr>

Hi,
was the mail below taken into account by OTP team ?
Regards

On 11/11/2014 23:40, PAILLEAU Eric wrote:
> Hi,
> I found that in online documentation is missing possible returned values
> of mnesia:subscribe/1 .
>
> By trying, looks like {ok, nodes()} when OK, but what on error ?
>
> I cannot see more info in User's Guide either ...
>
> ---8<------------------------------------------------------------------------------
>
> subscribe(EventCategory)
>
> Ensures that a copy of all events of type EventCategory are sent to the
> caller. The event types available are described in the Mnesia User's Guide.
> ---8<------------------------------------------------------------------------------
>
>
> Regards.
>
>
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs
>

From dangud@REDACTED Sat Dec 13 14:43:19 2014
From: dangud@REDACTED (Dan Gudmundsson)
Date: Sat, 13 Dec 2014 14:43:19 +0100
Subject: [erlang-bugs] Missing return values for mnesia:subscribe/1 in documentation
In-Reply-To: <548C3E75.5030501@wanadoo.fr>
References: <54629058.4080105@wanadoo.fr> <548C3E75.5030501@wanadoo.fr>
Message-ID:

Waiting for a patch :-)

On Sat, Dec 13, 2014 at 2:26 PM, PAILLEAU Eric wrote:
>
> Hi,
> was the mail below taken into account by OTP team ?
> Regards
>
> On 11/11/2014 23:40, PAILLEAU Eric wrote:
>
> Hi,
>> I found that in online documentation is missing possible returned values
>> of mnesia:subscribe/1 .
>>
>> By trying, looks like {ok, nodes()} when OK, but what on error ?
>>
>> I cannot see more info in User's Guide either ...
>>
>> ---8<-------------------------------------------------------
>> -----------------------
>>
>> subscribe(EventCategory)
>>
>> Ensures that a copy of all events of type EventCategory are sent to the
>> caller. The event types available are described in the Mnesia User's
>> Guide.
>> ---8<-------------------------------------------------------
>> -----------------------
>>
>>
>> Regards.
>>
>>
>> _______________________________________________
>> erlang-bugs mailing list
>> erlang-bugs@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-bugs
>>
>>
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From eric.pailleau@REDACTED Sat Dec 13 14:56:54 2014
From: eric.pailleau@REDACTED (PAILLEAU Eric)
Date: Sat, 13 Dec 2014 14:56:54 +0100
Subject: [erlang-bugs] Missing return values for mnesia:subscribe/1 in documentation
In-Reply-To:
References: <54629058.4080105@wanadoo.fr> <548C3E75.5030501@wanadoo.fr>
Message-ID: <548C45A6.9050307@wanadoo.fr>

Hi Dan,
I'm not the last to do PRs, given my limited free time.

It would simply be nice to get a prompt answer on the bug list about
whether the OTP team will do something or is waiting for somebody else
to do it.

BTW the "What you could do" tab is still empty.
It could be a proper way to ask the community to do some work that OTP
won't do.

I assumed OTP would have the answer, so does somebody know the answer
before I read the mnesia code and do a PR?

Regards

On 13/12/2014 14:43, Dan Gudmundsson wrote:
> Waiting for a patch :-)
>
>
> Hi,
> was the mail below taken into account by OTP team ?
> Regards
>
> On 11/11/2014 23:40, PAILLEAU Eric wrote:
>
> Hi,
> I found that in online documentation is missing possible
> returned values
> of mnesia:subscribe/1 .
>
> By trying, looks like {ok, nodes()} when OK, but what on error ?
>
> I cannot see more info in User's Guide either ...
>
> ---8<------------------------------------------------------------------------------
>
> subscribe(EventCategory)
>
> Ensures that a copy of all events of type EventCategory are sent
> to the
> caller. The event types available are described in the Mnesia
> User's Guide.
> ---8<------------------------------------------------------------------------------
>
>
> Regards.
>
>
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs
>
>
>
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs
>
>

From mikpelinux@REDACTED Mon Dec 15 13:16:46 2014
From: mikpelinux@REDACTED (Mikael Pettersson)
Date: Mon, 15 Dec 2014 13:16:46 +0100
Subject: [erlang-bugs] SEGV in process_main() line 3163 [r15B03]
Message-ID: <21646.53550.37641.137535@gargle.gargle.HOWL>

[2nd attempt to send this, my apologies if you see this twice]

We've had two segfaults now in r15's process_main(), line 3163,
which is the register flushing loop just before the current
process is swapped out:

==snip==
        argp = c_p->arg_reg;
        for (i = c_p->arity - 1; i > 0; i--) {
=>          argp[i] = reg[i];
        }
        c_p->arg_reg[0] = r(0);
        SWAPOUT;
==snip==

The core file is unfortunately truncated: I can see the registers
at the point of the SEGV, but not inspect any memory. The registers
and disassembly are:

==snip==
Program terminated with signal 11, Segmentation fault.
#0  process_main () at beam/beam_emu.c:3163
3163    beam/beam_emu.c: No such file or directory.
(gdb) info reg
rax            0x7e7d77fff3f8   139077349274616
rbx            0x7f243b82feb8   139793593990840
rcx            0x0              0
rdx            0x53ba78         5487224
rsi            0x7e7d75622030   139077305376816
rdi            0x0              0
rbp            0x1414400        0x1414400
rsp            0x7f2467432cf0   0x7f2467432cf0
r8             0x0              0
r9             0x0              0
r10            0x0              0
r11            0x246            582
r12            0x7f2471b407c8   139794503174088
r13            0x7e7f4309cae0   139085050661600
r14            0x7e7f42e57168   139085048279400
r15            0xc63f           50751
rip            0x5425e4         0x5425e4
eflags         0x10202          [ IF RF ]
cs             0x33             51
ss             0x2b             43
ds             0x0              0
es             0x0              0
fs             0x0              0
gs             0x0              0
(gdb) disassemble 0x5425a6,0x542610
Dump of assembler code from 0x5425a6 to 0x542610:
   0x00000000005425a6:  mov    0x90(%rbp),%rdx
   0x00000000005425ad:  mov    %rax,0x98(%rbp)
   0x00000000005425b4:  mov    %edx,0xa0(%rbp)
   0x00000000005425ba:  mov    0xd0(%rbp),%rcx
   0x00000000005425c1:  lea    -0x1(%rdx),%eax
   0x00000000005425c4:  mov    0x98(%rbp),%rsi
   0x00000000005425cb:  test   %eax,%eax
   0x00000000005425cd:  mov    %rcx,0x48(%rsp)
   0x00000000005425d2:  jle    0x5425fd
   0x00000000005425d4:  cltq
   0x00000000005425d6:  sub    $0x2,%edx
   0x00000000005425d9:  shl    $0x3,%rax
   0x00000000005425dd:  add    %rax,%r12
   0x00000000005425e0:  lea    (%rsi,%rax,1),%rax
=> 0x00000000005425e4:  mov    (%r12),%rcx
   0x00000000005425e8:  sub    $0x1,%edx
   0x00000000005425eb:  sub    $0x8,%r12
   0x00000000005425ef:  mov    %rcx,(%rax)
   0x00000000005425f2:  lea    0x1(%rdx),%ecx
   0x00000000005425f5:  sub    $0x8,%rax
   0x00000000005425f9:  test   %ecx,%ecx
   0x00000000005425fb:  jg     0x5425e4
   0x00000000005425fd:  mov    %r15,(%rsi)
   0x0000000000542600:  mov    %r14,0x0(%rbp)
   0x0000000000542604:  mov    $0x8,%esi
   0x0000000000542609:  mov    %r13,0x8(%rbp)
   0x000000000054260d:  mov    %rbx,0xe0(%rbp)
End of assembler dump.
==snip==

I interpret this as follows:

1. c_p == %rbp == 0x1414400

2. &argp[i] == %rax == 0x7e7d77fff3f8
   from this I deduce that c_p->arg_reg != c_p->def_arg_reg, so it
   points to a dynamically allocated area separate from *c_p

3. i == c_p->arity - 1 == %rdx == 0x53ba78
   this is clearly bonkers, and what's causing references into
   unmapped memory

4. &reg[i] == %r12 == 0x7f2471b407c8
   this is consistent with indexing a frame-local array at 0x53ba78

Basically, my conclusion is that c_p->arity has been clobbered,
causing out-of-range accesses in this loop.

We've had this exact crash twice now, in August and last Thursday
(Dec 11).

I realize the lack of a complete core dump makes this impossible
to debug. What I'm hoping for is that someone might recollect
some post-R15 change or fix that might have something to do with
unexpected clobbers of process structs.

/Mikael

From jesper.louis.andersen@REDACTED Tue Dec 16 01:07:52 2014
From: jesper.louis.andersen@REDACTED (Jesper Louis Andersen)
Date: Tue, 16 Dec 2014 01:07:52 +0100
Subject: [erlang-bugs] Dialyzer can't compile map correctly
Message-ID:

Hi OTP team and other interested parties.

While I was building up the enacl application, I have discovered a
problem where I can crash the dialyzer. Attached are two minimized files
which expose the problem. To compile this I did:

erlc +debug_info *.erl
dialyzer --build_plt --apps kernel stdlib
dialyzer *.beam

and it produces the following IDE (Internal Dialyzer Error):

erlc +debug_info *.erl
dialyzer *.beam
  Checking whether the PLT /home/jlouis/.dialyzer_plt is up-to-date... yes
  Proceeding with analysis...
=ERROR REPORT==== 16-Dec-2014::01:00:27 ===
Error in process <0.48.0> with exit value:
{{case_clause,map},[{dialyzer_dataflow,find_terminals,1,[{file,"dialyzer_dataflow.erl"},{line,3451}]},{dialyzer_dataflow,find_terminals_list,3,[{file,"dialyzer_dataflow.erl"},{line,3504}]},{dialyzer_dataflow,classify_returns...

dialyzer: Analysis failed with error:
{{case_clause,map},
 [{dialyzer_dataflow,find_terminals,1,
      [{file,"dialyzer_dataflow.erl"},{line,3451}]},
  {dialyzer_dataflow,find_terminals_list,3,
      [{file,"dialyzer_dataflow.erl"},{line,3504}]},
  {dialyzer_dataflow,classify_returns,1,
      [{file,"dialyzer_dataflow.erl"},{line,3443}]},
  {dialyzer_dataflow,'-state__get_warnings/2-fun-0-',7,
      [{file,"dialyzer_dataflow.erl"},{line,2908}]},
  {lists,foldl,3,[{file,"lists.erl"},{line,1261}]},
  {dialyzer_dataflow,state__get_warnings,2,
      [{file,"dialyzer_dataflow.erl"},{line,2934}]},
  {dialyzer_dataflow,get_warnings,5,
      [{file,"dialyzer_dataflow.erl"},{line,142}]},
  {dialyzer_succ_typings,collect_warnings,2,
      [{file,"dialyzer_succ_typings.erl"},{line,182}]}]}
Last messages in the log cache:
  Reading files and computing callgraph... done in 0.06 secs
  Removing edges... done in 0.01 secs
Makefile:2: recipe for target 'all' failed
make: *** [all] Error 1

I have attached the two culprit files. They have the following sizes:

jlouis@REDACTED:~/tmp/problem$ wc *.erl
  12  29 211 enacl.erl
  23  36 460 enacl_nif.erl
  35  65 671 total

and I think it may be possible to shrink further down, but I think it is
already small enough to be workable. Do note: while one part is a NIF,
you don't need the underlying C code to break the dialyzer.

The full repository is at:

https://github.com/jlouis/enacl

and the commit ID 23e535fcc23c1 should have the error if you want to
look at the full repository. It does require a properly installed
libsodium, which is not in Debian/Ubuntu for instance, which is why I
have tried to narrow down the problem before reporting it.

I hope this is enough to track down the error, perhaps by looking at the
backtrace and error case alone. Otherwise, please come back to me.

--
J.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: enacl.erl
Type: text/x-erlang
Size: 211 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: enacl_nif.erl
Type: text/x-erlang
Size: 460 bytes
Desc: not available
URL:

From jesper.louis.andersen@REDACTED Tue Dec 16 01:09:10 2014
From: jesper.louis.andersen@REDACTED (Jesper Louis Andersen)
Date: Tue, 16 Dec 2014 01:09:10 +0100
Subject: [erlang-bugs] Dialyzer can't compile map correctly
In-Reply-To:
References:
Message-ID:

Oh, I forgot this important information:

This fails with a build of OTP 17.3.4 and 17.4, compiled with
--enable-dirty-schedulers --disable-hipe

On Tue, Dec 16, 2014 at 1:07 AM, Jesper Louis Andersen <
jesper.louis.andersen@REDACTED> wrote:
>
> Hi OTP team and other interested parties.
>
> While I was building up the enacl application, I have discovered a problem
> where I can crash the dialyzer. Attached are two minimized files which
> expose the problem. To compile this I did:
>
> erlc +debug_info *.erl
> dialyzer --build_plt --apps kernel stdlib
> dialyzer *.beam
>
> and it produces the following IDE (Internal Dialyzer Error):
>
> erlc +debug_info *.erl
> dialyzer *.beam
> Checking whether the PLT /home/jlouis/.dialyzer_plt is up-to-date... yes
> Proceeding with analysis...
> =ERROR REPORT==== 16-Dec-2014::01:00:27 ===
> Error in process <0.48.0> with exit value:
> {{case_clause,map},[{dialyzer_dataflow,find_terminals,1,[{file,"dialyzer_dataflow.erl"},{line,3451}]},{dialyzer_dataflow,find_terminals_list,3,[{file,"dialyzer_dataflow.erl"},{line,3504}]},{dialyzer_dataflow,classify_returns...
> > > dialyzer: Analysis failed with error: > {{case_clause,map}, > [{dialyzer_dataflow,find_terminals,1, > [{file,"dialyzer_dataflow.erl"},{line,3451}]}, > {dialyzer_dataflow,find_terminals_list,3, > [{file,"dialyzer_dataflow.erl"},{line,3504}]}, > {dialyzer_dataflow,classify_returns,1, > [{file,"dialyzer_dataflow.erl"},{line,3443}]}, > {dialyzer_dataflow,'-state__get_warnings/2-fun-0-',7, > [{file,"dialyzer_dataflow.erl"},{line,2908}]}, > {lists,foldl,3,[{file,"lists.erl"},{line,1261}]}, > {dialyzer_dataflow,state__get_warnings,2, > [{file,"dialyzer_dataflow.erl"},{line,2934}]}, > {dialyzer_dataflow,get_warnings,5, > [{file,"dialyzer_dataflow.erl"},{line,142}]}, > {dialyzer_succ_typings,collect_warnings,2, > [{file,"dialyzer_succ_typings.erl"},{line,182}]}]} > Last messages in the log cache: > Reading files and computing callgraph... done in 0.06 secs > Removing edges... done in 0.01 secs > Makefile:2: recipe for target 'all' failed > make: *** [all] Error 1 > > I have attached the two culprit files. They have the following sizes: > > jlouis@REDACTED:~/tmp/problem$ wc *.erl > 12 29 211 enacl.erl > 23 36 460 enacl_nif.erl > 35 65 671 total > > and I think it may be possible to shrink further down, but I think it is > already small enough to be workable. Do note: while one part is a NIF, you > don't need the underlying C code to break the dialyzer. > > The full repository is at: > > https://github.com/jlouis/enacl > > and the commit ID 23e535fcc23c1 should have the error if you want to look > at the full repository. It does require a properly installed libsodium, > which is not in Debian/Ubuntu for instance, which is why I have tried to > narrow down the problem before reporting it. > > I hope this is enough to track down the error, perhaps by looking at the > backtrace and error case alone. Otherwise, please come back to me. > > -- > J. > -- J. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wallentin.dahlberg@REDACTED Tue Dec 16 01:28:58 2014 From: wallentin.dahlberg@REDACTED (=?UTF-8?Q?Bj=C3=B6rn=2DEgil_Dahlberg?=) Date: Tue, 16 Dec 2014 01:28:58 +0100 Subject: [erlang-bugs] Dialyzer can't compile map correctly In-Reply-To: References: Message-ID: Great. I thought I fixed that in 17.4. If it is the same problem that is .. apparently there were more of them. 2014-12-16 1:09 GMT+01:00 Jesper Louis Andersen < jesper.louis.andersen@REDACTED>: > > Oh, I forgot this important information: > > This fails with a build of OTP 17.3.4 and 17.4, compiled with > --enable-dirty-schedulers --disable-hipe > > > On Tue, Dec 16, 2014 at 1:07 AM, Jesper Louis Andersen < > jesper.louis.andersen@REDACTED> wrote: >> >> Hi OTP team and other interested parties. >> >> While I was building up the enacl application, I have discovered a >> problem where I can crash the dialyzer. Attached are two minimized files >> which exposes the problem. To compile this I did: >> >> erlc +debug_info *.erl >> dialyzer --build_plt --apps kernel stdlib >> dialyzer *.beam >> >> and it produces the following IDE (Internal Dialyzer Error): >> >> erlc +debug_info *.erl >> dialyzer *.beam >> Checking whether the PLT /home/jlouis/.dialyzer_plt is up-to-date... yes >> Proceeding with analysis... >> =ERROR REPORT==== 16-Dec-2014::01:00:27 === >> Error in process <0.48.0> with exit value: >> {{case_clause,map},[{dialyzer_dataflow,find_terminals,1,[{file,"dialyzer_dataflow.erl"},{line,3451}]},{dialyzer_dataflow,find_terminals_list,3,[{file,"dialyzer_dataflow.erl"},{line,3504}]},{dialyzer_dataflow,classify_returns... 
>> >> >> dialyzer: Analysis failed with error: >> {{case_clause,map}, >> [{dialyzer_dataflow,find_terminals,1, >> [{file,"dialyzer_dataflow.erl"},{line,3451}]}, >> {dialyzer_dataflow,find_terminals_list,3, >> [{file,"dialyzer_dataflow.erl"},{line,3504}]}, >> {dialyzer_dataflow,classify_returns,1, >> [{file,"dialyzer_dataflow.erl"},{line,3443}]}, >> {dialyzer_dataflow,'-state__get_warnings/2-fun-0-',7, >> [{file,"dialyzer_dataflow.erl"},{line,2908}]}, >> {lists,foldl,3,[{file,"lists.erl"},{line,1261}]}, >> {dialyzer_dataflow,state__get_warnings,2, >> [{file,"dialyzer_dataflow.erl"},{line,2934}]}, >> {dialyzer_dataflow,get_warnings,5, >> [{file,"dialyzer_dataflow.erl"},{line,142}]}, >> {dialyzer_succ_typings,collect_warnings,2, >> >> [{file,"dialyzer_succ_typings.erl"},{line,182}]}]} >> Last messages in the log cache: >> Reading files and computing callgraph... done in 0.06 secs >> Removing edges... done in 0.01 secs >> Makefile:2: recipe for target 'all' failed >> make: *** [all] Error 1 >> >> I have attached the two culprit files. They have the following sizes: >> >> jlouis@REDACTED:~/tmp/problem$ wc *.erl >> 12 29 211 enacl.erl >> 23 36 460 enacl_nif.erl >> 35 65 671 total >> >> and I think it may be possible to shrink further down, but I think it is >> already small enough to be workable. Do note: while one part is a NIF, you >> don't need the underlying C code to break the dialyzer. >> >> The full repository is at: >> >> https://github.com/jlouis/enacl >> >> and the commit ID 23e535fcc23c1 should have the error if you want to look >> at the full repository. It does require a properly installed libsodium, >> which is not in Debian/Ubuntu for instance, which is why I have tried to >> narrow down the problem before reporting it. >> >> I hope this is enough to track down the error, perhaps by looking at the >> backtrace and error case alone. Otherwise, please come back to me. >> >> -- >> J. >> > > > -- > J. 
From kostis@REDACTED Tue Dec 16 01:46:23 2014
From: kostis@REDACTED (Kostis Sagonas)
Date: Tue, 16 Dec 2014 01:46:23 +0100
Subject: [erlang-bugs] Dialyzer can't compile map correctly
In-Reply-To:
References:
Message-ID: <548F80DF.9030304@cs.ntua.gr>

On 12/16/2014 01:07 AM, Jesper Louis Andersen wrote:
> Hi OTP team and other interested parties.
>
> While I was building up the enacl application, I have discovered a
> problem where I can crash the dialyzer.

Dialyzer should of course not crash and should be fixed, but your program is crap and dialyzer (rightfully) throws up on such code ;)

It's caused by the following two functions, which make the crypto_box_keypair/0 function never return:

    not_loaded() -> error({nif_not_loaded, ?MODULE}).

    crypto_box_keypair() -> not_loaded().

So your call to enacl_nif:crypto_box_keypair() in the enacl module makes Dialyzer think that the particular call will not return. Thus, the

    -spec box_keypair() -> #{ atom() => binary() }.

that you have specified is wrong as far as Dialyzer is concerned. It crashes in the phase where it tries to generate the warning.

Dialyzer should of course be fixed so that its analysis does not crash when generating warnings for specs that contain (erroneous) map types, but this will not fix the problem(s) in your code base.
Kostis

From jesper.louis.andersen@REDACTED Tue Dec 16 01:55:17 2014
From: jesper.louis.andersen@REDACTED (Jesper Louis Andersen)
Date: Tue, 16 Dec 2014 01:55:17 +0100
Subject: [erlang-bugs] Dialyzer can't compile map correctly
In-Reply-To: <548F80DF.9030304@cs.ntua.gr>
References: <548F80DF.9030304@cs.ntua.gr>
Message-ID:

On Tue, Dec 16, 2014 at 1:46 AM, Kostis Sagonas wrote:

I do see why this is true:

> Thus, the -spec box_keypair() -> #{ atom() => binary() }. that you have
> specified is wrong as far as Dialyzer is concerned.

But how do you propose I fix this problem then? not_loaded() will *never* be called in a real program, since it will be replaced by the NIF's C code. I want that code to crash and burn if it is ever called, because it means the NIF is not loaded properly. That is, what is the correct specification to give here such that the underlying call to enacl_nif:crypto_box_keypair() returns the right type, without getting the dialyzer all confused by the replacement code in the Erlang module?

--
J.

From dariusz.gadomski@REDACTED Tue Dec 16 09:04:30 2014
From: dariusz.gadomski@REDACTED (Dariusz Gadomski)
Date: Tue, 16 Dec 2014 09:04:30 +0100
Subject: [erlang-bugs] Cannot bind epmd to an IPv4 address after building with -DEPMD6
In-Reply-To: <20141114100122.GA27993@phenom>
References: <20141114100122.GA27993@phenom>
Message-ID: <20141216080430.GA5059@phenom>

On Fri, Nov 14, 2014 at 11:01:22AM +0100, Dariusz Gadomski wrote:
> Hello everyone,
>
> There are users observing problems with binding to an IPv4 address with
> erlang 16b3 version built with -DEPMD6 flag [1]. This flag has been added
> as a response to another bug [2].
> > By looking at erts/epmd/src/epmd_int.h I see that the address definitions are > mutually exclusive - enabling IPv6 makes all addresses IPv6-only which may > result in problems with binding to a specific IPv4 address: > $ ERL_EPMD_ADDRESS="127.0.0.1" epmd > epmd: Fri Nov 14 11:00:47 2014: cannot parse IP address "127.0.0.1" > > Can you please confirm this? Is it a known issue? Maybe there is a workaround? > > [1]?https://bugs.launchpad.net/ubuntu/+source/erlang/+bug/1374109 > [2]?https://bugs.launchpad.net/ubuntu/+source/rabbitmq-server/+bug/1312507 Hello again, I have made an interesting observation. There is a change linked from [3] on github [4]. This change adds support for IPv6 node registration to epmd. As a side effect it fixes the original issue I mentioned above. Is anyone aware if there are any plans of merging it to the main codebase? [3] http://www.erlang.org/development/ [4] https://github.com/msantos/otp/compare/erlang:maint...epmd-IPv6-node-reg Thanks! Dariusz Gadomski From jesper.louis.andersen@REDACTED Tue Dec 16 09:28:49 2014 From: jesper.louis.andersen@REDACTED (Jesper Louis Andersen) Date: Tue, 16 Dec 2014 09:28:49 +0100 Subject: [erlang-bugs] Dialyzer can't compile map correctly In-Reply-To: References: <548F80DF.9030304@cs.ntua.gr> Message-ID: Answering my own mail here, because Anthony Ramine had the correct answer: erlang:nif_error/1 gets special treatment in the dialyzer and avoids this problem. I'll reflect that in my code. The other problem is that the OTP can't reproduce the error on a fresh 17.4 install. Is it possible for someone to test it. It may be my installation that is broken beyond repair. On Tue, Dec 16, 2014 at 1:55 AM, Jesper Louis Andersen < jesper.louis.andersen@REDACTED> wrote: > > > On Tue, Dec 16, 2014 at 1:46 AM, Kostis Sagonas wrote: > > I indeed see as to why this is true: > > Thus, the -spec box_keypair() -> #{ atom() => binary() }. that you have >> specified is wrong as far as Dialyzer is concerned. 
>
>
> But how do you propose I fix this problem then? not_loaded() will *never*
> be called in a real program since it will be replaced by the NIFs C code. I
> want that code to crash and burn if it is ever called, because it means the
> NIF is not loaded properly. That is, what is the correct specification to
> give here such that the underlying call to enacl_nif:crypto_box_keypair()
> returns the right type, without getting the dialyzer all confused due to
> the replacement code in the erlang module?
>
>
> --
> J.
>

--
J.

From kostis@REDACTED Tue Dec 16 12:59:35 2014
From: kostis@REDACTED (Kostis Sagonas)
Date: Tue, 16 Dec 2014 12:59:35 +0100
Subject: [erlang-bugs] Dialyzer can't compile map correctly
In-Reply-To:
References: <548F80DF.9030304@cs.ntua.gr>
Message-ID: <54901EA7.5010906@cs.ntua.gr>

On 12/16/2014 09:28 AM, Jesper Louis Andersen wrote:
> The other problem is that the OTP can't reproduce the error on a fresh
> 17.4 install. Is it possible for someone to test it.

I can very easily reproduce your problem (with 17.4 and also with the maint branch). Not sure why the folks @ OTP cannot. (Is it really so?)

Anyway, actually one does not even need to invoke anything NIF-related to experience the crash. The module included below suffices.

Kostis

%%=======================================================================
-module(enacl).
-export([box_keypair/0]).

-spec box_keypair() -> #{ atom() => binary() }.
box_keypair() ->
    {PK, SK} = foo(),
    #{ public => PK, secret => SK}.

foo() ->
    error(42).
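For the NIF stub case raised earlier in this thread, here is a minimal sketch of the erlang:nif_error/1 approach that Jesper mentions. The module name enacl_nif_stub and the return type {binary(), binary()} are assumptions made for illustration, not taken from the enacl repository. The idea is that erlang:nif_error/1 fails at run time just like error/1, but Dialyzer treats it as if it could return any term, so the spec on the stub is not flagged as "will never return".

```erlang
%% Hypothetical stub module, loosely modelled on the enacl_nif module
%% discussed in this thread. Names and types are illustrative only.
-module(enacl_nif_stub).
-export([crypto_box_keypair/0]).

%% Assumed return type of the real NIF: a {PublicKey, SecretKey} pair.
-spec crypto_box_keypair() -> {binary(), binary()}.
crypto_box_keypair() ->
    %% Raises error:{nif_not_loaded, ?MODULE} if the NIF library was
    %% never loaded, so a missing NIF still crashes loudly at run time.
    erlang:nif_error({nif_not_loaded, ?MODULE}).
```

With a stub like this, a caller that pattern matches on {PK, SK} and builds a map keeps a return type that Dialyzer can accept, while an unloaded NIF still raises an error when called.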
From egil@REDACTED Tue Dec 16 14:34:00 2014
From: egil@REDACTED (Björn-Egil Dahlberg)
Date: Tue, 16 Dec 2014 14:34:00 +0100
Subject: [erlang-bugs] Dialyzer can't compile map correctly
In-Reply-To: <54901EA7.5010906@cs.ntua.gr>
References: <548F80DF.9030304@cs.ntua.gr> <54901EA7.5010906@cs.ntua.gr>
Message-ID: <549034C8.5050804@erlang.org>

On 2014-12-16 12:59, Kostis Sagonas wrote:
> On 12/16/2014 09:28 AM, Jesper Louis Andersen wrote:
>> The other problem is that the OTP can't reproduce the error on a fresh
>> 17.4 install. Is it possible for someone to test it.
>
> I can very easily reproduce your problem (with 17.4 and also with the
> maint branch). Not sure why the folks @ OTP cannot. (Is it really so?)

Well, I don't know why but late last night (03:00) I couldn't reproduce it on 17.4 .. but today on 17.4 I get the error on the example below. I admit, I was wrong (but totally suspicious).

It is a missing clause in dialyzer_dataflow:find_terminals/1 .. which I thought I had fixed but .. noooo. I should dig a little deeper though so I don't miss any other stuff regarding this.

// Björn-Egil, is a sad panda.

>
> Anyway, actually one does not even need to invoke anything NIF-related
> to experience the crash. The module included below suffices.
>
> Kostis
>
> %%=======================================================================
> -module(enacl).
> -export([box_keypair/0]).
>
> -spec box_keypair() -> #{ atom() => binary() }.
> box_keypair() ->
> {PK, SK} = foo(),
> #{ public => PK, secret => SK}.
>
> foo() ->
> error(42).
> > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs > > From eric.pailleau@REDACTED Tue Dec 16 23:50:25 2014 From: eric.pailleau@REDACTED (PAILLEAU Eric) Date: Tue, 16 Dec 2014 23:50:25 +0100 Subject: [erlang-bugs] Missing return values for mnesia:subscribe/1 in documentation In-Reply-To: <54629058.4080105@wanadoo.fr> References: <54629058.4080105@wanadoo.fr> Message-ID: <5490B731.4050906@wanadoo.fr> I pushed a PR #564 to fix this. regards Le 11/11/2014 23:40, PAILLEAU Eric a ?crit : > Hi, > I found that in online documentation is missing possible returned values > of mnesia:subscribe/1 . > > By trying, looks like {ok, nodes()} when OK, but what on error ? > > I cannot see more info in User's Guide either ... > > ---8<------------------------------------------------------------------------------ > > subscribe(EventCategory) > > Ensures that a copy of all events of type EventCategory are sent to the > caller. The event types available are described in the Mnesia User's Guide. > ---8<------------------------------------------------------------------------------ > > > Regards. > > From n.oxyde@REDACTED Wed Dec 17 12:26:35 2014 From: n.oxyde@REDACTED (Anthony Ramine) Date: Wed, 17 Dec 2014 12:26:35 +0100 Subject: [erlang-bugs] SEGV in process_main() line 3163 [r15B03] In-Reply-To: <21646.53550.37641.137535@gargle.gargle.HOWL> References: <21646.53550.37641.137535@gargle.gargle.HOWL> Message-ID: <7CEC811D-9A80-420C-9CD4-D80523CC115C@gmail.com> Le 15 d?c. 2014 ? 
13:16, Mikael Pettersson a ?crit : > [2nd attempt to send this, my apologies if you seee this twice] > > We've had two segfaults now in r15's process_main(), line 3163, which is > the register flushing loop just before the current process is swapped out: > > ==snip== > argp = c_p->arg_reg; > for (i = c_p->arity - 1; i > 0; i--) { > => argp[i] = reg[i]; > } > c_p->arg_reg[0] = r(0); > SWAPOUT; > ==snip== > > The core file is unfortunately truncated: I can see the registers at the > point of the SEGV, but not inspect any memory. The registers and > disassembly are: > > ==snip== > Program terminated with signal 11, Segmentation fault. > #0 process_main () at beam/beam_emu.c:3163 > 3163 beam/beam_emu.c: No such file or directory. > (gdb) info reg > rax 0x7e7d77fff3f8 139077349274616 > rbx 0x7f243b82feb8 139793593990840 > rcx 0x0 0 > rdx 0x53ba78 5487224 > rsi 0x7e7d75622030 139077305376816 > rdi 0x0 0 > rbp 0x1414400 0x1414400 > rsp 0x7f2467432cf0 0x7f2467432cf0 > r8 0x0 0 > r9 0x0 0 > r10 0x0 0 > r11 0x246 582 > r12 0x7f2471b407c8 139794503174088 > r13 0x7e7f4309cae0 139085050661600 > r14 0x7e7f42e57168 139085048279400 > r15 0xc63f 50751 > rip 0x5425e4 0x5425e4 > eflags 0x10202 [ IF RF ] > cs 0x33 51 > ss 0x2b 43 > ds 0x0 0 > es 0x0 0 > fs 0x0 0 > gs 0x0 0 > (gdb) disassemble 0x5425a6,0x542610 > Dump of assembler code from 0x5425a6 to 0x542610: > 0x00000000005425a6 : mov 0x90(%rbp),%rdx > 0x00000000005425ad : mov %rax,0x98(%rbp) > 0x00000000005425b4 : mov %edx,0xa0(%rbp) > 0x00000000005425ba : mov 0xd0(%rbp),%rcx > 0x00000000005425c1 : lea -0x1(%rdx),%eax > 0x00000000005425c4 : mov 0x98(%rbp),%rsi > 0x00000000005425cb : test %eax,%eax > 0x00000000005425cd : mov %rcx,0x48(%rsp) > 0x00000000005425d2 : jle 0x5425fd > 0x00000000005425d4 : cltq > 0x00000000005425d6 : sub $0x2,%edx > 0x00000000005425d9 : shl $0x3,%rax > 0x00000000005425dd : add %rax,%r12 > 0x00000000005425e0 : lea (%rsi,%rax,1),%rax > => 0x00000000005425e4 : mov (%r12),%rcx > 0x00000000005425e8 : sub 
$0x1,%edx > 0x00000000005425eb : sub $0x8,%r12 > 0x00000000005425ef : mov %rcx,(%rax) > 0x00000000005425f2 : lea 0x1(%rdx),%ecx > 0x00000000005425f5 : sub $0x8,%rax > 0x00000000005425f9 : test %ecx,%ecx > 0x00000000005425fb : jg 0x5425e4 > 0x00000000005425fd : mov %r15,(%rsi) > 0x0000000000542600 : mov %r14,0x0(%rbp) > 0x0000000000542604 : mov $0x8,%esi > 0x0000000000542609 : mov %r13,0x8(%rbp) > 0x000000000054260d : mov %rbx,0xe0(%rbp) > End of assembler dump. > ==snip== > > I interpret this as follows: > 1. c_p == %rbp == 0x1414400 > 2. &argp[i] == %rax == 0x7e7d77fff3f8 > from this I deduce that c_p->arg_reg != c_p->def_arg_reg, so it points > to a dynamically allocated area separate from *c_p > 3. i == c_p->arity - 1 == %rdx == 0x53ba78 > this is clearly bonkers, and what's causing references into unmapped > memory > 4. ®[i] == %r12 == 0x7f2471b407c8 > this is consistent with indexing a frame-local array at 0x53ba78 > > Basically, my conclusion is that c_p->arity has been clobbered, causing > out-of-range accesses in this loop. > > We've had this exact crash twice now, in August and last Thursday (Dec 11). > > I realize the lack of a complete core dump makes this impossible to debug. > What I'm hoping for is that someone might recollect some post-R15 change > or fix that might have something to do with unexpected clobbers of process > structs. > > /Mikael How do you know it's not a NIF doing strange things or whatnot? Did you manage to reproduce it afterwards? Did you try with a debug build? Regards. 
From michael.santos@REDACTED Wed Dec 17 16:19:07 2014
From: michael.santos@REDACTED (Michael Santos)
Date: Wed, 17 Dec 2014 10:19:07 -0500
Subject: [erlang-bugs] Cannot bind epmd to an IPv4 address after building with -DEPMD6
In-Reply-To: <20141216080430.GA5059@phenom>
References: <20141114100122.GA27993@phenom> <20141216080430.GA5059@phenom>
Message-ID: <20141217151907.GA11995@brk>

On Tue, Dec 16, 2014 at 09:04:30AM +0100, Dariusz Gadomski wrote:
> On Fri, Nov 14, 2014 at 11:01:22AM +0100, Dariusz Gadomski wrote:
> > Hello everyone,
> >
> > There are users observing problems with binding to an IPv4 address with
> > erlang 16b3 version built with -DEPMD6 flag [1]. This flag has been added
> > as a response to another bug [2].
> >
> > By looking at erts/epmd/src/epmd_int.h I see that the address definitions are
> > mutually exclusive - enabling IPv6 makes all addresses IPv6-only which may
> > result in problems with binding to a specific IPv4 address:
> > $ ERL_EPMD_ADDRESS="127.0.0.1" epmd
> > epmd: Fri Nov 14 11:00:47 2014: cannot parse IP address "127.0.0.1"
> >
> > Can you please confirm this? Is it a known issue? Maybe there is a workaround?
> >
> > [1] https://bugs.launchpad.net/ubuntu/+source/erlang/+bug/1374109
> > [2] https://bugs.launchpad.net/ubuntu/+source/rabbitmq-server/+bug/1312507
>
> Hello again,
>
> I have made an interesting observation. There is a change linked from
> [3] on github [4]. This change adds support for IPv6 node registration to
> epmd.
>
> As a side effect it fixes the original issue I mentioned above. Is
> anyone aware if there are any plans of merging it to the main
> codebase?
>
> [3] http://www.erlang.org/development/
> [4] https://github.com/msantos/otp/compare/erlang:maint...epmd-IPv6-node-reg

There were problems compiling on Windows that could be fixed by removing support for older versions:

http://erlang.org/pipermail/erlang-patches/2013-February/003528.html

I'll rebase the patch and make a new pull request.
From mikpelinux@REDACTED Thu Dec 18 10:58:40 2014 From: mikpelinux@REDACTED (Mikael Pettersson) Date: Thu, 18 Dec 2014 10:58:40 +0100 Subject: [erlang-bugs] SEGV in process_main() line 3163 [r15B03] In-Reply-To: <7CEC811D-9A80-420C-9CD4-D80523CC115C@gmail.com> References: <21646.53550.37641.137535@gargle.gargle.HOWL> <7CEC811D-9A80-420C-9CD4-D80523CC115C@gmail.com> Message-ID: <21650.42320.48163.239414@gargle.gargle.HOWL> Anthony Ramine writes: > Le 15 d?c. 2014 ? 13:16, Mikael Pettersson a ?crit : > > > [2nd attempt to send this, my apologies if you seee this twice] > > > > We've had two segfaults now in r15's process_main(), line 3163, which is > > the register flushing loop just before the current process is swapped out: > > > > ==snip== > > argp = c_p->arg_reg; > > for (i = c_p->arity - 1; i > 0; i--) { > > => argp[i] = reg[i]; > > } > > c_p->arg_reg[0] = r(0); > > SWAPOUT; > > ==snip== > > > > The core file is unfortunately truncated: I can see the registers at the > > point of the SEGV, but not inspect any memory. The registers and > > disassembly are: > > > > ==snip== > > Program terminated with signal 11, Segmentation fault. > > #0 process_main () at beam/beam_emu.c:3163 > > 3163 beam/beam_emu.c: No such file or directory. 
> > (gdb) info reg > > rax 0x7e7d77fff3f8 139077349274616 > > rbx 0x7f243b82feb8 139793593990840 > > rcx 0x0 0 > > rdx 0x53ba78 5487224 > > rsi 0x7e7d75622030 139077305376816 > > rdi 0x0 0 > > rbp 0x1414400 0x1414400 > > rsp 0x7f2467432cf0 0x7f2467432cf0 > > r8 0x0 0 > > r9 0x0 0 > > r10 0x0 0 > > r11 0x246 582 > > r12 0x7f2471b407c8 139794503174088 > > r13 0x7e7f4309cae0 139085050661600 > > r14 0x7e7f42e57168 139085048279400 > > r15 0xc63f 50751 > > rip 0x5425e4 0x5425e4 > > eflags 0x10202 [ IF RF ] > > cs 0x33 51 > > ss 0x2b 43 > > ds 0x0 0 > > es 0x0 0 > > fs 0x0 0 > > gs 0x0 0 > > (gdb) disassemble 0x5425a6,0x542610 > > Dump of assembler code from 0x5425a6 to 0x542610: > > 0x00000000005425a6 : mov 0x90(%rbp),%rdx > > 0x00000000005425ad : mov %rax,0x98(%rbp) > > 0x00000000005425b4 : mov %edx,0xa0(%rbp) > > 0x00000000005425ba : mov 0xd0(%rbp),%rcx > > 0x00000000005425c1 : lea -0x1(%rdx),%eax > > 0x00000000005425c4 : mov 0x98(%rbp),%rsi > > 0x00000000005425cb : test %eax,%eax > > 0x00000000005425cd : mov %rcx,0x48(%rsp) > > 0x00000000005425d2 : jle 0x5425fd > > 0x00000000005425d4 : cltq > > 0x00000000005425d6 : sub $0x2,%edx > > 0x00000000005425d9 : shl $0x3,%rax > > 0x00000000005425dd : add %rax,%r12 > > 0x00000000005425e0 : lea (%rsi,%rax,1),%rax > > => 0x00000000005425e4 : mov (%r12),%rcx > > 0x00000000005425e8 : sub $0x1,%edx > > 0x00000000005425eb : sub $0x8,%r12 > > 0x00000000005425ef : mov %rcx,(%rax) > > 0x00000000005425f2 : lea 0x1(%rdx),%ecx > > 0x00000000005425f5 : sub $0x8,%rax > > 0x00000000005425f9 : test %ecx,%ecx > > 0x00000000005425fb : jg 0x5425e4 > > 0x00000000005425fd : mov %r15,(%rsi) > > 0x0000000000542600 : mov %r14,0x0(%rbp) > > 0x0000000000542604 : mov $0x8,%esi > > 0x0000000000542609 : mov %r13,0x8(%rbp) > > 0x000000000054260d : mov %rbx,0xe0(%rbp) > > End of assembler dump. > > ==snip== > > > > I interpret this as follows: > > 1. c_p == %rbp == 0x1414400 > > 2. 
&argp[i] == %rax == 0x7e7d77fff3f8
> > from this I deduce that c_p->arg_reg != c_p->def_arg_reg, so it points
> > to a dynamically allocated area separate from *c_p
> > 3. i == c_p->arity - 1 == %rdx == 0x53ba78
> > this is clearly bonkers, and what's causing references into unmapped
> > memory
> > 4. &reg[i] == %r12 == 0x7f2471b407c8
> > this is consistent with indexing a frame-local array at 0x53ba78
> >
> > Basically, my conclusion is that c_p->arity has been clobbered, causing
> > out-of-range accesses in this loop.
> >
> > We've had this exact crash twice now, in August and last Thursday (Dec 11).
> >
> > I realize the lack of a complete core dump makes this impossible to debug.
> > What I'm hoping for is that someone might recollect some post-R15 change
> > or fix that might have something to do with unexpected clobbers of process
> > structs.
> >
> > /Mikael
>
> How do you know it's not a NIF doing strange things or whatnot?

I can't know for sure, but I find it unlikely that one of the few NIFs we use (we use 3 I think) would clobber c_p->arity and nothing else. Given the other concurrency-related port bug in r15 I find something like that much more likely.

> Did you manage to reproduce it afterwards?

The Dec. incident is a reproducer, of sorts, since the exact same bug then had occurred twice.

> Did you try with a debug build?

Sorry, no, we only use release builds on our live systems. This doesn't happen often enough to motivate rebooting them with debug builds right now. We're just holding our breaths for now and hope to upgrade to r16 in Q1.
/Mikael

From dariusz.gadomski@REDACTED Fri Dec 19 17:58:01 2014
From: dariusz.gadomski@REDACTED (Dariusz Gadomski)
Date: Fri, 19 Dec 2014 17:58:01 +0100
Subject: [erlang-bugs] Cannot bind epmd to an IPv4 address after building with -DEPMD6
In-Reply-To: <20141217151907.GA11995@brk>
References: <20141114100122.GA27993@phenom> <20141216080430.GA5059@phenom> <20141217151907.GA11995@brk>
Message-ID: <20141219165801.GA23835@phenom>

On Wed, Dec 17, 2014 at 10:19:07AM -0500, Michael Santos wrote:
>
> There were problems compiling on Windows that could be fixed by removing
> support for older versions:
>
> http://erlang.org/pipermail/erlang-patches/2013-February/003528.html
>
> I'll rebase the patch and make a new pull request.

That's great news. Thanks for the update. I will keep monitoring your github to grab the rebased version.

Thanks!
Dariusz

From kenji@REDACTED Tue Dec 23 01:46:48 2014
From: kenji@REDACTED (Kenji Rikitake)
Date: Tue, 23 Dec 2014 09:46:48 +0900
Subject: [erlang-bugs] All possible internal states of Erlang/OTP random module are practically computable
Message-ID: <20141223004648.GA34572@k2r.org>

This is a preliminary result of a brute-force check of the AS183 algorithm looping period, using a C program running exactly the same algorithm as the Erlang/OTP random module.

The code is shown at the following GitHub repository:

https://github.com/jj1bdx/as183-c/

The preliminary result, tested on a Mac mini 2012 (2.6GHz Core i7) using a single core (the code is purely sequential) for *less than nine hours*, shows that

Internal state loop detected count = 6953607871644, y1 = 3172, y2=9814, y3 = 20125

({3172, 9814, 20125} is the internal initial seed value used by random:seed/0, and is given as the initial value of the simulation code.)

So the period length is: 6953607871644 ~= 2 ^ (42.661).

This period matches the value expected from the original AS183 algorithm paper.
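The measured loop length can be cross-checked without a nine-hour run. The sketch below assumes the standard AS183 (Wichmann-Hill) constants, multipliers 171, 172, 170 with prime moduli 30269, 30307, 30323, and assumes that each component generator has the maximal period p - 1; the module name as183_period is made up for this note. Under those assumptions the combined period is the least common multiple of the three component periods.

```erlang
%% Sketch: AS183 state step and its expected combined period.
%% Assumes the standard Wichmann-Hill constants; the module name
%% is hypothetical.
-module(as183_period).
-export([next/1, period/0]).

%% One step of the three-component internal state.
next({A1, A2, A3}) ->
    {(A1 * 171) rem 30269,
     (A2 * 172) rem 30307,
     (A3 * 170) rem 30323}.

%% Assuming each component cycles with period p - 1, the combined
%% state repeats after lcm(30268, 30306, 30322) steps.
period() ->
    lcm(lcm(30268, 30306), 30322).

gcd(A, 0) -> A;
gcd(A, B) -> gcd(B, A rem B).

lcm(A, B) -> (A div gcd(A, B)) * B.
```

as183_period:period() evaluates to 6953607871644, the same count reported by the brute-force scan.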
What I observed this time is:

A C program can practically explore the entire AS183 sequence in less than NINE HOURS on a Mac mini, far less than the 880 years given in the original paper.

This suggests you can guess the seed value (three 15-bit integers) from a partial random number sequence, and this can be used for an algorithmic attack.

(The calculation is easily parallelized in practice, since the internal state values can be obtained at a certain large interval (10^8 in my code, for example).)

I am now testing this on an Intel NUC at home running FreeBSD as well. The code is portable and will run on other platforms too.

Conclusion: I have to say that the Erlang/OTP "random" module should be revised ASAP.

Kenji Rikitake

From kenji@REDACTED Tue Dec 23 08:30:36 2014
From: kenji@REDACTED (Kenji Rikitake)
Date: Tue, 23 Dec 2014 16:30:36 +0900
Subject: [erlang-bugs] [erlang-questions] All possible internal states of Erlang/OTP random module are practically computable
In-Reply-To: <0AD10A0F-6460-4CF6-BF01-D723B8BD1B08@cs.otago.ac.nz>
References: <20141223004648.GA34572@k2r.org> <0AD10A0F-6460-4CF6-BF01-D723B8BD1B08@cs.otago.ac.nz>
Message-ID: <20141223073036.GA35777@k2r.org>

I know there's NOTHING NEW in this from an academic viewpoint. This is just a brute-force scan of the problem space.

The reason I wrote this to the mailing list was to show the practicality of this rather primitive brute-force computation method.
I've proposed and implemented five alternatives for this in Erlang (the algorithms are by other math geniuses):

SFMT19937: https://github.com/jj1bdx/sfmt-erlang (Period: 2^19937-1)
TinyMT: https://github.com/jj1bdx/tinymt-erlang (Period: 2^127-1)
Xorshift*64: https://github.com/jj1bdx/exs64 (Period: 2^64-1)
Xorshift+128: https://github.com/jj1bdx/exsplus (Period: 2^128-1)
Xorshift*1024: https://github.com/jj1bdx/exs1024 (Period: 2^1024-1)

I agree that at least one algorithm should be in the Erlang VM as a BIF (Xorshift*64 would be a practical candidate because it's small and fast on a 64-bit machine, and provides a sufficiently long period).

More details on Xorshift*/Xorshift+:
http://xorshift.di.unimi.it/

Kenji Rikitake

++> Richard A. O'Keefe [2014-12-23 17:14:39 +1300]:
> On 23/12/2014, at 1:46 pm, Kenji Rikitake wrote:
>
> > This is a preliminary result of a brute-force check of the AS183 algorithm
> > looping period, using a C program running in the exactly same algorithm as in
> > the Erlang/OTP random module.
>
> The result is anything but surprising.
>
> > Conclusion: I have to say that Erlang/OTP "random" module should be
> > revised ASAP.
>
> We have known this for some time.
>
> There is a 4-generator version of the Wichmann-Hill idea; there is some
> IP restriction on it which I do not understand. The point is that the
> inventors of AS183 themselves believe it is past its use-by date.
>
> AS183 was an excellent choice for a 4 MB 20 MHz machine that secretly
> wanted to be a 16-bit machine. Those days are long past.
>
> George Marsaglia's "Random Number Generators" is a good survey of
> the 2003 state of the art.
>
> Many of the good ones (not excluding the Mersenne Twister) require
> large mutable tables, so are best done in the VM.
> There is code for a Complementary-Multiply-With-Carry generator
> in the right column of page 9, and a table that can be used to
> shrink the table size to something lighter weight.
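To give a feel for how small the Xorshift*64 generator named in this message is, here is a rough sketch of a single step in plain Erlang. It is not taken from the exs64 repository: the module name is made up, and the shift amounts (12, 25, 27) and the multiplier are the xorshift64* constants as I recall them from the reference description, so they should be checked against that reference before any real use. Erlang integers are unbounded, so the state has to be masked back to 64 bits by hand.

```erlang
%% Sketch of one xorshift64* step. Hypothetical module; constants are
%% assumed from the published xorshift64* description.
-module(xorshift64star_sketch).
-export([next/1]).

-define(MASK64, 16#FFFFFFFFFFFFFFFF).

%% Takes a nonzero 64-bit state and returns {Output, NewState}.
next(S0) when is_integer(S0), S0 > 0, S0 =< ?MASK64 ->
    S1 = S0 bxor (S0 bsr 12),
    %% Left shifts can grow past 64 bits, so mask the shifted value.
    S2 = S1 bxor ((S1 bsl 25) band ?MASK64),
    S3 = S2 bxor (S2 bsr 27),
    %% The output is the scrambled state times a fixed odd multiplier,
    %% truncated to 64 bits.
    {(S3 * 2685821657736338717) band ?MASK64, S3}.
```

A caller threads the returned state into the next call, for example {R1, S1} = xorshift64star_sketch:next(Seed) followed by {R2, S2} = xorshift64star_sketch:next(S1). A zero seed is rejected by the guard because zero is a fixed point of the xorshift step.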
From n.oxyde@REDACTED Tue Dec 23 10:35:04 2014 From: n.oxyde@REDACTED (Anthony Ramine) Date: Tue, 23 Dec 2014 10:35:04 +0100 Subject: [erlang-bugs] [erlang-questions] All possible internal states of Erlang/OTP random module are practically computable In-Reply-To: <20141223004648.GA34572@k2r.org> References: <20141223004648.GA34572@k2r.org> Message-ID: <3B66C477-7CA6-4D0E-8A6B-7061D3794AB6@gmail.com> The Erlang documentation says: > It should be noted that this random number generator is not cryptographically strong. If a strong cryptographic random number generator is needed for example crypto:rand_bytes/1 could be used instead. I am not sure I understand the alarming tone. Le 23 d?c. 2014 ? 01:46, Kenji Rikitake a ?crit : > Conclusion: I have to say that Erlang/OTP "random" module should be > revised ASAP. From corey@REDACTED Thu Dec 25 06:15:06 2014 From: corey@REDACTED (Corey Cossentino) Date: Thu, 25 Dec 2014 00:15:06 -0500 Subject: [erlang-bugs] Different handling of floating point underflows between Linux and Solaris-based OSes Message-ID: I sent this yesterday but it doesn't look like it went through, so apologies if anyone gets this twice. Calculating math:pow(2, -1075) returns 0 on Linux, but causes an exception on a Solaris-based system. This was causing some crashes in RabbitMQ when it tries to calculate math:exp with inputs less than -745.133. Using OTP 17.4 on OmniOS r151006. -- Erlang/OTP 17 [erts-6.3] [source] [smp:24:24] [async-threads:10] [hipe] [kernel-poll:false] Eshell V6.3 (abort with ^G) 1> math:pow(2, -1074.999). 5.0e-324 2> math:pow(2, -1074) * math:pow(2, -1). 0.0 3> math:pow(2, -1075). ** exception error: an error occurred when evaluating an arithmetic expression in function math:pow/2 called as math:pow(2,-1075) 4> math:exp(-745). 5.0e-324 5> math:exp(-746). 
** exception error: an error occurred when evaluating an arithmetic expression
     in function math:exp/1
        called as math:exp(-746)

--

Running this in gdb, it looks like the matherr function in
sys/unix/sys_float.c is being called on Solaris but not on Linux, possibly
because the Linux version of libm requires the _SVID_SOURCE feature test
macro in order to call the function
( http://man7.org/linux/man-pages/man3/matherr.3.html ).

OmniOS (Solaris libm):

--

0xfeeba733 in ?? () from /lib/libm.so.2
(gdb)
0xfeeb63c0 in matherr@REDACTED () from /lib/libm.so.2
(gdb)
matherr (exc=0xfdeffdac) at sys/unix/sys_float.c:839
839 {
(gdb)

--

Linux:

--

0x00007fb6a1d0e9a7 in pow () from /lib/x86_64-linux-gnu/libm.so.6
(gdb)
0x00007fb6a1ced550 in matherr@REDACTED () from /lib/x86_64-linux-gnu/libm.so.6
(gdb)
0x00007fb6a1d33f00 in ?? () from /lib/x86_64-linux-gnu/libm.so.6
(gdb)

From mikpelinux@REDACTED Fri Dec 26 19:07:41 2014
From: mikpelinux@REDACTED (Mikael Pettersson)
Date: Fri, 26 Dec 2014 19:07:41 +0100
Subject: [erlang-bugs] Different handling of floating point underflows between Linux and Solaris-based OSes
In-Reply-To:
References:
Message-ID: <21661.41965.287800.995905@gargle.gargle.HOWL>

Corey Cossentino writes:
> I sent this yesterday but it doesn't look like it went through, so
> apologies if anyone gets this twice.
>
>
> Calculating math:pow(2, -1075) returns 0 on Linux, but causes an
> exception on a Solaris-based system. This was causing some crashes in
> RabbitMQ when it tries to calculate math:exp with inputs less than
> -745.133.
>
> Using OTP 17.4 on OmniOS r151006.
>
> --
>
> Erlang/OTP 17 [erts-6.3] [source] [smp:24:24] [async-threads:10]
> [hipe] [kernel-poll:false]
>
> Eshell V6.3 (abort with ^G)
> 1> math:pow(2, -1074.999).
> 5.0e-324
> 2> math:pow(2, -1074) * math:pow(2, -1).
> 0.0
> 3> math:pow(2, -1075).
> ** exception error: an error occurred when evaluating an arithmetic
> expression
> in function math:pow/2
> called as math:pow(2,-1075)
> 4> math:exp(-745).
> 5.0e-324
> 5> math:exp(-746).
> ** exception error: an error occurred when evaluating an arithmetic
> expression
> in function math:exp/1
> called as math:exp(-746)

I can reproduce this on Solaris 10 / SPARC.

I have reviewed the situation with matherr() on Linux/glibc and Solaris 10,
and I believe a reasonable resolution is to remove the #if !NO_FPE_SIGNALS
block in matherr(), so it reduces to a single "return 1;".

There are problems with checking math routine results for errors in general,
and the matherr() interface in particular.

1. The VM relies on !isfinite() to detect if a math routine failed.
   This appears to work on most systems, but there is a potential problem
   in how various systems and libm implementations behave: while most
   return HUGE_VAL (== INFINITY) on overflows, some return HUGE which is
   a large but finite value. Solaris' cc -Xt does the latter, but gcc on
   Solaris does the former. On my glibc-based Linux systems, matherr(3)
   lists HUGE as the return value on overflows for some routines, but my
   tests indicate that HUGE_VAL is returned instead, which while good is
   inconsistent with parts of the documentation.

   It's entirely possible that other libm implementations also return HUGE
   rather than HUGE_VAL on overflows, which thoroughly breaks our !isfinite()
   test. On Linux there are at least 3 non-glibc libc/libm implementations,
   and who knows what's in all those *BSD variants.

2. matherr(), when properly enabled, is also called in situations the VM does
   not consider to be errors, in particular the underflow case you reported.
   When FP exceptions are also enabled, matherr() sets the FP exception flag,
   causing underflows to erroneously trigger errors.

   However, on systems where plain HUGE is returned for overflows, matherr()
   + FP exceptions may be the only viable way of detecting those errors.

3. As you discovered, matherr() isn't enabled by default on Linux.

As long as we limit ourselves to systems that consistently return HUGE_VAL
on overflows, as Linux/glibc and Solaris w/ gcc do, we don't need matherr()
to detect errors, which is why having it just return 1 should be OK.

Can you run the emulator test suite on your Solaris system, first with
vanilla 17.4 and then with the proposed code change, and check that the
test suite results are the same?

/Mikael

From corey@REDACTED Wed Dec 31 17:41:28 2014
From: corey@REDACTED (Corey Cossentino)
Date: Wed, 31 Dec 2014 11:41:28 -0500
Subject: [erlang-bugs] Different handling of floating point underflows between Linux and Solaris-based OSes
In-Reply-To: <21661.41965.287800.995905@gargle.gargle.HOWL>
References: <21661.41965.287800.995905@gargle.gargle.HOWL>
Message-ID:

OK, finished running the tests on an OmniOS virtual machine. I'm not
completely sure how to interpret the results, but it looks like a lot of
tests are failing, in both the patched and unpatched versions.

The differences I can see between the two runs, based on the index.html
file that was generated:

tests.common_test_test - went from 1 failure to 3 with the code change
tests.tools_test - went from 1 failure to 2 with the code change

Is there a file I should send over that would give more information?