From robert.virding@REDACTED Sun Dec 2 03:14:17 2012
From: robert.virding@REDACTED (Robert Virding)
Date: Sun, 2 Dec 2012 02:14:17 +0000 (GMT)
Subject: [erlang-bugs] Fwd: exit(self(), normal) causes calling process to exit
In-Reply-To: <50B7A4FA.5010401@erlang.org>
Message-ID: <566355617.251842.1354414457188.JavaMail.root@erlang-solutions.com>

Just to say that I concur that this is definitely a bug: exit/2 always sends a signal, even if the process is sending to itself, and that signal is to be treated in the same way irrespective of who sent it. Unfortunately the bug has been there a long time.

In retrospect it might have been better to call it signal/2 instead.

Robert

----- Original Message -----
> From: "Patrik Nyblom"
> To: erlang-bugs@REDACTED
> Sent: Thursday, 29 November, 2012 7:10:02 PM
> Subject: Re: [erlang-bugs] Fwd: exit(self(), normal) causes calling process to exit
>
> On 11/28/2012 08:50 PM, Daniel Luna wrote:
> > I withdraw my comment. It's still true that it works when trapping
> > exits, but apparently you shouldn't have to.
> >
> > From the docs:
> >
> > "If Reason is the atom normal, Pid will not exit."
> >
> > I call bug on this.
> I agree. It's in the pipe.
> >
> > Cheers,
> >
> > Daniel
>
> Cheers,
> /Patrik
> >
> > On 28 November 2012 13:07, Daniel Luna wrote:
> >> I replied on StackOverflow, but the gist of the problem is that you
> >> don't trap exits.
> >>
> >> 1> self().
> >> <0.32.0>
> >> 2> process_flag(trap_exit, true).
> >> false
> >> 3> exit(self(), normal).
> >> true
> >> 4> self().
> >> <0.32.0>
> >> 5> flush().
> >> Shell got {'EXIT',<0.32.0>,normal}
> >> ok
> >>
> >> Cheers,
> >>
> >> Daniel
> >>
> >> On 28 November 2012 11:50, Stavros Aronis wrote:
> >>> After some speculation on stackoverflow I think I will report this here as
> >>> well. (I am directly copying the content of the question.)
> >>>
> >>> I am playing around with the exit/2 function and its behavior when self() is
> >>> used as a Pid and normal as a Reason.
> >>>
> >>> Erlang R15B03 (erts-5.9.3) [source] [64-bit] [smp:8:8] [async-threads:0]
> >>> [hipe] [kernel-poll:false]
> >>>
> >>> Eshell V5.9.3 (abort with ^G)
> >>> 1> self().
> >>> <0.32.0>
> >>> 2> exit(self(), normal).
> >>> ** exception exit: normal
> >>> 3> self().
> >>> <0.35.0>
> >>>
> >>> Shouldn't it be the case that only a 'normal' exit message is sent to the
> >>> shell process, so there is no reason to exit?
> >>>
> >>> Similarly:
> >>>
> >>> 4> spawn(fun() -> receive Pid -> Pid ! ok end end).
> >>> <0.38.0>
> >>> 5> exit(v(4), normal).
> >>> true
> >>> 6> v(4) ! self().
> >>> <0.35.0>
> >>> 7> flush().
> >>> Shell got ok
> >>> ok
> >>>
> >>> But:
> >>>
> >>> 8> spawn(fun() -> exit(self(), normal), receive _ -> ok end end).
> >>> <0.43.0>
> >>> 9> is_process_alive(v(8)).
> >>> false
> >>>
> >>>
> >>> _______________________________________________
> >>> erlang-bugs mailing list
> >>> erlang-bugs@REDACTED
> >>> http://erlang.org/mailman/listinfo/erlang-bugs
> >>>

From bengt.kleberg@REDACTED Sun Dec 2 09:13:57 2012
From: bengt.kleberg@REDACTED (Bengt Kleberg)
Date: Sun, 2 Dec 2012 08:13:57 +0000
Subject: [erlang-bugs] Fwd: exit(self(), normal) causes calling process to exit
In-Reply-To: <566355617.251842.1354414457188.JavaMail.root@erlang-solutions.com>
References: <50B7A4FA.5010401@erlang.org>, <566355617.251842.1354414457188.JavaMail.root@erlang-solutions.com>
Message-ID: <9t0q4b0cogdaeehs85orhjh1.1354436035816@email.android.com>

Greetings,

Is not '!' used for sending signals (and other stuff)?

Bengt

----- Original message -----
From: Robert Virding
To: Patrik Nyblom
Cc: "erlang-bugs@REDACTED"
Sent: 02-12-2012 3:14 AM
Subject: Re: [erlang-bugs] Fwd: exit(self(), normal) causes calling process to exit

Just to say that I concur that this is definitely a bug: exit/2 always sends a signal, even if the process is sending to itself, and that signal is to be treated in the same way irrespective of who sent it. Unfortunately the bug has been there a long time.

In retrospect it might have been better to call it signal/2 instead.

Robert

From robert.virding@REDACTED Sun Dec 2 11:02:16 2012
From: robert.virding@REDACTED (Robert Virding)
Date: Sun, 2 Dec 2012 10:02:16 +0000 (GMT)
Subject: [erlang-bugs] Fwd: exit(self(), normal) causes calling process to exit
In-Reply-To: <9t0q4b0cogdaeehs85orhjh1.1354436035816@email.android.com>
Message-ID: <1549412434.253446.1354442536290.JavaMail.root@erlang-solutions.com>

No. '!' is *only* used for sending messages. Exit signals, generated when a process dies or by calling exit/2, are not messages and cannot be sent using '!'.

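The difference between a message sent with '!' and an exit signal sent with exit/2 can be sketched as below. This is a minimal illustration and not code from the thread: the module name signal_vs_message, the function demo/0 and the ping/pong handshake are all made up for the example, and the target process is assumed not to be trapping exits.

```erlang
-module(signal_vs_message).
-export([demo/0]).

%% Illustrative only: names are hypothetical, not taken from the thread.
demo() ->
    Pid = spawn(fun loop/0),

    %% 1. A tuple shaped like an exit notification, sent with '!', is an
    %%    ordinary message. The receiver simply consumes it.
    Pid ! {'EXIT', self(), crash},
    Pid ! {ping, self()},
    %% Messages from one sender arrive in order, so once pong comes back
    %% the fake 'EXIT' tuple has already been handled.
    receive pong -> ok end,
    AliveAfterMessage = is_process_alive(Pid),

    %% 2. exit/2 sends a real exit signal. With a reason other than
    %%    normal it terminates a process that is not trapping exits.
    Ref = erlang:monitor(process, Pid),
    exit(Pid, crash),
    Reason = receive {'DOWN', Ref, process, Pid, R} -> R end,

    {AliveAfterMessage, Reason}.

%% Replies to pings and silently drops every other message.
loop() ->
    receive
        {ping, From} -> From ! pong, loop();
        _Other -> loop()
    end.
```

Calling signal_vs_message:demo() returns {true, crash}: the process survives the look-alike message and is only terminated by the signal.
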
When trapping exits and an exit signal arrives at the process, it is *CONVERTED* to a message of the form {'EXIT',Pid,Reason} and put in the process's message queue. You will not get the effect of a signal by sending a message with the same format to a process. For example, a message will not crash a process.

In the basic language there are no special messages which do things to processes beyond that which is explicitly done in code. Even the messages sent by OTP are special only because OTP chooses to interpret them in a special way.

So messages and signals are two different things, sent in two different ways and with different meanings.

Robert

----- Original Message -----
> From: "Bengt Kleberg"
> To: "Robert Virding"
> Cc: erlang-bugs@REDACTED, "Patrik Nyblom"
> Sent: Sunday, 2 December, 2012 9:13:57 AM
> Subject: SV: [erlang-bugs] Fwd: exit(self(), normal) causes calling process to exit
>
> Greetings,
>
> Is not '!' used for sending signals (and other stuff)?
>
> Bengt

From dmitriy.kargapolov@REDACTED Mon Dec 3 18:24:18 2012
From: dmitriy.kargapolov@REDACTED (Dmitriy Kargapolov)
Date: Mon, 03 Dec 2012 12:24:18 -0500
Subject: [erlang-bugs] inets httpd server minor bug
Message-ID: <50BCE042.6000602@gmail.com>

There is a missing message/3 clause in httpd_util.erl, which should handle the 417 error code.

...
message(416,ReasonPhrase,_) ->
    html_encode(ReasonPhrase);
message(500,_,ConfigDB) ->
...

This status code is generated by handle_expect/2 in httpd_request_handler.erl (lines 540-541):

...
{break, Value} ->
    httpd_response:send_status(ModData, 417, "Unexpected expect value"),
...

This results in a server crash when an invalid Expect: header is received.

From daniel.goertzen@REDACTED Wed Dec 5 05:25:42 2012
From: daniel.goertzen@REDACTED (Daniel Goertzen)
Date: Tue, 4 Dec 2012 22:25:42 -0600
Subject: [erlang-bugs] concurrent ssl:transport_accept() trashes options
Message-ID:

I am working with an ssl acceptor pool (10 acceptors) and am using the {packet,4} option. The first incoming connection works fine; however, the second connection ends up using {packet,0}.

Short version: ssl:transport_accept() is just plain thread-unsafe with respect to ListenSocket.

Long version: In ssl:transport_accept(), options on the underlying listen socket are backed up and then replaced with internal_inet_values(), as shown below. When the underlying transport accept() returns, the backed-up options are put back into the listen socket and also go on to form options in an ssl process. The problem is that a second transport_accept() call will back up *internal_inet_values*, and those incorrect values will be passed on to the ssl process. That's what it looks like anyway; I haven't traced everything out in detail.

My workaround for now is to set my acceptor pool to size 1 (can you still call that a pool? ;)

Dan.

From ssl.erl:

transport_accept(#sslsocket{pid = {ListenSocket, #config{cb=CbInfo, ssl=SslOpts}}}, Timeout) ->
    %% The setopt could have been invoked on the listen socket
    %% and options should be inherited.
    EmOptions = emulated_options(),
    {ok, InetValues} = inet:getopts(ListenSocket, EmOptions),
    ok = inet:setopts(ListenSocket, internal_inet_values()),
    {CbModule,_,_, _} = CbInfo,
    case CbModule:accept(ListenSocket, Timeout) of
        {ok, Socket} ->
            ok = inet:setopts(ListenSocket, InetValues),
            {ok, Port} = inet:port(Socket),
            ConnArgs = [server, "localhost", Port, Socket,
                        {SslOpts, socket_options(InetValues)}, self(), CbInfo],
            case ssl_connection_sup:start_child(ConnArgs) of
                {ok, Pid} ->
                    ssl_connection:socket_control(Socket, Pid, CbModule);
                {error, Reason} ->
                    {error, Reason}
            end;
        {error, Reason} ->
            {error, Reason}
    end.

emulated_options() ->
    [mode, packet, active, header, packet_size].

internal_inet_values() ->
    [{packet_size,0},{packet, 0},{header, 0},{active, false},{mode,binary}].

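The interleaving suspected above can be replayed in a toy model. This is only a sketch of the hypothesis as stated in the report, not a trace of the real ssl application: an ETS table stands in for the listen socket's options, the module name accept_race and its helper functions are invented for the example, and only the packet option is modelled.

```erlang
-module(accept_race).
-export([demo/0]).

%% Toy model: an ETS table stands in for the listen socket's options.
%% All names are hypothetical; nothing here calls the real ssl code.
demo() ->
    Tab = ets:new(listen_socket, [set]),
    setopts(Tab, [{packet, 4}]),            % what the user configured

    %% Acceptor A enters transport_accept: back up, install internals.
    BackupA = getopts(Tab),
    setopts(Tab, internal_values()),

    %% Acceptor B enters before A's accept has returned, so what it
    %% backs up is already the internal values.
    BackupB = getopts(Tab),
    setopts(Tab, internal_values()),

    %% A's accept returns and A restores its backup.
    setopts(Tab, BackupA),
    %% B's accept returns and B restores what it saved.
    setopts(Tab, BackupB),

    Final = getopts(Tab),
    ets:delete(Tab),
    {BackupA, BackupB, Final}.

%% Stand-in for internal_inet_values(), reduced to one option.
internal_values() -> [{packet, 0}].

getopts(Tab) ->
    [{opts, Opts}] = ets:lookup(Tab, opts),
    Opts.

setopts(Tab, Opts) ->
    true = ets:insert(Tab, {opts, Opts}),
    ok.
```

In this model accept_race:demo() returns {[{packet,4}],[{packet,0}],[{packet,0}]}: the second acceptor would hand {packet,0} to its connection, and the listen socket itself is left with the internal values.
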
From dch@REDACTED Wed Dec 5 09:33:44 2012
From: dch@REDACTED (Dave Cottlehuber)
Date: Wed, 5 Dec 2012 09:33:44 +0100
Subject: [erlang-bugs] Unable to build OTP with static crypto for Win32
Message-ID:

As far as I can tell, I'm configuring correctly with `--disable-dynamic-ssl-lib` but the resulting installer doesn't function without the two OpenSSL DLLs present. In fact, I've never had this working, ever. Is it actually possible?

Full configure & build logs are available at
https://www.dropbox.com/sh/jeifcxpbtpo78ak/BSc0ZomTbs/tmp/R15B03_win32_static_ssl_failure.tar.gz

I've previously tried under cygwin and this last run under mingw, same non-result in both cases.

A+
Dave

From pan@REDACTED Wed Dec 5 10:37:36 2012
From: pan@REDACTED (Patrik Nyblom)
Date: Wed, 5 Dec 2012 10:37:36 +0100
Subject: [erlang-bugs] Unable to build OTP with static crypto for Win32
In-Reply-To:
References:
Message-ID: <50BF15E0.8010801@erlang.org>

Hi!

On 12/05/2012 09:33 AM, Dave Cottlehuber wrote:
> As far as I can tell, I'm configuring correctly with
> `--disable-dynamic-ssl-lib` but the resulting installer doesn't
> function without the two OpenSSL DLLs present. In fact, I've never had
> this working, ever. Is it actually possible?
I'm not sure I follow you here: the installer (if you mean the self-extracting exe built with NSIS) never uses OpenSSL. If it does, you should report that to the NSIS developers.

As for Erlang/OTP, it can definitely be done, as that's how the prebuilt binaries you can get from our website are built. In $ERL_TOP/HOWTO/INSTALL-WIN32.md there are extensive instructions on how to build a static OpenSSL lib using the MSVC command prompt. You then use that static lib in your build of Erlang/OTP. You will never need to install any OpenSSL dynamic libs.

Verify that you got the lib files and that the correct OpenSSL installation resides in the directory where configure thinks it should be (in your case /c/OpenSSL). You can point out the correct installation directory during configure (otp_build configure --with-ssl-dir=...).

Then *clean the crypto application* (cd lib/crypto && make clean); the build logs you provided do not show a rebuild, so it's hard to see if the build does the right thing. Then do otp_build configure, otp_build boot -a, otp_build release -a and otp_build installer_win32.

Also, if you look at lib/crypto/c_src/win32/Makefile after configure is run, you should see lines looking like:

SSL_LIBDIR = /c/OpenSSL/lib/VC/static
SSL_CRYPTO_LIBNAME = libeay32MD

etc. Verify that there are files named e.g. /c/OpenSSL/lib/VC/static/libeay32MD.lib on your system.

Cheers,
/Patrik
> Full configure & build logs available
> https://www.dropbox.com/sh/jeifcxpbtpo78ak/BSc0ZomTbs/tmp/R15B03_win32_static_ssl_failure.tar.gz
>
> I've previously tried under cygwin and this last run under mingw, same
> non-result in both cases.
>
> A+
> Dave

From peter@REDACTED Wed Dec 5 14:20:19 2012
From: peter@REDACTED (Peter Membrey)
Date: Wed, 5 Dec 2012 21:20:19 +0800
Subject: [erlang-bugs] VM locks up on write to socket (and now it seems to file too)
In-Reply-To: <50B7AD7C.3060809@erlang.org>
References: <50AEC81B.2000908@ninenines.eu> <50AFA09D.4060100@erlang.org> <50B633C7.7000709@erlang.org> <50B64897.2050300@erlang.org> <50B7AD7C.3060809@erlang.org>
Message-ID:

Hi Patrik,

Really sorry for the delay in getting back to you!

I tried the same test on RHEL 6.3 using the patched version and everything seems fine. No stuck threads and the VM is still happy and responsive. I'm currently working on a load testing app to try and trigger the issue on demand in the application itself, but I suspect your patch has done the trick!

Thanks for fixing this so fast and sorry again for the delay in getting back in touch!

Cheers,

Pete

On 30 November 2012 02:46, Patrik Nyblom wrote:
> Hi!
>
> On 11/29/2012 04:41 AM, Peter Membrey wrote:
>>
>> Hi Patrik,
>>
>> I can also confirm that this bug exists on Red Hat Enterprise Linux
>> 6.3. I'll raise a support ticket with them as well.
>>
>> A workaround in the vm would be nice if you have time? :-)
>
> Could you try the attached diff and see if it works for your environment? It
> would seem nothing is written when 0 is returned, so it should be safe to
> try again...
>
> Cheers,
> /Patrik
>
>
>> Cheers,
>>
>> Pete
>>
>>
>> On 29 November 2012 01:23, Patrik Nyblom wrote:
>>>
>>> Hi again!
>>>
>>> No problem reproducing when I've got CentOS 6.3... The following commands in
>>> the Erlang shell:
>>> {ok,L} = gen_tcp:listen(4747,[{active,false}]).
>>> {ok,S} = gen_tcp:connect("localhost",4747,[{active,false}]).
>>> {ok,A} = gen_tcp:accept(L).
>>> gen_tcp:send(A,binary:copy(<<$a:8>>,2158022464)).
>>>
>>> gives the following strace:
>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
>>> [.....]
>>>
>>> While on ubuntu for example it works like it should...Looks like a kernel
>>> bug to me... I wonder if this should be worked around or just reported... I
>>> suppose both... Sigh...
>>>
>>> /Patrik
>>>
>>>
>>> On 11/28/2012 05:23 PM, Peter Membrey wrote:
>>>>
>>>> Hi,
>>>>
>>>> No problem, I'll do what I can to help - thanks for looking into this
>>>> so quickly!
>>>>
>>>> Any idea what might be causing it?
>>>>
>>>> Cheers,
>>>>
>>>> Pete
>>>>
>>>> On 28 November 2012 23:54, Patrik Nyblom wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> I'll upgrade the CentOS VM I have to 6.3 (only had 6.1 :() and see if I can
>>>>> reproduce. If that fails, could you run a VM with a patch to try to handle
>>>>> the unexpected case and see if that fixes it?
>>>>>
>>>>> Cheers,
>>>>> /Patrik
>>>>>
>>>>> On 11/24/2012 02:57 PM, Peter Membrey wrote:
>>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> Thanks for getting back in touch so quickly!
>>>>>>
>>>>>> I did do an lsof on the process and I can confirm that it was
>>>>>> definitely a socket. However by that time the application it had been
>>>>>> trying to send to had been killed. When I checked the sockets were
>>>>>> showing as waiting to close. Unfortunately I didn't think to do an
>>>>>> lsof until after the apps had been shut down. I was hoping the VM
>>>>>> would recover if I killed the app that had upset it. However even
>>>>>> after all the apps connected had been shut down, the issue didn't
>>>>>> resolve.
>>>>>>
>>>>>> The application receives requests from a client, which contains two
>>>>>> data items. The stream ID and a timestamp. Both are encoded as big
>>>>>> integer unsigned numbers. The server then looks through the file
>>>>>> referenced by the stream ID and uses the timestamp as an index. The
>>>>>> file format is currently really simple, in the form of:
>>>>>>
>>>>>>
>>>>>>
>>>>>> >
>>>>>>
>>>>>> There is an index file that provides an offset into the file based on
>>>>>> time stamp, but basically it opens the file, and reads sequentially
>>>>>> through it until it finds the timestamps that it cares about. In this
>>>>>> case it reads all data with a greater timestamp until the end of the
>>>>>> file is reached. It's possible the client is sending an incorrect
>>>>>> timestamp, and maybe too much data is being read. However the loop is
>>>>>> very primitive - it reads all the data in one go before passing it
>>>>>> back to the protocol handler to send down the socket; so by that time
>>>>>> even though the response is technically incorrect and the app has
>>>>>> failed, it should still not cause the VM any issues.
>>>>>>
>>>>>> The data is polled every 10 seconds by the client app so I would not
>>>>>> expect there to be 2GB of new data to send. I'm afraid my C skills are
>>>>>> somewhat limited, so I'm not sure how to put together a sample app to
>>>>>> try out writev. The platform is 64bit CentOS 6.3 (equivalent to RHEL
>>>>>> 6.3) so I'm not expecting any strange or weird behaviour from the OS
>>>>>> level but of course I could be completely wrong there. The OS is
>>>>>> running directly on hardware, so there's no VM layer to worry about.
>>>>>>
>>>>>> Hope this might offer some additional clues?
>>>>>>
>>>>>> Thanks again!
>>>>>>
>>>>>> Kind Regards,
>>>>>>
>>>>>> Peter Membrey
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 24 November 2012 00:13, Patrik Nyblom wrote:
>>>>>>>
>>>>>>> Hi again!
>>>>>>>
>>>>>>> Could you go back to the version without the printouts and get back to the
>>>>>>> situation where writev loops returning 0 (as in the strace)? If so, it would
>>>>>>> be really interesting to see an 'lsof' of the beam process, to see if this
>>>>>>> file descriptor really is open and is a socket...
>>>>>>>
>>>>>>> The thing is that writev with a vector that is not empty, would never return
>>>>>>> 0 for a non blocking socket. Not on any modern (i.e. not ancient) POSIX
>>>>>>> compliant system anyway. Of course it is a *really* large item you are
>>>>>>> trying to write there, but it should be no problem for a 64bit linux.
>>>>>>>
>>>>>>> Also I think there is no use finding the Erlang code, I'll take that back,
>>>>>>> It would be more interesting to see what really happens at the OS/VM level
>>>>>>> in this case.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Patrik
>>>>>>>
>>>>>>>
>>>>>>> On 11/23/2012 01:49 AM, Loïc Hoguin wrote:
>>>>>>>>
>>>>>>>> Sending this on behalf of someone who didn't manage to get the email sent
>>>>>>>> to this list after 2 attempts. If someone can check if he's hold up or
>>>>>>>> something that'd be great.
>>>>>>>>
>>>>>>>> Anyway he has a big issue so I hope I can relay the conversation
>>>>>>>> reliably.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> On 11/23/2012 01:45 AM, Peter Membrey wrote:
>>>>>>>>>
>>>>>>>>> From: Peter Membrey
>>>>>>>>> Date: 22 November 2012 19:02
>>>>>>>>> Subject: VM locks up on write to socket (and now it seems to file too)
>>>>>>>>> To: erlang-bugs@REDACTED
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi guys,
>>>>>>>>>
>>>>>>>>> I wrote a simple database application called CakeDB
>>>>>>>>> (https://github.com/pmembrey/cakedb) that basically spends its time
>>>>>>>>> reading and writing files and sockets. There's very little in the way
>>>>>>>>> of complex logic. It is running on CentOS 6.3 with all the updates
>>>>>>>>> applied. I hit this problem on R15B02 so I rolled back to R15B01 but
>>>>>>>>> the issue remained. Erlang was built from source.
>>>>>>>>>
>>>>>>>>> The machine has two Intel X5690 CPUs giving 12 cores plus HT. I've
>>>>>>>>> tried various arguments for the VM but so far nothing has prevented
>>>>>>>>> the problem. At the moment I'm using:
>>>>>>>>>
>>>>>>>>> +K
>>>>>>>>> +A 6
>>>>>>>>> +sbt tnnps
>>>>>>>>>
>>>>>>>>> The issue I'm seeing is that one of the scheduler threads will hit
>>>>>>>>> 100% cpu usage and the entire VM will become unresponsive. When this
>>>>>>>>> happens, I am not able to connect via the console with attach and
>>>>>>>>> entop is also unable to connect. I can still establish TCP connections
>>>>>>>>> to the application, but I never receive a response. A standard kill
>>>>>>>>> signal will cause the VM to shut down (it doesn't need -9).
>>>>>>>>>
>>>>>>>>> Due to the pedigree of the VM I am quite willing to accept that I've
>>>>>>>>> made a fundamental mistake in my code. I am pretty sure that the way I
>>>>>>>>> am doing the file IO could result in some race conditions. However, my
>>>>>>>>> poor code aside, from what I understand, I still shouldn't be able to
>>>>>>>>> crash / deadlock the VM like this.
>>>>>>>>>
>>>>>>>>> The issue doesn't seem to be caused by load. The app can fail when
>>>>>>>>> it's very busy, but also when it is practically idle. I haven't been
>>>>>>>>> able to find a trigger or any other explanation for the failure.
>>>>>>>>>
>>>>>>>>> The thread maxing out the CPU is attempting to write data to the
>>>>>>>>> socket:
>>>>>>>>>
>>>>>>>>> (gdb) bt
>>>>>>>>> #0 0x00007f9882ab6377 in writev () from /lib64/libc.so.6
>>>>>>>>> #1 0x000000000058a81f in tcp_inet_output (data=0x2407570, event=) at drivers/common/inet_drv.c:9681
>>>>>>>>> #2 tcp_inet_drv_output (data=0x2407570, event= out>) at drivers/common/inet_drv.c:9601
>>>>>>>>> #3 0x00000000004b773f in erts_port_task_execute (runq=0x7f98826019c0, curr_port_pp=0x7f9881639338) at beam/erl_port_task.c:858
>>>>>>>>> #4 0x00000000004afd83 in schedule (p=, calls=) at beam/erl_process.c:6533
>>>>>>>>> #5 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268
>>>>>>>>> #6 0x00000000004b1279 in sched_thread_func (vesdp=0x7f9881639280) at beam/erl_process.c:4834
>>>>>>>>> #7 0x00000000005ba726 in thr_wrapper (vtwd=0x7fff6cfe2300) at pthread/ethread.c:106
>>>>>>>>> #8 0x00007f9882f78851 in start_thread () from /lib64/libpthread.so.0
>>>>>>>>> #9 0x00007f9882abe11d in clone () from /lib64/libc.so.6
>>>>>>>>> (gdb)
>>>>>>>>>
>>>>>>>>> I then tried running strace on that thread and got (indefinitely):
>>>>>>>>>
>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0
>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0
>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0
>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0
>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0
>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0
>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0
>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0
>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0
>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> From what I can tell, it's trying to write data to a socket, which is
>>>>>>>>> succeeding, but writing 0 bytes. From the earlier definitions in the
>>>>>>>>> source file, an error condition would be signified by a negative
>>>>>>>>> number. Any other result is the number of bytes written, in this case
>>>>>>>>> 0. I'm not sure if this is desired behaviour or not. I've tried
>>>>>>>>> killing the application on the other end of the socket, but it has no
>>>>>>>>> effect on the VM.
>>>>>>>>>
>>>>>>>>> I have enabled debugging for the inet code, so hopefully this will
>>>>>>>>> give a little more insight. I am currently trying to reproduce the
>>>>>>>>> condition, but as I really have no idea what causes it, it's pretty
>>>>>>>>> much a case of wait and see.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> **** UPDATE ****
>>>>>>>>>
>>>>>>>>> I managed to lock up the VM again, but this time it was caused by file IO,
>>>>>>>>> probably from the debugging statements. Although it worked fine for some time
>>>>>>>>> the last entry in the file was cut off.
>>>>>>>>>
>>>>>>>>> From GDB:
>>>>>>>>>
>>>>>>>>> (gdb) info threads
>>>>>>>>> 53 Thread 0x7f83e988b700 (LWP 8621) 0x00007f83ea6da54d in read () from /lib64/libpthread.so.0
>>>>>>>>> 52 Thread 0x7f83e8c8f700 (LWP 8622) 0x00007f83ea6d743c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>>>>>>>>> 51 Thread 0x7f83e818d700 (LWP 8623) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 50 Thread 0x7f83e816b700 (LWP 8624) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 49 Thread 0x7f83e8149700 (LWP 8625) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 48 Thread 0x7f83e8127700 (LWP 8626) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 47 Thread 0x7f83e8105700 (LWP 8627) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 46 Thread 0x7f83e80e3700 (LWP 8628) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 45 Thread 0x7f83e80c1700 (LWP 8629) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 44 Thread 0x7f83e809f700 (LWP 8630) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 43 Thread 0x7f83e807d700 (LWP 8631) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 42 Thread 0x7f83e805b700 (LWP 8632) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 41 Thread 0x7f83e8039700 (LWP 8633) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 40 Thread 0x7f83e8017700 (LWP 8634) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 39 Thread 0x7f83e7ff5700 (LWP 8635) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 38 Thread 0x7f83e7fd3700 (LWP 8636) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 37 Thread 0x7f83e7fb1700 (LWP 8637) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 36 Thread 0x7f83e7f8f700 (LWP 8638) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 35 Thread 0x7f83e7f6d700 (LWP 8639) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 34 Thread 0x7f83e7f4b700 (LWP 8640) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 33 Thread 0x7f83e7f29700 (LWP 8641) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 32 Thread 0x7f83e7f07700 (LWP 8642) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 31 Thread 0x7f83e7ee5700 (LWP 8643) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 30 Thread 0x7f83e7ec3700 (LWP 8644) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 29 Thread 0x7f83e7ea1700 (LWP 8645) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 28 Thread 0x7f83e7e7f700 (LWP 8646) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 27 Thread 0x7f83d7c5a700 (LWP 8647) 0x00007f83ea6db09d in waitpid () from /lib64/libpthread.so.0
>>>>>>>>> 26 Thread 0x7f83d7c53700 (LWP 8648) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 25 Thread 0x7f83d7252700 (LWP 8649) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 24 Thread 0x7f83d6851700 (LWP 8650) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 23 Thread 0x7f83d5e50700 (LWP 8651) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 22 Thread 0x7f83d544f700 (LWP 8652) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 21 Thread 0x7f83d4a4e700 (LWP 8653) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 20 Thread 0x7f83d404d700 (LWP 8654) 0x00007f83ea20be7d in write () from /lib64/libc.so.6
>>>>>>>>> 19 Thread 0x7f83d364c700 (LWP 8655) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 18 Thread 0x7f83d2c4b700 (LWP 8656) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 17 Thread 0x7f83d224a700 (LWP 8657) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 16 Thread 0x7f83d1849700 (LWP 8658) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 15 Thread 0x7f83d0e48700 (LWP 8659) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 14 Thread 0x7f83d0447700 (LWP 8660) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 13 Thread 0x7f83cfa46700 (LWP 8661) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 12 Thread 0x7f83cf045700 (LWP 8662) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 11 Thread 0x7f83ce644700 (LWP 8663) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 10 Thread 0x7f83cdc43700 (LWP 8664) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 9 Thread 0x7f83cd242700 (LWP 8665) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 8 Thread 0x7f83cc841700 (LWP 8666) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 7 Thread 0x7f83cbe40700 (LWP 8667) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 6 Thread 0x7f83cb43f700 (LWP 8668) 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>> 5 Thread
0x7f83caa3e700 (LWP 8669) 0x00007f83ea215ae9 in >>>>>>>>> syscall >>>>>>>>> () >>>>>>>>> from /lib64/libc.so.6 >>>>>>>>> 4 Thread 0x7f83ca03d700 (LWP 8670) 0x00007f83ea215ae9 in >>>>>>>>> syscall >>>>>>>>> () >>>>>>>>> from /lib64/libc.so.6 >>>>>>>>> 3 Thread 0x7f83c963c700 (LWP 8671) 0x00007f83ea215ae9 in >>>>>>>>> syscall >>>>>>>>> () >>>>>>>>> from /lib64/libc.so.6 >>>>>>>>> 2 Thread 0x7f83c8c3b700 (LWP 8672) 0x00007f83ea215ae9 in >>>>>>>>> syscall >>>>>>>>> () >>>>>>>>> from /lib64/libc.so.6 >>>>>>>>> * 1 Thread 0x7f83eb3a8700 (LWP 8597) 0x00007f83ea211d03 in select >>>>>>>>> () >>>>>>>>> from /lib64/libc.so.6 >>>>>>>>> (gdb) >>>>>>>>> >>>>>>>>> >>>>>>>>> (gdb) bt >>>>>>>>> #0 0x00007f83ea20be7d in write () from /lib64/libc.so.6 >>>>>>>>> #1 0x00007f83ea1a2583 in _IO_new_file_write () from >>>>>>>>> /lib64/libc.so.6 >>>>>>>>> #2 0x00007f83ea1a3b35 in _IO_new_do_write () from /lib64/libc.so.6 >>>>>>>>> #3 0x00007f83ea1a21fd in _IO_new_file_xsputn () from >>>>>>>>> /lib64/libc.so.6 >>>>>>>>> #4 0x00007f83ea17589d in vfprintf () from /lib64/libc.so.6 >>>>>>>>> #5 0x00007f83ea18003a in printf () from /lib64/libc.so.6 >>>>>>>>> #6 0x000000000058f0e8 in tcp_recv (desc=0x2c3d350, request_len=0) >>>>>>>>> at >>>>>>>>> drivers/common/inet_drv.c:8976 >>>>>>>>> #7 0x000000000058f63a in tcp_inet_input (data=0x2c3d350, >>>>>>>>> event=<value optimized out>) at drivers/common/inet_drv.c:9326 >>>>>>>>> #8 tcp_inet_drv_input (data=0x2c3d350, event=<value optimized out>) >>>>>>>>> at drivers/common/inet_drv.c:9604 >>>>>>>>> #9 0x00000000004b770f in erts_port_task_execute >>>>>>>>> (runq=0x7f83e9d5d3c0, >>>>>>>>> curr_port_pp=0x7f83e8dc6e78) at beam/erl_port_task.c:851 >>>>>>>>> #10 0x00000000004afd83 in schedule (p=<value optimized out>, >>>>>>>>> calls=<value optimized out>) at beam/erl_process.c:6533 >>>>>>>>> #11 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>>>>>>>> #12 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8dc6dc0) >>>>>>>>> at >>>>>>>>> beam/erl_process.c:4834 >>>>>>>>> #13
0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at >>>>>>>>> pthread/ethread.c:106 >>>>>>>>> #14 0x00007f83ea6d3851 in start_thread () from >>>>>>>>> /lib64/libpthread.so.0 >>>>>>>>> #15 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>>>>> (gdb) >>>>>>>>> >>>>>>>>> (gdb) bt >>>>>>>>> #0 0x00007f83ea6da54d in read () from /lib64/libpthread.so.0 >>>>>>>>> #1 0x0000000000554b6e in signal_dispatcher_thread_func >>>>>>>>> (unused=<value optimized out>) at sys/unix/sys.c:2776 >>>>>>>>> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266c80) at >>>>>>>>> pthread/ethread.c:106 >>>>>>>>> #3 0x00007f83ea6d3851 in start_thread () from >>>>>>>>> /lib64/libpthread.so.0 >>>>>>>>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>>>>> (gdb) >>>>>>>>> >>>>>>>>> (gdb) bt >>>>>>>>> #0 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6 >>>>>>>>> #1 0x00000000005bba35 in wait__ (e=0x2989390) at >>>>>>>>> pthread/ethr_event.c:92 >>>>>>>>> #2 ethr_event_wait (e=0x2989390) at pthread/ethr_event.c:218 >>>>>>>>> #3 0x00000000004ae5bd in erts_tse_wait (fcalls=<value optimized out>, >>>>>>>>> esdp=0x7f83e8e2c440, rq=0x7f83e9d5e7c0) at beam/erl_threads.h:2319 >>>>>>>>> #4 scheduler_wait (fcalls=<value optimized out>, >>>>>>>>> esdp=0x7f83e8e2c440, >>>>>>>>> rq=0x7f83e9d5e7c0) at beam/erl_process.c:2087 >>>>>>>>> #5 0x00000000004afb94 in schedule (p=<value optimized out>, >>>>>>>>> calls=<value optimized out>) at beam/erl_process.c:6467 >>>>>>>>> #6 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>>>>>>>> #7 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8e2c440) >>>>>>>>> at >>>>>>>>> beam/erl_process.c:4834 >>>>>>>>> #8 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at >>>>>>>>> pthread/ethread.c:106 >>>>>>>>> #9 0x00007f83ea6d3851 in start_thread () from >>>>>>>>> /lib64/libpthread.so.0 >>>>>>>>> #10 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>>>>> (gdb) >>>>>>>>> >>>>>>>>> >>>>>>>>> (gdb) bt >>>>>>>>> #0 0x00007f83ea6db09d in waitpid () from /lib64/libpthread.so.0 >>>>>>>>> #1
0x0000000000555a9f in child_waiter (unused=<value optimized out>) >>>>>>>>> at sys/unix/sys.c:2700 >>>>>>>>> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266d50) at >>>>>>>>> pthread/ethread.c:106 >>>>>>>>> #3 0x00007f83ea6d3851 in start_thread () from >>>>>>>>> /lib64/libpthread.so.0 >>>>>>>>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>>>>> (gdb) >>>>>>>>> >>>>>>>>> >>>>>>>>> **** END UPDATE **** >>>>>>>>> >>>>>>>>> >>>>>>>>> I'm happy to provide any information I can, so please don't >>>>>>>>> hesitate >>>>>>>>> to >>>>>>>>> ask. >>>>>>>>> >>>>>>>>> Thanks in advance! >>>>>>>>> >>>>>>>>> Kind Regards, >>>>>>>>> >>>>>>>>> Peter Membrey >>>>>>>>> >>>>>>> _______________________________________________ >>>>>>> erlang-bugs mailing list >>>>>>> erlang-bugs@REDACTED >>>>>>> http://erlang.org/mailman/listinfo/erlang-bugs >>>>> >>>>> > From pan@REDACTED Thu Dec 6 13:01:38 2012 From: pan@REDACTED (Patrik Nyblom) Date: Thu, 6 Dec 2012 13:01:38 +0100 Subject: [erlang-bugs] VM locks up on write to socket (and now it seems to file too) In-Reply-To: References: <50AEC81B.2000908@ninenines.eu> <50AFA09D.4060100@erlang.org> <50B633C7.7000709@erlang.org> <50B64897.2050300@erlang.org> <50B7AD7C.3060809@erlang.org> Message-ID: <50C08922.5050801@erlang.org> Hi! Good! The workaround isn't all that costly either, so I think we could put that in R16 as is. It would be good if the OS bug got fixed though, but I think that's also on its way. Cheers, /Patrik On 12/05/2012 02:20 PM, Peter Membrey wrote: > Hi Patrik, > > Really sorry for the delay in getting back to you! > > I tried the same test on RHEL 6.3 using the patched version and > everything seems fine. No stuck threads and the VM is still happy and > responsive. > > I'm currently working on a load testing app to try and trigger the > issue on demand in the application itself, but I suspect your patch > has done the trick! > > Thanks for fixing this so fast and sorry again for the delay in > getting back in touch!
> > Cheers, > > Pete > > > > On 30 November 2012 02:46, Patrik Nyblom wrote: >> Hi! >> >> On 11/29/2012 04:41 AM, Peter Membrey wrote: >>> Hi Patrik, >>> >>> I can also confirm that this bug exists on Red Hat Enterprise Linux >>> 6.3. I'll raise a support ticket with them as well. >>> >>> A workaround in the vm would be nice if you have time? :-) >> Could you try the attached diff and see if it works for your environment? It >> would seem nothing is written when 0 is returned, so it should be safe to >> try again... >> >> Cheers, >> /Patrik >> >> >>> Cheers, >>> >>> Pete >>> >>> >>> On 29 November 2012 01:23, Patrik Nyblom wrote: >>>> Hi again! >>>> >>>> No problem reproducing when I've got CentOS 6.3... The following commands >>>> in >>>> the Erlang shell: >>>> {ok,L} = gen_tcp:listen(4747,[{active,false}]). >>>> {ok,S} = gen_tcp:connect("localhost",4747,[{active,false}]). >>>> {ok,A} = gen_tcp:accept(L). >>>> gen_tcp:send(A,binary:copy(<<$a:8>>,2158022464)). >>>> >>>> gives the following strace: >>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>> 2158022464}], 1) = 0 >>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>> 2158022464}], 1) = 0 >>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>> 2158022464}], 1) = 0 >>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>> 2158022464}], 1) = 0 >>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>> 2158022464}], 1) = 0 >>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>> 2158022464}], 1) = 0 >>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>> 2158022464}], 1) = 0 >>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>> 2158022464}], 1) = 0 >>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>> 2158022464}], 1) = 0 >>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>> 2158022464}], 1) = 0 >>>> [pid 15859] writev(10, 
[{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>> 2158022464}], 1) = 0 >>>> [.....] >>>> >>>> While on ubuntu for example it works like it should...Looks like a kernel >>>> bug to me... I wonder if this should be worked around or just reported... >>>> I >>>> suppose both... Sigh... >>>> >>>> /Patrik >>>> >>>> >>>> On 11/28/2012 05:23 PM, Peter Membrey wrote: >>>>> Hi, >>>>> >>>>> No problem, I'll do what I can to help - thanks for looking into this >>>>> so quickly! >>>>> >>>>> Any idea what might be causing it? >>>>> >>>>> Cheers, >>>>> >>>>> Pete >>>>> >>>>> On 28 November 2012 23:54, Patrik Nyblom wrote: >>>>>> Hi! >>>>>> >>>>>> I'll upgrade the CentOS VM I have to 6.3 (only had 6.1 :() and see if I >>>>>> can >>>>>> reproduce. If that fails, could you run a VM with a patch to try to >>>>>> handle >>>>>> the unexpected case and see if that fixes it? >>>>>> >>>>>> Cheers, >>>>>> /Patrik >>>>>> >>>>>> On 11/24/2012 02:57 PM, Peter Membrey wrote: >>>>>>> Hi guys, >>>>>>> >>>>>>> Thanks for getting back in touch so quickly! >>>>>>> >>>>>>> I did do an lsof on the process and I can confirm that it was >>>>>>> definitely a socket. However by that time the application it had been >>>>>>> trying to send to had been killed. When I checked the sockets were >>>>>>> showing as waiting to close. Unfortunately I didn't think to do an >>>>>>> lsof until after the apps had been shut down. I was hoping the VM >>>>>>> would recover if I killed the app that had upset it. However even >>>>>>> after all the apps connected had been shut down, the issue didn't >>>>>>> resolve. >>>>>>> >>>>>>> The application receives requests from a client, which contains two >>>>>>> data items. The stream ID and a timestamp. Both are encoded as big >>>>>>> integer unsigned numbers. The server then looks through the file >>>>>>> referenced by the stream ID and uses the timestamp as an index. 
The >>>>>>> file format is currently really simple, in the form of: >>>>>>> >>>>>>> >>>>>>> >>>>>>> > >>>>>>> >>>>>>> There is an index file that provides an offset into the file based on >>>>>>> time stamp, but basically it opens the file, and reads sequentially >>>>>>> through it until it finds the timestamps that it cares about. In this >>>>>>> case it reads all data with a greater timestamp until the end of the >>>>>>> file is reached. It's possible the client is sending an incorrect >>>>>>> timestamp, and maybe too much data is being read. However the loop is >>>>>>> very primitive - it reads all the data in one go before passing it >>>>>>> back to the protocol handler to send down the socket; so by that time >>>>>>> even though the response is technically incorrect and the app has >>>>>>> failed, it should still not cause the VM any issues. >>>>>>> >>>>>>> The data is polled every 10 seconds by the client app so I would not >>>>>>> expect there to be 2GB of new data to send. I'm afraid my C skills are >>>>>>> somewhat limited, so I'm not sure how to put together a sample app to >>>>>>> try out writev. The platform is 64bit CentOS 6.3 (equivalent to RHEL >>>>>>> 6.3) so I'm not expecting any strange or weird behaviour from the OS >>>>>>> level but of course I could be completely wrong there. The OS is >>>>>>> running directly on hardware, so there's no VM layer to worry about. >>>>>>> >>>>>>> Hope this might offer some additional clues? >>>>>>> >>>>>>> Thanks again! >>>>>>> >>>>>>> Kind Regards, >>>>>>> >>>>>>> Peter Membrey >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 24 November 2012 00:13, Patrik Nyblom wrote: >>>>>>>> Hi again! >>>>>>>> >>>>>>>> Could you go back to the version without the printouts and get back >>>>>>>> to >>>>>>>> the >>>>>>>> situation where writev loops returning 0 (as in the strace)? 
If so, >>>>>>>> it >>>>>>>> would >>>>>>>> be really interesting to see an 'lsof' of the beam process, to see if >>>>>>>> this >>>>>>>> file descriptor really is open and is a socket... >>>>>>>> >>>>>>>> The thing is that writev with a vector that is not empty, would never >>>>>>>> return >>>>>>>> 0 for a non blocking socket. Not on any modern (i.e. not ancient) >>>>>>>> POSIX >>>>>>>> compliant system anyway. Of course it is a *really* large item you >>>>>>>> are >>>>>>>> trying to write there, but it should be no problem for a 64bit linux. >>>>>>>> >>>>>>>> Also I think there is no use finding the Erlang code, I'll take that >>>>>>>> back, >>>>>>>> It would be more interesting to see what really happens at the OS/VM >>>>>>>> level >>>>>>>> in this case. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> Patrik >>>>>>>> >>>>>>>> >>>>>>>> On 11/23/2012 01:49 AM, Loïc Hoguin wrote: >>>>>>>>> Sending this on behalf of someone who didn't manage to get the email >>>>>>>>> sent >>>>>>>>> to this list after 2 attempts. If someone can check if he's hold up >>>>>>>>> or >>>>>>>>> something that'd be great. >>>>>>>>> >>>>>>>>> Anyway he has a big issue so I hope I can relay the conversation >>>>>>>>> reliably. >>>>>>>>> >>>>>>>>> Thanks! >>>>>>>>> >>>>>>>>> On 11/23/2012 01:45 AM, Peter Membrey wrote: >>>>>>>>>> From: Peter Membrey >>>>>>>>>> Date: 22 November 2012 19:02 >>>>>>>>>> Subject: VM locks up on write to socket (and now it seems to file >>>>>>>>>> too) >>>>>>>>>> To: erlang-bugs@REDACTED >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi guys, >>>>>>>>>> >>>>>>>>>> I wrote a simple database application called CakeDB >>>>>>>>>> (https://github.com/pmembrey/cakedb) that basically spends its time >>>>>>>>>> reading and writing files and sockets. There's very little in the >>>>>>>>>> way >>>>>>>>>> of complex logic. It is running on CentOS 6.3 with all the updates >>>>>>>>>> applied. I hit this problem on R15B02 so I rolled back to R15B01 >>>>>>>>>> but >>>>>>>>>> the issue remained.
Erlang was built from source. >>>>>>>>>> >>>>>>>>>> The machine has two Intel X5690 CPUs giving 12 cores plus HT. I've >>>>>>>>>> tried various arguments for the VM but so far nothing has prevented >>>>>>>>>> the problem. At the moment I'm using: >>>>>>>>>> >>>>>>>>>> +K >>>>>>>>>> +A 6 >>>>>>>>>> +sbt tnnps >>>>>>>>>> >>>>>>>>>> The issue I'm seeing is that one of the scheduler threads will hit >>>>>>>>>> 100% cpu usage and the entire VM will become unresponsive. When >>>>>>>>>> this >>>>>>>>>> happens, I am not able to connect via the console with attach and >>>>>>>>>> entop is also unable to connect. I can still establish TCP >>>>>>>>>> connections >>>>>>>>>> to the application, but I never receive a response. A standard kill >>>>>>>>>> signal will cause the VM to shut down (it doesn't need -9). >>>>>>>>>> >>>>>>>>>> Due to the pedigree of the VM I am quite willing to accept that >>>>>>>>>> I've >>>>>>>>>> made a fundamental mistake in my code. I am pretty sure that the >>>>>>>>>> way >>>>>>>>>> I >>>>>>>>>> am doing the file IO could result in some race conditions. However, >>>>>>>>>> my >>>>>>>>>> poor code aside, from what I understand, I still shouldn't be able >>>>>>>>>> to >>>>>>>>>> crash / deadlock the VM like this. >>>>>>>>>> >>>>>>>>>> The issue doesn't seem to be caused by load. The app can fail when >>>>>>>>>> it's very busy, but also when it is practically idle. I haven't >>>>>>>>>> been >>>>>>>>>> able to find a trigger or any other explanation for the failure. 
>>>>>>>>>> >>>>>>>>>> The thread maxing out the CPU is attempting to write data to the >>>>>>>>>> socket: >>>>>>>>>> >>>>>>>>>> (gdb) bt >>>>>>>>>> #0 0x00007f9882ab6377 in writev () from /lib64/libc.so.6 >>>>>>>>>> #1 0x000000000058a81f in tcp_inet_output (data=0x2407570, >>>>>>>>>> event=<value optimized out>) at drivers/common/inet_drv.c:9681 >>>>>>>>>> #2 tcp_inet_drv_output (data=0x2407570, event=<value optimized out>) >>>>>>>>>> at drivers/common/inet_drv.c:9601 >>>>>>>>>> #3 0x00000000004b773f in erts_port_task_execute >>>>>>>>>> (runq=0x7f98826019c0, >>>>>>>>>> curr_port_pp=0x7f9881639338) at beam/erl_port_task.c:858 >>>>>>>>>> #4 0x00000000004afd83 in schedule (p=<value optimized out>, >>>>>>>>>> calls=<value optimized out>) at beam/erl_process.c:6533 >>>>>>>>>> #5 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>>>>>>>>> #6 0x00000000004b1279 in sched_thread_func (vesdp=0x7f9881639280) >>>>>>>>>> at >>>>>>>>>> beam/erl_process.c:4834 >>>>>>>>>> #7 0x00000000005ba726 in thr_wrapper (vtwd=0x7fff6cfe2300) at >>>>>>>>>> pthread/ethread.c:106 >>>>>>>>>> #8 0x00007f9882f78851 in start_thread () from >>>>>>>>>> /lib64/libpthread.so.0 >>>>>>>>>> #9 0x00007f9882abe11d in clone () from /lib64/libc.so.6 >>>>>>>>>> (gdb) >>>>>>>>>> >>>>>>>>>> I then tried running strace on that thread and got (indefinitely): >>>>>>>>>> >>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>>>> ... >>>>>>>>>> >>>>>>>>>> From what I can tell, it's trying to write data to a socket, >>>>>>>>>> which >>>>>>>>>> is >>>>>>>>>> succeeding, but writing 0 bytes.
From the earlier definitions in >>>>>>>>>> the >>>>>>>>>> source file, an error condition would be signified by a negative >>>>>>>>>> number. Any other result is the number of bytes written, in this >>>>>>>>>> case >>>>>>>>>> 0. I'm not sure if this is desired behaviour or not. I've tried >>>>>>>>>> killing the application on the other end of the socket, but it has >>>>>>>>>> no >>>>>>>>>> effect on the VM. >>>>>>>>>> >>>>>>>>>> I have enabled debugging for the inet code, so hopefully this will >>>>>>>>>> give a little more insight. I am currently trying to reproduce the >>>>>>>>>> condition, but as I really have no idea what causes it, it's pretty >>>>>>>>>> much a case of wait and see. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> **** UPDATE **** >>>>>>>>>> >>>>>>>>>> I managed to lock up the VM again, but this time it was caused by >>>>>>>>>> file >>>>>>>>>> IO, >>>>>>>>>> probably from the debugging statements. Although it worked fine for >>>>>>>>>> some >>>>>>>>>> time >>>>>>>>>> the last entry in the file was cut off. 
>>>>>>>>>> >>>>>>>>>> From GDB: >>>>>>>>>> >>>>>>>>>> (gdb) info threads >>>>>>>>>> 53 Thread 0x7f83e988b700 (LWP 8621) 0x00007f83ea6da54d in >>>>>>>>>> read >>>>>>>>>> () >>>>>>>>>> from /lib64/libpthread.so.0 >>>>>>>>>> 52 Thread 0x7f83e8c8f700 (LWP 8622) 0x00007f83ea6d743c in >>>>>>>>>> pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >>>>>>>>>> 51 Thread 0x7f83e818d700 (LWP 8623) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 50 Thread 0x7f83e816b700 (LWP 8624) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 49 Thread 0x7f83e8149700 (LWP 8625) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 48 Thread 0x7f83e8127700 (LWP 8626) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 47 Thread 0x7f83e8105700 (LWP 8627) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 46 Thread 0x7f83e80e3700 (LWP 8628) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 45 Thread 0x7f83e80c1700 (LWP 8629) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 44 Thread 0x7f83e809f700 (LWP 8630) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 43 Thread 0x7f83e807d700 (LWP 8631) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 42 Thread 0x7f83e805b700 (LWP 8632) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 41 Thread 0x7f83e8039700 (LWP 8633) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 40 Thread 0x7f83e8017700 (LWP 8634) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 39 Thread 0x7f83e7ff5700 (LWP 8635) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 38 Thread 
0x7f83e7fd3700 (LWP 8636) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 37 Thread 0x7f83e7fb1700 (LWP 8637) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 36 Thread 0x7f83e7f8f700 (LWP 8638) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 35 Thread 0x7f83e7f6d700 (LWP 8639) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 34 Thread 0x7f83e7f4b700 (LWP 8640) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 33 Thread 0x7f83e7f29700 (LWP 8641) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 32 Thread 0x7f83e7f07700 (LWP 8642) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 31 Thread 0x7f83e7ee5700 (LWP 8643) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 30 Thread 0x7f83e7ec3700 (LWP 8644) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 29 Thread 0x7f83e7ea1700 (LWP 8645) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 28 Thread 0x7f83e7e7f700 (LWP 8646) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 27 Thread 0x7f83d7c5a700 (LWP 8647) 0x00007f83ea6db09d in >>>>>>>>>> waitpid >>>>>>>>>> () from /lib64/libpthread.so.0 >>>>>>>>>> 26 Thread 0x7f83d7c53700 (LWP 8648) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 25 Thread 0x7f83d7252700 (LWP 8649) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 24 Thread 0x7f83d6851700 (LWP 8650) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 23 Thread 0x7f83d5e50700 (LWP 8651) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 22 Thread 
0x7f83d544f700 (LWP 8652) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 21 Thread 0x7f83d4a4e700 (LWP 8653) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 20 Thread 0x7f83d404d700 (LWP 8654) 0x00007f83ea20be7d in >>>>>>>>>> write >>>>>>>>>> () >>>>>>>>>> from /lib64/libc.so.6 >>>>>>>>>> 19 Thread 0x7f83d364c700 (LWP 8655) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 18 Thread 0x7f83d2c4b700 (LWP 8656) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 17 Thread 0x7f83d224a700 (LWP 8657) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 16 Thread 0x7f83d1849700 (LWP 8658) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 15 Thread 0x7f83d0e48700 (LWP 8659) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 14 Thread 0x7f83d0447700 (LWP 8660) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 13 Thread 0x7f83cfa46700 (LWP 8661) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 12 Thread 0x7f83cf045700 (LWP 8662) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 11 Thread 0x7f83ce644700 (LWP 8663) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 10 Thread 0x7f83cdc43700 (LWP 8664) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>> 9 Thread 0x7f83cd242700 (LWP 8665) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () >>>>>>>>>> from /lib64/libc.so.6 >>>>>>>>>> 8 Thread 0x7f83cc841700 (LWP 8666) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () >>>>>>>>>> from /lib64/libc.so.6 >>>>>>>>>> 7 Thread 0x7f83cbe40700 (LWP 8667) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () >>>>>>>>>> from 
/lib64/libc.so.6 >>>>>>>>>> 6 Thread 0x7f83cb43f700 (LWP 8668) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () >>>>>>>>>> from /lib64/libc.so.6 >>>>>>>>>> 5 Thread 0x7f83caa3e700 (LWP 8669) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () >>>>>>>>>> from /lib64/libc.so.6 >>>>>>>>>> 4 Thread 0x7f83ca03d700 (LWP 8670) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () >>>>>>>>>> from /lib64/libc.so.6 >>>>>>>>>> 3 Thread 0x7f83c963c700 (LWP 8671) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () >>>>>>>>>> from /lib64/libc.so.6 >>>>>>>>>> 2 Thread 0x7f83c8c3b700 (LWP 8672) 0x00007f83ea215ae9 in >>>>>>>>>> syscall >>>>>>>>>> () >>>>>>>>>> from /lib64/libc.so.6 >>>>>>>>>> * 1 Thread 0x7f83eb3a8700 (LWP 8597) 0x00007f83ea211d03 in select >>>>>>>>>> () >>>>>>>>>> from /lib64/libc.so.6 >>>>>>>>>> (gdb) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> (gdb) bt >>>>>>>>>> #0 0x00007f83ea20be7d in write () from /lib64/libc.so.6 >>>>>>>>>> #1 0x00007f83ea1a2583 in _IO_new_file_write () from >>>>>>>>>> /lib64/libc.so.6 >>>>>>>>>> #2 0x00007f83ea1a3b35 in _IO_new_do_write () from /lib64/libc.so.6 >>>>>>>>>> #3 0x00007f83ea1a21fd in _IO_new_file_xsputn () from >>>>>>>>>> /lib64/libc.so.6 >>>>>>>>>> #4 0x00007f83ea17589d in vfprintf () from /lib64/libc.so.6 >>>>>>>>>> #5 0x00007f83ea18003a in printf () from /lib64/libc.so.6 >>>>>>>>>> #6 0x000000000058f0e8 in tcp_recv (desc=0x2c3d350, request_len=0) >>>>>>>>>> at >>>>>>>>>> drivers/common/inet_drv.c:8976 >>>>>>>>>> #7 0x000000000058f63a in tcp_inet_input (data=0x2c3d350, >>>>>>>>>> event=<value optimized out>) at drivers/common/inet_drv.c:9326 >>>>>>>>>> #8 tcp_inet_drv_input (data=0x2c3d350, event=<value optimized out>) >>>>>>>>>> at drivers/common/inet_drv.c:9604 >>>>>>>>>> #9 0x00000000004b770f in erts_port_task_execute >>>>>>>>>> (runq=0x7f83e9d5d3c0, >>>>>>>>>> curr_port_pp=0x7f83e8dc6e78) at beam/erl_port_task.c:851 >>>>>>>>>> #10 0x00000000004afd83 in schedule (p=<value optimized out>, >>>>>>>>>> calls=<value optimized out>) at beam/erl_process.c:6533 >>>>>>>>>>
#11 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>>>>>>>>> #12 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8dc6dc0) >>>>>>>>>> at >>>>>>>>>> beam/erl_process.c:4834 >>>>>>>>>> #13 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at >>>>>>>>>> pthread/ethread.c:106 >>>>>>>>>> #14 0x00007f83ea6d3851 in start_thread () from >>>>>>>>>> /lib64/libpthread.so.0 >>>>>>>>>> #15 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>>>>>> (gdb) >>>>>>>>>> >>>>>>>>>> (gdb) bt >>>>>>>>>> #0 0x00007f83ea6da54d in read () from /lib64/libpthread.so.0 >>>>>>>>>> #1 0x0000000000554b6e in signal_dispatcher_thread_func >>>>>>>>>> (unused=<value optimized out>) at sys/unix/sys.c:2776 >>>>>>>>>> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266c80) at >>>>>>>>>> pthread/ethread.c:106 >>>>>>>>>> #3 0x00007f83ea6d3851 in start_thread () from >>>>>>>>>> /lib64/libpthread.so.0 >>>>>>>>>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>>>>>> (gdb) >>>>>>>>>> >>>>>>>>>> (gdb) bt >>>>>>>>>> #0 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6 >>>>>>>>>> #1 0x00000000005bba35 in wait__ (e=0x2989390) at >>>>>>>>>> pthread/ethr_event.c:92 >>>>>>>>>> #2 ethr_event_wait (e=0x2989390) at pthread/ethr_event.c:218 >>>>>>>>>> #3 0x00000000004ae5bd in erts_tse_wait (fcalls=<value optimized out>, >>>>>>>>>> esdp=0x7f83e8e2c440, rq=0x7f83e9d5e7c0) at beam/erl_threads.h:2319 >>>>>>>>>> #4 scheduler_wait (fcalls=<value optimized out>, >>>>>>>>>> esdp=0x7f83e8e2c440, >>>>>>>>>> rq=0x7f83e9d5e7c0) at beam/erl_process.c:2087 >>>>>>>>>> #5 0x00000000004afb94 in schedule (p=<value optimized out>, >>>>>>>>>> calls=<value optimized out>) at beam/erl_process.c:6467 >>>>>>>>>> #6 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>>>>>>>>> #7 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8e2c440) >>>>>>>>>> at >>>>>>>>>> beam/erl_process.c:4834 >>>>>>>>>> #8 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at >>>>>>>>>> pthread/ethread.c:106 >>>>>>>>>> #9 0x00007f83ea6d3851 in start_thread ()
from >>>>>>>>>> /lib64/libpthread.so.0 >>>>>>>>>> #10 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>>>>>> (gdb) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> (gdb) bt >>>>>>>>>> #0 0x00007f83ea6db09d in waitpid () from /lib64/libpthread.so.0 >>>>>>>>>> #1 0x0000000000555a9f in child_waiter (unused=>>>>>>>>> out>) >>>>>>>>>> at sys/unix/sys.c:2700 >>>>>>>>>> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266d50) at >>>>>>>>>> pthread/ethread.c:106 >>>>>>>>>> #3 0x00007f83ea6d3851 in start_thread () from >>>>>>>>>> /lib64/libpthread.so.0 >>>>>>>>>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>>>>>> (gdb) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> **** END UPDATE **** >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I'm happy to provide any information I can, so please don't >>>>>>>>>> hesitate >>>>>>>>>> to >>>>>>>>>> ask. >>>>>>>>>> >>>>>>>>>> Thanks in advance! >>>>>>>>>> >>>>>>>>>> Kind Regards, >>>>>>>>>> >>>>>>>>>> Peter Membrey >>>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> erlang-bugs mailing list >>>>>>>> erlang-bugs@REDACTED >>>>>>>> http://erlang.org/mailman/listinfo/erlang-bugs >>>>>> From bengt.kleberg@REDACTED Thu Dec 6 15:13:27 2012 From: bengt.kleberg@REDACTED (Bengt Kleberg) Date: Thu, 6 Dec 2012 15:13:27 +0100 Subject: [erlang-bugs] Documentation/Code inconsistency Message-ID: <1354803207.6456.11.camel@sekic1152.rnd.ki.sw.ericsson.se> Greetings, I have found that on my machine: Linux sekic1152 2.6.27.42-0.1-default #1 SMP 2010-01-06 16:07:25 +0100 x86_64 x86_64 x86_64 GNU/Linux running Erlang/OTP: Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:4:4] [async-threads:0] [hipe] [kernel-poll:false] erlang:port_info/2 does not behave as documented: 12> P = erlang:open_port( {spawn, "sleep 100"}, [stream]). 13> erlang:port_info(P, os_pid). ** exception error: bad argument in function erlang:port_info/2 called as erlang:port_info(#Port<0.599>,os_pid) Presumably the documentation is mistaken and 'os_pid' does not exist. 
bengt

From peter@REDACTED Thu Dec 6 16:55:04 2012
From: peter@REDACTED (Peter Membrey)
Date: Thu, 6 Dec 2012 23:55:04 +0800
Subject: [erlang-bugs] VM locks up on write to socket (and now it seems to file too)
In-Reply-To: <50C08922.5050801@erlang.org>
References: <50AEC81B.2000908@ninenines.eu> <50AFA09D.4060100@erlang.org> <50B633C7.7000709@erlang.org> <50B64897.2050300@erlang.org> <50B7AD7C.3060809@erlang.org> <50C08922.5050801@erlang.org>
Message-ID: 

Hi Patrik,

I don't suppose this fix could make it into the re-release of R15B03, could it? :-)

Cheers,

Pete

On 6 December 2012 20:01, Patrik Nyblom wrote:
> Hi!
>
> Good! The workaround isn't all that costly either, so I think we could put
> that in R16 as is. It would be good if the OS bug got fixed though, but I
> think that's also on its way.
>
> Cheers,
> /Patrik
>
>
> On 12/05/2012 02:20 PM, Peter Membrey wrote:
>>
>> Hi Patrik,
>>
>> Really sorry for the delay in getting back to you!
>>
>> I tried the same test on RHEL 6.3 using the patched version and
>> everything seems fine. No stuck threads and the VM is still happy and
>> responsive.
>>
>> I'm currently working on a load testing app to try and trigger the
>> issue on demand in the application itself, but I suspect your patch
>> has done the trick!
>>
>> Thanks for fixing this so fast and sorry again for the delay in
>> getting back in touch!
>>
>> Cheers,
>>
>> Pete
>>
>>
>>
>> On 30 November 2012 02:46, Patrik Nyblom wrote:
>>>
>>> Hi!
>>>
>>> On 11/29/2012 04:41 AM, Peter Membrey wrote:
>>>>
>>>> Hi Patrik,
>>>>
>>>> I can also confirm that this bug exists on Red Hat Enterprise Linux
>>>> 6.3. I'll raise a support ticket with them as well.
>>>>
>>>> A workaround in the vm would be nice if you have time? :-)
>>>
>>> Could you try the attached diff and see if it works for your environment?
>>> It would seem nothing is written when 0 is returned, so it should be safe
>>> to try again...
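The attached diff is not part of this archive, so the following is only a rough sketch of what a "try again when 0 is returned" workaround could look like, not the actual patch. The cap `MAX_CHUNK`, the helper `write_with_retry`, and the stand-in `fake_writev` (with its 32-bit threshold) are all hypothetical names and values chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical cap for a single request; NOT taken from the real patch. */
#define MAX_CHUNK ((size_t)1 << 30)

/* Same signature as writev(2), so a stand-in can be passed for testing. */
typedef ssize_t (*writev_fn)(int fd, const struct iovec *iov, int iovcnt);

/*
 * Write one buffer. If the full-size request reports 0 bytes written and
 * the request was larger than MAX_CHUNK, retry once with the length capped.
 * Returns the result of the last call: negative on error, otherwise the
 * number of bytes written (the caller still has to handle partial writes).
 */
static ssize_t write_with_retry(writev_fn do_writev, int fd,
                                void *buf, size_t len)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    ssize_t n = do_writev(fd, &iov, 1);

    if (n == 0 && len > MAX_CHUNK) {
        /* Nothing was written, so the buffer position is unchanged. */
        iov.iov_len = MAX_CHUNK;
        n = do_writev(fd, &iov, 1);
    }
    return n;
}

/*
 * Stand-in that imitates the observed symptom: it returns 0 for requests
 * above INT32_MAX (an assumed threshold) and otherwise claims to have
 * written everything. It never touches the buffer.
 */
static ssize_t fake_writev(int fd, const struct iovec *iov, int iovcnt)
{
    (void)fd;
    (void)iovcnt;
    return iov[0].iov_len > (size_t)INT32_MAX ? 0 : (ssize_t)iov[0].iov_len;
}
```

In real use the first argument would simply be `writev`; the function pointer is only there so the retry path can be exercised without a 2 GB buffer or an affected kernel.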
>>> >>> Cheers, >>> /Patrik >>> >>> >>>> Cheers, >>>> >>>> Pete >>>> >>>> >>>> On 29 November 2012 01:23, Patrik Nyblom wrote: >>>>> >>>>> Hi again! >>>>> >>>>> No problem reproducing when I've got CentOS 6.3... The following >>>>> commands >>>>> in >>>>> the Erlang shell: >>>>> {ok,L} = gen_tcp:listen(4747,[{active,false}]). >>>>> {ok,S} = gen_tcp:connect("localhost",4747,[{active,false}]). >>>>> {ok,A} = gen_tcp:accept(L). >>>>> gen_tcp:send(A,binary:copy(<<$a:8>>,2158022464)). >>>>> >>>>> gives the following strace: >>>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>>> 2158022464}], 1) = 0 >>>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>>> 2158022464}], 1) = 0 >>>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>>> 2158022464}], 1) = 0 >>>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>>> 2158022464}], 1) = 0 >>>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>>> 2158022464}], 1) = 0 >>>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>>> 2158022464}], 1) = 0 >>>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>>> 2158022464}], 1) = 0 >>>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>>> 2158022464}], 1) = 0 >>>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>>> 2158022464}], 1) = 0 >>>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>>> 2158022464}], 1) = 0 >>>>> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >>>>> 2158022464}], 1) = 0 >>>>> [.....] >>>>> >>>>> While on ubuntu for example it works like it should...Looks like a >>>>> kernel >>>>> bug to me... I wonder if this should be worked around or just >>>>> reported... >>>>> I >>>>> suppose both... Sigh... 
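One detail worth making explicit about the reproduction above: the length passed to writev, 2158022464 bytes, is slightly more than a signed 32-bit integer can hold. The thread does not say that this is what triggers the kernel behaviour, so the sketch below only spells out the arithmetic; the helper names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Length of the single iovec in the strace output quoted in this thread. */
#define THREAD_IOV_LEN ((size_t)2158022464UL)

/* Returns 1 if len does not fit in a signed 32-bit integer, else 0. */
static int exceeds_int32_max(size_t len)
{
    return len > (size_t)INT32_MAX;
}

/* Number of bytes by which len exceeds INT32_MAX (0 if it does not). */
static size_t bytes_over_int32_max(size_t len)
{
    return exceeds_int32_max(len) ? len - (size_t)INT32_MAX : 0;
}
```

Here `exceeds_int32_max(THREAD_IOV_LEN)` is true, and the request is 10538817 bytes (roughly 10 MB) above the 2^31 - 1 limit.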
>>>>> >>>>> /Patrik >>>>> >>>>> >>>>> On 11/28/2012 05:23 PM, Peter Membrey wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> No problem, I'll do what I can to help - thanks for looking into this >>>>>> so quickly! >>>>>> >>>>>> Any idea what might be causing it? >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Pete >>>>>> >>>>>> On 28 November 2012 23:54, Patrik Nyblom wrote: >>>>>>> >>>>>>> Hi! >>>>>>> >>>>>>> I'll upgrade the CentOS VM I have to 6.3 (only had 6.1 :() and see if >>>>>>> I >>>>>>> can >>>>>>> reproduce. If that fails, could you run a VM with a patch to try to >>>>>>> handle >>>>>>> the unexpected case and see if that fixes it? >>>>>>> >>>>>>> Cheers, >>>>>>> /Patrik >>>>>>> >>>>>>> On 11/24/2012 02:57 PM, Peter Membrey wrote: >>>>>>>> >>>>>>>> Hi guys, >>>>>>>> >>>>>>>> Thanks for getting back in touch so quickly! >>>>>>>> >>>>>>>> I did do an lsof on the process and I can confirm that it was >>>>>>>> definitely a socket. However by that time the application it had >>>>>>>> been >>>>>>>> trying to send to had been killed. When I checked the sockets were >>>>>>>> showing as waiting to close. Unfortunately I didn't think to do an >>>>>>>> lsof until after the apps had been shut down. I was hoping the VM >>>>>>>> would recover if I killed the app that had upset it. However even >>>>>>>> after all the apps connected had been shut down, the issue didn't >>>>>>>> resolve. >>>>>>>> >>>>>>>> The application receives requests from a client, which contains two >>>>>>>> data items. The stream ID and a timestamp. Both are encoded as big >>>>>>>> integer unsigned numbers. The server then looks through the file >>>>>>>> referenced by the stream ID and uses the timestamp as an index. 
The >>>>>>>> file format is currently really simple, in the form of: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> > >>>>>>>> >>>>>>>> There is an index file that provides an offset into the file based >>>>>>>> on >>>>>>>> time stamp, but basically it opens the file, and reads sequentially >>>>>>>> through it until it finds the timestamps that it cares about. In >>>>>>>> this >>>>>>>> case it reads all data with a greater timestamp until the end of the >>>>>>>> file is reached. It's possible the client is sending an incorrect >>>>>>>> timestamp, and maybe too much data is being read. However the loop >>>>>>>> is >>>>>>>> very primitive - it reads all the data in one go before passing it >>>>>>>> back to the protocol handler to send down the socket; so by that >>>>>>>> time >>>>>>>> even though the response is technically incorrect and the app has >>>>>>>> failed, it should still not cause the VM any issues. >>>>>>>> >>>>>>>> The data is polled every 10 seconds by the client app so I would not >>>>>>>> expect there to be 2GB of new data to send. I'm afraid my C skills >>>>>>>> are >>>>>>>> somewhat limited, so I'm not sure how to put together a sample app >>>>>>>> to >>>>>>>> try out writev. The platform is 64bit CentOS 6.3 (equivalent to RHEL >>>>>>>> 6.3) so I'm not expecting any strange or weird behaviour from the OS >>>>>>>> level but of course I could be completely wrong there. The OS is >>>>>>>> running directly on hardware, so there's no VM layer to worry about. >>>>>>>> >>>>>>>> Hope this might offer some additional clues? >>>>>>>> >>>>>>>> Thanks again! >>>>>>>> >>>>>>>> Kind Regards, >>>>>>>> >>>>>>>> Peter Membrey >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 24 November 2012 00:13, Patrik Nyblom wrote: >>>>>>>>> >>>>>>>>> Hi again! >>>>>>>>> >>>>>>>>> Could you go back to the version without the printouts and get back >>>>>>>>> to >>>>>>>>> the >>>>>>>>> situation where writev loops returning 0 (as in the strace)? 
If so, it would
>>>>>>>>> be really interesting to see an 'lsof' of the beam process, to see if
>>>>>>>>> this file descriptor really is open and is a socket...
>>>>>>>>>
>>>>>>>>> The thing is that writev with a vector that is not empty would never
>>>>>>>>> return 0 for a non-blocking socket. Not on any modern (i.e. not ancient)
>>>>>>>>> POSIX compliant system anyway. Of course it is a *really* large item you
>>>>>>>>> are trying to write there, but it should be no problem for a 64-bit
>>>>>>>>> Linux.
>>>>>>>>>
>>>>>>>>> Also, I think there is no use finding the Erlang code; I'll take that
>>>>>>>>> back. It would be more interesting to see what really happens at the
>>>>>>>>> OS/VM level in this case.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Patrik
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/23/2012 01:49 AM, Loïc Hoguin wrote:
>>>>>>>>>>
>>>>>>>>>> Sending this on behalf of someone who didn't manage to get the email
>>>>>>>>>> sent to this list after 2 attempts. If someone can check if he's held
>>>>>>>>>> up or something that'd be great.
>>>>>>>>>>
>>>>>>>>>> Anyway he has a big issue so I hope I can relay the conversation
>>>>>>>>>> reliably.
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> On 11/23/2012 01:45 AM, Peter Membrey wrote:
>>>>>>>>>>>
>>>>>>>>>>> From: Peter Membrey
>>>>>>>>>>> Date: 22 November 2012 19:02
>>>>>>>>>>> Subject: VM locks up on write to socket (and now it seems to file too)
>>>>>>>>>>> To: erlang-bugs@REDACTED
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi guys,
>>>>>>>>>>>
>>>>>>>>>>> I wrote a simple database application called CakeDB
>>>>>>>>>>> (https://github.com/pmembrey/cakedb) that basically spends its time
>>>>>>>>>>> reading and writing files and sockets. There's very little in the way
>>>>>>>>>>> of complex logic.
It is running on CentOS 6.3 with all the >>>>>>>>>>> updates >>>>>>>>>>> applied. I hit this problem on R15B02 so I rolled back to R15B01 >>>>>>>>>>> but >>>>>>>>>>> the issue remained. Erlang was built from source. >>>>>>>>>>> >>>>>>>>>>> The machine has two Intel X5690 CPUs giving 12 cores plus HT. >>>>>>>>>>> I've >>>>>>>>>>> tried various arguments for the VM but so far nothing has >>>>>>>>>>> prevented >>>>>>>>>>> the problem. At the moment I'm using: >>>>>>>>>>> >>>>>>>>>>> +K >>>>>>>>>>> +A 6 >>>>>>>>>>> +sbt tnnps >>>>>>>>>>> >>>>>>>>>>> The issue I'm seeing is that one of the scheduler threads will >>>>>>>>>>> hit >>>>>>>>>>> 100% cpu usage and the entire VM will become unresponsive. When >>>>>>>>>>> this >>>>>>>>>>> happens, I am not able to connect via the console with attach and >>>>>>>>>>> entop is also unable to connect. I can still establish TCP >>>>>>>>>>> connections >>>>>>>>>>> to the application, but I never receive a response. A standard >>>>>>>>>>> kill >>>>>>>>>>> signal will cause the VM to shut down (it doesn't need -9). >>>>>>>>>>> >>>>>>>>>>> Due to the pedigree of the VM I am quite willing to accept that >>>>>>>>>>> I've >>>>>>>>>>> made a fundamental mistake in my code. I am pretty sure that the >>>>>>>>>>> way >>>>>>>>>>> I >>>>>>>>>>> am doing the file IO could result in some race conditions. >>>>>>>>>>> However, >>>>>>>>>>> my >>>>>>>>>>> poor code aside, from what I understand, I still shouldn't be >>>>>>>>>>> able >>>>>>>>>>> to >>>>>>>>>>> crash / deadlock the VM like this. >>>>>>>>>>> >>>>>>>>>>> The issue doesn't seem to be caused by load. The app can fail >>>>>>>>>>> when >>>>>>>>>>> it's very busy, but also when it is practically idle. I haven't >>>>>>>>>>> been >>>>>>>>>>> able to find a trigger or any other explanation for the failure. 
>>>>>>>>>>> >>>>>>>>>>> The thread maxing out the CPU is attempting to write data to the >>>>>>>>>>> socket: >>>>>>>>>>> >>>>>>>>>>> (gdb) bt >>>>>>>>>>> #0 0x00007f9882ab6377 in writev () from /lib64/libc.so.6 >>>>>>>>>>> #1 0x000000000058a81f in tcp_inet_output (data=0x2407570, >>>>>>>>>>> event=) at drivers/common/inet_drv.c:9681 >>>>>>>>>>> #2 tcp_inet_drv_output (data=0x2407570, event=>>>>>>>>>> out>) >>>>>>>>>>> at drivers/common/inet_drv.c:9601 >>>>>>>>>>> #3 0x00000000004b773f in erts_port_task_execute >>>>>>>>>>> (runq=0x7f98826019c0, >>>>>>>>>>> curr_port_pp=0x7f9881639338) at beam/erl_port_task.c:858 >>>>>>>>>>> #4 0x00000000004afd83 in schedule (p=, >>>>>>>>>>> calls=) at beam/erl_process.c:6533 >>>>>>>>>>> #5 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>>>>>>>>>> #6 0x00000000004b1279 in sched_thread_func >>>>>>>>>>> (vesdp=0x7f9881639280) >>>>>>>>>>> at >>>>>>>>>>> beam/erl_process.c:4834 >>>>>>>>>>> #7 0x00000000005ba726 in thr_wrapper (vtwd=0x7fff6cfe2300) at >>>>>>>>>>> pthread/ethread.c:106 >>>>>>>>>>> #8 0x00007f9882f78851 in start_thread () from >>>>>>>>>>> /lib64/libpthread.so.0 >>>>>>>>>>> #9 0x00007f9882abe11d in clone () from /lib64/libc.so.6 >>>>>>>>>>> (gdb) >>>>>>>>>>> >>>>>>>>>>> I then tried running strace on that thread and got >>>>>>>>>>> (indefinitely): >>>>>>>>>>> >>>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>>>>> ... 
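The strace output above shows writev reporting success with 0 bytes written, over and over. A minimal sketch of how a caller might tell the three possible outcomes apart is given below; the enum and function names are hypothetical and this is not how inet_drv.c is actually structured.

```c
#include <stddef.h>
#include <sys/types.h>

/* Possible outcomes of one write attempt (names are illustrative only). */
enum write_result {
    WRITE_ERROR,        /* negative return: the call failed, check errno  */
    WRITE_NO_PROGRESS,  /* 0 returned although data was requested         */
    WRITE_PROGRESS      /* some bytes written, or nothing was requested   */
};

/* Classify the return value n of a write of `requested` bytes. */
static enum write_result classify_write(ssize_t n, size_t requested)
{
    if (n < 0)
        return WRITE_ERROR;
    if (n == 0 && requested > 0)
        return WRITE_NO_PROGRESS;
    return WRITE_PROGRESS;
}
```

A loop that simply retries on any non-negative result would spin forever in the `WRITE_NO_PROGRESS` case, which matches the 100% CPU symptom described in the report.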
>>>>>>>>>>> >>>>>>>>>>> From what I can tell, it's trying to write data to a socket, >>>>>>>>>>> which >>>>>>>>>>> is >>>>>>>>>>> succeeding, but writing 0 bytes. From the earlier definitions in >>>>>>>>>>> the >>>>>>>>>>> source file, an error condition would be signified by a negative >>>>>>>>>>> number. Any other result is the number of bytes written, in this >>>>>>>>>>> case >>>>>>>>>>> 0. I'm not sure if this is desired behaviour or not. I've tried >>>>>>>>>>> killing the application on the other end of the socket, but it >>>>>>>>>>> has >>>>>>>>>>> no >>>>>>>>>>> effect on the VM. >>>>>>>>>>> >>>>>>>>>>> I have enabled debugging for the inet code, so hopefully this >>>>>>>>>>> will >>>>>>>>>>> give a little more insight. I am currently trying to reproduce >>>>>>>>>>> the >>>>>>>>>>> condition, but as I really have no idea what causes it, it's >>>>>>>>>>> pretty >>>>>>>>>>> much a case of wait and see. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> **** UPDATE **** >>>>>>>>>>> >>>>>>>>>>> I managed to lock up the VM again, but this time it was caused by >>>>>>>>>>> file >>>>>>>>>>> IO, >>>>>>>>>>> probably from the debugging statements. Although it worked fine >>>>>>>>>>> for >>>>>>>>>>> some >>>>>>>>>>> time >>>>>>>>>>> the last entry in the file was cut off. 
>>>>>>>>>>> >>>>>>>>>>> From GDB: >>>>>>>>>>> >>>>>>>>>>> (gdb) info threads >>>>>>>>>>> 53 Thread 0x7f83e988b700 (LWP 8621) 0x00007f83ea6da54d in >>>>>>>>>>> read >>>>>>>>>>> () >>>>>>>>>>> from /lib64/libpthread.so.0 >>>>>>>>>>> 52 Thread 0x7f83e8c8f700 (LWP 8622) 0x00007f83ea6d743c in >>>>>>>>>>> pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >>>>>>>>>>> 51 Thread 0x7f83e818d700 (LWP 8623) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 50 Thread 0x7f83e816b700 (LWP 8624) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 49 Thread 0x7f83e8149700 (LWP 8625) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 48 Thread 0x7f83e8127700 (LWP 8626) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 47 Thread 0x7f83e8105700 (LWP 8627) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 46 Thread 0x7f83e80e3700 (LWP 8628) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 45 Thread 0x7f83e80c1700 (LWP 8629) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 44 Thread 0x7f83e809f700 (LWP 8630) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 43 Thread 0x7f83e807d700 (LWP 8631) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 42 Thread 0x7f83e805b700 (LWP 8632) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 41 Thread 0x7f83e8039700 (LWP 8633) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 40 Thread 0x7f83e8017700 (LWP 8634) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 39 Thread 0x7f83e7ff5700 (LWP 8635) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from 
/lib64/libc.so.6 >>>>>>>>>>> 38 Thread 0x7f83e7fd3700 (LWP 8636) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 37 Thread 0x7f83e7fb1700 (LWP 8637) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 36 Thread 0x7f83e7f8f700 (LWP 8638) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 35 Thread 0x7f83e7f6d700 (LWP 8639) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 34 Thread 0x7f83e7f4b700 (LWP 8640) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 33 Thread 0x7f83e7f29700 (LWP 8641) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 32 Thread 0x7f83e7f07700 (LWP 8642) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 31 Thread 0x7f83e7ee5700 (LWP 8643) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 30 Thread 0x7f83e7ec3700 (LWP 8644) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 29 Thread 0x7f83e7ea1700 (LWP 8645) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 28 Thread 0x7f83e7e7f700 (LWP 8646) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 27 Thread 0x7f83d7c5a700 (LWP 8647) 0x00007f83ea6db09d in >>>>>>>>>>> waitpid >>>>>>>>>>> () from /lib64/libpthread.so.0 >>>>>>>>>>> 26 Thread 0x7f83d7c53700 (LWP 8648) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 25 Thread 0x7f83d7252700 (LWP 8649) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 24 Thread 0x7f83d6851700 (LWP 8650) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 23 Thread 0x7f83d5e50700 (LWP 8651) 0x00007f83ea215ae9 in 
>>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 22 Thread 0x7f83d544f700 (LWP 8652) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 21 Thread 0x7f83d4a4e700 (LWP 8653) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 20 Thread 0x7f83d404d700 (LWP 8654) 0x00007f83ea20be7d in >>>>>>>>>>> write >>>>>>>>>>> () >>>>>>>>>>> from /lib64/libc.so.6 >>>>>>>>>>> 19 Thread 0x7f83d364c700 (LWP 8655) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 18 Thread 0x7f83d2c4b700 (LWP 8656) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 17 Thread 0x7f83d224a700 (LWP 8657) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 16 Thread 0x7f83d1849700 (LWP 8658) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 15 Thread 0x7f83d0e48700 (LWP 8659) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 14 Thread 0x7f83d0447700 (LWP 8660) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 13 Thread 0x7f83cfa46700 (LWP 8661) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 12 Thread 0x7f83cf045700 (LWP 8662) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 11 Thread 0x7f83ce644700 (LWP 8663) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 10 Thread 0x7f83cdc43700 (LWP 8664) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () from /lib64/libc.so.6 >>>>>>>>>>> 9 Thread 0x7f83cd242700 (LWP 8665) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () >>>>>>>>>>> from /lib64/libc.so.6 >>>>>>>>>>> 8 Thread 0x7f83cc841700 (LWP 8666) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () >>>>>>>>>>> from /lib64/libc.so.6 
>>>>>>>>>>> 7 Thread 0x7f83cbe40700 (LWP 8667) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () >>>>>>>>>>> from /lib64/libc.so.6 >>>>>>>>>>> 6 Thread 0x7f83cb43f700 (LWP 8668) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () >>>>>>>>>>> from /lib64/libc.so.6 >>>>>>>>>>> 5 Thread 0x7f83caa3e700 (LWP 8669) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () >>>>>>>>>>> from /lib64/libc.so.6 >>>>>>>>>>> 4 Thread 0x7f83ca03d700 (LWP 8670) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () >>>>>>>>>>> from /lib64/libc.so.6 >>>>>>>>>>> 3 Thread 0x7f83c963c700 (LWP 8671) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () >>>>>>>>>>> from /lib64/libc.so.6 >>>>>>>>>>> 2 Thread 0x7f83c8c3b700 (LWP 8672) 0x00007f83ea215ae9 in >>>>>>>>>>> syscall >>>>>>>>>>> () >>>>>>>>>>> from /lib64/libc.so.6 >>>>>>>>>>> * 1 Thread 0x7f83eb3a8700 (LWP 8597) 0x00007f83ea211d03 in >>>>>>>>>>> select >>>>>>>>>>> () >>>>>>>>>>> from /lib64/libc.so.6 >>>>>>>>>>> (gdb) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> (gdb) bt >>>>>>>>>>> #0 0x00007f83ea20be7d in write () from /lib64/libc.so.6 >>>>>>>>>>> #1 0x00007f83ea1a2583 in _IO_new_file_write () from >>>>>>>>>>> /lib64/libc.so.6 >>>>>>>>>>> #2 0x00007f83ea1a3b35 in _IO_new_do_write () from >>>>>>>>>>> /lib64/libc.so.6 >>>>>>>>>>> #3 0x00007f83ea1a21fd in _IO_new_file_xsputn () from >>>>>>>>>>> /lib64/libc.so.6 >>>>>>>>>>> #4 0x00007f83ea17589d in vfprintf () from /lib64/libc.so.6 >>>>>>>>>>> #5 0x00007f83ea18003a in printf () from /lib64/libc.so.6 >>>>>>>>>>> #6 0x000000000058f0e8 in tcp_recv (desc=0x2c3d350, >>>>>>>>>>> request_len=0) >>>>>>>>>>> at >>>>>>>>>>> drivers/common/inet_drv.c:8976 >>>>>>>>>>> #7 0x000000000058f63a in tcp_inet_input (data=0x2c3d350, >>>>>>>>>>> event=>>>>>>>>>> optimized out>) at drivers/common/inet_drv.c:9326 >>>>>>>>>>> #8 tcp_inet_drv_input (data=0x2c3d350, event=>>>>>>>>>> out>) >>>>>>>>>>> at drivers/common/inet_drv.c:9604 >>>>>>>>>>> #9 0x00000000004b770f in erts_port_task_execute 
>>>>>>>>>>> (runq=0x7f83e9d5d3c0, >>>>>>>>>>> curr_port_pp=0x7f83e8dc6e78) at beam/erl_port_task.c:851 >>>>>>>>>>> #10 0x00000000004afd83 in schedule (p=, >>>>>>>>>>> calls=) at beam/erl_process.c:6533 >>>>>>>>>>> #11 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>>>>>>>>>> #12 0x00000000004b1279 in sched_thread_func >>>>>>>>>>> (vesdp=0x7f83e8dc6dc0) >>>>>>>>>>> at >>>>>>>>>>> beam/erl_process.c:4834 >>>>>>>>>>> #13 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at >>>>>>>>>>> pthread/ethread.c:106 >>>>>>>>>>> #14 0x00007f83ea6d3851 in start_thread () from >>>>>>>>>>> /lib64/libpthread.so.0 >>>>>>>>>>> #15 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>>>>>>> (gdb) >>>>>>>>>>> >>>>>>>>>>> (gdb) bt >>>>>>>>>>> #0 0x00007f83ea6da54d in read () from /lib64/libpthread.so.0 >>>>>>>>>>> #1 0x0000000000554b6e in signal_dispatcher_thread_func >>>>>>>>>>> (unused=>>>>>>>>>> optimized out>) at sys/unix/sys.c:2776 >>>>>>>>>>> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266c80) at >>>>>>>>>>> pthread/ethread.c:106 >>>>>>>>>>> #3 0x00007f83ea6d3851 in start_thread () from >>>>>>>>>>> /lib64/libpthread.so.0 >>>>>>>>>>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>>>>>>> (gdb) >>>>>>>>>>> >>>>>>>>>>> (gdb) bt >>>>>>>>>>> #0 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6 >>>>>>>>>>> #1 0x00000000005bba35 in wait__ (e=0x2989390) at >>>>>>>>>>> pthread/ethr_event.c:92 >>>>>>>>>>> #2 ethr_event_wait (e=0x2989390) at pthread/ethr_event.c:218 >>>>>>>>>>> #3 0x00000000004ae5bd in erts_tse_wait (fcalls=>>>>>>>>>> out>, >>>>>>>>>>> esdp=0x7f83e8e2c440, rq=0x7f83e9d5e7c0) at >>>>>>>>>>> beam/erl_threads.h:2319 >>>>>>>>>>> #4 scheduler_wait (fcalls=, >>>>>>>>>>> esdp=0x7f83e8e2c440, >>>>>>>>>>> rq=0x7f83e9d5e7c0) at beam/erl_process.c:2087 >>>>>>>>>>> #5 0x00000000004afb94 in schedule (p=, >>>>>>>>>>> calls=) at beam/erl_process.c:6467 >>>>>>>>>>> #6 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>>>>>>>>>> 
#7 0x00000000004b1279 in sched_thread_func >>>>>>>>>>> (vesdp=0x7f83e8e2c440) >>>>>>>>>>> at >>>>>>>>>>> beam/erl_process.c:4834 >>>>>>>>>>> #8 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at >>>>>>>>>>> pthread/ethread.c:106 >>>>>>>>>>> #9 0x00007f83ea6d3851 in start_thread () from >>>>>>>>>>> /lib64/libpthread.so.0 >>>>>>>>>>> #10 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>>>>>>> (gdb) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> (gdb) bt >>>>>>>>>>> #0 0x00007f83ea6db09d in waitpid () from /lib64/libpthread.so.0 >>>>>>>>>>> #1 0x0000000000555a9f in child_waiter (unused=>>>>>>>>>> out>) >>>>>>>>>>> at sys/unix/sys.c:2700 >>>>>>>>>>> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266d50) at >>>>>>>>>>> pthread/ethread.c:106 >>>>>>>>>>> #3 0x00007f83ea6d3851 in start_thread () from >>>>>>>>>>> /lib64/libpthread.so.0 >>>>>>>>>>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>>>>>>> (gdb) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> **** END UPDATE **** >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I'm happy to provide any information I can, so please don't >>>>>>>>>>> hesitate >>>>>>>>>>> to >>>>>>>>>>> ask. >>>>>>>>>>> >>>>>>>>>>> Thanks in advance! >>>>>>>>>>> >>>>>>>>>>> Kind Regards, >>>>>>>>>>> >>>>>>>>>>> Peter Membrey >>>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> erlang-bugs mailing list >>>>>>>>> erlang-bugs@REDACTED >>>>>>>>> http://erlang.org/mailman/listinfo/erlang-bugs >>>>>>> >>>>>>> > From pan@REDACTED Fri Dec 7 08:55:29 2012 From: pan@REDACTED (Patrik Nyblom) Date: Fri, 7 Dec 2012 08:55:29 +0100 Subject: [erlang-bugs] Documentation/Code inconsistency In-Reply-To: <1354803207.6456.11.camel@sekic1152.rnd.ki.sw.ericsson.se> References: <1354803207.6456.11.camel@sekic1152.rnd.ki.sw.ericsson.se> Message-ID: <50C1A0F1.2050300@erlang.org> Hi! 
On 12/06/2012 03:13 PM, Bengt Kleberg wrote:
> Greetings,
>
> I have found that on my machine:
> Linux sekic1152 2.6.27.42-0.1-default #1 SMP 2010-01-06 16:07:25 +0100
> x86_64 x86_64 x86_64 GNU/Linux
>
> running Erlang/OTP:
> Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:4:4] [async-threads:0]
> [hipe] [kernel-poll:false]
>
> erlang:port_info/2 does not behave as documented:
>
> 12> P = erlang:open_port( {spawn, "sleep 100"}, [stream]).
> 13> erlang:port_info(P, os_pid).
> ** exception error: bad argument
>  in function erlang:port_info/2
>  called as erlang:port_info(#Port<0.599>,os_pid)

The os_pid item came in with R15B02, so you have to upgrade your Erlang installation.

>
> Presumably the documentation is mistaken and 'os_pid' does not exist.

Cheers,
/Patrik

>
>
> bengt
>
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs

From dch@REDACTED Fri Dec 7 15:12:44 2012
From: dch@REDACTED (Dave Cottlehuber)
Date: Fri, 7 Dec 2012 15:12:44 +0100
Subject: [erlang-bugs] Unable to build OTP with static crypto for Win32
In-Reply-To: <50BF15E0.8010801@erlang.org>
References: <50BF15E0.8010801@erlang.org>
Message-ID: 

Hi Patrik!

Thanks for making the entire build chain work on mingw, it is
noticeably faster than cygwin for building OTP for me.

>> As far as I can tell, I'm configuring correctly with
>> `--disable-dynamic-ssl-lib` but the resulting installer doesn't
>> function without the two OpenSSL DLLs present. In fact, I've never had
>> this working, ever. Is it actually possible?

Please accept my apologies, what I should have said was "I'm not able
to create an OTP release that supports static OpenSSL"!

I'm doing almost the same as OTP, just with the assembler
optimisations using nasm instead of the MS non-optimised one.
https://github.com/dch/glazier/blob/master/bin/build_openssl.cmd#L26-L30

> $ERL_TOP/HOWTO/INSTALL-WIN32.md, you have a huge instruction on how to build
> a static OpenSSL lib using the MSVC command prompt. You then use that static
> lib in your build of Erlang/OTP. You will never need to install any OpenSSL
> dynamic libs. Verify that you got the lib files and that the correct OpenSSL
> installation resides in the directory where configure thinks it should be (in
> your case /c/OpenSSL). You can point out the correct installation directory
> during configure (otp_build configure --with-ssl-dir=...). Then *clean the
> crypto application* (cd lib/crypto && make clean); the build logs you
> provided do not show a rebuild, so it's hard to see if the build does the right
> thing. Then do otp_build configure, otp_build boot -a, otp_build release
> -a and otp_build installer_win32.
> Also, if you look at lib/crypto/c_src/win32/Makefile after configure is run,
> you should see lines looking like:
> SSL_LIBDIR = /c/OpenSSL/lib/VC/static
> SSL_CRYPTO_LIBNAME = libeay32MD
>
> etc.
> Verify that there are files named e.g.
> /c/OpenSSL/lib/VC/static/libeay32MD.lib on your system.

Aha. This appears to be the issue, I don't have MD.lib.

Reverting to following the OTP instructions verbatim, on a fresh
VS2012 & 0.9.8r still doesn't produce the required files, nor using
SDK 7.1 either.

I've been using a script to build OpenSSL, and the only difference is
that I've called `call ms\do_nasm` instead of your `ms\do_ms`. Both
have the same result.

Any suggestions on what I might be missing?

A+
Dave

From pan@REDACTED Fri Dec 7 16:45:50 2012
From: pan@REDACTED (Patrik Nyblom)
Date: Fri, 7 Dec 2012 16:45:50 +0100
Subject: [erlang-bugs] Unable to build OTP with static crypto for Win32
In-Reply-To: 
References: <50BF15E0.8010801@erlang.org>
Message-ID: <50C20F2E.9040600@erlang.org>

Hi!

Oh, I think I misled you with variables from my weirdly set up development machine.
Using the build instructions, you will only get static libraries with that OpenSSL version, libraries named libeay32.lib and ssleay32.lib. Look in the Makefile to see what it has found, and check that those libraries are present. Check that the found files have approximately the right size (see below). Then clean and rebuild everything; don't forget otp_build release and otp_build installer_win32. Send a link to the new build log if it still fails.

You should see libraries in e.g. /c/OpenSSL with approximately the following sizes (on win32):
libeay32.lib: 3,921 KB
ssleay32.lib: 706 KB
or (on win64)
libeay32.lib: 5243 KB
ssleay32.lib: 884 KB
- while dynamic libraries would be much smaller, for example the dynamic libraries I have on my development machines are only 756 and 56 KB respectively.

Cheers,
/Patrik

On 12/07/2012 03:12 PM, Dave Cottlehuber wrote:
> Hi Patrik!
>
> Thanks for making the entire build chain work on mingw, it is
> noticeably faster than cygwin for building OTP for me.
>
>>> As far as I can tell, I'm configuring correctly with
>>> `--disable-dynamic-ssl-lib` but the resulting installer doesn't
>>> function without the two OpenSSL DLLs present. In fact, I've never had
>>> this working, ever. Is it actually possible?
> Please accept my apologies, what I should have said was "I'm not able
> to create an OTP release that supports static OpenSSL"!
>
> I'm doing almost the same as OTP, just with the assembler
> optimisations using nasm instead of the MS non-optimised one.
>
> https://github.com/dch/glazier/blob/master/bin/build_openssl.cmd#L26-L30
>
>> $ERL_TOP/HOWTO/INSTALL-WIN32.md, you have a huge instruction on how to build
>> a static OpenSSL lib using the MSVC command prompt. You then use that static
>> lib in your build of Erlang/OTP. You will never need to install any OpenSSL
>> dynamic libs.
Verify that you got the lib files and that the correct OpenSSL >> installation reside in the directory where configure think it should be (in >> your case /c/OpenSSL) . You can point out the correct installation directory >> during configure (otp_build configure --with-ssl-dir=...) Then *clean the >> crypto application* (cd lib/crypto && make clean), the build logs you >> provide does not rebuild so it's hard to see if the build does the right >> thing. Then do do otp_build configure, otp_build boot -a , otp_build release >> -a and otp_build installer_win32. >> Also, if you look at lib/crypto/c_src/win32/Makefile after configure is run >> , you should see lines looking like: >> SSL_LIBDIR = /c/OpenSSL/lib/VC/static >> SSL_CRYPTO_LIBNAME = libeay32MD >> >> etc. >> Verify that there are files named i.e. >> /c/OpenSSL/lib/VC/static/libeay32MD.lib on your system. > Aha. This appears to be the issue, I don't have MD.lib. > > Reverting to following the OTP instructions verbatim, on a fresh > VS2012 & 0.9.8r still doesn't produce the required files, nor using > SDK 7.1 either. > > I've been using a script to build OpenSSL, and the only difference is > that I've called `call ms\do_nasm` instead of your `ms\do_ms`. Both > have same result. > > Any suggestions on what I might be missing? > > A+ > Dave From tony@REDACTED Sun Dec 9 04:41:53 2012 From: tony@REDACTED (Tony Wallace) Date: Sun, 09 Dec 2012 16:41:53 +1300 Subject: [erlang-bugs] error compiling w3c schemas with xmerl Message-ID: <50C40881.7040805@tony.gen.nz> Hello I created a directory for schemas with an index file. The contents of this directory is: /tony@REDACTED:~/workspace/myXformProject$ ls schemas/ /SchemaList.txt XForms-Schema.xsd xhtml-lat1.ent xml-events.xsd/ /SchemaList.txt~ xhtml1-strict.dtd xhtml-special.ent/ These schema's have been downloaded from w3c. However these schemas would not compile yielding the error wfc_PEs_In_Internal_Subset. 
I would have expected these well-established w3c schemas to compile with xmerl.

6> B.
[{"http://www.w3.org/1999/xhtml", "schemas/xhtml1-strict.dtd"},
 {"http://www.w3.org/2001/xml-events", "schemas/xml-events.xsd"},
 {"http://www.w3.org/2002/xforms", "schemas/XForms-Schema.xsd"}]
9> {ok,S1} = xmerl_xsd:process_schemas(B).
3450- fatal: {error,{wfc_PEs_In_Internal_Subset}}
** exception exit: {fatal,{{error,{wfc_PEs_In_Internal_Subset}},
                           {file,"schemas/xhtml1-strict.dtd"},
                           {line,628},
                           {col,89}}}
     in function  xmerl_scan:fatal/2
     in call from xmerl_scan:scan_entity/2
     in call from xmerl_scan:scan_markup_decl/2
     in call from xmerl_scan:scan_ext_subset/2
     in call from xmerl_scan:scan_document/2
     in call from xmerl_scan:file/2
     in call from xmerl_xsd:process_schemas/2

The version information from xmerl_scan is:

-module(xmerl_scan).
-vsn('0.20').
-date('03-09-16').

The 3450 refers to the code line in xmerl_scan:

scan_entity_value("%" ++ _T,S=#xmerl_scanner{environment=prolog},_,_,_,_,_) ->
    ?fatal({error,{wfc_PEs_In_Internal_Subset}},S);

Whereas line 628 refers to the dtd, the line starting 627 628 635

From matthias@REDACTED Mon Dec 10 01:01:30 2012
From: matthias@REDACTED (Matthias Lang)
Date: Mon, 10 Dec 2012 01:01:30 +0100
Subject: [erlang-bugs] http://www.erlang.org/faq/academic.html - section 10.23 with respect to shared heap needs correction?
In-Reply-To:
References:
Message-ID: <20121210000130.GA4612@corelatus.se>

On Thursday, November 29, Joseph Wayne Norton wrote:

> I noticed the last statement of section 10.23 with respect to shared
> heap. It seems this feature is no longer supported or is
> undocumented? If it is still present, could you point me to the
> appropriate documentation?

The shared/hybrid heaps are gone as of at least ERTS 5.9.2, according to the release notes (OTP-10105):

http://www.erlang.org/doc/apps/erts/notes.html

Thanks for pointing out that the FAQ still talks about them.
For now, I've fixed it by removing the last paragraph from the FAQ entry. It'll automagically propagate to erlang.org, probably within an hour.

To someone who knows: is the description of GC in that section still correct? Specifically, does GC of one process still freeze all other processes, even when running SMP?

If your (Joseph's) main interest in this is trying out the shared and hybrid heap, then the only way I know of is to go back to the Erlang versions which supported it. A good start is probably ERTS 5.4.9 (somewhere in the R10B-x series). Here's one I had lying around:

otp_src_R10B-10 >bin/erl -hybrid
Erlang (BEAM) emulator version 5.4.13 [64-bit] [source] [hybrid heap]

A bit of googling gives some background to the shared/hybrid emulator's fate:

http://erlang.org/pipermail/erlang-questions/2009-September/046428.html

http://www.trapexit.org/forum/viewtopic.php?t=2956&sid=d3c1c76aa5e04465902c071efa7195ba

Matt

From wallentin.dahlberg@REDACTED Mon Dec 10 02:04:39 2012
From: wallentin.dahlberg@REDACTED (Björn-Egil Dahlberg)
Date: Mon, 10 Dec 2012 02:04:39 +0100
Subject: [erlang-bugs] http://www.erlang.org/faq/academic.html - section 10.23 with respect to shared heap needs correction?
In-Reply-To: <20121210000130.GA4612@corelatus.se>
References: <20121210000130.GA4612@corelatus.se>
Message-ID:

2012/12/10 Matthias Lang

> On Thursday, November 29, Joseph Wayne Norton wrote:
>
> > I noticed the last statement of section 10.23 with respect to shared
> > heap. It seems this feature is no longer supported or is
> > undocumented? If it is still present, could you point me to the
> > appropriate documentation?
>
> The shared/hybrid heaps are gone as of at least ERTS 5.9.2, according
> to the release notes (OTP-10105):
>
> http://www.erlang.org/doc/apps/erts/notes.html
>
> Thanks for pointing out that the FAQ still talks about them. For now,
> I've fixed it by removing the last paragraph from the FAQ entry. It'll
> automagically propagate to erlang.org, probably within an hour.
>
> To someone who knows: is the description of GC in that section still
> correct? Specifically, does GC of one process still freeze all other
> processes, even when running SMP?
>

It will "freeze" the scheduler thread running that particular process. It will also "freeze" any other process in the run-queue coupled to that particular thread. This is not a big problem unless the heaps of the process are unusually large.

Any other threads and processes are unaffected.

// Björn-Egil

From soichi777@REDACTED Mon Dec 10 11:41:47 2012
From: soichi777@REDACTED (ishi soichi)
Date: Mon, 10 Dec 2012 19:41:47 +0900
Subject: [erlang-bugs] erlang bug for couchdb?
Message-ID:

Hi.

I have no knowledge of erlang unfortunately. But recently I started using couchdb.

I asked the same question on the couchdb mailing list but got no answer. I have also seen similar error output on the web described as an erlang bug, so I guessed this might be one as well.
My environment:

MacOSX 10.6 SnowLeopard
Python2.7.3
virtualenv latest

I ran this Python code

https://github.com/ptwobrussell/Mining-the-Social-Web/blob/master/python_code/the_tweet__count_entities_in_tweets.py

against couchdb, and it gave a long error message. Since it is too long, I have omitted a large portion of it. If you need to see the full message, please let me know.

Do you think it's a bug in erlang?

soichi

the error starts here

=CRASH REPORT==== 10-Dec-2012::19:39:43 ===
  crasher:
    initial call: couch_os_process:init/1
    pid: <0.186.0>
    registered_name: []
    exception exit: {function_clause,
                        [{couch_os_process,handle_info,
                             [{#Port<0.2608>,{data,{eol,<<"[[]]">>}}},
                              {os_proc,
                                  "/Users/soichi/domains/py27/bin/couchpy",
                                  #Port<0.2608>,
                                  #Fun, #Fun,5000}],
                             [{file,"couch_os_process.erl"},{line,207}]},
                         {gen_server,handle_msg,5,
                             [{file,"gen_server.erl"},{line,607}]},
                         {proc_lib,init_p_do_apply,3,
                             [{file,"proc_lib.erl"},{line,227}]}]}
      in function  gen_server:terminate/6 (gen_server.erl, line 747)
    ancestors: [couch_query_servers,couch_secondary_services,
                  couch_server_sup,<0.31.0>]
    messages: []
    links: [<0.185.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 2584
    stack_size: 24
    reductions: 2868
  neighbours:
    neighbour: [{pid,<0.184.0>},
                  {registered_name,[]},
                  {initial_call,{couch_work_queue,init,['Argument__1']}},
                  {current_function,{gen_server,loop,6}},
                  {ancestors,[<0.182.0>]},
                  {messages,[]},
                  {links,[<0.182.0>]},
                  {dictionary,[]},
                  {trap_exit,false},
                  {status,waiting},
                  {heap_size,233},
                  {stack_size,9},
                  {reductions,37}]
    neighbour: [{pid,<0.188.0>},
                  {registered_name,[]},
                  {initial_call,{erlang,apply,2}},
                  {current_function,{gen,do_call,4}},
                  {ancestors,[]},
                  {messages,[]},
                  {links,[<0.182.0>]},
                  {dictionary,
                      [{task_status_props,
                           [{changes_done,0},
                            {database,<<"tweets-user-timeline-ikedanob">>},
                            {design_document,<<"_design/index">>},
                            {progress,0},
                            {started_on,1355135983},
                            {total_changes,106},
                            {type,indexer},
                            {updated_on,1355135983}]},
                       {task_status_update,{{0,0,0},500000}}]},
.... omitted...

=CRASH REPORT==== 10-Dec-2012::19:26:32 ===
  crasher:
    initial call: couch_file:init/1
    pid: <0.1184.0>
    registered_name: []
    exception exit: {function_clause,
                        [{couch_os_process,handle_info,
                             [{#Port<0.2717>,{data,{eol,<<"[[]]">>}}},
                              {os_proc,
                                  "/Users/soichi/domains/py27/bin/couchpy",
                                  #Port<0.2717>,
                                  #Fun, #Fun,5000}],
                             [{file,"couch_os_process.erl"},{line,207}]},
                         {gen_server,handle_msg,5,
                             [{file,"gen_server.erl"},{line,607}]},
                         {proc_lib,init_p_do_apply,3,
                             [{file,"proc_lib.erl"},{line,227}]}]}
      in function  gen_server:terminate/6 (gen_server.erl, line 747)
    ancestors: [<0.1183.0>,<0.1182.0>]
    messages: [{'EXIT',<0.1186.0>,shutdown}]
    links: []
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 987
    stack_size: 24
    reductions: 1164
  neighbours:

From hq@REDACTED Tue Dec 11 07:33:23 2012
From: hq@REDACTED (Adam Rutkowski)
Date: Tue, 11 Dec 2012 07:33:23 +0100
Subject: [erlang-bugs] erlang bug for couchdb?
In-Reply-To:
References:
Message-ID: <5A77799F-869D-47D2-8A08-5247311EDD18@mtod.org>

On 10 Dec 2012, at 11:41, ishi soichi wrote:

> Do you think it's a bug in erlang?

I don't think so. The error message clearly says that CouchDB's couch_os_process doesn't handle messages of the form {data,{eol,<<"[[]]">>}}. Looks like perfectly normal behavior to me - the process crashes upon receiving an unknown message. So either something is wrong with the client you're using, or CouchDB should be able to handle this.

CouchDB issues can be reported here: https://issues.apache.org/jira/browse/CouchDB

--
Adam

From essen@REDACTED Wed Dec 12 13:21:01 2012
From: essen@REDACTED (Loïc Hoguin)
Date: Wed, 12 Dec 2012 13:21:01 +0100
Subject: [erlang-bugs] Weird documentation of distributed applications
Message-ID: <50C876AD.7000704@ninenines.eu>

Hello,

I got pointed at a weirdness in the documentation by Eric Pailleau.
This chapter:

http://www.erlang.org/doc/design_principles/distributed_applications.html#id74957

says:

"The system configuration files for cp2@REDACTED and cp3@REDACTED are identical, except for the list of mandatory nodes which should be [cp1@REDACTED, cp3@REDACTED] for cp2@REDACTED and [cp1@REDACTED, cp2@REDACTED] for cp3@REDACTED"

First, that's incredibly clumsy. Having a different configuration file per node like this is just impractical if you're going to have 50 of them.

But looking at the code it appears you can actually put [cp1@REDACTED, cp2@REDACTED, cp3@REDACTED] everywhere, because all this does is ping every node and make sure they're up before continuing. A ping to yourself *does* work, so that makes the whole sentence pointless.

What to do?

--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

From essen@REDACTED Mon Dec 17 11:23:21 2012
From: essen@REDACTED (Loïc Hoguin)
Date: Mon, 17 Dec 2012 11:23:21 +0100
Subject: [erlang-bugs] Compiler bug on R15B03
Message-ID: <50CEF299.3070209@ninenines.eu>

Hello,

We have found a couple issues when compiling and executing a certain module. Reproduced on 2 machines. Mine was running ArchLinux 64bits.

This file:

https://raw.github.com/extend/bank_mysql/master/src/bank_mysql.erl

You can compile it with -compile(export_all) to quickly test the issue. Load the beam in the R15B03 VM and then run:

> bank_mysql:params_to_bin([123,21340949]).

You'll get a weird failure in a clause where one of these integers appears to be a binary (line 549). That's not true though, because if you io:format/2 there nothing will appear. The code properly goes through the clause for integers.

It's not related to the values, it also does it if it's something other than integers (like a datetime tuple and a binary). The closest-to-failure call reproducing this issue, found by tracing, is:

> bank_mysql:params_to_bin([21340949], <<0:1>>, <<8,0>>,<<123,0,0,0,0,0,0,0,0>>).

I have no idea what happens there.
The line given is definitely not the right one, and the code worked fine on R15B01, no reasons it shouldn't on R15B03.

Other weirdness, I wanted to try editing the .S file, adding {line, [...]} clauses in order to find exactly where it failed, but it seems I can't even compile the .S file generated at all. I get this error:

% erlc -S src/bank_mysql.erl
% erlc bank_mysql.S
Function: connect/5
bank_mysql.S:none: internal error in beam_block;
crash reason: {{case_clause,
    {'EXIT',
     {function_clause,
      [{beam_utils,live_opt,
        [[{init,{y,0}},{allocate,4,5},{label,2}],
         31,
         {10,
          {8,1,
           {6,1,
            {1,31,nil,{5,1,{4,0,{3,0,nil,nil},nil},nil}},
            {7,1,nil,nil}},
           {10,1,{9,1,nil,nil},{11,1,nil,nil}}}},
         [{block,
           [{'%live',5},
            {set,[{y,3}],[{x,2}],move},
            {set,
             [{x,2}],
             [{literal,[binary,{active,false},{packet,raw}]}],
             move},
            {set,[{y,1}],[{x,4}],move},
            {set,[{y,2}],[{x,3}],move},
            {'%live',3}]},
          {line,[{location,"src/bank_mysql.erl",158}]},
          {call_ext,3,{extfunc,gen_tcp,connect,3}},
          {test,is_tuple,{f,5},[{x,0}]},
          {test,test_arity,{f,5},[{x,0},2]},
          {block,
           [{'%live',1},
            {set,[{x,1}],[{x,0}],{get_tuple_element,0}},
            {set,[{x,2}],[{x,0}],{get_tuple_element,1}},
            {'%live',3}]},
          {test,is_eq_exact,{f,5},[{x,1},{atom,ok}]},
          {block,
           [{'%live',3},
            {set,[],[],{alloc,3,{nozero,nostack,9,[]}}},
            {set,[{x,0}],[],{put_tuple,8}},
            {set,[],[{atom,mysql_client}],put},
            {set,[],[{x,2}],put},
            {set,[],[{literal,<<>>}],put},
            {set,[],[{integer,0}],put},
            {set,[],[{atom,ready}],put},
            {set,[],[nil],put},
            {set,[],[{integer,5000}],put},
            {set,[],[{integer,100000}],put},
            {'%live',1}]},
          {line,[{location,"src/bank_mysql.erl",161}]},
          {call,1,{f,264}},
          {test,is_tuple,{f,6},[{x,0}]},
          {test,test_arity,{f,6},[{x,0},3]},
          {block,
           [{'%live',1},
            {set,[{x,1}],[{x,0}],{get_tuple_element,0}},
            {set,[{x,2}],[{x,0}],{get_tuple_element,1}},
            {set,[{x,3}],[{x,0}],{get_tuple_element,2}},
            {'%live',4}]},
          {test,is_eq_exact,{f,6},[{x,1},{atom,ok}]},
          {block,
           [{'%live',4},
            {set,[{x,0}],[{x,2}],move},
            {set,[{y,0}],[{x,3}],move},
            {'%live',1}]},
          {line,[{location,"src/bank_mysql.erl",163}]},
          {call,1,{f,68}},
          {test,is_tuple,{f,7},[{x,0}]},
          {test,test_arity,{f,7},[{x,0},9]},
          {block,
           [{'%live',1},
            {set,[{x,1}],[{x,0}],{get_tuple_element,0}},
            {set,[{x,2}],[{x,0}],{get_tuple_element,4}},
            {set,[{x,3}],[{x,0}],{get_tuple_element,6}},
            {'%live',4}]},
          {test,is_eq_exact,{f,7},[{x,1},{atom,ok}]},
          {block,
           [{'%live',4},
            {set,[{x,4}],[{x,3}],move},
            {set,[{x,3}],[{x,2}],move},
            {set,[{x,2}],[{y,1}],move},
            {set,[{x,1}],[{y,2}],move},
            {set,[{x,5}],[{y,0}],move},
            {set,[{x,0}],[{y,3}],move},
            {'%live',6}]},
          {kill,{y,0}},
          {kill,{y,1}},
          {kill,{y,2}},
          {kill,{y,3}},
          {line,[{location,"src/bank_mysql.erl",164}]},
          {call,6,{f,183}},
          {test,is_tuple,{f,8},[{x,0}]},
          {test,test_arity,{f,8},[{x,0},2]},
          {block,
           [{'%live',1},
            {set,[{x,1}],[{x,0}],{get_tuple_element,0}},
            {set,[{x,2}],[{x,0}],{get_tuple_element,1}},
            {'%live',3}]},
          {test,is_eq_exact,{f,8},[{x,1},{atom,ok}]},
          {block,
           [{'%live',3},{set,[{x,0}],[{x,2}],move},{'%live',1}]},
          {line,[{location,"src/bank_mysql.erl",166}]},
          {call,1,{f,264}},
          {test,is_tuple,{f,9},[{x,0}]},
          {test,test_arity,{f,9},[{x,0},3]},
          {block,
           [{'%live',1},
            {set,[{x,1}],[{x,0}],{get_tuple_element,0}},
            {set,[{x,2}],[{x,0}],{get_tuple_element,1}},
            {set,[{x,3}],[{x,0}],{get_tuple_element,2}},
            {'%live',4}]},
          {test,is_eq_exact,{f,9},[{x,1},{atom,ok}]},
          {block,
           [{'%live',4},
            {set,[{x,0}],[{x,2}],move},
            {set,[{y,2}],[{x,3}],move},
            {set,[{y,3}],[{x,0}],move},
            {'%live',1}]},
          {line,[{location,"src/bank_mysql.erl",167}]},
          {call,1,{f,62}},
          {test,is_atom,{f,10},[{x,0}]},
          {select_val,
           {x,0},
           {f,10},
           {list,[{atom,error},{f,3},{atom,ok},{f,4}]}},
          {label,3},
          {block,
           [{'%live',0},{set,[{x,0}],[{y,3}],move},{'%live',1}]},
          {call_last,1,{f,168},4},
          {label,4},
          {block,
           [{'%live',0},{set,[{x,0}],[{y,3}],move},{'%live',1}]},
          {kill,{y,3}},
          {line,[{location,"src/bank_mysql.erl",169}]},
          {call,1,{f,74}},
          {test,is_tuple,{f,11},[{x,0}]},
          {test,test_arity,{f,11},[{x,0},6]},
          {block,
           [{'%live',1},
            {set,[{x,1}],[{x,0}],{get_tuple_element,0}},
            {set,[{x,2}],[{x,0}],{get_tuple_element,1}},
            {set,[{x,3}],[{x,0}],{get_tuple_element,2}},
            {set,[{x,4}],[{x,0}],{get_tuple_element,4}},
            {set,[{x,5}],[{x,0}],{get_tuple_element,5}},
            {'%live',6}]},
          {test,is_eq_exact,{f,11},[{x,1},{atom,ok}]},
          {test,is_eq_exact,{f,11},[{x,2},{integer,0}]},
          {test,is_eq_exact,{f,11},[{x,3},{integer,0}]},
          {test,is_eq_exact,{f,11},[{x,4},{integer,0}]},
          {test,is_eq_exact,{f,11},[{x,5},{literal,<<>>}]},
          {block,
           [{'%live',0},
            {set,[],[],{alloc,0,{nozero,nostack,3,[]}}},
            {set,[{x,0}],[],{put_tuple,2}},
            {set,[],[{atom,ok}],put},
            {set,[],[{y,2}],put},
            {'%live',1}]},
          {deallocate,4},
          return,
          {label,5},
          {line,[{location,"src/bank_mysql.erl",158}]},
          {badmatch,{x,0}},
          {label,6},
          {line,[{location,"src/bank_mysql.erl",161}]},
          {badmatch,{x,0}},
          {label,7},
          {line,[{location,"src/bank_mysql.erl",163}]},
          {badmatch,{x,0}},
          {label,8},
          {line,[{location,"src/bank_mysql.erl",164}]},
          {badmatch,{x,0}},
          {label,9},
          {line,[{location,"src/bank_mysql.erl",166}]},
          {badmatch,{x,0}},
          {label,10},
          {line,[{location,"src/bank_mysql.erl",167}]},
          {case_end,{x,0}},
          {label,11},
          {line,[{location,"src/bank_mysql.erl",169}]},
          {badmatch,{x,0}}]],
        [{file,"beam_utils.erl"},{line,654}]},
       {beam_utils,live_opt,1,
        [{file,"beam_utils.erl"},{line,205}]},
       {beam_block,function,2,[{file,"beam_block.erl"},{line,41}]},
       {lists,mapfoldl,3,[{file,"lists.erl"},{line,1278}]},
       {beam_block,module,2,[{file,"beam_block.erl"},{line,29}]},
       {compile,'-select_passes/2-anonymous-2-',2,
        [{file,"compile.erl"},{line,473}]},
       {compile,'-internal_comp/4-anonymous-1-',2,
        [{file,"compile.erl"},{line,273}]},
       {compile,fold_comp,3,[{file,"compile.erl"},{line,291}]}]}}},
   [{compile,'-select_passes/2-anonymous-2-',2,
     [{file,"compile.erl"},{line,473}]},
    {compile,'-internal_comp/4-anonymous-1-',2,
     [{file,"compile.erl"},{line,273}]},
    {compile,fold_comp,3,[{file,"compile.erl"},{line,291}]},
    {compile,internal_comp,4,[{file,"compile.erl"},{line,275}]},
    {compile,'-do_compile/2-anonymous-0-',2,
     [{file,"compile.erl"},{line,152}]}]}
I'm even more lost with that.

Can you please tell me if you can reproduce this? Hope I can help get this resolved.

Thanks.

--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

From kostis@REDACTED Mon Dec 17 11:48:58 2012
From: kostis@REDACTED (Kostis Sagonas)
Date: Mon, 17 Dec 2012 11:48:58 +0100
Subject: [erlang-bugs] Undetected undefined remote function calls
Message-ID: <50CEF89A.2070203@cs.ntua.gr>

Shouldn't the compiler be complaining that the module below contains an undefined function? (*)

%%===================
-module(foo).
-export([test/0]).

test() ->
    foo:bar().
%%===================

Kostis

(*) Or is this treated as a call to a "future" version of the module? :P

From ulf@REDACTED Mon Dec 17 12:42:45 2012
From: ulf@REDACTED (Ulf Wiger)
Date: Mon, 17 Dec 2012 12:42:45 +0100
Subject: [erlang-bugs] Undetected undefined remote function calls
In-Reply-To: <50CEF89A.2070203@cs.ntua.gr>
References: <50CEF89A.2070203@cs.ntua.gr>
Message-ID: <35068F1B-D812-4E8E-8F74-D152421B0DEF@feuerlabs.com>

Well, in the context of code loading, it is certainly possible, although more than a little weird. :)

The function sys:do_change_code/5 in OTP stdlib specifically relies on calling a function in the newly loaded version of a module, with data from the old version. It is of course calling a function that is well understood in advance, but this is only by convention, and not easily checkable*.

Let's say the presence of the exported function bar() indicates that a certain feature is supported, and the function looks like this:

test() ->
    case erlang:function_exported(foo, bar, 0) of
        true -> foo:bar();
        false -> ok
    end.

Should the compiler still warn?

(I'm open to the answer "yes". I don't think the above is a good solution).

BR,
Ulf W

* It's of course easy if both versions of the module are available for analysis, but they seldom are in practice.
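
The guarded call discussed above can be generalised a little. The following is only a sketch: the module name feature_probe and the helper call_if_exported/3 are made up for illustration and are not part of OTP. It relies on erlang:function_exported/3 and code:ensure_loaded/1, and loads the module first because function_exported/3 only reports on code that is already loaded.

```erlang
-module(feature_probe).
-export([call_if_exported/3]).

%% Hypothetical helper: call Mod:Fun() only when the currently loaded
%% version of Mod exports Fun/0, otherwise return Default.
call_if_exported(Mod, Fun, Default) ->
    %% function_exported/3 returns false for modules that are not
    %% loaded yet, so try to load the module before asking.
    _ = code:ensure_loaded(Mod),
    case erlang:function_exported(Mod, Fun, 0) of
        true  -> Mod:Fun();
        false -> Default
    end.
```

Because Mod and Fun are only known at run time here, no compile-time check of the call target is possible for this variant, whichever way the compiler question is answered.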
Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com

From essen@REDACTED Mon Dec 17 13:24:18 2012
From: essen@REDACTED (Loïc Hoguin)
Date: Mon, 17 Dec 2012 13:24:18 +0100
Subject: [erlang-bugs] Compiler bug on R15B03
In-Reply-To: <50CEF299.3070209@ninenines.eu>
References: <50CEF299.3070209@ninenines.eu>
Message-ID: <50CF0EF2.7050503@ninenines.eu>

On 12/17/2012 11:23 AM, Loïc Hoguin wrote:
> Hello,
>
> We have found a couple issues when compiling and executing a certain
> module. Reproduced on 2 machines. Mine was running ArchLinux 64bits.
>
> This file:
>
> https://raw.github.com/extend/bank_mysql/master/src/bank_mysql.erl
>
> You can compile it with -compile(export_all) to quickly test the issue.
> Load the beam in the R15B03 VM and then run:
>
> > bank_mysql:params_to_bin([123,21340949]).
>
> You'll get a weird failure in a clause where one of these integers
> appears to be a binary (line 549). That's not true though, because if
> you io:format/2 there nothing will appear. The code properly goes
> through the clause for integers.
>
> It's not related to the values, it also does it if it's something other
> than integers (like a datetime tuple and a binary). The
> closest-to-failure call reproducing this issue, found by tracing, is:
>
> > bank_mysql:params_to_bin([21340949], <<0:1>>,
> <<8,0>>,<<123,0,0,0,0,0,0,0,0>>).
>
> I have no idea what happens there.
> The line given is definitely not the
> right one, and the code worked fine on R15B01, no reasons it shouldn't
> on R15B03.
>
> Other weirdness, I wanted to try editing the .S file, adding {line,
> [...]} clauses in order to find exactly where it failed, but it seems I
> can't even compile the .S file generated at all. I get this error:

hq1 on IRC tried a few more things and found that it is the 2nd parameter causing the crash. If it's any bitstring with a size that isn't a multiple of 8 it fails. So bitstrings are apparently broken.

Thanks.

--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

From hq@REDACTED Mon Dec 17 13:30:25 2012
From: hq@REDACTED (Adam Rutkowski)
Date: Mon, 17 Dec 2012 13:30:25 +0100
Subject: [erlang-bugs] Compiler bug on R15B03
In-Reply-To: <50CF0EF2.7050503@ninenines.eu>
References: <50CEF299.3070209@ninenines.eu> <50CF0EF2.7050503@ninenines.eu>
Message-ID: <58F4B4B7-7183-48DA-BED1-A0E6AA1E50A3@mtod.org>

On 17 Dec 2012, at 13:24, Loïc Hoguin wrote:

> On 12/17/2012 11:23 AM, Loïc Hoguin wrote:
>> Hello,
>>
>> We have found a couple issues when compiling and executing a certain
>> module. Reproduced on 2 machines. Mine was running ArchLinux 64bits.
>>
>> This file:
>>
>> https://raw.github.com/extend/bank_mysql/master/src/bank_mysql.erl
>>
>> You can compile it with -compile(export_all) to quickly test the issue.
>> Load the beam in the R15B03 VM and then run:
>>
>> > bank_mysql:params_to_bin([123,21340949]).
>>
>> You'll get a weird failure in a clause where one of these integers
>> appears to be a binary (line 549). That's not true though, because if
>> you io:format/2 there nothing will appear. The code properly goes
>> through the clause for integers.
>>
>> It's not related to the values, it also does it if it's something other
>> than integers (like a datetime tuple and a binary). The
>> closest-to-failure call reproducing this issue, found by tracing, is:
>>
>> > bank_mysql:params_to_bin([21340949], <<0:1>>,
>> <<8,0>>,<<123,0,0,0,0,0,0,0,0>>).
>>
>> I have no idea what happens there. The line given is definitely not the
>> right one, and the code worked fine on R15B01, no reasons it shouldn't
>> on R15B03.
>>
>> Other weirdness, I wanted to try editing the .S file, adding {line,
>> [...]} clauses in order to find exactly where it failed, but it seems I
>> can't even compile the .S file generated at all. I get this error:
>
> hq1 on IRC tried a few more things and found that this is the 2nd parameter causing the crash. If it's any bitstring with a size that isn't a multiple of 8 it fails. So bitstrings are apparently broken.

Minimal example to reproduce:

-module(foo).

-compile([export_all]).

bar(A) ->
    <>.

Erlang R15B02 (erts-5.9.2) [source] [smp:4:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.9.2 (abort with ^G)
1> c(foo).
{ok,foo}
2> foo:bar(<<0:1>>).
<<0:2>>
3>

Erlang R15B03 (erts-5.9.3) [source] [smp:4:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.9.3 (abort with ^G)
1> c(foo).
{ok,foo}
2> foo:bar(<<0:1>>).
** exception error: bad argument in function foo:bar/1 (foo.erl, line 5) -- Adam From essen@REDACTED Mon Dec 17 13:52:37 2012 From: essen@REDACTED (=?ISO-8859-1?Q?Lo=EFc_Hoguin?=) Date: Mon, 17 Dec 2012 13:52:37 +0100 Subject: [erlang-bugs] Compiler bug on R15B03 In-Reply-To: <58F4B4B7-7183-48DA-BED1-A0E6AA1E50A3@mtod.org> References: <50CEF299.3070209@ninenines.eu> <50CF0EF2.7050503@ninenines.eu> <58F4B4B7-7183-48DA-BED1-A0E6AA1E50A3@mtod.org> Message-ID: <50CF1595.2040706@ninenines.eu> On 12/17/2012 01:30 PM, Adam Rutkowski wrote: > > On 17 Dec 2012, at 13:24, Lo?c Hoguin wrote: > >> On 12/17/2012 11:23 AM, Lo?c Hoguin wrote: >>> Hello, >>> >>> We have found a couple issues when compiling and executing a certain >>> module. Reproduced on 2 machines. Mine was running ArchLinux 64bits. >>> >>> This file: >>> >>> https://raw.github.com/extend/bank_mysql/master/src/bank_mysql.erl >>> >>> You can compile it with -compile(export_all) to quickly test the issue. >>> Load the beam in the R15B03 VM and then run: >>> >>>> bank_mysql:params_to_bin([123,21340949]). >>> >>> You'll get a weird failure in a clause where one of these integers >>> appears to be a binary (line 549). That's not true though, because if >>> you io:format/2 there nothing will appear. The code properly goes >>> through the clause for integers. >>> >>> It's not related to the values, it also does it if it's something other >>> than integers (like a datetime tuple and a binary). The >>> closest-to-failure call reproducing this issue, found by tracing, is: >>> >>>> bank_mysql:params_to_bin([21340949], <<0:1>>, >>> <<8,0>>,<<123,0,0,0,0,0,0,0,0>>). >>> >>> I have no idea what happens there. The line given is definitely not the >>> right one, and the code worked fine on R15B01, no reasons it shouldn't >>> on R15B03. 
>>> >>> Other weirdness, I wanted to try editing the .S file, adding {line, >>> [...]} clauses in order to find exactly where it failed, but it seems I >>> can't even compile the .S file generated at all. I get this error: >> >> hq1 on IRC tried a few more things and found that this is the 2nd parameter causing the crash. If it's any bitstring with a size that isn't a multiple of 8 it fails. So bitstrings are apparently broken. > > > Minimal example to reproduce: > > > > -module(foo). > > -compile([export_all]). > > bar(A) -> > <>. > > > > Erlang R15B02 (erts-5.9.2) [source] [smp:4:4] [async-threads:0] [hipe] [kernel-poll:false] > > Eshell V5.9.2 (abort with ^G) > 1> c(foo). > {ok,foo} > 2> foo:bar(<<0:1>>). > <<0:2>> > 3> > > > Erlang R15B03 (erts-5.9.3) [source] [smp:4:4] [async-threads:0] [hipe] [kernel-poll:false] > > Eshell V5.9.3 (abort with ^G) > 1> c(foo). > {ok,foo} > 2> foo:bar(<<0:1>>). > ** exception error: bad argument > in function foo:bar/1 (foo.erl, line 5) Appending what Adam and I discussed on IRC: Likely cause: http://erlang.org/pipermail/erlang-bugs/2012-October/003154.html Works if changing /binary to /bitstring. Would have liked that change in R16, not R15B03. There are still two issues: * The completely wrong line number for the error. * The compiler being unable to compile its own output: erlc -S bank_mysql.erl ; erlc bank_mysql.S -- Loïc Hoguin Erlang Cowboy Nine Nines http://ninenines.eu From watson.timothy@REDACTED Mon Dec 17 14:52:27 2012 From: watson.timothy@REDACTED (Tim Watson) Date: Mon, 17 Dec 2012 13:52:27 +0000 Subject: [erlang-bugs] inet:getstat/2 broken on some 64bit platforms Message-ID: In the RabbitMQ Management Plugin (http://www.rabbitmq.com/management.html) we've seen _oct variants wrap after 2^32 bytes are sent, even on 64bit platforms. Looking at inet_drv.c, the code in inet_fill_stat assumes that unsigned longs are always 32 bit, which is a false assumption.
The standard simply states that `unsigned long' is not smaller than 32 bits (or despite theoretically being untrue, that in practical use `sizeof(a) <= sizeof(long)' should probably hold) - anyway there are many platforms standardised on ILP64 and LP64 schemes, and on those platforms we end up with 128 bit counters and subsequently lose bits 32-63 and 96-127. Is there a compelling reason we don't want to use the definitions in stdint.h here? AFAIK this is only missing on OSF/1 4.0 (and under MSVC 9) and whilst it may be incomplete on some platforms the type defs are at least there. Either way this code appears to be wrong for many 64 bit unices. Cheers, Tim Watson Staff Engineer RabbitMQ / VMWare From pan@REDACTED Mon Dec 17 18:27:09 2012 From: pan@REDACTED (Patrik Nyblom) Date: Mon, 17 Dec 2012 18:27:09 +0100 Subject: [erlang-bugs] inet:getstat/2 broken on some 64bit platforms In-Reply-To: References: Message-ID: <50CF55ED.7030008@erlang.org> On 12/17/2012 02:52 PM, Tim Watson wrote: > In the RabbitMQ Management Plugin (http://www.rabbitmq.com/management.html) we've seen _oct variants wrap after 2^32 bytes are sent, even on 64bit platforms. Looking at inet_drv.c, the code in inet_fill_stat assumes that unsigned longs are always 32 bit, which is a false assumption. Well, actually inet_fill_stat doesn't; it uses only the lowest 32 bits, regardless of the size of unsigned long. The bug is in inet_input_count and inet_output_count, which assume that the counter will wrap.
The simple fix would be something like:
---------------------
--- a/erts/emulator/drivers/common/inet_drv.c
+++ b/erts/emulator/drivers/common/inet_drv.c
@@ -7836,7 +7836,7 @@ static ErlDrvSSizeT inet_ctl(inet_descriptor* desc, int cmd, char* buf,
 static void inet_output_count(inet_descriptor* desc, ErlDrvSizeT len)
 {
     unsigned long n = desc->send_cnt + 1;
-    unsigned long t = desc->send_oct[0] + len;
+    unsigned long t = (desc->send_oct[0] + len) & 0xFFFFFFFFUL;
     int c = (t < desc->send_oct[0]);
     double avg = desc->send_avg;

@@ -7856,7 +7856,7 @@ static void inet_output_count(inet_descriptor* desc, ErlDrvSizeT len)
 static void inet_input_count(inet_descriptor* desc, ErlDrvSizeT len)
 {
     unsigned long n = desc->recv_cnt + 1;
-    unsigned long t = desc->recv_oct[0] + len;
+    unsigned long t = (desc->recv_oct[0] + len) & 0xFFFFFFFFUL;
     int c = (t < desc->recv_oct[0]);
     double avg = desc->recv_avg;
     double dvi;
---------------------
Of course it would be better to use 64bit integers directly in the 64bit port, but this would fix the problem for now...
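As a side note on the patch above, here is a minimal, self-contained sketch of the masking idea. It is not the inet_drv.c code: the oct_counter struct and the count_octets and total_octets helpers are hypothetical names made up for illustration, and the sketch assumes each call adds fewer than 2^32 octets. It only tries to show why masking the sum to 32 bits lets the "t < old value" carry test work even when unsigned long is 64 bits wide.

```c
#include <stdint.h>

/* Hypothetical stand-in for a two-word octet counter:
 * oct[0] holds the low 32 bits, oct[1] the high 32 bits. */
typedef struct {
    unsigned long oct[2];
} oct_counter;

/* Add len octets to the counter. The mask keeps oct[0] within 32 bits
 * even where unsigned long is 64 bits wide, so the carry test below
 * fires exactly when the low word passes 2^32.
 * Assumption: len < 2^32 for every call. */
static void count_octets(oct_counter *c, unsigned long len)
{
    unsigned long t = (c->oct[0] + len) & 0xFFFFFFFFUL;
    int carry = (t < c->oct[0]);   /* masked sum wrapped around */

    c->oct[0] = t;
    c->oct[1] += carry;
}

/* Combine the two words into a single 64-bit total for reporting. */
static uint64_t total_octets(const oct_counter *c)
{
    return ((uint64_t)c->oct[1] << 32) | (uint64_t)c->oct[0];
}
```

Without the mask, a 64-bit unsigned long would let oct[0] grow past 2^32 without ever triggering the carry, which would match the symptom reported at the start of this thread if only the low 32 bits of each word are reported.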
> The standard simply states that `unsigend long' is not smaller than 32 bits (or despite theoretically being untrue, that in practical use `sizeof(a) <= sizeof(long)' should probably hold) - anyway there are many platforms standardised on ILP64 and LP64 schemes, and on those platforms we end up with 128 bit counters and subsequently lose bits 32-63 and 96-127. Is there a compelling reason we don't want to use the definitions in stdint.h here? AFAIK this is only missing on OSF/1 4.0 (and under MSVC 9) and whilst it may be incomplete on some platforms the type defs are at least there. Either way this code appears to be wrong for many 64 bit unices. We have a lot of types in our sys.h that could have been (and can be) used. I suppose the code is just old and did not account for 64bit at all. > > Cheers, > > Tim Watson > Staff Engineer > RabbitMQ / VMWare > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs Thanks for reporting! Cheers, /Patrik From kostis@REDACTED Mon Dec 17 18:37:14 2012 From: kostis@REDACTED (Kostis Sagonas) Date: Mon, 17 Dec 2012 18:37:14 +0100 Subject: [erlang-bugs] Undetected undefined remote function calls In-Reply-To: <35068F1B-D812-4E8E-8F74-D152421B0DEF@feuerlabs.com> References: <50CEF89A.2070203@cs.ntua.gr> <35068F1B-D812-4E8E-8F74-D152421B0DEF@feuerlabs.com> Message-ID: <50CF584A.7080101@cs.ntua.gr> On 12/17/2012 12:42 PM, Ulf Wiger wrote: > > Well, in the context of code loading, it is certainly possible, although more than a little weird. :) > > The function sys:do_change_code/5 in OTP stdlib specifically relies on calling a function in the newly loaded version of a module, with data from the old version. It is of course calling a function that is well understood in advance, but this is only by convention, and not easily checkable*. The function in sys is a different case than the one in my mail. 
It takes Mod as an argument and issues a Mod:system_code_change(...) call. Moreover, it protects this call with a catch. The intention of that code is clear -- at least to me. What the case I sent does is issue a remote call to a function in a module with the same name as the one being compiled whose code is nowhere to be found (let alone being listed among the exported functions). The only conceivable reason to do this call is that this function may possibly exist in some future version of this module. I think this may be used to support some cool feature like the ones shown in episodes of the Twilight Zone (and thus increase the perceived "coolness level" of Erlang in some discriminating hackers community :), but apart from that I claim that it will most likely confuse the majority of programmers out there and give them more reasons to think that "Erlang is weird". > Let's say the presence of the exported function bar() indicates that a certain feature is supported, and the function looks like this: > > test() -> > case erlang:function_exported(foo, bar, 0) of > true -> foo:bar(); > false -> ok > end. > > Should the compiler still warn? I would definitely say yes. I would even go as far as considering this a reason for the compiler to refuse to compile this code if this is part of the code of module foo. I see very little reason to place such code together with the *current* version of foo's code. If _really_ needed, one can place such code in some testing or code-updating module, which is compiled/analyzed separately/specially from the core of the application. More generally, while it's true that "dynamic" languages have aspects and features that give quite a bit of freedom to the programmer, there is a fine line between using this freedom for good reason(s) and simply abusing it and ending up with code idioms that are difficult for tools and programmers to reason about.
Calling future versions of the code of a module from the module itself is an idea which may seem "cool" the first five minutes you hear/think about it but I think is a feature that Erlang does not need in the long run. Kostis > (I'm open to the answer "yes". I don't think the above is a good solution). > > BR, > Ulf W > > * It's of course easy if both versions of the module are available for analysis, but they seldom are in practice. > > On 17 Dec 2012, at 11:48, Kostis Sagonas wrote: > >> Shouldn't the compiler be complaining that the module below contains an undefined function? (*) >> >> %%=================== >> -module(foo). >> -export([test/0]). >> >> test() -> >> foo:bar(). >> %%=================== >> >> Kostis >> >> (*) Or is this treated as a call to a "future" version of the module? :P >> _______________________________________________ >> erlang-bugs mailing list >> erlang-bugs@REDACTED >> http://erlang.org/mailman/listinfo/erlang-bugs > > Ulf Wiger, Co-founder& Developer Advocate, Feuerlabs Inc. > http://feuerlabs.com > > From watson.timothy@REDACTED Mon Dec 17 19:01:13 2012 From: watson.timothy@REDACTED (Tim Watson) Date: Mon, 17 Dec 2012 18:01:13 +0000 Subject: [erlang-bugs] inet:getstat/2 broken on some 64bit platforms In-Reply-To: <50CF55ED.7030008@erlang.org> References: <50CF55ED.7030008@erlang.org> Message-ID: Hi Patrik On 17 Dec 2012, at 17:27, Patrik Nyblom wrote: > Thanks for reporting! Do you think the fix is likely to make it into R16 or will we have missed the boat for that release? Cheers, Tim From pan@REDACTED Tue Dec 18 10:37:16 2012 From: pan@REDACTED (Patrik Nyblom) Date: Tue, 18 Dec 2012 10:37:16 +0100 Subject: [erlang-bugs] inet:getstat/2 broken on some 64bit platforms In-Reply-To: References: <50CF55ED.7030008@erlang.org> Message-ID: <50D0394C.3040804@erlang.org> On 12/17/2012 07:01 PM, Tim Watson wrote: > Hi Patrik > On 17 Dec 2012, at 17:27, Patrik Nyblom wrote: > >> Thanks for reporting! 
> Do you think the fix is likely to make it into R16 or will we have missed the boat for that release? I'll fix it in R16 - there is still time. But I may solve it differently; I don't feel comfortable using two 64bit ints to represent one on 64bit machines :) > > Cheers, > Tim Cheers, /Patrik From watson.timothy@REDACTED Tue Dec 18 11:35:02 2012 From: watson.timothy@REDACTED (Tim Watson) Date: Tue, 18 Dec 2012 10:35:02 +0000 Subject: [erlang-bugs] inet:getstat/2 broken on some 64bit platforms In-Reply-To: <50D0394C.3040804@erlang.org> References: <50CF55ED.7030008@erlang.org> <50D0394C.3040804@erlang.org> Message-ID: On 18 Dec 2012, at 09:37, Patrik Nyblom wrote: > On 12/17/2012 07:01 PM, Tim Watson wrote: >> Hi Patrik >> On 17 Dec 2012, at 17:27, Patrik Nyblom wrote: >> >>> Thanks for reporting! >> Do you think the fix is likely to make it into R16 or will we have missed the boat for that release? > I'll fix it in R16 - there is still time. But I may solve it differently, I don't feel comfortable using two 64bit int's to represent one on 64bit machines :) >> He he - I'm not at all surprised. Thanks very much Patrik! Cheers, Tim From n.oxyde@REDACTED Tue Dec 18 11:54:59 2012 From: n.oxyde@REDACTED (Anthony Ramine) Date: Tue, 18 Dec 2012 11:54:59 +0100 Subject: [erlang-bugs] Weird documentation of distributed applications In-Reply-To: <50C876AD.7000704@ninenines.eu> References: <50C876AD.7000704@ninenines.eu> Message-ID: <1EFAA0FA-165B-4980-85AD-DE432572EECB@gmail.com> Furthermore, the documentation contradicts itself in the note following what you pasted: > All involved nodes must have the same value for distributed and sync_nodes_timeout, or the behaviour of the system is undefined. The list of nodes is part of the `distributed` value. -- Anthony Ramine On 12 Dec 2012, at 13:21, Loïc Hoguin wrote: > Hello, > > I got pointed at a weirdness in the documentation by Eric Pailleau.
This chapter: > http://www.erlang.org/doc/design_principles/distributed_applications.html#id74957 > > says: > > "The system configuration files for cp2@REDACTED and cp3@REDACTED are identical, except for the list of mandatory nodes which should be [cp1@REDACTED, cp3@REDACTED] for cp2@REDACTED and [cp1@REDACTED, cp2@REDACTED] for cp3@REDACTED" > > First, that's incredibly clumsy. Having a different configuration file per node like this is just impractical if you're going to have 50 of them. > > But looking at the code it appears you can actually put [cp1@REDACTED, cp2@REDACTED, cp3@REDACTED] everywhere, because all this does is ping everything and make sure they're up before continuing. A ping to yourself *does* work so that makes the whole sentence pointless. > > What to do? > > -- > Loïc Hoguin > Erlang Cowboy > Nine Nines > http://ninenines.eu From robert.virding@REDACTED Tue Dec 18 16:21:16 2012 From: robert.virding@REDACTED (Robert Virding) Date: Tue, 18 Dec 2012 15:21:16 +0000 (GMT) Subject: [erlang-bugs] Undetected undefined remote function calls In-Reply-To: <50CEF89A.2070203@cs.ntua.gr> Message-ID: <163517229.2515530.1355844076631.JavaMail.root@erlang-solutions.com> Not really. You are making a fully qualified "external" call to a function in another module, which in this case happens to be the same module, and the function's existence is checked at run-time. Even if the function existed when you make the call you would not make an internal jump but go through the module's exported function table. Calling function foo/1 in the same module behaves differently if you call it foo(5) or if you call it module:foo(5). So in that sense it does make perfect sense to NOT complain about an undefined function. It is similar to the difference in doing exit(boom) and exit(self(), boom).
Robert ----- Original Message ----- > From: "Kostis Sagonas" > To: "erlang-bugs" > Sent: Monday, 17 December, 2012 11:48:58 AM > Subject: [erlang-bugs] Undetected undefined remote function calls > > Shouldn't the compiler be complaining that the module below contains > an > undefined function? (*) > > %%=================== > -module(foo). > -export([test/0]). > > test() -> > foo:bar(). > %%=================== > > Kostis > > (*) Or is this treated as a call to a "future" version of the module? > :P > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs > From robert.virding@REDACTED Tue Dec 18 16:48:51 2012 From: robert.virding@REDACTED (Robert Virding) Date: Tue, 18 Dec 2012 15:48:51 +0000 (GMT) Subject: [erlang-bugs] Undetected undefined remote function calls In-Reply-To: <163517229.2515530.1355844076631.JavaMail.root@erlang-solutions.com> Message-ID: <125309411.2516355.1355845731878.JavaMail.root@erlang-solutions.com> To clarify myself: Calling foo:bar() is doing an intermodule call in which case detecting whether the function exists or not is left to run-time and it is irrelevant which module it is. Robert ----- Original Message ----- > From: "Robert Virding" > To: "Kostis Sagonas" > Cc: "erlang-bugs" > Sent: Tuesday, 18 December, 2012 4:21:16 PM > Subject: Re: [erlang-bugs] Undetected undefined remote function calls > > Not really. You are making a fully qualified "external" call to a > function in another module, which in this case happens to be the > same module, and the function's existence is checked at run-time. > Even if the function existed when you make the call you would not > make an internal jump but go through the modules exported function > table. Calling function foo/1 in the same module behaves differently > if you call it foo(5) or if you call it module:foo(5). So in that > sense it does make perfect sense to NOT complain about an undefined > function. 
> > It is similar to the difference in doing exit(boom) and exit(self(), > boom). > > Robert > > ----- Original Message ----- > > From: "Kostis Sagonas" > > To: "erlang-bugs" > > Sent: Monday, 17 December, 2012 11:48:58 AM > > Subject: [erlang-bugs] Undetected undefined remote function calls > > > > Shouldn't the compiler be complaining that the module below > > contains > > an > > undefined function? (*) > > > > %%=================== > > -module(foo). > > -export([test/0]). > > > > test() -> > > foo:bar(). > > %%=================== > > > > Kostis > > > > (*) Or is this treated as a call to a "future" version of the > > module? > > :P From essen@REDACTED Tue Dec 18 16:58:17 2012 From: essen@REDACTED (Loïc Hoguin) Date: Tue, 18 Dec 2012 16:58:17 +0100 Subject: [erlang-bugs] Undetected undefined remote function calls In-Reply-To: <125309411.2516355.1355845731878.JavaMail.root@erlang-solutions.com> References: <125309411.2516355.1355845731878.JavaMail.root@erlang-solutions.com> Message-ID: <50D09299.2060008@ninenines.eu> All true, but warnings are there to inform the developer that what he's doing is probably wrong. Not all warnings are caused by actual errors (an unused variable, for instance). Doing foo:bar() from foo has very little chance of being done on purpose if bar isn't defined. Setting a warning for this does make sense. Of course as with all other warnings you should be able to disable it if you actually need to do something like this.
On 12/18/2012 04:48 PM, Robert Virding wrote: > To clarify myself: > > Calling foo:bar() is doing an intermodule call in which case detecting whether the function exists or not is left to run-time and it is irrelevant which module it is. > > Robert > > ----- Original Message ----- >> From: "Robert Virding" >> To: "Kostis Sagonas" >> Cc: "erlang-bugs" >> Sent: Tuesday, 18 December, 2012 4:21:16 PM >> Subject: Re: [erlang-bugs] Undetected undefined remote function calls >> >> Not really. You are making a fully qualified "external" call to a >> function in another module, which in this case happens to be the >> same module, and the function's existence is checked at run-time. >> Even if the function existed when you make the call you would not >> make an internal jump but go through the modules exported function >> table. Calling function foo/1 in the same module behaves differently >> if you call it foo(5) or if you call it module:foo(5). So in that >> sense it does make perfect sense to NOT complain about an undefined >> function. >> >> It is similar to the difference in doing exit(boom) and exist(self(), >> boom). >> >> Robert >> >> ----- Original Message ----- >>> From: "Kostis Sagonas" >>> To: "erlang-bugs" >>> Sent: Monday, 17 December, 2012 11:48:58 AM >>> Subject: [erlang-bugs] Undetected undefined remote function calls >>> >>> Shouldn't the compiler be complaining that the module below >>> contains >>> an >>> undefined function? (*) >>> >>> %%=================== >>> -module(foo). >>> -export([test/0]). >>> >>> test() -> >>> foo:bar(). >>> %%=================== >>> >>> Kostis >>> >>> (*) Or is this treated as a call to a "future" version of the >>> module? 
>>> :P >>> _______________________________________________ >>> erlang-bugs mailing list >>> erlang-bugs@REDACTED >>> http://erlang.org/mailman/listinfo/erlang-bugs >>> >> _______________________________________________ >> erlang-bugs mailing list >> erlang-bugs@REDACTED >> http://erlang.org/mailman/listinfo/erlang-bugs >> > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs > -- Lo?c Hoguin Erlang Cowboy Nine Nines http://ninenines.eu From ulf@REDACTED Tue Dec 18 17:18:38 2012 From: ulf@REDACTED (Ulf Wiger) Date: Tue, 18 Dec 2012 17:18:38 +0100 Subject: [erlang-bugs] Undetected undefined remote function calls In-Reply-To: <50D09299.2060008@ninenines.eu> References: <125309411.2516355.1355845731878.JavaMail.root@erlang-solutions.com> <50D09299.2060008@ninenines.eu> Message-ID: <507695DD-7561-4B17-9D9F-99266D7E6D0E@feuerlabs.com> Well, the compiler cannot make a determination about an inter-module function call in the general case, so this particular case would be the only one it could warn about. Is it really worth the effort? Dialyzer, on the other hand, will warn about unresolved external calls, and should definitely warn about this one. So should xref. BR, Ulf W On 18 Dec 2012, at 16:58, Lo?c Hoguin wrote: > All true, but warnings are there to inform the developer that what he's doing is probably wrong. Not all warnings are caused by actual errors (like not using a defined variable). Doing foo:bar() from foo has very little chances of being done on purpose if bar isn't defined. Setting a warning for this does make sense. > > Of course as with all other warnings you should be able to disable it if you actually need to do something like this. 
> > On 12/18/2012 04:48 PM, Robert Virding wrote: >> To clarify myself: >> >> Calling foo:bar() is doing an intermodule call in which case detecting whether the function exists or not is left to run-time and it is irrelevant which module it is. >> >> Robert >> >> ----- Original Message ----- >>> From: "Robert Virding" >>> To: "Kostis Sagonas" >>> Cc: "erlang-bugs" >>> Sent: Tuesday, 18 December, 2012 4:21:16 PM >>> Subject: Re: [erlang-bugs] Undetected undefined remote function calls >>> >>> Not really. You are making a fully qualified "external" call to a >>> function in another module, which in this case happens to be the >>> same module, and the function's existence is checked at run-time. >>> Even if the function existed when you make the call you would not >>> make an internal jump but go through the modules exported function >>> table. Calling function foo/1 in the same module behaves differently >>> if you call it foo(5) or if you call it module:foo(5). So in that >>> sense it does make perfect sense to NOT complain about an undefined >>> function. >>> >>> It is similar to the difference in doing exit(boom) and exist(self(), >>> boom). >>> >>> Robert >>> >>> ----- Original Message ----- >>>> From: "Kostis Sagonas" >>>> To: "erlang-bugs" >>>> Sent: Monday, 17 December, 2012 11:48:58 AM >>>> Subject: [erlang-bugs] Undetected undefined remote function calls >>>> >>>> Shouldn't the compiler be complaining that the module below >>>> contains >>>> an >>>> undefined function? (*) >>>> >>>> %%=================== >>>> -module(foo). >>>> -export([test/0]). >>>> >>>> test() -> >>>> foo:bar(). >>>> %%=================== >>>> >>>> Kostis >>>> >>>> (*) Or is this treated as a call to a "future" version of the >>>> module? 
>>>> :P > > > -- > Loïc Hoguin > Erlang Cowboy > Nine Nines > http://ninenines.eu Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc. http://feuerlabs.com From tarek.zeineddine@REDACTED Wed Dec 19 15:48:08 2012 From: tarek.zeineddine@REDACTED (Tarek Zeineddine) Date: Wed, 19 Dec 2012 09:48:08 -0500 Subject: [erlang-bugs] Safari and R15B03-1 Message-ID: Hi, Ever since our upgrade to R15B03-1 we are no longer able to connect via secure websockets using a Safari browser. It fails with the following error: WebSocket network error: OSStatus Error -9806: connection closed via error Has anyone else experienced this? We still do not have enough information on this and will provide more details as soon as we get them. Thanks. Tarek
From eric.pailleau@REDACTED Wed Dec 19 21:26:48 2012 From: eric.pailleau@REDACTED (PAILLEAU Eric) Date: Wed, 19 Dec 2012 21:26:48 +0100 Subject: [erlang-bugs] Weird documentation of distributed applications In-Reply-To: <1EFAA0FA-165B-4980-85AD-DE432572EECB@gmail.com> References: <50C876AD.7000704@ninenines.eu> <1EFAA0FA-165B-4980-85AD-DE432572EECB@gmail.com> Message-ID: <50D22308.4060004@wanadoo.fr> On 18/12/2012 11:54, Anthony Ramine wrote: > Furthermore, the documentation contradicts itself in the note following what you pasted: > >> All involved nodes must have the same value for distributed and sync_nodes_timeout, or the behaviour of the system is undefined. > > The list of nodes is part of the `distributed` value. > Hi, I do not agree. This sentence is true: distributed and sync_nodes_timeout are the same in the cp1, cp2 and cp3 configs. The 'problem' is only with the sync_nodes_mandatory tuple. When I told this to Loïc at Erlang Lite Paris, I was convinced that I could not add the current node to sync_nodes_mandatory in the running configuration, until I simply tested again. In the past I got an error when adding the current node, but this was surely only a newbie error at the time (maybe a DNS problem with FQDN node names), because we cannot see any change in the Git repository. But I can't remember which Erlang version I used at the time. Maybe the Erlang team knows whether this part of the kernel code changed before the Git integration. Anyway, the documentation could tell the reader that the same configuration is possible on all nodes. Regards. From vs@REDACTED Wed Dec 26 15:17:33 2012 From: vs@REDACTED (Viktor Sovietov) Date: Wed, 26 Dec 2012 16:17:33 +0200 Subject: [erlang-bugs] Internal consistency check failed during pattern matching with binary Message-ID: Hello! The following piece of code (purely synthetic): -module(check). -export([check/2]). check(<<"string">>, a1) -> one; check(_, a2) -> two; check(undefined, a3) -> three.
produces an internal consistency check failure instead of a simple warning: ----------- Erlang R15B02 (erts-5.9.2) [source] [smp:2:2] [async-threads:0] [hipe] [kernel-poll:false] Eshell V5.9.2 (abort with ^G) 1> c(check). check: function check/2+17: Internal consistency check failed - please report this bug. Instruction: {test,is_eq_exact,{f,7},[{x,0},{atom,undefined}]} Error: {match_context,{x,0}}: error ----------- It's probably a bug in the pattern matcher. Wrapped in a case ... of expression, it gives the same result. -- Sincerely, Viktor Sovietov Cloudozer LLP tel: +44 845 5086137 Skype: owlbird From jose.valim@REDACTED Mon Dec 31 16:13:20 2012 From: jose.valim@REDACTED (José Valim) Date: Mon, 31 Dec 2012 16:13:20 +0100 Subject: [erlang-bugs] erl_eval passes invalid argument to try clauses Message-ID: According to the Erlang Abstract Format, a catch clause for try expressions expects three arguments: the kind (which is an error, exit or throw), the value (which is usually matched against) and a third undocumented argument which is a cons list. Taking a quick look around shows that this cons cell contains the stacktrace and a number: https://github.com/nox/otp/blob/master/erts/emulator/beam/bif.c#L1325 Which I was able to verify in practice. This seems to work fine except that erl_eval does not pass a cons cell as the third argument but only the stacktrace: https://github.com/erlang/otp/blob/maint/lib/stdlib/src/erl_eval.erl#L786 In theory, since an Erlang developer cannot specify this third argument using Erlang syntax, this difference does not actually matter. But in practice part of the Erlang/OTP stack may actually rely on this third argument, so I have decided to report this as a bug. In any case, I would be interested to know the purpose of this third argument. Is it meant to be internal to the Erlang VM or should it be documented? Thank you and happy new year! *José
Valim* www.plataformatec.com.br Skype: jv.ptec Founder and Lead Developer