From norton@REDACTED Thu Nov 1 06:37:38 2012 From: norton@REDACTED (Joseph Wayne Norton) Date: Thu, 1 Nov 2012 14:37:38 +0900 Subject: [erlang-bugs] R15B02 odbcserver.c 64-bit bug Message-ID: We found a 64-bit bug with odbcserver.c while testing on CentOS6.2 with Oracle's ODBC client. - encode_out_params() is casting 32bit value of SQL_C_SLONG to 64bit --- ((long*)values). It should be ((SQLINTEGER*)values). - encode_column_dyn() is doing similar encoding but correctly with SQLINTEGER(32bit) type. We hope no one else has been silently bitten by this 64-bit bug. regards, Joe N. --- ./otp_src_R15B02/lib/odbc/c_src/odbcserver.c.ORIG 2012-11-01 11:32:57.976934595 +0900 +++ ./otp_src_R15B02/lib/odbc/c_src/odbcserver.c 2012-11-01 11:35:03.248922917 +0900 @@ -1154,7 +1154,7 @@ (column.type.strlen_or_indptr_array[j])); break; case SQL_C_SLONG: - ei_x_encode_long(&dynamic_buffer(state), ((long*)values)[j]); + ei_x_encode_long(&dynamic_buffer(state), ((SQLINTEGER*)values)[j]); break; case SQL_C_DOUBLE: ei_x_encode_double(&dynamic_buffer(state), $ cat /etc/redhat-release CentOS release 6.2 (Final) $ uname -a Linux localhost.localdomain 2.6.32-220.el6.x86_64 #1 SMP Tue Dec 6 19:48:22 GMT 2011 x86_64 x86_64 x86_64 GNU/Linux $ rpm -q -a | grep oracle oracle-instantclient11.2-basic-11.2.0.3.0-1.x86_64 oracle-instantclient11.2-odbc-11.2.0.3.0-1.x86_64 $ rpm -q -a | grep unix unixODBC-2.2.14-11.el6.x86_64 unixODBC-devel-2.2.14-11.el6.x86_64 $ erl Erlang R15B02 (erts-5.9.2) [source] [64-bit] [async-threads:0] [hipe] [kernel-poll:false] Eshell V5.9.2 (abort with ^G) 1> From essen@REDACTED Thu Nov 1 12:38:18 2012 From: essen@REDACTED (=?ISO-8859-1?Q?Lo=EFc_Hoguin?=) Date: Thu, 01 Nov 2012 12:38:18 +0100 Subject: [erlang-bugs] [erlang-questions] Process/FD leak in SSL R15B01 In-Reply-To: <5090054A.5080405@ericsson.com> References: <507C2922.9090207@ninenines.eu> <507C379D.7010004@ninenines.eu> <507D264C.7040102@ericsson.com> <507D2747.8040602@ninenines.eu> 
<507D2F14.4020705@ericsson.com> <507D3001.6000307@ninenines.eu> <507D8DC3.6050507@ericsson.com> <507E6391.7070304@erix.ericsson.se> <5080211A.2060805@ninenines.eu> <5087B3E2.6070009@ericsson.com> <5087DBAB.8090403@ninenines.eu> <5087E62D.3040304@ericsson.com> <508E6034.7020003@ninenines.eu> <508E8A12.5020009@ericsson.com> <508EB773.2080508@ninenines.eu> <5090054A.5080405@ericsson.com>
Message-ID: <50925F2A.904@ninenines.eu>

On 10/30/2012 05:50 PM, Ingela Anderton Andin wrote:
> Hi!
>
> Loïc Hoguin wrote:
>> I don't understand why there's no timeout though. Wouldn't it make
>> sense to have a small timeout to avoid this and any related problem
>> entirely?
>>
>
> Well, yes and no ;) ssl:close should conceptually not fail, so you should
> not need a timeout. The problem seems to be that the cleanup
> code can sometimes cause an unforeseen hanging problem, as in the
> rabbitmq case. Although in your case it seems that the problem is
> something else, at least some of the time. Currently I am thinking that
> maybe the close function is incorrectly implemented and that it should
> use supervisor:terminate_child, so that the OTP supervisor will
> kill the process if it hangs in terminate. (It has a timeout.) I will
> work on making such a patch. I suppose in the meantime you could
> implement a timeout for ssl:close and verify that this timeout
> solves your problem; then my suggested way should solve it too, just
> more OTPish and without the need to extend the API.

OK, so perhaps I did something wrong before when testing. Today I can most certainly say that:

* This problem occurs *only* when ssl_connection processes receive 'DOWN' in handle_info. The sockets for this problem stay in FIN2.

* Removing the call to recv/2 itself removes this problem entirely.

* There is another problem where processes get stuck elsewhere, which I'm going to try to investigate. Note that the sockets for this problem stay in ESTABLISHED.
--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

From raimo+erlang-bugs@REDACTED Wed Nov 7 10:47:00 2012
From: raimo+erlang-bugs@REDACTED (Raimo Niskanen)
Date: Wed, 7 Nov 2012 10:47:00 +0100
Subject: [erlang-bugs] prim_inet:close/1 race condition
In-Reply-To: <68978949-C924-4393-BFE0-670A6E5B79A4@gmail.com>
References: <68978949-C924-4393-BFE0-670A6E5B79A4@gmail.com>
Message-ID: <20121107094700.GA25382@erix.ericsson.se>

On Thu, Oct 18, 2012 at 04:39:10PM +0400, Dmitry Belyaev wrote:
> Is this a known bug?
> Can anyone propose a patch for prim_inet?

That seems like a bug (race condition). We'll have a look at it.

The most probable solution for now will be to move the unlink to after the close, a la:

    catch erlang:port_close(S),
    unlink(S),
    receive {'EXIT',S,_} -> ok after 0 -> ok end,

Some releases after the current code was written, the guarantees of unlink/1 were improved, so that after unlink/1 returns, link-related messages _are_ in the process message box, making the receive ... after 0 safe.

>
> Thank you.
>
> On 12.10.2012, at 1:06, Dmitry Belyaev wrote:
>
> > Some days ago we found that we have thousands of leaked sockets in our project.
> >
> > These sockets were ports with state like this:
> > [{name,"tcp_inet"},
> >  {links,[]},
> >  {connected,<0.54.0>},...]
> >
> > We investigated and found the cause of the leaks.
> >
> > We use the inet option {exit_on_close, false} to read statistics from the socket after it was closed by the peer. The process that controls the socket does not trap exits and is linked to another process.
> > At the end of the connection the controller calls gen_tcp:close/1, and sometimes the linked process dies at that very moment. We found out that gen_tcp:close/1 calls prim_inet:close/1, whose first action is to unlink from the controlling process. So, when the controller is unlinked from the port and is killed by the signal, the port stays in the system because of the exit_on_close feature.
> >
> > I've made a module that sometimes may reveal the problem. https://gist.github.com/3875485
> > On my system, half a dozen calls to close_bug:start(1000) find such leaked ports.
> > I haven't found the right solution for the problem yet, so no patches at the moment.
> >
> > Thank you for your attention.
> >
> > --
> > Dmitry Belyaev
>
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

From essen@REDACTED Wed Nov 7 14:50:48 2012
From: essen@REDACTED (=?ISO-8859-1?Q?Lo=EFc_Hoguin?=)
Date: Wed, 07 Nov 2012 14:50:48 +0100
Subject: [erlang-bugs] [erlang-questions] Process/FD leak in SSL R15B01
In-Reply-To: <50925F2A.904@ninenines.eu>
References: <507C2922.9090207@ninenines.eu> <507C379D.7010004@ninenines.eu> <507D264C.7040102@ericsson.com> <507D2747.8040602@ninenines.eu> <507D2F14.4020705@ericsson.com> <507D3001.6000307@ninenines.eu> <507D8DC3.6050507@ericsson.com> <507E6391.7070304@erix.ericsson.se> <5080211A.2060805@ninenines.eu> <5087B3E2.6070009@ericsson.com> <5087DBAB.8090403@ninenines.eu> <5087E62D.3040304@ericsson.com> <508E6034.7020003@ninenines.eu> <508E8A12.5020009@ericsson.com> <508EB773.2080508@ninenines.eu> <5090054A.5080405@ericsson.com> <50925F2A.904@ninenines.eu>
Message-ID: <509A6738.6020403@ninenines.eu>

On 11/01/2012 12:38 PM, Loïc Hoguin wrote:
> * There is another problem where processes get stuck elsewhere, which
> I'm going to try to investigate. Note that the sockets for this problem
> stay in ESTABLISHED.
In this particular case, here's what we got: [{current_function,{gen_fsm,loop,7}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,0}, {messages,[]}, {links,[<0.897.0>,#Port<0.264729349>]}, {dictionary,[{ssl_manager,ssl_manager}, {'$ancestors',[ssl_connection_sup,ssl_sup,<0.894.0>]}, {'$initial_call',{ssl_connection,init,1}}]}, {trap_exit,false}, {error_handler,error_handler}, {priority,normal}, {group_leader,<0.893.0>}, {total_heap_size,8362}, {heap_size,4181}, {stack_size,10}, {reductions,7029}, {garbage_collection,[{min_bin_vheap_size,46368}, {min_heap_size,233}, {fullsweep_after,10}, {minor_gcs,10}]}, {suspending,[]}] Looking further, I notice something weird. > erlang:process_info(Pid, monitors). {monitors,[{process,<0.1055.0>}]} This is a very old pid. > erlang:process_info(OldPid). [{current_function,{prim_inet,accept0,2}}, {initial_call,{cowboy_acceptor,acceptor,7}}, {status,waiting}, {message_queue_len,1602}, {messages,[{#Ref<0.0.19.196440>,connected}, {#Ref<0.0.21.74727>,connected}, {#Ref<0.0.28.93234>,connected}, {#Ref<0.0.64.192190>,connected}, {#Ref<0.0.167.184831>,connected}, {#Ref<0.0.208.24369>,connected}, {#Ref<0.0.282.59352>,connected}, {#Ref<0.0.340.181599>,connected}, {#Ref<0.0.341.57338>,connected}, {#Ref<0.0.427.15661>,connected}, {#Ref<0.0.430.8560>,connected}, {#Ref<0.0.439.40688>,connected}, {#Ref<0.0.439.214050>,connected}, {#Ref<0.0.440.206978>,connected}, {#Ref<0.0.466.173049>,connected}, {#Ref<0.0.497.35749>,connected}, {#Ref<0.0.514.36774>,connected}, {#Ref<0.0.514.109971>,connected}, {#Ref<0.0.541.246233>,connected}, {#Ref<0.0.544.168339>,connected}, {#Ref<0.0.584.43294>,...}, {...}|...]}, {links,[<0.1028.0>]}, {dictionary,[]}, {trap_exit,false}, {error_handler,error_handler}, {priority,normal}, {group_leader,<0.868.0>}, {total_heap_size,32838}, {heap_size,4181}, {stack_size,22}, {reductions,219876367}, {garbage_collection,[{min_bin_vheap_size,46368}, {min_heap_size,233}, {fullsweep_after,10}, {minor_gcs,1}]}, 
{suspending,[]}]

So this is the acceptor process. It doesn't make sense though: why would the ssl process still monitor the acceptor? The acceptor code is equivalent to this:

    {ok, Socket} = ssl:transport_accept(ListenSocket, 2000),
    ok = ssl:ssl_accept(Socket, 2000),
    {ok, Pid} = supervisor:start_child(SupPid, Args),
    Transport:controlling_process(Socket, Pid),
    %% ...

I'm also not sure what {#Ref<0.0.19.196440>,connected} is. Are they supposed to receive this?

Any pointer as to where to look next would help.

Thanks.

--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

From ingela.anderton.andin@REDACTED Wed Nov 7 16:12:29 2012
From: ingela.anderton.andin@REDACTED (Ingela Anderton Andin)
Date: Wed, 7 Nov 2012 16:12:29 +0100
Subject: [erlang-bugs] [erlang-questions] Process/FD leak in SSL R15B01
In-Reply-To: <509A6738.6020403@ninenines.eu>
References: <507C2922.9090207@ninenines.eu> <507C379D.7010004@ninenines.eu> <507D264C.7040102@ericsson.com> <507D2747.8040602@ninenines.eu> <507D2F14.4020705@ericsson.com> <507D3001.6000307@ninenines.eu> <507D8DC3.6050507@ericsson.com> <507E6391.7070304@erix.ericsson.se> <5080211A.2060805@ninenines.eu> <5087B3E2.6070009@ericsson.com> <5087DBAB.8090403@ninenines.eu> <5087E62D.3040304@ericsson.com> <508E6034.7020003@ninenines.eu> <508E8A12.5020009@ericsson.com> <508EB773.2080508@ninenines.eu> <5090054A.5080405@ericsson.com> <50925F2A.904@ninenines.eu> <509A6738.6020403@ninenines.eu>
Message-ID: <509A7A5D.7060308@erix.ericsson.se>

Hi!

The problem is that "call timeouts" in gen_server/gen_fsm suck, as they are purely client side. The {Ref, connected} is a gen_fsm reply that should have been received by the ssl connect code. Like the recv problem on erlang-questions, this is solved by making the timer server side. It could be argued that some of these timeouts in the ssl API are not needed, but they are legacy... We will fix it.
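In miniature, the server-side timer idea described above looks like this (an illustrative toy only, with invented module and message names; the real fix lives in ssl_connection.erl). The server arms erlang:send_after/3 when a request arrives and answers {error, timeout} itself, so the client can wait with no client-side timeout at all and a late reply can never land unclaimed in its mailbox:

```erlang
%% Minimal sketch of a server-side timeout (illustrative only; names
%% are invented, this is not the actual ssl_connection code).
-module(srv_timeout_sketch).
-export([start/0, call/2]).

start() ->
    spawn(fun() -> loop(undefined) end).

%% The client blocks indefinitely; timing out is the server's job.
call(Pid, Timeout) ->
    Ref = make_ref(),
    Pid ! {request, self(), Ref, Timeout},
    receive {Ref, Reply} -> Reply end.

loop(Pending) ->
    receive
        {request, From, Ref, Timeout} ->
            %% Arm a timer for this particular request; when it fires,
            %% the server receives {cancel, Ref} in its own mailbox.
            erlang:send_after(Timeout, self(), {cancel, Ref}),
            loop({From, Ref});
        {cancel, Ref} ->
            case Pending of
                {From, Ref} ->
                    %% Request still outstanding: answer it ourselves.
                    From ! {Ref, {error, timeout}},
                    loop(undefined);
                _ ->
                    %% Stale timer for an already-answered request.
                    loop(Pending)
            end
    end.
```

The success path (the real work completing and replying before the timer fires) is elided; the point is only that the timeout reply comes from the server, not from a gen_fsm call timeout on the client side.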
After some investigation I think the "best" solution to your other problem will be to call gen_tcp:recv/3 in the workaround. We will also clean up the logic in terminate.

Regards Ingela Erlang/OTP team - Ericsson AB

Loïc Hoguin wrote:
> On 11/01/2012 12:38 PM, Loïc Hoguin wrote:
>> * There is another problem where processes get stuck elsewhere, which
>> I'm going to try to investigate. Note that the sockets for this problem
>> stay in ESTABLISHED.
>
> In this particular case, here's what we got:
>
> [{current_function,{gen_fsm,loop,7}},
> {initial_call,{proc_lib,init_p,5}},
> {status,waiting},
> {message_queue_len,0},
> {messages,[]},
> {links,[<0.897.0>,#Port<0.264729349>]},
> {dictionary,[{ssl_manager,ssl_manager},
> {'$ancestors',[ssl_connection_sup,ssl_sup,<0.894.0>]},
> {'$initial_call',{ssl_connection,init,1}}]},
> {trap_exit,false},
> {error_handler,error_handler},
> {priority,normal},
> {group_leader,<0.893.0>},
> {total_heap_size,8362},
> {heap_size,4181},
> {stack_size,10},
> {reductions,7029},
> {garbage_collection,[{min_bin_vheap_size,46368},
> {min_heap_size,233},
> {fullsweep_after,10},
> {minor_gcs,10}]},
> {suspending,[]}]
>
> Looking further, I notice something weird.
>
> > erlang:process_info(Pid, monitors).
> {monitors,[{process,<0.1055.0>}]}
>
> This is a very old pid.
>
> > erlang:process_info(OldPid).
> [{current_function,{prim_inet,accept0,2}}, > {initial_call,{cowboy_acceptor,acceptor,7}}, > {status,waiting}, > {message_queue_len,1602}, > {messages,[{#Ref<0.0.19.196440>,connected}, > {#Ref<0.0.21.74727>,connected}, > {#Ref<0.0.28.93234>,connected}, > {#Ref<0.0.64.192190>,connected}, > {#Ref<0.0.167.184831>,connected}, > {#Ref<0.0.208.24369>,connected}, > {#Ref<0.0.282.59352>,connected}, > {#Ref<0.0.340.181599>,connected}, > {#Ref<0.0.341.57338>,connected}, > {#Ref<0.0.427.15661>,connected}, > {#Ref<0.0.430.8560>,connected}, > {#Ref<0.0.439.40688>,connected}, > {#Ref<0.0.439.214050>,connected}, > {#Ref<0.0.440.206978>,connected}, > {#Ref<0.0.466.173049>,connected}, > {#Ref<0.0.497.35749>,connected}, > {#Ref<0.0.514.36774>,connected}, > {#Ref<0.0.514.109971>,connected}, > {#Ref<0.0.541.246233>,connected}, > {#Ref<0.0.544.168339>,connected}, > {#Ref<0.0.584.43294>,...}, > {...}|...]}, > {links,[<0.1028.0>]}, > {dictionary,[]}, > {trap_exit,false}, > {error_handler,error_handler}, > {priority,normal}, > {group_leader,<0.868.0>}, > {total_heap_size,32838}, > {heap_size,4181}, > {stack_size,22}, > {reductions,219876367}, > {garbage_collection,[{min_bin_vheap_size,46368}, > {min_heap_size,233}, > {fullsweep_after,10}, > {minor_gcs,1}]}, > {suspending,[]}] > > So this is the acceptor process. It doesn't make sense though, why > would the ssl process still monitor the acceptor? The acceptor code is > equivalent to this: > > {ok, Socket} = ssl:transport_accept(ListenSocket, 2000), > ok = ssl:ssl_accept(Socket, 2000), > {ok, Pid} = supervisor:start_child(SupPid, Args), > Transport:controlling_process(Socket, Pid), > %% ... > > I'm also not sure what {#Ref<0.0.19.196440>,connected} is. Are they > supposed to receive this? > > Any pointer as to where to look next would help. > > Thanks. 
> From essen@REDACTED Wed Nov 7 17:15:45 2012 From: essen@REDACTED (=?ISO-8859-1?Q?Lo=EFc_Hoguin?=) Date: Wed, 07 Nov 2012 17:15:45 +0100 Subject: [erlang-bugs] [erlang-questions] Process/FD leak in SSL R15B01 In-Reply-To: <509A7A5D.7060308@erix.ericsson.se> References: <507C2922.9090207@ninenines.eu> <507C379D.7010004@ninenines.eu> <507D264C.7040102@ericsson.com> <507D2747.8040602@ninenines.eu> <507D2F14.4020705@ericsson.com> <507D3001.6000307@ninenines.eu> <507D8DC3.6050507@ericsson.com> <507E6391.7070304@erix.ericsson.se> <5080211A.2060805@ninenines.eu> <5087B3E2.6070009@ericsson.com> <5087DBAB.8090403@ninenines.eu> <5087E62D.3040304@ericsson.com> <508E6034.7020003@ninenines.eu> <508E8A12.5020009@ericsson.com> <508EB773.2080508@ninenines.eu> <5090054A.5080405@ericsson.com> <50925F2A.904@ninenines.eu> <509A6738.6020403@ninenines.eu> <509A7A5D.7060308@erix.ericsson.se> Message-ID: <509A8931.1020100@ninenines.eu> Alright. What should I do for a temporary fix to make sure this is the right issue? On 11/07/2012 04:12 PM, Ingela Anderton Andin wrote: > Hi! > > The problem is that "call-timeouts" in gen_server/fsm suck, as they are > purly client side. The {ref, connected} is a gen:fsm-reply that should > have been received by the ssl connect code. Like the recv problem on > erlang-questions this is solved by making the timer server side, it > could be argued some of these timeouts in ssl API are not needed, but > they are legacy... We will fix it. After some investigation I think > the "best" solution to your other problem will be call gen_tcp:recv/3 in > the workaround. We will also clean up the logic in terminate. > > Regards Ingela Erlang/OTP team - Ericsson AB > > > Lo?c Hoguin wrote: >> On 11/01/2012 12:38 PM, Lo?c Hoguin wrote: >>> * There is another problem where processes get stuck elsewhere, which >>> I'm going to try to investigate. Note that the sockets for this problem >>> stay in ESTABLISHED. 
>> >> In this particular case, here's what we got: >> >> [{current_function,{gen_fsm,loop,7}}, >> {initial_call,{proc_lib,init_p,5}}, >> {status,waiting}, >> {message_queue_len,0}, >> {messages,[]}, >> {links,[<0.897.0>,#Port<0.264729349>]}, >> {dictionary,[{ssl_manager,ssl_manager}, >> {'$ancestors',[ssl_connection_sup,ssl_sup,<0.894.0>]}, >> {'$initial_call',{ssl_connection,init,1}}]}, >> {trap_exit,false}, >> {error_handler,error_handler}, >> {priority,normal}, >> {group_leader,<0.893.0>}, >> {total_heap_size,8362}, >> {heap_size,4181}, >> {stack_size,10}, >> {reductions,7029}, >> {garbage_collection,[{min_bin_vheap_size,46368}, >> {min_heap_size,233}, >> {fullsweep_after,10}, >> {minor_gcs,10}]}, >> {suspending,[]}] >> >> Looking further, I notice something weird. >> >> > erlang:process_info(Pid, monitors). >> {monitors,[{process,<0.1055.0>}]} >> >> This is a very old pid. >> >> > erlang:process_info(OldPid). >> [{current_function,{prim_inet,accept0,2}}, >> {initial_call,{cowboy_acceptor,acceptor,7}}, >> {status,waiting}, >> {message_queue_len,1602}, >> {messages,[{#Ref<0.0.19.196440>,connected}, >> {#Ref<0.0.21.74727>,connected}, >> {#Ref<0.0.28.93234>,connected}, >> {#Ref<0.0.64.192190>,connected}, >> {#Ref<0.0.167.184831>,connected}, >> {#Ref<0.0.208.24369>,connected}, >> {#Ref<0.0.282.59352>,connected}, >> {#Ref<0.0.340.181599>,connected}, >> {#Ref<0.0.341.57338>,connected}, >> {#Ref<0.0.427.15661>,connected}, >> {#Ref<0.0.430.8560>,connected}, >> {#Ref<0.0.439.40688>,connected}, >> {#Ref<0.0.439.214050>,connected}, >> {#Ref<0.0.440.206978>,connected}, >> {#Ref<0.0.466.173049>,connected}, >> {#Ref<0.0.497.35749>,connected}, >> {#Ref<0.0.514.36774>,connected}, >> {#Ref<0.0.514.109971>,connected}, >> {#Ref<0.0.541.246233>,connected}, >> {#Ref<0.0.544.168339>,connected}, >> {#Ref<0.0.584.43294>,...}, >> {...}|...]}, >> {links,[<0.1028.0>]}, >> {dictionary,[]}, >> {trap_exit,false}, >> {error_handler,error_handler}, >> {priority,normal}, >> 
{group_leader,<0.868.0>}, >> {total_heap_size,32838}, >> {heap_size,4181}, >> {stack_size,22}, >> {reductions,219876367}, >> {garbage_collection,[{min_bin_vheap_size,46368}, >> {min_heap_size,233}, >> {fullsweep_after,10}, >> {minor_gcs,1}]}, >> {suspending,[]}] >> >> So this is the acceptor process. It doesn't make sense though, why >> would the ssl process still monitor the acceptor? The acceptor code is >> equivalent to this: >> >> {ok, Socket} = ssl:transport_accept(ListenSocket, 2000), >> ok = ssl:ssl_accept(Socket, 2000), >> {ok, Pid} = supervisor:start_child(SupPid, Args), >> Transport:controlling_process(Socket, Pid), >> %% ... >> >> I'm also not sure what {#Ref<0.0.19.196440>,connected} is. Are they >> supposed to receive this? >> >> Any pointer as to where to look next would help. >> >> Thanks. >> > -- Lo?c Hoguin Erlang Cowboy Nine Nines http://ninenines.eu From ingela.anderton.andin@REDACTED Thu Nov 8 10:16:37 2012 From: ingela.anderton.andin@REDACTED (Ingela Anderton Andin) Date: Thu, 8 Nov 2012 10:16:37 +0100 Subject: [erlang-bugs] [erlang-questions] Process/FD leak in SSL R15B01 In-Reply-To: <509A8931.1020100@ninenines.eu> References: <507C2922.9090207@ninenines.eu> <507C379D.7010004@ninenines.eu> <507D264C.7040102@ericsson.com> <507D2747.8040602@ninenines.eu> <507D2F14.4020705@ericsson.com> <507D3001.6000307@ninenines.eu> <507D8DC3.6050507@ericsson.com> <507E6391.7070304@erix.ericsson.se> <5080211A.2060805@ninenines.eu> <5087B3E2.6070009@ericsson.com> <5087DBAB.8090403@ninenines.eu> <5087E62D.3040304@ericsson.com> <508E6034.7020003@ninenines.eu> <508E8A12.5020009@ericsson.com> <508EB773.2080508@ninenines.eu> <5090054A.5080405@ericsson.com> <50925F2A.904@ninenines.eu> <509A6738.6020403@ninenines.eu> <509A7A5D.7060308@erix.ericsson.se> <509A8931.1020100@ninenines.eu> Message-ID: <509B7875.1080507@erix.ericsson.se> Hi! Well as you have a server increase the timeout in ssl_accept/[2,3], you do not want it to expire unless there is a network failure. 
Regards Ingela Erlang/OTP team - Ericsson AB Lo?c Hoguin wrote: > Alright. > > What should I do for a temporary fix to make sure this is the right > issue? > > On 11/07/2012 04:12 PM, Ingela Anderton Andin wrote: >> Hi! >> >> The problem is that "call-timeouts" in gen_server/fsm suck, as they are >> purly client side. The {ref, connected} is a gen:fsm-reply that should >> have been received by the ssl connect code. Like the recv problem on >> erlang-questions this is solved by making the timer server side, it >> could be argued some of these timeouts in ssl API are not needed, but >> they are legacy... We will fix it. After some investigation I think >> the "best" solution to your other problem will be call gen_tcp:recv/3 in >> the workaround. We will also clean up the logic in terminate. >> >> Regards Ingela Erlang/OTP team - Ericsson AB >> >> >> Lo?c Hoguin wrote: >>> On 11/01/2012 12:38 PM, Lo?c Hoguin wrote: >>>> * There is another problem where processes get stuck elsewhere, which >>>> I'm going to try to investigate. Note that the sockets for this >>>> problem >>>> stay in ESTABLISHED. >>> >>> In this particular case, here's what we got: >>> >>> [{current_function,{gen_fsm,loop,7}}, >>> {initial_call,{proc_lib,init_p,5}}, >>> {status,waiting}, >>> {message_queue_len,0}, >>> {messages,[]}, >>> {links,[<0.897.0>,#Port<0.264729349>]}, >>> {dictionary,[{ssl_manager,ssl_manager}, >>> {'$ancestors',[ssl_connection_sup,ssl_sup,<0.894.0>]}, >>> {'$initial_call',{ssl_connection,init,1}}]}, >>> {trap_exit,false}, >>> {error_handler,error_handler}, >>> {priority,normal}, >>> {group_leader,<0.893.0>}, >>> {total_heap_size,8362}, >>> {heap_size,4181}, >>> {stack_size,10}, >>> {reductions,7029}, >>> {garbage_collection,[{min_bin_vheap_size,46368}, >>> {min_heap_size,233}, >>> {fullsweep_after,10}, >>> {minor_gcs,10}]}, >>> {suspending,[]}] >>> >>> Looking further, I notice something weird. >>> >>> > erlang:process_info(Pid, monitors). 
>>> {monitors,[{process,<0.1055.0>}]} >>> >>> This is a very old pid. >>> >>> > erlang:process_info(OldPid). >>> [{current_function,{prim_inet,accept0,2}}, >>> {initial_call,{cowboy_acceptor,acceptor,7}}, >>> {status,waiting}, >>> {message_queue_len,1602}, >>> {messages,[{#Ref<0.0.19.196440>,connected}, >>> {#Ref<0.0.21.74727>,connected}, >>> {#Ref<0.0.28.93234>,connected}, >>> {#Ref<0.0.64.192190>,connected}, >>> {#Ref<0.0.167.184831>,connected}, >>> {#Ref<0.0.208.24369>,connected}, >>> {#Ref<0.0.282.59352>,connected}, >>> {#Ref<0.0.340.181599>,connected}, >>> {#Ref<0.0.341.57338>,connected}, >>> {#Ref<0.0.427.15661>,connected}, >>> {#Ref<0.0.430.8560>,connected}, >>> {#Ref<0.0.439.40688>,connected}, >>> {#Ref<0.0.439.214050>,connected}, >>> {#Ref<0.0.440.206978>,connected}, >>> {#Ref<0.0.466.173049>,connected}, >>> {#Ref<0.0.497.35749>,connected}, >>> {#Ref<0.0.514.36774>,connected}, >>> {#Ref<0.0.514.109971>,connected}, >>> {#Ref<0.0.541.246233>,connected}, >>> {#Ref<0.0.544.168339>,connected}, >>> {#Ref<0.0.584.43294>,...}, >>> {...}|...]}, >>> {links,[<0.1028.0>]}, >>> {dictionary,[]}, >>> {trap_exit,false}, >>> {error_handler,error_handler}, >>> {priority,normal}, >>> {group_leader,<0.868.0>}, >>> {total_heap_size,32838}, >>> {heap_size,4181}, >>> {stack_size,22}, >>> {reductions,219876367}, >>> {garbage_collection,[{min_bin_vheap_size,46368}, >>> {min_heap_size,233}, >>> {fullsweep_after,10}, >>> {minor_gcs,1}]}, >>> {suspending,[]}] >>> >>> So this is the acceptor process. It doesn't make sense though, why >>> would the ssl process still monitor the acceptor? The acceptor code is >>> equivalent to this: >>> >>> {ok, Socket} = ssl:transport_accept(ListenSocket, 2000), >>> ok = ssl:ssl_accept(Socket, 2000), >>> {ok, Pid} = supervisor:start_child(SupPid, Args), >>> Transport:controlling_process(Socket, Pid), >>> %% ... >>> >>> I'm also not sure what {#Ref<0.0.19.196440>,connected} is. Are they >>> supposed to receive this? 
>>> >>> Any pointer as to where to look next would help. >>> >>> Thanks. >>> >> > > From Ingela.Anderton.Andin@REDACTED Fri Nov 9 10:03:25 2012 From: Ingela.Anderton.Andin@REDACTED (Ingela Anderton Andin) Date: Fri, 9 Nov 2012 10:03:25 +0100 Subject: [erlang-bugs] [erlang-questions] Process/FD leak in SSL R15B01 In-Reply-To: References: Message-ID: <509CC6DD.9020301@ericsson.com> Hi! Here is the patch to make timouts server side. diff --git a/lib/ssl/src/ssl_connection.erl b/lib/ssl/src/ssl_connection.erl index ff2556c..ce64f05 100644 --- a/lib/ssl/src/ssl_connection.erl +++ b/lib/ssl/src/ssl_connection.erl @@ -118,7 +118,7 @@ send(Pid, Data) -> sync_send_all_state_event(Pid, {application_data, %% iolist_to_binary should really %% be called iodata_to_binary() - erlang:iolist_to_binary(Data)}, infinity). + erlang:iolist_to_binary(Data)}). %%-------------------------------------------------------------------- -spec recv(pid(), integer(), timeout()) -> @@ -127,7 +127,7 @@ send(Pid, Data) -> %% Description: Receives data when active = false %%-------------------------------------------------------------------- recv(Pid, Length, Timeout) -> - sync_send_all_state_event(Pid, {recv, Length}, Timeout). + sync_send_all_state_event(Pid, {recv, Length, Timeout}). %%-------------------------------------------------------------------- -spec connect(host(), inet:port_number(), port(), {#ssl_options{}, #socket_options{}}, pid(), tuple(), timeout()) -> @@ -164,7 +164,7 @@ ssl_accept(Port, Socket, Opts, User, CbInfo, Timeout) -> %% Description: Starts ssl handshake. %%-------------------------------------------------------------------- handshake(#sslsocket{pid = Pid}, Timeout) -> - case sync_send_all_state_event(Pid, start, Timeout) of + case sync_send_all_state_event(Pid, {start, Timeout}) of connected -> ok; Error -> @@ -335,15 +335,15 @@ init([Role, Host, Port, Socket, {SSLOpts0, _} = Options, User, CbInfo]) -> #state{}) -> gen_fsm_state_return(). 
%%-------------------------------------------------------------------- hello(start, #state{host = Host, port = Port, role = client, - ssl_options = SslOpts, - session = #session{own_certificate = Cert} = Session0, - session_cache = Cache, session_cache_cb = CacheCb, - transport_cb = Transport, socket = Socket, - connection_states = ConnectionStates0, - renegotiation = {Renegotiation, _}} = State0) -> + ssl_options = SslOpts, + session = #session{own_certificate = Cert} = Session0, + session_cache = Cache, session_cache_cb = CacheCb, + transport_cb = Transport, socket = Socket, + connection_states = ConnectionStates0, + renegotiation = {Renegotiation, _}} = State0) -> Hello = ssl_handshake:client_hello(Host, Port, ConnectionStates0, SslOpts, Cache, CacheCb, Renegotiation, Cert), - + Version = Hello#client_hello.client_version, Handshake0 = ssl_handshake:init_handshake_history(), {BinMsg, ConnectionStates, Handshake} = @@ -768,7 +768,8 @@ handle_sync_event({application_data, Data}, From, StateName, State#state{send_queue = queue:in({From, Data}, Queue)}, get_timeout(State)}; -handle_sync_event(start, StartFrom, hello, State) -> +handle_sync_event({start, Timeout} = Start, StartFrom, hello, State) -> + start_or_recv_cancel_timer(Timeout, StartFrom), hello(start, State#state{start_or_recv_from = StartFrom}); %% The two clauses below could happen if a server upgrades a socket in @@ -778,12 +779,14 @@ handle_sync_event(start, StartFrom, hello, State) -> %% mode before telling the client that it is willing to upgrade %% and before calling ssl:ssl_accept/2. These clauses are %% here to make sure it is the users problem and not owers if -%% they upgrade a active socket. -handle_sync_event(start, _, connection, State) -> +%% they upgrade an active socket. 
+handle_sync_event({start,_}, _, connection, State) -> {reply, connected, connection, State, get_timeout(State)}; -handle_sync_event(start, _From, error, {Error, State = #state{}}) -> +handle_sync_event({start,_}, _From, error, {Error, State = #state{}}) -> {stop, {shutdown, Error}, {error, Error}, State}; -handle_sync_event(start, StartFrom, StateName, State) -> + +handle_sync_event({start, Timeout}, StartFrom, StateName, State) -> + start_or_recv_cancel_timer(Timeout, StartFrom), {next_state, StateName, State#state{start_or_recv_from = StartFrom}, get_timeout(State)}; handle_sync_event(close, _, StateName, State) -> @@ -815,12 +818,14 @@ handle_sync_event({shutdown, How0}, _, StateName, {stop, normal, Error, State} end; -handle_sync_event({recv, N}, RecvFrom, connection = StateName, State0) -> +handle_sync_event({recv, N, Timeout}, RecvFrom, connection = StateName, State0) -> + start_or_recv_cancel_timer(Timeout, RecvFrom), passive_receive(State0#state{bytes_to_read = N, start_or_recv_from = RecvFrom}, StateName); %% Doing renegotiate wait with handling request until renegotiate is %% finished. Will be handled by next_state_is_connection/2. 
-handle_sync_event({recv, N}, RecvFrom, StateName, State) -> +handle_sync_event({recv, N, Timeout}, RecvFrom, StateName, State) -> + start_or_recv_cancel_timer(Timeout, RecvFrom), {next_state, StateName, State#state{bytes_to_read = N, start_or_recv_from = RecvFrom}, get_timeout(State)}; @@ -990,7 +995,14 @@ handle_info({'DOWN', MonitorRef, _, _, _}, _, handle_info(allow_renegotiate, StateName, State) -> {next_state, StateName, State#state{allow_renegotiate = true}, get_timeout(State)}; - + +handle_info({cancel_start_or_recv, RecvFrom}, connection = StateName, #state{start_or_recv_from = RecvFrom} = State) -> + gen_fsm:reply(RecvFrom, {error, timeout}), + {next_state, StateName, State#state{start_or_recv_from = undefined}, get_timeout(State)}; + +handle_info({cancel_start_or_recv, _RecvFrom}, StateName, State) -> + {next_state, StateName, State, get_timeout(State)}; + handle_info(Msg, StateName, State) -> Report = io_lib:format("SSL: Got unexpected info: ~p ~n", [Msg]), error_logger:info_report(Report), @@ -1201,15 +1213,10 @@ init_diffie_hellman(DbHandle,_, DHParamFile, server) -> end. sync_send_all_state_event(FsmPid, Event) -> - sync_send_all_state_event(FsmPid, Event, infinity). - -sync_send_all_state_event(FsmPid, Event, Timeout) -> - try gen_fsm:sync_send_all_state_event(FsmPid, Event, Timeout) + try gen_fsm:sync_send_all_state_event(FsmPid, Event, infinity) catch exit:{noproc, _} -> {error, closed}; - exit:{timeout, _} -> - {error, timeout}; exit:{normal, _} -> {error, closed}; exit:{shutdown, _} -> @@ -2465,3 +2472,8 @@ default_hashsign(_Version, KeyExchange) default_hashsign(_Version, KeyExchange) when KeyExchange == dh_anon -> {null, anon}. + +start_or_recv_cancel_timer(infinity, _RecvFrom) -> + ok; +start_or_recv_cancel_timer(Timeout, RecvFrom) -> + erlang:send_after(Timeout, self(), {cancel_start_or_recv, RecvFrom}). Regards Ingela Erlang/OTP team - Ericsson AB Lo?c Hoguin wrote: > Thanks! > > Ingela Anderton Andin wrote: > >> Hi! 
>> >> Well as you have a server increase the timeout in ssl_accept/[2,3], you >> do not want >> it to expire unless there is a network failure. >> >> Regards Ingela Erlang/OTP team - Ericsson AB >> >> Lo?c Hoguin wrote: >>> Alright. >>> >>> What should I do for a temporary fix to make sure this is the right >>> issue? >>> >>> On 11/07/2012 04:12 PM, Ingela Anderton Andin wrote: >>>> Hi! >>>> >>>> The problem is that "call-timeouts" in gen_server/fsm suck, as they are >>>> purly client side. The {ref, connected} is a gen:fsm-reply that should >>>> have been received by the ssl connect code. Like the recv problem on >>>> erlang-questions this is solved by making the timer server side, it >>>> could be argued some of these timeouts in ssl API are not needed, but >>>> they are legacy... We will fix it. After some investigation I think >>>> the "best" solution to your other problem will be call gen_tcp:recv/3 in >>>> the workaround. We will also clean up the logic in terminate. >>>> >>>> Regards Ingela Erlang/OTP team - Ericsson AB >>>> >>>> >>>> Lo?c Hoguin wrote: >>>>> On 11/01/2012 12:38 PM, Lo?c Hoguin wrote: >>>>>> * There is another problem where processes get stuck elsewhere, which >>>>>> I'm going to try to investigate. Note that the sockets for this >>>>>> problem >>>>>> stay in ESTABLISHED. 
>>>>> In this particular case, here's what we got: >>>>> >>>>> [{current_function,{gen_fsm,loop,7}}, >>>>> {initial_call,{proc_lib,init_p,5}}, >>>>> {status,waiting}, >>>>> {message_queue_len,0}, >>>>> {messages,[]}, >>>>> {links,[<0.897.0>,#Port<0.264729349>]}, >>>>> {dictionary,[{ssl_manager,ssl_manager}, >>>>> {'$ancestors',[ssl_connection_sup,ssl_sup,<0.894.0>]}, >>>>> {'$initial_call',{ssl_connection,init,1}}]}, >>>>> {trap_exit,false}, >>>>> {error_handler,error_handler}, >>>>> {priority,normal}, >>>>> {group_leader,<0.893.0>}, >>>>> {total_heap_size,8362}, >>>>> {heap_size,4181}, >>>>> {stack_size,10}, >>>>> {reductions,7029}, >>>>> {garbage_collection,[{min_bin_vheap_size,46368}, >>>>> {min_heap_size,233}, >>>>> {fullsweep_after,10}, >>>>> {minor_gcs,10}]}, >>>>> {suspending,[]}] >>>>> >>>>> Looking further, I notice something weird. >>>>> >>>>>> erlang:process_info(Pid, monitors). >>>>> {monitors,[{process,<0.1055.0>}]} >>>>> >>>>> This is a very old pid. >>>>> >>>>>> erlang:process_info(OldPid). 
>>>>> [{current_function,{prim_inet,accept0,2}}, >>>>> {initial_call,{cowboy_acceptor,acceptor,7}}, >>>>> {status,waiting}, >>>>> {message_queue_len,1602}, >>>>> {messages,[{#Ref<0.0.19.196440>,connected}, >>>>> {#Ref<0.0.21.74727>,connected}, >>>>> {#Ref<0.0.28.93234>,connected}, >>>>> {#Ref<0.0.64.192190>,connected}, >>>>> {#Ref<0.0.167.184831>,connected}, >>>>> {#Ref<0.0.208.24369>,connected}, >>>>> {#Ref<0.0.282.59352>,connected}, >>>>> {#Ref<0.0.340.181599>,connected}, >>>>> {#Ref<0.0.341.57338>,connected}, >>>>> {#Ref<0.0.427.15661>,connected}, >>>>> {#Ref<0.0.430.8560>,connected}, >>>>> {#Ref<0.0.439.40688>,connected}, >>>>> {#Ref<0.0.439.214050>,connected}, >>>>> {#Ref<0.0.440.206978>,connected}, >>>>> {#Ref<0.0.466.173049>,connected}, >>>>> {#Ref<0.0.497.35749>,connected}, >>>>> {#Ref<0.0.514.36774>,connected}, >>>>> {#Ref<0.0.514.109971>,connected}, >>>>> {#Ref<0.0.541.246233>,connected}, >>>>> {#Ref<0.0.544.168339>,connected}, >>>>> {#Ref<0.0.584.43294>,...}, >>>>> {...}|...]}, >>>>> {links,[<0.1028.0>]}, >>>>> {dictionary,[]}, >>>>> {trap_exit,false}, >>>>> {error_handler,error_handler}, >>>>> {priority,normal}, >>>>> {group_leader,<0.868.0>}, >>>>> {total_heap_size,32838}, >>>>> {heap_size,4181}, >>>>> {stack_size,22}, >>>>> {reductions,219876367}, >>>>> {garbage_collection,[{min_bin_vheap_size,46368}, >>>>> {min_heap_size,233}, >>>>> {fullsweep_after,10}, >>>>> {minor_gcs,1}]}, >>>>> {suspending,[]}] >>>>> >>>>> So this is the acceptor process. It doesn't make sense though, why >>>>> would the ssl process still monitor the acceptor? The acceptor code is >>>>> equivalent to this: >>>>> >>>>> {ok, Socket} = ssl:transport_accept(ListenSocket, 2000), >>>>> ok = ssl:ssl_accept(Socket, 2000), >>>>> {ok, Pid} = supervisor:start_child(SupPid, Args), >>>>> Transport:controlling_process(Socket, Pid), >>>>> %% ... >>>>> >>>>> I'm also not sure what {#Ref<0.0.19.196440>,connected} is. Are they >>>>> supposed to receive this? 
>>>>> >>>>> Any pointer as to where to look next would help. >>>>> >>>>> Thanks. >>>>> >>> From tuncer.ayaz@REDACTED Tue Nov 13 14:18:57 2012 From: tuncer.ayaz@REDACTED (Tuncer Ayaz) Date: Tue, 13 Nov 2012 14:18:57 +0100 Subject: [erlang-bugs] R16 EUnit *** context setup failed *** Message-ID: If I build and run rebar's EUnit tests with R16, I see "*** context setup failed ***" errors. Interestingly, if I use a tree where ebin/*, rebar, and .eunit/*.beam have been built with R15B02, then neither R16 nor R15 throw the context setup errors. # fetch and build rebar $ git clone git://github.com/rebar/rebar.git $ cd rebar $ make # run tests with new rebar binary $ ./rebar eunit # *** context setup failed *** errors and 50 out of 72 tests # not executed $ rm ebin/* .eunit/* rebar # rebuild rebar and EUnit tests with R15 $ make && ./rebar eunit # running EUnit tests with either R15 or R16 works now Can anyone else reproduce this? Could this be caused by compiler changes in R16? I don't see any changes in lib/eunit after R15B02. From tuncer.ayaz@REDACTED Tue Nov 13 14:23:01 2012 From: tuncer.ayaz@REDACTED (Tuncer Ayaz) Date: Tue, 13 Nov 2012 14:23:01 +0100 Subject: [erlang-bugs] Spec or Dialyzer regression In-Reply-To: References: <4FB2BB82.7000303@cs.ntua.gr> Message-ID: On Tue, Oct 2, 2012 at 4:09 PM, wrote: > Hi! > > It's not really obvious from the output, but the problem is the spec > for open_port in erlang.erl. All the "will never return" things all > boil down to rebar_utils:sh/2 and eventually the call to open_port. > The option 'hide' is missing from the spec (which is new as it was > before handled by the erl_bif_types.erl thing). > > I will update the spec in erlang.erl and you should be down to the > single warning again in a few days! Has the fix landed in master? 
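For context on the open_port spec issue above, the failing pattern boils down to an open_port/2 call that passes the 'hide' option: with 'hide' missing from the spec in erlang.erl, Dialyzer concluded such a call could not succeed and flagged every function reaching it as "will never return". A minimal sketch of that call shape (function and variable names here are illustrative, not rebar's actual code):

```erlang
%% Illustrative sketch, not rebar's real rebar_utils:sh/2.
%% The 'hide' option in the option list is what the old open_port/2
%% spec rejected, making Dialyzer mark every caller as non-returning.
sh(Command) ->
    Port = open_port({spawn, Command},
                     [use_stdio, stderr_to_stdout, exit_status, hide]),
    wait_for_exit(Port).

wait_for_exit(Port) ->
    receive
        {Port, {data, _Output}} ->
            wait_for_exit(Port);
        {Port, {exit_status, Status}} ->
            {ok, Status}
    end.
```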
From zfsgeek@REDACTED Fri Nov 16 04:11:29 2012 From: zfsgeek@REDACTED (Xu Yifeng) Date: Fri, 16 Nov 2012 11:11:29 +0800 Subject: [erlang-bugs] clang compiler warnings Message-ID: <50A5AEE1.8030203@163.com> I have compiled Erlang R15B02 with clang on FreeBSD, and got quite a lot of warnings. Please see attachment. Regards, Xu Yifeng -------------- next part -------------- configure: WARNING: Can not find wx/stc/stc.h -g -Wall -O2 -fPIC -fomit-frame-pointer -fno-strict-aliasing -O2 -pipe -fno-strict-aliasing -std=gnu89 -isystem /usr/X11R6/include -D_GNU_SOURCE -D_THREAD_SAFE -D_REENTRANT -I/usr/local/lib/wx/include/gtk2-unicode-release-2.8 -I/usr/local/include/wx-2.8 -D_FILE_OFFSET_BITS=64 -D_LARGE_FILES -D__WXGTK__ -pthread -D_THREAD_SAFE configure: WARNING: Can not link wx program are all developer packages installed? configure: WARNING: Check for large file support flags failed; getconf failed configure: WARNING: No 'fop' command found: going to generate placeholder PDF files beam/erl_instrument.c:778:48: warning: for loop has empty body [-Wempty-body] for (bp = mem_anchor; bp->next; bp = bp->next); ^ beam/erl_instrument.c:778:48: note: put the semicolon on a separate line to silence this warning 1 warning generated. beam/erl_bif_info.c:3324:14: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] if (res < 0) ~~~ ^ ~ beam/erl_bif_info.c:3681:11: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] if (res < 0) ~~~ ^ ~ 2 warnings generated. 
beam/bif.c:339:5: warning: variable 'mon' is used uninitialized whenever switch default is taken [-Wsometimes-uninitialized] default: ^~~~~~~ beam/bif.c:364:26: note: uninitialized use occurs here erts_destroy_monitor(mon); ^~~ beam/bif.c:258:21: note: initialize the variable 'mon' to silence this warning ErtsMonitor *mon; ^ = NULL beam/bif.c:1531:5: warning: expression result unused [-Wunused-value] ERTS_PROC_SET_TRAP_EXIT(BIF_P); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ beam/erl_process.h:1034:4: note: expanded from macro 'ERTS_PROC_SET_TRAP_EXIT' 1) ^ 2 warnings generated. beam/io.c:3573:14: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] if (hlen < 0) ~~~~ ^ ~ 1 warning generated. beam/external.c:972:13: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] if (hsz < 0) ~~~ ^ ~ 1 warning generated. beam/dist.c:2474:5: warning: expression result unused; should this cast be to 'void'? [-Wunused-value] (void *) ERTS_PROC_SET_DIST_ENTRY(net_kernel, ^ ~ 1 warning generated. beam/packet_parser.c:463:22: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] if (plen < 0) ~~~~ ^ ~ 1 warning generated. beam/beam_load.c:2931:40: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] if (Size.type == TAG_i && Size.val < 0) { ~~~~~~~~ ^ ~ beam/beam_load.c:5967:12: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] if (arity < 0) { ~~~~~ ^ ~ 2 warnings generated. beam/beam_bp.c:762:3: warning: expression result unused; should this cast be to 'void'? [-Wunused-value] (void *) ERTS_PROC_SET_CALL_TIME(p, ERTS_PROC_LOCK_MAIN, pbt); ^ ~ 1 warning generated. drivers/unix/unix_efile.c:1504:11: warning: implicit declaration of function 'sendfile' [-Wimplicit-function-declaration] retval = sendfile(in_fd, out_fd, *offset, SENDFILE_CHUNK_SIZE, ^ 1 warning generated. 
drivers/common/gzio.c:329:17: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] } else if (res < 0) { ~~~ ^ ~ drivers/common/gzio.c:502:21: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] } else if (res < 0) { ~~~ ^ ~ 2 warnings generated. sys/unix/sys_float.c:835:16: warning: declaration of 'struct exception' will not be visible outside of this function [-Wvisibility] matherr(struct exception *exc) ^ sys/unix/sys_float.c:835:1: warning: no previous prototype for function 'matherr' [-Wmissing-prototypes] matherr(struct exception *exc) ^ 2 warnings generated. sys/common/erl_poll.c:2400:72: warning: for loop has empty body [-Wempty-body] for (prev_ps = pollsets; ps != prev_ps->next; prev_ps = prev_ps->next); ^ sys/common/erl_poll.c:2400:72: note: put the semicolon on a separate line to silence this warning 1 warning generated. sys/common/erl_poll.c:2400:72: warning: for loop has empty body [-Wempty-body] for (prev_ps = pollsets; ps != prev_ps->next; prev_ps = prev_ps->next); ^ sys/common/erl_poll.c:2400:72: note: put the semicolon on a separate line to silence this warning 1 warning generated. hipe/hipe_x86_signal.c:264:5: warning: no previous prototype for function '_sigaction' [-Wmissing-prototypes] int __SIGACTION(int signum, const struct sigaction *act, struct sigaction *oldact) ^ hipe/hipe_x86_signal.c:222:21: note: expanded from macro '__SIGACTION' #define __SIGACTION _sigaction ^ 1 warning generated. common/ethr_mutex.c:695:7: warning: expression result unused [-Wunused-value] ETHR_YIELD(); ^~~~~~~~~~~~ ../include/internal/ethread.h:403:49: note: expanded from macro 'ETHR_YIELD' # define ETHR_YIELD() (sched_yield() < 0 ? 
errno : 0) ^ /usr/include/errno.h:46:17: note: expanded from macro 'errno' #define errno (* __error()) ^ ~~~~~~~~~ common/ethr_mutex.c:714:3: warning: expression result unused [-Wunused-value] ETHR_YIELD(); ^~~~~~~~~~~~ ../include/internal/ethread.h:403:49: note: expanded from macro 'ETHR_YIELD' # define ETHR_YIELD() (sched_yield() < 0 ? errno : 0) ^ /usr/include/errno.h:46:17: note: expanded from macro 'errno' #define errno (* __error()) ^ ~~~~~~~~~ common/ethr_mutex.c:2164:3: warning: expression result unused [-Wunused-value] ETHR_YIELD(); ^~~~~~~~~~~~ ../include/internal/ethread.h:403:49: note: expanded from macro 'ETHR_YIELD' # define ETHR_YIELD() (sched_yield() < 0 ? errno : 0) ^ /usr/include/errno.h:46:17: note: expanded from macro 'errno' #define errno (* __error()) ^ ~~~~~~~~~ common/ethr_mutex.c:2291:3: warning: expression result unused [-Wunused-value] ETHR_YIELD(); ^~~~~~~~~~~~ ../include/internal/ethread.h:403:49: note: expanded from macro 'ETHR_YIELD' # define ETHR_YIELD() (sched_yield() < 0 ? errno : 0) ^ /usr/include/errno.h:46:17: note: expanded from macro 'errno' #define errno (* __error()) ^ ~~~~~~~~~ 4 warnings generated. common/erl_printf_format.c:439:14: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] if (size < 0) { ~~~~ ^ ~ 1 warning generated. common/erl_printf_format.c:439:14: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] if (size < 0) { ~~~~ ^ ~ 1 warning generated. cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' beam/erl_instrument.c:778:48: warning: for loop has empty body [-Wempty-body] for (bp = mem_anchor; bp->next; bp = bp->next); ^ beam/erl_instrument.c:778:48: note: put the semicolon on a separate line to silence this warning 1 warning generated. 
beam/erl_bif_info.c:3324:14: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] if (res < 0) ~~~ ^ ~ beam/erl_bif_info.c:3681:11: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] if (res < 0) ~~~ ^ ~ 2 warnings generated. beam/bif.c:339:5: warning: variable 'mon' is used uninitialized whenever switch default is taken [-Wsometimes-uninitialized] default: ^~~~~~~ beam/bif.c:364:26: note: uninitialized use occurs here erts_destroy_monitor(mon); ^~~ beam/bif.c:258:21: note: initialize the variable 'mon' to silence this warning ErtsMonitor *mon; ^ = NULL beam/bif.c:1531:5: warning: expression result unused [-Wunused-value] ERTS_PROC_SET_TRAP_EXIT(BIF_P); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ beam/erl_process.h:1045:63: note: expanded from macro 'ERTS_PROC_SET_TRAP_EXIT' #define ERTS_PROC_SET_TRAP_EXIT(P) ((P)->flags |= F_TRAPEXIT, 1) ^ 2 warnings generated. beam/io.c:3573:14: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] if (hlen < 0) ~~~~ ^ ~ 1 warning generated. beam/external.c:972:13: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] if (hsz < 0) ~~~ ^ ~ 1 warning generated. beam/dist.c:2474:5: warning: expression result unused; should this cast be to 'void'? [-Wunused-value] (void *) ERTS_PROC_SET_DIST_ENTRY(net_kernel, ^ ~ 1 warning generated. beam/packet_parser.c:463:22: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] if (plen < 0) ~~~~ ^ ~ 1 warning generated. beam/beam_load.c:2931:40: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] if (Size.type == TAG_i && Size.val < 0) { ~~~~~~~~ ^ ~ beam/beam_load.c:5967:12: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] if (arity < 0) { ~~~~~ ^ ~ 2 warnings generated. beam/beam_bp.c:762:3: warning: expression result unused; should this cast be to 'void'? 
[-Wunused-value] (void *) ERTS_PROC_SET_CALL_TIME(p, ERTS_PROC_LOCK_MAIN, pbt); ^ ~ 1 warning generated. drivers/unix/unix_efile.c:1504:11: warning: implicit declaration of function 'sendfile' [-Wimplicit-function-declaration] retval = sendfile(in_fd, out_fd, *offset, SENDFILE_CHUNK_SIZE, ^ 1 warning generated. drivers/common/gzio.c:329:17: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] } else if (res < 0) { ~~~ ^ ~ drivers/common/gzio.c:502:21: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] } else if (res < 0) { ~~~ ^ ~ 2 warnings generated. sys/unix/sys_float.c:835:16: warning: declaration of 'struct exception' will not be visible outside of this function [-Wvisibility] matherr(struct exception *exc) ^ sys/unix/sys_float.c:835:1: warning: no previous prototype for function 'matherr' [-Wmissing-prototypes] matherr(struct exception *exc) ^ 2 warnings generated. sys/common/erl_poll.c:2400:72: warning: for loop has empty body [-Wempty-body] for (prev_ps = pollsets; ps != prev_ps->next; prev_ps = prev_ps->next); ^ sys/common/erl_poll.c:2400:72: note: put the semicolon on a separate line to silence this warning 1 warning generated. sys/common/erl_poll.c:2400:72: warning: for loop has empty body [-Wempty-body] for (prev_ps = pollsets; ps != prev_ps->next; prev_ps = prev_ps->next); ^ sys/common/erl_poll.c:2400:72: note: put the semicolon on a separate line to silence this warning 1 warning generated. hipe/hipe_x86_signal.c:264:5: warning: no previous prototype for function '_sigaction' [-Wmissing-prototypes] int __SIGACTION(int signum, const struct sigaction *act, struct sigaction *oldact) ^ hipe/hipe_x86_signal.c:222:21: note: expanded from macro '__SIGACTION' #define __SIGACTION _sigaction ^ 1 warning generated. 
cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' inet_gethost.c:2565:35: warning: if statement has empty body [-Wempty-body] if(write(2,buff,strlen(buff))); ^ inet_gethost.c:2565:35: note: put the semicolon on a separate line to silence this warning inet_gethost.c:2588:35: warning: if statement has empty body [-Wempty-body] if(write(2,buff,strlen(buff))); ^ inet_gethost.c:2588:35: note: put the semicolon on a separate line to silence this warning inet_gethost.c:2611:35: warning: if statement has empty body [-Wempty-body] if(write(2,buff,strlen(buff))); ^ inet_gethost.c:2611:35: note: put the semicolon on a separate line to silence this warning 3 warnings generated. cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' heart.c:687:23: warning: if statement has empty body [-Wempty-body] if(system(command)); ^ heart.c:687:23: note: put the semicolon on a separate line to silence this warning heart.c:695:28: warning: if statement has empty body [-Wempty-body] if(system((char*)&cmd[0])); ^ heart.c:695:28: note: put the semicolon on a separate line to silence this warning 2 warnings generated. cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' ./erlexec.c:1131:1: warning: unused function 'usage_msg' [-Wunused-function] usage_msg(const char *msg) ^ 1 warning generated. 
cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' ../unix/run_erl.c:925:11: warning: implicit declaration of function 'openpty' [-Wimplicit-function-declaration] if (openpty(&mfd, sfdp, slave, NULL, NULL) == 0) { ^ ../unix/run_erl.c:1157:2: warning: implicit declaration of function 'vsyslog' [-Wimplicit-function-declaration] vsyslog(priority,format,args); ^ 2 warnings generated. cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' Makefile:71: warning: overriding recipe for target `clean' /usr/ports/lang/erlang/work/otp_src_R15B02/make/otp_subdir.mk:28: warning: ignoring old recipe for target `clean' oe_ei_encode_string.c:26:3: warning: expression result unused [-Wunused-value] (int) ei_encode_string(0,&size,p); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1 warning generated. oe_ei_encode_atom.c:26:3: warning: expression result unused [-Wunused-value] (int) ei_encode_atom(0,&size,p); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~ 1 warning generated. oe_ei_encode_pid.c:26:3: warning: expression result unused [-Wunused-value] (int) ei_encode_pid(NULL, &size, p); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1 warning generated. oe_ei_encode_port.c:26:3: warning: expression result unused [-Wunused-value] (int) ei_encode_port(NULL, &size, p); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1 warning generated. 
oe_ei_encode_ref.c:26:3: warning: expression result unused [-Wunused-value] (int) ei_encode_ref(NULL, &size, p); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1 warning generated. oe_ei_encode_term.c:26:3: warning: expression result unused [-Wunused-value] (int) ei_encode_term(NULL, &size, t); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1 warning generated. oe_ei_code_erlang_binary.c:27:3: warning: expression result unused [-Wunused-value] (int) ei_encode_binary(0, &size, binary->_buffer, binary->_length); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1 warning generated. erl_memory.c:861:19: warning: passing 'usgnd_int_8 *' (aka 'unsigned char *') to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] *p += sprintf(*p, "%*" USGND_INT_MAX_FSTR " ", fw, mi->size); ^~ /usr/include/stdio.h:267:31: note: passing argument to parameter here int sprintf(char * __restrict, const char * __restrict, ...); ^ erl_memory.c:863:16: warning: passing 'usgnd_int_8 *' (aka 'unsigned char *') to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] *p += sprintf(*p, ^~ /usr/include/stdio.h:267:31: note: passing argument to parameter here int sprintf(char * __restrict, const char * __restrict, ...); ^ erl_memory.c:869:16: warning: passing 'usgnd_int_8 *' (aka 'unsigned char *') to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] *p += sprintf(*p, "%*" USGND_INT_MAX_FSTR " ", fw, mi->no); ^~ /usr/include/stdio.h:267:31: note: passing argument to parameter here int sprintf(char * __restrict, const char * __restrict, ...); ^ erl_memory.c:871:20: warning: passing 'usgnd_int_8 *' (aka 'unsigned char *') to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] *p += sprintf(*p, ^~ /usr/include/stdio.h:267:31: note: passing argument to parameter here int sprintf(char * __restrict, 
const char * __restrict, ...); ^ erl_memory.c:878:16: warning: passing 'usgnd_int_8 *' (aka 'unsigned char *') to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] *p += sprintf(*p, ^~ /usr/include/stdio.h:267:31: note: passing argument to parameter here int sprintf(char * __restrict, const char * __restrict, ...); ^ erl_memory.c:900:19: warning: passing 'usgnd_int_8 *' (aka 'unsigned char *') to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] *p += sprintf(*p, "%*" USGND_INT_MAX_FSTR " ", fw, mi->max_ever_size); ^~ /usr/include/stdio.h:267:31: note: passing argument to parameter here int sprintf(char * __restrict, const char * __restrict, ...); ^ erl_memory.c:902:16: warning: passing 'usgnd_int_8 *' (aka 'unsigned char *') to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] *p += sprintf(*p, "%*s %*s ", fw, "", fw, ""); ^~ /usr/include/stdio.h:267:31: note: passing argument to parameter here int sprintf(char * __restrict, const char * __restrict, ...); ^ erl_memory.c:904:16: warning: passing 'usgnd_int_8 *' (aka 'unsigned char *') to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] *p += sprintf(*p, "%*" USGND_INT_MAX_FSTR " ", fw, mi->max_ever_no); ^~ /usr/include/stdio.h:267:31: note: passing argument to parameter here int sprintf(char * __restrict, const char * __restrict, ...); ^ erl_memory.c:906:20: warning: passing 'usgnd_int_8 *' (aka 'unsigned char *') to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] *p += sprintf(*p, "%*s %*s ", fw, "", fw, ""); ^~ /usr/include/stdio.h:267:31: note: passing argument to parameter here int sprintf(char * __restrict, const char * __restrict, ...); ^ erl_memory.c:909:16: warning: passing 'usgnd_int_8 *' (aka 'unsigned char *') to 
parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] *p += sprintf(*p, "%*s %*s %*s ", fw, "", fw, "", fw, ""); ^~ /usr/include/stdio.h:267:31: note: passing argument to parameter here int sprintf(char * __restrict, const char * __restrict, ...); ^ erl_memory.c:947:7: warning: passing 'int *' to parameter of type 'socklen_t *' (aka 'unsigned int *') converts between pointers to integer types with different sign [-Wpointer-sign] &saddr_size) != 0) ^~~~~~~~~~~ /usr/include/sys/socket.h:616:74: note: passing argument to parameter here int getsockname(int, struct sockaddr * __restrict, socklen_t * __restrict); ^ erl_memory.c:986:25: warning: passing 'usgnd_int_8 *' (aka 'unsigned char *') to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] area.size = sprintf(area.ptr, format, carg); ^~~~~~~~ /usr/include/stdio.h:267:31: note: passing argument to parameter here int sprintf(char * __restrict, const char * __restrict, ...); ^ erl_memory.c:1103:25: warning: passing 'usgnd_int_8 *' (aka 'unsigned char *') to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] area.size = sprintf(area.ptr, ^~~~~~~~ /usr/include/stdio.h:267:31: note: passing argument to parameter here int sprintf(char * __restrict, const char * __restrict, ...); ^ erl_memory.c:1109:23: warning: passing 'usgnd_int_8 *' (aka 'unsigned char *') to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] area.size += sprintf(area.ptr + area.size, ^~~~~~~~~~~~~~~~~~~~ /usr/include/stdio.h:267:31: note: passing argument to parameter here int sprintf(char * __restrict, const char * __restrict, ...); ^ erl_memory.c:1120:26: warning: passing 'usgnd_int_8 *' (aka 'unsigned char *') to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] area.size += 
sprintf(area.ptr + area.size, ^~~~~~~~~~~~~~~~~~~~ /usr/include/stdio.h:267:31: note: passing argument to parameter here int sprintf(char * __restrict, const char * __restrict, ...); ^ erl_memory.c:1190:18: warning: passing 'usgnd_int_8 *' (aka 'unsigned char *') to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] p += sprintf(p, "> %-*s", EM_TIME_FIELD_WIDTH - 2, "Maximum:"); ^ /usr/include/stdio.h:267:31: note: passing argument to parameter here int sprintf(char * __restrict, const char * __restrict, ...); ^ erl_memory.c:1223:18: warning: passing 'usgnd_int_8 *' (aka 'unsigned char *') to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] p += sprintf(p, "\n"); ^ /usr/include/stdio.h:267:31: note: passing argument to parameter here int sprintf(char * __restrict, const char * __restrict, ...); ^ erl_memory.c:1227:15: warning: passing 'usgnd_int_8 *' (aka 'unsigned char *') to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] p += sprintf(p, "%s", stop_str); ^ /usr/include/stdio.h:267:31: note: passing argument to parameter here int sprintf(char * __restrict, const char * __restrict, ...); ^ erl_memory.c:1230:15: warning: passing 'usgnd_int_8 *' (aka 'unsigned char *') to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] p += sprintf(p, exit_str, state->info.exit_status); ^ /usr/include/stdio.h:267:31: note: passing argument to parameter here int sprintf(char * __restrict, const char * __restrict, ...); ^ erl_memory.c:1236:18: warning: passing 'usgnd_int_8 *' (aka 'unsigned char *') to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] p += sprintf(p, format, tsz, bw); ^ /usr/include/stdio.h:267:31: note: passing argument to parameter here int sprintf(char * __restrict, const char * 
__restrict, ...); ^ erl_memory.c:1286:18: warning: passing 'usgnd_int_8 *' (aka 'unsigned char *') to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] p += sprintf(p, "%*" USGND_INT_32_FSTR " ", EM_TIME_FIELD_WIDTH - 1, secs); ^ /usr/include/stdio.h:267:31: note: passing argument to parameter here int sprintf(char * __restrict, const char * __restrict, ...); ^ erl_memory.c:1311:18: warning: passing 'usgnd_int_8 *' (aka 'unsigned char *') to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] p += sprintf(p, "\n"); ^ /usr/include/stdio.h:267:31: note: passing argument to parameter here int sprintf(char * __restrict, const char * __restrict, ...); ^ erl_memory.c:2231:32: warning: passing 'char *' to parameter of type 'usgnd_int_8 *' (aka 'unsigned char *') converts between pointers to integer types with different sign [-Wpointer-sign] size = write_header(state, state->output.header, 1); ^~~~~~~~~~~~~~~~~~~~ erl_memory.c:736:44: note: passing argument to parameter 'ptr' here write_header(em_state *state, usgnd_int_8 *ptr, int trunc) ^ erl_memory.c:2613:57: warning: passing 'int *' to parameter of type 'socklen_t *' (aka 'unsigned int *') converts between pointers to integer types with different sign [-Wpointer-sign] sock = accept(lsock, (struct sockaddr *) &oth_addr, &oth_addr_len); ^~~~~~~~~~~~~ /usr/include/sys/socket.h:612:69: note: passing argument to parameter here int accept(int, struct sockaddr * __restrict, socklen_t * __restrict); ^ 24 warnings generated. 
cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' In file included from connect/eirecv.c:35: misc/ei_portio.h:31:46: warning: declaration of 'struct iovec' will not be visible outside of this function [-Wvisibility] int ei_writev_fill_t(int fd, const struct iovec *iov, int iovcnt, ^ 1 warning generated. encode/encode_ulong.c:38:28: warning: comparison of unsigned expression >= 0 is always true [-Wtautological-compare] else if ((p < 256) && (p >= 0)) { ~ ^ ~ 1 warning generated. encode/encode_ulonglong.c:55:25: warning: comparison of unsigned expression >= 0 is always true [-Wtautological-compare] if ((p < 256) && (p >= 0)) { ~ ^ ~ 1 warning generated. legacy/erl_marshal.c:273:24: warning: comparison of unsigned expression >= 0 is always true [-Wtautological-compare] if ((ul < 256) && (ul >= 0)) { ~~ ^ ~ 1 warning generated. In file included from connect/eirecv.c:35: misc/ei_portio.h:31:46: warning: declaration of 'struct iovec' will not be visible outside of this function [-Wvisibility] int ei_writev_fill_t(int fd, const struct iovec *iov, int iovcnt, ^ 1 warning generated. encode/encode_ulong.c:38:28: warning: comparison of unsigned expression >= 0 is always true [-Wtautological-compare] else if ((p < 256) && (p >= 0)) { ~ ^ ~ 1 warning generated. encode/encode_ulonglong.c:55:25: warning: comparison of unsigned expression >= 0 is always true [-Wtautological-compare] if ((p < 256) && (p >= 0)) { ~ ^ ~ 1 warning generated. legacy/erl_marshal.c:273:24: warning: comparison of unsigned expression >= 0 is always true [-Wtautological-compare] if ((ul < 256) && (ul >= 0)) { ~~ ^ ~ 1 warning generated. 
cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' cc: warning: argument unused during compilation: '-rpath=/usr/lib:/usr/local/lib' Makefile:129: warning: overriding recipe for target `../index.html' Makefile:126: warning: ignoring old recipe for target `../index.html' Makefile:71: warning: overriding recipe for target `clean' /usr/ports/lang/erlang/work/otp_src_R15B02/make/otp_subdir.mk:28: warning: ignoring old recipe for target `clean' From pan@REDACTED Fri Nov 16 16:45:44 2012 From: pan@REDACTED (Patrik Nyblom) Date: Fri, 16 Nov 2012 16:45:44 +0100 Subject: [erlang-bugs] clang compiler warnings In-Reply-To: <50A5AEE1.8030203@163.com> References: <50A5AEE1.8030203@163.com> Message-ID: <50A65FA8.3020506@erlang.org> Hi! Great, We'll look into them - some of them seem to indicate real bugs (while of course some are just picky or due to gcc attributes clang will never see), but I'll try to take them all away for R16! Thanks! /Patrik On 11/16/2012 04:11 AM, Xu Yifeng wrote: > I have compiled Erlang R15B02 with clang on FreeBSD, and got quite > a lot of warnings. Please see attachment. > > Regards, > Xu Yifeng > > > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dan353hehe@REDACTED Sat Nov 17 00:37:36 2012 From: dan353hehe@REDACTED (Delorum) Date: Fri, 16 Nov 2012 16:37:36 -0700 Subject: [erlang-bugs] ssl socket session upgrade fails References: <5F5F057B-68C0-4F7E-8F4E-901EA1838B40@pagodabox.com> Message-ID: <13EC05B8-CE05-448C-89EF-E268C21EF77D@gmail.com> So I think that reusing sessions might be broken if the client and the server do not have the same version of openssl installed on their machines. Here is a bit of code that can trigger the error: ssl:start(), {ok,Listen} = ssl:listen(443,[{reuseaddr,true},{certfile,"/mnt/ssl/mysite.com.crt"},{keyfile,"mysite.com.key"}]), {ok,NewSocket} = ssl:transport_accept(Listen), ssl:ssl_accept(NewSocket), {ok,NewSock2} = ssl:transport_accept(Listen), ssl:ssl_accept(NewSock2). And here is what can be run in another shell to cause the error: openssl s_client -ssl3 -connect 192.168.0.10:443 -reconnect The interesting thing I have noticed is that running the openssl s_client command from the same machine that the Erlang server is running on DOES NOT cause the issue. But running the same command from any other machine fails; I tested it with 12 machines here in the office. To be more specific, it fails when the version of openssl on the CLIENT machine is 0.9.8r and the server version is in the 1.0.1 series. Really, the problem is that clients should not have to upgrade their version of openssl in order to visit websites hosted by an Erlang application.
and here is the crash, i removed all the binary data and the private key data because this is not a test cert: =ERROR REPORT==== 16-Nov-2012::16:54:57 === ** State machine <0.49.0> terminating ** Last message in was {tcp,#Port<0.1263>, << removed >>} ** When State == hello ** Data == {state,server, {#Ref<0.0.0.58>,<0.32.0>}, gen_tcp,tcp,tcp_closed,tcp_error,"localhost",443, #Port<0.1263>, {ssl_options,[],verify_none, {#Fun,[]}, false,false,undefined,1, <<"/mnt/ssl/mysite.com.crt">>, undefined, <<"/mnt/ssl/mysite.com.key">>, undefined,undefined,undefined,<<>>,undefined, undefined, [<<0,57>>, <<0,56>>, <<0,53>>, <<0,22>>, <<0,19>>, <<0,10>>, <<0,51>>, <<0,50>>, <<0,47>>, <<0,5>>, <<0,4>>, <<0,21>>, <<0,9>>], #Fun,true,268435456,false,[], undefined,false,undefined,undefined}, {socket_options,list,0,0,0,true}, {connection_states, {connection_state, {security_parameters, <<0,0>>, 0,0,0,0,0,0,0,0,0,0,0,undefined,undefined, undefined,undefined}, undefined,undefined,undefined,0,undefined, undefined,undefined}, {connection_state, {security_parameters,undefined,0,undefined, undefined,undefined,undefined,undefined, undefined,undefined,undefined,undefined, undefined,undefined,undefined, <>, undefined}, undefined,undefined,undefined,undefined, undefined,undefined,undefined}, {connection_state, {security_parameters, <<0,0>>, 0,0,0,0,0,0,0,0,0,0,0,undefined,undefined, undefined,undefined}, undefined,undefined,undefined,0,undefined, undefined,undefined}, {connection_state, {security_parameters,undefined,0,undefined, undefined,undefined,undefined,undefined, undefined,undefined,undefined,undefined, undefined,undefined,undefined, << removed >>, undefined}, undefined,undefined,undefined,undefined, undefined,undefined,undefined}}, [],<<>>,<<>>, {[],[]}, [],16400, {session,undefined,undefined, << removed >>, undefined,undefined,undefined,new,63520304097}, 28691,ssl_session_cache,undefined,undefined,false, undefined,undefined,undefined, {'RSAPrivateKey','two-prime', removed 
asn1_NOVALUE}, {'DHParameter', removed, 2,asn1_NOVALUE}, undefined,undefined,20497,#Ref<0.0.0.61>,0,<<>>,true, {false,first}, {<0.32.0>,#Ref<0.0.0.60>}, {[],[]}, false,true,false,undefined} ** Reason for termination = ** {function_clause, [{ssl_session,server_id, [443, <<135,245,186,148,131,78,105,38,70,210,147,42,207,139,174,106,166, 97,85,161,20,70,127,51,6,193,41,5,157,250,239,90>>, {ssl_options,[],verify_none, {#Fun,[]}, false,false,undefined,1, <<"/mnt/ssl/mysite.com.crt">>,undefined, <<"/mnt/ssl/mysite.com.key">>,undefined, undefined,undefined,<<>>,undefined,undefined, [<<0,57>>, <<0,56>>, <<0,53>>, <<0,22>>, <<0,19>>, <<0,10>>, <<0,51>>, <<0,50>>, <<0,47>>, <<0,5>>, <<0,4>>, <<0,21>>, <<0,9>>], #Fun,true,268435456,false,[],undefined,false, undefined,undefined}, << removed >>, 28691,ssl_session_cache], [{file,"ssl_session.erl"},{line,73}]}, {ssl_handshake,select_session,8, [{file,"ssl_handshake.erl"},{line,629}]}, {ssl_handshake,hello,4,[{file,"ssl_handshake.erl"},{line,178}]}, {ssl_connection,hello,2,[{file,"ssl_connection.erl"},{line,414}]}, {ssl_connection,next_state,4, [{file,"ssl_connection.erl"},{line,2002}]}, {gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,494}]}, {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]} From n.oxyde@REDACTED Mon Nov 19 11:02:42 2012 From: n.oxyde@REDACTED (Anthony Ramine) Date: Mon, 19 Nov 2012 11:02:42 +0100 Subject: [erlang-bugs] Local function names in Core Erlang guards Message-ID: <6BADBA9E-8FD4-4051-B960-A13B64A20B53@gmail.com> Hi, While patching the compiler to allow substitutions of variables which values are local function names [1], I discovered that core_lint doesn't forbid them in guards, even though that makes the compiler passes further down the road generate badly-formed BEAM code. Is that a bug in core_lint or a bug in the BEAM code generation? Should local function names be allowed in guards? 
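For context, the kind of source that puts a local function name in a guard is an EEP 37-style named fun whose guard compares an argument against the fun itself. Named-fun syntax was only a proposal at the time, so the sketch below is hypothetical:

```erlang
%% A named fun can refer to itself from its own clauses; when that
%% self-reference sits in a guard, the Core Erlang translation puts
%% the local function name 'Myself'/1 inside the guard expression.
IsMyself = fun Myself(F) when F == Myself -> true;
               Myself(_) -> false
           end,
IsMyself(IsMyself).  %% evaluates to true on releases with named funs
```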
If it is a bug in core_lint, I can make a patch for that; if it is a bug in the BEAM code generation I would love to fix it and remove the code I wrote to avoid the substitution in guards... but I lack knowledge about the BEAM innards. Regards, [1] http://erlang.org/pipermail/erlang-patches/2012-November/003137.html -- Anthony Ramine From robert.virding@REDACTED Mon Nov 19 12:24:45 2012 From: robert.virding@REDACTED (Robert Virding) Date: Mon, 19 Nov 2012 11:24:45 -0000 (GMT) Subject: [erlang-bugs] Local function names in Core Erlang guards In-Reply-To: <6BADBA9E-8FD4-4051-B960-A13B64A20B53@gmail.com> Message-ID: It's a core_lint bug! Erlang (and core and the BEAM) does not permit calling user-defined functions in a guard. The core scanning/parsing/linting was added to allow people to write code directly in Core Erlang. As far as I know no one does. Robert ----- Original Message ----- > From: "Anthony Ramine" > To: erlang-bugs@REDACTED > Cc: "Bjorn Gustavsson" > Sent: Monday, 19 November, 2012 11:02:42 AM > Subject: [erlang-bugs] Local function names in Core Erlang guards > > Hi, > > While patching the compiler to allow substitutions of variables which > values are > local function names [1], I discovered that core_lint doesn't forbid > them in guards, > even though that makes the compiler passes further down the road > generate badly-formed > BEAM code. > > Is that a bug in core_lint or a bug in the BEAM code generation? > Should local function > names be allowed in guards? > > If it is a bug in core_lint, I can make a patch for that; if it is a > bug in the BEAM > code generation I would love to fix it and remove the code I wrote to > avoid the > substitution in guards... but I lack knowledge about the BEAM > innards.
> > Regards, > > [1] > http://erlang.org/pipermail/erlang-patches/2012-November/003137.html > > -- > Anthony Ramine > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs > From n.oxyde@REDACTED Mon Nov 19 12:38:21 2012 From: n.oxyde@REDACTED (Anthony Ramine) Date: Mon, 19 Nov 2012 12:38:21 +0100 Subject: [erlang-bugs] Local function names in Core Erlang guards In-Reply-To: References: Message-ID: <55CBB68F-60E0-4D46-B298-458DA37BFE55@gmail.com> On 19 Nov 2012, at 12:24, Robert Virding wrote: > It's a core_lint bug! Erlang (and core and the BEAM) does not permit calling user-defined functions in a guard. Mmh, I'm not talking about calling user-defined functions in a guard; I'm talking about putting the name of a local function there. Guards like the one in this letrec: letrec 'Myself'/1 = fun (F) -> case <> of <> when 'erlang':'=='(F, 'Myself'/1) -> true; <_> when 'true' -> false end in 'Myself'/1 The "'Myself'/1" in 'erlang':'=='(F, 'Myself'/1) is compiled to a make_fun opcode but there is no stack frame for it and it triggers an error in beam_validator. A local function name in Core Erlang is just a variable whose name is a tuple {FunName,Arity} and core_lint accepts every variable in guards [1] (that's the bug I'm talking about). Custom functions correctly trigger an error as every expression which is neither a guard nor a constant expr is disallowed by the linter [2]. [1] https://github.com/erlang/otp/blob/d30cee99c662bf030ce035e56e342d7ebf155513/lib/compiler/src/core_lint.erl#L250 [2] https://github.com/erlang/otp/blob/d30cee99c662bf030ce035e56e342d7ebf155513/lib/compiler/src/core_lint.erl#L279-280 > The core scanning/parsing/linting was added to allow people to write code directly in Core Erlang. As far as I know no one does.
Well, I did for EEP37 :) > Robert > > ----- Original Message ----- >> From: "Anthony Ramine" >> To: erlang-bugs@REDACTED >> Cc: "Bjorn Gustavsson" >> Sent: Monday, 19 November, 2012 11:02:42 AM >> Subject: [erlang-bugs] Local function names in Core Erlang guards >> >> Hi, >> >> While patching the compiler to allow substitutions of variables which >> values are >> local function names [1], I discovered that core_lint doesn't forbid >> them in guards, >> even though that makes the compiler passes further down the road >> generate badly-formed >> BEAM code. >> >> Is that a bug in core_lint or a bug in the BEAM code generation? >> Should local function >> names be allowed in guards? >> >> If it is a bug in core_lint, I can make a patch for that; if it is a >> bug in the BEAM >> code generation I would love to fix it and remove the code I wrote to >> avoid the >> substitution in guards... but I lack knowledge about the BEAM >> innards. >> >> Regards, >> >> [1] >> http://erlang.org/pipermail/erlang-patches/2012-November/003137.html -- Anthony Ramine From Antonio.Musumeci@REDACTED Mon Nov 19 14:01:39 2012 From: Antonio.Musumeci@REDACTED (Musumeci, Antonio S) Date: Mon, 19 Nov 2012 13:01:39 +0000 Subject: [erlang-bugs] beam core'ing Message-ID: <51C6F20DC46369418387C5250127649B039D96@HZWEX2014N4.msad.ms.com> I'm just starting to debug this but figured I'd send it along in case anyone has seen this before. 64bit RHEL 5.0.1 built from source beam.smp R15B02 Happens consistently when trying to start our app and then just stops after a time. Across a few boxes. Oddly we have an identical cluster (hw and sw) and it never happens. 
#0 bf_unlink_free_block (flags=<optimized out>, block=0x6f00, allctr=<optimized out>) at beam/erl_bestfit_alloc.c:789 #1 bf_get_free_block (allctr=0x6824600, size=304, cand_blk=0x0, cand_size=<optimized out>, flags=0) at beam/erl_bestfit_alloc.c:869 #2 0x000000000045343c in mbc_alloc_block (alcu_flgsp=<optimized out>, blk_szp=<optimized out>, size=<optimized out>, allctr=<optimized out>) at beam/erl_alloc_util.c:1198 #3 mbc_alloc (allctr=0x6824600, size=295) at beam/erl_alloc_util.c:1345 #4 0x000000000045398d in do_erts_alcu_alloc (type=164, extra=0x6824600, size=295) at beam/erl_alloc_util.c:3442 #5 0x0000000000453a0f in erts_alcu_alloc_thr_pref (type=164, extra=<optimized out>, size=287) at beam/erl_alloc_util.c:3520 #6 0x0000000000511463 in erts_alloc (size=287, type=<optimized out>) at beam/erl_alloc.h:208 #7 erts_bin_nrml_alloc (size=<optimized out>) at beam/erl_binary.h:260 #8 erts_bs_append (c_p=0x69fba60, reg=<optimized out>, live=<optimized out>, build_size_term=<optimized out>, extra_words=0, unit=8) at beam/erl_bits.c:1327 #9 0x000000000053ffd8 in process_main () at beam/beam_emu.c:3858 #10 0x00000000004ae853 in sched_thread_func (vesdp=<optimized out>) at beam/erl_process.c:5184 #11 0x00000000005c17e9 in thr_wrapper (vtwd=<optimized out>) at pthread/ethread.c:106 #12 0x00002b430f39e73d in start_thread () from /lib64/libpthread.so.0 #13 0x00002b430f890f6d in clone () from /lib64/libc.so.6 #14 0x0000000000000000 in ?? () -------------- next part -------------- An HTML attachment was scrubbed... URL: From pan@REDACTED Mon Nov 19 14:54:31 2012 From: pan@REDACTED (Patrik Nyblom) Date: Mon, 19 Nov 2012 14:54:31 +0100 Subject: [erlang-bugs] beam core'ing In-Reply-To: <51C6F20DC46369418387C5250127649B039D96@HZWEX2014N4.msad.ms.com> References: <51C6F20DC46369418387C5250127649B039D96@HZWEX2014N4.msad.ms.com> Message-ID: <50AA3A17.9030300@erlang.org> On 11/19/2012 02:01 PM, Musumeci, Antonio S wrote: > > I'm just starting to debug this but figured I'd send it along in case > anyone has seen this before. > > 64bit RHEL 5.0.1 > > built from source beam.smp R15B02 > > Happens consistently when trying to start our app and then just stops
Oddly we have an identical cluster > (hw and sw) and it never happens. > Yes! I've seen it before and have tried for several months to get a reproducible example and a core I can analyze here. I've had one core that was somewhat readable but had no luck in locating the beam code that triggered this. If you could try narrowing it down, I would be really grateful! Please email me any findings, theories, core dumps - anything! I really want to find this! The most interesting would be to find the snippet of Erlang code that makes this happen (intermittently, probably). The problem is that when the allocators crash, the error is usually somewhere else: access of freed memory, a double free, or something else doing horrid things to memory. Obviously none of our test suites exercise this bug, as neither our debug builds nor our valgrind runs find it. It happens on both SMP and non-SMP and is always in the context of erts_bs_append, so I'm pretty sure this has a connection to the other users seeing the crash in the allocators...
Cheers, Patrik > > #0 bf_unlink_free_block (flags=, block=0x6f00, > allctr=) at beam/erl_bestfit_alloc.c:789 > #1 bf_get_free_block (allctr=0x6824600, size=304, cand_blk=0x0, > cand_size=, flags=0) at beam/erl_bestfit_alloc.c:869 > #2 0x000000000045343c in mbc_alloc_block (alcu_flgsp=, > blk_szp=, size=, allctr=) > at beam/erl_alloc_util.c:1198 > #3 mbc_alloc (allctr=0x6824600, size=295) at beam/erl_alloc_util.c:1345 > #4 0x000000000045398d in do_erts_alcu_alloc (type=164, > extra=0x6824600, size=295) at beam/erl_alloc_util.c:3442 > #5 0x0000000000453a0f in erts_alcu_alloc_thr_pref (type=164, > extra=, size=287) at beam/erl_alloc_util.c:3520 > #6 0x0000000000511463 in erts_alloc (size=287, type=) > at beam/erl_alloc.h:208 > #7 erts_bin_nrml_alloc (size=) at beam/erl_binary.h:260 > #8 erts_bs_append (c_p=0x69fba60, reg=, live= out>, build_size_term=, extra_words=0, unit=8)at > beam/erl_bits.c:1327 > #9 0x000000000053ffd8 in process_main () at beam/beam_emu.c:3858 > #10 0x00000000004ae853 in sched_thread_func (vesdp=) at > beam/erl_process.c:5184 > #11 0x00000000005c17e9 in thr_wrapper (vtwd=) at > pthread/ethread.c:106 > #12 0x00002b430f39e73d in start_thread () from /lib64/libpthread.so.0 > #13 0x00002b430f890f6d in clone () from /lib64/libc.so.6 > #14 0x0000000000000000 in ?? () > > > > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs -------------- next part -------------- An HTML attachment was scrubbed... URL: From arif.ishaq@REDACTED Tue Nov 20 09:17:46 2012 From: arif.ishaq@REDACTED (Arif Ishaq) Date: Tue, 20 Nov 2012 08:17:46 +0000 Subject: [erlang-bugs] Module re, escape needs double backslash Message-ID: <1CAB695D2C2A8F4BB0B242A5B44C75E901A904@ESESSMB301.ericsson.se> Hi, The backslash character as escape doesn't work as expected. Erlang R15B (erts-5.9) [smp:4:4] [async-threads:0] Eshell V5.9 (abort with ^G) 1> String = "/* this is a C comment */". 
"/* this is a C comment */" 2> re:run(String, "/\*.*?\*/"). ** exception error: bad argument in function re:run/2 called as re:run("/* this is a C comment */","/*.*?*/") 3> re:run(String, "/\\*.*?\\*/"). {match,[{0,25}]} 4> Best regards PS. The documentation in "erl5.9/lib/stdlib-1.18/doc/html/re.html" says: ".. the pattern /\*.*?\*/ does the right thing with the C comments." -------------- next part -------------- An HTML attachment was scrubbed... URL: From ess@REDACTED Tue Nov 20 11:24:06 2012 From: ess@REDACTED (=?ISO-8859-1?Q?Erik_S=F8e_S=F8rensen?=) Date: Tue, 20 Nov 2012 11:24:06 +0100 Subject: [erlang-bugs] Module re, escape needs double backslash In-Reply-To: <1CAB695D2C2A8F4BB0B242A5B44C75E901A904@ESESSMB301.ericsson.se> References: <1CAB695D2C2A8F4BB0B242A5B44C75E901A904@ESESSMB301.ericsson.se> Message-ID: <50AB5A46.5020806@trifork.com> Not a bug, just a surprise - The string "/\\*.*?\\*/" does only contain two backslashes: > length([C || C <- "/\\*.*?\\*/", C==92]). % The character code for backslash being 92. 2 The string literal in the *Erlang source file*[1], of course, plainly contains four of them. The surprise is that there are two interpreters involved -- two layers, each of which requires backslash-escaping: 1. The Erlang parser reads [Backslash, Backslash] and puts a single backslash into the string literal (rather than taking the second backslash as a signal to start an escape sequence). 2. The "re" module reads [Backslash, Asterisk] and interprets this by taking the asterisk literally (rather than as a zero-or-more modifier). Or going the other way: If we desire a literal asterisk in the pattern, we must escape it, so that "re" sees a backslash in front of the asterisk. Thus, we want a backslash in the string. 
And if we want to write that as a string literal, then in order for the string to contain a backslash, we must put another backslash in front of it in the Erlang source code, because backslash means something special to the Erlang parser as well. /Erik [1] Or in this case, the expression typed into the Erlang shell. On 20-11-2012 09:17, Arif Ishaq wrote: > Hi, > The backslash character as escape doesn't work as expected. > Erlang R15B (erts-5.9) [smp:4:4] [async-threads:0] > Eshell V5.9 (abort with ^G) > 1> String = "/* this is a C comment */". > "/* this is a C comment */" > 2> re:run(String, "/\*.*?\*/"). > ** exception error: bad argument > in function re:run/2 > called as re:run("/* this is a C comment */","/*.*?*/") > 3> re:run(String, "/\\*.*?\\*/"). > {match,[{0,25}]} > 4> > Best regards > PS. The documentation in "erl5.9/lib/stdlib-1.18/doc/html/re.html" says: > ".. the pattern > /\*.*?\*/ > does the right thing with the C comments." -- Mobile: + 45 26 36 17 55 | Skype: eriksoesorensen | Twitter: @eriksoe Trifork A/S | Margrethepladsen 4 | DK-8000 Aarhus C | www.trifork.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From Antonio.Musumeci@REDACTED Tue Nov 20 16:37:45 2012 From: Antonio.Musumeci@REDACTED (Musumeci, Antonio S) Date: Tue, 20 Nov 2012 15:37:45 +0000 Subject: [erlang-bugs] beam core'ing In-Reply-To: <50AA3A17.9030300@erlang.org> References: <51C6F20DC46369418387C5250127649B039D96@HZWEX2014N4.msad.ms.com> <50AA3A17.9030300@erlang.org> Message-ID: <51C6F20DC46369418387C5250127649B03B16B@HZWEX2014N4.msad.ms.com> I've got lots of cores... but they are all from optimized builds. Has this been seen in other versions? We are keen to solve this because it's causing us pain in production. We hit another, older, memory bug (the 32bit values used in 64bit build)... and now this. I'm going to be building and trying R15B01 to see if we hit it as well. I'll send any additional information I can. 
Any suggestions on debugging beam would be appreciated. Compile options, etc. Thanks. -antonio ________________________________ From: erlang-bugs-bounces@REDACTED [mailto:erlang-bugs-bounces@REDACTED] On Behalf Of Patrik Nyblom Sent: Monday, November 19, 2012 8:55 AM To: erlang-bugs@REDACTED Subject: Re: [erlang-bugs] beam core'ing On 11/19/2012 02:01 PM, Musumeci, Antonio S wrote: I'm just starting to debug this but figured I'd send it along in case anyone has seen this before. 64bit RHEL 5.0.1 built from source beam.smp R15B02 Happens consistently when trying to start our app and then just stops after a time. Across a few boxes. Oddly we have an identical cluster (hw and sw) and it never happens. Yes! I've seen it before and have tried for several months to get a reproducable example and a core i can analyze here. I've had one core that was somewhat readable but had no luck in locating the beam code that triggered this. If you could try narrowing it down, I would be really grateful! Please email me any findings, theories, cores dumps - anything! I really want to find this! The most interesting would be to find the snippet of erlang code that makes this happen (intermittently probably). The problem is that when the allocators crash, the error is usually somewhere else. Access of freed memory, double free or something else doing horrid things to memory. Obviously none of our testsuites exercise this bug as neither our debug builds, nor our valgrind runs find it. It happens on both SMP and non SMP and is always in the context of the erts_bs_append, so I'm pretty sure this has a connection to the other users seeing the crash in the allocators... 
Cheers, Patrik #0 bf_unlink_free_block (flags=, block=0x6f00, allctr=) at beam/erl_bestfit_alloc.c:789 #1 bf_get_free_block (allctr=0x6824600, size=304, cand_blk=0x0, cand_size=, flags=0) at beam/erl_bestfit_alloc.c:869 #2 0x000000000045343c in mbc_alloc_block (alcu_flgsp=, blk_szp=, size=, allctr=) at beam/erl_alloc_util.c:1198 #3 mbc_alloc (allctr=0x6824600, size=295) at beam/erl_alloc_util.c:1345 #4 0x000000000045398d in do_erts_alcu_alloc (type=164, extra=0x6824600, size=295) at beam/erl_alloc_util.c:3442 #5 0x0000000000453a0f in erts_alcu_alloc_thr_pref (type=164, extra=, size=287) at beam/erl_alloc_util.c:3520 #6 0x0000000000511463 in erts_alloc (size=287, type=) at beam/erl_alloc.h:208 #7 erts_bin_nrml_alloc (size=) at beam/erl_binary.h:260 #8 erts_bs_append (c_p=0x69fba60, reg=, live=, build_size_term=, extra_words=0, unit=8) at beam/erl_bits.c:1327 #9 0x000000000053ffd8 in process_main () at beam/beam_emu.c:3858 #10 0x00000000004ae853 in sched_thread_func (vesdp=) at beam/erl_process.c:5184 #11 0x00000000005c17e9 in thr_wrapper (vtwd=) at pthread/ethread.c:106 #12 0x00002b430f39e73d in start_thread () from /lib64/libpthread.so.0 #13 0x00002b430f890f6d in clone () from /lib64/libc.so.6 #14 0x0000000000000000 in ?? () _______________________________________________ erlang-bugs mailing list erlang-bugs@REDACTED http://erlang.org/mailman/listinfo/erlang-bugs -------------- next part -------------- An HTML attachment was scrubbed... URL: From freeakk@REDACTED Tue Nov 20 19:28:56 2012 From: freeakk@REDACTED (Michael Uvarov) Date: Tue, 20 Nov 2012 21:28:56 +0300 Subject: [erlang-bugs] Module re, escape needs double backslash In-Reply-To: <50AB5A46.5020806@trifork.com> References: <1CAB695D2C2A8F4BB0B242A5B44C75E901A904@ESESSMB301.ericsson.se> <50AB5A46.5020806@trifork.com> Message-ID: There is a parse transform, that helps with this issue. It stores raw strings as comments in the file's header. 
https://github.com/mad-cocktail/pisco#example-1 Other examples: https://github.com/mad-cocktail/pisco/blob/master/test/pisco_tests.erl -- Best regards, Uvarov Michael From sidentdv@REDACTED Tue Nov 20 22:40:30 2012 From: sidentdv@REDACTED (Denis Titoruk) Date: Wed, 21 Nov 2012 01:40:30 +0400 Subject: [erlang-bugs] beam core'ing In-Reply-To: <51C6F20DC46369418387C5250127649B03B16B@HZWEX2014N4.msad.ms.com> References: <51C6F20DC46369418387C5250127649B039D96@HZWEX2014N4.msad.ms.com> <50AA3A17.9030300@erlang.org> <51C6F20DC46369418387C5250127649B03B16B@HZWEX2014N4.msad.ms.com> Message-ID: <79133563-669F-4FDD-8982-01DB7B321DA5@aboutecho.com> Hi, We've got the same error on R15B01 and R15B02. I finished my investigation of this issue today and here is the result. Let's assume we have the code: encode_formats(Columns) -> encode_formats(Columns, 0, <<>>). encode_formats([], Count, Acc) -> <<Count:16, Acc/binary>>; encode_formats([#column{format = Format} | T], Count, Acc) -> encode_formats(T, Count + 1, <<Acc/binary, Format:16>>). So, <<Acc/binary, Format:16>> translates to {bs_append,{f,0},{integer,16},0,7,8,{x,2},{field_flags,[]},{x,1}}. {bs_put_integer,{f,0},{integer,16},1,{field_flags,[signed,big]},{x,6}}. There is a GC execution in bs_append and it can reallocate the binary, but erts_current_bin, which is used by bs_put_integer, is not reassigned. Fix: erl_bits.c: Eterm erts_bs_append(Process* c_p, Eterm* reg, Uint live, Eterm build_size_term, Uint extra_words, Uint unit) ...
if (c_p->stop - c_p->htop < heap_need) { (void) erts_garbage_collect(c_p, heap_need, reg, live+1); } sb = (ErlSubBin *) c_p->htop; c_p->htop += ERL_SUB_BIN_SIZE; sb->thing_word = HEADER_SUB_BIN; sb->size = BYTE_OFFSET(used_size_in_bits); sb->bitsize = BIT_OFFSET(used_size_in_bits); sb->offs = 0; sb->bitoffs = 0; sb->is_writable = 1; sb->orig = reg[live]; /////////////////////////////////////////////////////////////////// // add these lines /////////////////////////////////////////////////////////////////// pb = (ProcBin *) boxed_val(sb->orig); erts_current_bin = pb->bytes; erts_writable_bin = 1; /////////////////////////////////////////////////////////////////// return make_binary(sb); ... -- Cheers, Denis On 20.11.2012, at 19:37, Musumeci, Antonio S wrote: > > I've got lots of cores... but they are all from optimized builds. > > Has this been seen in other versions? We are keen to solve this because it's causing us pain in production. We hit another, older, memory bug (the 32bit values used in 64bit build)... and now this. > > I'm going to be building and trying R15B01 to see if we hit it as well. I'll send any additional information I can. Any suggestions on debugging beam would be appreciated. Compile options, etc. > > Thanks. > > -antonio > > From: erlang-bugs-bounces@REDACTED [mailto:erlang-bugs-bounces@REDACTED] On Behalf Of Patrik Nyblom > Sent: Monday, November 19, 2012 8:55 AM > To: erlang-bugs@REDACTED > Subject: Re: [erlang-bugs] beam core'ing > > On 11/19/2012 02:01 PM, Musumeci, Antonio S wrote: >> >> I'm just starting to debug this but figured I'd send it along in case anyone has seen this before. >> >> 64bit RHEL 5.0.1 >> >> built from source beam.smp R15B02 >> >> Happens consistently when trying to start our app and then just stops after a time. Across a few boxes. Oddly we have an identical cluster (hw and sw) and it never happens. >> > Yes!
I've seen it before and have tried for several months to get a reproducable example and a core i can analyze here. I've had one core that was somewhat readable but had no luck in locating the beam code that triggered this. If you could try narrowing it down, I would be really grateful! > > Please email me any findings, theories, cores dumps - anything! I really want to find this! The most interesting would be to find the snippet of erlang code that makes this happen (intermittently probably). > > The problem is that when the allocators crash, the error is usually somewhere else. Access of freed memory, double free or something else doing horrid things to memory. Obviously none of our testsuites exercise this bug as neither our debug builds, nor our valgrind runs find it. It happens on both SMP and non SMP and is always in the context of the erts_bs_append, so I'm pretty sure this has a connection to the other users seeing the crash in the allocators... > > Cheers, > Patrik >> #0 bf_unlink_free_block (flags=, block=0x6f00, allctr=) at beam/erl_bestfit_alloc.c:789 >> #1 bf_get_free_block (allctr=0x6824600, size=304, cand_blk=0x0, cand_size=, flags=0) at beam/erl_bestfit_alloc.c:869 >> #2 0x000000000045343c in mbc_alloc_block (alcu_flgsp=, blk_szp=, size=, allctr=) at beam/erl_alloc_util.c:1198 >> #3 mbc_alloc (allctr=0x6824600, size=295) at beam/erl_alloc_util.c:1345 >> #4 0x000000000045398d in do_erts_alcu_alloc (type=164, extra=0x6824600, size=295) at beam/erl_alloc_util.c:3442 >> #5 0x0000000000453a0f in erts_alcu_alloc_thr_pref (type=164, extra=, size=287) at beam/erl_alloc_util.c:3520 >> #6 0x0000000000511463 in erts_alloc (size=287, type=) at beam/erl_alloc.h:208 >> #7 erts_bin_nrml_alloc (size=) at beam/erl_binary.h:260 >> #8 erts_bs_append (c_p=0x69fba60, reg=, live=, build_size_term=, extra_words=0, unit=8) at beam/erl_bits.c:1327 >> #9 0x000000000053ffd8 in process_main () at beam/beam_emu.c:3858 >> #10 0x00000000004ae853 in sched_thread_func (vesdp=) at 
beam/erl_process.c:5184 >> #11 0x00000000005c17e9 in thr_wrapper (vtwd=) at pthread/ethread.c:106 >> #12 0x00002b430f39e73d in start_thread () from /lib64/libpthread.so.0 >> #13 0x00002b430f890f6d in clone () from /lib64/libc.so.6 >> #14 0x0000000000000000 in ?? () >> >> >> >> >> _______________________________________________ >> erlang-bugs mailing list >> erlang-bugs@REDACTED >> http://erlang.org/mailman/listinfo/erlang-bugs > > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs -------------- next part -------------- An HTML attachment was scrubbed... URL: From wallentin.dahlberg@REDACTED Wed Nov 21 03:38:37 2012 From: wallentin.dahlberg@REDACTED (=?ISO-8859-1?Q?Bj=F6rn=2DEgil_Dahlberg?=) Date: Wed, 21 Nov 2012 03:38:37 +0100 Subject: [erlang-bugs] beam core'ing In-Reply-To: <79133563-669F-4FDD-8982-01DB7B321DA5@aboutecho.com> References: <51C6F20DC46369418387C5250127649B039D96@HZWEX2014N4.msad.ms.com> <50AA3A17.9030300@erlang.org> <51C6F20DC46369418387C5250127649B03B16B@HZWEX2014N4.msad.ms.com> <79133563-669F-4FDD-8982-01DB7B321DA5@aboutecho.com> Message-ID: I knew it! =) Ever since I first saw that gc in bs_append, I felt it was trouble. I will get someone, probably me, to look over this fix tomorrow. // Björn-Egil 2012/11/20 Denis Titoruk > Hi, > > We've got the same error on R15B01, R15B02 > I've finished my investigation of this issue today & here is result: > > Let's assume we have the code: > encode_formats(Columns) -> > encode_formats(Columns, 0, <<>>). > > encode_formats([], Count, Acc) -> > <>; > > encode_formats([#column{format = Format} | T], Count, Acc) -> > encode_formats(T, Count + 1, <>). > > So, <> translates to > > {bs_append,{f,0},{integer,16},0,7,8,{x,2},{field_flags,[]},{x,1}}. > {bs_put_integer,{f,0},{integer,16},1,{field_flags,[signed,big]},{x,6}}.
> > There is GC execution in bs_append and it can reallocate binary but there > isn't reassigning erts_current_bin which used in bs_put_integer. > > Fix: > > erl_bits.c: > Eterm > erts_bs_append(Process* c_p, Eterm* reg, Uint live, Eterm build_size_term, > Uint extra_words, Uint unit) > ? > if (c_p->stop - c_p->htop < heap_need) { > (void) erts_garbage_collect(c_p, heap_need, reg, live+1); > } > sb = (ErlSubBin *) c_p->htop; > c_p->htop += ERL_SUB_BIN_SIZE; > sb->thing_word = HEADER_SUB_BIN; > sb->size = BYTE_OFFSET(used_size_in_bits); > sb->bitsize = BIT_OFFSET(used_size_in_bits); > sb->offs = 0; > sb->bitoffs = 0; > sb->is_writable = 1; > sb->orig = reg[live]; > > /////////////////////////////////////////////////////////////////// > // add this lines > /////////////////////////////////////////////////////////////////// > pb = (ProcBin *) boxed_val(sb->orig); > erts_current_bin = pb->bytes; > erts_writable_bin = 1; > /////////////////////////////////////////////////////////////////// > > return make_binary(sb); > ? > > > -- > Cheers, > Denis > > 20.11.2012, ? 19:37, Musumeci, Antonio S ???????(?): > > > I've got lots of cores... but they are all from optimized builds. > > Has this been seen in other versions? We are keen to solve this because > it's causing us pain in production. We hit another, older, memory bug (the > 32bit values used in 64bit build)... and now this. > > I'm going to be building and trying R15B01 to see if we hit it as well. > I'll send any additional information I can. Any suggestions on debugging > beam would be appreciated. Compile options, etc. > > Thanks. 
> > -antonio > ------------------------------ > *From:* erlang-bugs-bounces@REDACTED [mailto: > erlang-bugs-bounces@REDACTED] *On Behalf Of *Patrik Nyblom > *Sent:* Monday, November 19, 2012 8:55 AM > *To:* erlang-bugs@REDACTED > *Subject:* Re: [erlang-bugs] beam core'ing > > On 11/19/2012 02:01 PM, Musumeci, Antonio S wrote: > > > I'm just starting to debug this but figured I'd send it along in case > anyone has seen this before. > > 64bit RHEL 5.0.1 > > built from source beam.smp R15B02 > > Happens consistently when trying to start our app and then just stops > after a time. Across a few boxes. Oddly we have an identical cluster (hw > and sw) and it never happens. > > Yes! I've seen it before and have tried for several months to get a reproducable > example and a core i can analyze here. I've had one core that was somewhat > readable but had no luck in locating the beam code that triggered this. If > you could try narrowing it down, I would be really grateful! > > Please email me any findings, theories, cores dumps - anything! I really > want to find this! The most interesting would be to find the snippet of > erlang code that makes this happen (intermittently probably). > > The problem is that when the allocators crash, the error is usually > somewhere else. Access of freed memory, double free or something else > doing horrid things to memory. Obviously none of our testsuites exercise > this bug as neither our debug builds, nor our valgrind runs find it. It > happens on both SMP and non SMP and is always in the context of the erts > _bs_append, so I'm pretty sure this has a connection to the other users > seeing the crash in the allocators... 
> > Cheers, > Patrik > > #0 bf_unlink_free_block (flags=<optimized out>, block=0x6f00, > allctr=<optimized out>) at beam/erl_bestfit_alloc.c:789 > #1 bf_get_free_block (allctr=0x6824600, size=304, cand_blk=0x0, > cand_size=<optimized out>, flags=0) at beam/erl_bestfit_alloc.c:869 > #2 0x000000000045343c in mbc_alloc_block (alcu_flgsp=<optimized out>, > blk_szp=<optimized out>, size=<optimized out>, allctr=<optimized out>) at > beam/erl_alloc_util.c:1198 > #3 mbc_alloc (allctr=0x6824600, size=295) at beam/erl_alloc_util.c:1345 > #4 0x000000000045398d in do_erts_alcu_alloc (type=164, extra=0x6824600, > size=295) at beam/erl_alloc_util.c:3442 > #5 0x0000000000453a0f in erts_alcu_alloc_thr_pref (type=164, > extra=<optimized out>, size=287) at beam/erl_alloc_util.c:3520 > #6 0x0000000000511463 in erts_alloc (size=287, type=<optimized out>) at > beam/erl_alloc.h:208 > #7 erts_bin_nrml_alloc (size=<optimized out>) at beam/erl_binary.h:260 > #8 erts_bs_append (c_p=0x69fba60, reg=<optimized out>, live=<optimized out>, build_size_term=<optimized out>, extra_words=0, unit=8) at > beam/erl_bits.c:1327 > #9 0x000000000053ffd8 in process_main () at beam/beam_emu.c:3858 > #10 0x00000000004ae853 in sched_thread_func (vesdp=<optimized out>) at > beam/erl_process.c:5184 > #11 0x00000000005c17e9 in thr_wrapper (vtwd=<optimized out>) at > pthread/ethread.c:106 > #12 0x00002b430f39e73d in start_thread () from /lib64/libpthread.so.0 > #13 0x00002b430f890f6d in clone () from /lib64/libc.so.6 > #14 0x0000000000000000 in ?? () > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs > > > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From norton@REDACTED Wed Nov 21 06:51:50 2012 From: norton@REDACTED (Joseph Wayne Norton) Date: Wed, 21 Nov 2012 14:51:50 +0900 Subject: [erlang-bugs] dialyzer R15B02: outdated docs for -Wbehaviours option ? Message-ID: <1CB0CFC5-4309-4F82-93E8-5F24F435C23B@lovely.email.ne.jp> Regarding dialyzer R15B02, I noticed the -Wbehaviours option is documented in the HTML and man pages. However, this option is not accepted by dialyzer and not present in the command help usage. It seems the docs haven't been updated. thanks, Joe N. Excerpt from HTML documentation: -Wbehaviours*** Include warnings about behaviour callbacks which drift from the published recommended interfaces. $ dialyzer --version Dialyzer version v2.5.2 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kostis@REDACTED Wed Nov 21 08:15:49 2012 From: kostis@REDACTED (Kostis Sagonas) Date: Wed, 21 Nov 2012 08:15:49 +0100 Subject: [erlang-bugs] dialyzer R15B02: outdated docs for -Wbehaviours option ? In-Reply-To: <1CB0CFC5-4309-4F82-93E8-5F24F435C23B@lovely.email.ne.jp> References: <1CB0CFC5-4309-4F82-93E8-5F24F435C23B@lovely.email.ne.jp> Message-ID: <50AC7FA5.3090007@cs.ntua.gr> On 11/21/2012 06:51 AM, Joseph Wayne Norton wrote: > > Regarding dialyzer R15B02, I noticed the -Wbehaviours option is > documented in the HTML and man pages. However, this option is not > accepted by dialyzer and not present in the command help usage. It seems > the docs haven't been updated. What has happened is that this option became on by default in R15B02 and the user can only disable it, if desired. Do a 'dialyzer --help' and you will see: .... -Wno_behaviours Suppress warnings about behaviour callbacks which drift from the published recommended interfaces. ... I do not know why the HTML documentation has not been updated, but we will try to correct this for R15B03. Thanks for bringing this to our attention. 
Kostis > Excerpt from HTML documentation: > > -Wbehaviours > Include warnings about behaviour callbacks which drift from the > published recommended interfaces. > > > > $ dialyzer --version > Dialyzer version v2.5.2 > > > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs From pan@REDACTED Wed Nov 21 10:44:26 2012 From: pan@REDACTED (Patrik Nyblom) Date: Wed, 21 Nov 2012 10:44:26 +0100 Subject: [erlang-bugs] beam core'ing In-Reply-To: <79133563-669F-4FDD-8982-01DB7B321DA5@aboutecho.com> References: <51C6F20DC46369418387C5250127649B039D96@HZWEX2014N4.msad.ms.com> <50AA3A17.9030300@erlang.org> <51C6F20DC46369418387C5250127649B03B16B@HZWEX2014N4.msad.ms.com> <79133563-669F-4FDD-8982-01DB7B321DA5@aboutecho.com> Message-ID: <50ACA27A.40707@erlang.org> Hi! On 11/20/2012 10:40 PM, Denis Titoruk wrote: > Hi, > > We've got the same error on R15B01 and R15B02. > I finished my investigation of this issue today, and here is the result: > > Let's assume we have the code: > encode_formats(Columns) -> > encode_formats(Columns, 0, <<>>). > > encode_formats([], Count, Acc) -> > <<Count:16, Acc/binary>>; > > encode_formats([#column{format = Format} | T], Count, Acc) -> > encode_formats(T, Count + 1, <<Acc/binary, Format:16>>). > > So, <<Acc/binary, Format:16>> translates to > > {bs_append,{f,0},{integer,16},0,7,8,{x,2},{field_flags,[]},{x,1}}. > {bs_put_integer,{f,0},{integer,16},1,{field_flags,[signed,big]},{x,6}}. > > There is GC execution in bs_append, and it can reallocate the binary, but > erts_current_bin, which is used in bs_put_integer, is not reassigned. > > Fix: > > erl_bits.c: > Eterm > erts_bs_append(Process* c_p, Eterm* reg, Uint live, Eterm build_size_term, > Uint extra_words, Uint unit) > ... 
> if (c_p->stop - c_p->htop < heap_need) { > (void) erts_garbage_collect(c_p, heap_need, reg, live+1); > } > sb = (ErlSubBin *) c_p->htop; > c_p->htop += ERL_SUB_BIN_SIZE; > sb->thing_word = HEADER_SUB_BIN; > sb->size = BYTE_OFFSET(used_size_in_bits); > sb->bitsize = BIT_OFFSET(used_size_in_bits); > sb->offs = 0; > sb->bitoffs = 0; > sb->is_writable = 1; > sb->orig = reg[live]; > > /////////////////////////////////////////////////////////////////// > // add these lines > /////////////////////////////////////////////////////////////////// > pb = (ProcBin *) boxed_val(sb->orig); > erts_current_bin = pb->bytes; > erts_writable_bin = 1; > /////////////////////////////////////////////////////////////////// > > return make_binary(sb); > ... > Can you reproduce the bug and verify that this fix really works? The thing is that binaries should *only* be reallocated in the gc if there are no active writers, and there obviously is an active writer here (pb->flags |= PB_ACTIVE_WRITER a few lines earlier), so if this code change actually removes the crash, the bug would be in the gc's detection of active writers. > > -- > Cheers, > Denis Cheers, /Patrik > > On 20.11.2012, at 19:37, Musumeci, Antonio S wrote: > >> >> I've got lots of cores... but they are all from optimized builds. >> >> Has this been seen in other versions? We are keen to solve this >> because it's causing us pain in production. We hit another, older, >> memory bug (the 32-bit values used in a 64-bit build)... and now this. >> >> I'm going to be building and trying R15B01 to see if we hit it as >> well. I'll send any additional information I can. Any suggestions on >> debugging beam would be appreciated. Compile options, etc. >> >> Thanks. 
>> >> -antonio >> >> ------------------------------------------------------------------------ >> *From:*erlang-bugs-bounces@REDACTED >> [mailto:erlang-bugs-bounces@REDACTED]*On >> Behalf Of*Patrik Nyblom >> *Sent:*Monday, November 19, 2012 8:55 AM >> *To:*erlang-bugs@REDACTED >> *Subject:*Re: [erlang-bugs] beam core'ing >> >> On 11/19/2012 02:01 PM, Musumeci, Antonio S wrote: >>> >>> I'm just starting to debug this but figured I'd send it along in >>> case anyone has seen this before. >>> >>> 64bit RHEL 5.0.1 >>> >>> built from source beam.smp R15B02 >>> >>> Happens consistently when trying to start our app and then just >>> stops after a time. Across a few boxes. Oddly we have an identical >>> cluster (hw and sw) and it never happens. >>> >> Yes! I've seen it before and have tried for several months to get >> areproducable example and acore i can analyze here. I've had one core >> that wassomewhat readable but had no luck in locating the beam code >> that triggered this. If you could try narrowing it down, I would be >> really grateful! >> >> Please email me any findings, theories, cores dumps- anything! I >> really want to find this! The most interesting would be to find the >> snippet of erlang code that makes this happen (intermittently probably). >> >> The problem isthatwhen the allocators crash, the error is usually >> somewhere else.Access of freed memory, double free or something else >> doing horrid things to memory. Obviously none of our testsuites >> exercise this bug asneither our debug builds, nor our valgrind runs >> find it. It happens on both SMP and non SMP and is always in the >> context of the erts_bs_append, so I'm pretty sure this has a >> connection to the other users seeing the crash in the allocators... 
>> >> Cheers, >> Patrik >>> >>> #0 bf_unlink_free_block (flags=, block=0x6f00, >>> allctr=) at beam/erl_bestfit_alloc.c:789 >>> #1 bf_get_free_block (allctr=0x6824600, size=304, cand_blk=0x0, >>> cand_size=, flags=0) at beam/erl_bestfit_alloc.c:869 >>> #2 0x000000000045343c in mbc_alloc_block (alcu_flgsp=>> out>, blk_szp=, size=, >>> allctr=) at beam/erl_alloc_util.c:1198 >>> #3 mbc_alloc (allctr=0x6824600, size=295) at beam/erl_alloc_util.c:1345 >>> #4 0x000000000045398d in do_erts_alcu_alloc (type=164, >>> extra=0x6824600, size=295) at beam/erl_alloc_util.c:3442 >>> #5 0x0000000000453a0f in erts_alcu_alloc_thr_pref (type=164, >>> extra=, size=287) at beam/erl_alloc_util.c:3520 >>> #6 0x0000000000511463 in erts_alloc (size=287, type=) >>> at beam/erl_alloc.h:208 >>> #7 erts_bin_nrml_alloc (size=) at beam/erl_binary.h:260 >>> #8 erts_bs_append (c_p=0x69fba60, reg=, >>> live=, build_size_term=, >>> extra_words=0, unit=8)at beam/erl_bits.c:1327 >>> #9 0x000000000053ffd8 in process_main () at beam/beam_emu.c:3858 >>> #10 0x00000000004ae853 in sched_thread_func (vesdp=) >>> at beam/erl_process.c:5184 >>> #11 0x00000000005c17e9 in thr_wrapper (vtwd=) at >>> pthread/ethread.c:106 >>> #12 0x00002b430f39e73d in start_thread () from /lib64/libpthread.so.0 >>> #13 0x00002b430f890f6d in clone () from /lib64/libc.so.6 >>> #14 0x0000000000000000 in ?? () >>> >>> >>> _______________________________________________ >>> erlang-bugs mailing list >>> erlang-bugs@REDACTED >>> http://erlang.org/mailman/listinfo/erlang-bugs >> _______________________________________________ >> erlang-bugs mailing list >> erlang-bugs@REDACTED >> http://erlang.org/mailman/listinfo/erlang-bugs > -------------- next part -------------- An HTML attachment was scrubbed... 
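Patrik's question above comes down to whether a cached raw pointer can survive a collection. The hazard Denis describes can be modeled outside ERTS with a small C sketch — all names here are hypothetical stand-ins, not actual ERTS code — in which a copying "collection" always moves the payload, the way a GC that ignores an active writer would:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a ProcBin: a handle whose payload may be relocated. */
struct bin { unsigned char *bytes; size_t size; };

/* A copying "collection": the payload always moves and the old
 * storage is freed, modeling a GC that does not honor the writer. */
static void collect(struct bin *b) {
    unsigned char *moved = malloc(b->size); /* new block allocated first,  */
    memcpy(moved, b->bytes, b->size);       /* so its address is distinct  */
    free(b->bytes);                         /* from the old one            */
    b->bytes = moved;
}

/* Returns 0 when the write after a collection lands in live memory. */
static int demo(void) {
    struct bin b = { malloc(4), 4 };
    memcpy(b.bytes, "abc", 4);

    unsigned char *current_bin = b.bytes; /* cached, like erts_current_bin */
    collect(&b);                          /* bs_append may GC right here   */

    /* Writing through the stale copy of current_bin would scribble on
     * freed memory - the kind of corruption that later surfaces far
     * away, inside the allocators.  The proposed patch instead
     * re-reads the pointer from the handle before writing: */
    current_bin = b.bytes;
    current_bin[0] = 'x';                 /* the "bs_put_integer" write */

    int ok = (b.bytes[0] == 'x' && b.bytes[1] == 'b') ? 0 : 1;
    free(b.bytes);
    return ok;
}
```

The real fix is one line of bookkeeping after the possible erts_garbage_collect call; the sketch only shows why the refresh is mandatory once the payload can move.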
URL: From sidentdv@REDACTED Wed Nov 21 11:21:51 2012 From: sidentdv@REDACTED (Denis Titoruk) Date: Wed, 21 Nov 2012 14:21:51 +0400 Subject: [erlang-bugs] beam core'ing In-Reply-To: <50ACA27A.40707@erlang.org> References: <51C6F20DC46369418387C5250127649B039D96@HZWEX2014N4.msad.ms.com> <50AA3A17.9030300@erlang.org> <51C6F20DC46369418387C5250127649B03B16B@HZWEX2014N4.msad.ms.com> <79133563-669F-4FDD-8982-01DB7B321DA5@aboutecho.com> <50ACA27A.40707@erlang.org> Message-ID: <8436D993-C4FC-4822-B0E8-7A2D6AB2E0C9@gmail.com> 21.11.2012, ? 13:44, Patrik Nyblom ???????(?): > Hi! > On 11/20/2012 10:40 PM, Denis Titoruk wrote: >> Hi, >> >> We've got the same error on R15B01, R15B02 >> I've finished my investigation of this issue today & here is result: >> >> Let's assume we have the code: >> encode_formats(Columns) -> >> encode_formats(Columns, 0, <<>>). >> >> encode_formats([], Count, Acc) -> >> <>; >> >> encode_formats([#column{format = Format} | T], Count, Acc) -> >> encode_formats(T, Count + 1, <>). >> >> So, <> translates to >> >> {bs_append,{f,0},{integer,16},0,7,8,{x,2},{field_flags,[]},{x,1}}. >> {bs_put_integer,{f,0},{integer,16},1,{field_flags,[signed,big]},{x,6}}. >> >> There is GC execution in bs_append and it can reallocate binary but there isn't reassigning erts_current_bin which used in bs_put_integer. >> >> Fix: >> >> erl_bits.c: >> Eterm >> erts_bs_append(Process* c_p, Eterm* reg, Uint live, Eterm build_size_term, >> Uint extra_words, Uint unit) >> ? 
>> if (c_p->stop - c_p->htop < heap_need) { >> (void) erts_garbage_collect(c_p, heap_need, reg, live+1); >> } >> sb = (ErlSubBin *) c_p->htop; >> c_p->htop += ERL_SUB_BIN_SIZE; >> sb->thing_word = HEADER_SUB_BIN; >> sb->size = BYTE_OFFSET(used_size_in_bits); >> sb->bitsize = BIT_OFFSET(used_size_in_bits); >> sb->offs = 0; >> sb->bitoffs = 0; >> sb->is_writable = 1; >> sb->orig = reg[live]; >> >> /////////////////////////////////////////////////////////////////// >> // add these lines >> /////////////////////////////////////////////////////////////////// >> pb = (ProcBin *) boxed_val(sb->orig); >> erts_current_bin = pb->bytes; >> erts_writable_bin = 1; >> /////////////////////////////////////////////////////////////////// >> >> return make_binary(sb); >> ... >> > Can you reproduce the bug and verify that this fix really works? The thing is that binaries should *only* be reallocated in the gc if there are no active writers, and there obviously is an active writer here (pb->flags |= PB_ACTIVE_WRITER a few lines earlier), so if this code change actually removes the crash, the bug would be in the gc's detection of active writers. Yes, it works in my case. I don't have a simple test case for reproducing this bug (actually, I run a few processes to send requests to pgsql): pb = (ProcBin *) boxed_val(sb->orig); if (erts_current_bin != (pb->bytes)) { fprintf(stderr, "erts_current_bin != (pb->bytes)\n"); fflush(stderr); } erts_current_bin = pb->bytes; erts_writable_bin = 1; (jskit@REDACTED)1> f(F), F = fun() -> postgresql:equery('echo-customers', write, <<"some query here">>, []) end. #Fun (jskit@REDACTED)2> perftest:comprehensive(1000, F). 
Sequential 100 cycles in ~1 seconds (100 cycles/s) Sequential 200 cycles in ~2 seconds (106 cycles/s) Sequential 1000 cycles in ~12 seconds (85 cycles/s) Parallel 2 1000 cycles in ~8 seconds (132 cycles/s) Parallel 4 1000 cycles in ~8 seconds (121 cycles/s) Parallel 10 1000 cycles in ~8 seconds (119 cycles/s) Parallel 100 1000 cycles in ~13 seconds (74 cycles/s) [85,132,121,119,74] (jskit@REDACTED)3> perftest:comprehensive(1000, F). Sequential 100 cycles in ~1 seconds (83 cycles/s) Sequential 200 cycles in ~2 seconds (83 cycles/s) Sequential 1000 cycles in ~14 seconds (71 cycles/s) Parallel 2 1000 cycles in ~11 seconds (95 cycles/s) Parallel 4 1000 cycles in ~10 seconds (105 cycles/s) Parallel 10 1000 cycles in ~11 seconds (91 cycles/s) Parallel 100 1000 cycles in ~13 seconds (76 cycles/s) "G_i[L" (jskit@REDACTED)4> perftest:comprehensive(1000, F). Sequential 100 cycles in ~1 seconds (88 cycles/s) Sequential 200 cycles in ~2 seconds (85 cycles/s) Sequential 1000 cycles in ~13 seconds (74 cycles/s) Parallel 2 1000 cycles in ~9 seconds (109 cycles/s) Parallel 4 1000 cycles in ~10 seconds (101 cycles/s) Parallel 10 1000 cycles in ~11 seconds (95 cycles/s) erts_current_bin != (pb->bytes) Parallel 100 1000 cycles in ~13 seconds (77 cycles/s) "Jme_M" > >> >> -- >> Cheers, >> Denis > Cheers, > /Patrik >> >> 20.11.2012, ? 19:37, Musumeci, Antonio S ???????(?): >> >>> >>> I've got lots of cores... but they are all from optimized builds. >>> >>> Has this been seen in other versions? We are keen to solve this because it's causing us pain in production. We hit another, older, memory bug (the 32bit values used in 64bit build)... and now this. >>> >>> I'm going to be building and trying R15B01 to see if we hit it as well. I'll send any additional information I can. Any suggestions on debugging beam would be appreciated. Compile options, etc. >>> >>> Thanks. 
>>> >>> -antonio >>> From: erlang-bugs-bounces@REDACTED [mailto:erlang-bugs-bounces@REDACTED] On Behalf Of Patrik Nyblom >>> Sent: Monday, November 19, 2012 8:55 AM >>> To: erlang-bugs@REDACTED >>> Subject: Re: [erlang-bugs] beam core'ing >>> >>> On 11/19/2012 02:01 PM, Musumeci, Antonio S wrote: >>>> >>>> I'm just starting to debug this but figured I'd send it along in case anyone has seen this before. >>>> >>>> 64bit RHEL 5.0.1 >>>> >>>> built from source beam.smp R15B02 >>>> >>>> Happens consistently when trying to start our app and then just stops after a time. Across a few boxes. Oddly we have an identical cluster (hw and sw) and it never happens. >>>> >>> Yes! I've seen it before and have tried for several months to get a reproducable example and a core i can analyze here. I've had one core that was somewhat readable but had no luck in locating the beam code that triggered this. If you could try narrowing it down, I would be really grateful! >>> >>> Please email me any findings, theories, cores dumps - anything! I really want to find this! The most interesting would be to find the snippet of erlang code that makes this happen (intermittently probably). >>> >>> The problem is that when the allocators crash, the error is usually somewhere else. Access of freed memory, double free or something else doing horrid things to memory. Obviously none of our testsuites exercise this bug as neither our debug builds, nor our valgrind runs find it. It happens on both SMP and non SMP and is always in the context of the erts_bs_append, so I'm pretty sure this has a connection to the other users seeing the crash in the allocators... 
>>> >>> Cheers, >>> Patrik >>>> #0 bf_unlink_free_block (flags=, block=0x6f00, allctr=) at beam/erl_bestfit_alloc.c:789 >>>> #1 bf_get_free_block (allctr=0x6824600, size=304, cand_blk=0x0, cand_size=, flags=0) at beam/erl_bestfit_alloc.c:869 >>>> #2 0x000000000045343c in mbc_alloc_block (alcu_flgsp=, blk_szp=, size=, allctr=) at beam/erl_alloc_util.c:1198 >>>> #3 mbc_alloc (allctr=0x6824600, size=295) at beam/erl_alloc_util.c:1345 >>>> #4 0x000000000045398d in do_erts_alcu_alloc (type=164, extra=0x6824600, size=295) at beam/erl_alloc_util.c:3442 >>>> #5 0x0000000000453a0f in erts_alcu_alloc_thr_pref (type=164, extra=, size=287) at beam/erl_alloc_util.c:3520 >>>> #6 0x0000000000511463 in erts_alloc (size=287, type=) at beam/erl_alloc.h:208 >>>> #7 erts_bin_nrml_alloc (size=) at beam/erl_binary.h:260 >>>> #8 erts_bs_append (c_p=0x69fba60, reg=, live=, build_size_term=, extra_words=0, unit=8) at beam/erl_bits.c:1327 >>>> #9 0x000000000053ffd8 in process_main () at beam/beam_emu.c:3858 >>>> #10 0x00000000004ae853 in sched_thread_func (vesdp=) at beam/erl_process.c:5184 >>>> #11 0x00000000005c17e9 in thr_wrapper (vtwd=) at pthread/ethread.c:106 >>>> #12 0x00002b430f39e73d in start_thread () from /lib64/libpthread.so.0 >>>> #13 0x00002b430f890f6d in clone () from /lib64/libc.so.6 >>>> #14 0x0000000000000000 in ?? () >>>> >>>> _______________________________________________ >>>> erlang-bugs mailing list >>>> erlang-bugs@REDACTED >>>> http://erlang.org/mailman/listinfo/erlang-bugs >>> >>> _______________________________________________ >>> erlang-bugs mailing list >>> erlang-bugs@REDACTED >>> http://erlang.org/mailman/listinfo/erlang-bugs >> > -------------- next part -------------- An HTML attachment was scrubbed... 
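Denis's stderr probe above is effectively an invariant check: before each write, the cached writer pointer must equal what the handle currently says. As a hedged C sketch of that probe-and-refresh step (hypothetical names, not the ERTS code):

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for a ProcBin handle. */
struct bin_handle { unsigned char *bytes; };

/* Refresh a cached writer pointer from its handle; report, as the
 * instrumented build did on stderr, whenever the payload had moved
 * behind the writer's back.  Returns the pointer that is safe to use. */
static unsigned char *refresh_writer(unsigned char *cached,
                                     const struct bin_handle *h) {
    if (cached != h->bytes)
        fprintf(stderr, "cached writer pointer went stale\n");
    return h->bytes;  /* always trust the handle, never the cache */
}
```

Once the stderr message fired during the Parallel runs above, the stale cache was confirmed; turning the probe into a crash dump at that point pinpoints the Erlang code that triggered the move.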
URL: From pan@REDACTED Wed Nov 21 11:46:46 2012 From: pan@REDACTED (Patrik Nyblom) Date: Wed, 21 Nov 2012 11:46:46 +0100 Subject: [erlang-bugs] beam core'ing In-Reply-To: <8436D993-C4FC-4822-B0E8-7A2D6AB2E0C9@gmail.com> References: <51C6F20DC46369418387C5250127649B039D96@HZWEX2014N4.msad.ms.com> <50AA3A17.9030300@erlang.org> <51C6F20DC46369418387C5250127649B03B16B@HZWEX2014N4.msad.ms.com> <79133563-669F-4FDD-8982-01DB7B321DA5@aboutecho.com> <50ACA27A.40707@erlang.org> <8436D993-C4FC-4822-B0E8-7A2D6AB2E0C9@gmail.com> Message-ID: <50ACB116.5000402@erlang.org> Hi! On 11/21/2012 11:21 AM, Denis Titoruk wrote: > > 21.11.2012, ? 13:44, Patrik Nyblom ???????(?): > >> Hi! >> On 11/20/2012 10:40 PM, Denis Titoruk wrote: >>> Hi, >>> >>> We've got the same error on R15B01, R15B02 >>> I've finished my investigation of this issue today & here is result: >>> >>> Let's assume we have the code: >>> encode_formats(Columns) -> >>> encode_formats(Columns, 0, <<>>). >>> >>> encode_formats([], Count, Acc) -> >>> <>; >>> >>> encode_formats([#column{format = Format} | T], Count, Acc) -> >>> encode_formats(T, Count + 1, <>). >>> >>> So, <> translates to >>> >>> {bs_append,{f,0},{integer,16},0,7,8,{x,2},{field_flags,[]},{x,1}}. >>> {bs_put_integer,{f,0},{integer,16},1,{field_flags,[signed,big]},{x,6}}. >>> >>> There is GC execution in bs_append and it can reallocate binary but >>> there isn't reassigning erts_current_bin which used in bs_put_integer. >>> >>> Fix: >>> >>> erl_bits.c: >>> Eterm >>> erts_bs_append(Process* c_p, Eterm* reg, Uint live, Eterm >>> build_size_term, >>> Uint extra_words, Uint unit) >>> ? 
>>> if (c_p->stop - c_p->htop < heap_need) { >>> (void) erts_garbage_collect(c_p, heap_need, reg, live+1); >>> } >>> sb = (ErlSubBin *) c_p->htop; >>> c_p->htop += ERL_SUB_BIN_SIZE; >>> sb->thing_word = HEADER_SUB_BIN; >>> sb->size = BYTE_OFFSET(used_size_in_bits); >>> sb->bitsize = BIT_OFFSET(used_size_in_bits); >>> sb->offs = 0; >>> sb->bitoffs = 0; >>> sb->is_writable = 1; >>> sb->orig = reg[live]; >>> >>> /////////////////////////////////////////////////////////////////// >>> // add this lines >>> /////////////////////////////////////////////////////////////////// >>> pb = (ProcBin *) boxed_val(sb->orig); >>> erts_current_bin = pb->bytes; >>> erts_writable_bin = 1; >>> /////////////////////////////////////////////////////////////////// >>> >>> return make_binary(sb); >>> ? >>> >> Can you reproduce the bug and verify that this fix really works? The >> thing is that binaries should *only* be reallocated in the gc if >> there are no active writers, which there obviously is here ( >> pb->flags |= PB_ACTIVE_WRITER a few lines earlier), so the bug would >> be in the detection of active writers in the gc if this code change >> actually removes the crash. > > Yes, it works in my case. I haven't simple test case for reproducing > this bug (actually I run few processes to send requests to pgsql) > > pb = (ProcBin *) boxed_val(sb->orig); > if (erts_current_bin != (pb->bytes)) { > fprintf(stderr, "erts_current_bin != (pb->bytes)\n"); > fflush(stderr); > } > erts_current_bin = pb->bytes; > erts_writable_bin = 1; > > > (jskit@REDACTED)1> f(F), F = fun() -> postgresql:equery('echo-customers', > write, <<"some query here">>, []) end. > #Fun > (jskit@REDACTED)2> perftest:comprehensive(1000, F). 
> Sequential 100 cycles in ~1 seconds (100 cycles/s) > Sequential 200 cycles in ~2 seconds (106 cycles/s) > Sequential 1000 cycles in ~12 seconds (85 cycles/s) > Parallel 2 1000 cycles in ~8 seconds (132 cycles/s) > Parallel 4 1000 cycles in ~8 seconds (121 cycles/s) > Parallel 10 1000 cycles in ~8 seconds (119 cycles/s) > Parallel 100 1000 cycles in ~13 seconds (74 cycles/s) > [85,132,121,119,74] > (jskit@REDACTED)3> perftest:comprehensive(1000, F). > Sequential 100 cycles in ~1 seconds (83 cycles/s) > Sequential 200 cycles in ~2 seconds (83 cycles/s) > Sequential 1000 cycles in ~14 seconds (71 cycles/s) > Parallel 2 1000 cycles in ~11 seconds (95 cycles/s) > Parallel 4 1000 cycles in ~10 seconds (105 cycles/s) > Parallel 10 1000 cycles in ~11 seconds (91 cycles/s) > Parallel 100 1000 cycles in ~13 seconds (76 cycles/s) > "G_i[L" > (jskit@REDACTED)4> perftest:comprehensive(1000, F). > Sequential 100 cycles in ~1 seconds (88 cycles/s) > Sequential 200 cycles in ~2 seconds (85 cycles/s) > Sequential 1000 cycles in ~13 seconds (74 cycles/s) > Parallel 2 1000 cycles in ~9 seconds (109 cycles/s) > Parallel 4 1000 cycles in ~10 seconds (101 cycles/s) > Parallel 10 1000 cycles in ~11 seconds (95 cycles/s) > erts_current_bin != (pb->bytes) > Parallel 100 1000 cycles in ~13 seconds (77 cycles/s) > "Jme_M" Yes! Looks like you've found something icky there! Is it possible to send the code (the erlang code) you use to reproduce it to me (maybe along with your c-source-diff)? > >> >>> >>> -- >>> Cheers, >>> Denis >> Cheers, >> /Patrik >>> >>> 20.11.2012, ? 19:37, Musumeci, Antonio S ???????(?): >>> >>>> >>>> I've got lots of cores... but they are all from optimized builds. >>>> >>>> Has this been seen in other versions? We are keen to solve this >>>> because it's causing us pain in production. We hit another, older, >>>> memory bug (the 32bit values used in 64bit build)... and now this. >>>> >>>> I'm going to be building and trying R15B01 to see if we hit it as >>>> well. 
I'll send any additional information I can.Any suggestions on >>>> debugging beam would be appreciated. Compile options, etc. >>>> >>>> Thanks. >>>> >>>> -antonio >>>> >>>> ------------------------------------------------------------------------ >>>> *From:*erlang-bugs-bounces@REDACTED >>>> [mailto:erlang-bugs-bounces@REDACTED]*On >>>> Behalf Of*Patrik Nyblom >>>> *Sent:*Monday, November 19, 2012 8:55 AM >>>> *To:*erlang-bugs@REDACTED >>>> *Subject:*Re: [erlang-bugs] beam core'ing >>>> >>>> On 11/19/2012 02:01 PM, Musumeci, Antonio S wrote: >>>>> >>>>> I'm just starting to debug this but figured I'd send it along in >>>>> case anyone has seen this before. >>>>> >>>>> 64bit RHEL 5.0.1 >>>>> >>>>> built from source beam.smp R15B02 >>>>> >>>>> Happens consistently when trying to start our app and then just >>>>> stops after a time. Across a few boxes. Oddly we have an identical >>>>> cluster (hw and sw) and it never happens. >>>>> >>>> Yes! I've seen it before and have tried for several months to get >>>> areproducable example and acore i can analyze here. I've had one >>>> core that wassomewhat readable but had no luck in locating the beam >>>> code that triggered this. If you could try narrowing it down, I >>>> would be really grateful! >>>> >>>> Please email me any findings, theories, cores dumps- anything! I >>>> really want to find this! The most interesting would be to find the >>>> snippet of erlang code that makes this happen (intermittently >>>> probably). >>>> >>>> The problem isthatwhen the allocators crash, the error is usually >>>> somewhere else.Access of freed memory, double free or something >>>> else doing horrid things to memory. Obviously none of our >>>> testsuites exercise this bug asneither our debug builds, nor our >>>> valgrind runs find it. It happens on both SMP and non SMP and is >>>> always in the context of the erts_bs_append, so I'm pretty sure >>>> this has a connection to the other users seeing the crash in the >>>> allocators... 
>>>> >>>> Cheers, >>>> Patrik >>>>> >>>>> #0 bf_unlink_free_block (flags=, block=0x6f00, >>>>> allctr=) at beam/erl_bestfit_alloc.c:789 >>>>> #1 bf_get_free_block (allctr=0x6824600, size=304, cand_blk=0x0, >>>>> cand_size=, flags=0) at beam/erl_bestfit_alloc.c:869 >>>>> #2 0x000000000045343c in mbc_alloc_block (alcu_flgsp=>>>> out>, blk_szp=, size=, >>>>> allctr=) at beam/erl_alloc_util.c:1198 >>>>> #3 mbc_alloc (allctr=0x6824600, size=295) at >>>>> beam/erl_alloc_util.c:1345 >>>>> #4 0x000000000045398d in do_erts_alcu_alloc (type=164, >>>>> extra=0x6824600, size=295) at beam/erl_alloc_util.c:3442 >>>>> #5 0x0000000000453a0f in erts_alcu_alloc_thr_pref (type=164, >>>>> extra=, size=287) at beam/erl_alloc_util.c:3520 >>>>> #6 0x0000000000511463 in erts_alloc (size=287, type=>>>> out>) at beam/erl_alloc.h:208 >>>>> #7 erts_bin_nrml_alloc (size=) at beam/erl_binary.h:260 >>>>> #8 erts_bs_append (c_p=0x69fba60, reg=, >>>>> live=, build_size_term=, >>>>> extra_words=0, unit=8)at beam/erl_bits.c:1327 >>>>> #9 0x000000000053ffd8 in process_main () at beam/beam_emu.c:3858 >>>>> #10 0x00000000004ae853 in sched_thread_func (vesdp=>>>> out>) at beam/erl_process.c:5184 >>>>> #11 0x00000000005c17e9 in thr_wrapper (vtwd=) at >>>>> pthread/ethread.c:106 >>>>> #12 0x00002b430f39e73d in start_thread () from /lib64/libpthread.so.0 >>>>> #13 0x00002b430f890f6d in clone () from /lib64/libc.so.6 >>>>> #14 0x0000000000000000 in ?? () >>>>> >>>>> >>>>> _______________________________________________ >>>>> erlang-bugs mailing list >>>>> erlang-bugs@REDACTED >>>>> http://erlang.org/mailman/listinfo/erlang-bugs >>>> _______________________________________________ >>>> erlang-bugs mailing list >>>> erlang-bugs@REDACTED >>>> http://erlang.org/mailman/listinfo/erlang-bugs >>> >> > Cheers, /Patrik -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pan@REDACTED Wed Nov 21 12:09:28 2012 From: pan@REDACTED (Patrik Nyblom) Date: Wed, 21 Nov 2012 12:09:28 +0100 Subject: [erlang-bugs] beam core'ing In-Reply-To: <8436D993-C4FC-4822-B0E8-7A2D6AB2E0C9@gmail.com> References: <51C6F20DC46369418387C5250127649B039D96@HZWEX2014N4.msad.ms.com> <50AA3A17.9030300@erlang.org> <51C6F20DC46369418387C5250127649B03B16B@HZWEX2014N4.msad.ms.com> <79133563-669F-4FDD-8982-01DB7B321DA5@aboutecho.com> <50ACA27A.40707@erlang.org> <8436D993-C4FC-4822-B0E8-7A2D6AB2E0C9@gmail.com> Message-ID: <50ACB668.6040807@erlang.org> Hi again :) Another thing that would be helpful is if you could create a crash dump instead of a fprintf when the binary is wrongly moved, i.e. call erl_exit(ERTS_DUMP_EXIT, "erts_current_bin != (pb->bytes)"); instead of the fprintf? Then you could isolate the erlang code snippet that exercises the bug and I maybe could create a smaller testcase... A simple testcase when diving into the GC would be really helpful :) Cheers, /Patrik On 11/21/2012 11:21 AM, Denis Titoruk wrote: > > 21.11.2012, ? 13:44, Patrik Nyblom ???????(?): > >> Hi! >> On 11/20/2012 10:40 PM, Denis Titoruk wrote: >>> Hi, >>> >>> We've got the same error on R15B01, R15B02 >>> I've finished my investigation of this issue today & here is result: >>> >>> Let's assume we have the code: >>> encode_formats(Columns) -> >>> encode_formats(Columns, 0, <<>>). >>> >>> encode_formats([], Count, Acc) -> >>> <>; >>> >>> encode_formats([#column{format = Format} | T], Count, Acc) -> >>> encode_formats(T, Count + 1, <>). >>> >>> So, <> translates to >>> >>> {bs_append,{f,0},{integer,16},0,7,8,{x,2},{field_flags,[]},{x,1}}. >>> {bs_put_integer,{f,0},{integer,16},1,{field_flags,[signed,big]},{x,6}}. >>> >>> There is GC execution in bs_append and it can reallocate binary but >>> there isn't reassigning erts_current_bin which used in bs_put_integer. 
>>> >>> Fix: >>> >>> erl_bits.c: >>> Eterm >>> erts_bs_append(Process* c_p, Eterm* reg, Uint live, Eterm >>> build_size_term, >>> Uint extra_words, Uint unit) >>> ? >>> if (c_p->stop - c_p->htop < heap_need) { >>> (void) erts_garbage_collect(c_p, heap_need, reg, live+1); >>> } >>> sb = (ErlSubBin *) c_p->htop; >>> c_p->htop += ERL_SUB_BIN_SIZE; >>> sb->thing_word = HEADER_SUB_BIN; >>> sb->size = BYTE_OFFSET(used_size_in_bits); >>> sb->bitsize = BIT_OFFSET(used_size_in_bits); >>> sb->offs = 0; >>> sb->bitoffs = 0; >>> sb->is_writable = 1; >>> sb->orig = reg[live]; >>> >>> /////////////////////////////////////////////////////////////////// >>> // add this lines >>> /////////////////////////////////////////////////////////////////// >>> pb = (ProcBin *) boxed_val(sb->orig); >>> erts_current_bin = pb->bytes; >>> erts_writable_bin = 1; >>> /////////////////////////////////////////////////////////////////// >>> >>> return make_binary(sb); >>> ? >>> >> Can you reproduce the bug and verify that this fix really works? The >> thing is that binaries should *only* be reallocated in the gc if >> there are no active writers, which there obviously is here ( >> pb->flags |= PB_ACTIVE_WRITER a few lines earlier), so the bug would >> be in the detection of active writers in the gc if this code change >> actually removes the crash. > > Yes, it works in my case. I haven't simple test case for reproducing > this bug (actually I run few processes to send requests to pgsql) > > pb = (ProcBin *) boxed_val(sb->orig); > if (erts_current_bin != (pb->bytes)) { > fprintf(stderr, "erts_current_bin != (pb->bytes)\n"); > fflush(stderr); > } > erts_current_bin = pb->bytes; > erts_writable_bin = 1; > > > (jskit@REDACTED)1> f(F), F = fun() -> postgresql:equery('echo-customers', > write, <<"some query here">>, []) end. > #Fun > (jskit@REDACTED)2> perftest:comprehensive(1000, F). 
> Sequential 100 cycles in ~1 seconds (100 cycles/s) > Sequential 200 cycles in ~2 seconds (106 cycles/s) > Sequential 1000 cycles in ~12 seconds (85 cycles/s) > Parallel 2 1000 cycles in ~8 seconds (132 cycles/s) > Parallel 4 1000 cycles in ~8 seconds (121 cycles/s) > Parallel 10 1000 cycles in ~8 seconds (119 cycles/s) > Parallel 100 1000 cycles in ~13 seconds (74 cycles/s) > [85,132,121,119,74] > (jskit@REDACTED)3> perftest:comprehensive(1000, F). > Sequential 100 cycles in ~1 seconds (83 cycles/s) > Sequential 200 cycles in ~2 seconds (83 cycles/s) > Sequential 1000 cycles in ~14 seconds (71 cycles/s) > Parallel 2 1000 cycles in ~11 seconds (95 cycles/s) > Parallel 4 1000 cycles in ~10 seconds (105 cycles/s) > Parallel 10 1000 cycles in ~11 seconds (91 cycles/s) > Parallel 100 1000 cycles in ~13 seconds (76 cycles/s) > "G_i[L" > (jskit@REDACTED)4> perftest:comprehensive(1000, F). > Sequential 100 cycles in ~1 seconds (88 cycles/s) > Sequential 200 cycles in ~2 seconds (85 cycles/s) > Sequential 1000 cycles in ~13 seconds (74 cycles/s) > Parallel 2 1000 cycles in ~9 seconds (109 cycles/s) > Parallel 4 1000 cycles in ~10 seconds (101 cycles/s) > Parallel 10 1000 cycles in ~11 seconds (95 cycles/s) > erts_current_bin != (pb->bytes) > Parallel 100 1000 cycles in ~13 seconds (77 cycles/s) > "Jme_M" > >> >>> >>> -- >>> Cheers, >>> Denis >> Cheers, >> /Patrik >>> >>> On 20.11.2012, at 19:37, Musumeci, Antonio S wrote: >>> >>>> >>>> I've got lots of cores... but they are all from optimized builds. >>>> >>>> Has this been seen in other versions? We are keen to solve this >>>> because it's causing us pain in production. We hit another, older, >>>> memory bug (the 32bit values used in 64bit build)... and now this. >>>> >>>> I'm going to be building and trying R15B01 to see if we hit it as >>>> well. I'll send any additional information I can. Any suggestions on >>>> debugging beam would be appreciated. Compile options, etc. >>>> >>>> Thanks. 
>>>> >>>> -antonio >>>> >>>> ------------------------------------------------------------------------ >>>> *From:* erlang-bugs-bounces@REDACTED >>>> [mailto:erlang-bugs-bounces@REDACTED] *On >>>> Behalf Of* Patrik Nyblom >>>> *Sent:* Monday, November 19, 2012 8:55 AM >>>> *To:* erlang-bugs@REDACTED >>>> *Subject:* Re: [erlang-bugs] beam core'ing >>>> >>>> On 11/19/2012 02:01 PM, Musumeci, Antonio S wrote: >>>>> >>>>> I'm just starting to debug this but figured I'd send it along in >>>>> case anyone has seen this before. >>>>> >>>>> 64bit RHEL 5.0.1 >>>>> >>>>> built from source beam.smp R15B02 >>>>> >>>>> Happens consistently when trying to start our app and then just >>>>> stops after a time. Across a few boxes. Oddly we have an identical >>>>> cluster (hw and sw) and it never happens. >>>>> >>>> Yes! I've seen it before and have tried for several months to get >>>> a reproducable example and a core i can analyze here. I've had one >>>> core that was somewhat readable but had no luck in locating the beam >>>> code that triggered this. If you could try narrowing it down, I >>>> would be really grateful! >>>> >>>> Please email me any findings, theories, cores dumps - anything! I >>>> really want to find this! The most interesting would be to find the >>>> snippet of erlang code that makes this happen (intermittently >>>> probably). >>>> >>>> The problem is that when the allocators crash, the error is usually >>>> somewhere else. Access of freed memory, double free or something >>>> else doing horrid things to memory. Obviously none of our >>>> testsuites exercise this bug as neither our debug builds, nor our >>>> valgrind runs find it. It happens on both SMP and non SMP and is >>>> always in the context of the erts_bs_append, so I'm pretty sure >>>> this has a connection to the other users seeing the crash in the >>>> allocators... 
>>>> >>>> Cheers, >>>> Patrik >>>>> >>>>> #0 bf_unlink_free_block (flags=, block=0x6f00, >>>>> allctr=) at beam/erl_bestfit_alloc.c:789 >>>>> #1 bf_get_free_block (allctr=0x6824600, size=304, cand_blk=0x0, >>>>> cand_size=, flags=0) at beam/erl_bestfit_alloc.c:869 >>>>> #2 0x000000000045343c in mbc_alloc_block (alcu_flgsp=>>>> out>, blk_szp=, size=, >>>>> allctr=) at beam/erl_alloc_util.c:1198 >>>>> #3 mbc_alloc (allctr=0x6824600, size=295) at >>>>> beam/erl_alloc_util.c:1345 >>>>> #4 0x000000000045398d in do_erts_alcu_alloc (type=164, >>>>> extra=0x6824600, size=295) at beam/erl_alloc_util.c:3442 >>>>> #5 0x0000000000453a0f in erts_alcu_alloc_thr_pref (type=164, >>>>> extra=, size=287) at beam/erl_alloc_util.c:3520 >>>>> #6 0x0000000000511463 in erts_alloc (size=287, type=>>>> out>) at beam/erl_alloc.h:208 >>>>> #7 erts_bin_nrml_alloc (size=) at beam/erl_binary.h:260 >>>>> #8 erts_bs_append (c_p=0x69fba60, reg=, >>>>> live=, build_size_term=, >>>>> extra_words=0, unit=8)at beam/erl_bits.c:1327 >>>>> #9 0x000000000053ffd8 in process_main () at beam/beam_emu.c:3858 >>>>> #10 0x00000000004ae853 in sched_thread_func (vesdp=>>>> out>) at beam/erl_process.c:5184 >>>>> #11 0x00000000005c17e9 in thr_wrapper (vtwd=) at >>>>> pthread/ethread.c:106 >>>>> #12 0x00002b430f39e73d in start_thread () from /lib64/libpthread.so.0 >>>>> #13 0x00002b430f890f6d in clone () from /lib64/libc.so.6 >>>>> #14 0x0000000000000000 in ?? () >>>>> >>>>> _______________________________________________ >>>>> erlang-bugs mailing list >>>>> erlang-bugs@REDACTED >>>>> http://erlang.org/mailman/listinfo/erlang-bugs >>>> >>>> >>>> _______________________________________________ >>>> erlang-bugs mailing list >>>> erlang-bugs@REDACTED >>>> http://erlang.org/mailman/listinfo/erlang-bugs >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Ingela.Anderton.Andin@REDACTED Wed Nov 21 14:42:24 2012 From: Ingela.Anderton.Andin@REDACTED (Ingela Anderton Andin) Date: Wed, 21 Nov 2012 14:42:24 +0100 Subject: [erlang-bugs] ssl socket session upgrade fails In-Reply-To: <13EC05B8-CE05-448C-89EF-E268C21EF77D@gmail.com> References: <5F5F057B-68C0-4F7E-8F4E-901EA1838B40@pagodabox.com> <13EC05B8-CE05-448C-89EF-E268C21EF77D@gmail.com> Message-ID: <50ACDA40.7070404@ericsson.com> Hi! Delorum wrote: > So i think that reusing sessions might be broke if the client and the server do not have the same version of openssl installed on their machine. > > here is a bit of code that can trigger the error: > > ssl:start(), > {ok,Listen} = ssl:listen(443,[{reuseaddr,true},{certfile,"/mnt/ssl/mysite.com.crt"},{keyfile,"mysite.com.key"}]), > {ok,NewSocket} = ssl:transport_accept(Listen), > ssl:ssl_accept(NewSocket), > {ok,NewSock2} = ssl:transport_accept(Listen), > ssl:ssl_accept(NewSock2). > > and here is what can be run in another shell to cause the error: > > openssl s_client -ssl3 -connect 192.168.0.10:443 -reconnect > > the interesting thing that I have noticed is that when running the openssl s_client command from the same machine that the erlang server is running DOES NOT cause the issue. But when running the same command from any other machine, and I tested it with 12 machines here in the office it fails. > > to be more specific, if the version of openssl on the CLIENT machine is 0.9.8r, and the server version is in the 1.0.1 series. I do not think that the openssl version on the server host is relevant. Erlang SSL application uses openssl for crypto operations only. Can you connect with the s_client to s_servers that you start on the server host? [...] > clients should not have to upgrade their version of openssl in order to visit websites hosted by an erlang application. Agreed! 
> and here is the crash, i removed all the binary data and the private key data because this is not a test cert: > > =ERROR REPORT==== 16-Nov-2012::16:54:57 === > ** State machine <0.49.0> terminating > ** Last message in was {tcp,#Port<0.1263>, > << removed >>} This looks really strange, << removed >> is not a valid TLS message! [...] Regards Ingela Erlang/OTP team - Ericsson AB From ingela.anderton.andin@REDACTED Wed Nov 21 15:50:09 2012 From: ingela.anderton.andin@REDACTED (Ingela Anderton Andin) Date: Wed, 21 Nov 2012 15:50:09 +0100 Subject: [erlang-bugs] ssl socket session upgrade fails In-Reply-To: <13EC05B8-CE05-448C-89EF-E268C21EF77D@gmail.com> References: <5F5F057B-68C0-4F7E-8F4E-901EA1838B40@pagodabox.com> <13EC05B8-CE05-448C-89EF-E268C21EF77D@gmail.com> Message-ID: <50ACEA21.6070005@erix.ericsson.se> Hi again! Disregard my comment about << removed >>, of course that was because you removed sensitive information, I was temporarily confused. Delorum wrote: [...] The function clause below suggests you have mismatching versions of an internal record that may be caused by an error in our makefiles if you have not cleaned your git repository between builds. We will fix the make file and a solution for you ought to be to do git clean -xfd and make again. 
> ** Reason for termination = > ** {function_clause, > [{ssl_session,server_id, > [443, > <<135,245,186,148,131,78,105,38,70,210,147,42,207,139,174,106,166, > 97,85,161,20,70,127,51,6,193,41,5,157,250,239,90>>, > {ssl_options,[],verify_none, > {#Fun,[]}, > false,false,undefined,1, > <<"/mnt/ssl/mysite.com.crt">>,undefined, > <<"/mnt/ssl/mysite.com.key">>,undefined, > undefined,undefined,<<>>,undefined,undefined, > [<<0,57>>, > <<0,56>>, > <<0,53>>, > <<0,22>>, > <<0,19>>, > <<0,10>>, > <<0,51>>, > <<0,50>>, > <<0,47>>, > <<0,5>>, > <<0,4>>, > <<0,21>>, > <<0,9>>], > #Fun,true,268435456,false,[],undefined,false, > undefined,undefined}, > << removed >>, > 28691,ssl_session_cache], > [{file,"ssl_session.erl"},{line,73}]}, > {ssl_handshake,select_session,8, > [{file,"ssl_handshake.erl"},{line,629}]}, > {ssl_handshake,hello,4,[{file,"ssl_handshake.erl"},{line,178}]}, > {ssl_connection,hello,2,[{file,"ssl_connection.erl"},{line,414}]}, > {ssl_connection,next_state,4, > [{file,"ssl_connection.erl"},{line,2002}]}, > {gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,494}]}, > {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]} > _______________________________________________ > > Regards Ingela Erlang/OTP team - Ericsson AB From Antonio.Musumeci@REDACTED Wed Nov 21 17:35:00 2012 From: Antonio.Musumeci@REDACTED (Musumeci, Antonio S) Date: Wed, 21 Nov 2012 16:35:00 +0000 Subject: [erlang-bugs] beam core'ing In-Reply-To: <50ACB668.6040807@erlang.org> References: <51C6F20DC46369418387C5250127649B039D96@HZWEX2014N4.msad.ms.com> <50AA3A17.9030300@erlang.org> <51C6F20DC46369418387C5250127649B03B16B@HZWEX2014N4.msad.ms.com> <79133563-669F-4FDD-8982-01DB7B321DA5@aboutecho.com> <50ACA27A.40707@erlang.org> <8436D993-C4FC-4822-B0E8-7A2D6AB2E0C9@gmail.com> <50ACB668.6040807@erlang.org> Message-ID: <51C6F20DC46369418387C5250127649B03B8FF@HZWEX2014N4.msad.ms.com> Something my team just noticed was that our segv occurs right after reboot of the box consistently. 
After which beam appears to work alright. We are trying to narrow down what code is triggering it but it may take some time. ________________________________ From: Patrik Nyblom [mailto:pan@REDACTED] Sent: Wednesday, November 21, 2012 6:09 AM To: Denis Titoruk Cc: Musumeci, Antonio S (Enterprise Infrastructure); erlang-bugs@REDACTED Subject: Re: [erlang-bugs] beam core'ing Hi again :) Another thing that would be helpful is if you could create a crash dump instead of a fprintf when the binary is wrongly moved, i.e. call erl_exit(ERTS_DUMP_EXIT, "erts_current_bin != (pb->bytes)"); instead of the fprintf? Then you could isolate the erlang code snippet that exercises the bug and I maybe could create a smaller testcase... A simple testcase when diving into the GC would be really helpful :) Cheers, /Patrik On 11/21/2012 11:21 AM, Denis Titoruk wrote: On 21.11.2012, at 13:44, Patrik Nyblom wrote: Hi! On 11/20/2012 10:40 PM, Denis Titoruk wrote: Hi, We've got the same error on R15B01, R15B02 I've finished my investigation of this issue today & here is result: Let's assume we have the code: encode_formats(Columns) -> encode_formats(Columns, 0, <<>>). encode_formats([], Count, Acc) -> <>; encode_formats([#column{format = Format} | T], Count, Acc) -> encode_formats(T, Count + 1, <>). So, <> translates to {bs_append,{f,0},{integer,16},0,7,8,{x,2},{field_flags,[]},{x,1}}. {bs_put_integer,{f,0},{integer,16},1,{field_flags,[signed,big]},{x,6}}. There is GC execution in bs_append and it can reallocate binary but there isn't reassigning erts_current_bin which used in bs_put_integer. Fix: erl_bits.c: Eterm erts_bs_append(Process* c_p, Eterm* reg, Uint live, Eterm build_size_term, Uint extra_words, Uint unit) ... 
if (c_p->stop - c_p->htop < heap_need) { (void) erts_garbage_collect(c_p, heap_need, reg, live+1); } sb = (ErlSubBin *) c_p->htop; c_p->htop += ERL_SUB_BIN_SIZE; sb->thing_word = HEADER_SUB_BIN; sb->size = BYTE_OFFSET(used_size_in_bits); sb->bitsize = BIT_OFFSET(used_size_in_bits); sb->offs = 0; sb->bitoffs = 0; sb->is_writable = 1; sb->orig = reg[live]; /////////////////////////////////////////////////////////////////// // add this lines /////////////////////////////////////////////////////////////////// pb = (ProcBin *) boxed_val(sb->orig); erts_current_bin = pb->bytes; erts_writable_bin = 1; /////////////////////////////////////////////////////////////////// return make_binary(sb); ... Can you reproduce the bug and verify that this fix really works? The thing is that binaries should *only* be reallocated in the gc if there are no active writers, which there obviously is here ( pb->flags |= PB_ACTIVE_WRITER a few lines earlier), so the bug would be in the detection of active writers in the gc if this code change actually removes the crash. Yes, it works in my case. I haven't simple test case for reproducing this bug (actually I run few processes to send requests to pgsql) pb = (ProcBin *) boxed_val(sb->orig); if (erts_current_bin != (pb->bytes)) { fprintf(stderr, "erts_current_bin != (pb->bytes)\n"); fflush(stderr); } erts_current_bin = pb->bytes; erts_writable_bin = 1; (jskit@REDACTED)1> f(F), F = fun() -> postgresql:equery('echo-customers', write, <<"some query here">>, []) end. #Fun (jskit@REDACTED)2> perftest:comprehensive(1000, F). 
Sequential 100 cycles in ~1 seconds (100 cycles/s) Sequential 200 cycles in ~2 seconds (106 cycles/s) Sequential 1000 cycles in ~12 seconds (85 cycles/s) Parallel 2 1000 cycles in ~8 seconds (132 cycles/s) Parallel 4 1000 cycles in ~8 seconds (121 cycles/s) Parallel 10 1000 cycles in ~8 seconds (119 cycles/s) Parallel 100 1000 cycles in ~13 seconds (74 cycles/s) [85,132,121,119,74] (jskit@REDACTED)3> perftest:comprehensive(1000, F). Sequential 100 cycles in ~1 seconds (83 cycles/s) Sequential 200 cycles in ~2 seconds (83 cycles/s) Sequential 1000 cycles in ~14 seconds (71 cycles/s) Parallel 2 1000 cycles in ~11 seconds (95 cycles/s) Parallel 4 1000 cycles in ~10 seconds (105 cycles/s) Parallel 10 1000 cycles in ~11 seconds (91 cycles/s) Parallel 100 1000 cycles in ~13 seconds (76 cycles/s) "G_i[L" (jskit@REDACTED)4> perftest:comprehensive(1000, F). Sequential 100 cycles in ~1 seconds (88 cycles/s) Sequential 200 cycles in ~2 seconds (85 cycles/s) Sequential 1000 cycles in ~13 seconds (74 cycles/s) Parallel 2 1000 cycles in ~9 seconds (109 cycles/s) Parallel 4 1000 cycles in ~10 seconds (101 cycles/s) Parallel 10 1000 cycles in ~11 seconds (95 cycles/s) erts_current_bin != (pb->bytes) Parallel 100 1000 cycles in ~13 seconds (77 cycles/s) "Jme_M" -- Cheers, Denis Cheers, /Patrik 20.11.2012, ? 19:37, Musumeci, Antonio S ???????(?): I've got lots of cores... but they are all from optimized builds. Has this been seen in other versions? We are keen to solve this because it's causing us pain in production. We hit another, older, memory bug (the 32bit values used in 64bit build)... and now this. I'm going to be building and trying R15B01 to see if we hit it as well. I'll send any additional information I can. Any suggestions on debugging beam would be appreciated. Compile options, etc. Thanks. 
-antonio ________________________________ From: erlang-bugs-bounces@REDACTED [mailto:erlang-bugs-bounces@REDACTED] On Behalf Of Patrik Nyblom Sent: Monday, November 19, 2012 8:55 AM To: erlang-bugs@REDACTED Subject: Re: [erlang-bugs] beam core'ing On 11/19/2012 02:01 PM, Musumeci, Antonio S wrote: I'm just starting to debug this but figured I'd send it along in case anyone has seen this before. 64bit RHEL 5.0.1 built from source beam.smp R15B02 Happens consistently when trying to start our app and then just stops after a time. Across a few boxes. Oddly we have an identical cluster (hw and sw) and it never happens. Yes! I've seen it before and have tried for several months to get a reproducable example and a core i can analyze here. I've had one core that was somewhat readable but had no luck in locating the beam code that triggered this. If you could try narrowing it down, I would be really grateful! Please email me any findings, theories, cores dumps - anything! I really want to find this! The most interesting would be to find the snippet of erlang code that makes this happen (intermittently probably). The problem is that when the allocators crash, the error is usually somewhere else. Access of freed memory, double free or something else doing horrid things to memory. Obviously none of our testsuites exercise this bug as neither our debug builds, nor our valgrind runs find it. It happens on both SMP and non SMP and is always in the context of the erts_bs_append, so I'm pretty sure this has a connection to the other users seeing the crash in the allocators... 
Cheers, Patrik #0 bf_unlink_free_block (flags=<optimized out>, block=0x6f00, allctr=<optimized out>) at beam/erl_bestfit_alloc.c:789 #1 bf_get_free_block (allctr=0x6824600, size=304, cand_blk=0x0, cand_size=<optimized out>, flags=0) at beam/erl_bestfit_alloc.c:869 #2 0x000000000045343c in mbc_alloc_block (alcu_flgsp=<optimized out>, blk_szp=<optimized out>, size=<optimized out>, allctr=<optimized out>) at beam/erl_alloc_util.c:1198 #3 mbc_alloc (allctr=0x6824600, size=295) at beam/erl_alloc_util.c:1345 #4 0x000000000045398d in do_erts_alcu_alloc (type=164, extra=0x6824600, size=295) at beam/erl_alloc_util.c:3442 #5 0x0000000000453a0f in erts_alcu_alloc_thr_pref (type=164, extra=<optimized out>, size=287) at beam/erl_alloc_util.c:3520 #6 0x0000000000511463 in erts_alloc (size=287, type=<optimized out>) at beam/erl_alloc.h:208 #7 erts_bin_nrml_alloc (size=<optimized out>) at beam/erl_binary.h:260 #8 erts_bs_append (c_p=0x69fba60, reg=<optimized out>, live=<optimized out>, build_size_term=<optimized out>, extra_words=0, unit=8) at beam/erl_bits.c:1327 #9 0x000000000053ffd8 in process_main () at beam/beam_emu.c:3858 #10 0x00000000004ae853 in sched_thread_func (vesdp=<optimized out>) at beam/erl_process.c:5184 #11 0x00000000005c17e9 in thr_wrapper (vtwd=<optimized out>) at pthread/ethread.c:106 #12 0x00002b430f39e73d in start_thread () from /lib64/libpthread.so.0 #13 0x00002b430f890f6d in clone () from /lib64/libc.so.6 #14 0x0000000000000000 in ?? () _______________________________________________ erlang-bugs mailing list erlang-bugs@REDACTED http://erlang.org/mailman/listinfo/erlang-bugs _______________________________________________ erlang-bugs mailing list erlang-bugs@REDACTED http://erlang.org/mailman/listinfo/erlang-bugs -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pan@REDACTED Thu Nov 22 16:23:40 2012 From: pan@REDACTED (Patrik Nyblom) Date: Thu, 22 Nov 2012 16:23:40 +0100 Subject: [erlang-bugs] beam core'ing In-Reply-To: <51C6F20DC46369418387C5250127649B03B8FF@HZWEX2014N4.msad.ms.com> References: <51C6F20DC46369418387C5250127649B039D96@HZWEX2014N4.msad.ms.com> <50AA3A17.9030300@erlang.org> <51C6F20DC46369418387C5250127649B03B16B@HZWEX2014N4.msad.ms.com> <79133563-669F-4FDD-8982-01DB7B321DA5@aboutecho.com> <50ACA27A.40707@erlang.org> <8436D993-C4FC-4822-B0E8-7A2D6AB2E0C9@gmail.com> <50ACB668.6040807@erlang.org> <51C6F20DC46369418387C5250127649B03B8FF@HZWEX2014N4.msad.ms.com> Message-ID: <50AE437C.9090806@erlang.org> Hi! Thanks to everyone helping out trying to find this bug! With the help of Denis, I have now a verified fix for the garbage collector bug which moved a "fixed" (and writable) binary in the middle of erts_bs_append (erts_bs_append in erl_bits.c was the "innocent bystander" triggering the gc bug). The bugfix will be a last minute contribution to R15B03, but I also attach a source patch to this mail. Cheers, /Patrik On 11/21/2012 05:35 PM, Musumeci, Antonio S wrote: > > Something my team just noticed was that our segv occurs right after > reboot of the box consistently. After which beam appears to work > alright. We are trying to narrow down what code is triggering it but > it may take some time. > > ------------------------------------------------------------------------ > *From:* Patrik Nyblom [mailto:pan@REDACTED] > *Sent:* Wednesday, November 21, 2012 6:09 AM > *To:* Denis Titoruk > *Cc:* Musumeci, Antonio S (Enterprise Infrastructure); > erlang-bugs@REDACTED > *Subject:* Re: [erlang-bugs] beam core'ing > > Hi again :) > > Another thing that would be helpful is if you could create a crash > dump instead of a fprintf when the binary is wrongly moved, i.e. call > erl_exit(ERTS_DUMP_EXIT, "erts_current_bin != (pb->bytes)"); instead > of the fprintf? 
Then you could isolate the erlang code snippet that > exercises the bug and I maybe could create a smaller testcase... A > simple testcase when diving into the GC would be really helpful :) > > Cheers, > /Patrik > > On 11/21/2012 11:21 AM, Denis Titoruk wrote: >> >> On 21.11.2012, at 13:44, Patrik Nyblom wrote: >> >>> Hi! >>> On 11/20/2012 10:40 PM, Denis Titoruk wrote: >>>> Hi, >>>> >>>> We've got the same error on R15B01, R15B02 >>>> I've finished my investigation of this issue today & here is result: >>>> >>>> Let's assume we have the code: >>>> encode_formats(Columns) -> >>>> encode_formats(Columns, 0, <<>>). >>>> >>>> encode_formats([], Count, Acc) -> >>>> <>; >>>> >>>> encode_formats([#column{format = Format} | T], Count, Acc) -> >>>> encode_formats(T, Count + 1, <>). >>>> >>>> So, <> translates to >>>> >>>> {bs_append,{f,0},{integer,16},0,7,8,{x,2},{field_flags,[]},{x,1}}. >>>> {bs_put_integer,{f,0},{integer,16},1,{field_flags,[signed,big]},{x,6}}. >>>> >>>> There is GC execution in bs_append and it can reallocate binary but >>>> there isn't reassigning erts_current_bin which used in bs_put_integer. >>>> >>>> Fix: >>>> >>>> erl_bits.c: >>>> Eterm >>>> erts_bs_append(Process* c_p, Eterm* reg, Uint live, Eterm >>>> build_size_term, >>>> Uint extra_words, Uint unit) >>>> ... 
>>>> if (c_p->stop - c_p->htop < heap_need) { >>>> (void) erts_garbage_collect(c_p, heap_need, reg, live+1); >>>> } >>>> sb = (ErlSubBin *) c_p->htop; >>>> c_p->htop += ERL_SUB_BIN_SIZE; >>>> sb->thing_word = HEADER_SUB_BIN; >>>> sb->size = BYTE_OFFSET(used_size_in_bits); >>>> sb->bitsize = BIT_OFFSET(used_size_in_bits); >>>> sb->offs = 0; >>>> sb->bitoffs = 0; >>>> sb->is_writable = 1; >>>> sb->orig = reg[live]; >>>> >>>> /////////////////////////////////////////////////////////////////// >>>> // add this lines >>>> /////////////////////////////////////////////////////////////////// >>>> pb = (ProcBin *) boxed_val(sb->orig); >>>> erts_current_bin = pb->bytes; >>>> erts_writable_bin = 1; >>>> /////////////////////////////////////////////////////////////////// >>>> >>>> return make_binary(sb); >>>> ... >>>> >>> Can you reproduce the bug and verify that this fix really works? The >>> thing is that binaries should *only* be reallocated in the gc if >>> there are no active writers, which there obviously is here ( >>> pb->flags |= PB_ACTIVE_WRITER a few lines earlier), so the bug >>> would be in the detection of active writers in the gc if this code >>> change actually removes the crash. >> >> Yes, it works in my case. I haven't simple test case for reproducing >> this bug (actually I run few processes to send requests to pgsql) >> >> pb = (ProcBin *) boxed_val(sb->orig); >> if (erts_current_bin != (pb->bytes)) { >> fprintf(stderr, "erts_current_bin != (pb->bytes)\n"); >> fflush(stderr); >> } >> erts_current_bin = pb->bytes; >> erts_writable_bin = 1; >> >> >> (jskit@REDACTED)1> f(F), F = fun() -> >> postgresql:equery('echo-customers', write, <<"some query here">>, []) >> end. >> #Fun >> (jskit@REDACTED)2> perftest:comprehensive(1000, F). 
>> Sequential 100 cycles in ~1 seconds (100 cycles/s) >> Sequential 200 cycles in ~2 seconds (106 cycles/s) >> Sequential 1000 cycles in ~12 seconds (85 cycles/s) >> Parallel 2 1000 cycles in ~8 seconds (132 cycles/s) >> Parallel 4 1000 cycles in ~8 seconds (121 cycles/s) >> Parallel 10 1000 cycles in ~8 seconds (119 cycles/s) >> Parallel 100 1000 cycles in ~13 seconds (74 cycles/s) >> [85,132,121,119,74] >> (jskit@REDACTED)3> perftest:comprehensive(1000, F). >> Sequential 100 cycles in ~1 seconds (83 cycles/s) >> Sequential 200 cycles in ~2 seconds (83 cycles/s) >> Sequential 1000 cycles in ~14 seconds (71 cycles/s) >> Parallel 2 1000 cycles in ~11 seconds (95 cycles/s) >> Parallel 4 1000 cycles in ~10 seconds (105 cycles/s) >> Parallel 10 1000 cycles in ~11 seconds (91 cycles/s) >> Parallel 100 1000 cycles in ~13 seconds (76 cycles/s) >> "G_i[L" >> (jskit@REDACTED)4> perftest:comprehensive(1000, F). >> Sequential 100 cycles in ~1 seconds (88 cycles/s) >> Sequential 200 cycles in ~2 seconds (85 cycles/s) >> Sequential 1000 cycles in ~13 seconds (74 cycles/s) >> Parallel 2 1000 cycles in ~9 seconds (109 cycles/s) >> Parallel 4 1000 cycles in ~10 seconds (101 cycles/s) >> Parallel 10 1000 cycles in ~11 seconds (95 cycles/s) >> erts_current_bin != (pb->bytes) >> Parallel 100 1000 cycles in ~13 seconds (77 cycles/s) >> "Jme_M" >> >>> >>>> >>>> -- >>>> Cheers, >>>> Denis >>> Cheers, >>> /Patrik >>>> >>>> On 20.11.2012, at 19:37, Musumeci, Antonio S wrote: >>>> >>>>> >>>>> I've got lots of cores... but they are all from optimized builds. >>>>> >>>>> Has this been seen in other versions? We are keen to solve this >>>>> because it's causing us pain in production. We hit another, older, >>>>> memory bug (the 32bit values used in 64bit build)... and now this. >>>>> >>>>> I'm going to be building and trying R15B01 to see if we hit it as >>>>> well. I'll send any additional information I can. Any suggestions >>>>> on debugging beam would be appreciated. Compile options, etc. 
>>>>> >>>>> Thanks. >>>>> >>>>> -antonio >>>>> >>>>> ------------------------------------------------------------------------ >>>>> *From:* erlang-bugs-bounces@REDACTED >>>>> [mailto:erlang-bugs-bounces@REDACTED] *On >>>>> Behalf Of* Patrik Nyblom >>>>> *Sent:* Monday, November 19, 2012 8:55 AM >>>>> *To:* erlang-bugs@REDACTED >>>>> *Subject:* Re: [erlang-bugs] beam core'ing >>>>> >>>>> On 11/19/2012 02:01 PM, Musumeci, Antonio S wrote: >>>>>> >>>>>> I'm just starting to debug this but figured I'd send it along in >>>>>> case anyone has seen this before. >>>>>> >>>>>> 64bit RHEL 5.0.1 >>>>>> >>>>>> built from source beam.smp R15B02 >>>>>> >>>>>> Happens consistently when trying to start our app and then just >>>>>> stops after a time. Across a few boxes. Oddly we have an >>>>>> identical cluster (hw and sw) and it never happens. >>>>>> >>>>> Yes! I've seen it before and have tried for several months to get >>>>> a reproducable example and a core i can analyze here. I've had one >>>>> core that was somewhat readable but had no luck in locating the >>>>> beam code that triggered this. If you could try narrowing it down, >>>>> I would be really grateful! >>>>> >>>>> Please email me any findings, theories, cores dumps - anything! I >>>>> really want to find this! The most interesting would be to find >>>>> the snippet of erlang code that makes this happen (intermittently >>>>> probably). >>>>> >>>>> The problem is that when the allocators crash, the error is usually >>>>> somewhere else. Access of freed memory, double free or something >>>>> else doing horrid things to memory. Obviously none of our >>>>> testsuites exercise this bug as neither our debug builds, nor our >>>>> valgrind runs find it. It happens on both SMP and non SMP and is >>>>> always in the context of the erts_bs_append, so I'm pretty sure >>>>> this has a connection to the other users seeing the crash in the >>>>> allocators... 
>>>>> >>>>> Cheers, >>>>> Patrik >>>>>> >>>>>> #0 bf_unlink_free_block (flags=, block=0x6f00, >>>>>> allctr=) at beam/erl_bestfit_alloc.c:789 >>>>>> #1 bf_get_free_block (allctr=0x6824600, size=304, cand_blk=0x0, >>>>>> cand_size=, flags=0) at beam/erl_bestfit_alloc.c:869 >>>>>> #2 0x000000000045343c in mbc_alloc_block (alcu_flgsp=>>>>> out>, blk_szp=, size=, >>>>>> allctr=) at beam/erl_alloc_util.c:1198 >>>>>> #3 mbc_alloc (allctr=0x6824600, size=295) at >>>>>> beam/erl_alloc_util.c:1345 >>>>>> #4 0x000000000045398d in do_erts_alcu_alloc (type=164, >>>>>> extra=0x6824600, size=295) at beam/erl_alloc_util.c:3442 >>>>>> #5 0x0000000000453a0f in erts_alcu_alloc_thr_pref (type=164, >>>>>> extra=, size=287) at beam/erl_alloc_util.c:3520 >>>>>> #6 0x0000000000511463 in erts_alloc (size=287, type=>>>>> out>) at beam/erl_alloc.h:208 >>>>>> #7 erts_bin_nrml_alloc (size=) at >>>>>> beam/erl_binary.h:260 >>>>>> #8 erts_bs_append (c_p=0x69fba60, reg=, >>>>>> live=, build_size_term=, >>>>>> extra_words=0, unit=8)at beam/erl_bits.c:1327 >>>>>> #9 0x000000000053ffd8 in process_main () at beam/beam_emu.c:3858 >>>>>> #10 0x00000000004ae853 in sched_thread_func (vesdp=>>>>> out>) at beam/erl_process.c:5184 >>>>>> #11 0x00000000005c17e9 in thr_wrapper (vtwd=) at >>>>>> pthread/ethread.c:106 >>>>>> #12 0x00002b430f39e73d in start_thread () from /lib64/libpthread.so.0 >>>>>> #13 0x00002b430f890f6d in clone () from /lib64/libc.so.6 >>>>>> #14 0x0000000000000000 in ?? () >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> erlang-bugs mailing list >>>>>> erlang-bugs@REDACTED >>>>>> http://erlang.org/mailman/listinfo/erlang-bugs >>>>> >>>>> _______________________________________________ >>>>> erlang-bugs mailing list >>>>> erlang-bugs@REDACTED >>>>> http://erlang.org/mailman/listinfo/erlang-bugs >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: bs_append_crash.diff Type: text/x-patch Size: 1053 bytes Desc: not available URL: From tuncer.ayaz@REDACTED Thu Nov 22 22:13:07 2012 From: tuncer.ayaz@REDACTED (Tuncer Ayaz) Date: Thu, 22 Nov 2012 22:13:07 +0100 Subject: [erlang-bugs] Spec or Dialyzer regression In-Reply-To: References: <4FB2BB82.7000303@cs.ntua.gr> Message-ID: On Tue, Nov 13, 2012 at 2:23 PM, Tuncer Ayaz wrote: > On Tue, Oct 2, 2012 at 4:09 PM, wrote: >> Hi! >> >> It's not really obvious from the output, but the problem is the spec >> for open_port in erlang.erl. All the "will never return" things all >> boil down to rebar_utils:sh/2 and eventually the call to open_port. >> The option 'hide' is missing from the spec (which is new as it was >> before handled by the erl_bif_types.erl thing). >> >> I will update the spec in erlang.erl and you should be down to the >> single warning again in a few days! > > Has the fix landed in master? As of yesterday the fix is in master and works. Thanks Patrik! From essen@REDACTED Fri Nov 23 01:49:31 2012 From: essen@REDACTED (=?ISO-8859-1?Q?Lo=EFc_Hoguin?=) Date: Fri, 23 Nov 2012 01:49:31 +0100 Subject: [erlang-bugs] VM locks up on write to socket (and now it seems to file too) In-Reply-To: References: Message-ID: <50AEC81B.2000908@ninenines.eu> Sending this on behalf of someone who didn't manage to get the email sent to this list after 2 attempts. If someone can check if he's held up or something that'd be great. Anyway he has a big issue so I hope I can relay the conversation reliably. Thanks! On 11/23/2012 01:45 AM, Peter Membrey wrote: > From: Peter Membrey > Date: 22 November 2012 19:02 > Subject: VM locks up on write to socket (and now it seems to file too) > To: erlang-bugs@REDACTED > > > Hi guys, > > I wrote a simple database application called CakeDB > (https://github.com/pmembrey/cakedb) that basically spends its time > reading and writing files and sockets. There's very little in the way > of complex logic. 
It is running on CentOS 6.3 with all the updates > applied. I hit this problem on R15B02 so I rolled back to R15B01 but > the issue remained. Erlang was built from source. > > The machine has two Intel X5690 CPUs giving 12 cores plus HT. I've > tried various arguments for the VM but so far nothing has prevented > the problem. At the moment I'm using: > > +K > +A 6 > +sbt tnnps > > The issue I'm seeing is that one of the scheduler threads will hit > 100% cpu usage and the entire VM will become unresponsive. When this > happens, I am not able to connect via the console with attach and > entop is also unable to connect. I can still establish TCP connections > to the application, but I never receive a response. A standard kill > signal will cause the VM to shut down (it doesn't need -9). > > Due to the pedigree of the VM I am quite willing to accept that I've > made a fundamental mistake in my code. I am pretty sure that the way I > am doing the file IO could result in some race conditions. However, my > poor code aside, from what I understand, I still shouldn't be able to > crash / deadlock the VM like this. > > The issue doesn't seem to be caused by load. The app can fail when > it's very busy, but also when it is practically idle. I haven't been > able to find a trigger or any other explanation for the failure. 
> > The thread maxing out the CPU is attempting to write data to the socket: > > (gdb) bt > #0 0x00007f9882ab6377 in writev () from /lib64/libc.so.6 > #1 0x000000000058a81f in tcp_inet_output (data=0x2407570, > event=) at drivers/common/inet_drv.c:9681 > #2 tcp_inet_drv_output (data=0x2407570, event=) > at drivers/common/inet_drv.c:9601 > #3 0x00000000004b773f in erts_port_task_execute (runq=0x7f98826019c0, > curr_port_pp=0x7f9881639338) at beam/erl_port_task.c:858 > #4 0x00000000004afd83 in schedule (p=, > calls=) at beam/erl_process.c:6533 > #5 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 > #6 0x00000000004b1279 in sched_thread_func (vesdp=0x7f9881639280) at > beam/erl_process.c:4834 > #7 0x00000000005ba726 in thr_wrapper (vtwd=0x7fff6cfe2300) at > pthread/ethread.c:106 > #8 0x00007f9882f78851 in start_thread () from /lib64/libpthread.so.0 > #9 0x00007f9882abe11d in clone () from /lib64/libc.so.6 > (gdb) > > I then tried running strace on that thread and got (indefinitely): > > writev(15, [{"", 2158022464}], 1) = 0 > writev(15, [{"", 2158022464}], 1) = 0 > writev(15, [{"", 2158022464}], 1) = 0 > writev(15, [{"", 2158022464}], 1) = 0 > writev(15, [{"", 2158022464}], 1) = 0 > writev(15, [{"", 2158022464}], 1) = 0 > writev(15, [{"", 2158022464}], 1) = 0 > writev(15, [{"", 2158022464}], 1) = 0 > writev(15, [{"", 2158022464}], 1) = 0 > writev(15, [{"", 2158022464}], 1) = 0 > ... > > From what I can tell, it's trying to write data to a socket, which is > succeeding, but writing 0 bytes. From the earlier definitions in the > source file, an error condition would be signified by a negative > number. Any other result is the number of bytes written, in this case > 0. I'm not sure if this is desired behaviour or not. I've tried > killing the application on the other end of the socket, but it has no > effect on the VM. > > I have enabled debugging for the inet code, so hopefully this will > give a little more insight. 
I am currently trying to reproduce the > condition, but as I really have no idea what causes it, it's pretty > much a case of wait and see. > > > **** UPDATE **** > > I managed to lock up the VM again, but this time it was caused by file IO, > probably from the debugging statements. Although it worked fine for some time > the last entry in the file was cut off. > > From GDB: > > (gdb) info threads > 53 Thread 0x7f83e988b700 (LWP 8621) 0x00007f83ea6da54d in read () > from /lib64/libpthread.so.0 > 52 Thread 0x7f83e8c8f700 (LWP 8622) 0x00007f83ea6d743c in > pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > 51 Thread 0x7f83e818d700 (LWP 8623) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 50 Thread 0x7f83e816b700 (LWP 8624) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 49 Thread 0x7f83e8149700 (LWP 8625) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 48 Thread 0x7f83e8127700 (LWP 8626) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 47 Thread 0x7f83e8105700 (LWP 8627) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 46 Thread 0x7f83e80e3700 (LWP 8628) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 45 Thread 0x7f83e80c1700 (LWP 8629) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 44 Thread 0x7f83e809f700 (LWP 8630) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 43 Thread 0x7f83e807d700 (LWP 8631) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 42 Thread 0x7f83e805b700 (LWP 8632) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 41 Thread 0x7f83e8039700 (LWP 8633) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 40 Thread 0x7f83e8017700 (LWP 8634) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 39 Thread 0x7f83e7ff5700 (LWP 8635) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 38 Thread 0x7f83e7fd3700 (LWP 8636) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 37 Thread 0x7f83e7fb1700 (LWP 8637) 
0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 36 Thread 0x7f83e7f8f700 (LWP 8638) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 35 Thread 0x7f83e7f6d700 (LWP 8639) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 34 Thread 0x7f83e7f4b700 (LWP 8640) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 33 Thread 0x7f83e7f29700 (LWP 8641) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 32 Thread 0x7f83e7f07700 (LWP 8642) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 31 Thread 0x7f83e7ee5700 (LWP 8643) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 30 Thread 0x7f83e7ec3700 (LWP 8644) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 29 Thread 0x7f83e7ea1700 (LWP 8645) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 28 Thread 0x7f83e7e7f700 (LWP 8646) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 27 Thread 0x7f83d7c5a700 (LWP 8647) 0x00007f83ea6db09d in waitpid > () from /lib64/libpthread.so.0 > 26 Thread 0x7f83d7c53700 (LWP 8648) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 25 Thread 0x7f83d7252700 (LWP 8649) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 24 Thread 0x7f83d6851700 (LWP 8650) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 23 Thread 0x7f83d5e50700 (LWP 8651) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 22 Thread 0x7f83d544f700 (LWP 8652) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 21 Thread 0x7f83d4a4e700 (LWP 8653) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 20 Thread 0x7f83d404d700 (LWP 8654) 0x00007f83ea20be7d in write () > from /lib64/libc.so.6 > 19 Thread 0x7f83d364c700 (LWP 8655) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 18 Thread 0x7f83d2c4b700 (LWP 8656) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 17 Thread 0x7f83d224a700 (LWP 8657) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 16 Thread 0x7f83d1849700 (LWP 8658) 
0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 15 Thread 0x7f83d0e48700 (LWP 8659) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 14 Thread 0x7f83d0447700 (LWP 8660) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 13 Thread 0x7f83cfa46700 (LWP 8661) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 12 Thread 0x7f83cf045700 (LWP 8662) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 11 Thread 0x7f83ce644700 (LWP 8663) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 10 Thread 0x7f83cdc43700 (LWP 8664) 0x00007f83ea215ae9 in syscall > () from /lib64/libc.so.6 > 9 Thread 0x7f83cd242700 (LWP 8665) 0x00007f83ea215ae9 in syscall () > from /lib64/libc.so.6 > 8 Thread 0x7f83cc841700 (LWP 8666) 0x00007f83ea215ae9 in syscall () > from /lib64/libc.so.6 > 7 Thread 0x7f83cbe40700 (LWP 8667) 0x00007f83ea215ae9 in syscall () > from /lib64/libc.so.6 > 6 Thread 0x7f83cb43f700 (LWP 8668) 0x00007f83ea215ae9 in syscall () > from /lib64/libc.so.6 > 5 Thread 0x7f83caa3e700 (LWP 8669) 0x00007f83ea215ae9 in syscall () > from /lib64/libc.so.6 > 4 Thread 0x7f83ca03d700 (LWP 8670) 0x00007f83ea215ae9 in syscall () > from /lib64/libc.so.6 > 3 Thread 0x7f83c963c700 (LWP 8671) 0x00007f83ea215ae9 in syscall () > from /lib64/libc.so.6 > 2 Thread 0x7f83c8c3b700 (LWP 8672) 0x00007f83ea215ae9 in syscall () > from /lib64/libc.so.6 > * 1 Thread 0x7f83eb3a8700 (LWP 8597) 0x00007f83ea211d03 in select () > from /lib64/libc.so.6 > (gdb) > > > (gdb) bt > #0 0x00007f83ea20be7d in write () from /lib64/libc.so.6 > #1 0x00007f83ea1a2583 in _IO_new_file_write () from /lib64/libc.so.6 > #2 0x00007f83ea1a3b35 in _IO_new_do_write () from /lib64/libc.so.6 > #3 0x00007f83ea1a21fd in _IO_new_file_xsputn () from /lib64/libc.so.6 > #4 0x00007f83ea17589d in vfprintf () from /lib64/libc.so.6 > #5 0x00007f83ea18003a in printf () from /lib64/libc.so.6 > #6 0x000000000058f0e8 in tcp_recv (desc=0x2c3d350, request_len=0) at > drivers/common/inet_drv.c:8976 > #7 
0x000000000058f63a in tcp_inet_input (data=0x2c3d350, event= optimized out>) at drivers/common/inet_drv.c:9326 > #8 tcp_inet_drv_input (data=0x2c3d350, event=) > at drivers/common/inet_drv.c:9604 > #9 0x00000000004b770f in erts_port_task_execute (runq=0x7f83e9d5d3c0, > curr_port_pp=0x7f83e8dc6e78) at beam/erl_port_task.c:851 > #10 0x00000000004afd83 in schedule (p=, > calls=) at beam/erl_process.c:6533 > #11 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 > #12 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8dc6dc0) at > beam/erl_process.c:4834 > #13 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at > pthread/ethread.c:106 > #14 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 > #15 0x00007f83ea21911d in clone () from /lib64/libc.so.6 > (gdb) > > (gdb) bt > #0 0x00007f83ea6da54d in read () from /lib64/libpthread.so.0 > #1 0x0000000000554b6e in signal_dispatcher_thread_func (unused= optimized out>) at sys/unix/sys.c:2776 > #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266c80) at > pthread/ethread.c:106 > #3 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 > #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 > (gdb) > > (gdb) bt > #0 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6 > #1 0x00000000005bba35 in wait__ (e=0x2989390) at > pthread/ethr_event.c:92 > #2 ethr_event_wait (e=0x2989390) at pthread/ethr_event.c:218 > #3 0x00000000004ae5bd in erts_tse_wait (fcalls=, > esdp=0x7f83e8e2c440, rq=0x7f83e9d5e7c0) at beam/erl_threads.h:2319 > #4 scheduler_wait (fcalls=, esdp=0x7f83e8e2c440, > rq=0x7f83e9d5e7c0) at beam/erl_process.c:2087 > #5 0x00000000004afb94 in schedule (p=, > calls=) at beam/erl_process.c:6467 > #6 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 > #7 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8e2c440) at > beam/erl_process.c:4834 > #8 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at > pthread/ethread.c:106 > #9 0x00007f83ea6d3851 in 
start_thread () from /lib64/libpthread.so.0 > #10 0x00007f83ea21911d in clone () from /lib64/libc.so.6 > (gdb) > > > (gdb) bt > #0 0x00007f83ea6db09d in waitpid () from /lib64/libpthread.so.0 > #1 0x0000000000555a9f in child_waiter (unused=) > at sys/unix/sys.c:2700 > #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266d50) at > pthread/ethread.c:106 > #3 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 > #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 > (gdb) > > > **** END UPDATE **** > > > I'm happy to provide any information I can, so please don't hesitate to ask. > > Thanks in advance! > > Kind Regards, > > Peter Membrey > -- Loïc Hoguin Erlang Cowboy Nine Nines http://ninenines.eu From pan@REDACTED Fri Nov 23 10:21:42 2012 From: pan@REDACTED (Patrik Nyblom) Date: Fri, 23 Nov 2012 10:21:42 +0100 Subject: [erlang-bugs] VM locks up on write to socket (and now it seems to file too) In-Reply-To: <50AEC81B.2000908@ninenines.eu> References: <50AEC81B.2000908@ninenines.eu> Message-ID: <50AF4026.5050709@erlang.org> Hi! Try to trace the Erlang code to see what triggers this. Some sequence of operations or some special data sent on the socket? BTW you have to be registered (with the correct mail address, the one you send from) to post to this list; that's usually the problem when you're unable to send to the list. Cheers, Patrik On 11/23/2012 01:49 AM, Loïc Hoguin wrote: > Sending this on behalf of someone who didn't manage to get the email > sent to this list after 2 attempts. If someone can check whether he's held > up or something that'd be great. > > Anyway he has a big issue so I hope I can relay the conversation > reliably. > > Thanks!
> > On 11/23/2012 01:45 AM, Peter Membrey wrote: >> From: Peter Membrey >> Date: 22 November 2012 19:02 >> Subject: VM locks up on write to socket (and now it seems to file too) >> To: erlang-bugs@REDACTED >> >> >> Hi guys, >> >> I wrote a simple database application called CakeDB >> (https://github.com/pmembrey/cakedb) that basically spends its time >> reading and writing files and sockets. There's very little in the way >> of complex logic. It is running on CentOS 6.3 with all the updates >> applied. I hit this problem on R15B02 so I rolled back to R15B01 but >> the issue remained. Erlang was built from source. >> >> The machine has two Intel X5690 CPUs giving 12 cores plus HT. I've >> tried various arguments for the VM but so far nothing has prevented >> the problem. At the moment I'm using: >> >> +K >> +A 6 >> +sbt tnnps >> >> The issue I'm seeing is that one of the scheduler threads will hit >> 100% cpu usage and the entire VM will become unresponsive. When this >> happens, I am not able to connect via the console with attach and >> entop is also unable to connect. I can still establish TCP connections >> to the application, but I never receive a response. A standard kill >> signal will cause the VM to shut down (it doesn't need -9). >> >> Due to the pedigree of the VM I am quite willing to accept that I've >> made a fundamental mistake in my code. I am pretty sure that the way I >> am doing the file IO could result in some race conditions. However, my >> poor code aside, from what I understand, I still shouldn't be able to >> crash / deadlock the VM like this. >> >> The issue doesn't seem to be caused by load. The app can fail when >> it's very busy, but also when it is practically idle. I haven't been >> able to find a trigger or any other explanation for the failure. 
>> >> The thread maxing out the CPU is attempting to write data to the socket: >> >> (gdb) bt >> #0 0x00007f9882ab6377 in writev () from /lib64/libc.so.6 >> #1 0x000000000058a81f in tcp_inet_output (data=0x2407570, >> event=) at drivers/common/inet_drv.c:9681 >> #2 tcp_inet_drv_output (data=0x2407570, event=) >> at drivers/common/inet_drv.c:9601 >> #3 0x00000000004b773f in erts_port_task_execute (runq=0x7f98826019c0, >> curr_port_pp=0x7f9881639338) at beam/erl_port_task.c:858 >> #4 0x00000000004afd83 in schedule (p=, >> calls=) at beam/erl_process.c:6533 >> #5 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >> #6 0x00000000004b1279 in sched_thread_func (vesdp=0x7f9881639280) at >> beam/erl_process.c:4834 >> #7 0x00000000005ba726 in thr_wrapper (vtwd=0x7fff6cfe2300) at >> pthread/ethread.c:106 >> #8 0x00007f9882f78851 in start_thread () from /lib64/libpthread.so.0 >> #9 0x00007f9882abe11d in clone () from /lib64/libc.so.6 >> (gdb) >> >> I then tried running strace on that thread and got (indefinitely): >> >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> ... >> >> From what I can tell, it's trying to write data to a socket, which is >> succeeding, but writing 0 bytes. From the earlier definitions in the >> source file, an error condition would be signified by a negative >> number. Any other result is the number of bytes written, in this case >> 0. I'm not sure if this is desired behaviour or not. I've tried >> killing the application on the other end of the socket, but it has no >> effect on the VM. 
>> >> I have enabled debugging for the inet code, so hopefully this will >> give a little more insight. I am currently trying to reproduce the >> condition, but as I really have no idea what causes it, it's pretty >> much a case of wait and see. >> >> >> **** UPDATE **** >> >> I managed to lock up the VM again, but this time it was caused by >> file IO, >> probably from the debugging statements. Although it worked fine for >> some time >> the last entry in the file was cut off. >> >> From GDB: >> >> (gdb) info threads >> 53 Thread 0x7f83e988b700 (LWP 8621) 0x00007f83ea6da54d in read () >> from /lib64/libpthread.so.0 >> 52 Thread 0x7f83e8c8f700 (LWP 8622) 0x00007f83ea6d743c in >> pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >> 51 Thread 0x7f83e818d700 (LWP 8623) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 50 Thread 0x7f83e816b700 (LWP 8624) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 49 Thread 0x7f83e8149700 (LWP 8625) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 48 Thread 0x7f83e8127700 (LWP 8626) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 47 Thread 0x7f83e8105700 (LWP 8627) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 46 Thread 0x7f83e80e3700 (LWP 8628) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 45 Thread 0x7f83e80c1700 (LWP 8629) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 44 Thread 0x7f83e809f700 (LWP 8630) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 43 Thread 0x7f83e807d700 (LWP 8631) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 42 Thread 0x7f83e805b700 (LWP 8632) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 41 Thread 0x7f83e8039700 (LWP 8633) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 40 Thread 0x7f83e8017700 (LWP 8634) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 39 Thread 0x7f83e7ff5700 (LWP 8635) 0x00007f83ea215ae9 in syscall >> () from 
/lib64/libc.so.6 >> 38 Thread 0x7f83e7fd3700 (LWP 8636) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 37 Thread 0x7f83e7fb1700 (LWP 8637) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 36 Thread 0x7f83e7f8f700 (LWP 8638) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 35 Thread 0x7f83e7f6d700 (LWP 8639) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 34 Thread 0x7f83e7f4b700 (LWP 8640) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 33 Thread 0x7f83e7f29700 (LWP 8641) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 32 Thread 0x7f83e7f07700 (LWP 8642) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 31 Thread 0x7f83e7ee5700 (LWP 8643) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 30 Thread 0x7f83e7ec3700 (LWP 8644) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 29 Thread 0x7f83e7ea1700 (LWP 8645) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 28 Thread 0x7f83e7e7f700 (LWP 8646) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 27 Thread 0x7f83d7c5a700 (LWP 8647) 0x00007f83ea6db09d in waitpid >> () from /lib64/libpthread.so.0 >> 26 Thread 0x7f83d7c53700 (LWP 8648) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 25 Thread 0x7f83d7252700 (LWP 8649) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 24 Thread 0x7f83d6851700 (LWP 8650) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 23 Thread 0x7f83d5e50700 (LWP 8651) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 22 Thread 0x7f83d544f700 (LWP 8652) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 21 Thread 0x7f83d4a4e700 (LWP 8653) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 20 Thread 0x7f83d404d700 (LWP 8654) 0x00007f83ea20be7d in write () >> from /lib64/libc.so.6 >> 19 Thread 0x7f83d364c700 (LWP 8655) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 18 Thread 0x7f83d2c4b700 (LWP 8656) 
0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 17 Thread 0x7f83d224a700 (LWP 8657) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 16 Thread 0x7f83d1849700 (LWP 8658) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 15 Thread 0x7f83d0e48700 (LWP 8659) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 14 Thread 0x7f83d0447700 (LWP 8660) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 13 Thread 0x7f83cfa46700 (LWP 8661) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 12 Thread 0x7f83cf045700 (LWP 8662) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 11 Thread 0x7f83ce644700 (LWP 8663) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 10 Thread 0x7f83cdc43700 (LWP 8664) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 9 Thread 0x7f83cd242700 (LWP 8665) 0x00007f83ea215ae9 in syscall () >> from /lib64/libc.so.6 >> 8 Thread 0x7f83cc841700 (LWP 8666) 0x00007f83ea215ae9 in syscall () >> from /lib64/libc.so.6 >> 7 Thread 0x7f83cbe40700 (LWP 8667) 0x00007f83ea215ae9 in syscall () >> from /lib64/libc.so.6 >> 6 Thread 0x7f83cb43f700 (LWP 8668) 0x00007f83ea215ae9 in syscall () >> from /lib64/libc.so.6 >> 5 Thread 0x7f83caa3e700 (LWP 8669) 0x00007f83ea215ae9 in syscall () >> from /lib64/libc.so.6 >> 4 Thread 0x7f83ca03d700 (LWP 8670) 0x00007f83ea215ae9 in syscall () >> from /lib64/libc.so.6 >> 3 Thread 0x7f83c963c700 (LWP 8671) 0x00007f83ea215ae9 in syscall () >> from /lib64/libc.so.6 >> 2 Thread 0x7f83c8c3b700 (LWP 8672) 0x00007f83ea215ae9 in syscall () >> from /lib64/libc.so.6 >> * 1 Thread 0x7f83eb3a8700 (LWP 8597) 0x00007f83ea211d03 in select () >> from /lib64/libc.so.6 >> (gdb) >> >> >> (gdb) bt >> #0 0x00007f83ea20be7d in write () from /lib64/libc.so.6 >> #1 0x00007f83ea1a2583 in _IO_new_file_write () from /lib64/libc.so.6 >> #2 0x00007f83ea1a3b35 in _IO_new_do_write () from /lib64/libc.so.6 >> #3 0x00007f83ea1a21fd in _IO_new_file_xsputn () from /lib64/libc.so.6 
>> #4 0x00007f83ea17589d in vfprintf () from /lib64/libc.so.6 >> #5 0x00007f83ea18003a in printf () from /lib64/libc.so.6 >> #6 0x000000000058f0e8 in tcp_recv (desc=0x2c3d350, request_len=0) at >> drivers/common/inet_drv.c:8976 >> #7 0x000000000058f63a in tcp_inet_input (data=0x2c3d350, event=> optimized out>) at drivers/common/inet_drv.c:9326 >> #8 tcp_inet_drv_input (data=0x2c3d350, event=) >> at drivers/common/inet_drv.c:9604 >> #9 0x00000000004b770f in erts_port_task_execute (runq=0x7f83e9d5d3c0, >> curr_port_pp=0x7f83e8dc6e78) at beam/erl_port_task.c:851 >> #10 0x00000000004afd83 in schedule (p=, >> calls=) at beam/erl_process.c:6533 >> #11 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >> #12 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8dc6dc0) at >> beam/erl_process.c:4834 >> #13 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at >> pthread/ethread.c:106 >> #14 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >> #15 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >> (gdb) >> >> (gdb) bt >> #0 0x00007f83ea6da54d in read () from /lib64/libpthread.so.0 >> #1 0x0000000000554b6e in signal_dispatcher_thread_func (unused=> optimized out>) at sys/unix/sys.c:2776 >> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266c80) at >> pthread/ethread.c:106 >> #3 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >> (gdb) >> >> (gdb) bt >> #0 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6 >> #1 0x00000000005bba35 in wait__ (e=0x2989390) at >> pthread/ethr_event.c:92 >> #2 ethr_event_wait (e=0x2989390) at pthread/ethr_event.c:218 >> #3 0x00000000004ae5bd in erts_tse_wait (fcalls=, >> esdp=0x7f83e8e2c440, rq=0x7f83e9d5e7c0) at beam/erl_threads.h:2319 >> #4 scheduler_wait (fcalls=, esdp=0x7f83e8e2c440, >> rq=0x7f83e9d5e7c0) at beam/erl_process.c:2087 >> #5 0x00000000004afb94 in schedule (p=, >> calls=) at beam/erl_process.c:6467 >> #6 
0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >> #7 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8e2c440) at >> beam/erl_process.c:4834 >> #8 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at >> pthread/ethread.c:106 >> #9 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >> #10 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >> (gdb) >> >> >> (gdb) bt >> #0 0x00007f83ea6db09d in waitpid () from /lib64/libpthread.so.0 >> #1 0x0000000000555a9f in child_waiter (unused=) >> at sys/unix/sys.c:2700 >> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266d50) at >> pthread/ethread.c:106 >> #3 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >> (gdb) >> >> >> **** END UPDATE **** >> >> >> I'm happy to provide any information I can, so please don't hesitate >> to ask. >> >> Thanks in advance! >> >> Kind Regards, >> >> Peter Membrey >> > > From ess@REDACTED Fri Nov 23 15:22:18 2012 From: ess@REDACTED (Erik Søe Sørensen) Date: Fri, 23 Nov 2012 15:22:18 +0100 Subject: [erlang-bugs] VM locks up on write to socket (and now it seems to file too) In-Reply-To: <50AEC81B.2000908@ninenines.eu> References: <50AEC81B.2000908@ninenines.eu> Message-ID: <50AF869A.2010706@trifork.com> The strace is an interesting clue. If I am reading this right: writev(15, [{"", 2158022464}], 1) = 0 (the manual page for writev appears to support my reading), then this is a request to write a data chunk a tad larger than 2GB (2158022464 = 0x80A0CF40). Is this intended? Does it sound right that a blob of this size should be sent? I imagine that it might give rise to problems if the length in some of the involved layers were interpreted as a 32-bit *signed* integer. And as far as I can tell, it is normal for ssize_t to be signed. Your system appears to be 64-bit, though, judging from the "/lib64" path. Still, some lower layers might have problems.
Going forward, a) is a write of this size expected? b) How does a plain C program behave if you call writev in that fashion? Answers to those questions could help isolate the problem. /Erik On 23-11-2012 01:49, Loïc Hoguin wrote: > Sending this on behalf of someone who didn't manage to get the email > sent to this list after 2 attempts. If someone can check whether he's held up > or something that'd be great. > > Anyway he has a big issue so I hope I can relay the conversation reliably. > > Thanks! > > On 11/23/2012 01:45 AM, Peter Membrey wrote: >> From: Peter Membrey >> Date: 22 November 2012 19:02 >> Subject: VM locks up on write to socket (and now it seems to file too) >> To: erlang-bugs@REDACTED >> >> >> Hi guys, >> >> I wrote a simple database application called CakeDB >> (https://github.com/pmembrey/cakedb) that basically spends its time >> reading and writing files and sockets. There's very little in the way >> of complex logic. It is running on CentOS 6.3 with all the updates >> applied. I hit this problem on R15B02 so I rolled back to R15B01 but >> the issue remained. Erlang was built from source. >> >> The machine has two Intel X5690 CPUs giving 12 cores plus HT. I've >> tried various arguments for the VM but so far nothing has prevented >> the problem. At the moment I'm using: >> >> +K >> +A 6 >> +sbt tnnps >> >> The issue I'm seeing is that one of the scheduler threads will hit >> 100% cpu usage and the entire VM will become unresponsive. When this >> happens, I am not able to connect via the console with attach and >> entop is also unable to connect. I can still establish TCP connections >> to the application, but I never receive a response. A standard kill >> signal will cause the VM to shut down (it doesn't need -9). >> >> Due to the pedigree of the VM I am quite willing to accept that I've >> made a fundamental mistake in my code. I am pretty sure that the way I >> am doing the file IO could result in some race conditions.
However, my >> poor code aside, from what I understand, I still shouldn't be able to >> crash / deadlock the VM like this. >> >> The issue doesn't seem to be caused by load. The app can fail when >> it's very busy, but also when it is practically idle. I haven't been >> able to find a trigger or any other explanation for the failure. >> >> The thread maxing out the CPU is attempting to write data to the socket: >> >> (gdb) bt >> #0 0x00007f9882ab6377 in writev () from /lib64/libc.so.6 >> #1 0x000000000058a81f in tcp_inet_output (data=0x2407570, >> event=) at drivers/common/inet_drv.c:9681 >> #2 tcp_inet_drv_output (data=0x2407570, event=) >> at drivers/common/inet_drv.c:9601 >> #3 0x00000000004b773f in erts_port_task_execute (runq=0x7f98826019c0, >> curr_port_pp=0x7f9881639338) at beam/erl_port_task.c:858 >> #4 0x00000000004afd83 in schedule (p=, >> calls=) at beam/erl_process.c:6533 >> #5 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >> #6 0x00000000004b1279 in sched_thread_func (vesdp=0x7f9881639280) at >> beam/erl_process.c:4834 >> #7 0x00000000005ba726 in thr_wrapper (vtwd=0x7fff6cfe2300) at >> pthread/ethread.c:106 >> #8 0x00007f9882f78851 in start_thread () from /lib64/libpthread.so.0 >> #9 0x00007f9882abe11d in clone () from /lib64/libc.so.6 >> (gdb) >> >> I then tried running strace on that thread and got (indefinitely): >> >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> ... >> >> From what I can tell, it's trying to write data to a socket, which is >> succeeding, but writing 0 bytes. 
From the earlier definitions in the >> source file, an error condition would be signified by a negative >> number. Any other result is the number of bytes written, in this case >> 0. I'm not sure if this is desired behaviour or not. I've tried >> killing the application on the other end of the socket, but it has no >> effect on the VM. >> >> I have enabled debugging for the inet code, so hopefully this will >> give a little more insight. I am currently trying to reproduce the >> condition, but as I really have no idea what causes it, it's pretty >> much a case of wait and see. >> >> >> **** UPDATE **** >> >> I managed to lock up the VM again, but this time it was caused by file IO, >> probably from the debugging statements. Although it worked fine for some time >> the last entry in the file was cut off. >> >> From GDB: >> >> (gdb) info threads >> 53 Thread 0x7f83e988b700 (LWP 8621) 0x00007f83ea6da54d in read () >> from /lib64/libpthread.so.0 >> 52 Thread 0x7f83e8c8f700 (LWP 8622) 0x00007f83ea6d743c in >> pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >> 51 Thread 0x7f83e818d700 (LWP 8623) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 50 Thread 0x7f83e816b700 (LWP 8624) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 49 Thread 0x7f83e8149700 (LWP 8625) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 48 Thread 0x7f83e8127700 (LWP 8626) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 47 Thread 0x7f83e8105700 (LWP 8627) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 46 Thread 0x7f83e80e3700 (LWP 8628) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 45 Thread 0x7f83e80c1700 (LWP 8629) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 44 Thread 0x7f83e809f700 (LWP 8630) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 43 Thread 0x7f83e807d700 (LWP 8631) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 42 Thread 0x7f83e805b700 (LWP 8632) 
0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 41 Thread 0x7f83e8039700 (LWP 8633) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 40 Thread 0x7f83e8017700 (LWP 8634) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 39 Thread 0x7f83e7ff5700 (LWP 8635) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 38 Thread 0x7f83e7fd3700 (LWP 8636) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 37 Thread 0x7f83e7fb1700 (LWP 8637) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 36 Thread 0x7f83e7f8f700 (LWP 8638) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 35 Thread 0x7f83e7f6d700 (LWP 8639) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 34 Thread 0x7f83e7f4b700 (LWP 8640) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 33 Thread 0x7f83e7f29700 (LWP 8641) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 32 Thread 0x7f83e7f07700 (LWP 8642) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 31 Thread 0x7f83e7ee5700 (LWP 8643) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 30 Thread 0x7f83e7ec3700 (LWP 8644) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 29 Thread 0x7f83e7ea1700 (LWP 8645) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 28 Thread 0x7f83e7e7f700 (LWP 8646) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 27 Thread 0x7f83d7c5a700 (LWP 8647) 0x00007f83ea6db09d in waitpid >> () from /lib64/libpthread.so.0 >> 26 Thread 0x7f83d7c53700 (LWP 8648) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 25 Thread 0x7f83d7252700 (LWP 8649) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 24 Thread 0x7f83d6851700 (LWP 8650) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 23 Thread 0x7f83d5e50700 (LWP 8651) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 22 Thread 0x7f83d544f700 (LWP 8652) 0x00007f83ea215ae9 in syscall >> () from 
/lib64/libc.so.6 >> 21 Thread 0x7f83d4a4e700 (LWP 8653) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 20 Thread 0x7f83d404d700 (LWP 8654) 0x00007f83ea20be7d in write () >> from /lib64/libc.so.6 >> 19 Thread 0x7f83d364c700 (LWP 8655) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 18 Thread 0x7f83d2c4b700 (LWP 8656) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 17 Thread 0x7f83d224a700 (LWP 8657) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 16 Thread 0x7f83d1849700 (LWP 8658) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 15 Thread 0x7f83d0e48700 (LWP 8659) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 14 Thread 0x7f83d0447700 (LWP 8660) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 13 Thread 0x7f83cfa46700 (LWP 8661) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 12 Thread 0x7f83cf045700 (LWP 8662) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 11 Thread 0x7f83ce644700 (LWP 8663) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 10 Thread 0x7f83cdc43700 (LWP 8664) 0x00007f83ea215ae9 in syscall >> () from /lib64/libc.so.6 >> 9 Thread 0x7f83cd242700 (LWP 8665) 0x00007f83ea215ae9 in syscall () >> from /lib64/libc.so.6 >> 8 Thread 0x7f83cc841700 (LWP 8666) 0x00007f83ea215ae9 in syscall () >> from /lib64/libc.so.6 >> 7 Thread 0x7f83cbe40700 (LWP 8667) 0x00007f83ea215ae9 in syscall () >> from /lib64/libc.so.6 >> 6 Thread 0x7f83cb43f700 (LWP 8668) 0x00007f83ea215ae9 in syscall () >> from /lib64/libc.so.6 >> 5 Thread 0x7f83caa3e700 (LWP 8669) 0x00007f83ea215ae9 in syscall () >> from /lib64/libc.so.6 >> 4 Thread 0x7f83ca03d700 (LWP 8670) 0x00007f83ea215ae9 in syscall () >> from /lib64/libc.so.6 >> 3 Thread 0x7f83c963c700 (LWP 8671) 0x00007f83ea215ae9 in syscall () >> from /lib64/libc.so.6 >> 2 Thread 0x7f83c8c3b700 (LWP 8672) 0x00007f83ea215ae9 in syscall () >> from /lib64/libc.so.6 >> * 1 Thread 0x7f83eb3a8700 (LWP 8597) 
0x00007f83ea211d03 in select ()
>> from /lib64/libc.so.6
>> (gdb)
>>
>> (gdb) bt
>> #0 0x00007f83ea20be7d in write () from /lib64/libc.so.6
>> #1 0x00007f83ea1a2583 in _IO_new_file_write () from /lib64/libc.so.6
>> #2 0x00007f83ea1a3b35 in _IO_new_do_write () from /lib64/libc.so.6
>> #3 0x00007f83ea1a21fd in _IO_new_file_xsputn () from /lib64/libc.so.6
>> #4 0x00007f83ea17589d in vfprintf () from /lib64/libc.so.6
>> #5 0x00007f83ea18003a in printf () from /lib64/libc.so.6
>> #6 0x000000000058f0e8 in tcp_recv (desc=0x2c3d350, request_len=0) at
>> drivers/common/inet_drv.c:8976
>> #7 0x000000000058f63a in tcp_inet_input (data=0x2c3d350, event=<optimized out>) at drivers/common/inet_drv.c:9326
>> #8 tcp_inet_drv_input (data=0x2c3d350, event=<optimized out>)
>> at drivers/common/inet_drv.c:9604
>> #9 0x00000000004b770f in erts_port_task_execute (runq=0x7f83e9d5d3c0,
>> curr_port_pp=0x7f83e8dc6e78) at beam/erl_port_task.c:851
>> #10 0x00000000004afd83 in schedule (p=<optimized out>,
>> calls=<optimized out>) at beam/erl_process.c:6533
>> #11 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268
>> #12 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8dc6dc0) at
>> beam/erl_process.c:4834
>> #13 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at
>> pthread/ethread.c:106
>> #14 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0
>> #15 0x00007f83ea21911d in clone () from /lib64/libc.so.6
>> (gdb)
>>
>> (gdb) bt
>> #0 0x00007f83ea6da54d in read () from /lib64/libpthread.so.0
>> #1 0x0000000000554b6e in signal_dispatcher_thread_func (unused=<optimized out>) at sys/unix/sys.c:2776
>> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266c80) at
>> pthread/ethread.c:106
>> #3 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0
>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6
>> (gdb)
>>
>> (gdb) bt
>> #0 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>> #1 0x00000000005bba35 in wait__ (e=0x2989390) at
>> pthread/ethr_event.c:92
>> #2 ethr_event_wait
(e=0x2989390) at pthread/ethr_event.c:218
>> #3 0x00000000004ae5bd in erts_tse_wait (fcalls=<optimized out>,
>> esdp=0x7f83e8e2c440, rq=0x7f83e9d5e7c0) at beam/erl_threads.h:2319
>> #4 scheduler_wait (fcalls=<optimized out>, esdp=0x7f83e8e2c440,
>> rq=0x7f83e9d5e7c0) at beam/erl_process.c:2087
>> #5 0x00000000004afb94 in schedule (p=<optimized out>,
>> calls=<optimized out>) at beam/erl_process.c:6467
>> #6 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268
>> #7 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8e2c440) at
>> beam/erl_process.c:4834
>> #8 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at
>> pthread/ethread.c:106
>> #9 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0
>> #10 0x00007f83ea21911d in clone () from /lib64/libc.so.6
>> (gdb)
>>
>> (gdb) bt
>> #0 0x00007f83ea6db09d in waitpid () from /lib64/libpthread.so.0
>> #1 0x0000000000555a9f in child_waiter (unused=<optimized out>)
>> at sys/unix/sys.c:2700
>> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266d50) at
>> pthread/ethread.c:106
>> #3 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0
>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6
>> (gdb)
>>
>> **** END UPDATE ****
>>
>> I'm happy to provide any information I can, so please don't hesitate to ask.
>>
>> Thanks in advance!
>>
>> Kind Regards,
>>
>> Peter Membrey
>
> --
> Loïc Hoguin
> Erlang Cowboy
> Nine Nines
> http://ninenines.eu
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs

--
Mobile: + 45 26 36 17 55 | Skype: eriksoesorensen | Twitter: @eriksoe
Trifork A/S | Margrethepladsen 4 | DK-8000 Aarhus C | www.trifork.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From n.oxyde@REDACTED Fri Nov 23 16:20:00 2012
From: n.oxyde@REDACTED (Anthony Ramine)
Date: Fri, 23 Nov 2012 16:20:00 +0100
Subject: [erlang-bugs] R16 EUnit *** context setup failed ***
In-Reply-To:
References:
Message-ID: <85BE9BC9-731C-4DAE-BD47-A33FE7B30AD5@gmail.com>

I can reproduce this with the exact same steps.

--
Anthony Ramine

On 13 Nov 2012, at 14:18, Tuncer Ayaz wrote:

> If I build and run rebar's EUnit tests with R16, I see "*** context
> setup failed ***" errors.
>
> Interestingly, if I use a tree where ebin/*, rebar, and .eunit/*.beam
> have been built with R15B02, then neither R16 nor R15 throw the
> context setup errors.
>
> # fetch and build rebar
> $ git clone git://github.com/rebar/rebar.git
> $ cd rebar
> $ make
> # run tests with new rebar binary
> $ ./rebar eunit
> # *** context setup failed *** errors and 50 out of 72 tests
> # not executed
> $ rm ebin/* .eunit/* rebar
> # rebuild rebar and EUnit tests with R15
> $ make && ./rebar eunit
> # running EUnit tests with either R15 or R16 works now
>
> Can anyone else reproduce this? Could this be caused by compiler
> changes in R16? I don't see any changes in lib/eunit after R15B02.

From pan@REDACTED Fri Nov 23 17:13:17 2012
From: pan@REDACTED (Patrik Nyblom)
Date: Fri, 23 Nov 2012 17:13:17 +0100
Subject: [erlang-bugs] VM locks up on write to socket (and now it seems to file too)
In-Reply-To: <50AEC81B.2000908@ninenines.eu>
References: <50AEC81B.2000908@ninenines.eu>
Message-ID: <50AFA09D.4060100@erlang.org>

Hi again!

Could you go back to the version without the printouts and get back to the situation where writev loops, returning 0 (as in the strace)? If so, it would be really interesting to see an 'lsof' of the beam process, to see if this file descriptor really is open and is a socket...

The thing is that writev with a vector that is not empty would never return 0 for a non-blocking socket. Not on any modern (i.e. not ancient) POSIX compliant system anyway.
Of course it is a *really* large item you are trying to write there, but it should be no problem for a 64-bit Linux.

Also, I'll take back what I said about finding the Erlang code; it would be more interesting to see what really happens at the OS/VM level in this case.

Cheers,
Patrik

On 11/23/2012 01:49 AM, Loïc Hoguin wrote:
> Sending this on behalf of someone who didn't manage to get the email
> sent to this list after 2 attempts. If someone can check if he's held
> up or something that'd be great.
>
> Anyway he has a big issue so I hope I can relay the conversation
> reliably.
>
> Thanks!
>
> On 11/23/2012 01:45 AM, Peter Membrey wrote:
>> From: Peter Membrey
>> Date: 22 November 2012 19:02
>> Subject: VM locks up on write to socket (and now it seems to file too)
>> To: erlang-bugs@REDACTED
>>
>> Hi guys,
>>
>> I wrote a simple database application called CakeDB
>> (https://github.com/pmembrey/cakedb) that basically spends its time
>> reading and writing files and sockets. There's very little in the way
>> of complex logic. It is running on CentOS 6.3 with all the updates
>> applied. I hit this problem on R15B02 so I rolled back to R15B01 but
>> the issue remained. Erlang was built from source.
>>
>> The machine has two Intel X5690 CPUs giving 12 cores plus HT. I've
>> tried various arguments for the VM but so far nothing has prevented
>> the problem. At the moment I'm using:
>>
>> +K
>> +A 6
>> +sbt tnnps
>>
>> The issue I'm seeing is that one of the scheduler threads will hit
>> 100% CPU usage and the entire VM will become unresponsive. When this
>> happens, I am not able to connect via the console with attach and
>> entop is also unable to connect. I can still establish TCP connections
>> to the application, but I never receive a response. A standard kill
>> signal will cause the VM to shut down (it doesn't need -9).
>>
>> Due to the pedigree of the VM I am quite willing to accept that I've
>> made a fundamental mistake in my code.
I am pretty sure that the way I >> am doing the file IO could result in some race conditions. However, my >> poor code aside, from what I understand, I still shouldn't be able to >> crash / deadlock the VM like this. >> >> The issue doesn't seem to be caused by load. The app can fail when >> it's very busy, but also when it is practically idle. I haven't been >> able to find a trigger or any other explanation for the failure. >> >> The thread maxing out the CPU is attempting to write data to the socket: >> >> (gdb) bt >> #0 0x00007f9882ab6377 in writev () from /lib64/libc.so.6 >> #1 0x000000000058a81f in tcp_inet_output (data=0x2407570, >> event=) at drivers/common/inet_drv.c:9681 >> #2 tcp_inet_drv_output (data=0x2407570, event=) >> at drivers/common/inet_drv.c:9601 >> #3 0x00000000004b773f in erts_port_task_execute (runq=0x7f98826019c0, >> curr_port_pp=0x7f9881639338) at beam/erl_port_task.c:858 >> #4 0x00000000004afd83 in schedule (p=, >> calls=) at beam/erl_process.c:6533 >> #5 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >> #6 0x00000000004b1279 in sched_thread_func (vesdp=0x7f9881639280) at >> beam/erl_process.c:4834 >> #7 0x00000000005ba726 in thr_wrapper (vtwd=0x7fff6cfe2300) at >> pthread/ethread.c:106 >> #8 0x00007f9882f78851 in start_thread () from /lib64/libpthread.so.0 >> #9 0x00007f9882abe11d in clone () from /lib64/libc.so.6 >> (gdb) >> >> I then tried running strace on that thread and got (indefinitely): >> >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> writev(15, [{"", 2158022464}], 1) = 0 >> ... >> >> From what I can tell, it's trying to write data to a socket, which is >> succeeding, but writing 0 bytes. 
From the earlier definitions in the
>> source file, an error condition would be signified by a negative
>> number. Any other result is the number of bytes written, in this case
>> 0. I'm not sure if this is desired behaviour or not. I've tried
>> killing the application on the other end of the socket, but it has no
>> effect on the VM.
>>
>> I have enabled debugging for the inet code, so hopefully this will
>> give a little more insight. I am currently trying to reproduce the
>> condition, but as I really have no idea what causes it, it's pretty
>> much a case of wait and see.
>>
>> **** UPDATE ****
>>
>> I managed to lock up the VM again, but this time it was caused by
>> file IO, probably from the debugging statements. Although it worked
>> fine for some time the last entry in the file was cut off.
>>
>> From GDB:
>>
>> [The 'info threads' listing and thread backtraces are identical to
>> the GDB output quoted earlier in this thread; trimmed here.]
>>
>> **** END UPDATE ****
>>
>> I'm happy to provide any information I can, so please don't hesitate
>> to ask.
>>
>> Thanks in advance!
>>
>> Kind Regards,
>>
>> Peter Membrey

From jose.valim@REDACTED Sun Nov 25 20:16:16 2012
From: jose.valim@REDACTED (=?ISO-8859-1?Q?Jos=E9_Valim?=)
Date: Sun, 25 Nov 2012 20:16:16 +0100
Subject: [erlang-bugs] R15B02 HiPE can't compile modules with on_load attribute
Message-ID:

HiPE can't compile a module with on_load attribute. This sample module fails:

-module(foo).
-on_load(do_nothing/0).
%% Exporting the function doesn't affect the outcome
%% -exports([do_nothing/0]).

do_nothing() -> ok.

When compiled via command line or via compile:forms.
A snippet of the error message is:

=ERROR REPORT==== 25-Nov-2012::18:48:29 ===
Error in process <0.88.0> with exit value: {{badmatch,{'EXIT',{{hipe_beam_to_icode,1103,{'trans_fun/2',on_load}},[{hipe_beam_to_icode,trans_fun,2,[{file,"hipe_beam_to_icode.erl"},{line,1103}]},{hipe_beam_to_icode,trans_fun,2,[{file,"hipe_beam_to_icode.erl"},{line,253}]},{hipe_beam_to_icode...

I have put the full stack in a gist: https://gist.github.com/9ae21eb51928d7de5f23

Using R15B02 on Mac OS X Snow Leopard installed from homebrew.

Let me know if you need more information, thanks!

*José Valim*
www.plataformatec.com.br
Skype: jv.ptec
Founder and Lead Developer

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From pmembrey@REDACTED Mon Nov 26 12:35:52 2012
From: pmembrey@REDACTED (Peter Membrey)
Date: Mon, 26 Nov 2012 19:35:52 +0800
Subject: [erlang-bugs] Fwd: VM locks up on write to socket (and now it seems to file too)
In-Reply-To:
References: <50AEC81B.2000908@ninenines.eu> <50AFA09D.4060100@erlang.org>
Message-ID:

Hi all,

Trying to send again under a new account...

Cheers,

Pete

---------- Forwarded message ----------
From: Peter Membrey
Date: 24 November 2012 21:57
Subject: Re: [erlang-bugs] VM locks up on write to socket (and now it seems to file too)
To: Patrik Nyblom
Cc: erlang-bugs@REDACTED

Hi guys,

Thanks for getting back in touch so quickly!

I did do an lsof on the process and I can confirm that it was definitely a socket. However, by that time the application it had been trying to send to had been killed. When I checked, the sockets were showing as waiting to close. Unfortunately I didn't think to do an lsof until after the apps had been shut down. I was hoping the VM would recover if I killed the app that had upset it. However, even after all the apps connected had been shut down, the issue didn't resolve.

The application receives requests from a client, which contain two data items: the stream ID and a timestamp.
Both are encoded as big integer unsigned numbers. The server then looks through the file referenced by the stream ID and uses the timestamp as an index. The file format is currently really simple, in the form of: > There is an index file that provides an offset into the file based on time stamp, but basically it opens the file, and reads sequentially through it until it finds the timestamps that it cares about. In this case it reads all data with a greater timestamp until the end of the file is reached. It's possible the client is sending an incorrect timestamp, and maybe too much data is being read. However the loop is very primitive - it reads all the data in one go before passing it back to the protocol handler to send down the socket; so by that time even though the response is technically incorrect and the app has failed, it should still not cause the VM any issues. The data is polled every 10 seconds by the client app so I would not expect there to be 2GB of new data to send. I'm afraid my C skills are somewhat limited, so I'm not sure how to put together a sample app to try out writev. The platform is 64bit CentOS 6.3 (equivalent to RHEL 6.3) so I'm not expecting any strange or weird behaviour from the OS level but of course I could be completely wrong there. The OS is running directly on hardware, so there's no VM layer to worry about. Hope this might offer some additional clues? Thanks again! Kind Regards, Peter Membrey On 24 November 2012 00:13, Patrik Nyblom wrote: > Hi again! > > Could you go back to the version without the printouts and get back to the > situation where writev loops returning 0 (as in the strace)? If so, it would > be really interesting to see an 'lsof' of the beam process, to see if this > file descriptor really is open and is a socket... > > The thing is that writev with a vector that is not empty, would never return > 0 for a non blocking socket. Not on any modern (i.e. not ancient) POSIX > compliant system anyway. 
Of course it is a *really* large item you are
> trying to write there, but it should be no problem for a 64bit linux.
>
> Also I think there is no use finding the Erlang code, I'll take that back,
> It would be more interesting to see what really happens at the OS/VM level
> in this case.
>
> Cheers,
> Patrik
>
> On 11/23/2012 01:49 AM, Loïc Hoguin wrote:
>>
>> Sending this on behalf of someone who didn't manage to get the email sent
>> to this list after 2 attempts. If someone can check if he's hold up or
>> something that'd be great.
>>
>> Anyway he has a big issue so I hope I can relay the conversation reliably.
>>
>> Thanks!
>>
>> On 11/23/2012 01:45 AM, Peter Membrey wrote:
>>>
>>> [Peter's original report, the strace output, and the GDB thread listing
>>> and backtraces are quoted in full earlier in this thread; trimmed here.]
0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>> (gdb) >>> >>> (gdb) bt >>> #0 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6 >>> #1 0x00000000005bba35 in wait__ (e=0x2989390) at >>> pthread/ethr_event.c:92 >>> #2 ethr_event_wait (e=0x2989390) at pthread/ethr_event.c:218 >>> #3 0x00000000004ae5bd in erts_tse_wait (fcalls=, >>> esdp=0x7f83e8e2c440, rq=0x7f83e9d5e7c0) at beam/erl_threads.h:2319 >>> #4 scheduler_wait (fcalls=, esdp=0x7f83e8e2c440, >>> rq=0x7f83e9d5e7c0) at beam/erl_process.c:2087 >>> #5 0x00000000004afb94 in schedule (p=, >>> calls=) at beam/erl_process.c:6467 >>> #6 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>> #7 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8e2c440) at >>> beam/erl_process.c:4834 >>> #8 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at >>> pthread/ethread.c:106 >>> #9 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >>> #10 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>> (gdb) >>> >>> >>> (gdb) bt >>> #0 0x00007f83ea6db09d in waitpid () from /lib64/libpthread.so.0 >>> #1 0x0000000000555a9f in child_waiter (unused=) >>> at sys/unix/sys.c:2700 >>> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266d50) at >>> pthread/ethread.c:106 >>> #3 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>> (gdb) >>> >>> >>> **** END UPDATE **** >>> >>> >>> I'm happy to provide any information I can, so please don't hesitate to >>> ask. >>> >>> Thanks in advance! >>> >>> Kind Regards, >>> >>> Peter Membrey >>> >> >> > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ess@REDACTED Mon Nov 26 13:22:16 2012 From: ess@REDACTED (=?windows-1252?Q?Erik_S=F8e_S=F8rensen?=) Date: Mon, 26 Nov 2012 13:22:16 +0100 Subject: [erlang-bugs] Fwd: VM locks up on write to socket (and now it seems to file too) In-Reply-To: References: <50AEC81B.2000908@ninenines.eu> <50AFA09D.4060100@erlang.org> Message-ID: <50B35EF8.30402@trifork.com> Suggestions for things to look at: - See what data size is sent, as seen from the Erlang side. Is the 2GB number correct? - Verify endian-ness of the timestamps and data lengths you read from the file. "native"-endian may be correct, but is a bit of a funny thing to have in your file format. A mistake here may well cause your program to write more data than you intended. As for how writev handles large values, my quick test on 64-bit Ubuntu shows that (on a non-socket file descriptor) it returns 2147479552=0x7FFFF000 for an input size of 2158022464 - i.e, it does return something reasonable and positive, but writes less than 2GB. That doesn't necessarily say anything about how the behaviour is on a closed socket on CentOS, of course. /Erik On 26-11-2012 12:35, Peter Membrey wrote: > Hi all, > > Trying to send again under a new account... > > Cheers, > > Pete > > ---------- Forwarded message ---------- > From: *Peter Membrey* > > Date: 24 November 2012 21:57 > Subject: Re: [erlang-bugs] VM locks up on write to socket (and now it > seems to file too) > To: Patrik Nyblom > > Cc: erlang-bugs@REDACTED > > > Hi guys, > > Thanks for getting back in touch so quickly! > > I did do an lsof on the process and I can confirm that it was > definitely a socket. However by that time the application it had been > trying to send to had been killed. When I checked the sockets were > showing as waiting to close. Unfortunately I didn't think to do an > lsof until after the apps had been shut down. I was hoping the VM > would recover if I killed the app that had upset it. 
However even > after all the apps connected had been shut down, the issue didn't > resolve. > > The application receives requests from a client, which contains two > data items. The stream ID and a timestamp. Both are encoded as big > integer unsigned numbers. The server then looks through the file > referenced by the stream ID and uses the timestamp as an index. The > file format is currently really simple, in the form of: > > > > > There is an index file that provides an offset into the file based on > time stamp, but basically it opens the file, and reads sequentially > through it until it finds the timestamps that it cares about. In this > case it reads all data with a greater timestamp until the end of the > file is reached. It's possible the client is sending an incorrect > timestamp, and maybe too much data is being read. However the loop is > very primitive - it reads all the data in one go before passing it > back to the protocol handler to send down the socket; so by that time > even though the response is technically incorrect and the app has > failed, it should still not cause the VM any issues. > > The data is polled every 10 seconds by the client app so I would not > expect there to be 2GB of new data to send. I'm afraid my C skills are > somewhat limited, so I'm not sure how to put together a sample app to > try out writev. The platform is 64bit CentOS 6.3 (equivalent to RHEL > 6.3) so I'm not expecting any strange or weird behaviour from the OS > level but of course I could be completely wrong there. The OS is > running directly on hardware, so there's no VM layer to worry about. > > Hope this might offer some additional clues? > > Thanks again! > > Kind Regards, > > Peter Membrey > > > > On 24 November 2012 00:13, Patrik Nyblom > wrote: > > Hi again! > > > > Could you go back to the version without the printouts and get back > to the > > situation where writev loops returning 0 (as in the strace)? 
If so, > it would > > be really interesting to see an 'lsof' of the beam process, to see > if this > > file descriptor really is open and is a socket... > > > > The thing is that writev with a vector that is not empty, would > never return > > 0 for a non blocking socket. Not on any modern (i.e. not ancient) POSIX > > compliant system anyway. Of course it is a *really* large item you are > > trying to write there, but it should be no problem for a 64bit linux. > > > > Also I think there is no use finding the Erlang code, I'll take that > back, > > It would be more interesting to see what really happens at the OS/VM > level > > in this case. > > > > Cheers, > > Patrik > > > > > > On 11/23/2012 01:49 AM, Loïc Hoguin wrote: > >> > >> Sending this on behalf of someone who didn't manage to get the > email sent > >> to this list after 2 attempts. If someone can check if he's held up or > >> something that'd be great. > >> > >> Anyway he has a big issue so I hope I can relay the conversation > reliably. > >> > >> Thanks! > >> > >> On 11/23/2012 01:45 AM, Peter Membrey wrote: > >>> > >>> From: Peter Membrey > > >>> Date: 22 November 2012 19:02 > >>> Subject: VM locks up on write to socket (and now it seems to file too) > >>> To: erlang-bugs@REDACTED > >>> > >>> > >>> Hi guys, > >>> > >>> I wrote a simple database application called CakeDB > >>> (https://github.com/pmembrey/cakedb) that basically spends its time > >>> reading and writing files and sockets. There's very little in the way > >>> of complex logic. It is running on CentOS 6.3 with all the updates > >>> applied. I hit this problem on R15B02 so I rolled back to R15B01 but > >>> the issue remained. Erlang was built from source. > >>> > >>> The machine has two Intel X5690 CPUs giving 12 cores plus HT. I've > >>> tried various arguments for the VM but so far nothing has prevented > >>> the problem. 
At the moment I'm using: > >>> > >>> +K > >>> +A 6 > >>> +sbt tnnps > >>> > >>> The issue I'm seeing is that one of the scheduler threads will hit > >>> 100% cpu usage and the entire VM will become unresponsive. When this > >>> happens, I am not able to connect via the console with attach and > >>> entop is also unable to connect. I can still establish TCP connections > >>> to the application, but I never receive a response. A standard kill > >>> signal will cause the VM to shut down (it doesn't need -9). > >>> > >>> Due to the pedigree of the VM I am quite willing to accept that I've > >>> made a fundamental mistake in my code. I am pretty sure that the way I > >>> am doing the file IO could result in some race conditions. However, my > >>> poor code aside, from what I understand, I still shouldn't be able to > >>> crash / deadlock the VM like this. > >>> > >>> The issue doesn't seem to be caused by load. The app can fail when > >>> it's very busy, but also when it is practically idle. I haven't been > >>> able to find a trigger or any other explanation for the failure. 
> >>> > >>> The thread maxing out the CPU is attempting to write data to the > socket: > >>> > >>> (gdb) bt > >>> #0 0x00007f9882ab6377 in writev () from /lib64/libc.so.6 > >>> #1 0x000000000058a81f in tcp_inet_output (data=0x2407570, > >>> event=) at drivers/common/inet_drv.c:9681 > >>> #2 tcp_inet_drv_output (data=0x2407570, event=) > >>> at drivers/common/inet_drv.c:9601 > >>> #3 0x00000000004b773f in erts_port_task_execute (runq=0x7f98826019c0, > >>> curr_port_pp=0x7f9881639338) at beam/erl_port_task.c:858 > >>> #4 0x00000000004afd83 in schedule (p=, > >>> calls=) at beam/erl_process.c:6533 > >>> #5 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 > >>> #6 0x00000000004b1279 in sched_thread_func (vesdp=0x7f9881639280) at > >>> beam/erl_process.c:4834 > >>> #7 0x00000000005ba726 in thr_wrapper (vtwd=0x7fff6cfe2300) at > >>> pthread/ethread.c:106 > >>> #8 0x00007f9882f78851 in start_thread () from /lib64/libpthread.so.0 > >>> #9 0x00007f9882abe11d in clone () from /lib64/libc.so.6 > >>> (gdb) > >>> > >>> I then tried running strace on that thread and got (indefinitely): > >>> > >>> writev(15, [{"", 2158022464}], 1) = 0 > >>> writev(15, [{"", 2158022464}], 1) = 0 > >>> writev(15, [{"", 2158022464}], 1) = 0 > >>> writev(15, [{"", 2158022464}], 1) = 0 > >>> writev(15, [{"", 2158022464}], 1) = 0 > >>> writev(15, [{"", 2158022464}], 1) = 0 > >>> writev(15, [{"", 2158022464}], 1) = 0 > >>> writev(15, [{"", 2158022464}], 1) = 0 > >>> writev(15, [{"", 2158022464}], 1) = 0 > >>> writev(15, [{"", 2158022464}], 1) = 0 > >>> ... > >>> > >>> From what I can tell, it's trying to write data to a socket, which is > >>> succeeding, but writing 0 bytes. From the earlier definitions in the > >>> source file, an error condition would be signified by a negative > >>> number. Any other result is the number of bytes written, in this case > >>> 0. I'm not sure if this is desired behaviour or not. 
I've tried > >>> killing the application on the other end of the socket, but it has no > >>> effect on the VM. > >>> > >>> I have enabled debugging for the inet code, so hopefully this will > >>> give a little more insight. I am currently trying to reproduce the > >>> condition, but as I really have no idea what causes it, it's pretty > >>> much a case of wait and see. > >>> > >>> > >>> **** UPDATE **** > >>> > >>> I managed to lock up the VM again, but this time it was caused by file > >>> IO, > >>> probably from the debugging statements. Although it worked fine > for some > >>> time > >>> the last entry in the file was cut off. > >>> > >>> From GDB: > >>> > >>> (gdb) info threads > >>> 53 Thread 0x7f83e988b700 (LWP 8621) 0x00007f83ea6da54d in read () > >>> from /lib64/libpthread.so.0 > >>> 52 Thread 0x7f83e8c8f700 (LWP 8622) 0x00007f83ea6d743c in > >>> pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > >>> 51 Thread 0x7f83e818d700 (LWP 8623) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 50 Thread 0x7f83e816b700 (LWP 8624) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 49 Thread 0x7f83e8149700 (LWP 8625) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 48 Thread 0x7f83e8127700 (LWP 8626) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 47 Thread 0x7f83e8105700 (LWP 8627) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 46 Thread 0x7f83e80e3700 (LWP 8628) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 45 Thread 0x7f83e80c1700 (LWP 8629) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 44 Thread 0x7f83e809f700 (LWP 8630) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 43 Thread 0x7f83e807d700 (LWP 8631) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 42 Thread 0x7f83e805b700 (LWP 8632) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 41 Thread 0x7f83e8039700 (LWP 
8633) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 40 Thread 0x7f83e8017700 (LWP 8634) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 39 Thread 0x7f83e7ff5700 (LWP 8635) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 38 Thread 0x7f83e7fd3700 (LWP 8636) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 37 Thread 0x7f83e7fb1700 (LWP 8637) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 36 Thread 0x7f83e7f8f700 (LWP 8638) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 35 Thread 0x7f83e7f6d700 (LWP 8639) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 34 Thread 0x7f83e7f4b700 (LWP 8640) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 33 Thread 0x7f83e7f29700 (LWP 8641) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 32 Thread 0x7f83e7f07700 (LWP 8642) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 31 Thread 0x7f83e7ee5700 (LWP 8643) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 30 Thread 0x7f83e7ec3700 (LWP 8644) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 29 Thread 0x7f83e7ea1700 (LWP 8645) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 28 Thread 0x7f83e7e7f700 (LWP 8646) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 27 Thread 0x7f83d7c5a700 (LWP 8647) 0x00007f83ea6db09d in waitpid > >>> () from /lib64/libpthread.so.0 > >>> 26 Thread 0x7f83d7c53700 (LWP 8648) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 25 Thread 0x7f83d7252700 (LWP 8649) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 24 Thread 0x7f83d6851700 (LWP 8650) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 23 Thread 0x7f83d5e50700 (LWP 8651) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 22 Thread 0x7f83d544f700 (LWP 8652) 0x00007f83ea215ae9 in syscall 
> >>> () from /lib64/libc.so.6 > >>> 21 Thread 0x7f83d4a4e700 (LWP 8653) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 20 Thread 0x7f83d404d700 (LWP 8654) 0x00007f83ea20be7d in write () > >>> from /lib64/libc.so.6 > >>> 19 Thread 0x7f83d364c700 (LWP 8655) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 18 Thread 0x7f83d2c4b700 (LWP 8656) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 17 Thread 0x7f83d224a700 (LWP 8657) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 16 Thread 0x7f83d1849700 (LWP 8658) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 15 Thread 0x7f83d0e48700 (LWP 8659) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 14 Thread 0x7f83d0447700 (LWP 8660) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 13 Thread 0x7f83cfa46700 (LWP 8661) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 12 Thread 0x7f83cf045700 (LWP 8662) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 11 Thread 0x7f83ce644700 (LWP 8663) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 10 Thread 0x7f83cdc43700 (LWP 8664) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 9 Thread 0x7f83cd242700 (LWP 8665) 0x00007f83ea215ae9 in > syscall () > >>> from /lib64/libc.so.6 > >>> 8 Thread 0x7f83cc841700 (LWP 8666) 0x00007f83ea215ae9 in > syscall () > >>> from /lib64/libc.so.6 > >>> 7 Thread 0x7f83cbe40700 (LWP 8667) 0x00007f83ea215ae9 in > syscall () > >>> from /lib64/libc.so.6 > >>> 6 Thread 0x7f83cb43f700 (LWP 8668) 0x00007f83ea215ae9 in > syscall () > >>> from /lib64/libc.so.6 > >>> 5 Thread 0x7f83caa3e700 (LWP 8669) 0x00007f83ea215ae9 in > syscall () > >>> from /lib64/libc.so.6 > >>> 4 Thread 0x7f83ca03d700 (LWP 8670) 0x00007f83ea215ae9 in > syscall () > >>> from /lib64/libc.so.6 > >>> 3 Thread 0x7f83c963c700 (LWP 8671) 0x00007f83ea215ae9 in > syscall () > >>> from /lib64/libc.so.6 > >>> 
2 Thread 0x7f83c8c3b700 (LWP 8672) 0x00007f83ea215ae9 in > syscall () > >>> from /lib64/libc.so.6 > >>> * 1 Thread 0x7f83eb3a8700 (LWP 8597) 0x00007f83ea211d03 in select () > >>> from /lib64/libc.so.6 > >>> (gdb) > >>> > >>> > >>> (gdb) bt > >>> #0 0x00007f83ea20be7d in write () from /lib64/libc.so.6 > >>> #1 0x00007f83ea1a2583 in _IO_new_file_write () from /lib64/libc.so.6 > >>> #2 0x00007f83ea1a3b35 in _IO_new_do_write () from /lib64/libc.so.6 > >>> #3 0x00007f83ea1a21fd in _IO_new_file_xsputn () from /lib64/libc.so.6 > >>> #4 0x00007f83ea17589d in vfprintf () from /lib64/libc.so.6 > >>> #5 0x00007f83ea18003a in printf () from /lib64/libc.so.6 > >>> #6 0x000000000058f0e8 in tcp_recv (desc=0x2c3d350, request_len=0) at > >>> drivers/common/inet_drv.c:8976 > >>> #7 0x000000000058f63a in tcp_inet_input (data=0x2c3d350, event= >>> optimized out>) at drivers/common/inet_drv.c:9326 > >>> #8 tcp_inet_drv_input (data=0x2c3d350, event=) > >>> at drivers/common/inet_drv.c:9604 > >>> #9 0x00000000004b770f in erts_port_task_execute (runq=0x7f83e9d5d3c0, > >>> curr_port_pp=0x7f83e8dc6e78) at beam/erl_port_task.c:851 > >>> #10 0x00000000004afd83 in schedule (p=, > >>> calls=) at beam/erl_process.c:6533 > >>> #11 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 > >>> #12 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8dc6dc0) at > >>> beam/erl_process.c:4834 > >>> #13 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at > >>> pthread/ethread.c:106 > >>> #14 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 > >>> #15 0x00007f83ea21911d in clone () from /lib64/libc.so.6 > >>> (gdb) > >>> > >>> (gdb) bt > >>> #0 0x00007f83ea6da54d in read () from /lib64/libpthread.so.0 > >>> #1 0x0000000000554b6e in signal_dispatcher_thread_func (unused= >>> optimized out>) at sys/unix/sys.c:2776 > >>> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266c80) at > >>> pthread/ethread.c:106 > >>> #3 0x00007f83ea6d3851 in start_thread () from 
/lib64/libpthread.so.0 > >>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 > >>> (gdb) > >>> > >>> (gdb) bt > >>> #0 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6 > >>> #1 0x00000000005bba35 in wait__ (e=0x2989390) at > >>> pthread/ethr_event.c:92 > >>> #2 ethr_event_wait (e=0x2989390) at pthread/ethr_event.c:218 > >>> #3 0x00000000004ae5bd in erts_tse_wait (fcalls=, > >>> esdp=0x7f83e8e2c440, rq=0x7f83e9d5e7c0) at beam/erl_threads.h:2319 > >>> #4 scheduler_wait (fcalls=, esdp=0x7f83e8e2c440, > >>> rq=0x7f83e9d5e7c0) at beam/erl_process.c:2087 > >>> #5 0x00000000004afb94 in schedule (p=, > >>> calls=) at beam/erl_process.c:6467 > >>> #6 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 > >>> #7 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8e2c440) at > >>> beam/erl_process.c:4834 > >>> #8 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at > >>> pthread/ethread.c:106 > >>> #9 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 > >>> #10 0x00007f83ea21911d in clone () from /lib64/libc.so.6 > >>> (gdb) > >>> > >>> > >>> (gdb) bt > >>> #0 0x00007f83ea6db09d in waitpid () from /lib64/libpthread.so.0 > >>> #1 0x0000000000555a9f in child_waiter (unused=) > >>> at sys/unix/sys.c:2700 > >>> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266d50) at > >>> pthread/ethread.c:106 > >>> #3 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 > >>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 > >>> (gdb) > >>> > >>> > >>> **** END UPDATE **** > >>> > >>> > >>> I'm happy to provide any information I can, so please don't > hesitate to > >>> ask. > >>> > >>> Thanks in advance! 
> >>> > >>> Kind Regards, > >>> > >>> Peter Membrey > >>> > >> > >> > > > > _______________________________________________ > > erlang-bugs mailing list > > erlang-bugs@REDACTED > > http://erlang.org/mailman/listinfo/erlang-bugs > -- Mobile: + 45 26 36 17 55 | Skype: eriksoesorensen | Twitter: @eriksoe Trifork A/S | Margrethepladsen 4 | DK-8000 Aarhus C | www.trifork.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmembrey@REDACTED Mon Nov 26 13:30:52 2012 From: pmembrey@REDACTED (Peter Membrey) Date: Mon, 26 Nov 2012 20:30:52 +0800 Subject: [erlang-bugs] Fwd: VM locks up on write to socket (and now it seems to file too) In-Reply-To: <50B35EF8.30402@trifork.com> References: <50AEC81B.2000908@ninenines.eu> <50AFA09D.4060100@erlang.org> <50B35EF8.30402@trifork.com> Message-ID: Hi Erik, I am writing a little test app to go through and verify the data file. 2GB seems fairly high - unless something else strange is going on... The reason for using native in the file format was to be specific about which endianness to store the data. The data is sent over the wire in big-endian format though. For the simple format, what would you recommend I use to encode the numbers rather than native? Any chance you could send me a copy of that program? I'll run the tests on CentOS... Thanks again for your help! Cheers, Pete On 26 November 2012 20:22, Erik Søe Sørensen wrote: > Suggestions for things to look at: > - See what data size is sent, as seen from the Erlang side. Is the 2GB > number correct? > - Verify endian-ness of the timestamps and data lengths you read from the > file. "native"-endian may be correct, but is a bit of a funny thing to have > in your file format. A mistake here may well cause your program to write > more data than you intended. 
> > As for how writev handles large values, my quick test on 64-bit Ubuntu > shows that (on a non-socket file descriptor) it returns 2147479552=0x7FFFF000 > for an input size of 2158022464 - i.e, it does return something > reasonable and positive, but writes less than 2GB. > That doesn't necessarily say anything about how the behaviour is on a > closed socket on CentOS, of course. > > /Erik > > > On 26-11-2012 12:35, Peter Membrey wrote: > > Hi all, > > Trying to send again under a new account... > > Cheers, > > Pete > > ---------- Forwarded message ---------- > From: Peter Membrey > Date: 24 November 2012 21:57 > Subject: Re: [erlang-bugs] VM locks up on write to socket (and now it > seems to file too) > To: Patrik Nyblom > Cc: erlang-bugs@REDACTED > > > Hi guys, > > Thanks for getting back in touch so quickly! > > I did do an lsof on the process and I can confirm that it was > definitely a socket. However by that time the application it had been > trying to send to had been killed. When I checked the sockets were > showing as waiting to close. Unfortunately I didn't think to do an > lsof until after the apps had been shut down. I was hoping the VM > would recover if I killed the app that had upset it. However even > after all the apps connected had been shut down, the issue didn't > resolve. > > The application receives requests from a client, which contains two > data items. The stream ID and a timestamp. Both are encoded as big > integer unsigned numbers. The server then looks through the file > referenced by the stream ID and uses the timestamp as an index. The > file format is currently really simple, in the form of: > > > > > There is an index file that provides an offset into the file based on > time stamp, but basically it opens the file, and reads sequentially > through it until it finds the timestamps that it cares about. In this > case it reads all data with a greater timestamp until the end of the > file is reached. 
It's possible the client is sending an incorrect > timestamp, and maybe too much data is being read. However the loop is > very primitive - it reads all the data in one go before passing it > back to the protocol handler to send down the socket; so by that time > even though the response is technically incorrect and the app has > failed, it should still not cause the VM any issues. > > The data is polled every 10 seconds by the client app so I would not > expect there to be 2GB of new data to send. I'm afraid my C skills are > somewhat limited, so I'm not sure how to put together a sample app to > try out writev. The platform is 64bit CentOS 6.3 (equivalent to RHEL > 6.3) so I'm not expecting any strange or weird behaviour from the OS > level but of course I could be completely wrong there. The OS is > running directly on hardware, so there's no VM layer to worry about. > > Hope this might offer some additional clues? > > Thanks again! > > Kind Regards, > > Peter Membrey > > > > On 24 November 2012 00:13, Patrik Nyblom wrote: > > Hi again! > > > > Could you go back to the version without the printouts and get back to > the > > situation where writev loops returning 0 (as in the strace)? If so, it > would > > be really interesting to see an 'lsof' of the beam process, to see if > this > > file descriptor really is open and is a socket... > > > > The thing is that writev with a vector that is not empty, would never > return > > 0 for a non blocking socket. Not on any modern (i.e. not ancient) POSIX > > compliant system anyway. Of course it is a *really* large item you are > > trying to write there, but it should be no problem for a 64bit linux. > > > > Also I think there is no use finding the Erlang code, I'll take that > back, > > It would be more interesting to see what really happens at the OS/VM > level > > in this case. 
> > > > Cheers, > > Patrik > > > > > > On 11/23/2012 01:49 AM, Loïc Hoguin wrote: > >> > >> Sending this on behalf of someone who didn't manage to get the email > sent > >> to this list after 2 attempts. If someone can check if he's held up or > >> something that'd be great. > >> > >> Anyway he has a big issue so I hope I can relay the conversation > reliably. > >> > >> Thanks! > >> > >> On 11/23/2012 01:45 AM, Peter Membrey wrote: > >>> > >>> From: Peter Membrey > >>> Date: 22 November 2012 19:02 > >>> Subject: VM locks up on write to socket (and now it seems to file too) > >>> To: erlang-bugs@REDACTED > >>> > >>> > >>> Hi guys, > >>> > >>> I wrote a simple database application called CakeDB > >>> (https://github.com/pmembrey/cakedb) that basically spends its time > >>> reading and writing files and sockets. There's very little in the way > >>> of complex logic. It is running on CentOS 6.3 with all the updates > >>> applied. I hit this problem on R15B02 so I rolled back to R15B01 but > >>> the issue remained. Erlang was built from source. > >>> > >>> The machine has two Intel X5690 CPUs giving 12 cores plus HT. I've > >>> tried various arguments for the VM but so far nothing has prevented > >>> the problem. At the moment I'm using: > >>> > >>> +K > >>> +A 6 > >>> +sbt tnnps > >>> > >>> The issue I'm seeing is that one of the scheduler threads will hit > >>> 100% cpu usage and the entire VM will become unresponsive. When this > >>> happens, I am not able to connect via the console with attach and > >>> entop is also unable to connect. I can still establish TCP connections > >>> to the application, but I never receive a response. A standard kill > >>> signal will cause the VM to shut down (it doesn't need -9). > >>> > >>> Due to the pedigree of the VM I am quite willing to accept that I've > >>> made a fundamental mistake in my code. I am pretty sure that the way I > >>> am doing the file IO could result in some race conditions. 
However, my > >>> poor code aside, from what I understand, I still shouldn't be able to > >>> crash / deadlock the VM like this. > >>> > >>> The issue doesn't seem to be caused by load. The app can fail when > >>> it's very busy, but also when it is practically idle. I haven't been > >>> able to find a trigger or any other explanation for the failure. > >>> > >>> The thread maxing out the CPU is attempting to write data to the > socket: > >>> > >>> (gdb) bt > >>> #0 0x00007f9882ab6377 in writev () from /lib64/libc.so.6 > >>> #1 0x000000000058a81f in tcp_inet_output (data=0x2407570, > >>> event=) at drivers/common/inet_drv.c:9681 > >>> #2 tcp_inet_drv_output (data=0x2407570, event=) > >>> at drivers/common/inet_drv.c:9601 > >>> #3 0x00000000004b773f in erts_port_task_execute (runq=0x7f98826019c0, > >>> curr_port_pp=0x7f9881639338) at beam/erl_port_task.c:858 > >>> #4 0x00000000004afd83 in schedule (p=, > >>> calls=) at beam/erl_process.c:6533 > >>> #5 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 > >>> #6 0x00000000004b1279 in sched_thread_func (vesdp=0x7f9881639280) at > >>> beam/erl_process.c:4834 > >>> #7 0x00000000005ba726 in thr_wrapper (vtwd=0x7fff6cfe2300) at > >>> pthread/ethread.c:106 > >>> #8 0x00007f9882f78851 in start_thread () from /lib64/libpthread.so.0 > >>> #9 0x00007f9882abe11d in clone () from /lib64/libc.so.6 > >>> (gdb) > >>> > >>> I then tried running strace on that thread and got (indefinitely): > >>> > >>> writev(15, [{"", 2158022464}], 1) = 0 > >>> writev(15, [{"", 2158022464}], 1) = 0 > >>> writev(15, [{"", 2158022464}], 1) = 0 > >>> writev(15, [{"", 2158022464}], 1) = 0 > >>> writev(15, [{"", 2158022464}], 1) = 0 > >>> writev(15, [{"", 2158022464}], 1) = 0 > >>> writev(15, [{"", 2158022464}], 1) = 0 > >>> writev(15, [{"", 2158022464}], 1) = 0 > >>> writev(15, [{"", 2158022464}], 1) = 0 > >>> writev(15, [{"", 2158022464}], 1) = 0 > >>> ... 
> >>> > >>> From what I can tell, it's trying to write data to a socket, which is > >>> succeeding, but writing 0 bytes. From the earlier definitions in the > >>> source file, an error condition would be signified by a negative > >>> number. Any other result is the number of bytes written, in this case > >>> 0. I'm not sure if this is desired behaviour or not. I've tried > >>> killing the application on the other end of the socket, but it has no > >>> effect on the VM. > >>> > >>> I have enabled debugging for the inet code, so hopefully this will > >>> give a little more insight. I am currently trying to reproduce the > >>> condition, but as I really have no idea what causes it, it's pretty > >>> much a case of wait and see. > >>> > >>> > >>> **** UPDATE **** > >>> > >>> I managed to lock up the VM again, but this time it was caused by file > >>> IO, > >>> probably from the debugging statements. Although it worked fine for > some > >>> time > >>> the last entry in the file was cut off. > >>> > >>> From GDB: > >>> > >>> (gdb) info threads > >>> 53 Thread 0x7f83e988b700 (LWP 8621) 0x00007f83ea6da54d in read () > >>> from /lib64/libpthread.so.0 > >>> 52 Thread 0x7f83e8c8f700 (LWP 8622) 0x00007f83ea6d743c in > >>> pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > >>> 51 Thread 0x7f83e818d700 (LWP 8623) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 50 Thread 0x7f83e816b700 (LWP 8624) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 49 Thread 0x7f83e8149700 (LWP 8625) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 48 Thread 0x7f83e8127700 (LWP 8626) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 47 Thread 0x7f83e8105700 (LWP 8627) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 46 Thread 0x7f83e80e3700 (LWP 8628) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 45 Thread 0x7f83e80c1700 (LWP 8629) 0x00007f83ea215ae9 in syscall > >>> () from 
/lib64/libc.so.6 > >>> 44 Thread 0x7f83e809f700 (LWP 8630) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 43 Thread 0x7f83e807d700 (LWP 8631) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 42 Thread 0x7f83e805b700 (LWP 8632) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 41 Thread 0x7f83e8039700 (LWP 8633) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 40 Thread 0x7f83e8017700 (LWP 8634) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 39 Thread 0x7f83e7ff5700 (LWP 8635) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 38 Thread 0x7f83e7fd3700 (LWP 8636) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 37 Thread 0x7f83e7fb1700 (LWP 8637) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 36 Thread 0x7f83e7f8f700 (LWP 8638) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 35 Thread 0x7f83e7f6d700 (LWP 8639) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 34 Thread 0x7f83e7f4b700 (LWP 8640) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 33 Thread 0x7f83e7f29700 (LWP 8641) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 32 Thread 0x7f83e7f07700 (LWP 8642) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 31 Thread 0x7f83e7ee5700 (LWP 8643) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 30 Thread 0x7f83e7ec3700 (LWP 8644) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 29 Thread 0x7f83e7ea1700 (LWP 8645) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 28 Thread 0x7f83e7e7f700 (LWP 8646) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 27 Thread 0x7f83d7c5a700 (LWP 8647) 0x00007f83ea6db09d in waitpid > >>> () from /lib64/libpthread.so.0 > >>> 26 Thread 0x7f83d7c53700 (LWP 8648) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 25 Thread 
0x7f83d7252700 (LWP 8649) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 24 Thread 0x7f83d6851700 (LWP 8650) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 23 Thread 0x7f83d5e50700 (LWP 8651) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 22 Thread 0x7f83d544f700 (LWP 8652) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 21 Thread 0x7f83d4a4e700 (LWP 8653) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 20 Thread 0x7f83d404d700 (LWP 8654) 0x00007f83ea20be7d in write () > >>> from /lib64/libc.so.6 > >>> 19 Thread 0x7f83d364c700 (LWP 8655) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 18 Thread 0x7f83d2c4b700 (LWP 8656) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 17 Thread 0x7f83d224a700 (LWP 8657) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 16 Thread 0x7f83d1849700 (LWP 8658) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 15 Thread 0x7f83d0e48700 (LWP 8659) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 14 Thread 0x7f83d0447700 (LWP 8660) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 13 Thread 0x7f83cfa46700 (LWP 8661) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 12 Thread 0x7f83cf045700 (LWP 8662) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 11 Thread 0x7f83ce644700 (LWP 8663) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 10 Thread 0x7f83cdc43700 (LWP 8664) 0x00007f83ea215ae9 in syscall > >>> () from /lib64/libc.so.6 > >>> 9 Thread 0x7f83cd242700 (LWP 8665) 0x00007f83ea215ae9 in syscall () > >>> from /lib64/libc.so.6 > >>> 8 Thread 0x7f83cc841700 (LWP 8666) 0x00007f83ea215ae9 in syscall () > >>> from /lib64/libc.so.6 > >>> 7 Thread 0x7f83cbe40700 (LWP 8667) 0x00007f83ea215ae9 in syscall () > >>> from /lib64/libc.so.6 > >>> 6 Thread 0x7f83cb43f700 (LWP 8668) 0x00007f83ea215ae9 in 
syscall () > >>> from /lib64/libc.so.6 > >>> 5 Thread 0x7f83caa3e700 (LWP 8669) 0x00007f83ea215ae9 in syscall () > >>> from /lib64/libc.so.6 > >>> 4 Thread 0x7f83ca03d700 (LWP 8670) 0x00007f83ea215ae9 in syscall () > >>> from /lib64/libc.so.6 > >>> 3 Thread 0x7f83c963c700 (LWP 8671) 0x00007f83ea215ae9 in syscall () > >>> from /lib64/libc.so.6 > >>> 2 Thread 0x7f83c8c3b700 (LWP 8672) 0x00007f83ea215ae9 in syscall () > >>> from /lib64/libc.so.6 > >>> * 1 Thread 0x7f83eb3a8700 (LWP 8597) 0x00007f83ea211d03 in select () > >>> from /lib64/libc.so.6 > >>> (gdb) > >>> > >>> > >>> (gdb) bt > >>> #0 0x00007f83ea20be7d in write () from /lib64/libc.so.6 > >>> #1 0x00007f83ea1a2583 in _IO_new_file_write () from /lib64/libc.so.6 > >>> #2 0x00007f83ea1a3b35 in _IO_new_do_write () from /lib64/libc.so.6 > >>> #3 0x00007f83ea1a21fd in _IO_new_file_xsputn () from /lib64/libc.so.6 > >>> #4 0x00007f83ea17589d in vfprintf () from /lib64/libc.so.6 > >>> #5 0x00007f83ea18003a in printf () from /lib64/libc.so.6 > >>> #6 0x000000000058f0e8 in tcp_recv (desc=0x2c3d350, request_len=0) at > >>> drivers/common/inet_drv.c:8976 > >>> #7 0x000000000058f63a in tcp_inet_input (data=0x2c3d350, event= >>> optimized out>) at drivers/common/inet_drv.c:9326 > >>> #8 tcp_inet_drv_input (data=0x2c3d350, event=) > >>> at drivers/common/inet_drv.c:9604 > >>> #9 0x00000000004b770f in erts_port_task_execute (runq=0x7f83e9d5d3c0, > >>> curr_port_pp=0x7f83e8dc6e78) at beam/erl_port_task.c:851 > >>> #10 0x00000000004afd83 in schedule (p=, > >>> calls=) at beam/erl_process.c:6533 > >>> #11 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 > >>> #12 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8dc6dc0) at > >>> beam/erl_process.c:4834 > >>> #13 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at > >>> pthread/ethread.c:106 > >>> #14 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 > >>> #15 0x00007f83ea21911d in clone () from /lib64/libc.so.6 > >>> (gdb) > >>> > >>> 
(gdb) bt > >>> #0 0x00007f83ea6da54d in read () from /lib64/libpthread.so.0 > >>> #1 0x0000000000554b6e in signal_dispatcher_thread_func (unused= >>> optimized out>) at sys/unix/sys.c:2776 > >>> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266c80) at > >>> pthread/ethread.c:106 > >>> #3 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 > >>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 > >>> (gdb) > >>> > >>> (gdb) bt > >>> #0 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6 > >>> #1 0x00000000005bba35 in wait__ (e=0x2989390) at > >>> pthread/ethr_event.c:92 > >>> #2 ethr_event_wait (e=0x2989390) at pthread/ethr_event.c:218 > >>> #3 0x00000000004ae5bd in erts_tse_wait (fcalls=, > >>> esdp=0x7f83e8e2c440, rq=0x7f83e9d5e7c0) at beam/erl_threads.h:2319 > >>> #4 scheduler_wait (fcalls=, esdp=0x7f83e8e2c440, > >>> rq=0x7f83e9d5e7c0) at beam/erl_process.c:2087 > >>> #5 0x00000000004afb94 in schedule (p=, > >>> calls=) at beam/erl_process.c:6467 > >>> #6 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 > >>> #7 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8e2c440) at > >>> beam/erl_process.c:4834 > >>> #8 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at > >>> pthread/ethread.c:106 > >>> #9 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 > >>> #10 0x00007f83ea21911d in clone () from /lib64/libc.so.6 > >>> (gdb) > >>> > >>> > >>> (gdb) bt > >>> #0 0x00007f83ea6db09d in waitpid () from /lib64/libpthread.so.0 > >>> #1 0x0000000000555a9f in child_waiter (unused=) > >>> at sys/unix/sys.c:2700 > >>> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266d50) at > >>> pthread/ethread.c:106 > >>> #3 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 > >>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 > >>> (gdb) > >>> > >>> > >>> **** END UPDATE **** > >>> > >>> > >>> I'm happy to provide any information I can, so please don't hesitate to > >>> ask. 
> >>> > >>> Thanks in advance! > >>> > >>> Kind Regards, > >>> > >>> Peter Membrey > >>> > >> > >> > > > > _______________________________________________ > > erlang-bugs mailing list > > erlang-bugs@REDACTED > > http://erlang.org/mailman/listinfo/erlang-bugs > > > > -- > Mobile: + 45 26 36 17 55 | Skype: eriksoesorensen | Twitter: @eriksoe > Trifork A/S | Margrethepladsen 4 | DK-8000 Aarhus C | > www.trifork.com > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ess@REDACTED Mon Nov 26 15:14:29 2012 From: ess@REDACTED (Erik Søe Sørensen) Date: Mon, 26 Nov 2012 15:14:29 +0100 Subject: [erlang-bugs] Fwd: VM locks up on write to socket (and now it seems to file too) In-Reply-To: References: <50AEC81B.2000908@ninenines.eu> <50AFA09D.4060100@erlang.org> <50B35EF8.30402@trifork.com> Message-ID: <50B37945.4090508@trifork.com> On 26-11-2012 13:30, Peter Membrey wrote: > Hi Erik, > > I am writing a little test app to go through and verify the data file. > 2GB seems fairly high - unless something else strange is going on... > > The reason for using native in the file format was to be specific > about which endianness to store the data. The data is sent over the > wire in big-endian format though. For the simple format, what would > you recommend I use to encode the numbers rather than native? The thing about "native" endian is that it's actually sort of less specific than not stating any endianness (in which case network order = big-endian will be used). > > Any chance you could send me a copy of that program? I'll run the > tests on CentOS...
It's just what I could cook up from the writev man page, combined with how I can see from your strace output that writev was called...:

#include <stdio.h>
#include <stdlib.h>
#include <sys/uio.h>

int main() {
    size_t size = 2158022464;
    char* data = calloc(1, size);
    if (!data) abort();
    const struct iovec vec[1] = {{data, size}};
    int res = writev(2, vec, 1); // Writes to stderr
    printf("writev return value: %d\n", res);
    return 0;
}
// invoked with: ./a.out 2>/dev/null

I'd add a test for a closed socket (fresh from socket(), perhaps) if I thought my employer didn't have other, more important things for me to do than read up on how to call socket()... /Erik > > Thanks again for your help! > > Cheers, > > Pete > > > > On 26 November 2012 20:22, Erik Søe Sørensen > wrote: > > Suggestions for things to look at: > - See what data size is sent, as seen from the Erlang side. Is the > 2GB number correct? > - Verify endian-ness of the timestamps and data lengths you read > from the file. "native"-endian may be correct, but is a bit of a > funny thing to have in your file format. A mistake here may well > cause your program to write more data than you intended. > > As for how writev handles large values, my quick test on 64-bit > Ubuntu shows that (on a non-socket file descriptor) it returns > 2147479552 = 0x7FFFF000 for an input size of > 2158022464 - i.e., it does return something > reasonable and positive, but writes less than 2GB. > That doesn't necessarily say anything about how the behaviour is > on a closed socket on CentOS, of course. > > /Erik > > > On 26-11-2012 12:35, Peter Membrey wrote: >> Hi all, >> >> Trying to send again under a new account... >> >> Cheers, >> >> Pete >> >> ---------- Forwarded message ---------- >> From: *Peter Membrey* > >> Date: 24 November 2012 21:57 >> Subject: Re: [erlang-bugs] VM locks up on write to socket (and >> now it seems to file too) >> To: Patrik Nyblom > >> Cc: erlang-bugs@REDACTED >> >> >> Hi guys, >> >> Thanks for getting back in touch so quickly!
>> >> I did do an lsof on the process and I can confirm that it was >> definitely a socket. However by that time the application it had been >> trying to send to had been killed. When I checked the sockets were >> showing as waiting to close. Unfortunately I didn't think to do an >> lsof until after the apps had been shut down. I was hoping the VM >> would recover if I killed the app that had upset it. However even >> after all the apps connected had been shut down, the issue didn't >> resolve. >> >> The application receives requests from a client, which contains two >> data items. The stream ID and a timestamp. Both are encoded as big >> integer unsigned numbers. The server then looks through the file >> referenced by the stream ID and uses the timestamp as an index. The >> file format is currently really simple, in the form of: >> >> > >> >> There is an index file that provides an offset into the file based on >> time stamp, but basically it opens the file, and reads sequentially >> through it until it finds the timestamps that it cares about. In this >> case it reads all data with a greater timestamp until the end of the >> file is reached. It's possible the client is sending an incorrect >> timestamp, and maybe too much data is being read. However the loop is >> very primitive - it reads all the data in one go before passing it >> back to the protocol handler to send down the socket; so by that time >> even though the response is technically incorrect and the app has >> failed, it should still not cause the VM any issues. >> >> The data is polled every 10 seconds by the client app so I would not >> expect there to be 2GB of new data to send. I'm afraid my C >> skills are >> somewhat limited, so I'm not sure how to put together a sample app to >> try out writev. The platform is 64bit CentOS 6.3 (equivalent to RHEL >> 6.3) so I'm not expecting any strange or weird behaviour from the OS >> level but of course I could be completely wrong there. 
The OS is >> running directly on hardware, so there's no VM layer to worry about. >> >> Hope this might offer some additional clues? >> >> Thanks again! >> >> Kind Regards, >> >> Peter Membrey >> >> >> >> On 24 November 2012 00:13, Patrik Nyblom > > wrote: >> > Hi again! >> > >> > Could you go back to the version without the printouts and get >> back to the >> > situation where writev loops returning 0 (as in the strace)? If >> so, it would >> > be really interesting to see an 'lsof' of the beam process, to >> see if this >> > file descriptor really is open and is a socket... >> > >> > The thing is that writev with a vector that is not empty, would >> never return >> > 0 for a non blocking socket. Not on any modern (i.e. not >> ancient) POSIX >> > compliant system anyway. Of course it is a *really* large item >> you are >> > trying to write there, but it should be no problem for a 64bit >> linux. >> > >> > Also I think there is no use finding the Erlang code, I'll take >> that back, >> > It would be more interesting to see what really happens at the >> OS/VM level >> > in this case. >> > >> > Cheers, >> > Patrik >> > >> > >> > On 11/23/2012 01:49 AM, Loïc Hoguin wrote: >> >> [...]
>> >>> (gdb) >> >>> >> >>> **** END UPDATE **** >> >>> >> >>> I'm happy to provide any information I can, so please don't >> hesitate to >> >>> ask. >> >>> >> >>> Thanks in advance! >> >>> >> >>> Kind Regards, >> >>> >> >>> Peter Membrey >> >>> >> >> >> >> >> > >> > _______________________________________________ >> > erlang-bugs mailing list >> > erlang-bugs@REDACTED >> > http://erlang.org/mailman/listinfo/erlang-bugs >> > > > -- > Mobile: + 45 26 36 17 55 | Skype: eriksoesorensen | Twitter: @eriksoe > Trifork A/S | Margrethepladsen 4 | DK-8000 Aarhus C | > www.trifork.com > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs > > -- Mobile: + 45 26 36 17 55 | Skype: eriksoesorensen | Twitter: @eriksoe Trifork A/S | Margrethepladsen 4 | DK-8000 Aarhus C | www.trifork.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From kostis@REDACTED Mon Nov 26 15:35:14 2012 From: kostis@REDACTED (Kostis Sagonas) Date: Mon, 26 Nov 2012 15:35:14 +0100 Subject: [erlang-bugs] R15B02 HiPE can't compile modules with on_load attribute In-Reply-To: References: Message-ID: <50B37E22.40401@cs.ntua.gr> On 11/25/2012 08:16 PM, José Valim wrote: > HiPE can't compile a module with on_load attribute. This sample module > fails: > > -module(foo). > -on_load(do_nothing/0). > > %% Exporting the function doesn't affect the outcome > %% -exports([do_nothing/0]). > > do_nothing() -> ok. > > > When compiled via command line or via compile:forms. > A snippet of the error message is: > > =ERROR REPORT==== 25-Nov-2012::18:48:29 === > Error in process <0.88.0> with exit value: > {{badmatch,{'EXIT',{{hipe_beam_to_icode,1103,{'trans_fun/2',on_load}},[{hipe_beam_to_icode,trans_fun,2,[{file,"hipe_beam_to_icode.erl"},{line,1103}]},{hipe_beam_to_icode,trans_fun,2,[{file,"hipe_beam_to_icode.erl"},{line,253}]},{hipe_beam_to_icode...
I can easily provide a patch that makes the HiPE compiler not crash when it finds an "on_load" BEAM instruction, but the real fix involves changing the HiPE loader to trigger execution of functions specified as part of the on_load attribute declaration. Perhaps somebody at OTP who is more familiar with the magic that the BEAM loader does when it finds BEAM code with on_load instructions can do this -- or at least help here. Kostis From sverker.eriksson@REDACTED Tue Nov 27 15:08:59 2012 From: sverker.eriksson@REDACTED (Sverker Eriksson) Date: Tue, 27 Nov 2012 15:08:59 +0100 Subject: [erlang-bugs] R15B02 HiPE can't compile modules with on_load attribute In-Reply-To: <50B37E22.40401@cs.ntua.gr> References: <50B37E22.40401@cs.ntua.gr> Message-ID: <50B4C97B.4030303@erix.ericsson.se> Kostis Sagonas wrote: > On 11/25/2012 08:16 PM, José Valim wrote: >> HiPE can't compile a module with an on_load attribute. This sample module >> fails: >> >> -module(foo). >> -on_load(do_nothing/0). >> >> %% Exporting the function doesn't affect the outcome >> %% -export([do_nothing/0]). >> >> do_nothing() -> ok. >> >> >> When compiled via command line or via compile:forms. >> A snippet of the error message is: >> >> =ERROR REPORT==== 25-Nov-2012::18:48:29 === >> Error in process <0.88.0> with exit value: >> {{badmatch,{'EXIT',{{hipe_beam_to_icode,1103,{'trans_fun/2',on_load}},[{hipe_beam_to_icode,trans_fun,2,[{file,"hipe_beam_to_icode.erl"},{line,1103}]},{hipe_beam_to_icode,trans_fun,2,[{file,"hipe_beam_to_icode.erl"},{line,253}]},{hipe_beam_to_icode...
> > Perhaps somebody at OTP who is more familiar with the magic that the > BEAM loader does when it finds BEAM code with on_load instructions can > do this -- or at least help here. > > Kostis Supporting "on_load" for HiPE is not trivial, and it is not a prioritized job for us right now. If on_load is used to call erlang:load_nif/2, that's yet another thing that would need to be fixed. Right now you cannot mix HiPE and NIFs in the same module. /Sverker, Erlang/OTP From pan@REDACTED Wed Nov 28 16:54:47 2012 From: pan@REDACTED (Patrik Nyblom) Date: Wed, 28 Nov 2012 16:54:47 +0100 Subject: [erlang-bugs] VM locks up on write to socket (and now it seems to file too) In-Reply-To: References: <50AEC81B.2000908@ninenines.eu> <50AFA09D.4060100@erlang.org> Message-ID: <50B633C7.7000709@erlang.org> Hi! I'll upgrade the CentOS VM I have to 6.3 (only had 6.1 :() and see if I can reproduce. If that fails, could you run a VM with a patch to try to handle the unexpected case and see if that fixes it? Cheers, /Patrik On 11/24/2012 02:57 PM, Peter Membrey wrote: > Hi guys, > > Thanks for getting back in touch so quickly! > > I did do an lsof on the process and I can confirm that it was > definitely a socket. However by that time the application it had been > trying to send to had been killed. When I checked, the sockets were > showing as waiting to close. Unfortunately I didn't think to do an > lsof until after the apps had been shut down. I was hoping the VM > would recover if I killed the app that had upset it. However, even > after all the apps connected had been shut down, the issue didn't > resolve. > > The application receives requests from a client, which contain two > data items: the stream ID and a timestamp. Both are encoded as big > integer unsigned numbers. The server then looks through the file > referenced by the stream ID and uses the timestamp as an index.
The > file format is currently really simple, in the form of: > > > > > There is an index file that provides an offset into the file based on > time stamp, but basically it opens the file, and reads sequentially > through it until it finds the timestamps that it cares about. In this > case it reads all data with a greater timestamp until the end of the > file is reached. It's possible the client is sending an incorrect > timestamp, and maybe too much data is being read. However the loop is > very primitive - it reads all the data in one go before passing it > back to the protocol handler to send down the socket; so by that time > even though the response is technically incorrect and the app has > failed, it should still not cause the VM any issues. > > The data is polled every 10 seconds by the client app so I would not > expect there to be 2GB of new data to send. I'm afraid my C skills are > somewhat limited, so I'm not sure how to put together a sample app to > try out writev. The platform is 64bit CentOS 6.3 (equivalent to RHEL > 6.3) so I'm not expecting any strange or weird behaviour from the OS > level but of course I could be completely wrong there. The OS is > running directly on hardware, so there's no VM layer to worry about. > > Hope this might offer some additional clues? > > Thanks again! > > Kind Regards, > > Peter Membrey > > > > On 24 November 2012 00:13, Patrik Nyblom wrote: >> Hi again! >> >> Could you go back to the version without the printouts and get back to the >> situation where writev loops returning 0 (as in the strace)? If so, it would >> be really interesting to see an 'lsof' of the beam process, to see if this >> file descriptor really is open and is a socket... >> >> The thing is that writev with a vector that is not empty, would never return >> 0 for a non blocking socket. Not on any modern (i.e. not ancient) POSIX >> compliant system anyway. 
Of course it is a *really* large item you are >> trying to write there, but it should be no problem for a 64-bit Linux. >> >> Also I think there is no use finding the Erlang code, I'll take that back. >> It would be more interesting to see what really happens at the OS/VM level >> in this case. >> >> Cheers, >> Patrik >> >> >> On 11/23/2012 01:49 AM, Loïc Hoguin wrote: >>> Sending this on behalf of someone who didn't manage to get the email sent >>> to this list after 2 attempts. If someone can check if he's held up or >>> something that'd be great. >>> >>> Anyway he has a big issue so I hope I can relay the conversation reliably. >>> >>> Thanks! >>> >>> On 11/23/2012 01:45 AM, Peter Membrey wrote: >>>> From: Peter Membrey >>>> Date: 22 November 2012 19:02 >>>> Subject: VM locks up on write to socket (and now it seems to file too) >>>> To: erlang-bugs@REDACTED >>>> >>>> >>>> Hi guys, >>>> >>>> I wrote a simple database application called CakeDB >>>> (https://github.com/pmembrey/cakedb) that basically spends its time >>>> reading and writing files and sockets. There's very little in the way >>>> of complex logic. It is running on CentOS 6.3 with all the updates >>>> applied. I hit this problem on R15B02 so I rolled back to R15B01 but >>>> the issue remained. Erlang was built from source. >>>> >>>> The machine has two Intel X5690 CPUs giving 12 cores plus HT. I've >>>> tried various arguments for the VM but so far nothing has prevented >>>> the problem. At the moment I'm using: >>>> >>>> +K >>>> +A 6 >>>> +sbt tnnps >>>> >>>> The issue I'm seeing is that one of the scheduler threads will hit >>>> 100% CPU usage and the entire VM will become unresponsive. When this >>>> happens, I am not able to connect via the console with attach and >>>> entop is also unable to connect. I can still establish TCP connections >>>> to the application, but I never receive a response. A standard kill >>>> signal will cause the VM to shut down (it doesn't need -9).
>>>> >>>> Due to the pedigree of the VM I am quite willing to accept that I've >>>> made a fundamental mistake in my code. I am pretty sure that the way I >>>> am doing the file IO could result in some race conditions. However, my >>>> poor code aside, from what I understand, I still shouldn't be able to >>>> crash / deadlock the VM like this. >>>> >>>> The issue doesn't seem to be caused by load. The app can fail when >>>> it's very busy, but also when it is practically idle. I haven't been >>>> able to find a trigger or any other explanation for the failure. >>>> >>>> The thread maxing out the CPU is attempting to write data to the socket: >>>> >>>> (gdb) bt >>>> #0 0x00007f9882ab6377 in writev () from /lib64/libc.so.6 >>>> #1 0x000000000058a81f in tcp_inet_output (data=0x2407570, >>>> event=<optimized out>) at drivers/common/inet_drv.c:9681 >>>> #2 tcp_inet_drv_output (data=0x2407570, event=<optimized out>) >>>> at drivers/common/inet_drv.c:9601 >>>> #3 0x00000000004b773f in erts_port_task_execute (runq=0x7f98826019c0, >>>> curr_port_pp=0x7f9881639338) at beam/erl_port_task.c:858 >>>> #4 0x00000000004afd83 in schedule (p=<optimized out>, >>>> calls=<optimized out>) at beam/erl_process.c:6533 >>>> #5 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>>> #6 0x00000000004b1279 in sched_thread_func (vesdp=0x7f9881639280) at >>>> beam/erl_process.c:4834 >>>> #7 0x00000000005ba726 in thr_wrapper (vtwd=0x7fff6cfe2300) at >>>> pthread/ethread.c:106 >>>> #8 0x00007f9882f78851 in start_thread () from /lib64/libpthread.so.0 >>>> #9 0x00007f9882abe11d in clone () from /lib64/libc.so.6 >>>> (gdb) >>>> >>>> I then tried running strace on that thread and got (indefinitely): >>>> >>>> writev(15, [{"", 2158022464}], 1) = 0 >>>> writev(15, [{"", 2158022464}], 1) = 0 >>>> writev(15, [{"", 2158022464}], 1) = 0 >>>> writev(15, [{"", 2158022464}], 1) = 0 >>>> writev(15, [{"", 2158022464}], 1) = 0 >>>> writev(15, [{"", 2158022464}], 1) = 0 >>>> writev(15, [{"", 2158022464}], 1) = 0 >>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>
writev(15, [{"", 2158022464}], 1) = 0 >>>> writev(15, [{"", 2158022464}], 1) = 0 >>>> ... >>>> >>>> From what I can tell, it's trying to write data to a socket, which is >>>> succeeding, but writing 0 bytes. From the earlier definitions in the >>>> source file, an error condition would be signified by a negative >>>> number. Any other result is the number of bytes written, in this case >>>> 0. I'm not sure if this is desired behaviour or not. I've tried >>>> killing the application on the other end of the socket, but it has no >>>> effect on the VM. >>>> >>>> I have enabled debugging for the inet code, so hopefully this will >>>> give a little more insight. I am currently trying to reproduce the >>>> condition, but as I really have no idea what causes it, it's pretty >>>> much a case of wait and see. >>>> >>>> >>>> **** UPDATE **** >>>> >>>> I managed to lock up the VM again, but this time it was caused by file >>>> IO, >>>> probably from the debugging statements. Although it worked fine for some >>>> time, >>>> the last entry in the file was cut off.
>>>> >>>> From GDB: >>>> >>>> (gdb) info threads >>>> 53 Thread 0x7f83e988b700 (LWP 8621) 0x00007f83ea6da54d in read () >>>> from /lib64/libpthread.so.0 >>>> 52 Thread 0x7f83e8c8f700 (LWP 8622) 0x00007f83ea6d743c in >>>> pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >>>> 51 Thread 0x7f83e818d700 (LWP 8623) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 50 Thread 0x7f83e816b700 (LWP 8624) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 49 Thread 0x7f83e8149700 (LWP 8625) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 48 Thread 0x7f83e8127700 (LWP 8626) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 47 Thread 0x7f83e8105700 (LWP 8627) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 46 Thread 0x7f83e80e3700 (LWP 8628) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 45 Thread 0x7f83e80c1700 (LWP 8629) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 44 Thread 0x7f83e809f700 (LWP 8630) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 43 Thread 0x7f83e807d700 (LWP 8631) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 42 Thread 0x7f83e805b700 (LWP 8632) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 41 Thread 0x7f83e8039700 (LWP 8633) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 40 Thread 0x7f83e8017700 (LWP 8634) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 39 Thread 0x7f83e7ff5700 (LWP 8635) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 38 Thread 0x7f83e7fd3700 (LWP 8636) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 37 Thread 0x7f83e7fb1700 (LWP 8637) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 36 Thread 0x7f83e7f8f700 (LWP 8638) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 35 Thread 0x7f83e7f6d700 (LWP 8639) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 
>>>> 34 Thread 0x7f83e7f4b700 (LWP 8640) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 33 Thread 0x7f83e7f29700 (LWP 8641) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 32 Thread 0x7f83e7f07700 (LWP 8642) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 31 Thread 0x7f83e7ee5700 (LWP 8643) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 30 Thread 0x7f83e7ec3700 (LWP 8644) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 29 Thread 0x7f83e7ea1700 (LWP 8645) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 28 Thread 0x7f83e7e7f700 (LWP 8646) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 27 Thread 0x7f83d7c5a700 (LWP 8647) 0x00007f83ea6db09d in waitpid >>>> () from /lib64/libpthread.so.0 >>>> 26 Thread 0x7f83d7c53700 (LWP 8648) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 25 Thread 0x7f83d7252700 (LWP 8649) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 24 Thread 0x7f83d6851700 (LWP 8650) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 23 Thread 0x7f83d5e50700 (LWP 8651) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 22 Thread 0x7f83d544f700 (LWP 8652) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 21 Thread 0x7f83d4a4e700 (LWP 8653) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 20 Thread 0x7f83d404d700 (LWP 8654) 0x00007f83ea20be7d in write () >>>> from /lib64/libc.so.6 >>>> 19 Thread 0x7f83d364c700 (LWP 8655) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 18 Thread 0x7f83d2c4b700 (LWP 8656) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 17 Thread 0x7f83d224a700 (LWP 8657) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 16 Thread 0x7f83d1849700 (LWP 8658) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 15 Thread 0x7f83d0e48700 (LWP 8659) 0x00007f83ea215ae9 in syscall >>>> 
() from /lib64/libc.so.6 >>>> 14 Thread 0x7f83d0447700 (LWP 8660) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 13 Thread 0x7f83cfa46700 (LWP 8661) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 12 Thread 0x7f83cf045700 (LWP 8662) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 11 Thread 0x7f83ce644700 (LWP 8663) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 10 Thread 0x7f83cdc43700 (LWP 8664) 0x00007f83ea215ae9 in syscall >>>> () from /lib64/libc.so.6 >>>> 9 Thread 0x7f83cd242700 (LWP 8665) 0x00007f83ea215ae9 in syscall () >>>> from /lib64/libc.so.6 >>>> 8 Thread 0x7f83cc841700 (LWP 8666) 0x00007f83ea215ae9 in syscall () >>>> from /lib64/libc.so.6 >>>> 7 Thread 0x7f83cbe40700 (LWP 8667) 0x00007f83ea215ae9 in syscall () >>>> from /lib64/libc.so.6 >>>> 6 Thread 0x7f83cb43f700 (LWP 8668) 0x00007f83ea215ae9 in syscall () >>>> from /lib64/libc.so.6 >>>> 5 Thread 0x7f83caa3e700 (LWP 8669) 0x00007f83ea215ae9 in syscall () >>>> from /lib64/libc.so.6 >>>> 4 Thread 0x7f83ca03d700 (LWP 8670) 0x00007f83ea215ae9 in syscall () >>>> from /lib64/libc.so.6 >>>> 3 Thread 0x7f83c963c700 (LWP 8671) 0x00007f83ea215ae9 in syscall () >>>> from /lib64/libc.so.6 >>>> 2 Thread 0x7f83c8c3b700 (LWP 8672) 0x00007f83ea215ae9 in syscall () >>>> from /lib64/libc.so.6 >>>> * 1 Thread 0x7f83eb3a8700 (LWP 8597) 0x00007f83ea211d03 in select () >>>> from /lib64/libc.so.6 >>>> (gdb) >>>> >>>> >>>> (gdb) bt >>>> #0 0x00007f83ea20be7d in write () from /lib64/libc.so.6 >>>> #1 0x00007f83ea1a2583 in _IO_new_file_write () from /lib64/libc.so.6 >>>> #2 0x00007f83ea1a3b35 in _IO_new_do_write () from /lib64/libc.so.6 >>>> #3 0x00007f83ea1a21fd in _IO_new_file_xsputn () from /lib64/libc.so.6 >>>> #4 0x00007f83ea17589d in vfprintf () from /lib64/libc.so.6 >>>> #5 0x00007f83ea18003a in printf () from /lib64/libc.so.6 >>>> #6 0x000000000058f0e8 in tcp_recv (desc=0x2c3d350, request_len=0) at >>>> drivers/common/inet_drv.c:8976 >>>> #7 
0x000000000058f63a in tcp_inet_input (data=0x2c3d350, event=<optimized out>) at drivers/common/inet_drv.c:9326 >>>> #8 tcp_inet_drv_input (data=0x2c3d350, event=<optimized out>) >>>> at drivers/common/inet_drv.c:9604 >>>> #9 0x00000000004b770f in erts_port_task_execute (runq=0x7f83e9d5d3c0, >>>> curr_port_pp=0x7f83e8dc6e78) at beam/erl_port_task.c:851 >>>> #10 0x00000000004afd83 in schedule (p=<optimized out>, >>>> calls=<optimized out>) at beam/erl_process.c:6533 >>>> #11 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>>> #12 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8dc6dc0) at >>>> beam/erl_process.c:4834 >>>> #13 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at >>>> pthread/ethread.c:106 >>>> #14 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >>>> #15 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>> (gdb) >>>> >>>> (gdb) bt >>>> #0 0x00007f83ea6da54d in read () from /lib64/libpthread.so.0 >>>> #1 0x0000000000554b6e in signal_dispatcher_thread_func (unused=<optimized out>) at sys/unix/sys.c:2776 >>>> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266c80) at >>>> pthread/ethread.c:106 >>>> #3 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >>>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>> (gdb) >>>> >>>> (gdb) bt >>>> #0 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6 >>>> #1 0x00000000005bba35 in wait__ (e=0x2989390) at >>>> pthread/ethr_event.c:92 >>>> #2 ethr_event_wait (e=0x2989390) at pthread/ethr_event.c:218 >>>> #3 0x00000000004ae5bd in erts_tse_wait (fcalls=<optimized out>, >>>> esdp=0x7f83e8e2c440, rq=0x7f83e9d5e7c0) at beam/erl_threads.h:2319 >>>> #4 scheduler_wait (fcalls=<optimized out>, esdp=0x7f83e8e2c440, >>>> rq=0x7f83e9d5e7c0) at beam/erl_process.c:2087 >>>> #5 0x00000000004afb94 in schedule (p=<optimized out>, >>>> calls=<optimized out>) at beam/erl_process.c:6467 >>>> #6 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>>> #7 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8e2c440) at >>>> beam/erl_process.c:4834
>>>> #8 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at >>>> pthread/ethread.c:106 >>>> #9 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >>>> #10 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>> (gdb) >>>> >>>> >>>> (gdb) bt >>>> #0 0x00007f83ea6db09d in waitpid () from /lib64/libpthread.so.0 >>>> #1 0x0000000000555a9f in child_waiter (unused=<optimized out>) >>>> at sys/unix/sys.c:2700 >>>> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266d50) at >>>> pthread/ethread.c:106 >>>> #3 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >>>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>> (gdb) >>>> >>>> >>>> **** END UPDATE **** >>>> >>>> >>>> I'm happy to provide any information I can, so please don't hesitate to >>>> ask. >>>> >>>> Thanks in advance! >>>> >>>> Kind Regards, >>>> >>>> Peter Membrey >>>> >>> >> _______________________________________________ >> erlang-bugs mailing list >> erlang-bugs@REDACTED >> http://erlang.org/mailman/listinfo/erlang-bugs From peter@REDACTED Wed Nov 28 17:23:08 2012 From: peter@REDACTED (Peter Membrey) Date: Thu, 29 Nov 2012 00:23:08 +0800 Subject: [erlang-bugs] VM locks up on write to socket (and now it seems to file too) In-Reply-To: <50B633C7.7000709@erlang.org> References: <50AEC81B.2000908@ninenines.eu> <50AFA09D.4060100@erlang.org> <50B633C7.7000709@erlang.org> Message-ID: Hi, No problem, I'll do what I can to help - thanks for looking into this so quickly! Any idea what might be causing it? Cheers, Pete On 28 November 2012 23:54, Patrik Nyblom wrote: > Hi! > > I'll upgrade the CentOS VM I have to 6.3 (only had 6.1 :() and see if I can > reproduce. If that fails, could you run a VM with a patch to try to handle > the unexpected case and see if that fixes it? > > Cheers, > /Patrik > > On 11/24/2012 02:57 PM, Peter Membrey wrote: >> >> Hi guys, >> >> Thanks for getting back in touch so quickly!
>> [...]
>>>>>
>>>>> Kind Regards,
>>>>>
>>>>> Peter Membrey
>>>>>
>>>>
>>> _______________________________________________
>>> erlang-bugs mailing list
>>> erlang-bugs@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-bugs
>
>

From aronisstav@REDACTED Wed Nov 28 17:50:24 2012
From: aronisstav@REDACTED (Stavros Aronis)
Date: Wed, 28 Nov 2012 17:50:24 +0100
Subject: [erlang-bugs] Fwd: exit(self(), normal) causes calling process to exit
In-Reply-To:
References:
Message-ID:

After some speculation on stackoverflow I think I will report this here as well. (I am directly copying the content of the question.)

I am playing around with the exit/2 function and its behavior when self() is used as a Pid and normal as a Reason.

Erlang R15B03 (erts-5.9.3) [source] [64-bit] [smp:8:8] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.9.3 (abort with ^G)
1> self().
<0.32.0>
2> exit(self(), normal).
** exception exit: normal
3> self().
<0.35.0>

Shouldn't it be the case that only a 'normal' exit message is sent to the shell process, so there is no reason to exit?

Similarly:

4> spawn(fun() -> receive Pid -> Pid ! ok end end).
<0.38.0>
5> exit(v(4), normal).
true
6> v(4) ! self().
<0.35.0>
7> flush().
Shell got ok
ok

But:

8> spawn(fun() -> exit(self(), normal), receive _ -> ok end end).
<0.43.0>
9> is_process_alive(v(8)).
false

From pan@REDACTED Wed Nov 28 18:23:35 2012
From: pan@REDACTED (Patrik Nyblom)
Date: Wed, 28 Nov 2012 18:23:35 +0100
Subject: [erlang-bugs] VM locks up on write to socket (and now it seems to file too)
In-Reply-To:
References: <50AEC81B.2000908@ninenines.eu> <50AFA09D.4060100@erlang.org> <50B633C7.7000709@erlang.org>
Message-ID: <50B64897.2050300@erlang.org>

Hi again!

No problem reproducing when I've got CentOS 6.3... The following commands in the Erlang shell:

{ok,L} = gen_tcp:listen(4747,[{active,false}]).
{ok,S} = gen_tcp:connect("localhost",4747,[{active,false}]).
{ok,A} = gen_tcp:accept(L).
gen_tcp:send(A,binary:copy(<<$a:8>>,2158022464)).

gives the following strace:

[pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
[pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
[pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
[pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
[pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
[pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
[pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
[pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
[pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
[pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
[pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 2158022464}], 1) = 0
[.....]

While on ubuntu for example it works like it should... Looks like a kernel bug to me... I wonder if this should be worked around or just reported... I suppose both... Sigh...

/Patrik

On 11/28/2012 05:23 PM, Peter Membrey wrote:
> Hi,
>
> No problem, I'll do what I can to help - thanks for looking into this
> so quickly!
>
> Any idea what might be causing it?
>
> Cheers,
>
> Pete
>
> On 28 November 2012 23:54, Patrik Nyblom wrote:
>> Hi!
>>
>> I'll upgrade the CentOS VM I have to 6.3 (only had 6.1 :() and see if I can
>> reproduce. If that fails, could you run a VM with a patch to try to handle
>> the unexpected case and see if that fixes it?
>>
>> Cheers,
>> /Patrik
>>
>> On 11/24/2012 02:57 PM, Peter Membrey wrote:
>>> Hi guys,
>>>
>>> Thanks for getting back in touch so quickly!
>>>
>>> I did do an lsof on the process and I can confirm that it was
>>> definitely a socket.
However by that time the application it had been >>> trying to send to had been killed. When I checked the sockets were >>> showing as waiting to close. Unfortunately I didn't think to do an >>> lsof until after the apps had been shut down. I was hoping the VM >>> would recover if I killed the app that had upset it. However even >>> after all the apps connected had been shut down, the issue didn't >>> resolve. >>> >>> The application receives requests from a client, which contains two >>> data items. The stream ID and a timestamp. Both are encoded as big >>> integer unsigned numbers. The server then looks through the file >>> referenced by the stream ID and uses the timestamp as an index. The >>> file format is currently really simple, in the form of: >>> >>> > >>> >>> There is an index file that provides an offset into the file based on >>> time stamp, but basically it opens the file, and reads sequentially >>> through it until it finds the timestamps that it cares about. In this >>> case it reads all data with a greater timestamp until the end of the >>> file is reached. It's possible the client is sending an incorrect >>> timestamp, and maybe too much data is being read. However the loop is >>> very primitive - it reads all the data in one go before passing it >>> back to the protocol handler to send down the socket; so by that time >>> even though the response is technically incorrect and the app has >>> failed, it should still not cause the VM any issues. >>> >>> The data is polled every 10 seconds by the client app so I would not >>> expect there to be 2GB of new data to send. I'm afraid my C skills are >>> somewhat limited, so I'm not sure how to put together a sample app to >>> try out writev. The platform is 64bit CentOS 6.3 (equivalent to RHEL >>> 6.3) so I'm not expecting any strange or weird behaviour from the OS >>> level but of course I could be completely wrong there. The OS is >>> running directly on hardware, so there's no VM layer to worry about. 
>>> >>> Hope this might offer some additional clues? >>> >>> Thanks again! >>> >>> Kind Regards, >>> >>> Peter Membrey >>> >>> >>> >>> On 24 November 2012 00:13, Patrik Nyblom wrote: >>>> Hi again! >>>> >>>> Could you go back to the version without the printouts and get back to >>>> the >>>> situation where writev loops returning 0 (as in the strace)? If so, it >>>> would >>>> be really interesting to see an 'lsof' of the beam process, to see if >>>> this >>>> file descriptor really is open and is a socket... >>>> >>>> The thing is that writev with a vector that is not empty, would never >>>> return >>>> 0 for a non blocking socket. Not on any modern (i.e. not ancient) POSIX >>>> compliant system anyway. Of course it is a *really* large item you are >>>> trying to write there, but it should be no problem for a 64bit linux. >>>> >>>> Also I think there is no use finding the Erlang code, I'll take that >>>> back, >>>> It would be more interesting to see what really happens at the OS/VM >>>> level >>>> in this case. >>>> >>>> Cheers, >>>> Patrik >>>> >>>> >>>> On 11/23/2012 01:49 AM, Lo?c Hoguin wrote: >>>>> Sending this on behalf of someone who didn't manage to get the email >>>>> sent >>>>> to this list after 2 attempts. If someone can check if he's hold up or >>>>> something that'd be great. >>>>> >>>>> Anyway he has a big issue so I hope I can relay the conversation >>>>> reliably. >>>>> >>>>> Thanks! >>>>> >>>>> On 11/23/2012 01:45 AM, Peter Membrey wrote: >>>>>> From: Peter Membrey >>>>>> Date: 22 November 2012 19:02 >>>>>> Subject: VM locks up on write to socket (and now it seems to file too) >>>>>> To: erlang-bugs@REDACTED >>>>>> >>>>>> >>>>>> Hi guys, >>>>>> >>>>>> I wrote a simple database application called CakeDB >>>>>> (https://github.com/pmembrey/cakedb) that basically spends its time >>>>>> reading and writing files and sockets. There's very little in the way >>>>>> of complex logic. It is running on CentOS 6.3 with all the updates >>>>>> applied. 
I hit this problem on R15B02 so I rolled back to R15B01 but >>>>>> the issue remained. Erlang was built from source. >>>>>> >>>>>> The machine has two Intel X5690 CPUs giving 12 cores plus HT. I've >>>>>> tried various arguments for the VM but so far nothing has prevented >>>>>> the problem. At the moment I'm using: >>>>>> >>>>>> +K >>>>>> +A 6 >>>>>> +sbt tnnps >>>>>> >>>>>> The issue I'm seeing is that one of the scheduler threads will hit >>>>>> 100% cpu usage and the entire VM will become unresponsive. When this >>>>>> happens, I am not able to connect via the console with attach and >>>>>> entop is also unable to connect. I can still establish TCP connections >>>>>> to the application, but I never receive a response. A standard kill >>>>>> signal will cause the VM to shut down (it doesn't need -9). >>>>>> >>>>>> Due to the pedigree of the VM I am quite willing to accept that I've >>>>>> made a fundamental mistake in my code. I am pretty sure that the way I >>>>>> am doing the file IO could result in some race conditions. However, my >>>>>> poor code aside, from what I understand, I still shouldn't be able to >>>>>> crash / deadlock the VM like this. >>>>>> >>>>>> The issue doesn't seem to be caused by load. The app can fail when >>>>>> it's very busy, but also when it is practically idle. I haven't been >>>>>> able to find a trigger or any other explanation for the failure. 
>>>>>> >>>>>> The thread maxing out the CPU is attempting to write data to the >>>>>> socket: >>>>>> >>>>>> (gdb) bt >>>>>> #0 0x00007f9882ab6377 in writev () from /lib64/libc.so.6 >>>>>> #1 0x000000000058a81f in tcp_inet_output (data=0x2407570, >>>>>> event=) at drivers/common/inet_drv.c:9681 >>>>>> #2 tcp_inet_drv_output (data=0x2407570, event=) >>>>>> at drivers/common/inet_drv.c:9601 >>>>>> #3 0x00000000004b773f in erts_port_task_execute (runq=0x7f98826019c0, >>>>>> curr_port_pp=0x7f9881639338) at beam/erl_port_task.c:858 >>>>>> #4 0x00000000004afd83 in schedule (p=, >>>>>> calls=) at beam/erl_process.c:6533 >>>>>> #5 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>>>>> #6 0x00000000004b1279 in sched_thread_func (vesdp=0x7f9881639280) at >>>>>> beam/erl_process.c:4834 >>>>>> #7 0x00000000005ba726 in thr_wrapper (vtwd=0x7fff6cfe2300) at >>>>>> pthread/ethread.c:106 >>>>>> #8 0x00007f9882f78851 in start_thread () from /lib64/libpthread.so.0 >>>>>> #9 0x00007f9882abe11d in clone () from /lib64/libc.so.6 >>>>>> (gdb) >>>>>> >>>>>> I then tried running strace on that thread and got (indefinitely): >>>>>> >>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>> ... >>>>>> >>>>>> From what I can tell, it's trying to write data to a socket, which is >>>>>> succeeding, but writing 0 bytes. From the earlier definitions in the >>>>>> source file, an error condition would be signified by a negative >>>>>> number. Any other result is the number of bytes written, in this case >>>>>> 0. I'm not sure if this is desired behaviour or not. 
I've tried >>>>>> killing the application on the other end of the socket, but it has no >>>>>> effect on the VM. >>>>>> >>>>>> I have enabled debugging for the inet code, so hopefully this will >>>>>> give a little more insight. I am currently trying to reproduce the >>>>>> condition, but as I really have no idea what causes it, it's pretty >>>>>> much a case of wait and see. >>>>>> >>>>>> >>>>>> **** UPDATE **** >>>>>> >>>>>> I managed to lock up the VM again, but this time it was caused by file >>>>>> IO, >>>>>> probably from the debugging statements. Although it worked fine for >>>>>> some >>>>>> time >>>>>> the last entry in the file was cut off. >>>>>> >>>>>> From GDB: >>>>>> >>>>>> (gdb) info threads >>>>>> 53 Thread 0x7f83e988b700 (LWP 8621) 0x00007f83ea6da54d in read () >>>>>> from /lib64/libpthread.so.0 >>>>>> 52 Thread 0x7f83e8c8f700 (LWP 8622) 0x00007f83ea6d743c in >>>>>> pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >>>>>> 51 Thread 0x7f83e818d700 (LWP 8623) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 50 Thread 0x7f83e816b700 (LWP 8624) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 49 Thread 0x7f83e8149700 (LWP 8625) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 48 Thread 0x7f83e8127700 (LWP 8626) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 47 Thread 0x7f83e8105700 (LWP 8627) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 46 Thread 0x7f83e80e3700 (LWP 8628) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 45 Thread 0x7f83e80c1700 (LWP 8629) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 44 Thread 0x7f83e809f700 (LWP 8630) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 43 Thread 0x7f83e807d700 (LWP 8631) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 42 Thread 0x7f83e805b700 (LWP 8632) 0x00007f83ea215ae9 in syscall >>>>>> () from 
/lib64/libc.so.6 >>>>>> 41 Thread 0x7f83e8039700 (LWP 8633) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 40 Thread 0x7f83e8017700 (LWP 8634) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 39 Thread 0x7f83e7ff5700 (LWP 8635) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 38 Thread 0x7f83e7fd3700 (LWP 8636) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 37 Thread 0x7f83e7fb1700 (LWP 8637) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 36 Thread 0x7f83e7f8f700 (LWP 8638) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 35 Thread 0x7f83e7f6d700 (LWP 8639) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 34 Thread 0x7f83e7f4b700 (LWP 8640) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 33 Thread 0x7f83e7f29700 (LWP 8641) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 32 Thread 0x7f83e7f07700 (LWP 8642) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 31 Thread 0x7f83e7ee5700 (LWP 8643) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 30 Thread 0x7f83e7ec3700 (LWP 8644) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 29 Thread 0x7f83e7ea1700 (LWP 8645) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 28 Thread 0x7f83e7e7f700 (LWP 8646) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 27 Thread 0x7f83d7c5a700 (LWP 8647) 0x00007f83ea6db09d in waitpid >>>>>> () from /lib64/libpthread.so.0 >>>>>> 26 Thread 0x7f83d7c53700 (LWP 8648) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 25 Thread 0x7f83d7252700 (LWP 8649) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 24 Thread 0x7f83d6851700 (LWP 8650) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 23 Thread 0x7f83d5e50700 (LWP 8651) 0x00007f83ea215ae9 in syscall >>>>>> () 
from /lib64/libc.so.6 >>>>>> 22 Thread 0x7f83d544f700 (LWP 8652) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 21 Thread 0x7f83d4a4e700 (LWP 8653) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 20 Thread 0x7f83d404d700 (LWP 8654) 0x00007f83ea20be7d in write () >>>>>> from /lib64/libc.so.6 >>>>>> 19 Thread 0x7f83d364c700 (LWP 8655) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 18 Thread 0x7f83d2c4b700 (LWP 8656) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 17 Thread 0x7f83d224a700 (LWP 8657) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 16 Thread 0x7f83d1849700 (LWP 8658) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 15 Thread 0x7f83d0e48700 (LWP 8659) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 14 Thread 0x7f83d0447700 (LWP 8660) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 13 Thread 0x7f83cfa46700 (LWP 8661) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 12 Thread 0x7f83cf045700 (LWP 8662) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 11 Thread 0x7f83ce644700 (LWP 8663) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 10 Thread 0x7f83cdc43700 (LWP 8664) 0x00007f83ea215ae9 in syscall >>>>>> () from /lib64/libc.so.6 >>>>>> 9 Thread 0x7f83cd242700 (LWP 8665) 0x00007f83ea215ae9 in syscall >>>>>> () >>>>>> from /lib64/libc.so.6 >>>>>> 8 Thread 0x7f83cc841700 (LWP 8666) 0x00007f83ea215ae9 in syscall >>>>>> () >>>>>> from /lib64/libc.so.6 >>>>>> 7 Thread 0x7f83cbe40700 (LWP 8667) 0x00007f83ea215ae9 in syscall >>>>>> () >>>>>> from /lib64/libc.so.6 >>>>>> 6 Thread 0x7f83cb43f700 (LWP 8668) 0x00007f83ea215ae9 in syscall >>>>>> () >>>>>> from /lib64/libc.so.6 >>>>>> 5 Thread 0x7f83caa3e700 (LWP 8669) 0x00007f83ea215ae9 in syscall >>>>>> () >>>>>> from /lib64/libc.so.6 >>>>>> 4 Thread 0x7f83ca03d700 (LWP 8670) 
0x00007f83ea215ae9 in syscall >>>>>> () >>>>>> from /lib64/libc.so.6 >>>>>> 3 Thread 0x7f83c963c700 (LWP 8671) 0x00007f83ea215ae9 in syscall >>>>>> () >>>>>> from /lib64/libc.so.6 >>>>>> 2 Thread 0x7f83c8c3b700 (LWP 8672) 0x00007f83ea215ae9 in syscall >>>>>> () >>>>>> from /lib64/libc.so.6 >>>>>> * 1 Thread 0x7f83eb3a8700 (LWP 8597) 0x00007f83ea211d03 in select () >>>>>> from /lib64/libc.so.6 >>>>>> (gdb) >>>>>> >>>>>> >>>>>> (gdb) bt >>>>>> #0 0x00007f83ea20be7d in write () from /lib64/libc.so.6 >>>>>> #1 0x00007f83ea1a2583 in _IO_new_file_write () from /lib64/libc.so.6 >>>>>> #2 0x00007f83ea1a3b35 in _IO_new_do_write () from /lib64/libc.so.6 >>>>>> #3 0x00007f83ea1a21fd in _IO_new_file_xsputn () from /lib64/libc.so.6 >>>>>> #4 0x00007f83ea17589d in vfprintf () from /lib64/libc.so.6 >>>>>> #5 0x00007f83ea18003a in printf () from /lib64/libc.so.6 >>>>>> #6 0x000000000058f0e8 in tcp_recv (desc=0x2c3d350, request_len=0) at >>>>>> drivers/common/inet_drv.c:8976 >>>>>> #7 0x000000000058f63a in tcp_inet_input (data=0x2c3d350, event=>>>>> optimized out>) at drivers/common/inet_drv.c:9326 >>>>>> #8 tcp_inet_drv_input (data=0x2c3d350, event=) >>>>>> at drivers/common/inet_drv.c:9604 >>>>>> #9 0x00000000004b770f in erts_port_task_execute (runq=0x7f83e9d5d3c0, >>>>>> curr_port_pp=0x7f83e8dc6e78) at beam/erl_port_task.c:851 >>>>>> #10 0x00000000004afd83 in schedule (p=, >>>>>> calls=) at beam/erl_process.c:6533 >>>>>> #11 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>>>>> #12 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8dc6dc0) at >>>>>> beam/erl_process.c:4834 >>>>>> #13 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at >>>>>> pthread/ethread.c:106 >>>>>> #14 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >>>>>> #15 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>> (gdb) >>>>>> >>>>>> (gdb) bt >>>>>> #0 0x00007f83ea6da54d in read () from /lib64/libpthread.so.0 >>>>>> #1 0x0000000000554b6e in 
signal_dispatcher_thread_func (unused=>>>>> optimized out>) at sys/unix/sys.c:2776 >>>>>> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266c80) at >>>>>> pthread/ethread.c:106 >>>>>> #3 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >>>>>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>> (gdb) >>>>>> >>>>>> (gdb) bt >>>>>> #0 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6 >>>>>> #1 0x00000000005bba35 in wait__ (e=0x2989390) at >>>>>> pthread/ethr_event.c:92 >>>>>> #2 ethr_event_wait (e=0x2989390) at pthread/ethr_event.c:218 >>>>>> #3 0x00000000004ae5bd in erts_tse_wait (fcalls=, >>>>>> esdp=0x7f83e8e2c440, rq=0x7f83e9d5e7c0) at beam/erl_threads.h:2319 >>>>>> #4 scheduler_wait (fcalls=, esdp=0x7f83e8e2c440, >>>>>> rq=0x7f83e9d5e7c0) at beam/erl_process.c:2087 >>>>>> #5 0x00000000004afb94 in schedule (p=, >>>>>> calls=) at beam/erl_process.c:6467 >>>>>> #6 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>>>>> #7 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8e2c440) at >>>>>> beam/erl_process.c:4834 >>>>>> #8 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at >>>>>> pthread/ethread.c:106 >>>>>> #9 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >>>>>> #10 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>> (gdb) >>>>>> >>>>>> >>>>>> (gdb) bt >>>>>> #0 0x00007f83ea6db09d in waitpid () from /lib64/libpthread.so.0 >>>>>> #1 0x0000000000555a9f in child_waiter (unused=) >>>>>> at sys/unix/sys.c:2700 >>>>>> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266d50) at >>>>>> pthread/ethread.c:106 >>>>>> #3 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >>>>>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>> (gdb) >>>>>> >>>>>> >>>>>> **** END UPDATE **** >>>>>> >>>>>> >>>>>> I'm happy to provide any information I can, so please don't hesitate to >>>>>> ask. >>>>>> >>>>>> Thanks in advance! 
>>>>>>
>>>>>> Kind Regards,
>>>>>>
>>>>>> Peter Membrey
>>>>>>
>>>> _______________________________________________
>>>> erlang-bugs mailing list
>>>> erlang-bugs@REDACTED
>>>> http://erlang.org/mailman/listinfo/erlang-bugs
>>

From daniel@REDACTED Wed Nov 28 19:07:52 2012
From: daniel@REDACTED (Daniel Luna)
Date: Wed, 28 Nov 2012 13:07:52 -0500
Subject: [erlang-bugs] Fwd: exit(self(), normal) causes calling process to exit
In-Reply-To:
References:
Message-ID:

I replied on StackOverflow, but the gist of the problem is that you don't trap exits.

1> self().
<0.32.0>
2> process_flag(trap_exit, true).
false
3> exit(self(), normal).
true
4> self().
<0.32.0>
5> flush().
Shell got {'EXIT',<0.32.0>,normal}
ok

Cheers,

Daniel

On 28 November 2012 11:50, Stavros Aronis wrote:
> After some speculation on stackoverflow I think I will report this here as
> well. (I am directly copying the content of the question.)
>
> I am playing around with the exit/2 function and its behavior when self() is
> used as a Pid and normal as a Reason.
>
> Erlang R15B03 (erts-5.9.3) [source] [64-bit] [smp:8:8] [async-threads:0]
> [hipe] [kernel-poll:false]
>
> Eshell V5.9.3 (abort with ^G)
> 1> self().
> <0.32.0>
> 2> exit(self(), normal).
> ** exception exit: normal
> 3> self().
> <0.35.0>
>
> Shouldn't it be the case that only a 'normal' exit message is sent to the
> shell process, so there is no reason to exit?
>
> Similarly:
>
> 4> spawn(fun() -> receive Pid -> Pid ! ok end end).
> <0.38.0>
> 5> exit(v(4), normal).
> true
> 6> v(4) ! self().
> <0.35.0>
> 7> flush().
> Shell got ok
> ok
>
> But:
>
> 8> spawn(fun() -> exit(self(), normal), receive _ -> ok end end).
> <0.43.0>
> 9> is_process_alive(v(8)).
> false
>
>
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs
>

From vinoski@REDACTED Wed Nov 28 19:27:58 2012
From: vinoski@REDACTED (Steve Vinoski)
Date: Wed, 28 Nov 2012 13:27:58 -0500
Subject: [erlang-bugs] SSL accept timeout broken in R15B03?
Message-ID:

In trying to verify Yaws under R15B03 I noticed it was failing its SSL accept timeout test, which works fine under previous Erlang/OTP versions.

Compile this module and run its start/0 function to reproduce the problem:

https://gist.github.com/4163038

The test does an SSL accept with a timeout, then does a TCP connect from a client which of course won't complete the handshake and should cause the timeout to kick in. Unfortunately the timeout doesn't occur.

--steve

From vinoski@REDACTED Wed Nov 28 20:47:22 2012
From: vinoski@REDACTED (Steve Vinoski)
Date: Wed, 28 Nov 2012 14:47:22 -0500
Subject: [erlang-bugs] SSL accept timeout broken in R15B03?
In-Reply-To:
References:
Message-ID:

On Wed, Nov 28, 2012 at 1:27 PM, Steve Vinoski wrote:
> In trying to verify Yaws under R15B03 I noticed it was failing its SSL
> accept timeout test, which works fine under previous Erlang/OTP versions.
>
> Compile this module and run its start/0 function to reproduce the problem:
>
> https://gist.github.com/4163038
>
> The test does an SSL accept with a timeout, then does a TCP connect from a
> client which of course won't complete the handshake and should cause the
> timeout to kick in.
>

Running a git bisect in the otp repo shows commit 8a789189 to be the culprit.

--steve

From daniel@REDACTED Wed Nov 28 20:50:49 2012
From: daniel@REDACTED (Daniel Luna)
Date: Wed, 28 Nov 2012 14:50:49 -0500
Subject: [erlang-bugs] Fwd: exit(self(), normal) causes calling process to exit
In-Reply-To:
References:
Message-ID:

I withdraw my comment. It's still true that it works when trapping exits, but apparently you shouldn't have to.

From the docs: "If Reason is the atom normal, Pid will not exit."

I call bug on this.

Cheers,

Daniel

On 28 November 2012 13:07, Daniel Luna wrote:
> I replied on StackOverflow, but the gist of the problem is that you
> don't trap exits.
>
> 1> self().
> <0.32.0>
> 2> process_flag(trap_exit, true).
> false
> 3> exit(self(), normal).
> true
> 4> self().
> <0.32.0>
> 5> flush().
> Shell got {'EXIT',<0.32.0>,normal}
> ok
>
> Cheers,
>
> Daniel
>
> On 28 November 2012 11:50, Stavros Aronis wrote:
>> After some speculation on stackoverflow I think I will report this here as
>> well. (I am directly copying the content of the question.)
>>
>> I am playing around with the exit/2 function and its behavior when self() is
>> used as a Pid and normal as a Reason.
>>
>> Erlang R15B03 (erts-5.9.3) [source] [64-bit] [smp:8:8] [async-threads:0]
>> [hipe] [kernel-poll:false]
>>
>> Eshell V5.9.3 (abort with ^G)
>> 1> self().
>> <0.32.0>
>> 2> exit(self(), normal).
>> ** exception exit: normal
>> 3> self().
>> <0.35.0>
>>
>> Shouldn't it be the case that only a 'normal' exit message is sent to the
>> shell process, so there is no reason to exit?
>>
>> Similarly:
>>
>> 4> spawn(fun() -> receive Pid -> Pid ! ok end end).
>> <0.38.0>
>> 5> exit(v(4), normal).
>> true
>> 6> v(4) ! self().
>> <0.35.0>
>> 7> flush().
>> Shell got ok
>> ok
>>
>> But:
>>
>> 8> spawn(fun() -> exit(self(), normal), receive _ -> ok end end).
>> <0.43.0>
>> 9> is_process_alive(v(8)).
>> false
>>
>>
>> _______________________________________________
>> erlang-bugs mailing list
>> erlang-bugs@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-bugs
>>

From tuncer.ayaz@REDACTED Thu Nov 29 00:22:15 2012
From: tuncer.ayaz@REDACTED (Tuncer Ayaz)
Date: Thu, 29 Nov 2012 00:22:15 +0100
Subject: [erlang-bugs] R15B03 PLT three unknown types
Message-ID:

Building a fairly complete PLT for R15B03, Dialyzer correctly reports three unknown types:

Unknown types:
  ct:hook_options/0
  inet:host_name/0
  ssl:sslsock/0

Grepping the tree for the types:

lib/common_test/src/ct_netconfc.erl: -export_type([hook_options/0,
lib/common_test/src/ct_netconfc.erl: -type hook_options() :: [hook_option()].
lib/common_test/src/ct_netconfc.erl: -type host() :: inet:host_name() | inet:ip_address().
lib/diameter/src/transport/diameter_tcp.erl: {socket :: inet:socket() | ssl:sslsock(), %% accept or connect socket0

From peter@REDACTED Thu Nov 29 04:41:42 2012
From: peter@REDACTED (Peter Membrey)
Date: Thu, 29 Nov 2012 11:41:42 +0800
Subject: [erlang-bugs] VM locks up on write to socket (and now it seems to file too)
In-Reply-To: <50B64897.2050300@erlang.org>
References: <50AEC81B.2000908@ninenines.eu> <50AFA09D.4060100@erlang.org> <50B633C7.7000709@erlang.org> <50B64897.2050300@erlang.org>
Message-ID:

Hi Patrik,

I can also confirm that this bug exists on Red Hat Enterprise Linux 6.3. I'll raise a support ticket with them as well.

A workaround in the VM would be nice if you have time? :-)

Cheers,

Pete

On 29 November 2012 01:23, Patrik Nyblom wrote:
> Hi again!
>
> No problem reproducing when I've got CentOS 6.3... The following commands in
> the Erlang shell:
> {ok,L} = gen_tcp:listen(4747,[{active,false}]).
> {ok,S} = gen_tcp:connect("localhost",4747,[{active,false}]).
> {ok,A} = gen_tcp:accept(L).
> gen_tcp:send(A,binary:copy(<<$a:8>>,2158022464)).
> > gives the following strace: > [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., > 2158022464}], 1) = 0 > [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., > 2158022464}], 1) = 0 > [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., > 2158022464}], 1) = 0 > [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., > 2158022464}], 1) = 0 > [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., > 2158022464}], 1) = 0 > [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., > 2158022464}], 1) = 0 > [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., > 2158022464}], 1) = 0 > [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., > 2158022464}], 1) = 0 > [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., > 2158022464}], 1) = 0 > [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., > 2158022464}], 1) = 0 > [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., > 2158022464}], 1) = 0 > [.....] > > While on ubuntu for example it works like it should...Looks like a kernel > bug to me... I wonder if this should be worked around or just reported... I > suppose both... Sigh... > > /Patrik > > > On 11/28/2012 05:23 PM, Peter Membrey wrote: >> >> Hi, >> >> No problem, I'll do what I can to help - thanks for looking into this >> so quickly! >> >> Any idea what might be causing it? >> >> Cheers, >> >> Pete >> >> On 28 November 2012 23:54, Patrik Nyblom wrote: >>> >>> Hi! >>> >>> I'll upgrade the CentOS VM I have to 6.3 (only had 6.1 :() and see if I >>> can >>> reproduce. If that fails, could you run a VM with a patch to try to >>> handle >>> the unexpected case and see if that fixes it? >>> >>> Cheers, >>> /Patrik >>> >>> On 11/24/2012 02:57 PM, Peter Membrey wrote: >>>> >>>> Hi guys, >>>> >>>> Thanks for getting back in touch so quickly! >>>> >>>> I did do an lsof on the process and I can confirm that it was >>>> definitely a socket. 
However by that time the application it had been >>>> trying to send to had been killed. When I checked the sockets were >>>> showing as waiting to close. Unfortunately I didn't think to do an >>>> lsof until after the apps had been shut down. I was hoping the VM >>>> would recover if I killed the app that had upset it. However even >>>> after all the apps connected had been shut down, the issue didn't >>>> resolve. >>>> >>>> The application receives requests from a client, which contains two >>>> data items. The stream ID and a timestamp. Both are encoded as big >>>> integer unsigned numbers. The server then looks through the file >>>> referenced by the stream ID and uses the timestamp as an index. The >>>> file format is currently really simple, in the form of: >>>> >>>> >>>> > >>>> >>>> There is an index file that provides an offset into the file based on >>>> time stamp, but basically it opens the file, and reads sequentially >>>> through it until it finds the timestamps that it cares about. In this >>>> case it reads all data with a greater timestamp until the end of the >>>> file is reached. It's possible the client is sending an incorrect >>>> timestamp, and maybe too much data is being read. However the loop is >>>> very primitive - it reads all the data in one go before passing it >>>> back to the protocol handler to send down the socket; so by that time >>>> even though the response is technically incorrect and the app has >>>> failed, it should still not cause the VM any issues. >>>> >>>> The data is polled every 10 seconds by the client app so I would not >>>> expect there to be 2GB of new data to send. I'm afraid my C skills are >>>> somewhat limited, so I'm not sure how to put together a sample app to >>>> try out writev. The platform is 64bit CentOS 6.3 (equivalent to RHEL >>>> 6.3) so I'm not expecting any strange or weird behaviour from the OS >>>> level but of course I could be completely wrong there. 
The OS is >>>> running directly on hardware, so there's no VM layer to worry about. >>>> >>>> Hope this might offer some additional clues? >>>> >>>> Thanks again! >>>> >>>> Kind Regards, >>>> >>>> Peter Membrey >>>> >>>> >>>> >>>> On 24 November 2012 00:13, Patrik Nyblom wrote: >>>>> >>>>> Hi again! >>>>> >>>>> Could you go back to the version without the printouts and get back to >>>>> the >>>>> situation where writev loops returning 0 (as in the strace)? If so, it >>>>> would >>>>> be really interesting to see an 'lsof' of the beam process, to see if >>>>> this >>>>> file descriptor really is open and is a socket... >>>>> >>>>> The thing is that writev with a vector that is not empty, would never >>>>> return >>>>> 0 for a non blocking socket. Not on any modern (i.e. not ancient) POSIX >>>>> compliant system anyway. Of course it is a *really* large item you are >>>>> trying to write there, but it should be no problem for a 64bit linux. >>>>> >>>>> Also I think there is no use finding the Erlang code, I'll take that >>>>> back, >>>>> It would be more interesting to see what really happens at the OS/VM >>>>> level >>>>> in this case. >>>>> >>>>> Cheers, >>>>> Patrik >>>>> >>>>> >>>>> On 11/23/2012 01:49 AM, Lo?c Hoguin wrote: >>>>>> >>>>>> Sending this on behalf of someone who didn't manage to get the email >>>>>> sent >>>>>> to this list after 2 attempts. If someone can check if he's hold up or >>>>>> something that'd be great. >>>>>> >>>>>> Anyway he has a big issue so I hope I can relay the conversation >>>>>> reliably. >>>>>> >>>>>> Thanks! 
>>>>>> >>>>>> On 11/23/2012 01:45 AM, Peter Membrey wrote: >>>>>>> >>>>>>> From: Peter Membrey >>>>>>> Date: 22 November 2012 19:02 >>>>>>> Subject: VM locks up on write to socket (and now it seems to file >>>>>>> too) >>>>>>> To: erlang-bugs@REDACTED >>>>>>> >>>>>>> >>>>>>> Hi guys, >>>>>>> >>>>>>> I wrote a simple database application called CakeDB >>>>>>> (https://github.com/pmembrey/cakedb) that basically spends its time >>>>>>> reading and writing files and sockets. There's very little in the way >>>>>>> of complex logic. It is running on CentOS 6.3 with all the updates >>>>>>> applied. I hit this problem on R15B02 so I rolled back to R15B01 but >>>>>>> the issue remained. Erlang was built from source. >>>>>>> >>>>>>> The machine has two Intel X5690 CPUs giving 12 cores plus HT. I've >>>>>>> tried various arguments for the VM but so far nothing has prevented >>>>>>> the problem. At the moment I'm using: >>>>>>> >>>>>>> +K >>>>>>> +A 6 >>>>>>> +sbt tnnps >>>>>>> >>>>>>> The issue I'm seeing is that one of the scheduler threads will hit >>>>>>> 100% cpu usage and the entire VM will become unresponsive. When this >>>>>>> happens, I am not able to connect via the console with attach and >>>>>>> entop is also unable to connect. I can still establish TCP >>>>>>> connections >>>>>>> to the application, but I never receive a response. A standard kill >>>>>>> signal will cause the VM to shut down (it doesn't need -9). >>>>>>> >>>>>>> Due to the pedigree of the VM I am quite willing to accept that I've >>>>>>> made a fundamental mistake in my code. I am pretty sure that the way >>>>>>> I >>>>>>> am doing the file IO could result in some race conditions. However, >>>>>>> my >>>>>>> poor code aside, from what I understand, I still shouldn't be able to >>>>>>> crash / deadlock the VM like this. >>>>>>> >>>>>>> The issue doesn't seem to be caused by load. The app can fail when >>>>>>> it's very busy, but also when it is practically idle. 
I haven't been >>>>>>> able to find a trigger or any other explanation for the failure. >>>>>>> >>>>>>> The thread maxing out the CPU is attempting to write data to the >>>>>>> socket: >>>>>>> >>>>>>> (gdb) bt >>>>>>> #0 0x00007f9882ab6377 in writev () from /lib64/libc.so.6 >>>>>>> #1 0x000000000058a81f in tcp_inet_output (data=0x2407570, >>>>>>> event=) at drivers/common/inet_drv.c:9681 >>>>>>> #2 tcp_inet_drv_output (data=0x2407570, event=) >>>>>>> at drivers/common/inet_drv.c:9601 >>>>>>> #3 0x00000000004b773f in erts_port_task_execute >>>>>>> (runq=0x7f98826019c0, >>>>>>> curr_port_pp=0x7f9881639338) at beam/erl_port_task.c:858 >>>>>>> #4 0x00000000004afd83 in schedule (p=, >>>>>>> calls=) at beam/erl_process.c:6533 >>>>>>> #5 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>>>>>> #6 0x00000000004b1279 in sched_thread_func (vesdp=0x7f9881639280) at >>>>>>> beam/erl_process.c:4834 >>>>>>> #7 0x00000000005ba726 in thr_wrapper (vtwd=0x7fff6cfe2300) at >>>>>>> pthread/ethread.c:106 >>>>>>> #8 0x00007f9882f78851 in start_thread () from /lib64/libpthread.so.0 >>>>>>> #9 0x00007f9882abe11d in clone () from /lib64/libc.so.6 >>>>>>> (gdb) >>>>>>> >>>>>>> I then tried running strace on that thread and got (indefinitely): >>>>>>> >>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>> ... >>>>>>> >>>>>>> From what I can tell, it's trying to write data to a socket, which >>>>>>> is >>>>>>> succeeding, but writing 0 bytes. From the earlier definitions in the >>>>>>> source file, an error condition would be signified by a negative >>>>>>> number. 
Any other result is the number of bytes written, in this case >>>>>>> 0. I'm not sure if this is desired behaviour or not. I've tried >>>>>>> killing the application on the other end of the socket, but it has no >>>>>>> effect on the VM. >>>>>>> >>>>>>> I have enabled debugging for the inet code, so hopefully this will >>>>>>> give a little more insight. I am currently trying to reproduce the >>>>>>> condition, but as I really have no idea what causes it, it's pretty >>>>>>> much a case of wait and see. >>>>>>> >>>>>>> >>>>>>> **** UPDATE **** >>>>>>> >>>>>>> I managed to lock up the VM again, but this time it was caused by >>>>>>> file >>>>>>> IO, >>>>>>> probably from the debugging statements. Although it worked fine for >>>>>>> some >>>>>>> time >>>>>>> the last entry in the file was cut off. >>>>>>> >>>>>>> From GDB: >>>>>>> >>>>>>> (gdb) info threads >>>>>>> 53 Thread 0x7f83e988b700 (LWP 8621) 0x00007f83ea6da54d in read >>>>>>> () >>>>>>> from /lib64/libpthread.so.0 >>>>>>> 52 Thread 0x7f83e8c8f700 (LWP 8622) 0x00007f83ea6d743c in >>>>>>> pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >>>>>>> 51 Thread 0x7f83e818d700 (LWP 8623) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 50 Thread 0x7f83e816b700 (LWP 8624) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 49 Thread 0x7f83e8149700 (LWP 8625) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 48 Thread 0x7f83e8127700 (LWP 8626) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 47 Thread 0x7f83e8105700 (LWP 8627) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 46 Thread 0x7f83e80e3700 (LWP 8628) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 45 Thread 0x7f83e80c1700 (LWP 8629) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 44 Thread 0x7f83e809f700 (LWP 8630) 0x00007f83ea215ae9 in 
>>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 43 Thread 0x7f83e807d700 (LWP 8631) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 42 Thread 0x7f83e805b700 (LWP 8632) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 41 Thread 0x7f83e8039700 (LWP 8633) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 40 Thread 0x7f83e8017700 (LWP 8634) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 39 Thread 0x7f83e7ff5700 (LWP 8635) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 38 Thread 0x7f83e7fd3700 (LWP 8636) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 37 Thread 0x7f83e7fb1700 (LWP 8637) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 36 Thread 0x7f83e7f8f700 (LWP 8638) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 35 Thread 0x7f83e7f6d700 (LWP 8639) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 34 Thread 0x7f83e7f4b700 (LWP 8640) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 33 Thread 0x7f83e7f29700 (LWP 8641) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 32 Thread 0x7f83e7f07700 (LWP 8642) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 31 Thread 0x7f83e7ee5700 (LWP 8643) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 30 Thread 0x7f83e7ec3700 (LWP 8644) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 29 Thread 0x7f83e7ea1700 (LWP 8645) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 28 Thread 0x7f83e7e7f700 (LWP 8646) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 27 Thread 0x7f83d7c5a700 (LWP 8647) 0x00007f83ea6db09d in >>>>>>> waitpid >>>>>>> () from 
/lib64/libpthread.so.0 >>>>>>> 26 Thread 0x7f83d7c53700 (LWP 8648) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 25 Thread 0x7f83d7252700 (LWP 8649) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 24 Thread 0x7f83d6851700 (LWP 8650) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 23 Thread 0x7f83d5e50700 (LWP 8651) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 22 Thread 0x7f83d544f700 (LWP 8652) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 21 Thread 0x7f83d4a4e700 (LWP 8653) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 20 Thread 0x7f83d404d700 (LWP 8654) 0x00007f83ea20be7d in write >>>>>>> () >>>>>>> from /lib64/libc.so.6 >>>>>>> 19 Thread 0x7f83d364c700 (LWP 8655) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 18 Thread 0x7f83d2c4b700 (LWP 8656) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 17 Thread 0x7f83d224a700 (LWP 8657) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 16 Thread 0x7f83d1849700 (LWP 8658) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 15 Thread 0x7f83d0e48700 (LWP 8659) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 14 Thread 0x7f83d0447700 (LWP 8660) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 13 Thread 0x7f83cfa46700 (LWP 8661) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 12 Thread 0x7f83cf045700 (LWP 8662) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 11 Thread 0x7f83ce644700 (LWP 8663) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 10 Thread 0x7f83cdc43700 (LWP 8664) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () from /lib64/libc.so.6 >>>>>>> 9 Thread 
0x7f83cd242700 (LWP 8665) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () >>>>>>> from /lib64/libc.so.6 >>>>>>> 8 Thread 0x7f83cc841700 (LWP 8666) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () >>>>>>> from /lib64/libc.so.6 >>>>>>> 7 Thread 0x7f83cbe40700 (LWP 8667) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () >>>>>>> from /lib64/libc.so.6 >>>>>>> 6 Thread 0x7f83cb43f700 (LWP 8668) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () >>>>>>> from /lib64/libc.so.6 >>>>>>> 5 Thread 0x7f83caa3e700 (LWP 8669) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () >>>>>>> from /lib64/libc.so.6 >>>>>>> 4 Thread 0x7f83ca03d700 (LWP 8670) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () >>>>>>> from /lib64/libc.so.6 >>>>>>> 3 Thread 0x7f83c963c700 (LWP 8671) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () >>>>>>> from /lib64/libc.so.6 >>>>>>> 2 Thread 0x7f83c8c3b700 (LWP 8672) 0x00007f83ea215ae9 in >>>>>>> syscall >>>>>>> () >>>>>>> from /lib64/libc.so.6 >>>>>>> * 1 Thread 0x7f83eb3a8700 (LWP 8597) 0x00007f83ea211d03 in select () >>>>>>> from /lib64/libc.so.6 >>>>>>> (gdb) >>>>>>> >>>>>>> >>>>>>> (gdb) bt >>>>>>> #0 0x00007f83ea20be7d in write () from /lib64/libc.so.6 >>>>>>> #1 0x00007f83ea1a2583 in _IO_new_file_write () from /lib64/libc.so.6 >>>>>>> #2 0x00007f83ea1a3b35 in _IO_new_do_write () from /lib64/libc.so.6 >>>>>>> #3 0x00007f83ea1a21fd in _IO_new_file_xsputn () from >>>>>>> /lib64/libc.so.6 >>>>>>> #4 0x00007f83ea17589d in vfprintf () from /lib64/libc.so.6 >>>>>>> #5 0x00007f83ea18003a in printf () from /lib64/libc.so.6 >>>>>>> #6 0x000000000058f0e8 in tcp_recv (desc=0x2c3d350, request_len=0) at >>>>>>> drivers/common/inet_drv.c:8976 >>>>>>> #7 0x000000000058f63a in tcp_inet_input (data=0x2c3d350, >>>>>>> event=>>>>>> optimized out>) at drivers/common/inet_drv.c:9326 >>>>>>> #8 tcp_inet_drv_input (data=0x2c3d350, event=) >>>>>>> at drivers/common/inet_drv.c:9604 >>>>>>> #9 0x00000000004b770f in erts_port_task_execute >>>>>>> (runq=0x7f83e9d5d3c0, >>>>>>> 
curr_port_pp=0x7f83e8dc6e78) at beam/erl_port_task.c:851 >>>>>>> #10 0x00000000004afd83 in schedule (p=, >>>>>>> calls=) at beam/erl_process.c:6533 >>>>>>> #11 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>>>>>> #12 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8dc6dc0) at >>>>>>> beam/erl_process.c:4834 >>>>>>> #13 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at >>>>>>> pthread/ethread.c:106 >>>>>>> #14 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >>>>>>> #15 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>>> (gdb) >>>>>>> >>>>>>> (gdb) bt >>>>>>> #0 0x00007f83ea6da54d in read () from /lib64/libpthread.so.0 >>>>>>> #1 0x0000000000554b6e in signal_dispatcher_thread_func >>>>>>> (unused=>>>>>> optimized out>) at sys/unix/sys.c:2776 >>>>>>> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266c80) at >>>>>>> pthread/ethread.c:106 >>>>>>> #3 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >>>>>>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>>> (gdb) >>>>>>> >>>>>>> (gdb) bt >>>>>>> #0 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6 >>>>>>> #1 0x00000000005bba35 in wait__ (e=0x2989390) at >>>>>>> pthread/ethr_event.c:92 >>>>>>> #2 ethr_event_wait (e=0x2989390) at pthread/ethr_event.c:218 >>>>>>> #3 0x00000000004ae5bd in erts_tse_wait (fcalls=>>>>>> out>, >>>>>>> esdp=0x7f83e8e2c440, rq=0x7f83e9d5e7c0) at beam/erl_threads.h:2319 >>>>>>> #4 scheduler_wait (fcalls=, >>>>>>> esdp=0x7f83e8e2c440, >>>>>>> rq=0x7f83e9d5e7c0) at beam/erl_process.c:2087 >>>>>>> #5 0x00000000004afb94 in schedule (p=, >>>>>>> calls=) at beam/erl_process.c:6467 >>>>>>> #6 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>>>>>> #7 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8e2c440) at >>>>>>> beam/erl_process.c:4834 >>>>>>> #8 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at >>>>>>> pthread/ethread.c:106 >>>>>>> #9 0x00007f83ea6d3851 in start_thread () 
from /lib64/libpthread.so.0 >>>>>>> #10 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>>> (gdb) >>>>>>> >>>>>>> >>>>>>> (gdb) bt >>>>>>> #0 0x00007f83ea6db09d in waitpid () from /lib64/libpthread.so.0 >>>>>>> #1 0x0000000000555a9f in child_waiter (unused=) >>>>>>> at sys/unix/sys.c:2700 >>>>>>> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266d50) at >>>>>>> pthread/ethread.c:106 >>>>>>> #3 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >>>>>>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>>> (gdb) >>>>>>> >>>>>>> >>>>>>> **** END UPDATE **** >>>>>>> >>>>>>> >>>>>>> I'm happy to provide any information I can, so please don't hesitate >>>>>>> to >>>>>>> ask. >>>>>>> >>>>>>> Thanks in advance! >>>>>>> >>>>>>> Kind Regards, >>>>>>> >>>>>>> Peter Membrey >>>>>>> >>>>> _______________________________________________ >>>>> erlang-bugs mailing list >>>>> erlang-bugs@REDACTED >>>>> http://erlang.org/mailman/listinfo/erlang-bugs >>> >>> > From norton@REDACTED Thu Nov 29 07:26:38 2012 From: norton@REDACTED (Joseph Wayne Norton) Date: Thu, 29 Nov 2012 15:26:38 +0900 Subject: [erlang-bugs] http://www.erlang.org/faq/academic.html - section 10.23 with respect to shared heap needs correction? Message-ID: I noticed the last statement of section 10.23 with respect to shared heap. It seems this feature is no longer supported or is undocumented? If it is still present, could you point me to the appropriate documentation? regards, Joe N. http://www.erlang.org/faq/academic.html 10.23 How does the Garbage Collector work? The current default GC is a "stop the world" generational mark-sweep collector. Each Erlang process has its own heap and these are collected individually, so although every process is stopped while GC happens for one processes, this stop time is expected to be short because each process is expected to have a small heap. The GC for a new process is full-sweep. 
Once the process' live data grows above a certain size, the GC switches to a generational strategy. If the generational strategy reclaims less than a certain amount, the GC reverts to a full sweep. If the full sweep also fails to recover enough space, then the heap size is increased.

In practice, this works quite well. It scales well because larger systems tend to have more processes rather than (just) larger processes. Measurements in AXD301 (the large ATM switch) showed that about 5% of CPU time is spent garbage collecting. Problems arise when the assumptions are violated, e.g. having processes with rapidly growing large heaps.

There are some alternative approaches to memory management which can be enabled at run-time, including a shared heap.

From Ingela.Anderton.Andin@REDACTED Thu Nov 29 11:16:06 2012
From: Ingela.Anderton.Andin@REDACTED (Ingela Anderton Andin)
Date: Thu, 29 Nov 2012 11:16:06 +0100
Subject: [erlang-bugs] SSL accept timeout broken in R15B03?
In-Reply-To:
References:
Message-ID: <50B735E6.4070406@ericsson.com>

Hi Steve!

There is a missing function clause to handle the ssl:ssl_accept-timeout so alas it was treated as a canceled timeout. I failed to realize that we needed a special test case for the accept case when I solved the problem with client side timeouts for ssl:recv.
The client side timeout is a problem for accept/connect too and is solved by the same mechanism, with the only difference being the following clause:

index 87cf49d..102dd4a 100644
--- a/lib/ssl/src/ssl_connection.erl
+++ b/lib/ssl/src/ssl_connection.erl
@@ -1001,6 +1001,10 @@ handle_info({cancel_start_or_recv, RecvFrom}, connection = StateName, #state{sta
     gen_fsm:reply(RecvFrom, {error, timeout}),
     {next_state, StateName, State#state{start_or_recv_from = undefined}, get_timeout(State)};
+handle_info({cancel_start_or_recv, RecvFrom}, StateName, State) when connection =/= StateName ->
+    gen_fsm:reply(RecvFrom, {error, timeout}),
+    {next_state, StateName, State#state{start_or_recv_from = undefined}, get_timeout(State)};
+
 handle_info({cancel_start_or_recv, _RecvFrom}, StateName, State) ->
     {next_state, StateName, State, get_timeout(State)};

Thank you for reporting this and I will make your test into a test case.

Regards Ingela
Erlang/OTP team - Ericsson AB

Steve Vinoski wrote:
> On Wed, Nov 28, 2012 at 1:27 PM, Steve Vinoski wrote:
>
> In trying to verify Yaws under R15B03 I noticed it was failing its
> SSL accept timeout test, which works fine under previous Erlang/OTP
> versions.
>
> Compile this module and run its start/0 function to reproduce the
> problem:
>
> https://gist.github.com/4163038
>
> The test does an SSL accept with a timeout, then does a TCP connect
> from a client which of course won't complete the handshake and
> should cause the timeout to kick in. Unfortunately the timeout
> doesn't occur.
>
> Running a git bisect in the otp repo shows commit 8a789189 to be the
> > --steve > > > ------------------------------------------------------------------------ > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs From pan@REDACTED Thu Nov 29 19:10:02 2012 From: pan@REDACTED (Patrik Nyblom) Date: Thu, 29 Nov 2012 19:10:02 +0100 Subject: [erlang-bugs] Fwd: exit(self(), normal) causes calling process to exit In-Reply-To: References: Message-ID: <50B7A4FA.5010401@erlang.org> On 11/28/2012 08:50 PM, Daniel Luna wrote: > I withdraw my comment. It's still true that it works when trapping > exits, but apparently you shouldn't have to. > > From the docs: > > "If Reason is the atom normal, Pid will not exit." > > I call bug on this. I agree. It's in the pipe. > > Cheers, > > Daniel Cheers, /Patrik > > On 28 November 2012 13:07, Daniel Luna wrote: >> I replied on StackOverflow, but the gist of the problem is that you >> don't trap exits. >> >> 1> self(). >> <0.32.0> >> 2> process_flag(trap_exit, true). >> false >> 3> exit(self(), normal). >> true >> 4> self(). >> <0.32.0> >> 5> flush(). >> Shell got {'EXIT',<0.32.0>,normal} >> ok >> >> Cheers, >> >> Daniel >> >> On 28 November 2012 11:50, Stavros Aronis wrote: >>> After some speculation on stackoverflow I think I will report this here as >>> well. (I am directly copying the content of the question.) >>> >>> I am playing around with the exit/2 function and its behavior when self() is >>> used as a Pid and normal as a Reason. >>> >>> Erlang R15B03 (erts-5.9.3) [source] [64-bit] [smp:8:8] [async-threads:0] >>> [hipe] [kernel-poll:false] >>> >>> Eshell V5.9.3 (abort with ^G) >>> 1> self(). >>> <0.32.0> >>> 2> exit(self(), normal). >>> ** exception exit: normal >>> 3> self(). >>> <0.35.0> >>> >>> Shouldn't it be the case that only a 'normal' exit message is sent to the >>> shell process, so there is no reason to exit? >>> >>> Similarly: >>> >>> 4> spawn(fun() -> receive Pid -> Pid ! ok end end). 
>>> <0.38.0> >>> 5> exit(v(4), normal). >>> true >>> 6> v(4) ! self(). >>> <0.35.0> >>> 7> flush(). >>> Shell got ok >>> ok >>> >>> But: >>> >>> 8> spawn(fun() -> exit(self(), normal), receive _ -> ok end end). >>> <0.43.0> >>> 9> is_process_alive(v(8)). >>> false >>> >>> >>> _______________________________________________ >>> erlang-bugs mailing list >>> erlang-bugs@REDACTED >>> http://erlang.org/mailman/listinfo/erlang-bugs >>> > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://erlang.org/mailman/listinfo/erlang-bugs From pan@REDACTED Thu Nov 29 19:13:28 2012 From: pan@REDACTED (Patrik Nyblom) Date: Thu, 29 Nov 2012 19:13:28 +0100 Subject: [erlang-bugs] VM locks up on write to socket (and now it seems to file too) In-Reply-To: References: <50AEC81B.2000908@ninenines.eu> <50AFA09D.4060100@erlang.org> <50B633C7.7000709@erlang.org> <50B64897.2050300@erlang.org> Message-ID: <50B7A5C8.2020604@erlang.org> Hi! I'm not sure if it's all that easy, I'll write some smaller programs to verify if something was actually written when this happened, in which case we have a problem and need to handle this "before it happens" so to say, otherwise there might be an easier way out, I'll get back to you when I know more! Cheers, /Patrik On 11/29/2012 04:41 AM, Peter Membrey wrote: > Hi Patrik, > > I can also confirm that this bug exists on Red Hat Enterprise Linux > 6.3. I'll raise a support ticket with them as well. > > A workaround in the vm would be nice if you have time? :-) > > Cheers, > > Pete > > > On 29 November 2012 01:23, Patrik Nyblom wrote: >> Hi again! >> >> No problem reproducing when I've got CentOS 6.3... The following commands in >> the Erlang shell: >> {ok,L} = gen_tcp:listen(4747,[{active,false}]). >> {ok,S} = gen_tcp:connect("localhost",4747,[{active,false}]). >> {ok,A} = gen_tcp:accept(L). >> gen_tcp:send(A,binary:copy(<<$a:8>>,2158022464)). 
>> >> gives the following strace: >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [.....] >> >> While on ubuntu for example it works like it should...Looks like a kernel >> bug to me... I wonder if this should be worked around or just reported... I >> suppose both... Sigh... >> >> /Patrik >> >> >> On 11/28/2012 05:23 PM, Peter Membrey wrote: >>> Hi, >>> >>> No problem, I'll do what I can to help - thanks for looking into this >>> so quickly! >>> >>> Any idea what might be causing it? >>> >>> Cheers, >>> >>> Pete >>> >>> On 28 November 2012 23:54, Patrik Nyblom wrote: >>>> Hi! >>>> >>>> I'll upgrade the CentOS VM I have to 6.3 (only had 6.1 :() and see if I >>>> can >>>> reproduce. If that fails, could you run a VM with a patch to try to >>>> handle >>>> the unexpected case and see if that fixes it? >>>> >>>> Cheers, >>>> /Patrik >>>> >>>> On 11/24/2012 02:57 PM, Peter Membrey wrote: >>>>> Hi guys, >>>>> >>>>> Thanks for getting back in touch so quickly! 
>>>>> >>>>> I did do an lsof on the process and I can confirm that it was >>>>> definitely a socket. However by that time the application it had been >>>>> trying to send to had been killed. When I checked the sockets were >>>>> showing as waiting to close. Unfortunately I didn't think to do an >>>>> lsof until after the apps had been shut down. I was hoping the VM >>>>> would recover if I killed the app that had upset it. However even >>>>> after all the apps connected had been shut down, the issue didn't >>>>> resolve. >>>>> >>>>> The application receives requests from a client, which contains two >>>>> data items. The stream ID and a timestamp. Both are encoded as big >>>>> integer unsigned numbers. The server then looks through the file >>>>> referenced by the stream ID and uses the timestamp as an index. The >>>>> file format is currently really simple, in the form of: >>>>> >>>>> >>>>> > >>>>> >>>>> There is an index file that provides an offset into the file based on >>>>> time stamp, but basically it opens the file, and reads sequentially >>>>> through it until it finds the timestamps that it cares about. In this >>>>> case it reads all data with a greater timestamp until the end of the >>>>> file is reached. It's possible the client is sending an incorrect >>>>> timestamp, and maybe too much data is being read. However the loop is >>>>> very primitive - it reads all the data in one go before passing it >>>>> back to the protocol handler to send down the socket; so by that time >>>>> even though the response is technically incorrect and the app has >>>>> failed, it should still not cause the VM any issues. >>>>> >>>>> The data is polled every 10 seconds by the client app so I would not >>>>> expect there to be 2GB of new data to send. I'm afraid my C skills are >>>>> somewhat limited, so I'm not sure how to put together a sample app to >>>>> try out writev. 
The platform is 64bit CentOS 6.3 (equivalent to RHEL >>>>> 6.3) so I'm not expecting any strange or weird behaviour from the OS >>>>> level but of course I could be completely wrong there. The OS is >>>>> running directly on hardware, so there's no VM layer to worry about. >>>>> >>>>> Hope this might offer some additional clues? >>>>> >>>>> Thanks again! >>>>> >>>>> Kind Regards, >>>>> >>>>> Peter Membrey >>>>> >>>>> >>>>> >>>>> On 24 November 2012 00:13, Patrik Nyblom wrote: >>>>>> Hi again! >>>>>> >>>>>> Could you go back to the version without the printouts and get back to >>>>>> the >>>>>> situation where writev loops returning 0 (as in the strace)? If so, it >>>>>> would >>>>>> be really interesting to see an 'lsof' of the beam process, to see if >>>>>> this >>>>>> file descriptor really is open and is a socket... >>>>>> >>>>>> The thing is that writev with a vector that is not empty, would never >>>>>> return >>>>>> 0 for a non blocking socket. Not on any modern (i.e. not ancient) POSIX >>>>>> compliant system anyway. Of course it is a *really* large item you are >>>>>> trying to write there, but it should be no problem for a 64bit linux. >>>>>> >>>>>> Also I think there is no use finding the Erlang code, I'll take that >>>>>> back, >>>>>> It would be more interesting to see what really happens at the OS/VM >>>>>> level >>>>>> in this case. >>>>>> >>>>>> Cheers, >>>>>> Patrik >>>>>> >>>>>> >>>>>> On 11/23/2012 01:49 AM, Lo?c Hoguin wrote: >>>>>>> Sending this on behalf of someone who didn't manage to get the email >>>>>>> sent >>>>>>> to this list after 2 attempts. If someone can check if he's hold up or >>>>>>> something that'd be great. >>>>>>> >>>>>>> Anyway he has a big issue so I hope I can relay the conversation >>>>>>> reliably. >>>>>>> >>>>>>> Thanks! 
>>>>>>> >>>>>>> On 11/23/2012 01:45 AM, Peter Membrey wrote: >>>>>>>> From: Peter Membrey >>>>>>>> Date: 22 November 2012 19:02 >>>>>>>> Subject: VM locks up on write to socket (and now it seems to file >>>>>>>> too) >>>>>>>> To: erlang-bugs@REDACTED >>>>>>>> >>>>>>>> >>>>>>>> Hi guys, >>>>>>>> >>>>>>>> I wrote a simple database application called CakeDB >>>>>>>> (https://github.com/pmembrey/cakedb) that basically spends its time >>>>>>>> reading and writing files and sockets. There's very little in the way >>>>>>>> of complex logic. It is running on CentOS 6.3 with all the updates >>>>>>>> applied. I hit this problem on R15B02 so I rolled back to R15B01 but >>>>>>>> the issue remained. Erlang was built from source. >>>>>>>> >>>>>>>> The machine has two Intel X5690 CPUs giving 12 cores plus HT. I've >>>>>>>> tried various arguments for the VM but so far nothing has prevented >>>>>>>> the problem. At the moment I'm using: >>>>>>>> >>>>>>>> +K >>>>>>>> +A 6 >>>>>>>> +sbt tnnps >>>>>>>> >>>>>>>> The issue I'm seeing is that one of the scheduler threads will hit >>>>>>>> 100% cpu usage and the entire VM will become unresponsive. When this >>>>>>>> happens, I am not able to connect via the console with attach and >>>>>>>> entop is also unable to connect. I can still establish TCP >>>>>>>> connections >>>>>>>> to the application, but I never receive a response. A standard kill >>>>>>>> signal will cause the VM to shut down (it doesn't need -9). >>>>>>>> >>>>>>>> Due to the pedigree of the VM I am quite willing to accept that I've >>>>>>>> made a fundamental mistake in my code. I am pretty sure that the way >>>>>>>> I >>>>>>>> am doing the file IO could result in some race conditions. However, >>>>>>>> my >>>>>>>> poor code aside, from what I understand, I still shouldn't be able to >>>>>>>> crash / deadlock the VM like this. >>>>>>>> >>>>>>>> The issue doesn't seem to be caused by load. The app can fail when >>>>>>>> it's very busy, but also when it is practically idle. 
I haven't been >>>>>>>> able to find a trigger or any other explanation for the failure. >>>>>>>> >>>>>>>> The thread maxing out the CPU is attempting to write data to the >>>>>>>> socket: >>>>>>>> >>>>>>>> (gdb) bt >>>>>>>> #0 0x00007f9882ab6377 in writev () from /lib64/libc.so.6 >>>>>>>> #1 0x000000000058a81f in tcp_inet_output (data=0x2407570, >>>>>>>> event=) at drivers/common/inet_drv.c:9681 >>>>>>>> #2 tcp_inet_drv_output (data=0x2407570, event=) >>>>>>>> at drivers/common/inet_drv.c:9601 >>>>>>>> #3 0x00000000004b773f in erts_port_task_execute >>>>>>>> (runq=0x7f98826019c0, >>>>>>>> curr_port_pp=0x7f9881639338) at beam/erl_port_task.c:858 >>>>>>>> #4 0x00000000004afd83 in schedule (p=, >>>>>>>> calls=) at beam/erl_process.c:6533 >>>>>>>> #5 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>>>>>>> #6 0x00000000004b1279 in sched_thread_func (vesdp=0x7f9881639280) at >>>>>>>> beam/erl_process.c:4834 >>>>>>>> #7 0x00000000005ba726 in thr_wrapper (vtwd=0x7fff6cfe2300) at >>>>>>>> pthread/ethread.c:106 >>>>>>>> #8 0x00007f9882f78851 in start_thread () from /lib64/libpthread.so.0 >>>>>>>> #9 0x00007f9882abe11d in clone () from /lib64/libc.so.6 >>>>>>>> (gdb) >>>>>>>> >>>>>>>> I then tried running strace on that thread and got (indefinitely): >>>>>>>> >>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>> ... >>>>>>>> >>>>>>>> From what I can tell, it's trying to write data to a socket, which >>>>>>>> is >>>>>>>> succeeding, but writing 0 bytes. 
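[Editor's note: as the strace shows, the driver treats any non-negative writev() result as bytes written, so a persistent 0 with an unchanged iovec spins forever at 100% CPU. A hedged sketch of a defensive send loop, not the actual inet_drv logic, is to cap the number of consecutive zero-byte results before giving up instead of retrying unconditionally:]

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send `len` bytes from `buf` on `fd`, tolerating at most `max_zero`
 * consecutive zero-byte writev() results (which POSIX says should not
 * happen for a non-empty iovec on a socket).  Returns 0 on success,
 * -1 on error or when the zero-return budget is exhausted. */
static int send_all(int fd, const char *buf, size_t len, int max_zero)
{
    int zeros = 0;
    while (len > 0) {
        struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
        ssize_t n = writev(fd, &iov, 1);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;            /* includes EAGAIN: caller should poll */
        }
        if (n == 0) {             /* no progress: the anomalous case */
            if (++zeros >= max_zero)
                return -1;        /* break the busy loop */
            continue;
        }
        zeros = 0;                /* progress made, reset the budget */
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```

The design choice is that a zero return carries no information about what was written, so after a few retries it is safer to fail the port operation than to pin a scheduler thread indefinitely.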
From the earlier definitions in the >>>>>>>> source file, an error condition would be signified by a negative >>>>>>>> number. Any other result is the number of bytes written, in this case >>>>>>>> 0. I'm not sure if this is desired behaviour or not. I've tried >>>>>>>> killing the application on the other end of the socket, but it has no >>>>>>>> effect on the VM. >>>>>>>> >>>>>>>> I have enabled debugging for the inet code, so hopefully this will >>>>>>>> give a little more insight. I am currently trying to reproduce the >>>>>>>> condition, but as I really have no idea what causes it, it's pretty >>>>>>>> much a case of wait and see. >>>>>>>> >>>>>>>> >>>>>>>> **** UPDATE **** >>>>>>>> >>>>>>>> I managed to lock up the VM again, but this time it was caused by >>>>>>>> file >>>>>>>> IO, >>>>>>>> probably from the debugging statements. Although it worked fine for >>>>>>>> some >>>>>>>> time >>>>>>>> the last entry in the file was cut off. >>>>>>>> >>>>>>>> From GDB: >>>>>>>> >>>>>>>> (gdb) info threads >>>>>>>> 53 Thread 0x7f83e988b700 (LWP 8621) 0x00007f83ea6da54d in read >>>>>>>> () >>>>>>>> from /lib64/libpthread.so.0 >>>>>>>> 52 Thread 0x7f83e8c8f700 (LWP 8622) 0x00007f83ea6d743c in >>>>>>>> pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >>>>>>>> 51 Thread 0x7f83e818d700 (LWP 8623) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 50 Thread 0x7f83e816b700 (LWP 8624) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 49 Thread 0x7f83e8149700 (LWP 8625) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 48 Thread 0x7f83e8127700 (LWP 8626) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 47 Thread 0x7f83e8105700 (LWP 8627) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 46 Thread 0x7f83e80e3700 (LWP 8628) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 45 
Thread 0x7f83e80c1700 (LWP 8629) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 44 Thread 0x7f83e809f700 (LWP 8630) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 43 Thread 0x7f83e807d700 (LWP 8631) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 42 Thread 0x7f83e805b700 (LWP 8632) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 41 Thread 0x7f83e8039700 (LWP 8633) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 40 Thread 0x7f83e8017700 (LWP 8634) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 39 Thread 0x7f83e7ff5700 (LWP 8635) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 38 Thread 0x7f83e7fd3700 (LWP 8636) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 37 Thread 0x7f83e7fb1700 (LWP 8637) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 36 Thread 0x7f83e7f8f700 (LWP 8638) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 35 Thread 0x7f83e7f6d700 (LWP 8639) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 34 Thread 0x7f83e7f4b700 (LWP 8640) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 33 Thread 0x7f83e7f29700 (LWP 8641) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 32 Thread 0x7f83e7f07700 (LWP 8642) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 31 Thread 0x7f83e7ee5700 (LWP 8643) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 30 Thread 0x7f83e7ec3700 (LWP 8644) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 29 Thread 0x7f83e7ea1700 (LWP 8645) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 
>>>>>>>> 28 Thread 0x7f83e7e7f700 (LWP 8646) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 27 Thread 0x7f83d7c5a700 (LWP 8647) 0x00007f83ea6db09d in >>>>>>>> waitpid >>>>>>>> () from /lib64/libpthread.so.0 >>>>>>>> 26 Thread 0x7f83d7c53700 (LWP 8648) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 25 Thread 0x7f83d7252700 (LWP 8649) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 24 Thread 0x7f83d6851700 (LWP 8650) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 23 Thread 0x7f83d5e50700 (LWP 8651) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 22 Thread 0x7f83d544f700 (LWP 8652) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 21 Thread 0x7f83d4a4e700 (LWP 8653) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 20 Thread 0x7f83d404d700 (LWP 8654) 0x00007f83ea20be7d in write >>>>>>>> () >>>>>>>> from /lib64/libc.so.6 >>>>>>>> 19 Thread 0x7f83d364c700 (LWP 8655) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 18 Thread 0x7f83d2c4b700 (LWP 8656) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 17 Thread 0x7f83d224a700 (LWP 8657) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 16 Thread 0x7f83d1849700 (LWP 8658) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 15 Thread 0x7f83d0e48700 (LWP 8659) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 14 Thread 0x7f83d0447700 (LWP 8660) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 13 Thread 0x7f83cfa46700 (LWP 8661) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 12 Thread 0x7f83cf045700 (LWP 8662) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from 
/lib64/libc.so.6 >>>>>>>> 11 Thread 0x7f83ce644700 (LWP 8663) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 10 Thread 0x7f83cdc43700 (LWP 8664) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () from /lib64/libc.so.6 >>>>>>>> 9 Thread 0x7f83cd242700 (LWP 8665) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () >>>>>>>> from /lib64/libc.so.6 >>>>>>>> 8 Thread 0x7f83cc841700 (LWP 8666) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () >>>>>>>> from /lib64/libc.so.6 >>>>>>>> 7 Thread 0x7f83cbe40700 (LWP 8667) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () >>>>>>>> from /lib64/libc.so.6 >>>>>>>> 6 Thread 0x7f83cb43f700 (LWP 8668) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () >>>>>>>> from /lib64/libc.so.6 >>>>>>>> 5 Thread 0x7f83caa3e700 (LWP 8669) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () >>>>>>>> from /lib64/libc.so.6 >>>>>>>> 4 Thread 0x7f83ca03d700 (LWP 8670) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () >>>>>>>> from /lib64/libc.so.6 >>>>>>>> 3 Thread 0x7f83c963c700 (LWP 8671) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () >>>>>>>> from /lib64/libc.so.6 >>>>>>>> 2 Thread 0x7f83c8c3b700 (LWP 8672) 0x00007f83ea215ae9 in >>>>>>>> syscall >>>>>>>> () >>>>>>>> from /lib64/libc.so.6 >>>>>>>> * 1 Thread 0x7f83eb3a8700 (LWP 8597) 0x00007f83ea211d03 in select () >>>>>>>> from /lib64/libc.so.6 >>>>>>>> (gdb) >>>>>>>> >>>>>>>> >>>>>>>> (gdb) bt >>>>>>>> #0 0x00007f83ea20be7d in write () from /lib64/libc.so.6 >>>>>>>> #1 0x00007f83ea1a2583 in _IO_new_file_write () from /lib64/libc.so.6 >>>>>>>> #2 0x00007f83ea1a3b35 in _IO_new_do_write () from /lib64/libc.so.6 >>>>>>>> #3 0x00007f83ea1a21fd in _IO_new_file_xsputn () from >>>>>>>> /lib64/libc.so.6 >>>>>>>> #4 0x00007f83ea17589d in vfprintf () from /lib64/libc.so.6 >>>>>>>> #5 0x00007f83ea18003a in printf () from /lib64/libc.so.6 >>>>>>>> #6 0x000000000058f0e8 in tcp_recv (desc=0x2c3d350, request_len=0) at >>>>>>>> drivers/common/inet_drv.c:8976 >>>>>>>> #7 
0x000000000058f63a in tcp_inet_input (data=0x2c3d350, >>>>>>>> event=>>>>>>> optimized out>) at drivers/common/inet_drv.c:9326 >>>>>>>> #8 tcp_inet_drv_input (data=0x2c3d350, event=) >>>>>>>> at drivers/common/inet_drv.c:9604 >>>>>>>> #9 0x00000000004b770f in erts_port_task_execute >>>>>>>> (runq=0x7f83e9d5d3c0, >>>>>>>> curr_port_pp=0x7f83e8dc6e78) at beam/erl_port_task.c:851 >>>>>>>> #10 0x00000000004afd83 in schedule (p=, >>>>>>>> calls=) at beam/erl_process.c:6533 >>>>>>>> #11 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>>>>>>> #12 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8dc6dc0) at >>>>>>>> beam/erl_process.c:4834 >>>>>>>> #13 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at >>>>>>>> pthread/ethread.c:106 >>>>>>>> #14 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >>>>>>>> #15 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>>>> (gdb) >>>>>>>> >>>>>>>> (gdb) bt >>>>>>>> #0 0x00007f83ea6da54d in read () from /lib64/libpthread.so.0 >>>>>>>> #1 0x0000000000554b6e in signal_dispatcher_thread_func >>>>>>>> (unused=>>>>>>> optimized out>) at sys/unix/sys.c:2776 >>>>>>>> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266c80) at >>>>>>>> pthread/ethread.c:106 >>>>>>>> #3 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >>>>>>>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>>>> (gdb) >>>>>>>> >>>>>>>> (gdb) bt >>>>>>>> #0 0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6 >>>>>>>> #1 0x00000000005bba35 in wait__ (e=0x2989390) at >>>>>>>> pthread/ethr_event.c:92 >>>>>>>> #2 ethr_event_wait (e=0x2989390) at pthread/ethr_event.c:218 >>>>>>>> #3 0x00000000004ae5bd in erts_tse_wait (fcalls=>>>>>>> out>, >>>>>>>> esdp=0x7f83e8e2c440, rq=0x7f83e9d5e7c0) at beam/erl_threads.h:2319 >>>>>>>> #4 scheduler_wait (fcalls=, >>>>>>>> esdp=0x7f83e8e2c440, >>>>>>>> rq=0x7f83e9d5e7c0) at beam/erl_process.c:2087 >>>>>>>> #5 0x00000000004afb94 in schedule (p=, >>>>>>>> calls=) at 
beam/erl_process.c:6467 >>>>>>>> #6 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>>>>>>> #7 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8e2c440) at >>>>>>>> beam/erl_process.c:4834 >>>>>>>> #8 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at >>>>>>>> pthread/ethread.c:106 >>>>>>>> #9 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >>>>>>>> #10 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>>>> (gdb) >>>>>>>> >>>>>>>> >>>>>>>> (gdb) bt >>>>>>>> #0 0x00007f83ea6db09d in waitpid () from /lib64/libpthread.so.0 >>>>>>>> #1 0x0000000000555a9f in child_waiter (unused=) >>>>>>>> at sys/unix/sys.c:2700 >>>>>>>> #2 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266d50) at >>>>>>>> pthread/ethread.c:106 >>>>>>>> #3 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0 >>>>>>>> #4 0x00007f83ea21911d in clone () from /lib64/libc.so.6 >>>>>>>> (gdb) >>>>>>>> >>>>>>>> >>>>>>>> **** END UPDATE **** >>>>>>>> >>>>>>>> >>>>>>>> I'm happy to provide any information I can, so please don't hesitate >>>>>>>> to >>>>>>>> ask. >>>>>>>> >>>>>>>> Thanks in advance! >>>>>>>> >>>>>>>> Kind Regards, >>>>>>>> >>>>>>>> Peter Membrey >>>>>>>> >>>>>> _______________________________________________ >>>>>> erlang-bugs mailing list >>>>>> erlang-bugs@REDACTED >>>>>> http://erlang.org/mailman/listinfo/erlang-bugs >>>> From pan@REDACTED Thu Nov 29 19:46:20 2012 From: pan@REDACTED (Patrik Nyblom) Date: Thu, 29 Nov 2012 19:46:20 +0100 Subject: [erlang-bugs] VM locks up on write to socket (and now it seems to file too) In-Reply-To: References: <50AEC81B.2000908@ninenines.eu> <50AFA09D.4060100@erlang.org> <50B633C7.7000709@erlang.org> <50B64897.2050300@erlang.org> Message-ID: <50B7AD7C.3060809@erlang.org> Hi! On 11/29/2012 04:41 AM, Peter Membrey wrote: > Hi Patrik, > > I can also confirm that this bug exists on Red Hat Enterprise Linux > 6.3. I'll raise a support ticket with them as well. 
> > A workaround in the vm would be nice if you have time? :-) Could you try the attached diff and see if it works for your environment? It would seem nothing is written when 0 is returned, so it should be safe to try again... Cheers, /Patrik > Cheers, > > Pete > > > On 29 November 2012 01:23, Patrik Nyblom wrote: >> Hi again! >> >> No problem reproducing when I've got CentOS 6.3... The following commands in >> the Erlang shell: >> {ok,L} = gen_tcp:listen(4747,[{active,false}]). >> {ok,S} = gen_tcp:connect("localhost",4747,[{active,false}]). >> {ok,A} = gen_tcp:accept(L). >> gen_tcp:send(A,binary:copy(<<$a:8>>,2158022464)). >> >> gives the following strace: >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [pid 15859] writev(10, [{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., >> 2158022464}], 1) = 0 >> [.....] >> >> While on ubuntu for example it works like it should...Looks like a kernel >> bug to me... I wonder if this should be worked around or just reported... I >> suppose both... Sigh... 
>> >> /Patrik >> >> >> On 11/28/2012 05:23 PM, Peter Membrey wrote: >>> Hi, >>> >>> No problem, I'll do what I can to help - thanks for looking into this >>> so quickly! >>> >>> Any idea what might be causing it? >>> >>> Cheers, >>> >>> Pete >>> >>> On 28 November 2012 23:54, Patrik Nyblom wrote: >>>> Hi! >>>> >>>> I'll upgrade the CentOS VM I have to 6.3 (only had 6.1 :() and see if I >>>> can >>>> reproduce. If that fails, could you run a VM with a patch to try to >>>> handle >>>> the unexpected case and see if that fixes it? >>>> >>>> Cheers, >>>> /Patrik >>>> >>>> On 11/24/2012 02:57 PM, Peter Membrey wrote: >>>>> Hi guys, >>>>> >>>>> Thanks for getting back in touch so quickly! >>>>> >>>>> I did do an lsof on the process and I can confirm that it was >>>>> definitely a socket. However by that time the application it had been >>>>> trying to send to had been killed. When I checked the sockets were >>>>> showing as waiting to close. Unfortunately I didn't think to do an >>>>> lsof until after the apps had been shut down. I was hoping the VM >>>>> would recover if I killed the app that had upset it. However even >>>>> after all the apps connected had been shut down, the issue didn't >>>>> resolve. >>>>> >>>>> The application receives requests from a client, which contains two >>>>> data items. The stream ID and a timestamp. Both are encoded as big >>>>> integer unsigned numbers. The server then looks through the file >>>>> referenced by the stream ID and uses the timestamp as an index. The >>>>> file format is currently really simple, in the form of: >>>>> >>>>> >>>>> > >>>>> >>>>> There is an index file that provides an offset into the file based on >>>>> time stamp, but basically it opens the file, and reads sequentially >>>>> through it until it finds the timestamps that it cares about. In this >>>>> case it reads all data with a greater timestamp until the end of the >>>>> file is reached. 
It's possible the client is sending an incorrect >>>>> timestamp, and maybe too much data is being read. However the loop is >>>>> very primitive - it reads all the data in one go before passing it >>>>> back to the protocol handler to send down the socket; so by that time >>>>> even though the response is technically incorrect and the app has >>>>> failed, it should still not cause the VM any issues. >>>>> >>>>> The data is polled every 10 seconds by the client app so I would not >>>>> expect there to be 2GB of new data to send. I'm afraid my C skills are >>>>> somewhat limited, so I'm not sure how to put together a sample app to >>>>> try out writev. The platform is 64bit CentOS 6.3 (equivalent to RHEL >>>>> 6.3) so I'm not expecting any strange or weird behaviour from the OS >>>>> level but of course I could be completely wrong there. The OS is >>>>> running directly on hardware, so there's no VM layer to worry about. >>>>> >>>>> Hope this might offer some additional clues? >>>>> >>>>> Thanks again! >>>>> >>>>> Kind Regards, >>>>> >>>>> Peter Membrey >>>>> >>>>> >>>>> >>>>> On 24 November 2012 00:13, Patrik Nyblom wrote: >>>>>> Hi again! >>>>>> >>>>>> Could you go back to the version without the printouts and get back to >>>>>> the >>>>>> situation where writev loops returning 0 (as in the strace)? If so, it >>>>>> would >>>>>> be really interesting to see an 'lsof' of the beam process, to see if >>>>>> this >>>>>> file descriptor really is open and is a socket... >>>>>> >>>>>> The thing is that writev with a vector that is not empty, would never >>>>>> return >>>>>> 0 for a non blocking socket. Not on any modern (i.e. not ancient) POSIX >>>>>> compliant system anyway. Of course it is a *really* large item you are >>>>>> trying to write there, but it should be no problem for a 64bit linux. 
>>>>>> >>>>>> Also I think there is no use finding the Erlang code, I'll take that >>>>>> back, >>>>>> It would be more interesting to see what really happens at the OS/VM >>>>>> level >>>>>> in this case. >>>>>> >>>>>> Cheers, >>>>>> Patrik >>>>>> >>>>>> >>>>>> On 11/23/2012 01:49 AM, Lo?c Hoguin wrote: >>>>>>> Sending this on behalf of someone who didn't manage to get the email >>>>>>> sent >>>>>>> to this list after 2 attempts. If someone can check if he's hold up or >>>>>>> something that'd be great. >>>>>>> >>>>>>> Anyway he has a big issue so I hope I can relay the conversation >>>>>>> reliably. >>>>>>> >>>>>>> Thanks! >>>>>>> >>>>>>> On 11/23/2012 01:45 AM, Peter Membrey wrote: >>>>>>>> From: Peter Membrey >>>>>>>> Date: 22 November 2012 19:02 >>>>>>>> Subject: VM locks up on write to socket (and now it seems to file >>>>>>>> too) >>>>>>>> To: erlang-bugs@REDACTED >>>>>>>> >>>>>>>> >>>>>>>> Hi guys, >>>>>>>> >>>>>>>> I wrote a simple database application called CakeDB >>>>>>>> (https://github.com/pmembrey/cakedb) that basically spends its time >>>>>>>> reading and writing files and sockets. There's very little in the way >>>>>>>> of complex logic. It is running on CentOS 6.3 with all the updates >>>>>>>> applied. I hit this problem on R15B02 so I rolled back to R15B01 but >>>>>>>> the issue remained. Erlang was built from source. >>>>>>>> >>>>>>>> The machine has two Intel X5690 CPUs giving 12 cores plus HT. I've >>>>>>>> tried various arguments for the VM but so far nothing has prevented >>>>>>>> the problem. At the moment I'm using: >>>>>>>> >>>>>>>> +K >>>>>>>> +A 6 >>>>>>>> +sbt tnnps >>>>>>>> >>>>>>>> The issue I'm seeing is that one of the scheduler threads will hit >>>>>>>> 100% cpu usage and the entire VM will become unresponsive. When this >>>>>>>> happens, I am not able to connect via the console with attach and >>>>>>>> entop is also unable to connect. 
I can still establish TCP >>>>>>>> connections >>>>>>>> to the application, but I never receive a response. A standard kill >>>>>>>> signal will cause the VM to shut down (it doesn't need -9). >>>>>>>> >>>>>>>> Due to the pedigree of the VM I am quite willing to accept that I've >>>>>>>> made a fundamental mistake in my code. I am pretty sure that the way >>>>>>>> I >>>>>>>> am doing the file IO could result in some race conditions. However, >>>>>>>> my >>>>>>>> poor code aside, from what I understand, I still shouldn't be able to >>>>>>>> crash / deadlock the VM like this. >>>>>>>> >>>>>>>> The issue doesn't seem to be caused by load. The app can fail when >>>>>>>> it's very busy, but also when it is practically idle. I haven't been >>>>>>>> able to find a trigger or any other explanation for the failure. >>>>>>>> >>>>>>>> The thread maxing out the CPU is attempting to write data to the >>>>>>>> socket: >>>>>>>> >>>>>>>> (gdb) bt >>>>>>>> #0 0x00007f9882ab6377 in writev () from /lib64/libc.so.6 >>>>>>>> #1 0x000000000058a81f in tcp_inet_output (data=0x2407570, >>>>>>>> event=) at drivers/common/inet_drv.c:9681 >>>>>>>> #2 tcp_inet_drv_output (data=0x2407570, event=) >>>>>>>> at drivers/common/inet_drv.c:9601 >>>>>>>> #3 0x00000000004b773f in erts_port_task_execute >>>>>>>> (runq=0x7f98826019c0, >>>>>>>> curr_port_pp=0x7f9881639338) at beam/erl_port_task.c:858 >>>>>>>> #4 0x00000000004afd83 in schedule (p=, >>>>>>>> calls=) at beam/erl_process.c:6533 >>>>>>>> #5 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268 >>>>>>>> #6 0x00000000004b1279 in sched_thread_func (vesdp=0x7f9881639280) at >>>>>>>> beam/erl_process.c:4834 >>>>>>>> #7 0x00000000005ba726 in thr_wrapper (vtwd=0x7fff6cfe2300) at >>>>>>>> pthread/ethread.c:106 >>>>>>>> #8 0x00007f9882f78851 in start_thread () from /lib64/libpthread.so.0 >>>>>>>> #9 0x00007f9882abe11d in clone () from /lib64/libc.so.6 >>>>>>>> (gdb) >>>>>>>> >>>>>>>> I then tried running strace on that thread and got 
(indefinitely): >>>>>>>> >>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>> writev(15, [{"", 2158022464}], 1) = 0 >>>>>>>> ... >>>>>>>> >>>>>>>> From what I can tell, it's trying to write data to a socket, which >>>>>>>> is >>>>>>>> succeeding, but writing 0 bytes. From the earlier definitions in the >>>>>>>> source file, an error condition would be signified by a negative >>>>>>>> number. Any other result is the number of bytes written, in this case >>>>>>>> 0. I'm not sure if this is desired behaviour or not. I've tried >>>>>>>> killing the application on the other end of the socket, but it has no >>>>>>>> effect on the VM. >>>>>>>> >>>>>>>> I have enabled debugging for the inet code, so hopefully this will >>>>>>>> give a little more insight. I am currently trying to reproduce the >>>>>>>> condition, but as I really have no idea what causes it, it's pretty >>>>>>>> much a case of wait and see. >>>>>>>> >>>>>>>> >>>>>>>> **** UPDATE **** >>>>>>>> >>>>>>>> I managed to lock up the VM again, but this time it was caused by >>>>>>>> file >>>>>>>> IO, >>>>>>>> probably from the debugging statements. Although it worked fine for >>>>>>>> some >>>>>>>> time >>>>>>>> the last entry in the file was cut off. 
>>>>>>>>
>>>>>>>> From GDB:
>>>>>>>>
>>>>>>>> (gdb) info threads
>>>>>>>>   53 Thread 0x7f83e988b700 (LWP 8621)  0x00007f83ea6da54d in read () from /lib64/libpthread.so.0
>>>>>>>>   52 Thread 0x7f83e8c8f700 (LWP 8622)  0x00007f83ea6d743c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>>>>>>>>   51 Thread 0x7f83e818d700 (LWP 8623)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   50 Thread 0x7f83e816b700 (LWP 8624)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   49 Thread 0x7f83e8149700 (LWP 8625)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   48 Thread 0x7f83e8127700 (LWP 8626)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   47 Thread 0x7f83e8105700 (LWP 8627)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   46 Thread 0x7f83e80e3700 (LWP 8628)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   45 Thread 0x7f83e80c1700 (LWP 8629)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   44 Thread 0x7f83e809f700 (LWP 8630)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   43 Thread 0x7f83e807d700 (LWP 8631)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   42 Thread 0x7f83e805b700 (LWP 8632)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   41 Thread 0x7f83e8039700 (LWP 8633)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   40 Thread 0x7f83e8017700 (LWP 8634)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   39 Thread 0x7f83e7ff5700 (LWP 8635)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   38 Thread 0x7f83e7fd3700 (LWP 8636)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   37 Thread 0x7f83e7fb1700 (LWP 8637)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   36 Thread 0x7f83e7f8f700 (LWP 8638)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   35 Thread 0x7f83e7f6d700 (LWP 8639)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   34 Thread 0x7f83e7f4b700 (LWP 8640)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   33 Thread 0x7f83e7f29700 (LWP 8641)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   32 Thread 0x7f83e7f07700 (LWP 8642)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   31 Thread 0x7f83e7ee5700 (LWP 8643)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   30 Thread 0x7f83e7ec3700 (LWP 8644)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   29 Thread 0x7f83e7ea1700 (LWP 8645)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   28 Thread 0x7f83e7e7f700 (LWP 8646)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   27 Thread 0x7f83d7c5a700 (LWP 8647)  0x00007f83ea6db09d in waitpid () from /lib64/libpthread.so.0
>>>>>>>>   26 Thread 0x7f83d7c53700 (LWP 8648)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   25 Thread 0x7f83d7252700 (LWP 8649)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   24 Thread 0x7f83d6851700 (LWP 8650)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   23 Thread 0x7f83d5e50700 (LWP 8651)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   22 Thread 0x7f83d544f700 (LWP 8652)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   21 Thread 0x7f83d4a4e700 (LWP 8653)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   20 Thread 0x7f83d404d700 (LWP 8654)  0x00007f83ea20be7d in write () from /lib64/libc.so.6
>>>>>>>>   19 Thread 0x7f83d364c700 (LWP 8655)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   18 Thread 0x7f83d2c4b700 (LWP 8656)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   17 Thread 0x7f83d224a700 (LWP 8657)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   16 Thread 0x7f83d1849700 (LWP 8658)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   15 Thread 0x7f83d0e48700 (LWP 8659)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   14 Thread 0x7f83d0447700 (LWP 8660)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   13 Thread 0x7f83cfa46700 (LWP 8661)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   12 Thread 0x7f83cf045700 (LWP 8662)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   11 Thread 0x7f83ce644700 (LWP 8663)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>   10 Thread 0x7f83cdc43700 (LWP 8664)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>    9 Thread 0x7f83cd242700 (LWP 8665)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>    8 Thread 0x7f83cc841700 (LWP 8666)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>    7 Thread 0x7f83cbe40700 (LWP 8667)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>    6 Thread 0x7f83cb43f700 (LWP 8668)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>    5 Thread 0x7f83caa3e700 (LWP 8669)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>    4 Thread 0x7f83ca03d700 (LWP 8670)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>    3 Thread 0x7f83c963c700 (LWP 8671)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>>    2 Thread 0x7f83c8c3b700 (LWP 8672)  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>> *  1 Thread 0x7f83eb3a8700 (LWP 8597)  0x00007f83ea211d03 in select () from /lib64/libc.so.6
>>>>>>>> (gdb)
>>>>>>>>
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x00007f83ea20be7d in write () from /lib64/libc.so.6
>>>>>>>> #1  0x00007f83ea1a2583 in _IO_new_file_write () from /lib64/libc.so.6
>>>>>>>> #2  0x00007f83ea1a3b35 in _IO_new_do_write () from /lib64/libc.so.6
>>>>>>>> #3  0x00007f83ea1a21fd in _IO_new_file_xsputn () from /lib64/libc.so.6
>>>>>>>> #4  0x00007f83ea17589d in vfprintf () from /lib64/libc.so.6
>>>>>>>> #5  0x00007f83ea18003a in printf () from /lib64/libc.so.6
>>>>>>>> #6  0x000000000058f0e8 in tcp_recv (desc=0x2c3d350, request_len=0) at drivers/common/inet_drv.c:8976
>>>>>>>> #7  0x000000000058f63a in tcp_inet_input (data=0x2c3d350, event=<value optimized out>) at drivers/common/inet_drv.c:9326
>>>>>>>> #8  tcp_inet_drv_input (data=0x2c3d350, event=<value optimized out>) at drivers/common/inet_drv.c:9604
>>>>>>>> #9  0x00000000004b770f in erts_port_task_execute (runq=0x7f83e9d5d3c0, curr_port_pp=0x7f83e8dc6e78) at beam/erl_port_task.c:851
>>>>>>>> #10 0x00000000004afd83 in schedule (p=<value optimized out>, calls=<value optimized out>) at beam/erl_process.c:6533
>>>>>>>> #11 0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268
>>>>>>>> #12 0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8dc6dc0) at beam/erl_process.c:4834
>>>>>>>> #13 0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at pthread/ethread.c:106
>>>>>>>> #14 0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0
>>>>>>>> #15 0x00007f83ea21911d in clone () from /lib64/libc.so.6
>>>>>>>> (gdb)
>>>>>>>>
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x00007f83ea6da54d in read () from /lib64/libpthread.so.0
>>>>>>>> #1  0x0000000000554b6e in signal_dispatcher_thread_func (unused=<value optimized out>) at sys/unix/sys.c:2776
>>>>>>>> #2  0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266c80) at pthread/ethread.c:106
>>>>>>>> #3  0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0
>>>>>>>> #4  0x00007f83ea21911d in clone () from /lib64/libc.so.6
>>>>>>>> (gdb)
>>>>>>>>
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x00007f83ea215ae9 in syscall () from /lib64/libc.so.6
>>>>>>>> #1  0x00000000005bba35 in wait__ (e=0x2989390) at pthread/ethr_event.c:92
>>>>>>>> #2  ethr_event_wait (e=0x2989390) at pthread/ethr_event.c:218
>>>>>>>> #3  0x00000000004ae5bd in erts_tse_wait (fcalls=<value optimized out>, esdp=0x7f83e8e2c440, rq=0x7f83e9d5e7c0) at beam/erl_threads.h:2319
>>>>>>>> #4  scheduler_wait (fcalls=<value optimized out>, esdp=0x7f83e8e2c440, rq=0x7f83e9d5e7c0) at beam/erl_process.c:2087
>>>>>>>> #5  0x00000000004afb94 in schedule (p=<value optimized out>, calls=<value optimized out>) at beam/erl_process.c:6467
>>>>>>>> #6  0x0000000000539ca2 in process_main () at beam/beam_emu.c:1268
>>>>>>>> #7  0x00000000004b1279 in sched_thread_func (vesdp=0x7f83e8e2c440) at beam/erl_process.c:4834
>>>>>>>> #8  0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266da0) at pthread/ethread.c:106
>>>>>>>> #9  0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0
>>>>>>>> #10 0x00007f83ea21911d in clone () from /lib64/libc.so.6
>>>>>>>> (gdb)
>>>>>>>>
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x00007f83ea6db09d in waitpid () from /lib64/libpthread.so.0
>>>>>>>> #1  0x0000000000555a9f in child_waiter (unused=<value optimized out>) at sys/unix/sys.c:2700
>>>>>>>> #2  0x00000000005bb3e6 in thr_wrapper (vtwd=0x7fffe8266d50) at pthread/ethread.c:106
>>>>>>>> #3  0x00007f83ea6d3851 in start_thread () from /lib64/libpthread.so.0
>>>>>>>> #4  0x00007f83ea21911d in clone () from /lib64/libc.so.6
>>>>>>>> (gdb)
>>>>>>>>
>>>>>>>> **** END UPDATE ****
>>>>>>>>
>>>>>>>> I'm happy to provide any information I can, so please don't
>>>>>>>> hesitate to ask.
>>>>>>>>
>>>>>>>> Thanks in advance!
>>>>>>>>
>>>>>>>> Kind Regards,
>>>>>>>>
>>>>>>>> Peter Membrey
>>>>>>>>
>>>>>> _______________________________________________
>>>>>> erlang-bugs mailing list
>>>>>> erlang-bugs@REDACTED
>>>>>> http://erlang.org/mailman/listinfo/erlang-bugs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: redhat_workaround.diff
Type: text/x-patch
Size: 1773 bytes
Desc: not available
URL:

From wallentin.dahlberg@REDACTED Thu Nov 29 20:53:52 2012
From: wallentin.dahlberg@REDACTED (Björn-Egil Dahlberg)
Date: Thu, 29 Nov 2012 20:53:52 +0100
Subject: [erlang-bugs] R15B03 PLT three unknown types
In-Reply-To:
References:
Message-ID:

Type spec fix written - will be out later.

// Björn-Egil

2012/11/29 Tuncer Ayaz

> Building a fairly complete PLT for R15B03, Dialyzer correctly reports
> three unknown types:
>
> Unknown types:
>   ct:hook_options/0
>   inet:host_name/0
>   ssl:sslsock/0
>
> Grepping the tree for the types:
>
> lib/common_test/src/ct_netconfc.erl:
>   -export_type([hook_options/0,
> lib/common_test/src/ct_netconfc.erl:
>   -type hook_options() :: [hook_option()].
>
> lib/common_test/src/ct_netconfc.erl:
>   -type host() :: inet:host_name() | inet:ip_address().
>
> lib/diameter/src/transport/diameter_tcp.erl:
>   {socket :: inet:socket() | ssl:sslsock(), %% accept or connect socket
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From vinoski@REDACTED Fri Nov 30 05:08:47 2012
From: vinoski@REDACTED (Steve Vinoski)
Date: Thu, 29 Nov 2012 23:08:47 -0500
Subject: [erlang-bugs] SSL accept timeout broken in R15B03?
In-Reply-To: <50B735E6.4070406@ericsson.com>
References: <50B735E6.4070406@ericsson.com>
Message-ID:

On Thu, Nov 29, 2012 at 5:16 AM, Ingela Anderton Andin <
Ingela.Anderton.Andin@REDACTED> wrote:

> Hi Steve!
>
> There is a missing function clause to handle the ssl:ssl_accept timeout,
> so alas it was treated as a canceled timeout. I failed to realize that
> we needed a special test case for the accept case when I solved the
> problem with client-side timeouts for ssl:recv. The client-side timeout
> is a problem for accept/connect too and is solved by the same mechanism,
> the only difference being the following clause:
>
> index 87cf49d..102dd4a 100644
> --- a/lib/ssl/src/ssl_connection.erl
> +++ b/lib/ssl/src/ssl_connection.erl
> @@ -1001,6 +1001,10 @@ handle_info({cancel_start_or_recv, RecvFrom},
> connection = StateName, #state{sta
>      gen_fsm:reply(RecvFrom, {error, timeout}),
>      {next_state, StateName, State#state{start_or_recv_from = undefined},
> get_timeout(State)};
>
> +handle_info({cancel_start_or_recv, RecvFrom}, StateName, State) when
> connection =/= StateName ->
> +    gen_fsm:reply(RecvFrom, {error, timeout}),
> +    {next_state, StateName, State#state{start_or_recv_from = undefined},
> get_timeout(State)};
> +
>  handle_info({cancel_start_or_recv, _RecvFrom}, StateName, State) ->
>      {next_state, StateName, State, get_timeout(State)};
>
> Thank you for reporting this; I will turn your test into a test case.

Thanks -- I verified that this patch fixes the problem I saw.

--steve
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jean-sebastien.pedron@REDACTED Fri Nov 30 10:13:02 2012
From: jean-sebastien.pedron@REDACTED (Jean-Sébastien Pédron)
Date: Fri, 30 Nov 2012 10:13:02 +0100
Subject: [erlang-bugs] dialyzer: Issues with opaque types
Message-ID: <50B8789E.1050505@dumbbell.fr>

Hi!
I posted a message on erlang-questions@ a week ago about a warning
reported by Dialyzer:
http://erlang.org/pipermail/erlang-questions/2012-November/070757.html

Ignore the test programs in this previous mail; I narrowed the issue
down a bit.

I had a look at dialyzer_dataflow.erl and I suspect that "opaque" types
are not handled properly. However, I'm not able to understand what the
problem(s) could be.

I attached 3 test programs to this mail (tested with R15B02 and Git
revision d30cee99):

o test1_work.erl passes dialyzer checks. The record #state{} is
  declared as a public type (-type) called state().

o test1_fail_1.erl doesn't pass dialyzer checks. The record #state{}
  is declared as an opaque type (-opaque) called state(). The warnings
  are:
  - The contract test1_fail_1:next_step(#state{},step_fun()) -> 'ok'
    cannot be right because the inferred return for
    next_step(State::test1_fail_1:state(), fun((_) -> any())) on
    line 19 is 'ok'
  - Fun application with arguments (State::test1_fail_1:state()) will
    fail since the function has type fun(({_,_}) -> {_,_})

o test1_fail_2.erl doesn't pass dialyzer checks. The record is
  declared as an opaque type and this type is used in other -type &
  -spec directives. The warning is:
  - Fun application with arguments (State::test1_fail_2:state()) will
    fail since the function has type
    fun((test1_fail_2:state()) -> test1_fail_2:state())

Apart from those differences regarding -type/-opaque/-spec, these
programs are exactly the same.

Note that there is a "recursion" between the #state{}/state() and
step_fun() types: the #state{} contains a step_fun(), and this
step_fun() takes a #state{} as its argument.

I would expect dialyzer to treat these programs the same. Am I wrong
in my usage of -opaque and state()? Or is this a bug somewhere in
dialyzer?

--
Jean-Sébastien Pédron
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test1_work.erl
Type: text/x-erlang
Size: 521 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test1_fail_1.erl
Type: text/x-erlang
Size: 523 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test1_fail_2.erl
Type: text/x-erlang
Size: 518 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 259 bytes
Desc: OpenPGP digital signature
URL: