[erlang-questions] Erlang crash gen_tcp related (probably only under Windows)

Edwin Fine erlang-questions_efine@REDACTED
Sat Sep 13 00:20:14 CEST 2008


My mistake, thanks.

On Fri, Sep 12, 2008 at 5:42 PM, Michael Regen <michael.regen@REDACTED> wrote:

> I would be very interested to hear about some tests from others! And Edwin,
> I guess you mean SP3. Windows XP SP3 is the most recent service pack. SP4,
> if it exists at all, is a giant trojan, isn't it? ;)
>
> Regards,
> Michael
>
>
> On Fri, Sep 12, 2008 at 9:01 PM, Edwin Fine <erlang-questions_efine@REDACTED> wrote:
>
>> I hear you  :).
>>
>> When I've got some time I will try out my R12B-3 installation on SP4 with
>> your program, but can't right now. I am interested to see what happens.
>>
>> Rgds,
>> Ed
>>
>>
>> On Fri, Sep 12, 2008 at 2:47 PM, Michael Regen <michael.regen@REDACTED> wrote:
>>
>>> Just tested it on another machine with SP3 installed. No difference. Same
>>> problem.
>>>
>>> Yes, Windows is flaky, and I would personally like to be able to say I
>>> would rather go to hell than install a server on Windows that is
>>> expected to run robustly.
>>> But either Erlang is robust on Windows as well, or there is no Erlang on
>>> Windows. :(
>>>
>>> Regards,
>>> Michael
>>>
>>> --
>>> Quote from the CIO of an IT-centric company with more than 3000 employees,
>>> which I had the pleasure to witness four weeks ago: 'For the messaging back
>>> end? No, we can't use Java. Java is too slow.'
>>>
>>>
>>>
>>>
>>> On Fri, Sep 12, 2008 at 7:54 PM, Edwin Fine <erlang-questions_efine@REDACTED> wrote:
>>>
>>>> Michael,
>>>>
>>>> I've always felt that the Windows version of Erlang is a bit flaky. Then
>>>> again, I think Windows itself is more than a bit flaky, so maybe it's not
>>>> Erlang's fault ;)
>>>> I wonder if running on SP4 would improve things?
>>>>
>>>>
>>>> On Fri, Sep 12, 2008 at 1:39 PM, Michael Regen <michael.regen@REDACTED> wrote:
>>>>
>>>>> Hi Edwin,
>>>>>
>>>>> It is possible that both issues have a similar source, but I see little
>>>>> reason why they must share a common one.
>>>>> I was running my tests on a 32-bit single-core Windows XP SP2 system,
>>>>> simply by running
>>>>>   werl.exe -boot start_sasl
>>>>> or
>>>>>   werl.exe
>>>>>
>>>>> and did nothing fancy. My R12B-3 build is self-compiled; R12B-4 is the
>>>>> prebuilt release from erlang.org.
>>>>> Client and server tests were done by starting two different instances
>>>>> of werl.
>>>>> Furthermore, my tcp_test:test does not care whether the results from
>>>>> gen_tcp:connect are correct or not. It simply assumes {ok, Socket} and
>>>>> crashes the process otherwise. Of course, it was a surprise that under
>>>>> some circumstances the whole emulator crashes.
>>>>>
>>>>> By the way, the crash dump slogan is unspectacular: 'Slogan:
>>>>> Inconsistent, why isnt io reported?'
>>>>>
>>>>> UPDATE: I got some more observations which puzzle me even more:
>>>>>
>>>>> I just ran some more of the same tests, but this time by starting:
>>>>>   erl.exe
>>>>>   application:start(tcp_server).
>>>>> and
>>>>>   erl.exe
>>>>>   tcp_test:test(1000).
>>>>>
>>>>> There seems to be a difference between erl.exe and werl.exe.
>>>>>
>>>>> This time the results are quite different.
>>>>> Now it is much harder to crash the emulator. It takes significantly more
>>>>> processes/attempts until something bad happens:
>>>>>
>>>>> The client alone (tcp_test:test(5000)) eventually crashes in the same way,
>>>>> but Windows' cmd.exe now follows with:
>>>>>   The exception unknown software exception (0x40000015) occured in the
>>>>> application at location 0x008fff86
>>>>>
>>>>> after the 'Crash dump was written to: erl_crash.dump / Inconsistent,
>>>>> why isnt io reported?' message and the crash dump file.
>>>>>
>>>>> The exception always seems to occur at the same location.
>>>>>
>>>>> A lot more error messages are printed now (as expected) until the
>>>>> crash.
>>>>> Besides the {{badmatch,{error,econnrefused}},[{tcp_test,test_con,0}]} I
>>>>> can now also see lots of
>>>>>   {{badmatch,{error,eaddrinuse}},[{tcp_test,test_con,0}]}
>>>>> and
>>>>>   {{badmatch,{error,system_limit}},[{tcp_test,test_con,0}]}
>>>>> errors.
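A minimal sketch, not taken from the thread, of one way to see how often each of these errors (econnrefused, eaddrinuse, system_limit, ...) occurs without filling the log with crash reports: each spawned process sends its gen_tcp:connect/4 outcome back to the caller, which counts them. The module name tcp_tally, the function run/3 and the 5000 ms connect timeout are hypothetical choices for illustration. This does not prevent an emulator crash like the one described above; it only replaces the badmatch exits with a summary.

```erlang
%% Hypothetical module: count connection outcomes instead of crashing.
-module(tcp_tally).
-export([run/3]).

%% Spawn N concurrent connection attempts to {IP, Port} and return a
%% sorted list of {Outcome, Count} pairs, where Outcome is ok or the
%% error reason returned by gen_tcp:connect/4.
run(N, IP, Port) ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {self(), attempt(IP, Port)} end)
            || _ <- lists:seq(1, N)],
    collect(Pids, orddict:new()).

%% One attempt; the socket is closed right away on success.
attempt(IP, Port) ->
    case gen_tcp:connect(IP, Port, [binary, {active, false}], 5000) of
        {ok, S} ->
            gen_tcp:close(S),
            ok;
        {error, Reason} ->
            Reason
    end.

%% Wait for one reply per spawned process and count the outcomes.
collect([], Acc) ->
    Acc;
collect([Pid | Rest], Acc) ->
    receive
        {Pid, Outcome} ->
            collect(Rest, orddict:update_counter(Outcome, 1, Acc))
    end.
```

For example, tcp_tally:run(1000, {127,0,0,1}, 2222) would report how many of the 1000 attempts ended in each outcome.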
>>>>>
>>>>> The good news: during tests together with the server back end, I was
>>>>> not able to crash the server. But I am not convinced that erl.exe solves
>>>>> everything on the server side.
>>>>>
>>>>> Regards,
>>>>> Michael
>>>>>
>>>>>
>>>>> On Fri, Sep 12, 2008 at 6:32 PM, Edwin Fine <emofine@REDACTED> wrote:
>>>>>
>>>>>> Please be aware that I reported a bug a while ago on erlang-bugs:
>>>>>> attempting to connect to a socket that nothing is listening on will
>>>>>> sometimes return success, but subsequent operations on the socket will
>>>>>> fail. Here is an excerpt from that bug report.
>>>>>>
>>>>>> When calling gen_tcp:connect/3 or /4 on a host/port that does not have a
>>>>>> running program listening on it, at random intervals gen_tcp:connect returns
>>>>>> an {ok, Sock} instead of the expected {error, econnrefused}. If
>>>>>> gen_tcp:recv(Sock, 0) is called immediately using the socket just returned,
>>>>>> it returns an {error, econnrefused}. Connection options used were [binary,
>>>>>> {packet, raw}, {active, false}]. It should be noted that the gen_tcp:connect
>>>>>> succeeds when there is a program listening on that same host/port, so it's
>>>>>> unlikely to be a firewall issue.
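A minimal sketch, assuming the behaviour described in the bug report above, of how one might detect such a spurious success: connect with the same options, immediately call gen_tcp:recv/3 with a short timeout, and classify the result. The module name tcp_probe, its functions and the outcome atoms are hypothetical. A {error, timeout} from recv on a socket that simply has no data yet is treated as a genuine connection.

```erlang
%% Hypothetical module: probe for a spurious {ok, Sock} from
%% gen_tcp:connect/3, as described in the bug report above.
-module(tcp_probe).
-export([probe/3, classify_recv/1]).

%% Connect with the options from the bug report, then try to read right
%% away so that a connection that was really refused shows up.
probe(IP, Port, RecvTimeoutMs) ->
    Opts = [binary, {packet, raw}, {active, false}],
    case gen_tcp:connect(IP, Port, Opts) of
        {error, Reason} ->
            {connect_failed, Reason};
        {ok, Sock} ->
            Result = gen_tcp:recv(Sock, 0, RecvTimeoutMs),
            gen_tcp:close(Sock),
            classify_recv(Result)
    end.

%% Map the result of the first recv to an outcome.
classify_recv({error, econnrefused}) -> spurious_ok;     % the reported bug
classify_recv({error, timeout})      -> connected;       % alive, no data yet
classify_recv({ok, _Data})           -> connected;
classify_recv({error, closed})       -> closed_by_peer;
classify_recv({error, Other})        -> {recv_failed, Other}.
```

Running probe/3 in a loop against a port with no listener and counting spurious_ok results would give a rough idea of how often the bug triggers.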
>>>>>>
>>>>>> See
>>>>>> http://www.erlang.org/pipermail/erlang-bugs/2008-August/000931.html
>>>>>>
>>>>>> This bug is still present in R12B-4. Could this be affecting you?
>>>>>>
>>>>>> Regards,
>>>>>> Edwin Fine
>>>>>>
>>>>>> 2008/9/12 Michael Regen <michael.regen@REDACTED>
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have run into a series of problems with gen_tcp, all eventually
>>>>>>> resulting in crashes. I tested this under Windows XP with both R12B-3 and
>>>>>>> R12B-4. Under Linux it seems to work, but I am not entirely sure, since
>>>>>>> the crash happens sporadically and seems to be timing-related.
>>>>>>>
>>>>>>> The two problems below lead me to a couple of questions:
>>>>>>> a) What is the real cause? Is it the socket error enfile? Do both
>>>>>>> problems have the same root cause?
>>>>>>> b) Is there a bug in Erlang? I guess this should not lead to a crash.
>>>>>>> c) How do you avoid this problem on systems you do not control
>>>>>>> yourself?
>>>>>>>
>>>>>>>
>>>>>>> Problem #1:
>>>>>>> ###########
>>>>>>>
>>>>>>> Compile the following code and run it with SASL enabled, using the
>>>>>>> following command:
>>>>>>>   tcp_test:test(1000).
>>>>>>> and, yes, without anything listening on port 2222. Sometimes
>>>>>>> you have to try twice!
>>>>>>>
>>>>>>> -------------------------- start: tcp_test.erl --------------------------
>>>>>>> -module(tcp_test).
>>>>>>>
>>>>>>> -export([test/1, test_con/0]).
>>>>>>>
>>>>>>> -define(DEF_PORT, 2222).
>>>>>>> -define(DEF_IP, {127,0,0,1}).
>>>>>>>
>>>>>>> test(0) -> ok;
>>>>>>> test(HowManyProcs) ->
>>>>>>>   spawn(?MODULE, test_con, []),
>>>>>>>   test(HowManyProcs-1).
>>>>>>>
>>>>>>> test_con() ->
>>>>>>>   {ok,S} = gen_tcp:connect(?DEF_IP, ?DEF_PORT,[]),
>>>>>>>   gen_tcp:send(S,<<0,5,65,66,67,68,69>>),
>>>>>>>   receive
>>>>>>>     {tcp_closed, _Socket} -> ok;
>>>>>>>     _Msg -> gen_tcp:close(S)
>>>>>>>   after 500 ->
>>>>>>>     gen_tcp:close(S)
>>>>>>>   end.
>>>>>>> -------------------------- end: tcp_test.erl --------------------------
>>>>>>>
>>>>>>> It simply spawns a number of processes, each trying to connect to a
>>>>>>> currently closed port and send some garbage there. This is what happens:
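As noted elsewhere in the thread, test_con/0 deliberately assumes {ok, Socket} and lets the process crash otherwise. A minimal sketch of a variant that returns the connect error instead, assuming the caller wants to keep running; the module name tcp_test_safe, the test_con/2 signature and the 5000 ms connect timeout are hypothetical. This only changes how individual failures are reported and is not a fix for the emulator crash.

```erlang
%% Hypothetical variant of tcp_test:test_con/0 that returns connect
%% errors instead of crashing the process with badmatch.
-module(tcp_test_safe).
-export([test_con/2]).

test_con(IP, Port) ->
    %% Default {active, true}, so socket events arrive as messages.
    case gen_tcp:connect(IP, Port, [binary], 5000) of
        {ok, S} ->
            %% As in the original, the send result is not checked.
            _ = gen_tcp:send(S, <<0,5,65,66,67,68,69>>),
            receive
                {tcp_closed, S} ->
                    ok;
                {tcp_error, S, Why} ->
                    gen_tcp:close(S),
                    {error, Why};
                {tcp, S, _Data} ->
                    gen_tcp:close(S)
            after 500 ->
                gen_tcp:close(S)
            end;
        {error, Reason} ->
            {error, Reason}
    end.
```

Spawning fun() -> tcp_test_safe:test_con({127,0,0,1}, 2222) end instead of tcp_test:test_con/0 keeps the same load while avoiding one error report per refused connection.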
>>>>>>>
>>>>>>> -------------------------- start: log tcp_test.erl --------------------------
>>>>>>> =ERROR REPORT==== 12-Sep-2008::15:28:47 ===
>>>>>>> Error in process <0.41.0> with exit value:
>>>>>>> {{badmatch,{error,econnrefused}},[{tcp_test,test_con,0}]}
>>>>>>>
>>>>>>> [... a couple of them but usually between 1 and 20.]
>>>>>>>
>>>>>>> =ERROR REPORT==== 12-Sep-2008::15:28:47 ===
>>>>>>> Error in process <0.103.0> with exit value:
>>>>>>> {{badmatch,{error,econnrefused}},[{tcp_test,test_con,0}]}
>>>>>>>
>>>>>>>
>>>>>>> Crash dump was written to: erl_crash.dump
>>>>>>> Inconsistent, why isnt io reported?
>>>>>>>
>>>>>>> Abnormal termination
>>>>>>> -------------------------- end: log tcp_test.erl --------------------------
>>>>>>>
>>>>>>> It might have something to do with the socket error enfile ('file
>>>>>>> table overflow'), but surely that should not simply crash the emulator?
>>>>>>> Searching Google for 'Inconsistent, why isnt io reported?' gives just
>>>>>>> one hit: Erlang's source code.
>>>>>>> I can provide the crash dump if needed; I just did not want to spam the
>>>>>>> whole list with big attachments.
>>>>>>> Spawning only 500 processes (tcp_test:test(500).) usually leads to a
>>>>>>> crash; spawning only 200 seems to work.
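Given the observation that 200 concurrent attempts work while 500 usually crash, one possible workaround to try (an assumption, not something confirmed in this thread) is to bound how many connection attempts are in flight at once. A minimal sketch; the module name tcp_throttle, run/4 and the 5000 ms connect timeout are hypothetical.

```erlang
%% Hypothetical module: make N connection attempts with at most
%% MaxInFlight of them running concurrently.
-module(tcp_throttle).
-export([run/4]).

%% Returns the outcomes (ok or an error reason) in completion order.
run(N, MaxInFlight, IP, Port) when N >= 0, MaxInFlight > 0 ->
    loop(N, 0, MaxInFlight, IP, Port, []).

%% Pending: attempts not started yet; Running: attempts in flight.
loop(0, 0, _Max, _IP, _Port, Acc) ->
    lists:reverse(Acc);
loop(Pending, Running, Max, IP, Port, Acc)
  when Pending > 0, Running < Max ->
    %% Room for another attempt: start it without waiting.
    Parent = self(),
    spawn(fun() -> Parent ! {attempt_done, attempt(IP, Port)} end),
    loop(Pending - 1, Running + 1, Max, IP, Port, Acc);
loop(Pending, Running, Max, IP, Port, Acc) ->
    %% Limit reached (or nothing left to start): wait for one to finish.
    receive
        {attempt_done, Outcome} ->
            loop(Pending, Running - 1, Max, IP, Port, [Outcome | Acc])
    end.

attempt(IP, Port) ->
    case gen_tcp:connect(IP, Port, [binary, {active, false}], 5000) of
        {ok, S} ->
            gen_tcp:close(S),
            ok;
        {error, Reason} ->
            Reason
    end.
```

For example, tcp_throttle:run(1000, 200, {127,0,0,1}, 2222) would make 1000 attempts with never more than 200 running at the same time; whether that avoids the crash on Windows would have to be tested.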
>>>>>>>
>>>>>>>
>>>>>>> Problem #2:
>>>>>>> ###########
>>>>>>>
>>>>>>> Now let's try the same with a server answering on port 2222: just
>>>>>>> take the code from the trapexit tutorial 'Building a Non-blocking TCP server
>>>>>>> using OTP principles'
>>>>>>> http://trapexit.org/Building_a_Non-blocking_TCP_server_using_OTP_principles
>>>>>>> Start it first, and then start our test module in a different Erlang node
>>>>>>> as described above. Usually the client survives (I have seen it crash as
>>>>>>> well!) and the server crashes in a similar way. Sometimes the server
>>>>>>> survives, and in very rare cases you will see the following logs in the
>>>>>>> Erlang server instance:
>>>>>>>
>>>>>>> -------------------------- start: log server --------------------------
>>>>>>> =ERROR REPORT==== 12-Sep-2008::12:58:56 ===
>>>>>>> File operation error: system_limit. Function: get_cwd. Process:
>>>>>>> code_server.
>>>>>>>
>>>>>>> =ERROR REPORT==== 12-Sep-2008::12:58:56 ===
>>>>>>> Error in async accept: {async_accept,"file table overflow"}.
>>>>>>>
>>>>>>> =ERROR REPORT==== 12-Sep-2008::12:58:56 ===
>>>>>>> ** Generic server tcp_listener terminating
>>>>>>> ** Last message in was
>>>>>>> {inet_async,#Port<0.109>,1019,{ok,#Port<0.2141>}}
>>>>>>> ** When Server state == {state,#Port<0.109>,1019,tcp_echo_fsm}
>>>>>>> ** Reason for termination ==
>>>>>>> ** {async_accept,"file table overflow"}
>>>>>>>
>>>>>>> [...]
>>>>>>> -------------------------- end: log server --------------------------
>>>>>>>
>>>>>>> Can anyone help? Thank you!
>>>>>>>
>>>>>>> Regards,
>>>>>>> Michael
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> erlang-questions mailing list
>>>>>>> erlang-questions@REDACTED
>>>>>>> http://www.erlang.org/mailman/listinfo/erlang-questions
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>