[erlang-questions] Erlang crash gen_tcp related (probably only under Windows)

Edwin Fine <>
Fri Sep 12 21:01:10 CEST 2008


I hear you  :).

When I've got some time I will try out my R12B-3 installation on SP4 with
your program, but can't right now. I am interested to see what happens.

Rgds,
Ed

On Fri, Sep 12, 2008 at 2:47 PM, Michael Regen wrote:

> Just tested it on another machine with SP3 installed. No difference. Same
> problem.
>
> Yes, Windows is flaky, and I personally would like to be able to say I would
> rather go to hell than install a server on Windows that is expected to
> run robustly.
> But well, either Erlang is robust on Windows as well or no Erlang on
> Windows. :(
>
> Regards,
> Michael
>
> --
> Quote from a >3000 employees IT centric company's CIO I had the pleasure to
> witness four weeks ago: 'For the messaging back end? No, we can't use Java.
> Java is too slow.'
>
>
>
>
> On Fri, Sep 12, 2008 at 7:54 PM, Edwin Fine wrote:
>
>> Michael,
>>
>> I've always felt that the Windows version of Erlang is a bit flaky. Then
>> again, I think Windows itself is more than a bit flaky, so maybe it's not
>> Erlang's fault ;)
>> I wonder if running on SP4 would improve things?
>>
>>
>> On Fri, Sep 12, 2008 at 1:39 PM, Michael Regen wrote:
>>
>>> Hi Edwin,
>>>
>>> It is possible that both issues have a similar source but I do not see
>>> many reasons why there must be a common source.
>>> I was running my tests on a 32-bit single-core Windows XP SP2 system just
>>> by running
>>>   werl.exe -boot start_sasl
>>> or
>>>   werl.exe
>>>
>>> and did nothing fancy. My R12B-3 version is self-compiled; R12B-4 is the
>>> stock build from erlang.org.
>>> Client and server tests were done by starting two different instances of
>>> werl.
>>> Furthermore my tcp_test:test does not care whether results from
>>> gen_tcp:connect are correct or not. It just assumes {ok, Socket} and crashes
>>> the process if otherwise. Of course it was a surprise that under some
>>> circumstances the whole emulator crashes.
>>>
>>> By the way, the crash dump slogan is unspectacular: 'Slogan:
>>> Inconsistent, why isnt io reported?'
>>>
>>> UPDATE: I have some more observations which puzzle me even more:
>>>
>>> Just did some more of the same tests but this time by starting:
>>>   erl.exe
>>>   application:start(tcp_server).
>>> and
>>>   erl.exe
>>>   tcp_test:test(1000).
>>>
>>> There seems to be a difference between erl.exe and werl.exe.
>>>
>>> This time results are pretty different:
>>> Now it is much harder to crash the emulator. It takes significantly more
>>> processes / tries until something bad happens:
>>>
>>> client only (tcp_test:test(5000)) crashes eventually in the same way but
>>> Windows' cmd.exe now follows with a:
>>>   The exception unknown software exception (0x40000015) occured in the
>>> application at location 0x008fff86
>>>
>>> after the 'Crash dump was written to: erl_crash.dump / Inconsistent, why
>>> isnt io reported?' message and the crash dump file.
>>>
>>> The exception always seems to occur at the same location.
>>>
>>> A lot more error messages are printed now (as expected) until the crash.
>>> Besides the {{badmatch,{error,econnrefused}},[{tcp_test,test_con,0}]} I
>>> now also see lots of
>>>   {{badmatch,{error,eaddrinuse}},[{tcp_test,test_con,0}]}
>>> and
>>>   {{badmatch,{error,system_limit}},[{tcp_test,test_con,0}]}
>>> errors.
>>>
>>> The good news: During tests together with the server backend I was not
>>> able to crash the server. But I am not convinced that erl.exe solves
>>> everything server side.
>>>
>>> Regards,
>>> Michael
>>>
>>>
>>> On Fri, Sep 12, 2008 at 6:32 PM, Edwin Fine wrote:
>>>
>>>> Please be aware that I reported a bug a while ago on erlang-bugs, where
>>>> attempting to connect to a socket that is not being listened on will
>>>> sometimes return an apparent success, but subsequent operations will
>>>> fail. Here is an excerpt from that bug report.
>>>>
>>>> When calling gen_tcp:connect/3 or /4 on a host/port that does not have a
>>>> running program listening on it, at random intervals gen_tcp:connect returns
>>>> an {ok, Sock} instead of the expected {error, econnrefused}. If
>>>> gen_tcp:recv(Sock, 0) is called immediately using the socket just returned,
>>>> it returns an {error, econnrefused}. Connection options used were [binary,
>>>> {packet, raw}, {active, false}]. It should be noted that the gen_tcp:connect
>>>> succeeds when there is a program listening on that same host/port, so it's
>>>> unlikely to be a firewall issue.
>>>>
>>>> See http://www.erlang.org/pipermail/erlang-bugs/2008-August/000931.html
>>>>
>>>> This bug is still present in R12B-4. Could this be affecting you?
>>>>
>>>> Regards,
>>>> Edwin Fine
>>>>
>>>> 2008/9/12 Michael Regen
>>>>
>>>>> Hi,
>>>>>
>>>>> I have run into a series of problems with gen_tcp, all eventually resulting
>>>>> in crashes. I tested this under Windows XP and with R12B-3 as well as R12B-4.
>>>>> Under Linux it seems to work but I am not perfectly sure since the crash
>>>>> happens sporadically and seems to be timing related.
>>>>>
>>>>> The two problems below lead me to a couple of questions:
>>>>> a) What is the real cause? Is it the socket error enfile? Do both
>>>>> problems have the same root cause?
>>>>> b) Is there a bug in Erlang? I guess this should not lead to a crash.
>>>>> c) How do you avoid this problem on systems you do not control
>>>>> yourself?
>>>>>
>>>>>
>>>>> Problem #1:
>>>>> ###########
>>>>>
>>>>> Just compile the following code and run it with sasl enabled and the
>>>>> following command:
>>>>>   tcp_test:test(1000).
>>>>> and - yes - without anything listening on port 2222. And sometimes you
>>>>> have to try twice!
>>>>>
>>>>> -------------------------- start: tcp_test.erl --------------------------
>>>>> -module(tcp_test).
>>>>>
>>>>> -export([test/1, test_con/0]).
>>>>>
>>>>> -define(DEF_PORT, 2222).
>>>>> -define(DEF_IP, {127,0,0,1}).
>>>>>
>>>>> test(0) -> ok;
>>>>> test(HowManyProcs) ->
>>>>>   spawn(?MODULE, test_con, []),
>>>>>   test(HowManyProcs-1).
>>>>>
>>>>> test_con() ->
>>>>>   {ok,S} = gen_tcp:connect(?DEF_IP, ?DEF_PORT,[]),
>>>>>   gen_tcp:send(S,<<0,5,65,66,67,68,69>>),
>>>>>   receive
>>>>>     {tcp_closed, _Socket} -> ok;
>>>>>     _Msg -> gen_tcp:close(S)
>>>>>   after 500 ->
>>>>>     gen_tcp:close(S)
>>>>>   end.
>>>>> -------------------------- end: tcp_test.erl --------------------------
>>>>>
>>>>> It just spawns a bunch of processes all trying to connect to a
>>>>> currently closed port and sending some garbage there. This is what happens:
>>>>>
>>>>> -------------------------- start: log tcp_test.erl --------------------------
>>>>> =ERROR REPORT==== 12-Sep-2008::15:28:47 ===
>>>>> Error in process <0.41.0> with exit value:
>>>>> {{badmatch,{error,econnrefused}},[{tcp_test,test_con,0}]}
>>>>>
>>>>> [... a couple of them but usually between 1 and 20.]
>>>>>
>>>>> =ERROR REPORT==== 12-Sep-2008::15:28:47 ===
>>>>> Error in process <0.103.0> with exit value:
>>>>> {{badmatch,{error,econnrefused}},[{tcp_test,test_con,0}]}
>>>>>
>>>>>
>>>>> Crash dump was written to: erl_crash.dump
>>>>> Inconsistent, why isnt io reported?
>>>>>
>>>>> Abnormal termination
>>>>> -------------------------- end: log tcp_test.erl --------------------------
>>>>>
>>>>> It might have something to do with the socket error enfile 'file table
>>>>> overflow'  but I guess it should not simply crash the emulator!?
>>>>> Searching Google for 'Inconsistent, why isnt io reported?' yields just
>>>>> one hit, in Erlang's source code.
>>>>> I can provide the crash dump if needed. Just did not want to spam the
>>>>> whole list with big attachments.
>>>>> Spawning only 500 processes (tcp_test:test(500).) usually leads to a
>>>>> crash, spawning only 200 seems to work.
>>>>>
>>>>>
>>>>> Problem #2:
>>>>> ###########
>>>>>
>>>>> Now let's try the same with a server listening on port 2222: Just take
>>>>> the code from the trapexit tutorial 'Building a Non-blocking TCP server
>>>>> using OTP principles'
>>>>> http://trapexit.org/Building_a_Non-blocking_TCP_server_using_OTP_principles
>>>>> Start it first and then our test module in a different Erlang node as
>>>>> described above. Now, usually the client survives (I have seen crashes as
>>>>> well!) and the server crashes in a similar way. Sometimes it survives and in
>>>>> very rare cases you will see the following logs in the Erlang server
>>>>> instance:
>>>>>
>>>>> -------------------------- start: log server --------------------------
>>>>> =ERROR REPORT==== 12-Sep-2008::12:58:56 ===
>>>>> File operation error: system_limit. Function: get_cwd. Process:
>>>>> code_server.
>>>>>
>>>>> =ERROR REPORT==== 12-Sep-2008::12:58:56 ===
>>>>> Error in async accept: {async_accept,"file table overflow"}.
>>>>>
>>>>> =ERROR REPORT==== 12-Sep-2008::12:58:56 ===
>>>>> ** Generic server tcp_listener terminating
>>>>> ** Last message in was
>>>>> {inet_async,#Port<0.109>,1019,{ok,#Port<0.2141>}}
>>>>> ** When Server state == {state,#Port<0.109>,1019,tcp_echo_fsm}
>>>>> ** Reason for termination ==
>>>>> ** {async_accept,"file table overflow"}
>>>>>
>>>>> [...]
>>>>> -------------------------- end: log server --------------------------
>>>>>
>>>>> Can anyone help? Thank you!
>>>>>
>>>>> Regards,
>>>>> Michael
>>>>>
>>>>
>>>>
>>>
>>
>
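For readers of the archive: a minimal sketch of how the test client quoted
above could report connect failures instead of dying with a badmatch. The
module name tcp_test_safe, the helper wait_and_close/1 and the 5000 ms
connect timeout are assumptions for illustration and are not from the thread.
It only quiets the error reports; it is not claimed to prevent the emulator
crash discussed here.

```erlang
-module(tcp_test_safe).   % hypothetical name, not part of the original post

-export([test/1, test_con/0]).

-define(DEF_PORT, 2222).
-define(DEF_IP, {127,0,0,1}).

%% Spawn HowManyProcs client processes, as tcp_test:test/1 does.
test(0) -> ok;
test(HowManyProcs) when is_integer(HowManyProcs), HowManyProcs > 0 ->
    spawn(?MODULE, test_con, []),
    test(HowManyProcs - 1).

%% Connect and send, but treat an error tuple as an ordinary result
%% rather than letting a failed match kill the process.
test_con() ->
    %% The 5000 ms timeout is an assumed value.
    case gen_tcp:connect(?DEF_IP, ?DEF_PORT, [], 5000) of
        {ok, S} ->
            Result = gen_tcp:send(S, <<0,5,65,66,67,68,69>>),
            wait_and_close(S),
            Result;
        {error, Reason} ->
            %% e.g. econnrefused, eaddrinuse or system_limit
            {error, Reason}
    end.

%% Wait briefly for a reply or a close notification, then close the socket.
wait_and_close(S) ->
    receive
        {tcp_closed, S}         -> ok;
        {tcp_error, S, _Reason} -> gen_tcp:close(S);
        {tcp, S, _Data}         -> gen_tcp:close(S)
    after 500 ->
        gen_tcp:close(S)
    end.
```

With nothing listening on port 2222, tcp_test_safe:test(1000). should then
produce no badmatch error reports, since each spawned process simply returns
{error, Reason} and exits normally.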