[erlang-questions] Erlang crash gen_tcp related (probably only under Windows)

Michael Regen michael.regen@REDACTED
Fri Sep 12 23:42:21 CEST 2008

I would be very interested to hear about some tests from others! And Edwin,
I guess you mean SP3. Windows XP SP3 is the most recent service pack. SP4 is
if at all a giant trojan, isn't it? ;)


On Fri, Sep 12, 2008 at 9:01 PM, Edwin Fine

> I hear you  :).
> When I've got some time I will try out my R12B-3 installation on SP4 with
> your program, but can't right now. I am interested to see what happens.
> Rgds,
> Ed
> On Fri, Sep 12, 2008 at 2:47 PM, Michael Regen <michael.regen@REDACTED> wrote:
>> Just tested it on another machine with SP3 installed. No difference. Same
>> problem.
>> Yes, Windows is flaky, and I personally would like to be able to say I
>> would rather go to hell than install a server on Windows that is
>> expected to run robustly.
>> But well, either Erlang is robust on Windows as well or no Erlang on
>> Windows. :(
>> Regards,
>> Michael
>> --
>> Quote from a >3000 employees IT centric company's CIO I had the pleasure
>> to witness four weeks ago: 'For the messaging back end? No, we can't use
>> Java. Java is too slow.'
>> On Fri, Sep 12, 2008 at 7:54 PM, Edwin Fine <
>> erlang-questions_efine@REDACTED> wrote:
>>> Michael,
>>> I've always felt that the Windows version of Erlang is a bit flaky. Then
>>> again, I think Windows itself is more than a bit flaky, so maybe it's not
>>> Erlang's fault ;)
>>> I wonder if running on SP4 would improve things?
>>> On Fri, Sep 12, 2008 at 1:39 PM, Michael Regen <michael.regen@REDACTED> wrote:
>>>> Hi Edwin,
>>>> It is possible that both issues have a similar source but I do not see
>>>> many reasons why there must be a common source.
>>>> I was running my tests on a 32bit single core Windows XP SP2 system just
>>>> by running
>>>>   werl.exe -boot start_sasl
>>>> or
>>>>   werl.exe
>>>> and did nothing fancy. My R12B-3 version is self compiled, R12B-4 is out
>>>> of the erlang.org box.
>>>> Client and server tests were done by starting two different instances
>>>> of werl.
>>>> Furthermore my tcp_test:test does not care whether results from
>>>> gen_tcp:connect are correct or not. It just assumes {ok, Socket} and crashes
>>>> the process if otherwise. Of course it was a surprise that under some
>>>> circumstances the whole emulator crashes.
>>>> By the way, the crash dump slogan is unspectacular: 'Slogan:
>>>> Inconsistent, why isnt io reported?'
>>>> UPDATE: I got some more observations which puzzle me even more:
>>>> Just did some more of the same tests but this time by starting:
>>>>   erl.exe
>>>>   application:start(tcp_server).
>>>> and
>>>>   erl.exe
>>>>   tcp_test:test(1000).
>>>> There seems to be a difference between erl.exe and werl.exe.
>>>> This time results are pretty different:
>>>> Now it is much harder to crash the emulator. It takes significantly more
>>>> processes / tries until something bad happens:
>>>> client only (tcp_test:test(5000)) eventually crashes in the same way, but
>>>> Windows' cmd.exe now follows with a:
>>>>   The exception unknown software exception (0x40000015) occured in the
>>>> application at location 0x008fff86
>>>> after the 'Crash dump was written to: erl_crash.dump / Inconsistent, why
>>>> isnt io reported?' message and the crash dump file.
>>>> The exception always seems to occur at the same location.
>>>> A lot more error messages are printed now (as expected) until the crash.
>>>> Besides the {{badmatch,{error,econnrefused}},[{tcp_test,test_con,0}]} I
>>>> can now also watch lots of
>>>>   {{badmatch,{error,eaddrinuse}},[{tcp_test,test_con,0}]}
>>>> and
>>>>   {{badmatch,{error,system_limit}},[{tcp_test,test_con,0}]}
>>>> errors.
>>>> The good news: during tests together with the server backend I was
>>>> not able to crash the server. But I am not convinced that erl.exe solves
>>>> everything on the server side.
>>>> Regards,
>>>> Michael
>>>> On Fri, Sep 12, 2008 at 6:32 PM, Edwin Fine <emofine@REDACTED> wrote:
>>>>> Please be aware that I reported a bug a while ago on erlang-bugs, where
>>>>> attempting to connect to a socket that is not being listened on will
>>>>> sometimes return an actual success return, but subsequent operations will
>>>>> fail. Here is an excerpt from that bug report.
>>>>> When calling gen_tcp:connect/3 or /4 on a host/port that does not have a
>>>>> running program listening on it, at random intervals gen_tcp:connect returns
>>>>> an {ok, Sock} instead of the expected {error, econnrefused}. If
>>>>> gen_tcp:recv(Sock, 0) is called immediately using the socket just returned,
>>>>> it returns an {error, econnrefused}. Connection options used were [binary,
>>>>> {packet, raw}, {active, false}]. It should be noted that the gen_tcp:connect
>>>>> succeeds when there is a program listening on that same host/port, so it's
>>>>> unlikely to be a firewall issue.
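
For anyone reproducing this, here is a minimal sketch of a client that treats the result of gen_tcp:connect as tentative, given the behaviour described in the excerpt above. It returns errors as values instead of crashing with a badmatch, and it checks the result of the first send as well. The module name tcp_probe, the function try_con/2 and the 1000 ms connect timeout are assumptions made for illustration; none of them come from the bug report, and nothing here is claimed to prevent the emulator crash itself.

```erlang
-module(tcp_probe).
-export([try_con/2]).

%% Hypothetical variant of tcp_test:test_con/0: returns ok or
%% {error, Reason} instead of crashing the calling process.
try_con(Host, Port) ->
    Opts = [binary, {packet, raw}, {active, true}],
    case gen_tcp:connect(Host, Port, Opts, 1000) of
        {ok, S} ->
            %% A successful connect is not trusted on its own, so the
            %% result of the first send is checked too.
            Result = case gen_tcp:send(S, <<0,5,65,66,67,68,69>>) of
                         ok -> wait_for_peer(S);
                         {error, _} = SendErr -> SendErr
                     end,
            %% The socket is closed on every path.
            gen_tcp:close(S),
            Result;
        {error, _} = ConnErr ->
            ConnErr
    end.

%% Wait briefly for the peer to answer, close, or report an error.
wait_for_peer(S) ->
    receive
        {tcp_closed, S} -> ok;
        {tcp_error, S, Reason} -> {error, Reason};
        {tcp, S, _Data} -> ok
    after 500 ->
        ok
    end.
```

Calling tcp_probe:try_con({127,0,0,1}, 2222) with nothing listening would normally be expected to return {error, econnrefused}. If the bug above strikes, the error may instead surface from the send or as a tcp_error message.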
>>>>> See
>>>>> http://www.erlang.org/pipermail/erlang-bugs/2008-August/000931.html
>>>>> This bug is still present in R12B-4. Could this be affecting you?
>>>>> Regards,
>>>>> Edwin Fine
>>>>> 2008/9/12 Michael Regen <michael.regen@REDACTED>
>>>>>> Hi,
>>>>>> I ran into a series of problems with gen_tcp, all eventually resulting in
>>>>>> crashes. I tested this under Windows XP and with R12B-3 as well as R12B-4.
>>>>>> Under Linux it seems to work but I am not perfectly sure since the crash
>>>>>> happens sporadically and seems to be timing related.
>>>>>> The two problems below lead me to a couple of questions:
>>>>>> a) What is the real cause? Is it the socket error enfile? Do both
>>>>>> problems have the same root cause?
>>>>>> b) Is there a bug in Erlang? I guess this should not lead to a crash.
>>>>>> c) How do you avoid this problem on systems you do not control
>>>>>> yourself?
>>>>>> Problem #1:
>>>>>> ###########
>>>>>> Just compile the following code and run it with sasl enabled and the
>>>>>> following command:
>>>>>>   tcp_test:test(1000).
>>>>>> and - yes - without anything listening on port 2222. And sometimes you
>>>>>> have to try twice!
>>>>>> -------------------------- start: tcp_test.erl --------------------------
>>>>>> -module(tcp_test).
>>>>>> -export([test/1, test_con/0]).
>>>>>> -define(DEF_PORT, 2222).
>>>>>> -define(DEF_IP, {127,0,0,1}).
>>>>>> %% Spawn HowManyProcs processes, each running test_con/0.
>>>>>> test(0) -> ok;
>>>>>> test(HowManyProcs) ->
>>>>>>   spawn(?MODULE, test_con, []),
>>>>>>   test(HowManyProcs-1).
>>>>>> %% Connect, send a few bytes, then close the socket.
>>>>>> %% A failed connect crashes the process with a badmatch.
>>>>>> test_con() ->
>>>>>>   {ok,S} = gen_tcp:connect(?DEF_IP, ?DEF_PORT,[]),
>>>>>>   gen_tcp:send(S,<<0,5,65,66,67,68,69>>),
>>>>>>   receive
>>>>>>     {tcp_closed, _Socket} -> ok;
>>>>>>     _Msg -> gen_tcp:close(S)
>>>>>>   after 500 ->
>>>>>>     gen_tcp:close(S)
>>>>>>   end.
>>>>>> -------------------------- end: tcp_test.erl --------------------------
>>>>>> It just spawns a bunch of processes all trying to connect to a
>>>>>> currently closed port and sending some garbage there. This is what happens:
>>>>>> -------------------------- start: log tcp_test.erl --------------------------
>>>>>> =ERROR REPORT==== 12-Sep-2008::15:28:47 ===
>>>>>> Error in process <0.41.0> with exit value:
>>>>>> {{badmatch,{error,econnrefused}},[{tcp_test,test_con,0}]}
>>>>>> [... a couple of them but usually between 1 and 20.]
>>>>>> =ERROR REPORT==== 12-Sep-2008::15:28:47 ===
>>>>>> Error in process <0.103.0> with exit value:
>>>>>> {{badmatch,{error,econnrefused}},[{tcp_test,test_con,0}]}
>>>>>> Crash dump was written to: erl_crash.dump
>>>>>> Inconsistent, why isnt io reported?
>>>>>> Abnormal termination
>>>>>> -------------------------- end: log tcp_test.erl --------------------------
>>>>>> It might have something to do with the socket error enfile 'file table
>>>>>> overflow'  but I guess it should not simply crash the emulator!?
>>>>>> Searching Google for 'Inconsistent, why isnt io reported?' gives just
>>>>>> one hit, in Erlang's source code.
>>>>>> I can provide the crash dump if needed. Just did not want to spam the
>>>>>> whole list with big attachments.
>>>>>> Spawning only 500 processes (tcp_test:test(500).) usually leads to a
>>>>>> crash, spawning only 200 seems to work.
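
Since 200 simultaneous processes seem to survive while 500 usually do not, one thing worth trying is to cap how many connection attempts are in flight at once. The following is a rough sketch under that assumption; the module name tcp_throttle and the function run/3 are made up for illustration, and whether throttling actually avoids the emulator crash is untested.

```erlang
-module(tcp_throttle).
-export([run/3]).

%% Runs Fun in Total separate processes, keeping at most Max of them
%% alive at any one time. Returns ok once all of them have exited.
run(Total, Max, Fun) when Total >= 0, Max > 0 ->
    loop(Total, 0, Max, Fun).

%% Nothing left to start and nothing running: done.
loop(0, 0, _Max, _Fun) ->
    ok;
%% Room for another worker: start it with a monitor attached.
loop(Left, Running, Max, Fun) when Left > 0, Running < Max ->
    spawn_monitor(Fun),
    loop(Left - 1, Running + 1, Max, Fun);
%% At the limit, or only waiting for stragglers: block until one
%% worker exits, whether normally or with an error.
loop(Left, Running, Max, Fun) ->
    receive
        {'DOWN', _Ref, process, _Pid, _Reason} ->
            loop(Left, Running - 1, Max, Fun)
    end.
```

For example, tcp_throttle:run(1000, 100, fun tcp_test:test_con/0) would make the same 1000 attempts as tcp_test:test(1000), but with no more than 100 sockets open at a time. The limit of 100 is only a guess based on the observation above. Note that the loop consumes any 'DOWN' message in the caller's mailbox, so it is best run from a process that holds no other monitors.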
>>>>>> Problem #2:
>>>>>> ###########
>>>>>> Now let's try the same with a server answering to port 2222: Just take
>>>>>> the code from the trapexit tutorial 'Building a Non-blocking TCP server
>>>>>> using OTP principles'
>>>>>> http://trapexit.org/Building_a_Non-blocking_TCP_server_using_OTP_principles
>>>>>> Start it first and then our test module in a different erlang node as
>>>>>> described above. Now, usually the client survives (have seen crashes as
>>>>>> well!) and the server crashes in a similar way. Sometimes it survives and in
>>>>>> very rare cases you will see the following logs in the erlang server
>>>>>> instance:
>>>>>> -------------------------- start: log server --------------------------
>>>>>> =ERROR REPORT==== 12-Sep-2008::12:58:56 ===
>>>>>> File operation error: system_limit. Function: get_cwd. Process:
>>>>>> code_server.
>>>>>> =ERROR REPORT==== 12-Sep-2008::12:58:56 ===
>>>>>> Error in async accept: {async_accept,"file table overflow"}.
>>>>>> =ERROR REPORT==== 12-Sep-2008::12:58:56 ===
>>>>>> ** Generic server tcp_listener terminating
>>>>>> ** Last message in was
>>>>>> {inet_async,#Port<0.109>,1019,{ok,#Port<0.2141>}}
>>>>>> ** When Server state == {state,#Port<0.109>,1019,tcp_echo_fsm}
>>>>>> ** Reason for termination ==
>>>>>> ** {async_accept,"file table overflow"}
>>>>>> [...]
>>>>>> -------------------------- end: log server --------------------------
>>>>>> Can anyone help? Thank you!
>>>>>> Regards,
>>>>>> Michael