[erlang-questions] Erlang crash gen_tcp related (probably only under Windows)

Michael Regen michael.regen@REDACTED
Fri Sep 12 20:47:31 CEST 2008


Just tested it on another machine with SP3 installed. No difference. Same
problem.

Yes, Windows is flaky, and personally I would rather go to hell than
install a server on Windows that is expected to run robustly.
But well, either Erlang is robust on Windows as well or no Erlang on
Windows. :(

Regards,
Michael

-- 
Quote from a >3000 employees IT centric company's CIO I had the pleasure to
witness four weeks ago: 'For the messaging back end? No, we can't use Java.
Java is too slow.'



On Fri, Sep 12, 2008 at 7:54 PM, Edwin Fine
<erlang-questions_efine@REDACTED> wrote:

> Michael,
>
> I've always felt that the Windows version of Erlang is a bit flaky. Then
> again, I think Windows itself is more than a bit flaky, so maybe it's not
> Erlang's fault ;)
> I wonder if running on SP4 would improve things?
>
>
> On Fri, Sep 12, 2008 at 1:39 PM, Michael Regen <michael.regen@REDACTED> wrote:
>
>> Hi Edwin,
>>
>> It is possible that both issues have a similar source but I do not see
>> many reasons why there must be a common source.
>> I was running my tests on a 32bit single core Windows XP SP2 system just
>> by running
>>   werl.exe -boot start_sasl
>> or
>>   werl.exe
>>
>> and did nothing fancy. My R12B-3 version is self compiled, R12B-4 is out
>> of the erlang.org box.
>> Client and server tests were done by starting two different instances of
>> werl.
>> Furthermore my tcp_test:test does not care whether results from
>> gen_tcp:connect are correct or not. It just assumes {ok, Socket} and crashes
>> the process if otherwise. Of course it was a surprise that under some
>> circumstances the whole emulator crashes.
>>
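For comparison, here is a minimal sketch of a variant that does check the
result of gen_tcp:connect. The module name tcp_test_safe is made up for
this illustration; port, address and payload are the ones from
tcp_test.erl. It only keeps the worker process from dying with a badmatch.
Whether it has any effect on the emulator crash is untested.

```erlang
-module(tcp_test_safe).   % hypothetical module name, not part of the original test

-export([test_con/0]).

-define(DEF_PORT, 2222).
-define(DEF_IP, {127,0,0,1}).

%% Like tcp_test:test_con/0, but a failed connect is returned as a
%% value instead of crashing the process with a badmatch.
test_con() ->
    case gen_tcp:connect(?DEF_IP, ?DEF_PORT, []) of
        {ok, S} ->
            _ = gen_tcp:send(S, <<0,5,65,66,67,68,69>>),
            receive
                {tcp_closed, S} -> ok;
                _Msg -> gen_tcp:close(S)
            after 500 ->
                gen_tcp:close(S)
            end;
        {error, Reason} ->
            %% e.g. econnrefused, eaddrinuse or system_limit
            {error, Reason}
    end.
```

Called directly from the shell, tcp_test_safe:test_con() returns either ok
or an {error, Reason} tuple.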
>> By the way, the crash dump slogan is unspectacular: 'Slogan: Inconsistent,
>> why isnt io reported?'
>>
>> UPDATE: I got some more observations which puzzle me even more:
>>
>> Just did some more of the same tests but this time by starting:
>>   erl.exe
>>   application:start(tcp_server).
>> and
>>   erl.exe
>>   tcp_test:test(1000).
>>
>> There seems to be a difference between erl.exe and werl.exe.
>>
>> This time results are pretty different:
>> Now it is much harder to crash the emulator. It takes significantly more
>> processes / tries until something bad happens:
>>
>> client only (tcp_test:test(5000)) eventually crashes in the same way but
>> Windows' cmd.exe now follows with a:
>>   The exception unknown software exception (0x40000015) occured in the
>> application at location 0x008fff86
>>
>> after the 'Crash dump was written to: erl_crash.dump / Inconsistent, why
>> isnt io reported?' message and the crash dump file.
>>
>> The exception always seems to occur at the same location.
>>
>> A lot more error messages are printed now (as expected) until the crash.
>> Besides the {{badmatch,{error,econnrefused}},[{tcp_test,test_con,0}]} I
>> can now also watch lots of
>>   {{badmatch,{error,eaddrinuse}},[{tcp_test,test_con,0}]}
>> and
>>   {{badmatch,{error,system_limit}},[{tcp_test,test_con,0}]}
>> errors.
>>
>> The good news: During tests together with the server backend I was not
>> able to crash the server. But I am not convinced that erl.exe solves
>> everything on the server side.
>>
>> Regards,
>> Michael
>>
>>
>> On Fri, Sep 12, 2008 at 6:32 PM, Edwin Fine <emofine@REDACTED> wrote:
>>
>>> Please be aware that I reported a bug a while ago on erlang-bugs, where
>>> attempting to connect to a socket that is not being listened on will
>>> sometimes return an actual success return, but subsequent operations will
>>> fail. Here is an excerpt from that bug report.
>>>
>>> When calling gen_tcp:connect/3 or /4 on a host/port that does not have a
>>> running program listening on it, at random intervals gen_tcp:connect returns
>>> an {ok, Sock} instead of the expected {error, econnrefused}. If
>>> gen_tcp:recv(Sock, 0) is called immediately using the socket just returned,
>>> it returns an {error, econnrefused}. Connection options used were [binary,
>>> {packet, raw}, {active, false}]. It should be noted that the gen_tcp:connect
>>> succeeds when there is a program listening on that same host/port, so it's
>>> unlikely to be a firewall issue.
>>>
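The sequence described in the excerpt above (connect, then an immediate
recv on the returned socket) can be sketched roughly as follows. This is
an illustration only and not the code from the bug report: the module name
connect_probe, the result atoms and the 100 ms receive timeout are all
assumptions. The connect options are the ones quoted in the excerpt.

```erlang
-module(connect_probe).   % hypothetical module name

-export([probe/2]).

%% Options taken from the bug report excerpt.
-define(OPTS, [binary, {packet, raw}, {active, false}]).

%% Connect, then read once with a short timeout to see whether the
%% socket that connect returned is actually usable.
probe(Host, Port) ->
    case gen_tcp:connect(Host, Port, ?OPTS) of
        {ok, Sock} ->
            Result =
                case gen_tcp:recv(Sock, 0, 100) of  % 100 ms is an arbitrary choice
                    {ok, _Data}      -> connected;
                    {error, timeout} -> connected;  % peer is simply silent
                    {error, RecvErr} -> {unusable, RecvErr}
                end,
            gen_tcp:close(Sock),
            Result;
        {error, ConnErr} ->
            {refused, ConnErr}
    end.
```

With nothing listening, connect_probe:probe({127,0,0,1}, 2222) would be
expected to return {refused, econnrefused}. A result of
{unusable, econnrefused} would correspond to the behaviour in the report.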
>>> See http://www.erlang.org/pipermail/erlang-bugs/2008-August/000931.html
>>>
>>> This bug is still present in R12B-4. Could this be affecting you?
>>>
>>> Regards,
>>> Edwin Fine
>>>
>>> 2008/9/12 Michael Regen <michael.regen@REDACTED>
>>>
>>>> Hi,
>>>>
>>>> I ran into a series of problems with gen_tcp, all eventually resulting in
>>>> crashes. I tested this under Windows XP and with R12B-3 as well as R12B-4.
>>>> Under Linux it seems to work but I am not perfectly sure since the crash
>>>> happens sporadically and seems to be timing related.
>>>>
>>>> The two problems below lead me to a couple of questions:
>>>> a) What is the real cause? Is it the socket error enfile? Do both
>>>> problems have the same root cause?
>>>> b) Is there a bug in Erlang? I guess this should not lead to a crash.
>>>> c) How do you avoid this problem on systems you do not control yourself?
>>>>
>>>>
>>>> Problem #1:
>>>> ###########
>>>>
>>>> Just compile the following code and run it with sasl enabled and the
>>>> following command:
>>>>   tcp_test:test(1000).
>>>> and - yes - without anything listening on port 2222. And sometimes you
>>>> have to try two times!
>>>>
>>>> -------------------------- start: tcp_test.erl
>>>> --------------------------
>>>> -module(tcp_test).
>>>>
>>>> -export([test/1, test_con/0]).
>>>>
>>>> -define(DEF_PORT, 2222).
>>>> -define(DEF_IP, {127,0,0,1}).
>>>>
>>>> test(0) -> ok;
>>>> test(HowManyProcs) ->
>>>>   spawn(?MODULE, test_con, []),
>>>>   test(HowManyProcs-1).
>>>>
>>>> test_con() ->
>>>>   {ok,S} = gen_tcp:connect(?DEF_IP, ?DEF_PORT,[]),
>>>>   gen_tcp:send(S,<<0,5,65,66,67,68,69>>),
>>>>   receive
>>>>     {tcp_closed, _Socket} -> ok;
>>>>     _Msg -> gen_tcp:close(S)
>>>>   after 500 ->
>>>>     gen_tcp:close(S)
>>>>   end.
>>>> -------------------------- end: tcp_test.erl --------------------------
>>>>
>>>> It just spawns a bunch of processes all trying to connect to a currently
>>>> closed port and sending some garbage there. This is what happens:
>>>>
>>>> -------------------------- start: log tcp_test.erl
>>>> --------------------------
>>>> =ERROR REPORT==== 12-Sep-2008::15:28:47 ===
>>>> Error in process <0.41.0> with exit value:
>>>> {{badmatch,{error,econnrefused}},[{tcp_test,test_con,0}]}
>>>>
>>>> [... a couple of them but usually between 1 and 20.]
>>>>
>>>> =ERROR REPORT==== 12-Sep-2008::15:28:47 ===
>>>> Error in process <0.103.0> with exit value:
>>>> {{badmatch,{error,econnrefused}},[{tcp_test,test_con,0}]}
>>>>
>>>>
>>>> Crash dump was written to: erl_crash.dump
>>>> Inconsistent, why isnt io reported?
>>>>
>>>> Abnormal termination
>>>> -------------------------- end: log tcp_test.erl
>>>> --------------------------
>>>>
>>>> It might have something to do with the socket error enfile 'file table
>>>> overflow'  but I guess it should not simply crash the emulator!?
>>>> Searching Google for 'Inconsistent, why isnt io reported?' gives just
>>>> one hit, in Erlang's source code.
>>>> I can provide the crash dump if needed. Just did not want to spam the
>>>> whole list with big attachments.
>>>> Spawning only 500 processes (tcp_test:test(500).) usually leads to a
>>>> crash, spawning only 200 seems to work.
>>>>
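Since 200 concurrent attempts seemed to work while 500 usually did not,
one possible workaround is to cap the number of workers that are alive at
the same time. The following is a minimal sketch under that assumption.
The module name tcp_test_batched and the MaxActive parameter are made up
for illustration, and it relies on tcp_test:test_con/0 from the listing
above. It limits concurrency only and does not address the crash itself.

```erlang
-module(tcp_test_batched).   % hypothetical module name

-export([test/2]).

%% Run Total connection attempts with at most MaxActive worker
%% processes alive at any time.
test(Total, MaxActive) when Total >= 0, MaxActive > 0 ->
    loop(Total, 0, MaxActive).

%% Nothing left to spawn and no worker still running: done.
loop(0, 0, _Max) ->
    ok;
%% Below the limit: start another monitored worker.
loop(Left, Active, Max) when Left > 0, Active < Max ->
    {_Pid, _Ref} = spawn_monitor(tcp_test, test_con, []),
    loop(Left - 1, Active + 1, Max);
%% Limit reached, or everything has been spawned: wait for one
%% worker to terminate (normally or not) before going on.
loop(Left, Active, Max) ->
    receive
        {'DOWN', _Ref, process, _Pid, _Reason} ->
            loop(Left, Active - 1, Max)
    end.
```

For example, tcp_test_batched:test(1000, 200) makes the same 1000 attempts
as tcp_test:test(1000) but never has more than 200 of them in flight, and
it returns ok only after the last worker has terminated.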
>>>>
>>>> Problem #2:
>>>> ###########
>>>>
>>>> Now let's try the same with a server answering to port 2222: Just take
>>>> the code from the trapexit tutorial 'Building a Non-blocking TCP server
>>>> using OTP principles'
>>>> http://trapexit.org/Building_a_Non-blocking_TCP_server_using_OTP_principles
>>>> Start it first and then our test module in a different erlang node as
>>>> described above. Now, usually the client survives (have seen crashes as
>>>> well!) and the server crashes in a similar way. Sometimes it survives and in
>>>> very rare cases you will see the following logs in the erlang server
>>>> instance:
>>>>
>>>> -------------------------- start: log server --------------------------
>>>> =ERROR REPORT==== 12-Sep-2008::12:58:56 ===
>>>> File operation error: system_limit. Function: get_cwd. Process:
>>>> code_server.
>>>>
>>>> =ERROR REPORT==== 12-Sep-2008::12:58:56 ===
>>>> Error in async accept: {async_accept,"file table overflow"}.
>>>>
>>>> =ERROR REPORT==== 12-Sep-2008::12:58:56 ===
>>>> ** Generic server tcp_listener terminating
>>>> ** Last message in was {inet_async,#Port<0.109>,1019,{ok,#Port<0.2141>}}
>>>> ** When Server state == {state,#Port<0.109>,1019,tcp_echo_fsm}
>>>> ** Reason for termination ==
>>>> ** {async_accept,"file table overflow"}
>>>>
>>>> [...]
>>>> -------------------------- end: log server --------------------------
>>>>
>>>> Can anyone help? Thank you!
>>>>
>>>> Regards,
>>>> Michael
>>>>
>>>> _______________________________________________
>>>> erlang-questions mailing list
>>>> erlang-questions@REDACTED
>>>> http://www.erlang.org/mailman/listinfo/erlang-questions
>>>>
>>>
>>>
>>
>