[erlang-questions] Erlang crash gen_tcp related (probably only under Windows)

Edwin Fine erlang-questions_efine@REDACTED
Fri Sep 12 19:54:46 CEST 2008


Michael,

I've always felt that the Windows version of Erlang is a bit flaky. Then
again, I think Windows itself is more than a bit flaky, so maybe it's not
Erlang's fault ;)
I wonder if running on SP4 would improve things?
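
For the spurious {ok, Sock} result described in the bug report quoted below, one possible guard is to poll the new socket once before trusting it. This is only a sketch under assumptions: the module name tcp_probe and the function connect_checked/3 are hypothetical, and it relies on the reported behaviour that gen_tcp:recv on such a socket returns {error, econnrefused}. A zero timeout can still race with the error arriving, so this narrows the window rather than closing it.

```erlang
-module(tcp_probe).
-export([connect_checked/3]).

%% Hypothetical helper: connect in passive mode (the options from the bug
%% report), then poll the socket once before handing it to the caller.
connect_checked(Host, Port, TimeoutMs) ->
    Opts = [binary, {packet, raw}, {active, false}],
    case gen_tcp:connect(Host, Port, Opts, TimeoutMs) of
        {ok, Sock}       -> probe(Sock);
        {error, _} = Err -> Err
    end.

%% A recv with timeout 0 returns at once. {error, timeout} means nothing is
%% pending and no error has been reported yet.
probe(Sock) ->
    case gen_tcp:recv(Sock, 0, 0) of
        {error, timeout} ->
            {ok, Sock, <<>>};
        {ok, Data} ->
            %% The peer already sent something; return it so it is not lost.
            {ok, Sock, Data};
        {error, Reason} ->
            _ = gen_tcp:close(Sock),
            {error, Reason}
    end.
```

A caller would match on {ok, Sock, Pending} and treat any {error, Reason} exactly like a failed connect.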

On Fri, Sep 12, 2008 at 1:39 PM, Michael Regen <michael.regen@REDACTED> wrote:

> Hi Edwin,
>
> It is possible that both issues have a similar source but I do not see many
> reasons why there must be a common source.
> I was running my tests on a 32bit single core Windows XP SP2 system just by
> running
>   werl.exe -boot start_sasl
> or
>   werl.exe
>
> and did nothing fancy. My R12B-3 version is self compiled, R12B-4 is out of
> the erlang.org box.
> Client and server tests were done by starting two different instances of
> werl.
> Furthermore my tcp_test:test does not care whether results from
> gen_tcp:connect are correct or not. It just assumes {ok, Socket} and crashes
> the process if otherwise. Of course it was a surprise that under some
> circumstances the whole emulator crashes.
>
> By the way, the crash dump slogan is unspectacular: 'Slogan: Inconsistent,
> why isnt io reported?'
>
> UPDATE: I got some more observations which puzzle me even more:
>
> Just did some more of the same tests but this time by starting:
>   erl.exe
>   application:start(tcp_server).
> and
>   erl.exe
>   tcp_test:test(1000).
>
> There seems to be a difference between erl.exe and werl.exe.
>
> This time results are pretty different:
> Now it is much harder to crash the emulator. It takes significantly more
> processes / tries until something bad happens:
>
> The client alone (tcp_test:test(5000)) eventually crashes in the same way, but
> Windows' cmd.exe now follows up with:
>   The exception unknown software exception (0x40000015) occurred in the
> application at location 0x008fff86
>
> after the 'Crash dump was written to: erl_crash.dump / Inconsistent, why
> isnt io reported?' message and the crash dump file.
>
> The exception seems to always occur at the same location.
>
> A lot more error messages are printed now (as expected) until the crash.
> Besides the {{badmatch,{error,econnrefused}},[{tcp_test,test_con,0}]} I can
> now also watch lots of
>   {{badmatch,{error,eaddrinuse}},[{tcp_test,test_con,0}]}
> and
>   {{badmatch,{error,system_limit}},[{tcp_test,test_con,0}]}
> errors.
>
> The good news: During tests together with the server backend I was not
> able to crash the server. But I am not convinced that erl.exe solves
> everything server side.
>
> Regards,
> Michael
>
>
> On Fri, Sep 12, 2008 at 6:32 PM, Edwin Fine <emofine@REDACTED> wrote:
>
>> Please be aware that I reported a bug a while ago on erlang-bugs, where
>> attempting to connect to a socket that is not being listened on will
>> sometimes return an actual success return, but subsequent operations will
>> fail. Here is an excerpt from that bug report.
>>
>> When calling gen_tcp:connect/3 or /4 on a host/port that does not have a
>> running program listening on it, at random intervals gen_tcp:connect returns
>> an {ok, Sock} instead of the expected {error, econnrefused}. If
>> gen_tcp:recv(Sock, 0) is called immediately using the socket just returned,
>> it returns an {error, econnrefused}. Connection options used were [binary,
>> {packet, raw}, {active, false}]. It should be noted that the gen_tcp:connect
>> succeeds when there is a program listening on that same host/port, so it's
>> unlikely to be a firewall issue.
>>
>> See http://www.erlang.org/pipermail/erlang-bugs/2008-August/000931.html
>>
>> This bug is still present in R12B-4. Could this be affecting you?
>>
>> Regards,
>> Edwin Fine
>>
>> 2008/9/12 Michael Regen <michael.regen@REDACTED>
>>
>>> Hi,
>>>
>>> I have run into a series of problems with gen_tcp, all eventually resulting in
>>> crashes. I tested this under Windows XP and with R12B-3 as well as R12B-4.
>>> Under Linux it seems to work but I am not perfectly sure since the crash
>>> happens sporadically and seems to be timing related.
>>>
>>> The two problems below lead me to a couple of questions:
>>> a) What is the real cause? Is it the socket error enfile? Do both
>>> problems have the same root cause?
>>> b) Is there a bug in Erlang? I guess this should not lead to a crash.
>>> c) How do you avoid this problem on systems you do not control yourself?
>>>
>>>
>>> Problem #1:
>>> ###########
>>>
>>> Just compile the following code and run it with sasl enabled and the
>>> following command:
>>>   tcp_test:test(1000).
>>> and - yes - without anything listening on port 2222. And sometimes you
>>> have to try twice!
>>>
>>> -------------------------- start: tcp_test.erl --------------------------
>>> -module(tcp_test).
>>>
>>> -export([test/1, test_con/0]).
>>>
>>> -define(DEF_PORT, 2222).
>>> -define(DEF_IP, {127,0,0,1}).
>>>
>>> %% Spawn HowManyProcs processes, each making one connection attempt.
>>> test(0) -> ok;
>>> test(HowManyProcs) ->
>>>   spawn(?MODULE, test_con, []),
>>>   test(HowManyProcs-1).
>>>
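>>> %% Deliberately asserts {ok,S}: any {error,Reason} from connect raises
>>> %% badmatch and exits this process.
>>> test_con() ->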
>>>   {ok,S} = gen_tcp:connect(?DEF_IP, ?DEF_PORT,[]),
>>>   gen_tcp:send(S,<<0,5,65,66,67,68,69>>),
>>>   receive
>>>     {tcp_closed, _Socket} -> ok;
>>>     _Msg -> gen_tcp:close(S)
>>>   after 500 ->
>>>     gen_tcp:close(S)
>>>   end.
>>> -------------------------- end: tcp_test.erl --------------------------
>>>
>>> It just spawns a bunch of processes all trying to connect to a currently
>>> closed port and sending some garbage there. This is what happens:
>>>
>>> -------------------------- start: log tcp_test.erl --------------------------
>>> =ERROR REPORT==== 12-Sep-2008::15:28:47 ===
>>> Error in process <0.41.0> with exit value:
>>> {{badmatch,{error,econnrefused}},[{tcp_test,test_con,0}]}
>>>
>>> [... a couple of them but usually between 1 and 20.]
>>>
>>> =ERROR REPORT==== 12-Sep-2008::15:28:47 ===
>>> Error in process <0.103.0> with exit value:
>>> {{badmatch,{error,econnrefused}},[{tcp_test,test_con,0}]}
>>>
>>>
>>> Crash dump was written to: erl_crash.dump
>>> Inconsistent, why isnt io reported?
>>>
>>> Abnormal termination
>>> -------------------------- end: log tcp_test.erl --------------------------
>>>
>>> It might have something to do with the socket error enfile 'file table
>>> overflow', but I guess it should not simply crash the emulator!?
>>> Searching Google for 'Inconsistent, why isnt io reported?' gives just one
>>> hit, in Erlang's source code.
>>> I can provide the crash dump if needed. Just did not want to spam the
>>> whole list with big attachments.
>>> Spawning only 500 processes (tcp_test:test(500).) usually leads to a
>>> crash, spawning only 200 seems to work.
>>>
>>>
>>> Problem #2:
>>> ###########
>>>
>>> Now let's try the same with a server answering on port 2222: Just take
>>> the code from the trapexit tutorial 'Building a Non-blocking TCP server
>>> using OTP principles'
>>> http://trapexit.org/Building_a_Non-blocking_TCP_server_using_OTP_principles
>>> Start it first and then our test module in a different erlang node as
>>> described above. Now, usually the client survives (have seen crashes as
>>> well!) and the server crashes in a similar way. Sometimes it survives and in
>>> very rare cases you will see the following logs in the erlang server
>>> instance:
>>>
>>> -------------------------- start: log server --------------------------
>>> =ERROR REPORT==== 12-Sep-2008::12:58:56 ===
>>> File operation error: system_limit. Function: get_cwd. Process:
>>> code_server.
>>>
>>> =ERROR REPORT==== 12-Sep-2008::12:58:56 ===
>>> Error in async accept: {async_accept,"file table overflow"}.
>>>
>>> =ERROR REPORT==== 12-Sep-2008::12:58:56 ===
>>> ** Generic server tcp_listener terminating
>>> ** Last message in was {inet_async,#Port<0.109>,1019,{ok,#Port<0.2141>}}
>>> ** When Server state == {state,#Port<0.109>,1019,tcp_echo_fsm}
>>> ** Reason for termination ==
>>> ** {async_accept,"file table overflow"}
>>>
>>> [...]
>>> -------------------------- end: log server --------------------------
>>>
>>> Can anyone help? Thank you!
>>>
>>> Regards,
>>> Michael
>>>
>>> _______________________________________________
>>> erlang-questions mailing list
>>> erlang-questions@REDACTED
>>> http://www.erlang.org/mailman/listinfo/erlang-questions
>>>
>>
>>
>
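
As a partial answer to question (c) in the original post quoted above, here is a minimal sketch of a client that handles connect errors instead of asserting {ok, S}. The module name tcp_test_safe and the helper classify/1 are hypothetical, and the grouping of error reasons is an assumption based only on the errors reported in this thread (econnrefused, eaddrinuse, system_limit, enfile). It does not address the emulator crash itself, whose cause is still open here; it only keeps individual processes from exiting with badmatch.

```erlang
-module(tcp_test_safe).
-export([test/1, test_con/0, classify/1]).

-define(DEF_PORT, 2222).
-define(DEF_IP, {127,0,0,1}).

%% Spawn HowManyProcs connection attempts, as in the original test/1.
test(0) -> ok;
test(HowManyProcs) when is_integer(HowManyProcs), HowManyProcs > 0 ->
    spawn(?MODULE, test_con, []),
    test(HowManyProcs - 1).

%% Same traffic as the original test_con/0, but a failed connect is
%% returned as a value instead of raising badmatch.
test_con() ->
    case gen_tcp:connect(?DEF_IP, ?DEF_PORT, [], 5000) of
        {ok, S} ->
            _ = gen_tcp:send(S, <<0,5,65,66,67,68,69>>),
            receive
                {tcp_closed, S} -> ok;
                _Msg            -> gen_tcp:close(S)
            after 500 ->
                gen_tcp:close(S)
            end;
        {error, Reason} ->
            {error, classify(Reason)}
    end.

%% Assumed grouping: everything except a plain refusal is treated as a sign
%% that the local node or OS is running out of sockets or ports.
classify(econnrefused) -> refused;
classify(eaddrinuse)   -> resource_exhausted;
classify(system_limit) -> resource_exhausted;
classify(enfile)       -> resource_exhausted;
classify(emfile)       -> resource_exhausted;
classify(Other)        -> {other, Other}.
```

A caller that sees resource_exhausted could back off and retry later rather than spawning more connection attempts at once.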