[erlang-questions] Erlang crash gen_tcp related (probably only under Windows)

Michael Regen michael.regen@REDACTED
Fri Sep 12 19:39:19 CEST 2008


Hi Edwin,

It is possible that both issues share a source, but I do not see much reason
why they must.
I ran my tests on a 32-bit single-core Windows XP SP2 system, simply by
running
  werl.exe -boot start_sasl
or
  werl.exe

and did nothing fancy. My R12B-3 build is self-compiled; R12B-4 is the
standard build from erlang.org.
Client and server tests were done by starting two separate instances of
werl.
Furthermore, tcp_test:test does not care whether gen_tcp:connect succeeds.
It simply assumes {ok, Socket} and lets the process crash otherwise. So it
was a real surprise that, under some circumstances, the whole emulator
crashes.
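For comparison, here is a minimal sketch of a test variant that reports a failed connect instead of crashing on badmatch, and that never has more than BatchSize connection attempts in flight at once. It assumes the goal is only to keep the number of simultaneously open sockets bounded (500 concurrent processes usually crashed, 200 seemed fine). The module tcp_test_batched and the functions test/2 and run_in_batches/3 are hypothetical, not from the original test, and nothing here has been shown to prevent the emulator crash itself:

```erlang
%% Hypothetical variant of tcp_test (not part of the original report).
-module(tcp_test_batched).

-export([test/2, test_con/0, run_in_batches/3]).

-define(DEF_PORT, 2222).
-define(DEF_IP, {127,0,0,1}).

%% Run Total connection attempts, at most BatchSize at a time.
test(Total, BatchSize) ->
    run_in_batches(fun test_con/0, Total, BatchSize).

%% Like tcp_test:test_con/0, but an {error, Reason} from gen_tcp:connect/3
%% is printed and returned instead of crashing the process with badmatch.
test_con() ->
    case gen_tcp:connect(?DEF_IP, ?DEF_PORT, []) of
        {ok, S} ->
            gen_tcp:send(S, <<0,5,65,66,67,68,69>>),
            receive
                {tcp_closed, _Socket} -> ok;
                _Msg -> gen_tcp:close(S)
            after 500 ->
                gen_tcp:close(S)
            end;
        {error, Reason} ->
            io:format("~p: connect failed: ~p~n", [self(), Reason]),
            {error, Reason}
    end.

%% Spawn up to BatchSize monitored processes running Fun, wait for a
%% 'DOWN' message from every one of them, then start the next batch.
run_in_batches(_Fun, Total, _BatchSize) when Total =< 0 ->
    ok;
run_in_batches(Fun, Total, BatchSize) when BatchSize > 0 ->
    N = min(Total, BatchSize),
    Refs = [element(2, spawn_monitor(Fun)) || _ <- lists:seq(1, N)],
    lists:foreach(
      fun(Ref) ->
              receive {'DOWN', Ref, process, _Pid, _Reason} -> ok end
      end, Refs),
    run_in_batches(Fun, Total - N, BatchSize).
```

Usage, in the same two-node setup: tcp_test_batched:test(1000, 100). Batching trades throughput for a bounded socket count: each batch waits for its slowest process before the next batch starts.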

By the way, the crash dump slogan is unspectacular: 'Slogan: Inconsistent,
why isnt io reported?'

UPDATE: I have some further observations that puzzle me even more:

I repeated the same tests, but this time by starting:
  erl.exe
  application:start(tcp_server).
and
  erl.exe
  tcp_test:test(1000).

There seems to be a difference between erl.exe and werl.exe.

This time the results are quite different:
it is much harder to crash the emulator. It takes significantly more
processes and attempts before something goes wrong:

The client alone (tcp_test:test(5000)) eventually crashes in the same way,
but Windows' cmd.exe now follows up with:
  The exception unknown software exception (0x40000015) occurred in the application at location 0x008fff86

after the 'Crash dump was written to: erl_crash.dump / Inconsistent, why
isnt io reported?' message and the crash dump file.

The exception always seems to occur at the same location.

Many more error messages are now printed before the crash (as expected).
Besides {{badmatch,{error,econnrefused}},[{tcp_test,test_con,0}]}, I now
also see lots of
  {{badmatch,{error,eaddrinuse}},[{tcp_test,test_con,0}]}
and
  {{badmatch,{error,system_limit}},[{tcp_test,test_con,0}]}
errors.

The good news: in tests against the server backend I was not able to crash
the server. But I am not convinced that erl.exe fixes everything on the
server side.

Regards,
Michael

On Fri, Sep 12, 2008 at 6:32 PM, Edwin Fine <emofine@REDACTED> wrote:

> Please be aware that I reported a bug a while ago on erlang-bugs where
> attempting to connect to a port that nothing is listening on will
> sometimes return success, but subsequent operations on the socket will
> fail. Here is an excerpt from that bug report.
>
> When calling gen_tcp:connect/3 or /4 on a host/port that does not have a
> running program listening on it, at random intervals gen_tcp:connect returns
> {ok, Sock} instead of the expected {error, econnrefused}. If
> gen_tcp:recv(Sock, 0) is called immediately on the socket just returned,
> it returns {error, econnrefused}. Connection options used were [binary,
> {packet, raw}, {active, false}]. Note that gen_tcp:connect
> succeeds when there is a program listening on that same host/port, so it's
> unlikely to be a firewall issue.
>
> See http://www.erlang.org/pipermail/erlang-bugs/2008-August/000931.html
>
> This bug is still present in R12B-4. Could this be affecting you?
>
> Regards,
> Edwin Fine
>
> 2008/9/12 Michael Regen <michael.regen@REDACTED>
>
>> Hi,
>>
>> I have run into a series of problems with gen_tcp, all eventually
>> resulting in crashes. I tested this on Windows XP with both R12B-3 and
>> R12B-4. On Linux it seems to work, but I am not entirely sure, since the
>> crash happens sporadically and seems to be timing-related.
>>
>> The two problems below lead me to a couple of questions:
>> a) What is the real cause? Is it the socket error enfile? Do both problems
>> have the same root cause?
>> b) Is there a bug in Erlang? I would not expect this to crash the emulator.
>> c) How do you avoid this problem on systems you do not control yourself?
>>
>>
>> Problem #1:
>> ###########
>>
>> Compile the following code and run it with SASL enabled using the
>> following command:
>>   tcp_test:test(1000).
>> with, yes, nothing listening on port 2222. Sometimes you have to try
>> twice!
>>
>> -------------------------- start: tcp_test.erl --------------------------
>> -module(tcp_test).
>>
>> -export([test/1, test_con/0]).
>>
>> -define(DEF_PORT, 2222).
>> -define(DEF_IP, {127,0,0,1}).
>>
>> test(0) -> ok;
>> test(HowManyProcs) ->
>>   spawn(?MODULE, test_con, []),
>>   test(HowManyProcs-1).
>>
>> test_con() ->
>>   {ok,S} = gen_tcp:connect(?DEF_IP, ?DEF_PORT,[]),
>>   gen_tcp:send(S,<<0,5,65,66,67,68,69>>),
>>   receive
>>     {tcp_closed, _Socket} -> ok;
>>     _Msg -> gen_tcp:close(S)
>>   after 500 ->
>>     gen_tcp:close(S)
>>   end.
>> -------------------------- end: tcp_test.erl --------------------------
>>
>> It simply spawns a number of processes that all try to connect to a
>> closed port and send some garbage there. This is what happens:
>>
>> -------------------------- start: log tcp_test.erl --------------------------
>> =ERROR REPORT==== 12-Sep-2008::15:28:47 ===
>> Error in process <0.41.0> with exit value:
>> {{badmatch,{error,econnrefused}},[{tcp_test,test_con,0}]}
>>
>> [... several more of these, usually between 1 and 20.]
>>
>> =ERROR REPORT==== 12-Sep-2008::15:28:47 ===
>> Error in process <0.103.0> with exit value:
>> {{badmatch,{error,econnrefused}},[{tcp_test,test_con,0}]}
>>
>>
>> Crash dump was written to: erl_crash.dump
>> Inconsistent, why isnt io reported?
>>
>> Abnormal termination
>> -------------------------- end: log tcp_test.erl --------------------------
>>
>> It might have something to do with the socket error enfile ('file table
>> overflow'), but surely that should not simply crash the emulator!?
>> Searching Google for 'Inconsistent, why isnt io reported?' gives just one
>> hit: Erlang's own source code.
>> I can provide the crash dump if needed; I just did not want to spam the
>> whole list with big attachments.
>> Spawning only 500 processes (tcp_test:test(500).) usually leads to a
>> crash; spawning only 200 seems to work.
>>
>>
>> Problem #2:
>> ###########
>>
>> Now let's try the same with a server listening on port 2222. Just take the
>> code from the trapexit tutorial 'Building a Non-blocking TCP server using
>> OTP principles':
>> http://trapexit.org/Building_a_Non-blocking_TCP_server_using_OTP_principles
>> Start it first, and then start our test module in a different Erlang node
>> as described above. Now the client usually survives (though I have seen it
>> crash as well!) and the server crashes in a similar way. Sometimes the
>> server survives, and in very rare cases you will see the following logs in
>> the Erlang server instance:
>>
>> -------------------------- start: log server --------------------------
>> =ERROR REPORT==== 12-Sep-2008::12:58:56 ===
>> File operation error: system_limit. Function: get_cwd. Process: code_server.
>>
>> =ERROR REPORT==== 12-Sep-2008::12:58:56 ===
>> Error in async accept: {async_accept,"file table overflow"}.
>>
>> =ERROR REPORT==== 12-Sep-2008::12:58:56 ===
>> ** Generic server tcp_listener terminating
>> ** Last message in was {inet_async,#Port<0.109>,1019,{ok,#Port<0.2141>}}
>> ** When Server state == {state,#Port<0.109>,1019,tcp_echo_fsm}
>> ** Reason for termination ==
>> ** {async_accept,"file table overflow"}
>>
>> [...]
>> -------------------------- end: log server --------------------------
>>
>> Can anyone help? Thank you!
>>
>> Regards,
>> Michael
>>
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://www.erlang.org/mailman/listinfo/erlang-questions
>>
>
>