[erlang-questions] Erlang crash gen_tcp related (probably only under Windows)
Michael Regen
michael.regen@REDACTED
Fri Sep 12 19:39:19 CEST 2008
Hi Edwin,
It is possible that both issues have a similar source but I do not see many
reasons why there must be a common source.
I was running my tests on a 32bit single core Windows XP SP2 system just by
running
werl.exe -boot start_sasl
or
werl.exe
and did nothing fancy. My R12B-3 version is self-compiled; R12B-4 is the
stock build from erlang.org.
Client and server tests were done by starting two different instances of
werl.
Furthermore my tcp_test:test does not care whether results from
gen_tcp:connect are correct or not. It just assumes {ok, Socket} and crashes
the process if otherwise. Of course it was a surprise that under some
circumstances the whole emulator crashes.
By the way, the crash dump slogan is unspectacular: 'Slogan: Inconsistent,
why isnt io reported?'
UPDATE: I have some more observations which puzzle me even more:
Just did some more of the same tests but this time by starting:
erl.exe
application:start(tcp_server).
and
erl.exe
tcp_test:test(1000).
There seems to be a difference between erl.exe and werl.exe.
This time results are pretty different:
Now it is much harder to crash the emulator. It takes significantly more
processes / tries until something bad happens:
client only (tcp_test:test(5000)) crashes eventually in the same way but
Windows' cmd.exe now follows with a:
The exception unknown software exception (0x40000015) occured in the
application at location 0x008fff86
after the 'Crash dump was written to: erl_crash.dump / Inconsistent, why
isnt io reported?' message and the crash dump file.
The exception seems to always occur at the same location.
A lot more error messages are printed now (as expected) until the crash.
Besides the {{badmatch,{error,econnrefused}},[{tcp_test,test_con,0}]} I can
now also watch lots of
{{badmatch,{error,eaddrinuse}},[{tcp_test,test_con,0}]}
and
{{badmatch,{error,system_limit}},[{tcp_test,test_con,0}]}
errors.
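To see how often each of these errors occurs without reading through the individual error reports, the attempts could be tallied by result. The following is only a sketch under assumptions: the module name tcp_tally and the function run/3 are hypothetical, and the 500 ms connect timeout and 10 s collection timeout are arbitrary choices.

```erlang
-module(tcp_tally).
-export([run/3]).

%% Hypothetical helper: spawn N connect attempts against Ip:Port and
%% return a sorted list of {Result, Count} pairs, where Result is ok
%% or the error reason returned by gen_tcp:connect/4.
run(N, Ip, Port) ->
    Parent = self(),
    Ref = make_ref(),
    lists:foreach(
      fun(_) ->
              spawn(fun() -> Parent ! {Ref, attempt(Ip, Port)} end)
      end,
      lists:seq(1, N)),
    collect(N, Ref, dict:new()).

%% One attempt: never crashes on a failed connect, just reports it.
attempt(Ip, Port) ->
    case gen_tcp:connect(Ip, Port, [], 500) of
        {ok, S}         -> gen_tcp:close(S), ok;
        {error, Reason} -> Reason
    end.

%% Gather one message per attempt and count identical results.
collect(0, _Ref, Counts) ->
    lists:sort(dict:to_list(Counts));
collect(N, Ref, Counts) ->
    receive
        {Ref, Result} ->
            collect(N - 1, Ref, dict:update_counter(Result, 1, Counts))
    after 10000 ->
            %% Give up waiting and return what has arrived so far.
            lists:sort(dict:to_list(Counts))
    end.
```

tcp_tally:run(5000, {127,0,0,1}, 2222) would then return a list with one {Reason, Count} pair for each kind of result seen, provided the emulator survives the run.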
The good news: During tests together with the server backend I was not
able to crash the server. But I am not convinced that erl.exe solves
everything server side.
Regards,
Michael
On Fri, Sep 12, 2008 at 6:32 PM, Edwin Fine <emofine@REDACTED> wrote:
> Please be aware that I reported a bug a while ago on erlang-bugs, where
> attempting to connect to a socket that is not being listened on will
> sometimes return an actual success return, but subsequent operations will
> fail. Here is an excerpt from that bug report.
>
> When calling gen_tcp:connect/3 or /4 on a host/port that does not have a
> running program listening on it, at random intervals gen_tcp:connect returns
> an {ok, Sock} instead of the expected {error, econnrefused}. If
> gen_tcp:recv(Sock, 0) is called immediately using the socket just returned,
> it returns an {error, econnrefused}. Connection options used were [binary,
> {packet, raw}, {active, false}]. It should be noted that the gen_tcp:connect
> succeeds when there is a program listening on that same host/port, so it's
> unlikely to be a firewall issue.
>
> See http://www.erlang.org/pipermail/erlang-bugs/2008-August/000931.html
>
> This bug is still present in R12B-4. Could this be affecting you?
>
> Regards,
> Edwin Fine
>
> 2008/9/12 Michael Regen <michael.regen@REDACTED>
>
>> Hi,
>>
>> I have run into a series of problems with gen_tcp, all eventually resulting
>> in crashes. I tested this under Windows XP and with R12B-3 as well as R12B-4.
>> Under Linux it seems to work but I am not perfectly sure since the crash
>> happens sporadically and seems to be timing related.
>>
>> The two problems below lead me to a couple of questions:
>> a) What is the real cause? Is it the socket error enfile? Do both problems
>> have the same root cause?
>> b) Is there a bug in Erlang? I guess this should not lead to a crash.
>> c) How do you avoid this problem on systems you do not control yourself?
>>
>>
>> Problem #1:
>> ###########
>>
>> Just compile the following code and run it with sasl enabled and the
>> following command:
>> tcp_test:test(1000).
>> and - yes - without anything listening on port 2222. And sometimes you
>> have to try two times!
>>
>> -------------------------- start: tcp_test.erl --------------------------
>> -module(tcp_test).
>>
>> -export([test/1, test_con/0]).
>>
>> -define(DEF_PORT, 2222).
>> -define(DEF_IP, {127,0,0,1}).
>>
>> test(0) -> ok;
>> test(HowManyProcs) ->
>>     spawn(?MODULE, test_con, []),
>>     test(HowManyProcs-1).
>>
>> test_con() ->
>>     {ok,S} = gen_tcp:connect(?DEF_IP, ?DEF_PORT,[]),
>>     gen_tcp:send(S,<<0,5,65,66,67,68,69>>),
>>     receive
>>         {tcp_closed, _Socket} -> ok;
>>         _Msg -> gen_tcp:close(S)
>>     after 500 ->
>>         gen_tcp:close(S)
>>     end.
>> -------------------------- end: tcp_test.erl --------------------------
>>
>> It just spawns a bunch of processes all trying to connect to a currently
>> closed port and sending some garbage there. This is what happens:
>>
>> -------------------------- start: log tcp_test.erl --------------------------
>> =ERROR REPORT==== 12-Sep-2008::15:28:47 ===
>> Error in process <0.41.0> with exit value:
>> {{badmatch,{error,econnrefused}},[{tcp_test,test_con,0}]}
>>
>> [... a couple of them but usually between 1 and 20.]
>>
>> =ERROR REPORT==== 12-Sep-2008::15:28:47 ===
>> Error in process <0.103.0> with exit value:
>> {{badmatch,{error,econnrefused}},[{tcp_test,test_con,0}]}
>>
>>
>> Crash dump was written to: erl_crash.dump
>> Inconsistent, why isnt io reported?
>>
>> Abnormal termination
>> -------------------------- end: log tcp_test.erl --------------------------
>>
>> It might have something to do with the socket error enfile 'file table
>> overflow' but I guess it should not simply crash the emulator!?
>> Searching google for 'Inconsistent, why isnt io reported?' just gives one
>> hit to Erlang's source code.
>> I can provide the crash dump if needed. Just did not want to spam the
>> whole list with big attachments.
>> Spawning only 500 processes (tcp_test:test(500).) usually leads to a
>> crash, spawning only 200 seems to work.
>>
>>
>> Problem #2:
>> ###########
>>
>> Now let's try the same with a server answering to port 2222: Just take the
>> code from the trapexit tutorial 'Building a Non-blocking TCP server using
>> OTP principles'
>> http://trapexit.org/Building_a_Non-blocking_TCP_server_using_OTP_principles
>> Start it first and then our test module in a different erlang node as
>> described above. Now, usually the client survives (have seen crashes as
>> well!) and the server crashes in a similar way. Sometimes it survives and in
>> very rare cases you will see the following logs in the erlang server
>> instance:
>>
>> -------------------------- start: log server --------------------------
>> =ERROR REPORT==== 12-Sep-2008::12:58:56 ===
>> File operation error: system_limit. Function: get_cwd. Process:
>> code_server.
>>
>> =ERROR REPORT==== 12-Sep-2008::12:58:56 ===
>> Error in async accept: {async_accept,"file table overflow"}.
>>
>> =ERROR REPORT==== 12-Sep-2008::12:58:56 ===
>> ** Generic server tcp_listener terminating
>> ** Last message in was {inet_async,#Port<0.109>,1019,{ok,#Port<0.2141>}}
>> ** When Server state == {state,#Port<0.109>,1019,tcp_echo_fsm}
>> ** Reason for termination ==
>> ** {async_accept,"file table overflow"}
>>
>> [...]
>> -------------------------- end: log server --------------------------
>>
>> Can anyone help? Thank you!
>>
>> Regards,
>> Michael
>>
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://www.erlang.org/mailman/listinfo/erlang-questions
>>
>
>