[erlang-bugs] Erlang emulator crash, gen_tcp related (probably only under Windows)

Michael Regen michael.regen@REDACTED
Sat Sep 13 18:30:02 CEST 2008


Hi,

I am running into Erlang emulator crashes when doing basic gen_tcp operations.
My code crashes with the message:
  Crash dump was written to: erl_crash.dump
  Inconsistent, why isnt io reported?
  Abnormal termination
with no significant error message beforehand.

The problem occurs on Windows XP. I am not sure whether Linux is affected
as well, but short tests showed no problems there.

To reproduce the problem:

1) tcp_test.erl
This is a simple gen_tcp client. It spawns processes that each connect to a
port, send a few bytes, wait for a reply and close the socket:

-------------------------- start: tcp_test.erl --------------------------
-module(tcp_test).

-export([test/1, test_con/0]).

-define(DEF_PORT, 2222).
-define(DEF_IP, {127,0,0,1}).

test(0) -> ok;
test(HowManyProcs) ->
  spawn(?MODULE, test_con, []),
  test(HowManyProcs-1).

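test_con() ->
  %% A failed connect is not handled here and shows up as a badmatch.
  {ok,S} = gen_tcp:connect(?DEF_IP, ?DEF_PORT,[]),
  gen_tcp:send(S,<<0,5,65,66,67,68,69>>),
  %% The socket is in the default active mode, so data arrives as messages.
  receive
    {tcp_closed, _Socket} -> ok;
    _Msg -> gen_tcp:close(S)
  after 500 ->
    gen_tcp:close(S)
  end.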
-------------------------- end: tcp_test.erl --------------------------


2) tcp_server_app
Please just take the code from the trapexit tutorial 'Building a
Non-blocking TCP server using OTP principles'
http://trapexit.org/Building_a_Non-blocking_TCP_server_using_OTP_principles

There is a size limit on the erlang-bugs mailing list which is why I do not
send the whole code + crash dump as attachment. Please give me a note if you
want it!

To start the client:
  werl.exe
  tcp_test:test(1000).
  % 1000 is the number of processes to start. Please try higher
  % numbers as well!

To start the server:
  werl.exe
  application:start(tcp_server).

I tested with running the client and server in different Erlang nodes.

You should be able to crash the emulator by:
a) only running the client without anything listening on the port 2222
b) running the client together with the server (client crashes)
c) running the client together with the server (server crashes)
d) running the client together with the server (client + server crash at the
same time)

It might take several attempts, and in some cases the client has to be started
with as many as 10,000 processes (tcp_test:test(10000)). Usually, though, one
or very few attempts are enough.

An interesting observation:
If you run the tests under erl.exe (instead of werl.exe) it takes
significantly more processes/tries to reach the crash.
Furthermore, erl.exe (or is it cmd.exe?) crashes with:
  The exception unknown software exception (0x40000015) occured in the
  application at location 0x008fff86
The location always seems to be the same.

The problem might be timing related.

Tests were done on R12B-3 and R12B-4 on Windows XP SP2 and SP3 systems.
Hardware: Athlon XP64 in 32-bit mode with 1 GB RAM; Centrino notebook with
1 GB RAM; Intel E6600 dual-core in 32-bit mode with 4 GB RAM.
So the problem appears to occur on both SMP and non-SMP systems.

During various tests you might see the following errors in the client node:
{{badmatch,{error,econnrefused}}    % this is expected
{{badmatch,{error,eaddrinuse}}
{{badmatch,{error,system_limit}}
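
To see how often each of these errors occurs, instead of reading individual
badmatch reports, a variant of the client along the following lines could be
used. This is only a sketch and not the code the tests above were run with.
The module name tcp_test_tally and the functions run/1, run/2 and tally/1 are
made up for illustration, and the 5 second collection timeout is an arbitrary
choice. If the emulator itself crashes, nothing is reported, of course.

```erlang
-module(tcp_test_tally).

-export([run/1, run/2, tally/1]).

-define(DEF_PORT, 2222).
-define(DEF_IP, {127,0,0,1}).

%% Spawn N clients and return a list of {Outcome, Count} pairs, where
%% Outcome is 'connected' or the error reason from gen_tcp:connect/3.
run(N) -> run(N, ?DEF_PORT).

run(N, Port) ->
  Parent = self(),
  Ref = make_ref(),
  spawn_clients(N, Parent, Ref, Port),
  tally(collect(N, Ref, [])).

spawn_clients(0, _Parent, _Ref, _Port) -> ok;
spawn_clients(N, Parent, Ref, Port) ->
  spawn(fun() -> Parent ! {Ref, attempt(Port)} end),
  spawn_clients(N-1, Parent, Ref, Port).

%% Same traffic as tcp_test:test_con/0, but connect errors are returned
%% instead of crashing the process with a badmatch.
attempt(Port) ->
  case gen_tcp:connect(?DEF_IP, Port, []) of
    {ok, S} ->
      gen_tcp:send(S, <<0,5,65,66,67,68,69>>),
      receive
        _Msg -> ok
      after 500 -> ok
      end,
      gen_tcp:close(S),
      connected;
    {error, Reason} ->
      Reason
  end.

%% Gather up to N results; give up if nothing arrives for 5 seconds.
collect(0, _Ref, Acc) -> Acc;
collect(N, Ref, Acc) ->
  receive
    {Ref, Outcome} -> collect(N-1, Ref, [Outcome|Acc])
  after 5000 -> Acc
  end.

%% Count equal outcomes by sorting and merging runs.
tally(Outcomes) -> count(lists:sort(Outcomes), []).

count([], Acc) -> lists:reverse(Acc);
count([X|T], [{X,N}|Acc]) -> count(T, [{X,N+1}|Acc]);
count([X|T], Acc) -> count(T, [{X,1}|Acc]).
```

Calling tcp_test_tally:run(1000). in the client node then returns one
{Outcome, Count} pair per distinct result.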

If you start the server with sasl enabled you might under rare circumstances
see the following error messages as well:
-------------------------- start: log server --------------------------
=ERROR REPORT==== 12-Sep-2008::12:58:56 ===
File operation error: system_limit. Function: get_cwd. Process: code_server.

=ERROR REPORT==== 12-Sep-2008::12:58:56 ===
Error in async accept: {async_accept,"file table overflow"}.

=ERROR REPORT==== 12-Sep-2008::12:58:56 ===
** Generic server tcp_listener terminating
** Last message in was {inet_async,#Port<0.109>,1019,{ok,#Port<0.2141>}}
** When Server state == {state,#Port<0.109>,1019,tcp_echo_fsm}
** Reason for termination ==
** {async_accept,"file table overflow"}

[...]
-------------------------- end: log server --------------------------
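
Since system_limit and "file table overflow" suggest that some table inside
the node fills up, it may help to watch how many ports a node has open while
the test runs. The following is a minimal sketch under that assumption. The
module name port_watch and its functions are hypothetical; it only uses
erlang:ports/0 and erlang:port_info/2 and assumes that TCP sockets show up
under the driver name "tcp_inet".

```erlang
-module(port_watch).

-export([sample/0, watch/2]).

%% Return {AllPorts, TcpPorts} for the current node.
sample() ->
  Ports = erlang:ports(),
  %% port_info/2 returns undefined for a port that closed in the meantime,
  %% which simply fails the comparison and is not counted.
  Tcp = [P || P <- Ports,
              erlang:port_info(P, name) =:= {name, "tcp_inet"}],
  {length(Ports), length(Tcp)}.

%% Print a sample Times times, IntervalMs milliseconds apart.
watch(0, _IntervalMs) -> ok;
watch(Times, IntervalMs) ->
  {All, Tcp} = sample(),
  io:format("ports: ~p (tcp_inet: ~p)~n", [All, Tcp]),
  timer:sleep(IntervalMs),
  watch(Times-1, IntervalMs).
```

For example, spawn(port_watch, watch, [20, 250]). started in the server or
client node just before the test prints the counts for about five seconds.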

Running out of ephemeral ports (all user ports in TIME_WAIT) should not be
the problem, since the crash also occurs after only 2000 processes with the
registry value
  HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\MaxUserPort
set to 60000.

Local firewall did not affect the outcome (turned on/off).

There is a thread on erlang-questions which might in the future contain
additional information:
http://erlang.org/pipermail/erlang-questions/2008-September/038118.html

Thank you!

Regards,
Michael