<div dir="ltr">Hi,<br><br>I am running into Erlang emulator crashes when doing basic gen_tcp operations. The emulator terminates with the message:<br> Crash dump was written to: erl_crash.dump<br> Inconsistent, why isnt io reported?<br>
Abnormal termination<br>with no significant error message beforehand.<br><br>The problem occurs on Windows XP. I am not sure whether Linux is affected as well, but short tests showed no problems there.<br><br>To reproduce the problem:<br>
<br>1) tcp_test.erl<br>This is a simple gen_tcp client that spawns processes, each of which connects to a port, sends a few bytes, waits for a reply and closes the socket:<br><br>-------------------------- start: tcp_test.erl --------------------------<br>
-module(tcp_test).<br><br>-export([test/1, test_con/0]).<br><br>-define(DEF_PORT, 2222).<br>-define(DEF_IP, {127,0,0,1}).<br><br>test(0) -> ok;<br>test(HowManyProcs) -><br> spawn(?MODULE, test_con, []),<br> test(HowManyProcs-1).<br>
<br>test_con() -><br> {ok,S} = gen_tcp:connect(?DEF_IP, ?DEF_PORT,[]),<br> gen_tcp:send(S,<<0,5,65,66,67,68,69>>),<br> receive<br> {tcp_closed, _Socket} -> ok;<br> _Msg -> gen_tcp:close(S)<br>
after 500 -><br> gen_tcp:close(S)<br> end.<br>-------------------------- end: tcp_test.erl --------------------------<br><br><br>2) tcp_server_app<br>Please just take the code from the trapexit tutorial 'Building a Non-blocking TCP server using OTP principles' <a href="http://trapexit.org/Building_a_Non-blocking_TCP_server_using_OTP_principles">http://trapexit.org/Building_a_Non-blocking_TCP_server_using_OTP_principles</a><br>
<br>There is a size limit on the erlang-bugs mailing list, which is why I am not sending the whole code and the crash dump as an attachment. Please let me know if you want them!<br><br>To start the client:<br> werl.exe<br> tcp_test:test(1000).<br>
% 1000 is the number of processes to start. Please try higher numbers as well!<br><br>To start the server:<br> werl.exe<br> application:start(tcp_server).<br><br>I ran the client and the server in separate Erlang nodes.<br>
<br>You should be able to crash the emulator by:<br>a) only running the client without anything listening on the port 2222<br>b) running the client together with the server (client crashes)<br>c) running the client together with the server (server crashes)<br>
d) running the client together with the server (client + server crash at the same time)<br><br>It might take several attempts, and in some cases the client has to be started with as many as 10000 processes (tcp_test:test(10000)). Usually, though, very few attempts (often just one) are necessary.<br>
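<br>For anyone trying to reproduce this: the client above crashes each spawned process with a badmatch when gen_tcp:connect/3 fails, which makes ordinary connect errors hard to tell apart from the emulator crash. A minimal sketch of a variant that returns the connect result instead is shown here. The module name tcp_test_report, the collect/2 helper and the 5000 ms collection timeout are assumptions for illustration and not part of the original test; only the IP, port and payload are taken from tcp_test.erl.<br>

```erlang
%% Hypothetical reporting variant of tcp_test.erl (illustrative only).
-module(tcp_test_report).

-export([test/1, test_con/1]).

-define(DEF_PORT, 2222).
-define(DEF_IP, {127,0,0,1}).

%% Spawn N client processes and return one result per client:
%% ok | {error, Reason} | timeout.
test(N) when is_integer(N), N >= 0 ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {self(), test_con(?DEF_PORT)} end)
            || _ <- lists:seq(1, N)],
    collect(Pids, []).

%% Wait for each client's result in spawn order.
%% The 5000 ms timeout is an arbitrary assumption.
collect([], Acc) ->
    lists:reverse(Acc);
collect([Pid | Rest], Acc) ->
    receive
        {Pid, Result} -> collect(Rest, [Result | Acc])
    after 5000 ->
        collect(Rest, [timeout | Acc])
    end.

%% Connect, send the same payload as the original client, wait up to
%% 500 ms for any socket message, then close. Connect and send errors
%% are returned instead of raising a badmatch.
test_con(Port) ->
    case gen_tcp:connect(?DEF_IP, Port, []) of
        {ok, S} ->
            Sent = gen_tcp:send(S, <<0,5,65,66,67,68,69>>),
            receive
                {tcp, S, _Data} -> ok;
                {tcp_closed, S} -> ok;
                {tcp_error, S, _Reason} -> ok
            after 500 ->
                ok
            end,
            gen_tcp:close(S),
            Sent;
        {error, Reason} ->
            {error, Reason}
    end.
```

Calling tcp_test_report:test(1000) would then return a list with one entry per client, so the mix of connect errors can be inspected, provided the emulator survives the run.<br>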
<br>An interesting observation:<br>If you run the tests under erl.exe (instead of werl.exe), it takes significantly more processes/tries to reach the crash.<br>Furthermore, erl.exe (or is it cmd.exe?) crashes with:<br> The exception unknown software exception (0x40000015) occurred in the application at location 0x008fff86<br>
The location always seems to be the same.<br><br>The problem might be timing related.<br><br>Tests were done on R12B-3 and R12B-4 on Windows XP SP2 and SP3 systems. Hardware: Athlon XP64 in 32-bit mode with 1 GB RAM; Centrino notebook with 1 GB RAM; Intel E6600 dual-core in 32-bit mode with 4 GB RAM.<br>
So the problem appears to occur on both SMP and non-SMP systems.<br><br>During various tests you might see the following errors in the client node:<br>{{badmatch,{error,econnrefused}} % this is expected<br>{{badmatch,{error,eaddrinuse}}<br>
{{badmatch,{error,system_limit}}<br><br>If you start the server with sasl enabled you might, in rare cases, also see the following error messages:<br>-------------------------- start: log server --------------------------<br>
=ERROR REPORT==== 12-Sep-2008::12:58:56 ===<br>File operation error: system_limit. Function: get_cwd. Process: code_server.<br><br>=ERROR REPORT==== 12-Sep-2008::12:58:56 ===<br>Error in async accept: {async_accept,"file table overflow"}.<br>
<br>=ERROR REPORT==== 12-Sep-2008::12:58:56 ===<br>** Generic server tcp_listener terminating<br>** Last message in was {inet_async,#Port<0.109>,1019,{ok,#Port<0.2141>}}<br>** When Server state == {state,#Port<0.109>,1019,tcp_echo_fsm}<br>
** Reason for termination ==<br>** {async_accept,"file table overflow"}<br><br>[...]<br>-------------------------- end: log server --------------------------<br><br>Running out of ephemeral ports (all user ports in TIME_WAIT) should not be the problem, since the crash also occurs with the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\MaxUserPort set to 60000 and after only 2000 processes.<br><br>Turning the local firewall on or off did not affect the outcome.<br><br>There is a thread on erlang-questions which may contain additional information in the future:<br>
<a href="http://erlang.org/pipermail/erlang-questions/2008-September/038118.html">http://erlang.org/pipermail/erlang-questions/2008-September/038118.html</a><br><br>Thank you!<br><br>Regards,<br>Michael<br><br></div>