inets: httpd_request_handler crash

Sun Nov 15 17:16:02 CET 2009

Hello,

A while back I moved our code from a mod_esi-based module to a 'raw'
httpd:do handler, because our software has to serve a specific URL path
that we can't control (and, as far as I understand, mod_esi can't be made
to handle an arbitrary URL the way mod_perl, for example, can). So far so
good, but sometimes I see the following in the logs:

=CRASH REPORT==== 15-Nov-2009::15:49:39 ===
  crasher:
     initial call: httpd_request_handler:init/1
     pid: <0.12084.62>
     registered_name: []
     exception exit: {{badmatch,socket_closed},
                      [{httpd_response,send_body,3},
                       {httpd_response,generate_and_send_response,1},
                       {httpd_request_handler,handle_response,1},
                       {gen_server,handle_msg,5},
                       {proc_lib,init_p_do_apply,3}]}
       in function  gen_server:terminate/6
     ancestors: [<0.121.0>,httpd_acc_sup_1729,httpd_instance_sup_1729,
                   httpd_sup,inets_sup,<0.54.0>]
     messages: []
     links: [<0.118.0>]
     dictionary: []
     trap_exit: false
     status: running
     heap_size: 6765
     stack_size: 24
     reductions: 16438

Apparently it depends on the client behaviour somehow, because when I
tune the load balancer settings related to 'http close', the frequency
of the crashes changes, but they never go away completely.
Anyway, I tried to figure out what's happening and looked into the OTP
code.
What I see there are two clauses of send_body/3 (line 180 of
httpd_response.erl). The first one sends a list Body and requires
httpd_socket:deliver to return 'ok'. The second is a 'functional'
version, which calls a function and then delivers the result. This
functional version handles every return value of httpd_socket:deliver:
'ok', and also _, in which case it just returns 'done'.
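
To make the difference concrete, here is a minimal self-contained sketch
of the two shapes as I understand them. It is not the OTP source: the
module name, the function names and deliver/2 are hypothetical stand-ins
(deliver/2 just returns the result it is told to simulate, in place of
httpd_socket:deliver), and the real functional clause does more than
this.

```erlang
-module(send_body_sketch).
-export([strict_send/2, tolerant_send/2, demo/0]).

%% Hypothetical stand-in for httpd_socket:deliver: it simply returns
%% the result we ask it to simulate (ok, socket_closed, ...).
deliver(SimulatedResult, _Body) ->
    SimulatedResult.

%% Shape of the list-Body clause as I read it: the result is matched
%% against 'ok', so any other value raises a badmatch error.
strict_send(SimulatedResult, Body) ->
    ok = deliver(SimulatedResult, Body).

%% Shape of the functional clause as I read it: every result is
%% handled, and anything other than 'ok' just ends with 'done'.
tolerant_send(SimulatedResult, Body) ->
    case deliver(SimulatedResult, Body) of
        ok -> ok;
        _  -> done
    end.

%% Exercises both shapes with a healthy and a closed socket.
demo() ->
    ok   = strict_send(ok, "hello"),
    ok   = tolerant_send(ok, "hello"),
    done = tolerant_send(socket_closed, "hello"),
    {'EXIT', {{badmatch, socket_closed}, _Stack}} =
        (catch strict_send(socket_closed, "hello")),
    ok.
```

Calling send_body_sketch:demo() shows that only the strict shape fails
when the simulated socket is closed, and that it fails with the same
{badmatch,socket_closed} reason as in the crash report above.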

Is there a reason for this difference? Apparently I'm hitting the first
clause, where I send a cooked body and httpd_socket:deliver returns
socket_closed. Why the crash anyway? It's an exception I can't catch and
handle, since it happens deep inside httpd's internal machinery. All I
can do is have the supervisor restart the failed instance, which is
rather inefficient given our load.

Worse, sometimes this is accompanied by a segfault:
Nov 15 15:49:58 agaccel-demo2 kernel: beam.smp[5703]: segfault at 
0000000000000028 rip 000000000044d874 rsp 0000000044332e60 error 4

CentOS 5.3/x86_64, R13
Erlang R13B01 (erts-5.7.2) [source] [64-bit] [smp:2:2] [rq:2] 
[async-threads:0] [hipe] [kernel-poll:false]

Maybe it's just a coincidence.

Thanks for any advice and help you can offer.

-- 
Best regards,
Cyril