From mj@REDACTED Tue Jul 1 09:13:34 2008
From: mj@REDACTED (Mikkel Jensen)
Date: Tue, 1 Jul 2008 09:13:34 +0200
Subject: [erlang-bugs] Bug in xmerl
In-Reply-To:
References:
Message-ID:

Is it possible for someone from the OTP team to confirm if this is a bug or
not? If it is I could really use a patch :-)

- Mikkel

On Fri, Jun 27, 2008 at 2:57 PM, Mikkel Jensen wrote:

> It seems there is a bug in xmerl when loading elements that contain numeric
> character references followed by UTF-8 characters.
>
> Example: ? newline ?
>
> 1> element(1, xmerl_scan:string("\303\251 \303\251", [{encoding,
> 'utf-8'}])).
> {xmlElement,a,a,[],
>             {xmlNamespace,[],[]},
>             [],1,[],
>             [{xmlText,[{a,1}],1,[],"\303\251",text},
>              {xmlText,[{a,1}],2,[],[10,195,131,194,169],text}],
>             [],"/",undeclared}
>
> Xmerl splits the parsed value around the newline character (strange but
> ok). However, the first part is encoded correctly while the second part is
> garbled!
>
> It's worth noticing that attribute values are encoded correctly:
>
> 2> element(1, xmerl_scan:string("",
> [{encoding, 'utf-8'}])).
> {xmlElement,a,a,[],
>             {xmlNamespace,[],[]},
>             [],1,
>             [{xmlAttribute,b,[],[],[],[],1,[],"\303\251 \303\251",false}],
>             [],[],"/",undeclared}
>
> - Mikkel
>

From erlang-questions_efine@REDACTED Wed Jul 2 01:06:52 2008
From: erlang-questions_efine@REDACTED (Edwin Fine)
Date: Tue, 1 Jul 2008 19:06:52 -0400
Subject: [erlang-bugs] [BUG] OTP R12B-3: HTTP "space after headers" bug in
	inet_drv.c
Message-ID: <6c2563b20807011606x7f00fa5bne75649182294bfba@mail.gmail.com>

First of all, many thanks to Claes Wikstrom for isolating and finding the
module in which this bug lives (and for doing it so quickly).

I originally thought this was a Yaws bug, and emailed Claes with the
symptoms. It's quite repeatable (test code at end of email).
He traced the bug to inet_drv.c and emailed me back with a small test
program and details, and I offered to report the bug. Then I got curious as
to what it could have been, so I did some debugging.

The bug is in inet_drv.c. It will cause the end of the HTTP headers to be
missed if the first character following the last CRLF (the one on its "own
line") is a space or tab. This causes Yaws to time out after 30 seconds
waiting for more header data that will never arrive. I am sure this will be
true for httpd, too.

The culprit (well, sort of; if you read on you will see why) is the
following code on line 8324:

    if (SP(ptr2+1)) {
        ptr1 = ptr2+1;
        len = n - plen;
    }
    else
        goto done;

Consider the following HTTP POST data:

    "POST /invalid/url HTTP/1.1\r\n"
    "Connection: close\r\n"
    "Host: localhost:8000\r\n"
    "User-Agent: perl post\r\n"
    "Content-Length: 4\r\n"
    "Content-Type: text/xml; charset=utf-8\r\n"
    "\r\n"
    " postdata..."    <-- Note this data starts with a space character

When the inet_drv.c code gets to the last CRLF, it doesn't detect that it's
the last CRLF (end of headers) and checks to see if the next character is a
space or tab using the macro SP(ptr2+1). If there's any data following the
last CRLF and it starts with a space or tab, the code as it stands now will
think it's another header line and try to get more data, then time out when
nothing arrives.

What needed to be done was to check to see if the data (in that particular
state) was an LF or CRLF standing on its own, and only check for a space
following that if not. This check correctly detects the CRLF that
terminates the header data.

I added two lines of code to inet_drv.c to fix that error, and the Erlang
test code (kindly supplied by Claes, and at the end of this email) now runs
without error. I can't guarantee that the fix will work under all
circumstances, and it needs to be tested thoroughly, but it's a starting
point and illustrates the problem.
(I would say that the HTTP header parsing in inet_drv.c probably could be
beefed up a little).

The patch to inet_drv.c is as follows (tabs may not be correct because I
reformatted the code):

8318a8319,8323
>
>     /* Test needed in case buffer is in form "\r\n\srequestdata" where \s is SP or TAB */
>     if (((plen == 1) && NL(ptr)) || ((plen == 2) && CRNL(ptr)))
>         goto done;
>

Claes's test code is here (slightly modified by me to make it easier to
test the normal case and the "bug" case). You can run it as
post:p(Host,Port,data) or post:p(Host,Port,bug).

Regards,
Edwin Fine

======================================

-module(post).
-compile(export_all).

p() ->
    p({127,0,0,1},8000,data).

p(Host, Port, What) when What =:= bug; What =:= data ->
    {ok, S} = gen_tcp:connect(Host, Port, [{active, true}]),
    gen_tcp:send(S, select_data(What)),
    recloop().

recloop() ->
    receive
        X ->
            io:format("GOT ~p~n", [X]),
            recloop()
    after 4000 ->
            timeout
    end.

select_data(What) ->
    case What of
        bug -> bug();
        data -> data()
    end.

data() ->
    Msg = "dfoo",
    H = "POST /invalid/url HTTP/1.1\r\n"
        "Connection: close\r\n"
        "Host: localhost:8000\r\n"
        "User-Agent: perl post\r\n"
        "Content-Length: 4\r\n"
        "Content-Type: text/xml; charset=utf-8\r\n"
        "\r\n",
    H ++ Msg.

%% space bug
bug() ->
    Msg = " foo",
    H = "POST /invalid/url HTTP/1.1\r\n"
        "Connection: close\r\n"
        "Host: localhost:8000\r\n"
        "User-Agent: perl post\r\n"
        "Content-Length: 4\r\n"
        "Content-Type: text/xml; charset=utf-8\r\n"
        "\r\n",
    H ++ Msg.

-- 
The great enemy of the truth is very often not the lie -- deliberate,
contrived and dishonest, but the myth, persistent, persuasive, and
unrealistic. Belief in myths allows the comfort of opinion without the
discomfort of thought.
John F. Kennedy 35th president of US 1961-1963 (1917 - 1963)
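
The ordering that the inet_drv.c patch above establishes can be tried out in
isolation. What follows is only a minimal standalone sketch, not code from
inet_drv.c: the names line_len, is_blank_line, classify and the enum values
are hypothetical and exist purely for illustration. The one idea taken from
the message is that a bare LF or CRLF must be recognised as the end of the
headers before the byte after it is tested for SP or TAB.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes, for illustration only. */
enum line_kind {
    NEED_MORE,        /* cannot decide with the bytes seen so far        */
    END_OF_HEADERS,   /* bare LF or CRLF: the header block is finished   */
    HEADER_CONTINUES, /* next line starts with SP/TAB: folded header     */
    HEADER_COMPLETE   /* next line starts a new header                   */
};

/* Length of the first line in buf, including its LF, or 0 if the
 * first n bytes do not yet contain a complete line. */
static size_t line_len(const char *buf, size_t n)
{
    const char *lf = memchr(buf, '\n', n);
    return lf ? (size_t)(lf - buf) + 1 : 0;
}

/* True if the line is LF or CRLF on its own (same shape as the
 * plen == 1 / plen == 2 test in the patch). */
static int is_blank_line(const char *line, size_t plen)
{
    return (plen == 1 && line[0] == '\n') ||
           (plen == 2 && line[0] == '\r' && line[1] == '\n');
}

static enum line_kind classify(const char *buf, size_t n)
{
    size_t plen;

    assert(buf != NULL);
    plen = line_len(buf, n);
    if (plen == 0)
        return NEED_MORE;

    /* Must come first: whatever follows a blank line is body data,
     * even if it happens to start with a space or a tab. */
    if (is_blank_line(buf, plen))
        return END_OF_HEADERS;

    /* Only a non-blank line can be continued by SP/TAB, and that can
     * only be judged once the following byte has arrived. */
    if (plen == n)
        return NEED_MORE;
    if (buf[plen] == ' ' || buf[plen] == '\t')
        return HEADER_CONTINUES;
    return HEADER_COMPLETE;
}
```

With this ordering, classify("\r\n postdata", 11) reports END_OF_HEADERS,
which is the "bug" case from the test module, while a genuinely folded
header such as "X-Long: a\r\n b\r\n" still reports HEADER_CONTINUES.
Swapping the two checks reproduces the wait-for-more-data behaviour
described in the message.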

From pfisher@REDACTED Wed Jul 2 02:24:05 2008
From: pfisher@REDACTED (Paul Fisher)
Date: Tue, 01 Jul 2008 19:24:05 -0500
Subject: [erlang-bugs] erts_port[].drv_ptr == 0,
	when erts_port[].status not free
Message-ID: <1214958245.16472.46.camel@localhost>

We have a system where we run lots of linked-in driver ports that get
created/used/closed frequently and sometimes very quickly. Today, when
several open_port/2, port_command/2 and port_close/1 cycles happened in
rapid succession, a SIGSEGV occurred in erl_bif_ddll.c:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1125235040 (LWP 12087)]
0x0000000000449712 in erl_ddll_try_unload_2 (p=0x2aaaab11fc90,
    name_term=659339, options=46912503328425) at beam/erl_bif_ddll.c:592

The emulator was run on a Q6600 (quad-core, 2.4 GHz) and started with
+A 8, and the linked-in driver executes the bulk of its work with
driver_async(). There were continuously 8 driver cycles running for 5-10
seconds before the segfault occurred.

(gdb) where
#0  0x0000000000449712 in erl_ddll_try_unload_2 (p=0x2aaaab11fc90,
    name_term=659339, options=46912503328425) at beam/erl_bif_ddll.c:592
#1  0x000000000052337f in process_main () at beam/beam_emu.c:2073
#2  0x000000000049c213 in sched_thread_func (vesdp=0x2ae18cb74f98)
    at beam/erl_process.c:741
#3  0x00000000005b6818 in thr_wrapper (vtwd=0x7fff1eb77de0)
    at common/ethread.c:474
#4  0x00002ae18c530f1a in start_thread () from /lib/libpthread.so.0
#5  0x00002ae18c8135d2 in clone () from /lib/libc.so.6
#6  0x0000000000000000 in ?? ()

So the code at the point of the SIGSEGV @ erl_bif_ddll.c:592 says:

    for (j = 0; j < erts_max_ports; j++) {
=>      if (!(erts_port[j].status & FREE_PORT_FLAGS)
            && erts_port[j].drv_ptr->handle == dh) {

It appears that the code assumes that the erts_port array entry being
evaluated during the search has a valid (non-zero) drv_ptr value whenever
the entry is not marked as free.
At the time of the crash, this is clearly not the case:

(gdb) p j
$8 = 896
(gdb) p erts_port[j]
$7 = {sched = {next = 0x0, prev = 0x0, taskq = 0x0, exe_taskq = 0x0},
  timeout_task = {counter = 0}, refc = {counter = 2}, lock = 0x81b3c8,
  xports = 0x0, id = 14343, connected = 0, caller = 0, data = 0, bp = 0x0,
  nlinks = 0x0, monitors = 0x0, bytes_in = 0, bytes_out = 0, ptimer = 0x0,
  tracer_proc = 18446744073709551611, trace_flags = 0, ioq = {size = 0,
    v_start = 0x0, v_end = 0x0, v_head = 0x0, v_tail = 0x0, v_small = {{
        iov_base = 0x0, iov_len = 0}, {iov_base = 0x0, iov_len = 0}, {
        iov_base = 0x0, iov_len = 0}, {iov_base = 0x0, iov_len = 0}, {
        iov_base = 0x0, iov_len = 0}}, b_start = 0x0, b_end = 0x0,
    b_head = 0x0, b_tail = 0x0, b_small = {0x0, 0x0, 0x0, 0x0, 0x0}},
  dist_entry = 0x0, name = 0x0, drv_ptr = 0x0, drv_data = 0,
  suspended = 0x0, linebuf = 0x0, status = 4096, control_flags = 0,
  reg = 0x0, port_data_lock = 0x0}
(gdb) p erts_port[j].drv_ptr
$6 = (ErlDrvEntry *) 0x0

So the real questions are: 1) whether the assumption built into this code
is correct; and 2) if so, how we got into the position of violating it.

I'd appreciate some insight into what could be going on here, and where I
should start looking.

-- 
paul

From lars@REDACTED Wed Jul 2 09:38:58 2008
From: lars@REDACTED (Lars Thorsen)
Date: Wed, 02 Jul 2008 09:38:58 +0200
Subject: [erlang-bugs] Bug in xmerl
In-Reply-To:
References:
Message-ID: <486B3092.8060604@erix.ericsson.se>

Hi,

it was a bug in xmerl. The closing parenthesis in the call to
string_to_char_set/2 (line 2449 in xmerl_scan) was misplaced.
This will be fixed in R12B-4, but I include some patch lines below.

------------------------- Patch start ----------------------------------
--- xmerl_scan.erl@@/main/xmerl/108 2008-04-25 09:20:41.000000000 +0200
+++ xmerl_scan.erl 2008-07-01 17:11:18.000000000 +0200
@@ -2446,7 +2446,7 @@
     case markup_delimeter(ExpRef) of
 	true -> scan_content(ExpRef++T1,S1,Pos,Name,Attrs,Space,Lang,Parents,NS,Acc,ExpRef);
 	_ ->
-	    scan_content(string_to_char_set(S1#xmerl_scanner.encoding,ExpRef++T1),S1,Pos,Name,Attrs,Space,Lang,Parents,NS,Acc,[])
+	    scan_content(string_to_char_set(S1#xmerl_scanner.encoding,ExpRef)++T1,S1,Pos,Name,Attrs,Space,Lang,Parents,NS,Acc,[])
     end;
 scan_content("