From richardc@REDACTED Fri Jan 2 17:47:39 2009 From: richardc@REDACTED (Richard Carlsson) Date: Fri, 02 Jan 2009 17:47:39 +0100 Subject: [erlang-bugs] EUnit discards all output In-Reply-To: <3E7B459FEA684B97BC8F5091EC883EEE@JTablet2007> References: <3E7B459FEA684B97BC8F5091EC883EEE@JTablet2007> Message-ID: <495E452B.7060709@it.uu.se> John Hughes wrote: > From the eunit Users' Guide: > > *EUnit captures standard output* > > If your test code writes to the standard output, you may be surprised to > see that the text does not appear on the console when the tests are > running. This is because EUnit captures all standard output from test > functions (this also includes setup and cleanup functions, but not > generator functions), so that it can be included in the test report if > errors occur. > > OK, it says the output CAN be included in the test report if errors > occur, not that it WILL be--but I nevertheless expected the latter to > happen. When I run EUnit, however, ALL output is discarded, even output > from failing tests. Is that really the intention? Maybe I'm just doing > something wrong here--but I have not found any documented way to turn ON > reporting of output from failed tests. Ah, it seems I didn't remember to actually present that information. I.e., the presentation layer has always received the data, but didn't bother to ever print it. I just checked in a fix in the repository at https://svn.process-one.net/contribs/trunk/eunit, that prints the output (truncated if it gets too long) if there is any. In general, EUnit bug reports can be filed at https://support.process-one.net/browse/EUNIT. /Richard -- "Having users is like optimization: the wise course is to delay it." -- Paul Graham From richardc@REDACTED Fri Jan 2 23:43:41 2009 From: richardc@REDACTED (Richard Carlsson) Date: Fri, 02 Jan 2009 23:43:41 +0100 Subject: [erlang-bugs] EUnit treats a process that kills itself as a successful test In-Reply-To: References: Message-ID: <495E989D.5020507@it.uu.se> John Hughes wrote: > Here's my code: > > -module(eunit_example). > -include_lib("eunit/include/eunit.hrl"). > > exit_test() -> > exit(self(),die). > > In the shell, I run: > > 1> c(eunit_example). > {ok,eunit_example} > 2> eunit_example:test(). > Test successful. > ok > > Is that really the intention? > > Likewise, this test passes: > > spawn_test() -> > spawn_link(erlang,exit,[dying]), > timer:sleep(1). > > (The sleep is there to allow time for the child process to die, and the > exit signal to be propagated). > > Presumably the process running the test is trapping exits--but is that > really appropriate? As in this last example, crashes in child processes > won't cause the test to fail. At some point, I had the idea that it would be good to make the test processes default to trapping exits, to make it easier to write some kinds of process-spawning tests. (There was a comment to this effect in the code, but it was never documented.) This was probably misguided, so I have now checked in a change that makes the test processes non-trapping, which will probably cause less astonishment. (Trapping can still be enabled by the user where necessary). One thing to remember, though, is that unless you wrap every single test in a separate {spawn,Test}, a test process will run several tests, one at a time. (In the simplest case only one test process is ever spawned, and runs all the tests.) 
Hence, if the process dies, this will not simply "fail" a single test, but cause all the tests that were to be executed by that process to instead be cancelled/skipped. Hence, if a test (or group of tests) is known to be at risk of receiving an exit signal, it is best to wrap it in {spawn, Test}, so that the effect is isolated. /Richard -- "Having users is like optimization: the wise course is to delay it." -- Paul Graham From yubao.liu@REDACTED Sun Jan 4 04:15:46 2009 From: yubao.liu@REDACTED (Liu Yubao) Date: Sun, 04 Jan 2009 11:15:46 +0800 Subject: [erlang-bugs] openssl s_client hangs when accessing https service in inets application In-Reply-To: <495AE082.1050502@gmail.com> References: <495AE082.1050502@gmail.com> Message-ID: <496029E2.6030107@gmail.com> Hi, The documentation and code of inets application are not consistent, the corresponding option in {proplist_file, path()} to "SocketType" option in {file, path()} is "com_type", not "socket_type". Liu Yubao wrote: > Hi, > > The https services in inets application doesn't work, I guess > I got something wrong. Below is the steps to recur: > > a. use gen-cert.sh to generate server.pem; > (All scripts and configuration are provided at > http://jff.googlecode.com/files/inets-https-test.tar > ) > > b. execute runerl.sh and input these clauses in the erlang shell: > application:start(ssl). > application:start(inets). > > c. execute `openssl s_client -connect localhost:8443 -debug -msg`, > you can see openssl hangs after sending a CLIENT-HELLO message, > the TCP connection is established successfully but https server > doesn't response to the CLIENT-HELLO message. > > > I tested "ssl:listen" in erlang shell and succeed to communication between > openssl and erlang shell: > > application:start(ssl). > {ok, S} = ssl:listen(8443, [{certfile, "server.pem"}, {active, false}]). > {ok, S2} = ssl:accept(S). > # execute in another bash: openssl s_client -connect localhost:8443 > ssl:send(S2, <<"hello world\n">>). > # "openssl s_client" can receive this greeting. > > > I tested against the latest erlang 5.6.5 under Windows XP and 5.6.3 under > Debian Lenny. > > I'm looking forward your help! > > > Best regards, > > Liu Yubao > From jack@REDACTED Mon Jan 5 17:42:38 2009 From: jack@REDACTED (Jack Moffitt) Date: Mon, 5 Jan 2009 09:42:38 -0700 Subject: [erlang-bugs] small documentation typo in http module Message-ID: <9b58f4550901050842v47f57b96vb6cc80408da7a66d@mail.gmail.com> On http://www.erlang.org/doc/man/http.html ipv6Mode (under set_options) should be capitalized like the rest of the types. I misread this as an atom due to the lower casing and then spent a little while trying to figure out why the option had no effect. jack. From sedinin@REDACTED Tue Jan 6 09:20:00 2009 From: sedinin@REDACTED (Andrey Sedinin) Date: Tue, 6 Jan 2009 10:20:00 +0200 Subject: [erlang-bugs] Possible bug in xmerl_xsd (validating XML using XSD schema file). Message-ID: Hi, i guess it is a bug: Validate XML using schema: {ok, State } = xmerl_xsd:process_schema("test.xsd"), {Entity ,_} = xmerl_scan:file("test.xml"), xmerl_xsd:validate(Entity, State). Schema: XML: I think it should validate. Possible values: Valid Invalid but last one give an error: {error,[{[],xmerl_xsd, {empty_content_not_allowed,[{enumeration,"Valid"}, {enumeration,"Invalid"}, {enumeration,[]}]}}]} May be i wrong? Also posted here: http://www.erlang.org/pipermail/erlang-questions/2008-December/040744.html I use R12B-5 on Mac OS X 10.5.6. -- Sedinin -- ??????? 
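A minimal sketch of the validation flow from the xmerl_xsd report above, with the result matched explicitly. The file names "test.xsd" and "test.xml" are the ones from the report; the wrapper function and its name are added only for illustration:

validate_file(XsdFile, XmlFile) ->
    %% Compile the schema once, then validate a scanned document against it.
    {ok, State} = xmerl_xsd:process_schema(XsdFile),
    {Entity, _Rest} = xmerl_scan:file(XmlFile),
    case xmerl_xsd:validate(Entity, State) of
        {error, Errors} ->
            %% e.g. the empty_content_not_allowed error quoted above
            {invalid, Errors};
        Validated ->
            %% any other return is the successful validation result
            Validated
    end.

Called as validate_file("test.xsd", "test.xml") from any module that includes no other dependencies.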
From c.romain@REDACTED Thu Jan 8 02:09:22 2009 From: c.romain@REDACTED (cyril Romain) Date: Thu, 08 Jan 2009 02:09:22 +0100 Subject: [erlang-bugs] code:load_abs/1 fails for packaged modules In-Reply-To: <495800C5.6050709@laposte.net> References: <495800C5.6050709@laposte.net> Message-ID: <49655242.4020308@laposte.net> cyril Romain wrote: > _FixSuggestions_: > I think in code_server.erl the load_abs/3 function should be fix so that it: > - Successively calls try_load_module with mymodule, to.mymodule, > path.to.mymodule, stopping on sucess. Not so elegant though... > - Calls try_load_module with mymodule (it actually does). But if the > module name in object code does match mymodule, try_load_module with the > module name found in object code. So that there is at most 2 calls of > try_load_module. Problem: the object code (and the module name) is read > by a C function (in beam_load.c) and it seems not straightforward to let > Erlang know about the module name read in that object code. > - Reading the file once, and use the module name defined within; > avoiding multiple call to try_load_module. Better solution, but is it > possible ? > Here is a patch following 1st suggestion: http://www.erlang.org/pipermail/erlang-patches/2009-January/000359.html From ingela@REDACTED Thu Jan 8 09:44:45 2009 From: ingela@REDACTED (Ingela Anderton Andin) Date: Thu, 08 Jan 2009 09:44:45 +0100 Subject: [erlang-bugs] ssh:connect() documentation In-Reply-To: <46167e6a0812221647y739015ay72d10469e5d62204@mail.gmail.com> References: <46167e6a0812221647y739015ay72d10469e5d62204@mail.gmail.com> Message-ID: <4965BCFD.50705@erix.ericsson.se> Hi! Thank you for reporting this, in the latest code (not yet released) however the option name corresponds to the documentation. Regards Ingela Erlang/OTP - Ericssson Anton Krasovsky wrote: > Documentation for ssh:connect() says: > > {connect_timeout, Milliseconds | infinity} > Sets the default timeout when trying to connect to. > > however the actual option is 'timeout'. > > anton > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://www.erlang.org/mailman/listinfo/erlang-bugs > > From bertil.karlsson@REDACTED Thu Jan 8 10:29:23 2009 From: bertil.karlsson@REDACTED (Bertil Karlsson) Date: Thu, 08 Jan 2009 10:29:23 +0100 Subject: [erlang-bugs] Possible bug in xmerl_xsd (validating XML using XSD schema file). In-Reply-To: References: Message-ID: <4965C773.2040203@ericsson.com> Hi, this is a bug that will be fixed as soon as possible. /Bertil Andrey Sedinin wrote: > Hi, > > i guess it is a bug: > > Validate XML using schema: > > {ok, State } = xmerl_xsd:process_schema("test.xsd"), > {Entity ,_} = xmerl_scan:file("test.xml"), > xmerl_xsd:validate(Entity, State). > > Schema: > > > > elementFormDefault="qualified" attributeFormDefault="unqualified"> > > > > > > > > > > > > > > > > > > XML: > > > > > > I think it should validate. Possible values: > > Valid > Invalid > > > but last one give an error: > > {error,[{[],xmerl_xsd, > {empty_content_not_allowed,[{enumeration,"Valid"}, > {enumeration,"Invalid"}, > {enumeration,[]}]}}]} > > > May be i wrong? > Also posted here: http://www.erlang.org/pipermail/erlang-questions/2008-December/040744.html > > I use R12B-5 on Mac OS X 10.5.6. > > > -- > Sedinin > > -- > ??????? 
> > > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://www.erlang.org/mailman/listinfo/erlang-bugs From geoff.cant@REDACTED Thu Jan 8 15:36:16 2009 From: geoff.cant@REDACTED (Geoff Cant) Date: Thu, 08 Jan 2009 15:36:16 +0100 Subject: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server Message-ID: Hi all, Mats Cronqvist suggested I take this one up on erlang-bugs. What follows is a rough transcript of a debugging session in which we suspect that the reason an ejabberd node cannot dump mnesia logs is due to the disk_log_server process being impossibly stuck in gen_server:loop/6. It would be good if someone could confirm for me that my reasoning is correct (or at least plausible) that the disk_log_server is stuck, that this is the reason why mnesia can't dump logs and that the disk_log_server is stuck in a seemingly impossible way. The client on whose cluster this occurred has seen this problem before, so we may get another chance at live debugging sometime in the near future. I would greatly appreciate any suggestions as to additional debugging techniques I could try if this problem recurs. Thank you, --Geoff Cant The erlang version information is "Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [smp:8] [async-threads:0] [hipe] [kernel-poll:false]" - the stock Debian erlang-hipe-base from lenny on amd64 hardware. (in the transcript nodenames and file paths have been slightly but consistently rewritten to obscure some private network information) We tried mnesia:dump_log() which hung, so we tried to figure out why. mnesia_controller:get_workers(2000) => {workers,[],[],<0.22676.260>} process_info(<0.22676.260>) => [{current_function,{gen,wait_resp_mon,3}}, {initial_call,{mnesia_controller,dump_and_reply,2}}, {status,waiting}, {message_queue_len,0}, {messages,[]}, {links,[<0.116.0>]}, {dictionary,[]}, {trap_exit,false}, {error_handler,error_handler}, {priority,normal}, {group_leader,<0.57.0>}, {total_heap_size,233}, {heap_size,233}, {stack_size,21}, {reductions,4311}, {garbage_collection,[{fullsweep_after,0},{minor_gcs,0}]}, {suspending,[]}] Backtrace <0.22676.260>: Program counter: 0x00007f61c0e0c2a8 (gen:wait_resp_mon/3 + 64) CP: 0x00007f61c43645d8 (gen_server:call/3 + 160) arity = 0 0x00007f60f844c108 Return addr 0x00007f61c43645d8 (gen_server:call/3 + 160) y(0) infinity y(1) #Ref<0.0.992.227032> y(2) 'ejabberd@REDACTED' 0x00007f60f844c128 Return addr 0x00007f61c049f1d0 (mnesia_log:save_decision_tab/1 + 248) y(0) infinity y(1) {close_log,decision_tab} y(2) <0.62.0> y(3) Catch 0x00007f61c43645d8 (gen_server:call/3 + 160) 0x00007f60f844c150 Return addr 0x00007f61c03c6ec8 (mnesia_dumper:perform_dump/2 + 1648) y(0) "/fake/path/ejabberd/DECISION_TAB.TMP" y(1) [] 0x00007f60f844c168 Return addr 0x00007f61c056bdd0 (mnesia_controller:dump_and_reply/2 + 152) y(0) [] y(1) [] y(2) [] y(3) 15 y(4) [] y(5) [] 0x00007f60f844c1a0 Return addr 0x000000000084bd18 () y(0) <0.116.0> Here the log dumping process appears to be waiting on gen_server:call(mnesia_monitor, {close_log,decision_tab}). 
process_info(<0.62.0>) => [{registered_name,mnesia_monitor}, {current_function,{disk_log,monitor_request,2}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,34}, {messages,[{nodeup,'gc@REDACTED'}, {nodedown,'gc@REDACTED'}, {nodeup,'fakenode1-16-26-06@REDACTED'}, {nodedown,'fakenode1-16-26-06@REDACTED'}, {nodeup,'fakenode1-16-27-20@REDACTED'}, {nodedown,'fakenode1-16-27-20@REDACTED'}, {nodeup,'fakenode1-16-29-25@REDACTED'}, {nodedown,'fakenode1-16-29-25@REDACTED'}, {nodeup,'gc@REDACTED'}, {nodedown,'gc@REDACTED'}, {nodeup,'fakenode2-16-36-53@REDACTED'}, {nodeup,'gc@REDACTED'}, {nodedown,'gc@REDACTED'}, {nodeup,'gc@REDACTED'}, {nodedown,'gc@REDACTED'}, {nodeup,'gc@REDACTED'}, {nodedown,'gc@REDACTED'}, {nodeup,'gc@REDACTED'}, {nodedown,'gc@REDACTED'}, {nodeup,...}, {...}|...]}, {links,[<6749.62.0>,<6753.62.0>,<0.111.0>,<0.22677.260>, <6752.104.0>,<6747.62.0>,<6748.62.0>,<0.61.0>,<6751.62.0>, <6750.62.0>,<0.52.0>]}, {dictionary,[{'$ancestors',[mnesia_kernel_sup,mnesia_sup, <0.58.0>]}, {'$initial_call',{gen,init_it, [gen_server,<0.61.0>,<0.61.0>, {local,mnesia_monitor}, mnesia_monitor, [<0.61.0>], [{timeout,infinity}]]}}]}, {trap_exit,true}, {error_handler,error_handler}, {priority,normal}, {group_leader,<0.57.0>}, {total_heap_size,377}, {heap_size,377}, {stack_size,20}, {reductions,2326000}, {garbage_collection,[{fullsweep_after,0},{minor_gcs,0}]}, {suspending,[]}] We didn't take a backtrace of mnesia_monitor, but {current_function,{disk_log,monitor_request,2}} led us to think that mnesia_monitor was trying to close the decision_tab log file, so we tried to find out which process that was. At this point, disk_log:info(decision_tab) hung, so we tried disk_log_server:get_log_pids(decision_tab) which gave us {local,<0.22681.260>}. Backtrace <0.22681.260>: Program counter: 0x00007f61c0e0c2a8 (gen:wait_resp_mon/3 + 64) CP: 0x00007f61c43645d8 (gen_server:call/3 + 160) arity = 0 0x00007f60f75818a0 Return addr 0x00007f61c43645d8 (gen_server:call/3 + 160) y(0) infinity y(1) #Ref<0.0.992.227035> y(2) 'ejabberd@REDACTED' 0x00007f60f75818c0 Return addr 0x00007f61c03b1610 (disk_log:do_exit/4 + 440) y(0) infinity y(1) {close,<0.22681.260>} y(2) disk_log_server y(3) Catch 0x00007f61c43645d8 (gen_server:call/3 + 160) 0x00007f60f75818e8 Return addr 0x00007f61c0e23fe8 (proc_lib:init_p/5 + 400) y(0) normal y(1) [] y(2) <0.62.0> y(3) ok 0x00007f60f7581910 Return addr 0x000000000084bd18 () y(0) Catch 0x00007f61c0e24008 (proc_lib:init_p/5 + 432) y(1) disk_log y(2) init y(3) [<0.70.0>,<0.71.0>] The disk_log process for 'decision_tab' was waiting for a reply from the disk_log_server to gen_server:call(disk_log_server, {close, self()}). 
Backtrace disk_log_server: Program counter: 0x00007f61c4365af8 (gen_server:loop/6 + 288) CP: 0x00007f61c0e23fe8 (proc_lib:init_p/5 + 400) arity = 0 0x00007f60fb043f78 Return addr 0x00007f61c0e23fe8 (proc_lib:init_p/5 + 400) y(0) [] y(1) infinity y(2) disk_log_server y(3) {state,[]} y(4) disk_log_server y(5) <0.30.0> 0x00007f60fb043fb0 Return addr 0x000000000084bd18 () y(0) Catch 0x00007f61c0e24008 (proc_lib:init_p/5 + 432) y(1) gen y(2) init_it y(3) [gen_server,<0.30.0>,<0.30.0>,{local,disk_log_server},disk_log_server,[],[]] process_info(whereis(disk_log_server)) => [{registered_name,disk_log_server}, {current_function,{gen_server,loop,6}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,1}, {messages,[{'$gen_call',{<0.22681.260>,#Ref<0.0.992.227035>}, {close,<0.22681.260>}}]}, {links,[<0.111.0>,<0.22677.260>,<0.22681.260>,<0.30.0>]}, {dictionary,[{<0.111.0>,latest_log}, {<0.22677.260>,previous_log}, {'$ancestors',[kernel_safe_sup,kernel_sup,<0.8.0>]}, {<0.22681.260>,decision_tab}, {'$initial_call',{gen,init_it, [gen_server,<0.30.0>,<0.30.0>, {local,disk_log_server}, disk_log_server,[],[]]}}]}, {trap_exit,true}, {error_handler,error_handler}, {priority,normal}, {group_leader,<0.7.0>}, {total_heap_size,246}, {heap_size,233}, {stack_size,12}, {reductions,2366165}, {garbage_collection,[{fullsweep_after,0},{minor_gcs,0}]}, {suspending,[]}] Which appears to be doing something impossible - blocked in the receive statement in gen_server:loop/6 with a valid message in its queue. We used process_info to check the reductions a couple of times, but they stayed the same at 2366165 over a period of at least a minute. This line of investigation is all we have as the server has now been restarted. From raimo+erlang-bugs@REDACTED Thu Jan 8 15:55:42 2009 From: raimo+erlang-bugs@REDACTED (Raimo Niskanen) Date: Thu, 8 Jan 2009 15:55:42 +0100 Subject: [erlang-bugs] A bug in file:pread or not? In-Reply-To: References: Message-ID: <20090108145542.GA15705@erix.ericsson.se> On Mon, Dec 29, 2008 at 01:31:19AM +0100, Christian wrote: > Sending a LocNum to file:pread/2 where the Size is zero returns eof > rather than an empty binary. > > 2> {ok, File} = file:open("transpose.erl", [binary, read, raw]). > {ok,{file_descriptor,prim_file,{#Port<0.93>,7}}} > 3> file:pread(File, []). > {ok,[]} > 4> file:pread(File, [{10,10}]). > {ok,[<<"ranspose).">>]} > 5> file:pread(File, [{10,1}]). > {ok,[<<"r">>]} > 6> file:pread(File, [{10,0}]). > {ok,[eof]} > > If I do: > > 8> file:pread(File, [{10, 10}, {10, 1}, {10,0}]). > {ok,[<<"ranspose).">>,<<"r">>,eof]} This seems to be a bug. file:read was corrected for this in some special case, but file:pread was forgotten then. We will most probably fix it in a bugfix release. It is now inconsistent since file:position followed by file:read does not give the same as file:pread. > > Then I see this syscalls being performed using 'strace': > > pread64(7, "ranspose).", 10, 10) = 10 > pread64(7, "r", 1, 10) = 1 > pread64(7, "", 0, 10) = 0 > > So it looks like the syscall tell you that zero bytes were read. It is > just reported as having tried to read outside the file. 
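A sketch of the two call sequences Raimo says are now inconsistent, using the same "transpose.erl" file as in the report. No results are shown here; the point is only that the two ways of reading zero bytes at offset 10 can be compared directly:

{ok, File} = file:open("transpose.erl", [binary, read, raw]).
{ok, _} = file:position(File, 10).
R1 = file:read(File, 0).            %% zero-byte read at the current position
R2 = file:pread(File, [{10, 0}]).   %% zero-byte pread at the same offset ({ok,[eof]} above)

According to the report, R1 and R2 do not agree, which is the inconsistency referred to above.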
> _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://www.erlang.org/mailman/listinfo/erlang-bugs -- / Raimo Niskanen, Erlang/OTP, Ericsson AB From dgud@REDACTED Thu Jan 8 15:55:58 2009 From: dgud@REDACTED (Dan Gudmundsson) Date: Thu, 08 Jan 2009 15:55:58 +0100 Subject: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server In-Reply-To: References: Message-ID: <496613FE.8040003@erix.ericsson.se> I have seen it on erl-q (most of us devs read that list to), your reasoning seems valid and currently I don't have any more ideas. I have asked the emulator guys to take a look. /Dan "Mnesia" G Geoff Cant wrote: > Hi all, Mats Cronqvist suggested I take this one up on erlang-bugs. What > follows is a rough transcript of a debugging session in which we suspect > that the reason an ejabberd node cannot dump mnesia logs is due to the > disk_log_server process being impossibly stuck in gen_server:loop/6. > > It would be good if someone could confirm for me that my reasoning is > correct (or at least plausible) that the disk_log_server is stuck, that > this is the reason why mnesia can't dump logs and that the > disk_log_server is stuck in a seemingly impossible way. > > The client on whose cluster this occurred has seen this problem before, > so we may get another chance at live debugging sometime in the near > future. > > I would greatly appreciate any suggestions as to additional debugging > techniques I could try if this problem recurs. > > Thank you, > --Geoff Cant > > > The erlang version information is "Erlang (BEAM) emulator version 5.6.3 > [source] [64-bit] [smp:8] [async-threads:0] [hipe] [kernel-poll:false]" > - the stock Debian erlang-hipe-base from lenny on amd64 hardware. > > (in the transcript nodenames and file paths have been slightly but > consistently rewritten to obscure some private network information) > > We tried mnesia:dump_log() which hung, so we tried to figure out why. 
> > mnesia_controller:get_workers(2000) => {workers,[],[],<0.22676.260>} > > process_info(<0.22676.260>) => > [{current_function,{gen,wait_resp_mon,3}}, > {initial_call,{mnesia_controller,dump_and_reply,2}}, > {status,waiting}, > {message_queue_len,0}, > {messages,[]}, > {links,[<0.116.0>]}, > {dictionary,[]}, > {trap_exit,false}, > {error_handler,error_handler}, > {priority,normal}, > {group_leader,<0.57.0>}, > {total_heap_size,233}, > {heap_size,233}, > {stack_size,21}, > {reductions,4311}, > {garbage_collection,[{fullsweep_after,0},{minor_gcs,0}]}, > {suspending,[]}] > > Backtrace <0.22676.260>: > Program counter: 0x00007f61c0e0c2a8 (gen:wait_resp_mon/3 + 64) > CP: 0x00007f61c43645d8 (gen_server:call/3 + 160) > arity = 0 > > 0x00007f60f844c108 Return addr 0x00007f61c43645d8 (gen_server:call/3 + 160) > y(0) infinity > y(1) #Ref<0.0.992.227032> > y(2) 'ejabberd@REDACTED' > > 0x00007f60f844c128 Return addr 0x00007f61c049f1d0 (mnesia_log:save_decision_tab/1 + 248) > y(0) infinity > y(1) {close_log,decision_tab} > y(2) <0.62.0> > y(3) Catch 0x00007f61c43645d8 (gen_server:call/3 + 160) > > 0x00007f60f844c150 Return addr 0x00007f61c03c6ec8 (mnesia_dumper:perform_dump/2 + 1648) > y(0) "/fake/path/ejabberd/DECISION_TAB.TMP" > y(1) [] > > 0x00007f60f844c168 Return addr 0x00007f61c056bdd0 (mnesia_controller:dump_and_reply/2 + 152) > y(0) [] > y(1) [] > y(2) [] > y(3) 15 > y(4) [] > y(5) [] > > 0x00007f60f844c1a0 Return addr 0x000000000084bd18 () > y(0) <0.116.0> > > Here the log dumping process appears to be waiting on > gen_server:call(mnesia_monitor, {close_log,decision_tab}). > > process_info(<0.62.0>) => > [{registered_name,mnesia_monitor}, > {current_function,{disk_log,monitor_request,2}}, > {initial_call,{proc_lib,init_p,5}}, > {status,waiting}, > {message_queue_len,34}, > {messages,[{nodeup,'gc@REDACTED'}, > {nodedown,'gc@REDACTED'}, > {nodeup,'fakenode1-16-26-06@REDACTED'}, > {nodedown,'fakenode1-16-26-06@REDACTED'}, > {nodeup,'fakenode1-16-27-20@REDACTED'}, > {nodedown,'fakenode1-16-27-20@REDACTED'}, > {nodeup,'fakenode1-16-29-25@REDACTED'}, > {nodedown,'fakenode1-16-29-25@REDACTED'}, > {nodeup,'gc@REDACTED'}, > {nodedown,'gc@REDACTED'}, > {nodeup,'fakenode2-16-36-53@REDACTED'}, > {nodeup,'gc@REDACTED'}, > {nodedown,'gc@REDACTED'}, > {nodeup,'gc@REDACTED'}, > {nodedown,'gc@REDACTED'}, > {nodeup,'gc@REDACTED'}, > {nodedown,'gc@REDACTED'}, > {nodeup,'gc@REDACTED'}, > {nodedown,'gc@REDACTED'}, > {nodeup,...}, > {...}|...]}, > {links,[<6749.62.0>,<6753.62.0>,<0.111.0>,<0.22677.260>, > <6752.104.0>,<6747.62.0>,<6748.62.0>,<0.61.0>,<6751.62.0>, > <6750.62.0>,<0.52.0>]}, > {dictionary,[{'$ancestors',[mnesia_kernel_sup,mnesia_sup, > <0.58.0>]}, > {'$initial_call',{gen,init_it, > [gen_server,<0.61.0>,<0.61.0>, > {local,mnesia_monitor}, > mnesia_monitor, > [<0.61.0>], > [{timeout,infinity}]]}}]}, > {trap_exit,true}, > {error_handler,error_handler}, > {priority,normal}, > {group_leader,<0.57.0>}, > {total_heap_size,377}, > {heap_size,377}, > {stack_size,20}, > {reductions,2326000}, > {garbage_collection,[{fullsweep_after,0},{minor_gcs,0}]}, > {suspending,[]}] > > We didn't take a backtrace of mnesia_monitor, but > {current_function,{disk_log,monitor_request,2}} led us to think that > mnesia_monitor was trying to close the decision_tab log file, so we > tried to find out which process that was. At this point, > disk_log:info(decision_tab) hung, so we tried > disk_log_server:get_log_pids(decision_tab) which gave us > {local,<0.22681.260>}. 
> > Backtrace <0.22681.260>: > Program counter: 0x00007f61c0e0c2a8 (gen:wait_resp_mon/3 + 64) > CP: 0x00007f61c43645d8 (gen_server:call/3 + 160) > arity = 0 > > 0x00007f60f75818a0 Return addr 0x00007f61c43645d8 (gen_server:call/3 + 160) > y(0) infinity > y(1) #Ref<0.0.992.227035> > y(2) 'ejabberd@REDACTED' > > 0x00007f60f75818c0 Return addr 0x00007f61c03b1610 (disk_log:do_exit/4 + 440) > y(0) infinity > y(1) {close,<0.22681.260>} > y(2) disk_log_server > y(3) Catch 0x00007f61c43645d8 (gen_server:call/3 + 160) > > 0x00007f60f75818e8 Return addr 0x00007f61c0e23fe8 (proc_lib:init_p/5 + 400) > y(0) normal > y(1) [] > y(2) <0.62.0> > y(3) ok > > 0x00007f60f7581910 Return addr 0x000000000084bd18 () > y(0) Catch 0x00007f61c0e24008 (proc_lib:init_p/5 + 432) > y(1) disk_log > y(2) init > y(3) [<0.70.0>,<0.71.0>] > > The disk_log process for 'decision_tab' was waiting for a reply from the > disk_log_server to gen_server:call(disk_log_server, {close, self()}). > > Backtrace disk_log_server: > Program counter: 0x00007f61c4365af8 (gen_server:loop/6 + 288) > CP: 0x00007f61c0e23fe8 (proc_lib:init_p/5 + 400) > arity = 0 > > 0x00007f60fb043f78 Return addr 0x00007f61c0e23fe8 (proc_lib:init_p/5 + 400) > y(0) [] > y(1) infinity > y(2) disk_log_server > y(3) {state,[]} > y(4) disk_log_server > y(5) <0.30.0> > > 0x00007f60fb043fb0 Return addr 0x000000000084bd18 () > y(0) Catch 0x00007f61c0e24008 (proc_lib:init_p/5 + 432) > y(1) gen > y(2) init_it > y(3) > [gen_server,<0.30.0>,<0.30.0>,{local,disk_log_server},disk_log_server,[],[]] > > process_info(whereis(disk_log_server)) => > [{registered_name,disk_log_server}, > {current_function,{gen_server,loop,6}}, > {initial_call,{proc_lib,init_p,5}}, > {status,waiting}, > {message_queue_len,1}, > {messages,[{'$gen_call',{<0.22681.260>,#Ref<0.0.992.227035>}, > {close,<0.22681.260>}}]}, > {links,[<0.111.0>,<0.22677.260>,<0.22681.260>,<0.30.0>]}, > {dictionary,[{<0.111.0>,latest_log}, > {<0.22677.260>,previous_log}, > {'$ancestors',[kernel_safe_sup,kernel_sup,<0.8.0>]}, > {<0.22681.260>,decision_tab}, > {'$initial_call',{gen,init_it, > [gen_server,<0.30.0>,<0.30.0>, > {local,disk_log_server}, > disk_log_server,[],[]]}}]}, > {trap_exit,true}, > {error_handler,error_handler}, > {priority,normal}, > {group_leader,<0.7.0>}, > {total_heap_size,246}, > {heap_size,233}, > {stack_size,12}, > {reductions,2366165}, > {garbage_collection,[{fullsweep_after,0},{minor_gcs,0}]}, > {suspending,[]}] > > Which appears to be doing something impossible - blocked in the receive > statement in gen_server:loop/6 with a valid message in its queue. We > used process_info to check the reductions a couple of times, but they > stayed the same at 2366165 over a period of at least a minute. > > This line of investigation is all we have as the server has now been > restarted. > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://www.erlang.org/mailman/listinfo/erlang-bugs > From geoff.cant@REDACTED Thu Jan 8 18:44:59 2009 From: geoff.cant@REDACTED (Geoff Cant) Date: Thu, 08 Jan 2009 18:44:59 +0100 Subject: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server In-Reply-To: (Geoff Cant's message of "Thu, 08 Jan 2009 15:36:16 +0100") References: Message-ID: Hi all, we just took another look at the cluster and discovered another stuck gen_server. This time we sent it a bogus message of '$ignore_me' - the process then woke up, processed the first message in its queue (an ejabberd internal message) and exited (as expected from the ejabberd code). 
It appears that this bug causes processes to sometimes not get scheduled in when they receive a message. It seems to strike randomly and subsequent messages cause the process to be scheduled properly. Most of the time this doesn't cause major problems as the affected process will receive another message in the course of normal events and will now run normally. However, sometimes this strikes the wrong process in just the wrong way (the disk_log_server case) and we get visible error behaviour. This problem has been discovered on two different machines in the same ejabberd cluster, so I don't think this is a heisenbug due to bad RAM. We're going to try replicating this with a tsung test of the same emulator package (http://packages.debian.org/lenny/erlang-base-hipe 1:12.b.3-dfsg-4) and then see if the same problem exists with a source compile of R12B-5. Thanks, --Geoff Cant The debugging session transcript follows. Running [ Pid || Pid <- erlang:processes(), element(2, erlang:process_info(Pid, current_function)) =:= {gen_server,loop,6}, element(2, erlang:process_info(Pid, status)) =:= waiting, length(element(2, erlang:process_info(Pid, message_queue))) > 0]. gave us the process <0.19313.279>: (ejabberd@REDACTED)10> process_info(pid(0,19313,279)). [{current_function,{gen_server,loop,6}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,1}, {messages,[{timeout,#Ref<0.0.1009.52090>,activate}]}, {links,[#Port<0.15757334>,<0.235.0>,#Port<0.15757327>]}, {dictionary,[{'$ancestors',[ejabberd_receiver_sup, ejabberd_sup,<0.37.0>]}, {'$initial_call',{gen,init_it, [gen_server,<0.235.0>,<0.235.0>,ejabberd_receiver, [#Port<0.15757327>,gen_tcp,none,131072,<0.19312.279>], []]}}]}, {trap_exit,false}, {error_handler,error_handler}, {priority,normal}, {group_leader,<0.36.0>}, {total_heap_size,987}, {heap_size,987}, {stack_size,12}, {reductions,922822}, {garbage_collection,[{fullsweep_after,0},{minor_gcs,0}]}, {suspending,[]}] This process stayed at {reductions,922822} for over a minute. It was sitting on a backtrace of: Program counter: 0x00007f3374b86af8 (gen_server:loop/6 + 288) CP: 0x00007f3371644fe8 (proc_lib:init_p/5 + 400) arity = 0 0x00007f32d07fd988 Return addr 0x00007f3371644fe8 (proc_lib:init_p/5 + 400) y(0) [] y(1) infinity y(2) ejabberd_receiver y(3) {state,{tlssock,#Port<0.15757327>,#Port<0.15757329>},tls,{maxrate,32768,3.247837e+04,1231372801805878},<0.19312.279>,131072,{xml_stream_state,<0.19312.279>,#Port<0.15757334>,[{xmlelement,"stream:stream",[{"to","fake.domain"},{"xmlns","jabber:client"},{"xmlns:stream","http://etherx.jabber.org/streams"},{"xml:lang","de"},{"version","1.0"}],[]}],0,131072},infinity} y(4) <0.19313.279> y(5) <0.235.0> 0x00007f32d07fd9c0 Return addr 0x000000000084bd18 () y(0) Catch 0x00007f3371645008 (proc_lib:init_p/5 + 432) y(1) gen y(2) init_it y(3) [gen_server,<0.235.0>,<0.235.0>,ejabberd_receiver,[#Port<0.15757327>,gen_tcp,none,131072,<0.19312.279>],[]] As this process ignores messages it doesn't understand, we sent it a bogus message: pid(0,19313,279) ! '$ignore_me'. The process then logged: =ERROR REPORT==== 2009-01-08 17:33:37 === E(<0.19313.279>:ejabberd_receiver:264): ejabberd_reciever:activate_socket missed the tcp_closed event before exiting. This is the expected behaviour on receiving a message like {timeout,#Ref<0.0.1009.52090>,activate} - the one in the queue while it was stuck before we sent the '$ignore_me' message. So, it appears that this bug causes processes to sometimes not get scheduled in when they receive a message. 
It seems to strike randomly and subsequent messages cause the process to be scheduled properly. From masse@REDACTED Thu Jan 8 21:47:58 2009 From: masse@REDACTED (mats cronqvist) Date: Thu, 08 Jan 2009 21:47:58 +0100 Subject: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server In-Reply-To: (Geoff Cant's message of "Thu\, 08 Jan 2009 18\:44\:59 +0100") References: Message-ID: <87wsd5ikxt.fsf@dixie.cronqvi.st> Geoff Cant writes: > Hi all, we just took another look at the cluster and discovered another > stuck gen_server. This time we sent it a bogus message of '$ignore_me' - > the process then woke up, processed the first message in its queue (an > ejabberd internal message) and exited (as expected from the ejabberd > code). > > It appears that this bug causes processes to sometimes not get > scheduled in when they receive a message. It seems to strike randomly > and subsequent messages cause the process to be scheduled properly. Ouch. You can't trust anyone these days. At least the message wasn't dropped or out of order. mats From rickard.s.green@REDACTED Mon Jan 12 13:46:22 2009 From: rickard.s.green@REDACTED (Rickard Green) Date: Mon, 12 Jan 2009 13:46:22 +0100 Subject: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server In-Reply-To: References: Message-ID: <496B3B9E.3070407@ericsson.com> Hi Geoff, I've looked at this and found a bug that may have caused this. When a process garbage collect another process and the process being garbage collected also receives a message during the garbage collect, the process being garbage collected can end up in the state that you described. This kind of garbage collect only happen when someone calls the garbage_collect/1 BIF or when code is purged. In the case with the disk_log server being stuck I think we can rule out the purge, i.e., if it is this bug that caused your problem another process must have garbage collected the disk_log server via the garbage_collect/1 BIF. Do you have any code that may have garbage collected the disk_log server via the garbage_collect/1 BIF? The garbage collect may also have been done explicitly in the shell. Regards, Rickard Green, Erlang/OTP, Ericsson AB. Geoff Cant wrote: > Hi all, we just took another look at the cluster and discovered another > stuck gen_server. This time we sent it a bogus message of '$ignore_me' - > the process then woke up, processed the first message in its queue (an > ejabberd internal message) and exited (as expected from the ejabberd > code). > > It appears that this bug causes processes to sometimes not get > scheduled in when they receive a message. It seems to strike randomly > and subsequent messages cause the process to be scheduled properly. > > Most of the time this doesn't cause major problems as the affected > process will receive another message in the course of normal events and > will now run normally. However, sometimes this strikes the wrong process > in just the wrong way (the disk_log_server case) and we get visible > error behaviour. > > This problem has been discovered on two different machines in the same > ejabberd cluster, so I don't think this is a heisenbug due to bad RAM. > > We're going to try replicating this with a tsung test of the same > emulator package (http://packages.debian.org/lenny/erlang-base-hipe > 1:12.b.3-dfsg-4) and then see if the same problem exists with a source > compile of R12B-5. > > Thanks, > --Geoff Cant > > The debugging session transcript follows. 
> > Running > [ Pid > || Pid <- erlang:processes(), > element(2, erlang:process_info(Pid, current_function)) =:= {gen_server,loop,6}, > element(2, erlang:process_info(Pid, status)) =:= waiting, > length(element(2, erlang:process_info(Pid, message_queue))) > 0]. > > gave us the process <0.19313.279>: > > (ejabberd@REDACTED)10> process_info(pid(0,19313,279)). > [{current_function,{gen_server,loop,6}}, > {initial_call,{proc_lib,init_p,5}}, > {status,waiting}, > {message_queue_len,1}, > {messages,[{timeout,#Ref<0.0.1009.52090>,activate}]}, > {links,[#Port<0.15757334>,<0.235.0>,#Port<0.15757327>]}, > {dictionary,[{'$ancestors',[ejabberd_receiver_sup, > ejabberd_sup,<0.37.0>]}, > {'$initial_call',{gen,init_it, > [gen_server,<0.235.0>,<0.235.0>,ejabberd_receiver, > [#Port<0.15757327>,gen_tcp,none,131072,<0.19312.279>], > []]}}]}, > {trap_exit,false}, > {error_handler,error_handler}, > {priority,normal}, > {group_leader,<0.36.0>}, > {total_heap_size,987}, > {heap_size,987}, > {stack_size,12}, > {reductions,922822}, > {garbage_collection,[{fullsweep_after,0},{minor_gcs,0}]}, > {suspending,[]}] > > This process stayed at {reductions,922822} for over a minute. > > It was sitting on a backtrace of: > Program counter: 0x00007f3374b86af8 (gen_server:loop/6 + 288) > CP: 0x00007f3371644fe8 (proc_lib:init_p/5 + 400) > arity = 0 > > 0x00007f32d07fd988 Return addr 0x00007f3371644fe8 (proc_lib:init_p/5 + 400) > y(0) [] > y(1) infinity > y(2) ejabberd_receiver > y(3) {state,{tlssock,#Port<0.15757327>,#Port<0.15757329>},tls,{maxrate,32768,3.247837e+04,1231372801805878},<0.19312.279>,131072,{xml_stream_state,<0.19312.279>,#Port<0.15757334>,[{xmlelement,"stream:stream",[{"to","fake.domain"},{"xmlns","jabber:client"},{"xmlns:stream","http://etherx.jabber.org/streams"},{"xml:lang","de"},{"version","1.0"}],[]}],0,131072},infinity} > y(4) <0.19313.279> > y(5) <0.235.0> > > 0x00007f32d07fd9c0 Return addr 0x000000000084bd18 () > y(0) Catch 0x00007f3371645008 (proc_lib:init_p/5 + 432) > y(1) gen > y(2) init_it > y(3) [gen_server,<0.235.0>,<0.235.0>,ejabberd_receiver,[#Port<0.15757327>,gen_tcp,none,131072,<0.19312.279>],[]] > > As this process ignores messages it doesn't understand, we sent it a > bogus message: > > pid(0,19313,279) ! '$ignore_me'. > > The process then logged: > =ERROR REPORT==== 2009-01-08 17:33:37 === > E(<0.19313.279>:ejabberd_receiver:264): ejabberd_reciever:activate_socket missed the tcp_closed event > > before exiting. This is the expected behaviour on receiving a message > like {timeout,#Ref<0.0.1009.52090>,activate} - the one in the queue > while it was stuck before we sent the '$ignore_me' message. > > So, it appears that this bug causes processes to sometimes not get > scheduled in when they receive a message. It seems to strike randomly > and subsequent messages cause the process to be scheduled properly. > > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://www.erlang.org/mailman/listinfo/erlang-bugs > From geoff.cant@REDACTED Tue Jan 13 15:07:30 2009 From: geoff.cant@REDACTED (Geoff Cant) Date: Tue, 13 Jan 2009 15:07:30 +0100 Subject: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server In-Reply-To: <496B3B9E.3070407@ericsson.com> (Rickard Green's message of "Mon, 12 Jan 2009 13:46:22 +0100") References: <496B3B9E.3070407@ericsson.com> Message-ID: Hi Rickard, thank you very much - this sounds correct to me. 
The customer cluster is still running a cron job that effectively does lists:foreach(fun erlang:garbage_collect/1, erlang:processes()) every ten minutes. This script was introduced as a stop-gap measure when running a heavily loaded ejabberd cluster on the 32bit VM where an out of memory condition would take down the node and then the entire cluster due to some problems with cross-node monitor storms. The cluster now runs on 64bit VMs so we'll revisit the memory consumption problem and avoid using erlang:garbage_collect/1. We'll disable the script and see if the problem recurs. Once again, thank you very much - I'm always very impressed by the level of support the OTP team gives the erlang community. Cheers, --Geoff Rickard Green writes: > Hi Geoff, > > I've looked at this and found a bug that may have caused this. When a > process garbage collect another process and the process being garbage > collected also receives a message during the garbage collect, the > process being garbage collected can end up in the state that you > described. > > This kind of garbage collect only happen when someone calls the > garbage_collect/1 BIF or when code is purged. In the case with the > disk_log server being stuck I think we can rule out the purge, i.e., > if it is this bug that caused your problem another process must have > garbage collected the disk_log server via the garbage_collect/1 > BIF. Do you have any code that may have garbage collected the disk_log > server via the garbage_collect/1 BIF? The garbage collect may also > have been done explicitly in the shell. > > Regards, > Rickard Green, Erlang/OTP, Ericsson AB. From rickard.s.green@REDACTED Tue Jan 13 15:25:41 2009 From: rickard.s.green@REDACTED (Rickard Green) Date: Tue, 13 Jan 2009 15:25:41 +0100 Subject: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server In-Reply-To: References: <496B3B9E.3070407@ericsson.com> Message-ID: <496CA465.8080302@ericsson.com> I'll prepare a source patch fixing the problem. I wont be able to post it until tomorrow, though. Regards, Rickard Green, Erlang/OTP, Ericsson AB. Geoff Cant wrote: > Hi Rickard, thank you very much - this sounds correct to me. The > customer cluster is still running a cron job that effectively does > lists:foreach(fun erlang:garbage_collect/1, erlang:processes()) every > ten minutes. > > This script was introduced as a stop-gap measure when running a heavily > loaded ejabberd cluster on the 32bit VM where an out of memory condition > would take down the node and then the entire cluster due to some > problems with cross-node monitor storms. The cluster now runs on 64bit > VMs so we'll revisit the memory consumption problem and avoid using > erlang:garbage_collect/1. > > We'll disable the script and see if the problem recurs. > > Once again, thank you very much - I'm always very impressed by the level > of support the OTP team gives the erlang community. > > Cheers, > --Geoff > > > Rickard Green writes: > >> Hi Geoff, >> >> I've looked at this and found a bug that may have caused this. When a >> process garbage collect another process and the process being garbage >> collected also receives a message during the garbage collect, the >> process being garbage collected can end up in the state that you >> described. >> >> This kind of garbage collect only happen when someone calls the >> garbage_collect/1 BIF or when code is purged. 
In the case with the >> disk_log server being stuck I think we can rule out the purge, i.e., >> if it is this bug that caused your problem another process must have >> garbage collected the disk_log server via the garbage_collect/1 >> BIF. Do you have any code that may have garbage collected the disk_log >> server via the garbage_collect/1 BIF? The garbage collect may also >> have been done explicitly in the shell. >> >> Regards, >> Rickard Green, Erlang/OTP, Ericsson AB. > > From rickard.s.green@REDACTED Wed Jan 14 10:59:19 2009 From: rickard.s.green@REDACTED (Rickard Green) Date: Wed, 14 Jan 2009 10:59:19 +0100 Subject: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server In-Reply-To: <496CA465.8080302@ericsson.com> References: <496B3B9E.3070407@ericsson.com> <496CA465.8080302@ericsson.com> Message-ID: <496DB777.5090400@ericsson.com> A source patch can now be downloaded: http://www.erlang.org/download/patches/otp_src_R12B-5_OTP-7738.patch http://www.erlang.org/download/patches/otp_src_R12B-5_OTP-7738.readme Regards, Rickard Green, Erlang/OTP, Ericsson AB. Rickard Green wrote: > I'll prepare a source patch fixing the problem. I wont be able to post > it until tomorrow, though. > > Regards, > Rickard Green, Erlang/OTP, Ericsson AB. > > > Geoff Cant wrote: >> Hi Rickard, thank you very much - this sounds correct to me. The >> customer cluster is still running a cron job that effectively does >> lists:foreach(fun erlang:garbage_collect/1, erlang:processes()) every >> ten minutes. >> >> This script was introduced as a stop-gap measure when running a heavily >> loaded ejabberd cluster on the 32bit VM where an out of memory condition >> would take down the node and then the entire cluster due to some >> problems with cross-node monitor storms. The cluster now runs on 64bit >> VMs so we'll revisit the memory consumption problem and avoid using >> erlang:garbage_collect/1. >> >> We'll disable the script and see if the problem recurs. >> >> Once again, thank you very much - I'm always very impressed by the level >> of support the OTP team gives the erlang community. >> >> Cheers, >> --Geoff >> >> >> Rickard Green writes: >> >>> Hi Geoff, >>> >>> I've looked at this and found a bug that may have caused this. When a >>> process garbage collect another process and the process being garbage >>> collected also receives a message during the garbage collect, the >>> process being garbage collected can end up in the state that you >>> described. >>> >>> This kind of garbage collect only happen when someone calls the >>> garbage_collect/1 BIF or when code is purged. In the case with the >>> disk_log server being stuck I think we can rule out the purge, i.e., >>> if it is this bug that caused your problem another process must have >>> garbage collected the disk_log server via the garbage_collect/1 >>> BIF. Do you have any code that may have garbage collected the disk_log >>> server via the garbage_collect/1 BIF? The garbage collect may also >>> have been done explicitly in the shell. >>> >>> Regards, >>> Rickard Green, Erlang/OTP, Ericsson AB. >> >> > From ad.sergey@REDACTED Wed Jan 14 22:38:39 2009 From: ad.sergey@REDACTED (Sergey S) Date: Wed, 14 Jan 2009 13:38:39 -0800 Subject: [erlang-bugs] Segmentation fault when running HIPE-compilled code Message-ID: Hello. While I was playing with +native option, I run into a bug in HIPE which leads to segmentation fault. To reproduce the bug just compile the code below using HIPE and run crash:start/0. 
Your will see the following: Erlang (BEAM) emulator version 5.6.5 [source] [smp:2] [async-threads:0] [hipe] [kernel-poll:false] Eshell V5.6.5 (abort with ^G) 1> crash:start(). # This message will be printed only once when compiled with +native Segmentation fault Here is the code (don't look for intention of this example, it has not got that): %--------------------------------------------------- -module(crash). -export([start/0]). start() -> spawn(fun() -> init() end). init() -> repeat(10, fun() -> void end), receive after infinity -> ok end. repeat(0, _) -> ok; repeat(N, Fun) -> io:format("# This message will be printed only once when compiled with +native~n"), Fun(), repeat(N - 1, Fun). % <------ It never will be called if you use HIPE %--------------------------------------------------- The same code compiled without +native flag works well to me. I'm using Erlang R12B5. When I saw that segfault, I tried to replace "receive" statement with "timer:sleep(999999)" call, and it helped! -- Sergey From mikpe@REDACTED Thu Jan 15 09:53:57 2009 From: mikpe@REDACTED (Mikael Pettersson) Date: Thu, 15 Jan 2009 09:53:57 +0100 Subject: [erlang-bugs] Segmentation fault when running HIPE-compilled code In-Reply-To: References: Message-ID: <18798.63909.14341.336262@harpo.it.uu.se> Sergey S writes: > Hello. > > While I was playing with +native option, I run into a bug in HIPE > which leads to segmentation fault. > > To reproduce the bug just compile the code below using HIPE and run > crash:start/0. Your will see the following: > > Erlang (BEAM) emulator version 5.6.5 [source] [smp:2] > [async-threads:0] [hipe] [kernel-poll:false] > > Eshell V5.6.5 (abort with ^G) > 1> crash:start(). > # This message will be printed only once when compiled with +native > Segmentation fault > > Here is the code (don't look for intention of this example, it has not > got that): > > %--------------------------------------------------- > -module(crash). > -export([start/0]). > > start() -> > spawn(fun() -> init() end). > > init() -> > repeat(10, fun() -> void end), > receive after infinity -> ok end. > > repeat(0, _) -> > ok; > repeat(N, Fun) -> > io:format("# This message will be printed only once when compiled > with +native~n"), > Fun(), > repeat(N - 1, Fun). % <------ It never will be called if you use HIPE > %--------------------------------------------------- > > The same code compiled without +native flag works well to me. I'm > using Erlang R12B5. Please give us some information about your system: 1. Which CPU type? Is it 32- or 64-bit? 2. Which C compiler and version? 3. Which OS / distribution / version? From ad.sergey@REDACTED Thu Jan 15 11:11:03 2009 From: ad.sergey@REDACTED (Sergey S) Date: Thu, 15 Jan 2009 02:11:03 -0800 Subject: [erlang-bugs] Segmentation fault when running HIPE-compilled code In-Reply-To: <18798.63909.14341.336262@harpo.it.uu.se> References: <18798.63909.14341.336262@harpo.it.uu.se> Message-ID: Hello. I reproduced this bug on two separate computers running the same software. > Please give us some information about your system: > 1. Which CPU type? Is it 32- or 64-bit? 32-bit (i686) > 2. Which C compiler and version? GCC 4.3.2 > 3. Which OS / distribution / version? Up-to-date Archinux i686. -- Sergey. 
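For reference, a sketch of one way to reproduce the report from the shell, assuming an emulator built with HiPE support; the module name crash is the one from the report and no output is shown:

1> c(crash, [native]).   %% compile crash.erl with the native-code (HiPE) compiler
2> crash:start().        %% spawns the process that triggers the crash

The same module compiled without the native option is reported to work correctly.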
From mikpe@REDACTED Thu Jan 15 11:35:05 2009 From: mikpe@REDACTED (Mikael Pettersson) Date: Thu, 15 Jan 2009 11:35:05 +0100 Subject: [erlang-bugs] Segmentation fault when running HIPE-compilled code In-Reply-To: References: <18798.63909.14341.336262@harpo.it.uu.se> Message-ID: <18799.4441.149848.485106@harpo.it.uu.se> Sergey S writes: > Hello. > > I reproduced this bug on two separate computers running the same software. > > > Please give us some information about your system: > > 1. Which CPU type? Is it 32- or 64-bit? > 32-bit (i686) > > > 2. Which C compiler and version? > GCC 4.3.2 > > > 3. Which OS / distribution / version? > Up-to-date Archinux i686. Ok. I'll take a look at this issue tomorrow. /Mikael From ville@REDACTED Wed Jan 14 16:01:16 2009 From: ville@REDACTED (Ville Koivula) Date: Wed, 14 Jan 2009 17:01:16 +0200 Subject: [erlang-bugs] Bug in filename:dirname? Message-ID: Hi, Why is filename:dirname working differently than UNIX equivalent? koivula@REDACTED:~ % dirname "/foo" / koivula@REDACTED:~ % dirname "/foo/" / vs. 1> filename:dirname("/foo"). "/" 2> filename:dirname("/foo/"). "/foo" Best regards, Ville Koivula From ulf@REDACTED Fri Jan 16 11:18:49 2009 From: ulf@REDACTED (Ulf Wiger) Date: Fri, 16 Jan 2009 11:18:49 +0100 Subject: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server In-Reply-To: <496B3B9E.3070407@ericsson.com> References: <496B3B9E.3070407@ericsson.com> Message-ID: <8209f740901160218v5197dc94nd1510d176a545151@mail.gmail.com> Hi Rickard, Which versions of OTP seem to have this bug? BR, Ulf W 2009/1/12 Rickard Green : > Hi Geoff, > > I've looked at this and found a bug that may have caused this. When a > process garbage collect another process and the process being garbage > collected also receives a message during the garbage collect, the > process being garbage collected can end up in the state that you described. > > This kind of garbage collect only happen when someone calls the > garbage_collect/1 BIF or when code is purged. In the case with the > disk_log server being stuck I think we can rule out the purge, i.e., if > it is this bug that caused your problem another process must have > garbage collected the disk_log server via the garbage_collect/1 BIF. Do > you have any code that may have garbage collected the disk_log server > via the garbage_collect/1 BIF? The garbage collect may also have been > done explicitly in the shell. > > Regards, > Rickard Green, Erlang/OTP, Ericsson AB. > > > Geoff Cant wrote: >> Hi all, we just took another look at the cluster and discovered another >> stuck gen_server. This time we sent it a bogus message of '$ignore_me' - >> the process then woke up, processed the first message in its queue (an >> ejabberd internal message) and exited (as expected from the ejabberd >> code). >> >> It appears that this bug causes processes to sometimes not get >> scheduled in when they receive a message. It seems to strike randomly >> and subsequent messages cause the process to be scheduled properly. >> >> Most of the time this doesn't cause major problems as the affected >> process will receive another message in the course of normal events and >> will now run normally. However, sometimes this strikes the wrong process >> in just the wrong way (the disk_log_server case) and we get visible >> error behaviour. >> >> This problem has been discovered on two different machines in the same >> ejabberd cluster, so I don't think this is a heisenbug due to bad RAM. 
>> >> We're going to try replicating this with a tsung test of the same >> emulator package (http://packages.debian.org/lenny/erlang-base-hipe >> 1:12.b.3-dfsg-4) and then see if the same problem exists with a source >> compile of R12B-5. >> >> Thanks, >> --Geoff Cant >> >> The debugging session transcript follows. >> >> Running >> [ Pid >> || Pid <- erlang:processes(), >> element(2, erlang:process_info(Pid, current_function)) =:= {gen_server,loop,6}, >> element(2, erlang:process_info(Pid, status)) =:= waiting, >> length(element(2, erlang:process_info(Pid, message_queue))) > 0]. >> >> gave us the process <0.19313.279>: >> >> (ejabberd@REDACTED)10> process_info(pid(0,19313,279)). >> [{current_function,{gen_server,loop,6}}, >> {initial_call,{proc_lib,init_p,5}}, >> {status,waiting}, >> {message_queue_len,1}, >> {messages,[{timeout,#Ref<0.0.1009.52090>,activate}]}, >> {links,[#Port<0.15757334>,<0.235.0>,#Port<0.15757327>]}, >> {dictionary,[{'$ancestors',[ejabberd_receiver_sup, >> ejabberd_sup,<0.37.0>]}, >> {'$initial_call',{gen,init_it, >> [gen_server,<0.235.0>,<0.235.0>,ejabberd_receiver, >> [#Port<0.15757327>,gen_tcp,none,131072,<0.19312.279>], >> []]}}]}, >> {trap_exit,false}, >> {error_handler,error_handler}, >> {priority,normal}, >> {group_leader,<0.36.0>}, >> {total_heap_size,987}, >> {heap_size,987}, >> {stack_size,12}, >> {reductions,922822}, >> {garbage_collection,[{fullsweep_after,0},{minor_gcs,0}]}, >> {suspending,[]}] >> >> This process stayed at {reductions,922822} for over a minute. >> >> It was sitting on a backtrace of: >> Program counter: 0x00007f3374b86af8 (gen_server:loop/6 + 288) >> CP: 0x00007f3371644fe8 (proc_lib:init_p/5 + 400) >> arity = 0 >> >> 0x00007f32d07fd988 Return addr 0x00007f3371644fe8 (proc_lib:init_p/5 + 400) >> y(0) [] >> y(1) infinity >> y(2) ejabberd_receiver >> y(3) {state,{tlssock,#Port<0.15757327>,#Port<0.15757329>},tls,{maxrate,32768,3.247837e+04,1231372801805878},<0.19312.279>,131072,{xml_stream_state,<0.19312.279>,#Port<0.15757334>,[{xmlelement,"stream:stream",[{"to","fake.domain"},{"xmlns","jabber:client"},{"xmlns:stream","http://etherx.jabber.org/streams"},{"xml:lang","de"},{"version","1.0"}],[]}],0,131072},infinity} >> y(4) <0.19313.279> >> y(5) <0.235.0> >> >> 0x00007f32d07fd9c0 Return addr 0x000000000084bd18 () >> y(0) Catch 0x00007f3371645008 (proc_lib:init_p/5 + 432) >> y(1) gen >> y(2) init_it >> y(3) [gen_server,<0.235.0>,<0.235.0>,ejabberd_receiver,[#Port<0.15757327>,gen_tcp,none,131072,<0.19312.279>],[]] >> >> As this process ignores messages it doesn't understand, we sent it a >> bogus message: >> >> pid(0,19313,279) ! '$ignore_me'. >> >> The process then logged: >> =ERROR REPORT==== 2009-01-08 17:33:37 === >> E(<0.19313.279>:ejabberd_receiver:264): ejabberd_reciever:activate_socket missed the tcp_closed event >> >> before exiting. This is the expected behaviour on receiving a message >> like {timeout,#Ref<0.0.1009.52090>,activate} - the one in the queue >> while it was stuck before we sent the '$ignore_me' message. >> >> So, it appears that this bug causes processes to sometimes not get >> scheduled in when they receive a message. It seems to strike randomly >> and subsequent messages cause the process to be scheduled properly. 
>> >> _______________________________________________ >> erlang-bugs mailing list >> erlang-bugs@REDACTED >> http://www.erlang.org/mailman/listinfo/erlang-bugs >> > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://www.erlang.org/mailman/listinfo/erlang-bugs > From rickard.s.green@REDACTED Fri Jan 16 12:11:51 2009 From: rickard.s.green@REDACTED (Rickard Green S) Date: Fri, 16 Jan 2009 12:11:51 +0100 Subject: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server References: <496B3B9E.3070407@ericsson.com> <8209f740901160218v5197dc94nd1510d176a545151@mail.gmail.com> Message-ID: <1E5CB28D9F205A4CA167F2173F2C693401260CAA@esealmw115.eemea.ericsson.se> Unfortunately all versions of the smp emulator. R11 as well as R12. Regards, Rickard Rickard Green, Erlang/OTP, Ericsson AB. ________________________________ Fr?n: ulf.wiger@REDACTED genom Ulf Wiger Skickat: fr 2009-01-16 11:18 Till: Rickard Green S Kopia: Geoff Cant; erlang-bugs@REDACTED ?mne: Re: [erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server Hi Rickard, Which versions of OTP seem to have this bug? BR, Ulf W 2009/1/12 Rickard Green : > Hi Geoff, > > I've looked at this and found a bug that may have caused this. When a > process garbage collect another process and the process being garbage > collected also receives a message during the garbage collect, the > process being garbage collected can end up in the state that you described. > > This kind of garbage collect only happen when someone calls the > garbage_collect/1 BIF or when code is purged. In the case with the > disk_log server being stuck I think we can rule out the purge, i.e., if > it is this bug that caused your problem another process must have > garbage collected the disk_log server via the garbage_collect/1 BIF. Do > you have any code that may have garbage collected the disk_log server > via the garbage_collect/1 BIF? The garbage collect may also have been > done explicitly in the shell. > > Regards, > Rickard Green, Erlang/OTP, Ericsson AB. > > > Geoff Cant wrote: >> Hi all, we just took another look at the cluster and discovered another >> stuck gen_server. This time we sent it a bogus message of '$ignore_me' - >> the process then woke up, processed the first message in its queue (an >> ejabberd internal message) and exited (as expected from the ejabberd >> code). >> >> It appears that this bug causes processes to sometimes not get >> scheduled in when they receive a message. It seems to strike randomly >> and subsequent messages cause the process to be scheduled properly. >> >> Most of the time this doesn't cause major problems as the affected >> process will receive another message in the course of normal events and >> will now run normally. However, sometimes this strikes the wrong process >> in just the wrong way (the disk_log_server case) and we get visible >> error behaviour. >> >> This problem has been discovered on two different machines in the same >> ejabberd cluster, so I don't think this is a heisenbug due to bad RAM. >> >> We're going to try replicating this with a tsung test of the same >> emulator package (http://packages.debian.org/lenny/erlang-base-hipe >> 1:12.b.3-dfsg-4) and then see if the same problem exists with a source >> compile of R12B-5. >> >> Thanks, >> --Geoff Cant >> >> The debugging session transcript follows. 
>> [debugging session transcript snipped; it is identical to the transcript quoted in full in the previous message]
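For reference on the explanation quoted above, erlang:garbage_collect/1 is the BIF through which one process can force a garbage collection of another. A minimal sketch of such a call, assuming the disk_log server is registered under its standard name disk_log_server (the wrapper function name is illustrative only):

%% Sketch only: the kind of call Rickard refers to, in which some other
%% process forces a garbage collection of the disk_log server. Locating
%% whatever code performs a call like this is what the question above asks.
force_gc_of_disk_log_server() ->
    case whereis(disk_log_server) of
        undefined ->
            undefined;
        Pid when is_pid(Pid) ->
            erlang:garbage_collect(Pid)   % forces a collection of Pid
    end.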
>> >> _______________________________________________ >> erlang-bugs mailing list >> erlang-bugs@REDACTED >> http://www.erlang.org/mailman/listinfo/erlang-bugs >> > _______________________________________________ > erlang-bugs mailing list > erlang-bugs@REDACTED > http://www.erlang.org/mailman/listinfo/erlang-bugs > From mikpe@REDACTED Fri Jan 16 18:43:45 2009 From: mikpe@REDACTED (Mikael Pettersson) Date: Fri, 16 Jan 2009 18:43:45 +0100 Subject: [erlang-bugs] Segmentation fault when running HIPE-compilled code In-Reply-To: <18799.4441.149848.485106@harpo.it.uu.se> References: <18798.63909.14341.336262@harpo.it.uu.se> <18799.4441.149848.485106@harpo.it.uu.se> Message-ID: <18800.51025.11729.439954@harpo.it.uu.se> Mikael Pettersson writes: > Sergey S writes: > > Hello. > > > > I reproduced this bug on two separate computers running the same software. > > > > > Please give us some information about your system: > > > 1. Which CPU type? Is it 32- or 64-bit? > > 32-bit (i686) > > > > > 2. Which C compiler and version? > > GCC 4.3.2 > > > > > 3. Which OS / distribution / version? > > Up-to-date Archinux i686. > > Ok. I'll take a look at this issue tomorrow. I've been able to reproduce the bug, and it's memory corruption caused by an invalid optimisation performed by the compiler. I've notified the rest of the HiPE team about the issue and hopefully someone will know how to fix it (unfortunately it's in a part of the compiler I'm not familiar with). The combination of having a 'receive after infinity' after a heap allocation (the fun expression) is what's triggering the bug, so if you can put them in separate functions or move this one function to a non-natively compiled module you should be able to work around the bug for the time being. From ad.sergey@REDACTED Sat Jan 17 00:23:39 2009 From: ad.sergey@REDACTED (Sergey S) Date: Fri, 16 Jan 2009 15:23:39 -0800 Subject: [erlang-bugs] Segmentation fault when running HIPE-compilled code In-Reply-To: <18800.51025.11729.439954@harpo.it.uu.se> References: <18798.63909.14341.336262@harpo.it.uu.se> <18799.4441.149848.485106@harpo.it.uu.se> <18800.51025.11729.439954@harpo.it.uu.se> Message-ID: Hello. > Mikael Pettersson writes: > > I've notified the rest of the HiPE team about the issue and > hopefully someone will know how to fix it (unfortunately it's > in a part of the compiler I'm not familiar with). Thanks for that! I believe the people who are writing HiPE will fix that! -- Sergey From pguyot@REDACTED Mon Jan 19 23:11:41 2009 From: pguyot@REDACTED (Paul Guyot) Date: Mon, 19 Jan 2009 23:11:41 +0100 Subject: [erlang-bugs] Allow C nodes to be visible Message-ID: Hello, C nodes have limited functionalities. Some capabilities can be implemented using the current ei_connect interface, but these capabilities are not advertised when the node is connected to other nodes. In particular, C nodes are always hidden, and this prevents the inclusion of a global name service or of process groups, among other things. The attached patch is a minimal change to allow visible C nodes. The ei_connect and ei_accept are provided with a new variant that takes a flags parameter that determines which capabilities are to be exposed upon connection to a particular node. 
The new functions are: int ei_connect_tmo_flags(ei_cnode* ec, char *nodename, unsigned ms, unsigned flags); int ei_xconnect_tmo_flags(ei_cnode* ec, Erl_IpAddr adr, char *alivename, unsigned ms, unsigned flags); int ei_accept_tmo_flags(ei_cnode* ec, int lfd, ErlConnect *conp, unsigned ms, unsigned flags); All other functions have exactly the same behavior. In particular, the default flags are passed as they used to be whenever the original functions are used. To make a C node visible, the caller just needs to pass the default set of flags or-ed with DFLAG_PUBLISHED. I realize the proposed patch is minimalistic and the interface is very rough. There are two reasons. First, providing flags to ei_connect* and ei_accept* functions is experimental. It has many consequences, and a simple "visible" boolean would probably make the API user believe that making a C node visible is as simple as passing true, which of course it isn't. Second, a full implementation of the required protocols, including the global protocol, in C, would imply a large development we do not plan to undertake (our implementation is not in C) and a high maintenance cost. As a result, it seemed that a minimalistic, backward compatible and coherent change, is the easier to include upstream. Yet, we believe that such a patch would be useful for parallel efforts that consists in bridging erlang with other languages (e.g. .NET). Regards, Paul -------------- next part -------------- A non-text attachment was scrubbed... Name: patch-connect-flags.diff Type: application/octet-stream Size: 5671 bytes Desc: not available URL: -------------- next part -------------- From ad.sergey@REDACTED Tue Jan 20 00:41:43 2009 From: ad.sergey@REDACTED (Sergey S) Date: Mon, 19 Jan 2009 15:41:43 -0800 Subject: [erlang-bugs] A small omission in OTP Design Principles example 6.2.1 Message-ID: Hello. I don't think it will be the most important comment, but... I think that example illustrating the approach to create OTP compatible processes by using proc_lib and sys modules should contain system_code_change/4. "erl -man sys" says "The Module must export system_continue/3, system_terminate/4, and system_code_change/4 (see below)." -- Sergey From mikpe@REDACTED Wed Jan 21 21:33:39 2009 From: mikpe@REDACTED (Mikael Pettersson) Date: Wed, 21 Jan 2009 21:33:39 +0100 Subject: [erlang-bugs] Segmentation fault when running HIPE-compilled code In-Reply-To: References: Message-ID: <18807.34467.894973.40219@harpo.it.uu.se> Sergey S writes: > Hello. > > While I was playing with +native option, I run into a bug in HIPE > which leads to segmentation fault. > > To reproduce the bug just compile the code below using HIPE and run > crash:start/0. Your will see the following: > > Erlang (BEAM) emulator version 5.6.5 [source] [smp:2] > [async-threads:0] [hipe] [kernel-poll:false] > > Eshell V5.6.5 (abort with ^G) > 1> crash:start(). > # This message will be printed only once when compiled with +native > Segmentation fault > > Here is the code (don't look for intention of this example, it has not > got that): > > %--------------------------------------------------- > -module(crash). > -export([start/0]). > > start() -> > spawn(fun() -> init() end). > > init() -> > repeat(10, fun() -> void end), > receive after infinity -> ok end. > > repeat(0, _) -> > ok; > repeat(N, Fun) -> > io:format("# This message will be printed only once when compiled > with +native~n"), > Fun(), > repeat(N - 1, Fun). 
% <------ It never will be called if you use HIPE > %--------------------------------------------------- > > The same code compiled without +native flag works well to me. I'm > using Erlang R12B5. Thanks for reporting this bug. Here's the patch fixing this bug for R12B-5. There was an omission in the liveness information for the native compiler's RTL intermediate representation, which in this very specific case caused it to lose the fact that the heap pointer register was live into the recursive function calls, which in turn caused the 'fun() -> void end' function closure object to be corrupted. /Mikael Pettersson The HiPE group --- otp_src_R12B-5/lib/hipe/amd64/hipe_amd64_registers.erl.~1~ 2007-11-26 19:59:44.000000000 +0100 +++ otp_src_R12B-5/lib/hipe/amd64/hipe_amd64_registers.erl 2009-01-21 14:54:23.000000000 +0100 @@ -268,8 +268,7 @@ tailcall_clobbered() -> % tailcall crap | fp_call_clobbered()]. live_at_return() -> - [{?RAX,tagged} - ,{?RSP,untagged} + [{?RSP,untagged} ,{?PROC_POINTER,untagged} ,{?FCALLS,untagged} ,{?HEAP_LIMIT,untagged} --- otp_src_R12B-5/lib/hipe/rtl/hipe_rtl.erl.~1~ 2008-11-04 11:51:39.000000000 +0100 +++ otp_src_R12B-5/lib/hipe/rtl/hipe_rtl.erl 2009-01-21 14:54:36.000000000 +0100 @@ -882,15 +882,17 @@ args(I) -> #alub{} -> [alub_src1(I), alub_src2(I)]; #branch{} -> [branch_src1(I), branch_src2(I)]; #call{} -> + Args = call_arglist(I) ++ hipe_rtl_arch:call_used(), case call_is_known(I) of - false -> [call_fun(I)|call_arglist(I)]; - true -> call_arglist(I) + false -> [call_fun(I) | Args]; + true -> Args end; #comment{} -> []; #enter{} -> + Args = enter_arglist(I) ++ hipe_rtl_arch:tailcall_used(), case enter_is_known(I) of - false -> hipe_rtl_arch:add_ra_reg([enter_fun(I)|enter_arglist(I)]); - true -> hipe_rtl_arch:add_ra_reg(enter_arglist(I)) + false -> [enter_fun(I) | Args]; + true -> Args end; #fconv{} -> [fconv_src(I)]; #fixnumop{} -> [fixnumop_src(I)]; @@ -910,7 +912,7 @@ args(I) -> #move{} -> [move_src(I)]; #multimove{} -> multimove_srclist(I); #phi{} -> phi_args(I); - #return{} -> hipe_rtl_arch:add_ra_reg(return_varlist(I)); + #return{} -> return_varlist(I) ++ hipe_rtl_arch:return_used(); #store{} -> [store_base(I), store_offset(I), store_src(I)]; #switch{} -> [switch_src(I)] end. @@ -924,7 +926,7 @@ defines(Instr) -> #alu{} -> [alu_dst(Instr)]; #alub{} -> [alub_dst(Instr)]; #branch{} -> []; - #call{} -> call_dstlist(Instr); + #call{} -> call_dstlist(Instr) ++ hipe_rtl_arch:call_defined(); #comment{} -> []; #enter{} -> []; #fconv{} -> [fconv_dst(Instr)]; @@ -990,7 +992,7 @@ subst_uses(Subst, I) -> end; #comment{} -> I; - #enter{} -> %% XXX: Check why ra_reg is added in uses() but not updated here + #enter{} -> case enter_is_known(I) of false -> I0 = enter_fun_update(I, subst1(Subst, enter_fun(I))), --- otp_src_R12B-5/lib/hipe/rtl/hipe_rtl_arch.erl.~1~ 2008-06-10 14:47:41.000000000 +0200 +++ otp_src_R12B-5/lib/hipe/rtl/hipe_rtl_arch.erl 2009-01-21 14:56:26.000000000 +0100 @@ -22,9 +22,12 @@ heap_pointer/0, heap_limit/0, fcalls/0, - add_ra_reg/1, reg_name/1, is_precoloured/1, + call_defined/0, + call_used/0, + tailcall_used/0, + return_used/0, live_at_return/0, endianess/0, load_big_2/4, @@ -164,22 +167,6 @@ fcalls_from_pcb() -> Reg = hipe_rtl:mk_new_reg(), {pcb_load(Reg, ?P_FCALLS), Reg, pcb_store(?P_FCALLS, Reg)}. --spec(add_ra_reg/1 :: ([X]) -> [X]). 
- -add_ra_reg(Rest) -> - case get(hipe_target_arch) of - ultrasparc -> - [hipe_rtl:mk_reg(hipe_sparc_registers:return_address()) | Rest]; - powerpc -> - Rest; % do not include LR: it's not a normal register - arm -> - [hipe_rtl:mk_reg(hipe_arm_registers:lr()) | Rest]; - x86 -> - Rest; - amd64 -> - Rest - end. - reg_name(Reg) -> case get(hipe_target_arch) of ultrasparc -> @@ -225,6 +212,18 @@ is_precolored_regnum(RegNum) -> hipe_amd64_registers:is_precoloured(RegNum) end. +call_defined() -> + call_used(). + +call_used() -> + live_at_return(). + +tailcall_used() -> + call_used(). + +return_used() -> + tailcall_used(). + live_at_return() -> case get(hipe_target_arch) of ultrasparc -> --- otp_src_R12B-5/lib/hipe/x86/hipe_x86_registers.erl.~1~ 2007-11-26 19:58:49.000000000 +0100 +++ otp_src_R12B-5/lib/hipe/x86/hipe_x86_registers.erl 2009-01-21 14:53:33.000000000 +0100 @@ -224,14 +224,8 @@ all_x87_pseudos() -> {4,double}, {5,double}, {6,double}]. live_at_return() -> - [{?EAX,tagged} - %% XXX: should the following (fixed) regs be included or not? - ,{?ESP,untagged} + [{?ESP,untagged} ,{?PROC_POINTER,untagged} - %% Lets try not! - %% If these are included they will interfere with other - %% temps during regalloc, but regs FCALLS and HEAP_LIMIT - %% don't even exist at regalloc. ,{?FCALLS,untagged} ,{?HEAP_LIMIT,untagged} | ?LIST_HP_LIVE_AT_RETURN From ad.sergey@REDACTED Wed Jan 21 23:05:18 2009 From: ad.sergey@REDACTED (Sergey S) Date: Wed, 21 Jan 2009 14:05:18 -0800 Subject: [erlang-bugs] Segmentation fault when running HIPE-compilled code In-Reply-To: <18807.34467.894973.40219@harpo.it.uu.se> References: <18807.34467.894973.40219@harpo.it.uu.se> Message-ID: Hello. > Here's the patch fixing this bug for R12B-5. Thanks for the patch! -- Sergey From jason@REDACTED Sat Jan 24 18:59:53 2009 From: jason@REDACTED (Jason Davies) Date: Sat, 24 Jan 2009 17:59:53 +0000 Subject: [erlang-bugs] Bug in http:request(): No port set in automatically-added "Host:" header Message-ID: <88408FFC-FD95-4189-8922-02320F087E4D@jasondavies.com> There is a bug in inets http:request(): it automatically adds a "Host:" header to comply with HTTP/1.1 but it doesn't add the port number. This causes 301/302 redirects to fail on servers where the redirect URL is generated using the "Host:" request header. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.23 Thanks, -- Jason Davies www.jasondavies.com From lfredlund@REDACTED Mon Jan 26 13:43:17 2009 From: lfredlund@REDACTED (=?ISO-8859-1?Q?Lars-=C5ke_Fredlund?=) Date: Mon, 26 Jan 2009 13:43:17 +0100 Subject: [erlang-bugs] core_lint:module/1 problem Message-ID: <497DAFE5.9010003@fi.upm.es> Version: otp R12B-5 (not patched) Problem: Applying core_lint:module/1 to a core erlang module generated by compile:file(FileSpec,[to_core,binary] (without problems) results in the error message: *** Core Erlang ERROR in module schedule: illegal guard expression in reschedule/1 Source code and core erlang code for function attached. (apparently the checks for correct guards are too strict for try... guards). /Lars-Ake Fredlund -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: schedule.core URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: schedule.erl Type: text/x-erlang Size: 1557 bytes Desc: not available URL: From rvirding@REDACTED Mon Jan 26 13:58:04 2009 From: rvirding@REDACTED (Robert Virding) Date: Mon, 26 Jan 2009 13:58:04 +0100 Subject: [erlang-bugs] core_lint:module/1 problem In-Reply-To: <497DAFE5.9010003@fi.upm.es> References: <497DAFE5.9010003@fi.upm.es> Message-ID: <3dbc6d1c0901260458x16eb57bh85e9702c93812ea4@mail.gmail.com> 2009/1/26 Lars-?ke Fredlund > Version: otp R12B-5 (not patched) > Problem: > Applying core_lint:module/1 to a core erlang module generated by > compile:file(FileSpec,[to_core,binary] (without problems) > results in the error message: > *** Core Erlang ERROR in module schedule: illegal guard expression in > reschedule/1 > > Source code and core erlang code for function attached. > (apparently the checks for correct guards are too strict for try... > guards). > > /Lars-Ake Fredlund > Without having looked at the actual code I can say that some of the core support modules, core_lint and core_parse for example, don't always follow the latest core development. This is because they are not actually used by the compiler, it *knows* it's generated code is correct. Another problem I had with LFE is that some of the core optimisation passes assumed that the core module was generated in the same way as the from the erlang compiler, in some cases they couldn't handle general core. This has now been fixed. Basically, both of these are due to core erlang not really being a language in its own right but a pass in the compiler. Whether it should be like that is another question. Robert -------------- next part -------------- An HTML attachment was scrubbed... URL: From lfredlund@REDACTED Mon Jan 26 14:38:02 2009 From: lfredlund@REDACTED (=?ISO-8859-1?Q?Lars-=C5ke_Fredlund?=) Date: Mon, 26 Jan 2009 14:38:02 +0100 Subject: [erlang-bugs] core_lint:module/1 problem In-Reply-To: <3dbc6d1c0901260458x16eb57bh85e9702c93812ea4@mail.gmail.com> References: <497DAFE5.9010003@fi.upm.es> <3dbc6d1c0901260458x16eb57bh85e9702c93812ea4@mail.gmail.com> Message-ID: <497DBCBA.5080106@fi.upm.es> Robert Virding wrote: > 2009/1/26 Lars-?ke Fredlund > > > Version: otp R12B-5 (not patched) > Problem: > Applying core_lint:module/1 to a core erlang module generated by > compile:file(FileSpec,[to_core,binary] (without problems) > results in the error message: > *** Core Erlang ERROR in module schedule: illegal guard > expression in reschedule/1 > > Source code and core erlang code for function attached. > (apparently the checks for correct guards are too strict for > try... guards). > > /Lars-Ake Fredlund > > > Without having looked at the actual code I can say that some of the > core support modules, core_lint and core_parse for example, don't > always follow the latest core development. This is because they are > not actually used by the compiler, it *knows* it's generated code is > correct. > > Another problem I had with LFE is that some of the core optimisation > passes assumed that the core module was generated in the same way as > the from the erlang compiler, in some cases they couldn't handle > general core. This has now been fixed. > Yes, I don't check core_lint normally, but in the interest of improving things for the future I submitted the bug report. Robert, any experience on how the Core Erlang code should be written so that the optimisation passes optimise well? (or does things work ok now, without any need to adapt the code structure to the optimisers?) 
/Lars-Ake Fredlund From bgustavsson@REDACTED Tue Jan 27 14:40:07 2009 From: bgustavsson@REDACTED (Bjorn Gustavsson) Date: Tue, 27 Jan 2009 14:40:07 +0100 Subject: [erlang-bugs] core_lint:module/1 problem In-Reply-To: <497DAFE5.9010003@fi.upm.es> References: <497DAFE5.9010003@fi.upm.es> Message-ID: <6672d0160901270540r290b569by843ede5f14ccd53@mail.gmail.com> 2009/1/26 Lars-?ke Fredlund : > Version: otp R12B-5 (not patched) > Problem: > Applying core_lint:module/1 to a core erlang module generated by > compile:file(FileSpec,[to_core,binary] (without problems) > results in the error message: > *** Core Erlang ERROR in module schedule: illegal guard expression in > reschedule/1 > > Source code and core erlang code for function attached. > (apparently the checks for correct guards are too strict for try... guards). Your source code is not complete, so I can't run it through the compiler. However, for the R13 release I have extended the test suites to also test the core_lint pass and I have fixed all bugs exposed in core_lint as a result of that. /Bj?rn -- Bj?rn Gustavsson, Erlang/OTP, Ericsson AB From steven.charles.davis@REDACTED Tue Jan 27 15:04:32 2009 From: steven.charles.davis@REDACTED (Steve Davis) Date: Tue, 27 Jan 2009 08:04:32 -0600 Subject: [erlang-bugs] Crash on attempted use of float_to_binary/2 Message-ID: <497F1470.3000805@gmail.com> Hi, During my learning process I was trying to convert floats to binary, I inadvisedly tried the following: to_binary(X) when is_float(X) -> erlang:float_to_binary(X, 64). I do understand that this BIF has been removed, and it did throw an expected "erlang:float_to_binary/2 not defined" error BUT within 20 seconds of running that code my PC "bluescreened" (for the first time in over 3 years). I am not at all certain it's reproducible but I don't want to risk trying it again on my machine... but I do strongly suspect that the underlying c code for this bif remains, and this caused the observed result. System details: Erlang R12B-5/erts 5.6.5 Windows XP SP3 Acer Ferrari 3400 (a laptop) Regards, Steve From mikpe@REDACTED Tue Jan 27 17:29:54 2009 From: mikpe@REDACTED (Mikael Pettersson) Date: Tue, 27 Jan 2009 17:29:54 +0100 Subject: [erlang-bugs] Crash on attempted use of float_to_binary/2 In-Reply-To: <497F1470.3000805@gmail.com> References: <497F1470.3000805@gmail.com> Message-ID: <18815.13954.191620.129693@harpo.it.uu.se> Steve Davis writes: > Hi, > > During my learning process I was trying to convert floats to binary, I > inadvisedly tried the following: > > to_binary(X) when is_float(X) -> erlang:float_to_binary(X, 64). > > I do understand that this BIF has been removed, and it did throw an > expected "erlang:float_to_binary/2 not defined" error BUT within 20 > seconds of running that code my PC "bluescreened" (for the first time in > over 3 years). > > I am not at all certain it's reproducible but I don't want to risk > trying it again on my machine... but I do strongly suspect that the > underlying c code for this bif remains, and this caused the observed result. > > System details: > Erlang R12B-5/erts 5.6.5 > Windows XP SP3 > Acer Ferrari 3400 (a laptop) I am unable to reproduce anything but the benign "not defined" error with R12B-5 on Solaris 9, MacOSX 10.3, and Windows XP 64 Professional. There is no float_to_binary of any kind in R12B-5, the only reference to it is a documentation note that it has been removed. 
A Windows bluescreen can happen due to any number of reasons, mostly hardware or kernel/driver bugs, but a bug in the Erlang VM should not be able to trigger it (that would in itself be a kernel bug). IOW, I think this is pure coincidence. From steven.charles.davis@REDACTED Tue Jan 27 19:04:26 2009 From: steven.charles.davis@REDACTED (Steve Davis) Date: Tue, 27 Jan 2009 12:04:26 -0600 Subject: [erlang-bugs] Crash on attempted use of float_to_binary/2 In-Reply-To: <18815.13954.191620.129693@harpo.it.uu.se> References: <497F1470.3000805@gmail.com> <18815.13954.191620.129693@harpo.it.uu.se> Message-ID: <497F4CAA.7010503@gmail.com> Hi Mikael, It does rather sound like a coincidence, then - perhaps something else is going on with my machine. I'm sorry to have wasted your time unnecessarily. BR, /s Mikael Pettersson wrote: > Steve Davis writes: > > Hi, > > > > During my learning process I was trying to convert floats to binary, I > > inadvisedly tried the following: > > > > to_binary(X) when is_float(X) -> erlang:float_to_binary(X, 64). > > > > I do understand that this BIF has been removed, and it did throw an > > expected "erlang:float_to_binary/2 not defined" error BUT within 20 > > seconds of running that code my PC "bluescreened" (for the first time in > > over 3 years). > > > > I am not at all certain it's reproducible but I don't want to risk > > trying it again on my machine... but I do strongly suspect that the > > underlying c code for this bif remains, and this caused the observed result. > > > > System details: > > Erlang R12B-5/erts 5.6.5 > > Windows XP SP3 > > Acer Ferrari 3400 (a laptop) > > I am unable to reproduce anything but the benign "not defined" error > with R12B-5 on Solaris 9, MacOSX 10.3, and Windows XP 64 Professional. > > There is no float_to_binary of any kind in R12B-5, the only reference > to it is a documentation note that it has been removed. > > A Windows bluescreen can happen due to any number of reasons, mostly > hardware or kernel/driver bugs, but a bug in the Erlang VM should not > be able to trigger it (that would in itself be a kernel bug). > > IOW, I think this is pure coincidence. > From mikpe@REDACTED Tue Jan 27 20:40:52 2009 From: mikpe@REDACTED (Mikael Pettersson) Date: Tue, 27 Jan 2009 20:40:52 +0100 Subject: [erlang-bugs] Crash on attempted use of float_to_binary/2 In-Reply-To: <497F4CAA.7010503@gmail.com> References: <497F1470.3000805@gmail.com> <18815.13954.191620.129693@harpo.it.uu.se> <497F4CAA.7010503@gmail.com> Message-ID: <18815.25412.565359.953125@harpo.it.uu.se> Steve Davis writes: > It does rather sound like a coincidence, then - perhaps something else > is going on with my machine. I'm sorry to have wasted your time > unnecessarily. Don't worry about it. It's better to have a false bug report than to miss a report on an actual bug. /Mikael From adam@REDACTED Thu Jan 29 10:23:39 2009 From: adam@REDACTED (Adam Lindberg) Date: Thu, 29 Jan 2009 09:23:39 +0000 (GMT) Subject: [erlang-bugs] [link] Possible bug in io:fread Message-ID: <31544426.1501233221019877.JavaMail.root@zimbra> Not my finding, I'm only re-posting it here. See: http://stackoverflow.com/questions/473327/unexpected-behavior-of-iofread-in-erlang Cheers, Adam From jwecker@REDACTED Fri Jan 30 19:21:57 2009 From: jwecker@REDACTED (Joseph Wecker) Date: Fri, 30 Jan 2009 11:21:57 -0700 (MST) Subject: [erlang-bugs] inet_gethost (small) memory leak Message-ID: I was running valgrind on my erlang program to find some memory leaks in a port program. 
As expected, the erlang vm itself was very tight (compared to some python that ended up getting profiled, my C program, and even sed- throwing up memory leak errors all over the place). For what it's worth though, there was one very small memory leak that Erlang itself was generating. This may be a known problem already, or it may be too small to care- but it's the only one I saw: ==31273== 21 bytes in 1 blocks are definitely lost in loss record 1 of 2 ==31273== at 0x4025D2E: malloc (vg_replace_malloc.c:207) ==31273== by 0x4025EAF: realloc (vg_replace_malloc.c:429) ==31273== by 0x80497EE: (within /usr/lib/erlang/erts-5.6.3/bin/inet_gethost) ==31273== by 0x804AF06: (within /usr/lib/erlang/erts-5.6.3/bin/inet_gethost) ==31273== by 0x804BB34: (within /usr/lib/erlang/erts-5.6.3/bin/inet_gethost) ==31273== by 0x4084684: (below main) (in /lib/tls/i686/cmov/libc-2.8.90.so) ==31275== ==31275== 21 bytes in 1 blocks are definitely lost in loss record 1 of 2 ==31275== at 0x4025D2E: malloc (vg_replace_malloc.c:207) ==31275== by 0x4025EAF: realloc (vg_replace_malloc.c:429) ==31275== by 0x80497EE: (within /usr/lib/erlang/erts-5.6.3/bin/inet_gethost) ==31275== by 0x804AF06: (within /usr/lib/erlang/erts-5.6.3/bin/inet_gethost) ==31275== by 0x804BB34: (within /usr/lib/erlang/erts-5.6.3/bin/inet_gethost) ==31275== by 0x4084684: (below main) (in /lib/tls/i686/cmov/libc-2.8.90.so) etc. Wouldn't be surprised if it was coming from a system call- there may be nothing you can do about it, but I thought I'd bring it up as I couldn't find reference to this anywhere. -Joseph