[erlang-questions] user process suspended

Samuel samuelrivas@REDACTED
Fri Sep 25 16:44:12 CEST 2015


Thanks a lot, that seems very helpful :)

For what is worth, after disabling the console backed we have been
running for a while without experiencing problems, but I want to see
it running flawlessly for a few more days before calling it a victory.

On 24 September 2015 at 07:37, Scott Lystig Fritchie
<fritchie@REDACTED> wrote:
> Samuel <samuelrivas@REDACTED> wrote:
>
> sr> We are starting to suspect that it is the lager console_backed that
> sr> hangs in a send to the io port. We have just disabled it and we'll
> sr> see if it helps.
>
> Hi, Samuel.  Basho has fought a similar problem with console I/O
> hanging.  IIRC we had the most problems with Solaris/Illumos/SmartOS.
> My poor memory is forgetting which BEAM runtimes were succeptible.  I
> also don't know if or how often the problem was reported to us on other
> OS platforms.
>
> There were a couple of diagnostics that we used:
>
> 1. Getting process_info(Pid, backtrace) for all procs.  Then we'd see
> that the ... hrm, which process ... hrrrrm, Riak version 1.2 used R15B01,
> I think ... the 'user' process?  This may or may not resemble the hang,
> see the end of this message.
>
> 2. Connecting to the Riak console via the "to_erl" scheme to talk to The
> VM's Actual Console.  Then the admin would press Return ... and the Riak
> node would spring to life, problem solved.  Until next time.
>
> IIRC disabling console output was the workaround.
>
> I've lost the ops history of when that problem was fixed VM/runtime-wise
> and/or Riak's workarounds.  But if this sounds like your problem, then
> it's worth digging for that history.
>
> Oh, here's a piece of luck.  I've rediscovered that I'd written an
> Expect script to automate the process for triggering the problem.  I'll
> attach that, also, in case it's helpful.
>
> -Scott
>
> --- snip --- snip --- snip --- snip ---
>
> =proc:<0.30.0>
> State: Waiting
> Name: user
> Spawned as: erlang:apply/2
> Current call: user:get_chars_more/7
> Spawned by: <0.29.0>
> Started: Tue Apr  9 16:16:46 2013
> Message queue length: 1
> Message queue: [{io_request,<0.186.0>,<0.179.0>,{put_chars,unicode,<<821 bytes>>}}]
> Number of heap fragments: 0
> Heap fragment data: 0
> Link list: []
> Dictionary: [{shell,<0.31.0>},{read_mode,list},{unicode,false}]
> Reductions: 357542
> Stack+heap: 610
> OldHeap: 0
> Heap unused: 152
> OldHeap unused: 0
> Stack dump:
> Program counter: 0x00007f0c51d92930 (user:get_chars_more/7 + 232)
> CP: 0x0000000000000000 (invalid)
> arity = 0
>
> 0x00007f0c51931710 Return addr 0x00007f0c51d8c498 (user:do_io_request/5 + 88)
> y(0)     unicode
> y(1)     {[],[]}
> y(2)     #Port<0.630>
> y(3)     {erl_scan,tokens,[1]}
> y(4)     get_until
> y(5)     io_lib
> y(6)     {erl_scan_continuation,[],no_col,[],2,{erl_scan,#Fun<erl_scan.3.84904554>,false,false,false},0,#Fun<erl_scan.25.84904554>}
>
> 0x00007f0c51931750 Return addr 0x00007f0c51d8c308 (user:server_loop/2 + 1408)
> y(0)     #Port<0.630>
> y(1)     <0.30.0>
> y(2)     <0.25333.1728>
>
> 0x00007f0c51931770 Return addr 0x00007f0c51d8b968 (user:catch_loop/3 + 112)
> y(0)     #Port<0.630>
>
> 0x00007f0c51931780 Return addr 0x0000000000836c78 (<terminate process normally>)
> y(0)     <0.31.0>
> y(1)     #Port<0.630>
> y(2)     Catch 0x00007f0c51d8b968 (user:catch_loop/3 + 112)
>
> --- snip --- snip --- snip --- snip ---
>
> #!/usr/bin/expect --
>
> ## Assumptions:
> ##
> ## 1. SSH has been used to log into "hostname" once before: we aren't
> ##    smart enough to deal with SSH host key verification stuff.
> ##
> ## 2. This script can successfully use SSH to log into hostname using
> ##    the "riak" user.
> ##
> ## 3. If SSH does not require a password for the "riak" user on
> ##    hostname, then please break that feature so that the SSH
> ##    password authentication method is required.
> ##
> ## 4. We assume that the "riak" account's shell prompt will include
> ##    a "$" in it.  If that is not a correct assumption, please fix
> ##    the login_prompt procedure below.
> ##
> ## 5. This script will stop Riak and kill any other beam.smp processes
> ##    that happen to be running on hostname.
> ##
> ## 6. We assume riak is SMP-capable and look at beam.smp processes only.
>
>
> ## Usage: this-script hostname password riak_path erl_path cookie riak_node_name
> ##
> ## For example, if the remote machine is using a Linux CentOS Riak package:
> ##
> ## expect this-script 172.16.72.130 foofoo /usr/sbin/riak /usr/lib64/riak/erts-5.9.1/bin/erl riak riak@REDACTED
>
> ## Reminder reminder reminder: never use single quotes
> ## Reminder reminder reminder: never use single quotes
> ## Reminder reminder reminder: never use single quotes
>
> set timeout 30
> set erl_prompt {@[^)]+\)\d+> }
>
> set hostname [lindex $argv 0]
> set password [lindex $argv 1]
> set riak_path [lindex $argv 2]
> set erl_path [lindex $argv 3]
> set cookie [lindex $argv 4]
> set riak_node_name [lindex $argv 5]
>
> proc login_prompt {} {
>     expect -re {\$}
>     # If a dollar sign does not appear in the prompt, comment out
>     # the line above, uncomment the line below, then edit the line
>     # below to include the correct prompt string.
>     # expect -re {some prompt regexp here}
> }
>
> proc erlang_prompt {} {
>     global erl_prompt
>
>     expect -re $erl_prompt
> }
>
> proc cmd {cmd} {
>     send "$cmd\r"
>     login_prompt
> }
>
> proc async_cmd {cmd} {
>     send "$cmd\r"
> }
>
> proc erlang_cmd {cmd} {
>     send "$cmd\r"
>     erlang_prompt
> }
>
> proc announce {msg} {
>     puts "\n\n*** [exec date]\n*** $msg"
> }
>
> proc switch_to {id} {
>     global spawn_id
>
>     announce "Switching to session $id"
>     set spawn_id $id
> }
>
> # send_user "Password for host $hostname: "
> # stty -echo
> # expect_user -re "(.*)\n"
> # stty echo
> # set password $expect_out(1,string)
> # puts ""
>
>     spawn ssh -l riak $hostname
>     set session1 $spawn_id
>     expect {password}
>     send "$password\r"
>     login_prompt
>
> announce "Killing Riak and all other beam.smp processes"
> switch_to $session1
> cmd {ps axww | grep beam.smp | grep -v grep | awk '{print $1}' | xargs kill -9 ; sleep 1}
> cmd {ps -ef | grep beam.smp | grep -v grep | awk '{print $2}' | xargs kill -9 ; sleep 1}
>
> announce "Starting Riak"
>     spawn ssh -l riak $hostname "$riak_path start"
>     set session2 $spawn_id
>     expect {password}
>     send "$password\r"
>     expect "ok"
> set after_start 10
> announce "Got ok from '$riak_path start', sleeping $after_start seconds..."
>     sleep $after_start
>
> announce "Attach to riak, do sanity test"
> switch_to $session1
> async_cmd "$riak_path attach"
> send "hello_world.\r"
> erlang_prompt
>
> announce "2nd login session to $hostname, attach to riak via remsh"
>
>     spawn ssh -l riak $hostname
>     set session2 $spawn_id
>     expect {password}
>     send "$password\r"
>     login_prompt
>
> switch_to $session2
> async_cmd "$erl_path -setcookie $cookie -name tmp@REDACTED -remsh $riak_node_name"
> erlang_prompt
> erlang_cmd "hello."
>
> announce "Log a message via lager, watch for it over on Riak console"
> switch_to $session2
> erlang_cmd "lager:log(warning, self(), \"WARNING 1\", \[\])."
>
> switch_to $session1
> expect {
>     "WARNING 1" { announce "Found our first warning, excellent!" }
>     timeout     { puts "BUMMER 1!" ; exit 4 }
> }
>
> announce "Now, to hang the logger to demonstrate the bug.\n    Any incomplete term is fine, we'll use an unfinished tuple...."
> switch_to $session1
> async_cmd "{tuple_item_number_1,"
>
> set bug_timeout 12
> announce "Now that console is hung, call lager:log().\n    Set timeout time to $bug_timeout seconds.\n    This function should execute in only a few milliseconds."
>
> set timeout $bug_timeout
>
> switch_to $session2
> async_cmd "lager:log(warning, self(), \"WARNING 2\", \[\])."
>
> switch_to $session1
> expect {
>     "WARNING 2" { announce "BUMMER, Found 2nd warning at wrong time"; exit 4}
>     timeout     { announce "TIMEOUT! Excellent, the bug has been demonstrated (1)" }
> }
>
> announce "When we finish the erlang term over on $session1 ('riak attach' session),\n    then the logger will become unblocked."
>
> switch_to $session1
> async_cmd "element_number_2}."
> expect {
>     -re $erl_prompt { announce "Excellent, we have 'riak attach' prompt" }
>     timeout     { announce "BUMMER, timeout when not expected (2)"; exit 4}
> }
>
> announce "Now go to session $session2 and expect the 2nd warning\n    _and_ the -remsh's prompt."
> switch_to $session2
> expect {
>     "WARNING 2" { announce "Excellent, I see the 'WARNING 2' log message" }
>     timeout     { announce "BUMMER, timeout when not expected @ 2nd w"; exit 4}
> }
> expect {
>     -re $erl_prompt { announce "Excellent, we have the 'riak -remsh' prompt" }
>     timeout     { announce "BUMMER, timeout when not expected (3)"; exit 4}
> }
>
> announce "Time to clean up both sessions"
>
> switch_to $session1
> async_cmd "\004"
> login_prompt
> async_cmd "exit\r"
> expect eof
>
> switch_to $session2
> async_cmd "\003" ; sleep 1
> async_cmd "\003" ; sleep 1
> login_prompt
> async_cmd "exit\r"
> expect eof
>
> announce "Excellent, this successful test is finished."
> exit 0



-- 
Samuel



More information about the erlang-questions mailing list