OTP R13B01 odbc

Paul Oliver puzza007@REDACTED
Thu Sep 3 17:36:20 CEST 2009


Found it!

For suture reference, it was snoopy.so (a logging package installed on
a dev server I'm using http://sourceforge.net/projects/snoopylogger/).
 It screws up the vforked process because it does stuff (logging
around calls to execve) before the execve is called.  Obviously for
vfork this is a problem - the manpage states "The vfork() function has
the same effect as fork(), except that the behaviour is undefined if
the process created by vfork() either modifies any data other than a
variable of type pid_t used to store the return value from vfork(), or
returns from the function in which vfork() was called, or calls any
other function before successfully calling _exit() or one of the exec
family of functions.".  Please see the strace output below.

BTW, is there any reason that vfork isn't disabled by default on
Linux?  Given that fork is implemented using copy-on-write is there
any benefit in using vfork?

11:51:29 read(13,
"\7\0\0\0\300V\0\0pts/262\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) =
384
11:51:29 read(13,
"\7\0\0\0\334a\0\0pts/264\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) =
384
11:51:29 read(13,
"\7\0\0\0\337f\0\0pts/265\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 384) =
384
11:51:29 fcntl(13, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0,
len=0}) = 0
11:51:29 alarm(0) = 1
11:51:29 rt_sigaction(SIGALRM, {SIG_DFL}, NULL, 8) = 0
11:51:29 close(13) = 0
11:51:29 getsid(0) = 26338
11:51:29 getuid() = 1084
11:51:29 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3661, ...}) = 0
11:51:29 getpid() = 21217
11:51:29 sendto(11, "<86>Sep 3 11:51:29 snoopy[21217"..., 97,
MSG_NOSIGNAL, NULL, 0) = -1 ENOTSOCK (Socket operation on non-socket)
11:51:29 close(11) = 0
11:51:29 socket(PF_FILE, SOCK_DGRAM, 0) = 11
11:51:29 fcntl(11, F_SETFD, FD_CLOEXEC) = 0
11:51:29 connect(11, {sa_family=AF_FILE, path="/dev/log"}, 110) = 0
11:51:29 sendto(11, "<86>Sep 3 11:51:29 snoopy[21217"..., 97,
MSG_NOSIGNAL, NULL, 0) = 97
11:51:29 execve("/space/jpurring/erlR13B01/lib/erlang/erts-5.7.2/bin/child_setup",
["/space/jpurring/erlR13B01/lib/er"..., "false", ".", "exec
/space/jpurring/erlR13B01/l"..., "3:1023", "10:1", "11:0", "-"], [/*
55 vars */]) = 0
11:51:29 brk(0) = 0x603000
11:51:29 mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3b4e541000
...
11:51:29 dup2(10, 1) = 1
11:51:29 dup2(11, 0) = -1 EBADF (Bad file descriptor)
11:51:29 exit_group(1) = ?


On Thu, Sep 3, 2009 at 11:24 AM, Paul Oliver <puzza007@REDACTED> wrote:
>
> More info in case it rings any bells...
>
> Using strace to monitor erl/odbcserver causes odbc:connect to *work*.  The odbcserver process starts up and connects to both the supervisor prot and the worker port.  Adding logging to odbcserver seems to show that when not being viewed with strace (or having pman run), the main function in odbcserver is never run!  Is there a reason it wouldn't even get this far?
>
> The logging we've added to odbcserver.c is:
>
> int main(void)
> {
>     FILE *fh;
>     byte *msg = NULL;
>     char *temp = NULL, *supervisor_port = NULL, *odbc_port = NULL;
>     size_t length;
> #ifdef WIN32
>     _setmode(_fileno( stdin),  _O_BINARY);
> #endif
>
>       fh = fopen("/tmp/odbc.log", "a");
>     fprintf(fh, "Start main...\n");
>     fflush(fh);
>     fclose(fh);
>
> [snip]
>
> #if defined WIN32 || defined USE_IPV4
> /* Currently only an old windows compiler is supported so we do not have ipv6
>   capabilities */
> static SOCKET connect_to_erlang(const char *port, int is_sup)
> {
>     SOCKET sock;
>     struct sockaddr_in sin;
>     FILE *fh;
>     int conn_ret;
>
>     if (is_sup)
>       fh = fopen("/tmp/sup.log", "a");
>     else
>       fh = fopen("/tmp/odbc.log", "a");
>
>     fprintf(fh, "connect_to_erlang call to \"socket\"\n");
>     fflush(fh);
>
> I'm running on Linux foobar 2.6.24-24-server #1 SMP Tue Aug 18 16:51:43 UTC 2009 x86_64 GNU/Linux
>
> Thanks!
> Paul.
>
> On Tue, Sep 1, 2009 at 3:38 PM, Paul Oliver<puzza007@REDACTED> wrote:
> > Hi all,
> >
> > I'm experiencing an odd issue with the odbc application under OTP R13B.
> >
> > If I run start a connection *after* starting pman, it works.  However,
> > if I don't run pman, I get a connection timeout!  See my erl shell
> > session below.  I've added logging to the odbcserver port and in the
> > latter case it doesn't seem to start at all.
> >
> > Does anyone know what the problem could be?
> >
> > Thanks,
> > Paul.
> >
> > Eshell V5.7.2  (abort with ^G)
> > 1>  odbc:start().
> > ok
> > 2> odbc:connect("DSN=inftest",[]).
> >
> > =ERROR REPORT==== 1-Sep-2009::14:20:24 ===
> > ** Generic server <0.40.0> terminating
> > ** Last message in was {<0.33.0>,
> >                        {connect,[1,1,2,1,1,"DSN=inftest"],on,on},
> >                        infinity}
> > ** When Server state == {state,#Port<0.547>,
> >                               {<0.33.0>,#Ref<0.0.0.37>},
> >                               <0.33.0>,undefined,on,undefined,undefined,on,
> >                               connecting,undefined,0,
> >                               [#Port<0.545>,#Port<0.546>],
> >                               undefined,undefined}
> > ** Reason for termination ==
> > ** timeout
> > {error,connection_closed}
> > 3>
> >
> > But:
> >
> > 1> odbc:start().
> > ok
> > 2> pman:start().
> > <0.42.0>
> > 3>  odbc:connect("DSN=inftest",[]).
> > {ok,<0.54.0>}
> >
>


More information about the erlang-questions mailing list