Re: problem with core dumps and multi-threaded daemon

From Gorlash <dan.miller.x23@gmail.com>
Newsgroups comp.os.linux.development.system
Subject Re: problem with core dumps and multi-threaded daemon
Date 2011-07-18 14:33 -0700
Organization http://groups.google.com
Message-ID <2c50d24a-a4f0-440e-bd25-0349621f41b9@28g2000pry.googlegroups.com> (permalink)
References <64909bbc-5fb3-4ceb-9bba-2b75eec6b158@h7g2000prf.googlegroups.com> <slrnj2985k.gtj.grahn+nntp@frailea.sa.invalid>


On Jul 18, 2:08 pm, Jorgen Grahn <grahn+n...@snipabacken.se> wrote:
> On Mon, 2011-07-18, Gorlash wrote:
> > I'm using Ubuntu server 10.10 (2.6.35-22-generic-pae)
> > I have a daemon running on my system, and it processes tcp packets and
> > other data sources, using multiple pthreads.  I'm finding that every
> > once in a while it generates a segfault, and I'm trying to get some
> > clues as to what is going wrong.  All I get in /var/log/messages is:
> > Jul 18 11:15:32 u10hub kernel: [ 8688.898805] grunion[11319]: segfault
> > at b77ea000 ip 00688fb6 sp b77e7b18 error 6 in
> > libc-2.12.1.so[613000+157000]
> > I've added SIGSEGV to the list of signals that I monitor, but it does
> > not appear to get invoked (though it *does* get invoked if I send
> > 'sudo killall -11 grunion').
>
> What do you mean by "monitor"? Trying to catch the SIGSEGV is unlikely
> to help you.
I use a separate thread to monitor signals (this is a leftover
technique from more than a decade ago, when I first started writing
Linux daemons).  The signal_monitor thread calls sigaddset() for each
entry in my signal list, then calls sigwait() to wait for one of those
signals, and handles whichever one arrives.  I used this in the past
for SIGUSR1 and SIGUSR2; even now, it works for SIGSEGV if I generate
it with kill(), but when the signal comes from the kernel, my
sigwait() does not appear to return at all.

I wonder if that's because the signal is actually occurring in
libc... ??

All I'm trying to accomplish at the moment is to somehow generate a
stack trace, so I can find out where the segfault is occurring.  I've
established, I think, that it's happening in my tcp_monitor thread
(based on the pid in the segfault message), but I don't know any more
than that.

BTW, I tried commenting out my signal-handling thread entirely, in
order to try to get the coredump, but then I don't get any log-file
message at all, and I still don't get a coredump...
>
> > I tried getting it to generate a coredump by setting 'ulimit -c
> > unlimited', and 'ulimit -a' shows that being set as requested, but
> > when it segfaulted most recently, there was no core file that I could
> > find.
>
> When you set a limit using ulimit, that affects the shell and is
> inherited by its children.  It isn't magically applied to all
> processes in the system, or all newly created processes.
>
> Easiest is probably to change the daemon's init script to set the
> limit.
When you say "init script", I'm assuming you mean my startup code?  I
don't use any scripts to start the daemon.  Do I use ulimit() to do
this??  (Actually, I'll read 'man 5 core' as you recommended; it
probably answers these questions.)

Anyway, I've been running the daemon via sudo, and just now confirmed
that root can write a text file to '/'.
Also, I reloaded the daemon and verified that cwd for each of the
threads *is* '/'.
I'll run getrlimit() and see what it says.
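
A minimal sketch of that check, assuming the limit is adjusted from
the daemon's own startup code instead of a shell: report the current
RLIMIT_CORE with getrlimit(), then raise the soft limit to the hard
limit with setrlimit().  The helper name enable_core_dumps() is
hypothetical, and this only covers the size limit; core(5) lists other
conditions that can still suppress the dump.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical startup helper: print the core-size limit the daemon
 * actually inherited, then raise the soft limit as far as the hard
 * limit allows.  Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }
    fprintf(stderr, "RLIMIT_CORE before: soft=%llu hard=%llu\n",
            (unsigned long long)rl.rlim_cur,
            (unsigned long long)rl.rlim_max);

    /* An unprivileged process may raise its soft limit only up to the
     * hard limit, so use the hard limit as the target. */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }
    return 0;
}
```

Calling enable_core_dumps() early in main(), before the threads are
created, would show whether the 'ulimit -c unlimited' from the shell
ever reached the daemon.  A soft limit of 0 in the "before" line would
explain the missing core file.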

>
> Also note that there are other rules which may cause a core dump /not/
> to be generated. See the core(5) man page.  That man page also
> describes how to override some of these security features temporarily.
>
> /Jorgen
>
> --
>   // Jorgen Grahn <grahn@  Oo  o.   .     .
> \X/     snipabacken.se>   O  o   .
