Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #17432 > unrolled thread

Debugging a difficult refcount issue.

Started bybuck <workitharder@gmail.com>
First post2011-12-17 20:17 -0800
Last post2011-12-18 18:01 -0800
Articles 8 — 3 participants

Back to article view | Back to comp.lang.python


Contents

  Debugging a difficult refcount issue. buck <workitharder@gmail.com> - 2011-12-17 20:17 -0800
    Re: Debugging a difficult refcount issue. Paul Rubin <no.email@nospam.invalid> - 2011-12-17 23:55 -0800
      Re: Debugging a difficult refcount issue. buck <workitharder@gmail.com> - 2011-12-18 08:06 -0800
    Re: Debugging a difficult refcount issue. Jack Diederich <jackdied@gmail.com> - 2011-12-18 12:52 -0500
      Re: Debugging a difficult refcount issue. buck <workitharder@gmail.com> - 2011-12-18 18:01 -0800
        Re: Debugging a difficult refcount issue. buck <workitharder@gmail.com> - 2011-12-19 00:09 -0800
        Re: Debugging a difficult refcount issue. buck <workitharder@gmail.com> - 2011-12-19 00:09 -0800
      Re: Debugging a difficult refcount issue. buck <workitharder@gmail.com> - 2011-12-18 18:01 -0800

#17432 — Debugging a difficult refcount issue.

Frombuck <workitharder@gmail.com>
Date2011-12-17 20:17 -0800
SubjectDebugging a difficult refcount issue.
Message-ID<18365650.792.1324181855929.JavaMail.geo-discussion-forums@prht13>
I'm getting a fatal python error "Fatal Python error: GC object already tracked"[1].

Using gdb, I've pinpointed the place where the error is detected. It is an empty dictionary which is marked as in-use. This is somewhat helpful since I can reliably find the memory address of the dict, but it does not help me pinpoint the issue. I was able to find the piece of code that allocates the problematic dict via a malloc/LD_PRELOAD interposer, but that code was pure python. I don't think it was the cause.

I believe that the dict was deallocated, cached, and re-allocated via PyDict_New to a C routine with bad refcount logic, then the above error manifests when the dict is again deallocated, cached, and re-allocated.

I tried to pinpoint this intermediate allocation with a similar PyDict_New/LD_PRELOAD interposer, but that isn't working for me[2].

How should I go about debugging this further? I've been completely stuck on this for two days now :(

[1] http://hg.python.org/cpython/file/99af4b44e7e4/Include/objimpl.h#l267
[2] http://stackoverflow.com/questions/8549671/cant-intercept-pydict-new-with-ld-preload

[toc] | [next] | [standalone]


#17440

FromPaul Rubin <no.email@nospam.invalid>
Date2011-12-17 23:55 -0800
Message-ID<7xpqfm48m6.fsf@ruckus.brouhaha.com>
In reply to#17432
buck <workitharder@gmail.com> writes:
> I tried to pinpoint this intermediate allocation with a similar
> PyDict_New/LD_PRELOAD interposer, but that isn't working for me[2].

Did you try a gdb watchpoint?

[toc] | [prev] | [next] | [standalone]


#17465

Frombuck <workitharder@gmail.com>
Date2011-12-18 08:06 -0800
Message-ID<812938.821.1324224416576.JavaMail.geo-discussion-forums@prfi34>
In reply to#17440
On Saturday, December 17, 2011 11:55:13 PM UTC-8, Paul Rubin wrote:
> buck <workit...@gmail.com> writes:
> > I tried to pinpoint this intermediate allocation with a similar
> > PyDict_New/LD_PRELOAD interposer, but that isn't working for me[2].
> 
> Did you try a gdb watchpoint?

I didn't try that, since that piece of code is run millions of times, and I don't know the dict-id I'm looking for until after the problem has occurred.

[toc] | [prev] | [next] | [standalone]


#17468

FromJack Diederich <jackdied@gmail.com>
Date2011-12-18 12:52 -0500
Message-ID<mailman.3799.1324230769.27778.python-list@python.org>
In reply to#17432
I don't have any great advice, that kind of issue is hard to pin down.
 That said, do try using a python compile with --with-debug enabled,
with that you can turn your unit tests on and off to pinpoint where
the refcounts are getting messed up.  It also causes python to use
plain malloc()s so valgrind becomes useful.  Worst case add assertions
and printf()s in the places you think are most janky.

-Jack

On Sat, Dec 17, 2011 at 11:17 PM, buck <workitharder@gmail.com> wrote:
> I'm getting a fatal python error "Fatal Python error: GC object already tracked"[1].
>
> Using gdb, I've pinpointed the place where the error is detected. It is an empty dictionary which is marked as in-use. This is somewhat helpful since I can reliably find the memory address of the dict, but it does not help me pinpoint the issue. I was able to find the piece of code that allocates the problematic dict via a malloc/LD_PRELOAD interposer, but that code was pure python. I don't think it was the cause.
>
> I believe that the dict was deallocated, cached, and re-allocated via PyDict_New to a C routine with bad refcount logic, then the above error manifests when the dict is again deallocated, cached, and re-allocated.
>
> I tried to pinpoint this intermediate allocation with a similar PyDict_New/LD_PRELOAD interposer, but that isn't working for me[2].
>
> How should I go about debugging this further? I've been completely stuck on this for two days now :(
>
> [1] http://hg.python.org/cpython/file/99af4b44e7e4/Include/objimpl.h#l267
> [2] http://stackoverflow.com/questions/8549671/cant-intercept-pydict-new-with-ld-preload
> --
> http://mail.python.org/mailman/listinfo/python-list

[toc] | [prev] | [next] | [standalone]


#17488

Frombuck <workitharder@gmail.com>
Date2011-12-18 18:01 -0800
Message-ID<mailman.3810.1324260120.27778.python-list@python.org>
In reply to#17468
Thanks Jack. I think printf is what it will come down to. I plan to put a little code into PyDict_New to print the id and the line at which it was allocated. Hopefully this will show me all the possible suspects and I can figure it out from there.

I hope figuring out the file and line-number from within that code isn't too hard.


On Sunday, December 18, 2011 9:52:46 AM UTC-8, Jack Diederich wrote:
> I don't have any great advice, that kind of issue is hard to pin down.
>  That said, do try using a python compile with --with-debug enabled,
> with that you can turn your unit tests on and off to pinpoint where
> the refcounts are getting messed up.  It also causes python to use
> plain malloc()s so valgrind becomes useful.  Worst case add assertions
> and printf()s in the places you think are most janky.
> 
> -Jack
> 
> On Sat, Dec 17, 2011 at 11:17 PM, buck <workit...@gmail.com> wrote:
> > I'm getting a fatal python error "Fatal Python error: GC object already tracked"[1].
> >
> > Using gdb, I've pinpointed the place where the error is detected. It is an empty dictionary which is marked as in-use. This is somewhat helpful since I can reliably find the memory address of the dict, but it does not help me pinpoint the issue. I was able to find the piece of code that allocates the problematic dict via a malloc/LD_PRELOAD interposer, but that code was pure python. I don't think it was the cause.
> >
> > I believe that the dict was deallocated, cached, and re-allocated via PyDict_New to a C routine with bad refcount logic, then the above error manifests when the dict is again deallocated, cached, and re-allocated.
> >
> > I tried to pinpoint this intermediate allocation with a similar PyDict_New/LD_PRELOAD interposer, but that isn't working for me[2].
> >
> > How should I go about debugging this further? I've been completely stuck on this for two days now :(
> >
> > [1] http://hg.python.org/cpython/file/99af4b44e7e4/Include/objimpl.h#l267
> > [2] http://stackoverflow.com/questions/8549671/cant-intercept-pydict-new-with-ld-preload
> > --
> > http://mail.python.org/mailman/listinfo/python-list

[toc] | [prev] | [next] | [standalone]


#17502

Frombuck <workitharder@gmail.com>
Date2011-12-19 00:09 -0800
Message-ID<13061655.1156.1324282159733.JavaMail.geo-discussion-forums@prkz16>
In reply to#17488
This is what I came up with:
https://gist.github.com/1496028

We'll see if it helps, tomorrow.


On Sunday, December 18, 2011 6:01:50 PM UTC-8, buck wrote:
> Thanks Jack. I think printf is what it will come down to. I plan to put a little code into PyDict_New to print the id and the line at which it was allocated. Hopefully this will show me all the possible suspects and I can figure it out from there.
> 
> I hope figuring out the file and line-number from within that code isn't too hard.
> 
> 
> On Sunday, December 18, 2011 9:52:46 AM UTC-8, Jack Diederich wrote:
> > I don't have any great advice, that kind of issue is hard to pin down.
> >  That said, do try using a python compile with --with-debug enabled,
> > with that you can turn your unit tests on and off to pinpoint where
> > the refcounts are getting messed up.  It also causes python to use
> > plain malloc()s so valgrind becomes useful.  Worst case add assertions
> > and printf()s in the places you think are most janky.
> > 
> > -Jack
> > 
> > On Sat, Dec 17, 2011 at 11:17 PM, buck <work...@gmail.com> wrote:
> > > I'm getting a fatal python error "Fatal Python error: GC object already tracked"[1].
> > >
> > > Using gdb, I've pinpointed the place where the error is detected. It is an empty dictionary which is marked as in-use. This is somewhat helpful since I can reliably find the memory address of the dict, but it does not help me pinpoint the issue. I was able to find the piece of code that allocates the problematic dict via a malloc/LD_PRELOAD interposer, but that code was pure python. I don't think it was the cause.
> > >
> > > I believe that the dict was deallocated, cached, and re-allocated via PyDict_New to a C routine with bad refcount logic, then the above error manifests when the dict is again deallocated, cached, and re-allocated.
> > >
> > > I tried to pinpoint this intermediate allocation with a similar PyDict_New/LD_PRELOAD interposer, but that isn't working for me[2].
> > >
> > > How should I go about debugging this further? I've been completely stuck on this for two days now :(
> > >
> > > [1] http://hg.python.org/cpython/file/99af4b44e7e4/Include/objimpl.h#l267
> > > [2] http://stackoverflow.com/questions/8549671/cant-intercept-pydict-new-with-ld-preload
> > > --
> > > http://mail.python.org/mailman/listinfo/python-list

[toc] | [prev] | [next] | [standalone]


#17503

Frombuck <workitharder@gmail.com>
Date2011-12-19 00:09 -0800
Message-ID<mailman.3817.1324282889.27778.python-list@python.org>
In reply to#17488
This is what I came up with:
https://gist.github.com/1496028

We'll see if it helps, tomorrow.


On Sunday, December 18, 2011 6:01:50 PM UTC-8, buck wrote:
> Thanks Jack. I think printf is what it will come down to. I plan to put a little code into PyDict_New to print the id and the line at which it was allocated. Hopefully this will show me all the possible suspects and I can figure it out from there.
> 
> I hope figuring out the file and line-number from within that code isn't too hard.
> 
> 
> On Sunday, December 18, 2011 9:52:46 AM UTC-8, Jack Diederich wrote:
> > I don't have any great advice, that kind of issue is hard to pin down.
> >  That said, do try using a python compile with --with-debug enabled,
> > with that you can turn your unit tests on and off to pinpoint where
> > the refcounts are getting messed up.  It also causes python to use
> > plain malloc()s so valgrind becomes useful.  Worst case add assertions
> > and printf()s in the places you think are most janky.
> > 
> > -Jack
> > 
> > On Sat, Dec 17, 2011 at 11:17 PM, buck <work...@gmail.com> wrote:
> > > I'm getting a fatal python error "Fatal Python error: GC object already tracked"[1].
> > >
> > > Using gdb, I've pinpointed the place where the error is detected. It is an empty dictionary which is marked as in-use. This is somewhat helpful since I can reliably find the memory address of the dict, but it does not help me pinpoint the issue. I was able to find the piece of code that allocates the problematic dict via a malloc/LD_PRELOAD interposer, but that code was pure python. I don't think it was the cause.
> > >
> > > I believe that the dict was deallocated, cached, and re-allocated via PyDict_New to a C routine with bad refcount logic, then the above error manifests when the dict is again deallocated, cached, and re-allocated.
> > >
> > > I tried to pinpoint this intermediate allocation with a similar PyDict_New/LD_PRELOAD interposer, but that isn't working for me[2].
> > >
> > > How should I go about debugging this further? I've been completely stuck on this for two days now :(
> > >
> > > [1] http://hg.python.org/cpython/file/99af4b44e7e4/Include/objimpl.h#l267
> > > [2] http://stackoverflow.com/questions/8549671/cant-intercept-pydict-new-with-ld-preload
> > > --
> > > http://mail.python.org/mailman/listinfo/python-list

[toc] | [prev] | [next] | [standalone]


#17489

Frombuck <workitharder@gmail.com>
Date2011-12-18 18:01 -0800
Message-ID<4953582.389.1324260110614.JavaMail.geo-discussion-forums@pruu23>
In reply to#17468
Thanks Jack. I think printf is what it will come down to. I plan to put a little code into PyDict_New to print the id and the line at which it was allocated. Hopefully this will show me all the possible suspects and I can figure it out from there.

I hope figuring out the file and line-number from within that code isn't too hard.


On Sunday, December 18, 2011 9:52:46 AM UTC-8, Jack Diederich wrote:
> I don't have any great advice, that kind of issue is hard to pin down.
>  That said, do try using a python compile with --with-debug enabled,
> with that you can turn your unit tests on and off to pinpoint where
> the refcounts are getting messed up.  It also causes python to use
> plain malloc()s so valgrind becomes useful.  Worst case add assertions
> and printf()s in the places you think are most janky.
> 
> -Jack
> 
> On Sat, Dec 17, 2011 at 11:17 PM, buck <workit...@gmail.com> wrote:
> > I'm getting a fatal python error "Fatal Python error: GC object already tracked"[1].
> >
> > Using gdb, I've pinpointed the place where the error is detected. It is an empty dictionary which is marked as in-use. This is somewhat helpful since I can reliably find the memory address of the dict, but it does not help me pinpoint the issue. I was able to find the piece of code that allocates the problematic dict via a malloc/LD_PRELOAD interposer, but that code was pure python. I don't think it was the cause.
> >
> > I believe that the dict was deallocated, cached, and re-allocated via PyDict_New to a C routine with bad refcount logic, then the above error manifests when the dict is again deallocated, cached, and re-allocated.
> >
> > I tried to pinpoint this intermediate allocation with a similar PyDict_New/LD_PRELOAD interposer, but that isn't working for me[2].
> >
> > How should I go about debugging this further? I've been completely stuck on this for two days now :(
> >
> > [1] http://hg.python.org/cpython/file/99af4b44e7e4/Include/objimpl.h#l267
> > [2] http://stackoverflow.com/questions/8549671/cant-intercept-pydict-new-with-ld-preload
> > --
> > http://mail.python.org/mailman/listinfo/python-list

[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.python


csiph-web