Groups | Search | Server Info | Login | Register


Groups > comp.compilers > #3477

Re: binary search debugging of compilers

From Kaz Kylheku <864-117-4973@kylheku.com>
Newsgroups comp.compilers
Subject Re: binary search debugging of compilers
Date 2023-05-19 03:44 +0000
Organization A noiseless patient Spider
Message-ID <23-05-021@comp.compilers> (permalink)
References <CADSkJJVN7RGqhgVDZaz_K5be6uEDaMofMu5OF3RR4Y5-fDu00Q@mail.gmail.com> <23-05-013@comp.compilers> <23-05-018@comp.compilers>

Show all headers | View raw


On 2023-05-18, Spiros Bousbouras <spibou@gmail.com> wrote:
> On Wed, 17 May 2023 10:55:27 -0400
> Russ Cox <rsc@swtch.com> wrote:
>> When BitKeeper came along, we added this:
>>
>>        --longest        Restrict  the  deltas  to  those  on  the longest line
>>                         between the two range endpoints.  Unlike a range, the
>>                         lower bound is included in the output.
>>
>> because BitKeeper history is a lattice, not a straight line.  So your
>> test points became
>>
>>         bk changes -nd:REV: > REVS
>>
>> and binary search over those.  That gets you the longest "straight" line
>> in the graph.
>
> Any partial ordering can be extended to a total ordering. Why can't you
> just do that instead of using  'the longest "straight" line' ?

I think, not without introducing tie breaker rules that don't
necessarily make any sense.

But, maybe that doens't matter as something else. Say we have:

            D-------E
           /         \
  A---B---C           I----J
           \         /
            F---G---H

A is before B and so on. Are F-G-H before D-E or after?

Suppose the bug was introduced by E. We would like to know that.

The longest path between the A-J bisection points goes through F-G-H,
ignoring D-E. So it may wrongly look like I is the culprit, if E is
never tested. I has the bug, H doesn't.

Say we pick a precedence and flatten everything, in these
two ways:

  A---B---C---D---E---F---G---H---I----J

or

  A---B---C---F---G---H---D---E---I----J

problem is, under the first order, the bug is bouncing into
and out of existence:

  A---B---C---D---E---F---G---H---I----J
                  X               X    X


E has the bug, and so do I and J. But F-G-H do not.  So the bisect
algorithm has a problem; it is predicated on the idea that the bug
exists in the upper endpoint, is absent in the lower and has appeared
once in between.

Now, if we actually cherry pick all those commits into that order
(solving whatever conflicts are required), then we can do a meaningful
bisect. Then  the rebased F G H commits do have the bug:

  A---B---C---D---E---F'--G'--H'--I'---J
                  X   X   X   X   X    X

It's unthinkable to be bisecting some lineage whipped up on-the-fly
that doesn't actually exist in the repo.

The best thing is to just keep the actual history that way.

Or else, do that longest-path things when bisecting. Then
if a merge point is implicated as the bad commit, then go
into the branch. When I is implicated, we note that it
has two parents, E and H. Only the H parentage was tested.
C was good. Thus we bisect C to E.

A smarter bisect algorithm could do this.

To make this sort of topical for comp.compilers, we note that the
revision history is a graph which resembles the control flow
graph of a program (one with no backwards branches).

The graph can be divided into our favorite chunks: basic blocks.

Basic blocks are sequence of commits in which no commit has multiple
parents (other than the first one), or is the parent of multiple children
(other than the last one).

The bisect algorithm should divide the graph into basic blocks,
and recurse into then when necessary.

So in:


            D-------E
           /         \
  A---B---C           I----J
           \         /
            F---G---H


we have these basic blocks:

           /
  A---B---C
           \


  D---E
       \


           /
  F---G---H

  I---J


First, the bisect could process the longest-path basic blocks: those
headed by A, F and I.   The goal is to determine which basic block
introduces the bug.

When a buggy basic block is identified, then bisect actually
does commit-granularity bisection in that block.

If we discover that the bad commit is the head of a block,
and that block has multiple parents, we then recurse over
those parents somehow.

We identify paths (through the basic block graph) we have not taken
and similarly analyze them.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Back to comp.compilers | Previous | NextPrevious in thread | Next in thread | Find similar


Thread

Re: binary search debugging of compilers Russ Cox <rsc@swtch.com> - 2023-05-17 10:55 -0400
  Re: binary search debugging of compilers Cameron McInally <cameron.mcinally@nyu.edu> - 2023-05-17 13:40 -0400
  Re: binary search debugging of compilers Kaz Kylheku <864-117-4973@kylheku.com> - 2023-05-17 18:47 +0000
  Re: binary search debugging of compilers Spiros Bousbouras <spibou@gmail.com> - 2023-05-18 10:50 +0000
    Re: binary search debugging of compilers Kaz Kylheku <864-117-4973@kylheku.com> - 2023-05-19 03:44 +0000
    binary search debugging of compilers Cliff Click <cclick0@gmail.com> - 2023-05-19 10:47 -0700
  Re: binary search debugging of compilers mrs@kithrup.com (Mike Stump) - 2023-05-20 18:04 +0000

csiph-web