Groups | Search | Server Info | Login | Register

Re: binary search debugging of compilers

From	Russ Cox <rsc@swtch.com>
Newsgroups	comp.compilers
Subject	Re: binary search debugging of compilers
Date	2023-05-18 15:28 -0400
Organization	Compilers Central
Message-ID	<23-05-019@comp.compilers> (permalink)
References	<CADSkJJVN7RGqhgVDZaz_K5be6uEDaMofMu5OF3RR4Y5-fDu00Q@mail.gmail.com>

Show all headers | View raw

> Any partial ordering can be extended to a total ordering. Why can't you
> just do that instead of using  'the longest "straight" line' ?

A precondition for binary search is that you are searching a function
f over some domain [0, n) with the property that f(i) == true implies
f(j) == true for all j >= i. That is, once the condition "flips", it
stays flipped. Otherwise binary search is not a correct algorithm.

The assumption in tools like git bisect is that some commit introduces the
bug, and then the bug is detectable in all future commits as well,
establishing the precondition. The test f(i), testing at commit i, can be
viewed as asking whether the bad commit is in the range [0, i] (the full
history of that commit). With that definition, clearly f(i) implies f(j)
for all j >= i, because the commit history [0, j] is a superset of the
commit history [0, i].

If instead you squash unrelated commit lines together in a total order, the
precondition is no longer true: if i is from one line but j > i is from a
different unmerged line, then testing j does not include some history that
testing i does. The bug can disappear as you cross from one line to
another. Focusing on the longest straight line in the commit history is one
way to establish the necessary precondition.

Another possibility, which is what git bisect does, is to maintain a set of
commits not yet decided and then find some "good" midpoint along the
longest line of commits remaining. You may in general end up with multiple
disconnected lines in your set of commits not yet decided, and you have to
handle each one separately until you find the bug.

What git bisect does, then, is a more complex algorithm that uses
binary search over longest lines as a subroutine. Restricted to a
single line of commits the precondition holds and binary search works.
(I am simplifying a bit. The actual implementation is perhaps not this
principled, but this is the essence of what it is doing and why it
works.)

The binary search I described to start this thread is also not exactly
the same binary search algorithm, because the underlying input problem
is not a single-input function with domain [0, n). Instead the input
problem (in the simplest form) is a function taking a set of possible
changes [lo, hi) that reports whether the bug is in that set. We
recognize it as binary search because it's still the same code and
idea, and every binary search does maintain this progressively
narrower range, but normal binary searches don't tell f the range.

Best,
Russ

Back to comp.compilers | Previous | Next | Find similar

Thread

Re: binary search debugging of compilers Russ Cox <rsc@swtch.com> - 2023-05-18 15:28 -0400

csiph-web