Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: Kaz Kylheku <864-117-4973@kylheku.com> Newsgroups: comp.compilers Subject: Re: binary search debugging of compilers Date: Fri, 19 May 2023 03:44:04 -0000 (UTC) Organization: A noiseless patient Spider Sender: johnl@iecc.com Approved: comp.compilers@iecc.com Message-ID: <23-05-021@comp.compilers> References: <23-05-013@comp.compilers> <23-05-018@comp.compilers> Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="18481"; mail-complaints-to="abuse@iecc.com" Keywords: debug, tools Posted-Date: 19 May 2023 15:51:14 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:3477 On 2023-05-18, Spiros Bousbouras wrote: > On Wed, 17 May 2023 10:55:27 -0400 > Russ Cox wrote: >> When BitKeeper came along, we added this: >> >> --longest Restrict the deltas to those on the longest line >> between the two range endpoints. Unlike a range, the >> lower bound is included in the output. >> >> because BitKeeper history is a lattice, not a straight line. So your >> test points became >> >> bk changes -nd:REV: > REVS >> >> and binary search over those. That gets you the longest "straight" line >> in the graph. > > Any partial ordering can be extended to a total ordering. Why can't you > just do that instead of using 'the longest "straight" line' ? I think, not without introducing tie breaker rules that don't necessarily make any sense. But, maybe that doens't matter as something else. Say we have: D-------E / \ A---B---C I----J \ / F---G---H A is before B and so on. Are F-G-H before D-E or after? Suppose the bug was introduced by E. We would like to know that. The longest path between the A-J bisection points goes through F-G-H, ignoring D-E. So it may wrongly look like I is the culprit, if E is never tested. I has the bug, H doesn't. Say we pick a precedence and flatten everything, in these two ways: A---B---C---D---E---F---G---H---I----J or A---B---C---F---G---H---D---E---I----J problem is, under the first order, the bug is bouncing into and out of existence: A---B---C---D---E---F---G---H---I----J X X X E has the bug, and so do I and J. But F-G-H do not. So the bisect algorithm has a problem; it is predicated on the idea that the bug exists in the upper endpoint, is absent in the lower and has appeared once in between. Now, if we actually cherry pick all those commits into that order (solving whatever conflicts are required), then we can do a meaningful bisect. Then the rebased F G H commits do have the bug: A---B---C---D---E---F'--G'--H'--I'---J X X X X X X It's unthinkable to be bisecting some lineage whipped up on-the-fly that doesn't actually exist in the repo. The best thing is to just keep the actual history that way. Or else, do that longest-path things when bisecting. Then if a merge point is implicated as the bad commit, then go into the branch. When I is implicated, we note that it has two parents, E and H. Only the H parentage was tested. C was good. Thus we bisect C to E. A smarter bisect algorithm could do this. To make this sort of topical for comp.compilers, we note that the revision history is a graph which resembles the control flow graph of a program (one with no backwards branches). The graph can be divided into our favorite chunks: basic blocks. Basic blocks are sequence of commits in which no commit has multiple parents (other than the first one), or is the parent of multiple children (other than the last one). The bisect algorithm should divide the graph into basic blocks, and recurse into then when necessary. So in: D-------E / \ A---B---C I----J \ / F---G---H we have these basic blocks: / A---B---C \ D---E \ / F---G---H I---J First, the bisect could process the longest-path basic blocks: those headed by A, F and I. The goal is to determine which basic block introduces the bug. When a buggy basic block is identified, then bisect actually does commit-granularity bisection in that block. If we discover that the bad commit is the head of a block, and that block has multiple parents, we then recurse over those parents somehow. We identify paths (through the basic block graph) we have not taken and similarly analyze them. -- TXR Programming Language: http://nongnu.org/txr Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal Mastodon: @Kazinator@mstdn.ca