From: Russ Cox
Newsgroups: comp.compilers
Subject: binary search debugging of compilers
Date: Fri, 12 May 2023 13:59:22 -0400
Organization: Compilers Central
Message-ID: <23-05-003@comp.compilers>
Keywords: tools, debug

Hi all,

In the Go compiler, we have been using a technique for the past eight years that I assume has been used in other compilers before, but I'm unaware of any prior uses. I describe it below, along with refinements we've made over the years. I'd be very interested to hear about earlier uses of approaches like this, or other ideas for where it might apply.

In 2015, Keith Randall was working on a new SSA backend for the Go compiler. The old and new backends coexisted, and we could use either one for any given function in the compilation. Keith introduced an environment variable GOSSAHASH that specifies the last few binary digits of the hash of the names of functions that should use the new backend. So GOSSAHASH=0110 means: compile with the new backend only those functions whose names hash to a value whose last four bits are 0110.

When a test is failing with the new backend, you try GOSSAHASH=0 and GOSSAHASH=1, and then use binary search to progressively refine the pattern, narrowing the failure down until only a single function is being compiled with the new backend. This was invaluable for narrowing down failures in large real-world tests (tests for libraries or production code, not for the compiler) that we had not written and did not understand.

In 2016, David Chase was debugging a new optimizer rewrite rule that should have been correct but was causing mysterious test failures. He reused the same technique at a finer granularity: the bit pattern now matched against a hash of the current file name and line number, so that the binary search pinpoints the exact line of code that is miscompiled (meaning that it causes a test failure) when the suspect rewrite rule is used.
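To make the mechanism concrete, here is a minimal sketch in Go of the suffix-matching check. It is illustrative only: the real compiler's hash function and plumbing differ, and the FNV hash, the matchesSuffix and useNewBackend names, and the hard-coded function list are all inventions for the example.

    package main

    import (
        "fmt"
        "hash/fnv"
        "os"
    )

    // matchesSuffix reports whether the low bits of hash(name) spell out
    // the binary pattern; e.g. pattern "0110" matches any name whose
    // hash ends in those four bits.
    func matchesSuffix(name, pattern string) bool {
        if pattern == "" {
            return true // an empty/unset pattern matches everything
        }
        h := fnv.New64()
        h.Write([]byte(name))
        sum := h.Sum64()
        // Read the pattern right to left against the hash bits,
        // least-significant bit first.
        for i := len(pattern) - 1; i >= 0; i-- {
            if sum&1 != uint64(pattern[i]-'0') {
                return false
            }
            sum >>= 1
        }
        return true
    }

    func useNewBackend(fnName string) bool {
        return matchesSuffix(fnName, os.Getenv("GOSSAHASH"))
    }

    func main() {
        for _, fn := range []string{"main.main", "fmt.Println", "sort.Sort"} {
            fmt.Printf("%-12s new backend: %v\n", fn, useNewBackend(fn))
        }
    }

The same check works at finer granularity by hashing a string such as "main.go:42" (file:line) instead of the function name.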
Since then we have continued to find uses for this approach. For example, when we added automatic FMA insertion on certain architectures, some tests started failing. By making the rewrite rule controlled by a file:line hash pattern, we can pinpoint the exact line or lines where FMA insertion causes the failure.

As another example, we are considering a small change to the semantics of variables declared in for loops, along the lines of what C# 5 did and what JavaScript does in for-let loops. By using a hash pattern to toggle whether the new semantics applies at a given file:line, we can identify the specific loops where the semantic change provokes a test failure.

The binary search does not have to stop at a single failure. With a richer pattern syntax adding an AND-NOT operator, we can disable the instances we've found, and if the test still fails, run a new binary search to find another culprit. (Technically, AND-NOT is not necessary for doing a complete binary traversal of the search space guided by single failures, but it is easy to add, and it is necessary for the next step.)

The binary search can also be adapted to finding pairs or larger groups of changes, all of which are necessary to provoke a failure. If suffix B provokes the failure but neither 0B nor 1B does, then (assuming the test is not flaky!) you start using the pattern “0B OR 1B”, refine the left half of the expression with binary search (for example, the first step checks “00B OR 1B” versus “10B OR 1B”), and then, having refined the left half, move on to refining the right half. After refining both, you have an OR of patterns that each match one change, and all the changes are necessary to provoke the failure. The splitting naturally recurses as needed to find a locally minimal set of changes, in the sense that, by construction, removing any single change from the set no longer provokes the failure. (A toy implementation of this pair-finding search appears in the postscript below.)

Stepping away from compilers for a moment, we have recently realized that this technique can be adapted to identifying specific bug-inducing contexts for library functions as well. For example, suppose we want to introduce a new qsort implementation, but it causes tests to fail in a large program (maybe even a compiler) that uses qsort in many contexts. A binary search that selects the old or new algorithm according to a hash of the call stack will quickly identify the exact call stack affected by the new qsort.

As I mentioned at the top, I am interested to hear about earlier uses of approaches like this, or any other ideas for situations where the approach might be applicable. Clearly the general problem overlaps with group testing [1], and in particular with Hwang's binary search-based approach (1972) [2]. I've also found one paper describing this algorithm in the context of identifying which of a large set of changes between GDB versions caused a crash (1999) [3]. (That paper's version of the algorithm for finding pairs or larger groups is buggy unless there is only one such group.) The idea of using binary search to narrow down a buggy software change is so natural that there must have been many earlier uses for debugging before 1999, especially in compilers.

Thanks for any references, interesting history, or ideas. If anyone is interested in code, we've published the binary search tool [4] and the code that interprets the patterns [5].

Best,
Russ

[1] https://en.wikipedia.org/wiki/Group_testing (a remarkably good article)
[2] https://www.jstor.org/stable/2284447
[3] https://dl.acm.org/doi/10.1145/318774.318946
[4] https://pkg.go.dev/golang.org/x/tools/cmd/bisect
[5] https://pkg.go.dev/golang.org/x/tools/internal/bisect
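P.S. In case it helps make the pair-finding search concrete, here is a self-contained toy sketch in Go. Everything in it is an assumption of the example rather than the real tool's code: the FNV hash, the invented change names, and the simulated failing test, which fails only when both culprits are enabled at once.

    package main

    import (
        "fmt"
        "hash/fnv"
    )

    // Toy changes, identified by name; the simulated test fails
    // only when every culprit is enabled at the same time.
    var changes = []string{"alpha", "bravo", "charlie", "delta", "echo", "foxtrot"}
    var culprits = []string{"bravo", "echo"}

    const maxBits = 16 // enough bits to single out one change in this toy

    func hashOf(name string) uint64 {
        h := fnv.New64()
        h.Write([]byte(name))
        return h.Sum64()
    }

    // matches reports whether the low bits of sum spell out suffix.
    func matches(sum uint64, suffix string) bool {
        for i := len(suffix) - 1; i >= 0; i-- {
            if sum&1 != uint64(suffix[i]-'0') {
                return false
            }
            sum >>= 1
        }
        return true
    }

    // enabled reports whether a change matches any suffix in pats
    // (pats acts as an OR of suffix patterns).
    func enabled(name string, pats []string) bool {
        for _, p := range pats {
            if matches(hashOf(name), p) {
                return true
            }
        }
        return false
    }

    // fails simulates running the test suite with the given pattern:
    // it fails iff every culprit change is enabled.
    func fails(pats []string) bool {
        for _, c := range culprits {
            if !enabled(c, pats) {
                return false
            }
        }
        return true
    }

    // refine narrows suffix down to full-length suffixes, each pinpointing
    // one necessary change, while the suffixes in fixed stay enabled.
    // This is the "0B OR 1B" refinement described above. Invariant:
    // refine is only called when fails([suffix] + fixed) holds.
    func refine(suffix string, fixed []string) []string {
        if len(suffix) == maxBits {
            return []string{suffix} // long enough to identify one change
        }
        for _, b := range []string{"0", "1"} {
            if fails(append([]string{b + suffix}, fixed...)) {
                return refine(b+suffix, fixed)
            }
        }
        // Neither half fails alone: both halves contain necessary changes.
        // Refine the 0 half with the 1 half held enabled, then refine the
        // 1 half with the refined 0 half held enabled.
        left := refine("0"+suffix, append([]string{"1" + suffix}, fixed...))
        right := refine("1"+suffix, append(append([]string{}, left...), fixed...))
        return append(left, right...)
    }

    func main() {
        for _, s := range refine("", nil) {
            for _, c := range changes {
                if matches(hashOf(c), s) {
                    fmt.Printf("suffix %s -> change %q\n", s, c)
                }
            }
        }
    }

Running it prints one full-length suffix per culprit. A real tool would stop refining as soon as a suffix matched exactly one instance rather than always extending to maxBits bits.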