Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #3459 > unrolled thread
| Started by | Russ Cox <rsc@swtch.com> |
|---|---|
| First post | 2023-05-12 13:59 -0400 |
| Last post | 2023-05-15 21:35 +0000 |
| Articles | 17 — 8 participants |
Back to article view | Back to comp.compilers
binary search debugging of compilers Russ Cox <rsc@swtch.com> - 2023-05-12 13:59 -0400
Re: binary search debugging of compilers Kaz Kylheku <864-117-4973@kylheku.com> - 2023-05-13 03:20 +0000
Re: binary search debugging of compilers Fernando <pronesto@gmail.com> - 2023-05-13 04:47 -0700
Re: binary search debugging of compilers Kaz Kylheku <864-117-4973@kylheku.com> - 2023-05-14 02:49 +0000
Re: binary search debugging of compilers gah4 <gah4@u.washington.edu> - 2023-05-14 13:38 -0700
Re: binary search debugging of compilers Kaz Kylheku <864-117-4973@kylheku.com> - 2023-05-15 21:52 +0000
Re: binary search debugging of compilers gah4 <gah4@u.washington.edu> - 2023-05-16 23:52 -0700
Re: binary search debugging of compilers Kaz Kylheku <864-117-4973@kylheku.com> - 2023-05-17 18:28 +0000
Re: binary search debugging of compilers gah4 <gah4@u.washington.edu> - 2023-05-17 15:23 -0700
Re: binary search debugging of compilers Kaz Kylheku <864-117-4973@kylheku.com> - 2023-05-19 03:21 +0000
Re: binary search debugging of compilers Thomas Koenig <tkoenig@netcologne.de> - 2023-05-19 21:59 +0000
Re: binary search debugging of compilers gah4 <gah4@u.washington.edu> - 2023-05-20 20:20 -0700
Re: Old C compilers, binary search debugging of compilers Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2023-05-22 09:05 +0200
binary search debugging of compilers Max B <tekk.nolagi@gmail.com> - 2023-05-19 16:31 -0500
Re: binary search debugging of compilers Thomas Koenig <tkoenig@netcologne.de> - 2023-05-14 19:59 +0000
Re: binary search debugging of compilers Cameron McInally <cameron.mcinally@nyu.edu> - 2023-05-14 23:28 -0400
Re: binary search debugging of compilers Kaz Kylheku <864-117-4973@kylheku.com> - 2023-05-15 21:35 +0000
| From | Russ Cox <rsc@swtch.com> |
|---|---|
| Date | 2023-05-12 13:59 -0400 |
| Subject | binary search debugging of compilers |
| Message-ID | <23-05-003@comp.compilers> |
Hi all, In the Go compiler, we have been using a technique for the past eight years that I assume has been done in other compilers before, but I'm unaware of any. I describe it below, along with refinements we've made over the years. I'd be very interested to hear about earlier uses of approaches like this, or of any other ideas for use cases. In 2015, Keith Randall was working on a new SSA backend for the Go compiler. The old and new backends coexisted, and we could use either for any given function in the compilation. Keith introduced an environment variable GOSSAHASH that specifies the last few binary digits of the hash of function names that should use the new backend. So GOSSAHASH=0110 means compile only those functions whose names hash to a value with last four bits 0110. When a test is failing with the new backend, you try GOSSAHASH=0 and GOSSAHASH=1, and then use binary search to progressively refine the pattern, narrowing the failure down until only a single function is being compiled with the new backend. This was invaluable for approaching failures in large real-world tests (tests for libraries or production code, not for the compiler) that we had not written and did not understand. In 2016, David Chase was debugging a new optimizer rewrite rule that should have been correct but was causing mysterious test failures. He reused the same technique at finer granularity: the bit pattern now matched against a hash of the current file name and line number, so that the binary search pinpoints the exact line of code that is miscompiled (meaning causes a test failure) when the suspect rewrite rule is used. Since then we have continued to find uses for this approach. For example, when we added automatic FMA insertion on certain architectures, some tests started failing. By making the rewrite rule controlled by a file:line hash pattern, we can pinpoint the exact line or lines where FMA insertion causes the failure. As another example, we are considering a small change to the semantics of variables declared in for loops, along the lines of what C# 5 did and what JavaScript does in for-let loops. By using a hash pattern to toggle whether the new semantics applies at a given file:line, we can identify the specific loops where the semantic change provokes a test failure. The binary search does not have to stop at a single failure. With a richer pattern syntax adding an AND-NOT operator, we can disable the instances we've found, and if the test still fails, run a new binary search to find another culprit. (Technically, AND-NOT is not necessary for doing a complete binary traversal of the search space guided by single failures, but it is easy, and it is necessary for the next step.) The binary search can also be adapted to finding pairs or larger groups of changes, all of which are necessary to provoke a failure. If suffix B provokes the failure but neither 0B nor 1B do, then (assuming the test is not flaky!) you start using the pattern “0B OR 1B”, refine the left half of the expression with binary search (for example the first step checks “00B OR 1B” versus “10B OR 1B”), and then having refined the left half, move on to refining the right half. After refining both, you have an OR of patterns that each match one change, and all the changes are necessary to provoke the failure. The splitting naturally recurses as needed to find a locally minimal set of changes, in the sense that by construction, removing any single change would not provoke the failure. Stepping slightly away from compilers temporarily, we have recently realized that this technique can be adapted to identifying specific bug-inducing contexts for library functions as well. For example, suppose we want to introduce a new qsort implementation, but that causes tests to fail in a large program (maybe even a compiler) that makes use of qsort in many contexts. Binary search selecting the old or new algorithm according to a hash of the call stack will quickly identify the exact call stack affected by the new qsort. As I mentioned at the top, I am interested to hear about earlier uses of approaches like this, or any other ideas for situations where the approach might be applicable. Clearly the general problem has overlaps with group testing [1], and in particular Hwang's binary search-based approach (1972) [2]. I've also found one paper describing this algorithm in the context of identifying which of a large set of changes between GDB versions caused a crash (1999) [3]. (That paper's version of the algorithm for finding pairs or larger groups is buggy unless there is only one such group.) It seems like the idea of binary search to narrow down a buggy software change is so natural that there must have been many earlier uses for debugging before 1999, especially in compilers. Thanks for any references, interesting history, or ideas. If anyone is interested in code, we've published the binary search tool [4] and the code that interprets the patterns [5]. Best, Russ [1] https://en.wikipedia.org/wiki/Group_testing (a remarkably good article) [2] https://www.jstor.org/stable/2284447 [3] https://dl.acm.org/doi/10.1145/318774.318946 [4] https://pkg.go.dev/golang.org/x/tools/cmd/bisect [5] https://pkg.go.dev/golang.org/x/tools/internal/bisect
[toc] | [next] | [standalone]
| From | Kaz Kylheku <864-117-4973@kylheku.com> |
|---|---|
| Date | 2023-05-13 03:20 +0000 |
| Message-ID | <23-05-004@comp.compilers> |
| In reply to | #3459 |
On 2023-05-12, Russ Cox <rsc@swtch.com> wrote: > In 2015, Keith Randall was working on a new SSA backend for the Go > compiler. The old and new backends coexisted, and we could use either > for any given function in the compilation. Keith introduced an > environment variable GOSSAHASH that specifies the last few binary > digits of the hash of function names that should use the new backend. > So GOSSAHASH=0110 means compile only those functions whose names hash > to a value with last four bits 0110. When a test is failing with the > new backend, you try GOSSAHASH=0 and GOSSAHASH=1, and then use binary > search to progressively refine the pattern, narrowing the failure down > until only a single function is being compiled with the new backend. > This was invaluable for approaching failures in large real-world tests > (tests for libraries or production code, not for the compiler) that we > had not written and did not understand. ... > It seems like the idea of binary search to narrow down a buggy > software change is so natural that there must have been many earlier > uses for debugging before 1999, especially in compilers. I have used binary searching to find bugs in the TXR Lisp compiler. Say for the sake of simplicity that I already have a pretty reliable idea in which file the miscompiled function resides. Because the compiler executes the top-level forms that it compiles, I can go to 50% of that file (literally by typing 50% in Vim). Then stick a (setf *opt-level* 0) at that line. The bottom half of the file is then compiled without optimizations. If the bug goes away, then something is being mis-optimized in the second half. And so it goes. Another trick I have used is to suppress the compilation of specific forms according to this pattern: (defun fun (...) ...) --> (eval '(defun fun () ...)) I.e. take the form, and eval its quoted version: form -> (eval 'form) So the compiler now is compiling nothing more than an eval call which takes a canned code literal as its argument. When the resulting compiled file is loaded, fun will be an interpreted function, not a compiled fucntion: the compiled version of the eval call executes, interpreting the defun form, resulting in an interpreted function being defined. If the bug goes away, that function was being miscompiled. (This works in Lisp dialects that have a pure interpreter behind eval; dialects whose eval is actually a compiler may not support this technique well.) The hash technique seems nice. If we have no idea where in the code base the miscompilation is happening, it seems like it could help; it seems easier than some of the following techniques. I often have a good idea where in the code base the problem is because during the rebuild of TXR, files are compiled one by one. As each file in the stdlib/ is compiled, the next compile job now uses that file (if it is implied for loading). So more ofen than not, as soon as we miscompile some file that used during compilation, the compilation of the next file will misbehave. Or possibly the same file. The compiler evaluates what it compiles (unless told not to). If the compiler is compiling some code that it's also itself using, it may misbehave soon after miscompiling some of it, before that file is done compiling. A useful technique used sometimes is to substitute known-good compiled files. If the problem goes away if we substitute a known-good compiled file (touching the timestamps so that it's not clobbered with a recompile), we can suspect that the file is being miscompiled. -- TXR Programming Language: http://nongnu.org/txr Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal Mastodon: @Kazinator@mstdn.ca
[toc] | [prev] | [next] | [standalone]
| From | Fernando <pronesto@gmail.com> |
|---|---|
| Date | 2023-05-13 04:47 -0700 |
| Message-ID | <23-05-005@comp.compilers> |
| In reply to | #3459 |
Hi Russ, that's very interesting. You should consider submitting a report of the technique to CGO! The next deadline is on the 19th, but then there will be another deadline on September 1st. As for a related work, see John Regehr's research on test case reduction for compilers [1]. [1] John Regehr, Yang Chen, Pascal Cuoq, Eric Eide, Chucky Ellison, Xuejun Yang: Test-case reduction for C compiler bugs. PLDI 2012: 335-346 Regards, Fernando
[toc] | [prev] | [next] | [standalone]
| From | Kaz Kylheku <864-117-4973@kylheku.com> |
|---|---|
| Date | 2023-05-14 02:49 +0000 |
| Message-ID | <23-05-006@comp.compilers> |
| In reply to | #3461 |
On 2023-05-13, Fernando <pronesto@gmail.com> wrote: > Hi Russ, that's very interesting. You should consider submitting a report of > the technique to CGO! The next deadline is on the 19th, but then there will be > another deadline on September 1st. We can summarize it briefly. Background: Suppose we have some data processing technique which transforms N entities. A new version of the technique changes how all the entities are transformed. One or more of them are transformed erroneously, resulting in a system regression (one which doesn't readily identify the bad element). Because all elements are transformed with the new technique, there are big differences in all of them compared to the old technique: comparing old and new is futile. Setup: We can (somehow) enumerate the N entities with integers, which we express using a pure binary enumeration. Then, we can discover the number of an erroneously processed element, one bit at a time. Procedure: First we treat all elements numbered ..XXXXXX0 using the new technique technique, and all others using the the old technique. If the system fails, we know that it's the ..XXXXX0 set which has one or more badly processed elements. Otherwise it's the other set, whose enumerations end in a 1. Either way, we have discovered the identity of the last bit. Suppose that discovered bit is 1. We then proceed with the next bit: we process ..XXXXXX01 entities with the new technique and all others with the old technique. If the problem reproduces, we have narrowed to the ...XXXXX01 set, otherwise to the .XXXXX11 set. In this manner we discover every bit, at which point we have a useful piece of information: the identity of a misprocessed element. We can then focus the investigation on how it is misprocessed. -- TXR Programming Language: http://nongnu.org/txr Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal Mastodon: @Kazinator@mstdn.ca
[toc] | [prev] | [next] | [standalone]
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2023-05-14 13:38 -0700 |
| Message-ID | <23-05-008@comp.compilers> |
| In reply to | #3462 |
On Sunday, May 14, 2023 at 9:20:54 AM UTC-7, Kaz Kylheku wrote: (snip) > Setup: > > We can (somehow) enumerate the N entities with integers, which we > express using a pure binary enumeration. Then, we can discover the > number of an erroneously processed element, one bit at a time. > > Procedure: > > First we treat all elements numbered ..XXXXXX0 using the new technique > technique, and all others using the the old technique. If the > system fails, we know that it's the ..XXXXX0 set which has one or more > badly processed elements. Otherwise it's the other set, whose > enumerations end in a 1. Either way, we have discovered the identity of > the last bit. There is the assumtion that only one of the N is in error. Reminds me of ECC memory, which can correct one bit errors, and detect two bit errors. After log(N) tests, you find the one that it is supposed to be, fix it, and it either works or doesn't. Then you work on a new set of tests. Or start out with a more complicated set, which can learn more in one set.
[toc] | [prev] | [next] | [standalone]
| From | Kaz Kylheku <864-117-4973@kylheku.com> |
|---|---|
| Date | 2023-05-15 21:52 +0000 |
| Message-ID | <23-05-011@comp.compilers> |
| In reply to | #3464 |
On 2023-05-14, gah4 <gah4@u.washington.edu> wrote:
> On Sunday, May 14, 2023 at 9:20:54 AM UTC-7, Kaz Kylheku wrote:
>
> (snip)
>
>> Setup:
>>
>> We can (somehow) enumerate the N entities with integers, which we
>> express using a pure binary enumeration. Then, we can discover the
>> number of an erroneously processed element, one bit at a time.
>>
>> Procedure:
>>
>> First we treat all elements numbered ..XXXXXX0 using the new technique
>> technique, and all others using the the old technique. If the
>> system fails, we know that it's the ..XXXXX0 set which has one or more
>> badly processed elements. Otherwise it's the other set, whose
>> enumerations end in a 1. Either way, we have discovered the identity of
>> the last bit.
>
> There is the assumption that only one of the N is in error.
You might think, but it's not actually a problem with the algorithm.
When we bisect a change history, there is the assumption that the
bug showed up, so there are good commits before that, and bad commits
after. If the history is not like that: the bug disappears and
reappears, then the bisect will not identify *the* bad commit reliably,
because there isn't one.
I have a story like this; but it digresses from my present point, so
I will return to it below.
In this situation, we are bisecting a space of N entities. All we need
is to find one bad one. If there are three bad ones, it doesn't matter
which one we find.
Say we have a basket of apples with a rotten smell emanating from it.
We can subdivide it, and smell one half and the other. If both halves
smell, we know we have two or more rotten apples and they ended up
in different halves. This doesn't matter. We just pick one half and
keep subdividing. As long as we stay on the trail of the rotten scent, we will
get down to one rotten apple, and we can use that apple to analyze further:
what kind of mould or bacterium has infected it and so on. Probably
the other rotten apples have the same root cause. If they have different root
causes, we can do another search after fixing the one root cause we have found.
Now about the story. The conclusion of the story was that there was
a GC-related bug in an ash (arithmetic shift) function, dealing with
heap-allocated bignum integers.
A bignum integer was being prematurely reclaimed by the garbage
collector.
The fix looked like this:
commit 9cf992835c8a0f2e6c4ace07b67fea2acb762cc5
Author: Kaz Kylheku <kaz@kylheku.com>
Date: Wed Jun 19 23:21:39 2019 -0700
ash: gc problem.
On 32 bit x86 Solaris 10, with gcc 4.9.0, this issue caused a
miscompilation of the pset macro, due to ash abruptly
returning zero, causing the op-code field of an instruction to
be NOOP rather than GCALL.
I'm committing this fix just for reference; I will immediately
replace it with a refactoring of the function.
* arith.c (ash): Add a gc_hint to prevent the a bignum from
being reclaimed if the b = make_bignum() triggers gc.
This happens in the case when the previous case computes
a = make_bignum() and falls through.
diff --git a/arith.c b/arith.c
index fdb294b7..f374ddf4 100644
--- a/arith.c
+++ b/arith.c
@@ -3235,6 +3235,7 @@ val ash(val a, val bits)
b = make_bignum();
if ((mpe = mp_shift(mp(a), mp(b), bn)) != MP_OKAY)
break;
+ gc_hint(a);
return normalize(b);
default:
goto bad3;
The problem only showed upon Solaris (first red flag). When I git bisected it,
it was randomly appearing and disappearing, so git bisect was of no use.
Not all problems introduce themselves in a way that there is a reproducible
test case.
Here, the garbage collector had to go off exactly at the wrong time during the
execution of the ash function. Small perturbations to completely unrelated code
can make it go away.
In the end, I had to debug it on Solaris (no working gdb or anything), using
the repro that I had.
Once I discovered the bad VM instruction, which was affecting the behavior of a
certain Lisp macro in the standard library, I traced it through the compiler
right down to the assembler, where the opcode field for a good assembly
instruction was being miscalculated. I could not reproduce the bad ash
calculation though in isolation.
In summary:
1. non-gc-aware, sloppy coding in ash function caused ...
2. rare problem in assembler's masking and shifting calculations, causing ...
3. Bad instruction when compiling the "pset" (parallel assignment) macro ...
4. Which failed some code relying pset.
All of this reproduced only one one platform, one particular commit.
Small perturbations of Lisp code just about anywhere made it go away.
Bisect was useless.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
[toc] | [prev] | [next] | [standalone]
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2023-05-16 23:52 -0700 |
| Message-ID | <23-05-012@comp.compilers> |
| In reply to | #3467 |
On Monday, May 15, 2023 at 6:54:08 PM UTC-7, Kaz Kylheku wrote: (snip) > Say we have a basket of apples with a rotten smell emanating from it. > We can subdivide it, and smell one half and the other. If both halves > smell, we know we have two or more rotten apples and they ended up > in different halves. This doesn't matter. We just pick one half and > keep subdividing. But the algorithm described, at least as I remember it, doesn't test both halves. Sniffing two baskets doesn't take so long, but two tests might. I was suspecting that there were more efficient ways than doing both halves, though didn't try to figure out what they might be. > As long as we stay on the trail of the rotten scent, we will > get down to one rotten apple, and we can use that apple to analyze further: > what kind of mould or bacterium has infected it and so on. Probably > the other rotten apples have the same root cause. If they have different root > causes, we can do another search after fixing the one root cause we have found. I don't know apple statistics so well. If you suspect more than one from the beginning, As a binary search, it should be 50% probability on each test. If you see higher than 50% as the tests go on, it might look suspicious already.
[toc] | [prev] | [next] | [standalone]
| From | Kaz Kylheku <864-117-4973@kylheku.com> |
|---|---|
| Date | 2023-05-17 18:28 +0000 |
| Message-ID | <23-05-015@comp.compilers> |
| In reply to | #3468 |
On 2023-05-17, gah4 <gah4@u.washington.edu> wrote: > On Monday, May 15, 2023 at 6:54:08 PM UTC-7, Kaz Kylheku wrote: > > (snip) > >> Say we have a basket of apples with a rotten smell emanating from it. >> We can subdivide it, and smell one half and the other. If both halves >> smell, we know we have two or more rotten apples and they ended up >> in different halves. This doesn't matter. We just pick one half and >> keep subdividing. > > But the algorithm described, at least as I remember it, doesn't test both > halves. If the problem reproducews with the half you're testing, you assume the misprocessed function is in that half. (There could be a bad function in the other half also, and you could test that, but it's unnecessary.) If the problem deosn't reproduce in the half you're testing, you assume the misprocessed functions lie in the other half, and recurse on that half instead. The algorithm will work even if every single function is miscompiled. (And will take the same logarithmic number of steps, since we cut in half no matter what.) It will arbitrarily narrow it down to one, showing that even if the compiler change is enabled for just one function, there is a problem. That function can be investigated. Then maybe then the fix solve the issue for the five hundred other broken functions, too. The main problem in that situation is that the function you narrow it down to may not be the easiest one to investigate. Say that a three line function is broken, and a 1000 line function is broken; the binary search finds the 1000 line function, whereas it would be easier to investigate how the three line function is miscompiled. There could be the following issue: among several functions that are miscompiled, there are two: foo and bar. *Both* must be miscompiled for the problem to reproduce, due to an interaction. If there is a recursive subdivision such that foo and bar are in different halves, then the problem doesn't reproduce for either half. Therefore, when the problem fails to reproduce for the currently tested half, it may be wise to cross-test the other half. If it also doesn't reproduce, the halves have to be recombined and some other approach taken, like perturbing the enumeration so that the partitioning is done differently, or doing a slow linear elimination. -- TXR Programming Language: http://nongnu.org/txr Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal Mastodon: @Kazinator@mstdn.ca
[toc] | [prev] | [next] | [standalone]
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2023-05-17 15:23 -0700 |
| Message-ID | <23-05-017@comp.compilers> |
| In reply to | #3471 |
On Wednesday, May 17, 2023 at 12:08:27 PM UTC-7, Kaz Kylheku wrote: > On 2023-05-17, gah4 <ga...@u.washington.edu> wrote: > > On Monday, May 15, 2023 at 6:54:08 PM UTC-7, Kaz Kylheku wrote: > > > > (snip) > > > >> Say we have a basket of apples with a rotten smell emanating from it. > >> We can subdivide it, and smell one half and the other. If both halves > >> smell, we know we have two or more rotten apples and they ended up > >> in different halves. This doesn't matter. We just pick one half and > >> keep subdividing. > > But the algorithm described, at least as I remember it, doesn't test both > > halves. > If the problem reproducews with the half you're testing, you assume > the misprocessed function is in that half. (There could be a bad > function in the other half also, and you could test that, but it's > unnecessary.) > > If the problem deosn't reproduce in the half you're testing, you > assume the misprocessed functions lie in the other half, and > recurse on that half instead. I was mostly noting it, as the apple test was written to test both halves. > The algorithm will work even if every single function is miscompiled. > (And will take the same logarithmic number of steps, since we cut > in half no matter what.) > It will arbitrarily narrow it down to one, showing that even if the > compiler change is enabled for just one function, there is a problem. > That function can be investigated. Then maybe then the fix solve the > issue for the five hundred other broken functions, too. As to the problem of debugging by miscompiled code, this reminds me of stories of debugging compilers by feeding them cards from the card recycling bin. It seems that cards were especially high grade paper and so worth keeping separate for recycling. I never actually saw anyone do that, though. In my younger days, some said that I was especially good at finding compiler errors. Maybe not as good now. I would try things that were allowed, but others might not have thought about doing. Only one I remember now, is a C compiler that miscompiled the ++ operator on a double variable. Allowed in C, but maybe not often used.
[toc] | [prev] | [next] | [standalone]
| From | Kaz Kylheku <864-117-4973@kylheku.com> |
|---|---|
| Date | 2023-05-19 03:21 +0000 |
| Message-ID | <23-05-020@comp.compilers> |
| In reply to | #3473 |
On 2023-05-17, gah4 <gah4@u.washington.edu> wrote: > As to the problem of debugging by miscompiled code, this reminds > me of stories of debugging compilers by feeding them cards from > the card recycling bin. It seems that cards were especially high grade > paper and so worth keeping separate for recycling. I never actually > saw anyone do that, though. Obviously, the card recycling bin is going to be a source of garbage that is invalid inputs to the compiler, with perhaps some valid bits. This was effectively an early form of fuzzing. > In my younger days, some said that I was especially good at finding > compiler errors. Maybe not as good now. Fuzzing has matured now. There are tools now which instrument the executable, and monitor what is going on in terms of the program control flow in response to random inputs. Then they adjust the bits to try to stimulate paths not taken to tickle bugs. That's the jist of it. > I would try things that were > allowed, but others might not have thought about doing. I've not used fuzzing in a while ... since 2016 it turns out! I played with it in the summer of that year. The software was AFL (American Fuzzy Lop) 2.30b. I had found three bugs in a short timespan back then. One of them was exponential memory explosion in nested quasiquoting syntax like ^^^....^^exr. (TXR Lisp's quasiquoting operator is written ^ rather than the usual backtick ` used in most traditional Lisp dialects, otherwise it is the same thing). AFL also discovered that if it puts a huge number into the op syntax like this: (op list @123455234), the op expander will obligingly generate a lambda with that many arguments! E.g. (op list @3) generates something similar to (lambda (arg0 arg1 arg3) (list arg3)). So imagine if we replace @3 with a large integer. AFL also found a crash in the regex parser. It was not doing a range check on numeric character escapes, making it possible for the program to sneak a negatively valued character code into the regex copiler, where it resulted in an out-of-bounds array access problem. I have been meaning brush the dust off this technique again. > Only one I remember now, is a C compiler that miscompiled the ++ > operator on a double variable. Allowed in C, but maybe not often used. Oh, stepping doubles with ++ is, I would say, not *that* uncommon. I doubt that if such a bug were introduced as an easter egg into GCC, it would go very long without being discovered by the FOSS distros and other downstream users. -- TXR Programming Language: http://nongnu.org/txr Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal Mastodon: @Kazinator@mstdn.ca
[toc] | [prev] | [next] | [standalone]
| From | Thomas Koenig <tkoenig@netcologne.de> |
|---|---|
| Date | 2023-05-19 21:59 +0000 |
| Message-ID | <23-05-025@comp.compilers> |
| In reply to | #3476 |
Kaz Kylheku <864-117-4973@kylheku.com> schrieb:
> Oh, stepping doubles with ++ is, I would say, not *that* uncommon.
> I doubt that if such a bug were introduced as an easter egg into GCC,
> it would go very long without being discovered by the FOSS distros and
> other downstream users.
It would very likely be caught by regression testing. GCC has an
extensive test suite, and it is a requirement that this is run
before submitting a patch.
If that slips through (as can happen from time to time, for
example for different architectures or operating systems), then
the automated regression testers will flag it.
If not that, the automated SPEC testers also have a good chance
of catching it.
As an aside, the post-increment operator is converted to its
equivalent assignment statement very early, during gimplification.
In the following example, the result of the first two passes that
GCC performs are shown - the first is a reproduction of the source
code, as parsed, and the second one after lowering to GIMPLE.
File names may differ bit depending on whic version of GCC you use.
$ cat double.c
double c;
void foo()
{
c++;
}
$ gcc -c -fdump-tree-original -fdump-tree-gimple double.c
$ cat double.c.004t.original
;; Function foo (null)
;; enabled by -tree-original
{
c++ ;
}
$ cat double.c.005t.gimple
foo ()
{
c.0_1 = c;
_2 = c.0_1 + 1.0e+0;
c = _2;
}
[toc] | [prev] | [next] | [standalone]
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2023-05-20 20:20 -0700 |
| Message-ID | <23-05-027@comp.compilers> |
| In reply to | #3481 |
On Saturday, May 20, 2023 at 3:05:13 PM UTC-7, Thomas Koenig wrote: > Kaz Kylheku <864-11...@kylheku.com> schrieb: > > Oh, stepping doubles with ++ is, I would say, not *that* uncommon. > > I doubt that if such a bug were introduced as an easter egg into GCC, > > it would go very long without being discovered by the FOSS distros and > > other downstream users. > It would very likely be caught by regression testing. GCC has an > extensive test suite, and it is a requirement that this is run > before submitting a patch. I believe it was Borland Turbo C, maybe 2.0. At one point, I had a few different C compilers for MS-DOS, not so long before I went to using OS/2. It seems that Borland also had compilers for OS/2, but I don't remember using those.
[toc] | [prev] | [next] | [standalone]
| From | Hans-Peter Diettrich <DrDiettrich1@netscape.net> |
|---|---|
| Date | 2023-05-22 09:05 +0200 |
| Subject | Re: Old C compilers, binary search debugging of compilers |
| Message-ID | <23-05-030@comp.compilers> |
| In reply to | #3483 |
On 5/21/23 5:20 AM, gah4 wrote: > I believe it was Borland Turbo C, maybe 2.0. > > At one point, I had a few different C compilers for MS-DOS, not so long > before I went to using OS/2. It seems that Borland also had compilers > for OS/2, but I don't remember using those. Borland also developed Borland C++ for the Atari ST. The library was not so perfect, as my decompiler could reveal during beta test. But it was by far the best C/C++ compiler available for the ST. DoDi
[toc] | [prev] | [next] | [standalone]
| From | Max B <tekk.nolagi@gmail.com> |
|---|---|
| Date | 2023-05-19 16:31 -0500 |
| Message-ID | <23-05-024@comp.compilers> |
| In reply to | #3471 |
We did something similar -- delta debugging -- with the Cinder JIT. Except we didn't have hashes, but instead lists of function names. It's an incredibly useful approach. See more at https://bernsteinbear.com/blog/cinder-jit-bisect/
[toc] | [prev] | [next] | [standalone]
| From | Thomas Koenig <tkoenig@netcologne.de> |
|---|---|
| Date | 2023-05-14 19:59 +0000 |
| Message-ID | <23-05-007@comp.compilers> |
| In reply to | #3459 |
Russ Cox <rsc@swtch.com> schrieb: > As I mentioned at the top, I am interested to hear about earlier uses > of approaches like this, or any other ideas for situations where the > approach might be applicable. Clearly the general problem has overlaps > with group testing [1], and in particular Hwang's binary search-based > approach (1972) [2]. For GCC regression-hunting, two techniques are routinely used. One of them is the use of "gcc bisect", described at https://git-scm.com/docs/git-bisect . If there is a failing test case, this can (at the cost of some CPU time) usually pinpoint offending commit. For reducing test cases, at least for C and C++, cvise https://github.com/marxin/cvise is now the mehod of choice; it is brutally efficient at reducing test cases to an absolute minimum. Because it can transform C and C++ programs, it can make transformations of the program which are language-aware. Both are tools external to the compiler.
[toc] | [prev] | [next] | [standalone]
| From | Cameron McInally <cameron.mcinally@nyu.edu> |
|---|---|
| Date | 2023-05-14 23:28 -0400 |
| Message-ID | <23-05-009@comp.compilers> |
| In reply to | #3463 |
On Sun, May 14, 2023 at 22:42 Thomas Koenig <tkoenig@netcologne.de> wrote: > Russ Cox <rsc@swtch.com> schrieb: > > > As I mentioned at the top, I am interested to hear about earlier uses > > of approaches like this, or any other ideas for situations where the > > approach might be applicable. Clearly the general problem has overlaps > > with group testing [1], and in particular Hwang's binary search-based > > approach (1972) [2]. > > For GCC regression-hunting, two techniques are routinely used. > > One of them is the use of "gcc bisect", described at > https://git-scm.com/docs/git-bisect > . If there is a failing test > case, this can (at the cost of some CPU time) usually pinpoint > offending commit. Ah, this is close to my heart. The story I heard was that automated regression bisection originated in the Cray compiler group around the mid 1990s. I didn't join that team until the aughts, but can attest to the quality of Cray's system, which is still in use today AFAIK. https://ieeexplore.ieee.org/document/625082 For the general case of reducing runtime errors, the Wolf Fencing algorithm would be a good thread to pull for uncovering the history. https://dl.acm.org/doi/abs/10.1145/358690.358695 Hope that helps, Cam
[toc] | [prev] | [next] | [standalone]
| From | Kaz Kylheku <864-117-4973@kylheku.com> |
|---|---|
| Date | 2023-05-15 21:35 +0000 |
| Message-ID | <23-05-010@comp.compilers> |
| In reply to | #3463 |
On 2023-05-14, Thomas Koenig <tkoenig@netcologne.de> wrote: > Russ Cox <rsc@swtch.com> schrieb: > >> As I mentioned at the top, I am interested to hear about earlier uses >> of approaches like this, or any other ideas for situations where the >> approach might be applicable. Clearly the general problem has overlaps >> with group testing [1], and in particular Hwang's binary search-based >> approach (1972) [2]. > > For GCC regression-hunting, two techniques are routinely used. > > One of them is the use of "gcc bisect", described at > https://git-scm.com/docs/git-bisect . If there is a failing test > case, this can (at the cost of some CPU time) usually pinpoint > offending commit. The technique described by Russ Cox is applicable when you know what the offending commit is. E.g. you made some changes, say, to the optimizer, and something is going wrong when you boostrap the compiler. Obviously, the optimization changes are breaking something; the prior commit works. But you don't understand what is breaking where. > > For reducing test cases, at least for C and C++, cvise > https://github.com/marxin/cvise is now the mehod of choice; it is > brutally efficient at reducing test cases to an absolute minimum. > Because it can transform C and C++ programs, it can make > transformations of the program which are language-aware. This is only applicable if you have an external test case which reproduces some compiler problem (like a crash) and would like to make it smaller. The binary search technique described by Cox is uniquely applicable in the situation that something goes wrong with the bootstrapping of a compiler. 1. A bug has been introduced into the compiler source. You already know exactly which commit did that, and what change is responsible, due to having done the git bisect or whatever. Maybe this is just new, uncommited work that is breaking compared to the working HEAD. 2. Consequently, the compiler miscompiles something in its own source code (anywhere in its source: in the compiler proiper, or standard library support routines of that language which are used by the compiler.) 3. Consequently, the compiled compiler misbehaves somehow. (Or possibly, the boostrapping even fails: compiled code is loaded into the compiler's image, and it cannot continue the boostrapping process.) Characterizing what is wrong in (3) is not too difficult (though it can be). The problem is that the issues exhibited in 3 are in good parts of the compiler. So they are red herring issues. If you look at the source code for those parts, it is fine and has not been touched. Rather, the object code is incorrect due to the introduced compiler problem. You might know the commit which introduces the problem, yet be stumped as to how, due to the complexity of all the pieces and how they interact, and the difficulty of some of the algorithms. (Plus hidden problems, like the new work actually being correct, but exposing a bug in existing code; so staring at just the new code until you are blue does no good.) Cox's trick lets you turn on the new, known-bad code only for a seletected subset of functions (or whatever logical units of the image). You can iteratively shrink the subset until it contains only one mistranslated unit. Then you know---aha!--the bad optimizer change is, say, mistranslating the partition function in stdlib/quicksort.mylang. If just that function is treated with the buggy new change, and nothing else, the show grinds to a halt. So now you know you can focus your investigation on what the buggy compiler change is doing to the partition function in stdlib/quicksort.mylang. You can write test cases exercising that function to understand how its behavior has gone wrong, and study the object code to see how it's organized to bring about the wrong behavior, and how that relates the bad new code in the compiler. You can tweak the code, and build the image with just that partition function being processed with the new code to see whether the regression has gone away, and iterate on that. Then try boostrapping the entire image with the repaired code. Etc. The bug can show up as a mistranslation of anything, which is why I picked the partition helper of quicksort example. That could wreck the compiler's behavior, if some logic in it somemwhere depends on sorting something. You could really be on a wild goose chase trying to debug *that*, rather than it completely unrelated root cause. > Both are tools external to the compiler. Right; so not applicable to this internal technique where basically we subdivide the compiler's own image into "parts compiled with the known-good compiler" and "parts compiled with the new, known-bad change in effect", where the parts are on a finer granularity than files. -- TXR Programming Language: http://nongnu.org/txr Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal Mastodon: @Kazinator@mstdn.ca
[toc] | [prev] | [standalone]
Back to top | Article view | comp.compilers
csiph-web