Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #3669 > unrolled thread
| Started by | Kaz Kylheku <643-408-1753@kylheku.com> |
|---|---|
| First post | 2025-07-04 08:27 +0000 |
| Last post | 2025-07-05 00:52 +0000 |
| Articles | 2 — 1 participant |
Back to article view | Back to comp.compilers
Nice application of that hash bisection debugging approach. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-07-04 08:27 +0000
Re: Nice application of that hash bisection debugging approach. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-07-05 00:52 +0000
| From | Kaz Kylheku <643-408-1753@kylheku.com> |
|---|---|
| Date | 2025-07-04 08:27 +0000 |
| Subject | Nice application of that hash bisection debugging approach. |
| Message-ID | <25-07-002@comp.compilers> |
I'm doing compiler work at the moment and have one of those bugs where an optimization miscompiles something in the compiler which then shows up as a breakage elsewhere. Because this is Lisp, it took literally two or three minutes to whip up the hash bisection technique. In the Lisp compiler, the basic unit of file compilation is the top-level form. I defined three dynamically scoped variables in the compiler, which are shown with the final values: (defvar *cmask* #b1111111) (defvar *cval* #b0101100) (defvar *chash*) *chash* is bound to the equal hash calculated over the top-level form being compiled. E.g. if the form were (+ 2 2), the hash would be calculated as: 2> (hash-equal '(+ 2 2)) 192681593 and that would be the value of *chash*. Then whenever the expression (eql (logand *chash* *cmask*) *cval*) is true, the optimization is enabled. Also, when that same expression is true, a portion of top-level form is printed: enough to identify it. I quickly arrived at the above values. One form is printed, and that's the one mistranslated: $ make TXR stdlib/optimize.tl -> stdlib/optimize.tlo TXR stdlib/compiler.tl -> stdlib/compiler.tlo (defun compile-file-conditionally (in-path out-path test-fn)) TXR stdlib/asm.tl -> stdlib/asm.tlo ./txr: unhandled exception of type error: ./txr: slot: #<struct-type assembler> has no slot named asm make: failing command: ./txr --in-package=sys --compile=stdlib/asm.tl:stdlib/asm.tlo.tmp Makefile:195: recipe for target 'stdlib/asm.tlo' failed make: *** [stdlib/asm.tlo] Error 1 So the function compile-file-condtionally is the mistranslated one. When we flip the top masked bit in *cval* from 0 to 1: (defvar *cmask* #b1111111) (defvar *cval* #b1101100) (defvar *chash*) then a different top-level form is identified and targeted for the optimization, a function called rewrite in the optimizer, and the compilation succeeds: $ make TXR stdlib/optimize.tl -> stdlib/optimize.tlo (defun rewrite (fun list out)) TXR stdlib/compiler.tl -> stdlib/compiler.tlo TXR stdlib/asm.tl -> stdlib/asm.tlo It's so easy to do it's not worth keeping the changes; next time I need the technique, I will whip it up from scratch again. -- TXR Programming Language: http://nongnu.org/txr Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal Mastodon: @Kazinator@mstdn.ca
[toc] | [next] | [standalone]
| From | Kaz Kylheku <643-408-1753@kylheku.com> |
|---|---|
| Date | 2025-07-05 00:52 +0000 |
| Message-ID | <25-07-003@comp.compilers> |
| In reply to | #3669 |
On 2025-07-04, Kaz Kylheku <643-408-1753@kylheku.com> wrote:
> $ make
> TXR stdlib/optimize.tl -> stdlib/optimize.tlo
> TXR stdlib/compiler.tl -> stdlib/compiler.tlo
> (defun compile-file-conditionally
> (in-path out-path
> test-fn))
> TXR stdlib/asm.tl -> stdlib/asm.tlo
> ./txr: unhandled exception of type error:
> ./txr: slot: #<struct-type assembler> has no slot named asm
What was the bug, you might want to ask? It was a little
anticlimactic: an issue I was already anticipating in the development
plan, but which I neglected to suspect I would need to solve at this
stage already.
I'm working on tail calls. Tail calls cannot occur in a scope where
dynamic bindings (bindings of dynamically scoped variables) have been
established, but my code was doing them anyway.
Tail calls are being represented in the VM by a prefix instruction
called "tail". It has no inputs or outputs, and is placed before
one of several kinds of call/apply instructions.
(Self tail calls are handled by a backward jmp. I label it with
a "tjmp" pseudo-instruction in the compiler intermediate code;
it gets mapped to a jmp instruction at some point.)
Curently, the tail prefix instruction is a no-op; I'm just
getting all else ready before implementing it.
In the virtual machine, there are frame instructions that push new
display frames which represent lexical variables. Maching end
instructions dispose of them. (The compiler optimizes away frames
very well, but cannot always do so, like when there are closures
capturing them.)
A tail call cannot jump out from the middle of one or more frames.
What happens it that the compiler at first naively generates the
tail call (tjmp or tail prefix), disregarding its location within
frames.
A post-processing step is applied to the code which moves the
tail calls out of frames. It basically parses the code and inserts
end instructions before each tail call: enough end instructions
to terminate all the frames.
Now the problem is that there is a frame type "dframe" for dynamic
variables; the logic will happily move the tail call out of that one,
too. This means that if you have
;; at the end of a function body:
(let ((*dynamic-var* 42))
...
(fun arg)) ;; tail call
The *dynamic-var* binding is wrongly torn down /before/ the tail call;
fun will not see a value of 42 of the dynamic variable *dynamic-var*, as
if it were from this source code:
;; /not/ at the end of a function body:
(let ((*dynamic-var* 42))
...)
;; at the end of function body:
(fun arg) ;; tail call
So that's the bug. As soon as I viewed the disassembly listing
for the mistranslated function and saw the dframe close to before
a tail instruction, I knew what was going on.
The fix is simply to switch off the recursive tail position indicator in
a context where one or more dynamic variables are bound, so no tail
calls will be compiled there.
(I think, it is possible to have tail calls under dynamic binding, but
not with the deep binding strategy in this Lisp dialect/implementation.
Dynamic binding pushes new environment objects onto an environment stack
(separately from the VM dframe mechanism I talked about above) and so if
tail recursion were perpetrated anyway, that stack would keep growing;
it wouldn't be proper stackless recursion.)
BTW, this bug affects the existing self-tail-calls also, a released
feature. But there are no instances anywhere where a function binds a
special variable and then calls itself, expecting the recursive
self-call to see that binding. (Or no instances that break the
bootstrapping and test suite, in any case!) It was already in my
list of things to fix.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
[toc] | [prev] | [standalone]
Back to top | Article view | comp.compilers
csiph-web