Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.compilers > #3669 > unrolled thread

Nice application of that hash bisection debugging approach.

Started byKaz Kylheku <643-408-1753@kylheku.com>
First post2025-07-04 08:27 +0000
Last post2025-07-05 00:52 +0000
Articles 2 — 1 participant

Back to article view | Back to comp.compilers


Contents

  Nice application of that hash bisection debugging approach. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-07-04 08:27 +0000
    Re: Nice application of that hash bisection debugging approach. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-07-05 00:52 +0000

#3669 — Nice application of that hash bisection debugging approach.

FromKaz Kylheku <643-408-1753@kylheku.com>
Date2025-07-04 08:27 +0000
SubjectNice application of that hash bisection debugging approach.
Message-ID<25-07-002@comp.compilers>
I'm doing compiler work at the moment and have one of those bugs
where an optimization miscompiles something in the compiler which
then shows up as a breakage elsewhere.

Because this is Lisp, it took literally two or three minutes to whip up the
hash bisection technique.

In the Lisp compiler, the basic unit of file compilation is the
top-level form.

I defined three dynamically scoped variables in the compiler,
which are shown with the final values:

(defvar *cmask* #b1111111)
(defvar *cval*  #b0101100)
(defvar *chash*)

*chash* is bound to the equal hash calculated over the top-level form
being compiled.

E.g. if the form were (+ 2 2), the hash would be calculated
as:

  2> (hash-equal '(+ 2 2))
  192681593

and that would be the value of *chash*.

Then whenever the expression

  (eql (logand *chash* *cmask*) *cval*)

is true, the optimization is enabled. Also, when that same expression
is true, a portion of top-level form is printed: enough to identify it.

I quickly arrived at the above values. One form is printed,
and that's the one mistranslated:

$ make
TXR stdlib/optimize.tl -> stdlib/optimize.tlo
TXR stdlib/compiler.tl -> stdlib/compiler.tlo
(defun compile-file-conditionally
  (in-path out-path
   test-fn))
TXR stdlib/asm.tl -> stdlib/asm.tlo
./txr: unhandled exception of type error:
./txr: slot: #<struct-type assembler> has no slot named asm
make: failing command:
./txr --in-package=sys --compile=stdlib/asm.tl:stdlib/asm.tlo.tmp
Makefile:195: recipe for target 'stdlib/asm.tlo' failed
make: *** [stdlib/asm.tlo] Error 1

So the function compile-file-condtionally is the mistranslated one.

When we flip the top masked bit in *cval* from 0 to 1:

(defvar *cmask* #b1111111)
(defvar *cval*  #b1101100)
(defvar *chash*)

then a different top-level form is identified and targeted for
the optimization, a function called rewrite in the optimizer, and the
compilation succeeds:

$ make
TXR stdlib/optimize.tl -> stdlib/optimize.tlo
(defun rewrite
  (fun list out))
TXR stdlib/compiler.tl -> stdlib/compiler.tlo
TXR stdlib/asm.tl -> stdlib/asm.tlo

It's so easy to do it's not worth keeping the changes; next time I need
the technique, I will whip it up from scratch again.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

[toc] | [next] | [standalone]


#3670

FromKaz Kylheku <643-408-1753@kylheku.com>
Date2025-07-05 00:52 +0000
Message-ID<25-07-003@comp.compilers>
In reply to#3669
On 2025-07-04, Kaz Kylheku <643-408-1753@kylheku.com> wrote:
> $ make
> TXR stdlib/optimize.tl -> stdlib/optimize.tlo
> TXR stdlib/compiler.tl -> stdlib/compiler.tlo
> (defun compile-file-conditionally
>   (in-path out-path
>    test-fn))
> TXR stdlib/asm.tl -> stdlib/asm.tlo
> ./txr: unhandled exception of type error:
> ./txr: slot: #<struct-type assembler> has no slot named asm

What was the bug, you might want to ask? It was a little
anticlimactic: an issue I was already anticipating in the development
plan, but which I neglected to suspect I would need to solve at this
stage already.

I'm working on tail calls.  Tail calls cannot occur in a scope where
dynamic bindings (bindings of dynamically scoped variables) have been
established, but my code was doing them anyway.

Tail calls are being represented in the VM by a prefix instruction
called "tail". It has no inputs or outputs, and is placed before
one of several kinds of call/apply instructions.

(Self tail calls are handled by a backward jmp. I label it with
a "tjmp" pseudo-instruction in the compiler intermediate code;
it gets mapped to a jmp instruction at some point.)

Curently, the tail prefix instruction is a no-op; I'm just
getting all else ready before implementing it.

In the virtual machine, there are frame instructions that push new
display frames which represent lexical variables. Maching end
instructions dispose of them. (The compiler optimizes away frames
very well, but cannot always do so, like when there are closures
capturing them.)

A tail call cannot jump out from the middle of one or more frames.
What happens it that the compiler at first naively generates the
tail call (tjmp or tail prefix), disregarding its location within
frames.

A post-processing step is applied to the code which moves the
tail calls out of frames. It basically parses the code and inserts
end instructions before each tail call: enough end instructions
to terminate all the frames.

Now the problem is that there is a frame type "dframe" for dynamic
variables; the logic will happily move the tail call out of that one,
too.  This means that if you have

  ;; at the end of a function body:
  (let ((*dynamic-var* 42))
    ...
    (fun arg)) ;; tail call

The *dynamic-var* binding is wrongly torn down /before/ the tail call;
fun will not see a value of 42 of the dynamic variable *dynamic-var*, as
if it were from this source code:

  ;; /not/ at the end of a function body:
  (let ((*dynamic-var* 42))
    ...)

  ;; at the end of function body:
  (fun arg) ;; tail call

So that's the bug. As soon as I viewed the disassembly listing
for the mistranslated function and saw the dframe close to before
a tail instruction, I knew what was going on.

The fix is simply to switch off the recursive tail position indicator in
a context where one or more dynamic variables are bound, so no tail
calls will be compiled there.

(I think, it is possible to have tail calls under dynamic binding, but
not with the deep binding strategy in this Lisp dialect/implementation.
Dynamic binding pushes new environment objects onto an environment stack
(separately from the VM dframe mechanism I talked about above) and so if
tail recursion were perpetrated anyway, that stack would keep growing;
it wouldn't be proper stackless recursion.)

BTW, this bug affects the existing self-tail-calls also, a released
feature. But there are no instances anywhere where a function binds a
special variable and then calls itself, expecting the recursive
self-call to see that binding. (Or no instances that break the
bootstrapping and test suite, in any case!) It was already in my
list of things to fix.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

[toc] | [prev] | [standalone]


Back to top | Article view | comp.compilers


csiph-web