Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.compilers > #2642

Re: TeX syntax?

From Rock Brentwood <rockbrentwood@gmail.com>
Newsgroups comp.compilers
Subject Re: TeX syntax?
Date 2021-04-04 14:08 -0500
Organization Compilers Central
Message-ID <21-04-002@comp.compilers> (permalink)
References <07-02-024@comp.compilers>

Show all headers | View raw


[ This is a followup to a thread from 2007. ]


>I've looked high and low without success. Where can i find
>something resembling the BNF of Knuth's TeX typesetting syntax?

It's in the file weave.web, section 14.

The syntax for TeX was written as a context-sensitive translation
grammar, suitable for a streaming translator, rather than a
context-free grammar. It may be possible to convert it to one (either
directly or as a context-free enveloping grammar with semantic
constraints). That's a matter that may be worth looking into. But in
its present form, there is no tree-building required or involved: it
can stream. The distinction is analogous to that between SSDT's versus
SDT's ([S]SDT = [simple-]syntax-directed translations). SSDT's can be
streamed, SDT's require stacking or treeing values and, in effect, SDT
= SSDT + value-stacking/treeing.

TeX is written in Web which is essentially Pascal + hyper-linked
comments. It is also in C-Web on the main TeX distribution site, which
is C + hyper-linked comments. They *can* be converted directly to more
normal programs with the comments embedded. I did so in the local
versions of my older MiKTeX distribution, but haven't
regression-tested it yet - since I haven't established a working
baseline yet to work off of.

The syntax is in - and an essential part - of the weave.web file. In detail:
Section 14.1 describes the framework used for the syntax
Section 14.2 lists the "category codes" used
Section 14.3 lists additional post-processed lexical units used
Section 14.4 lists describes a processing layer from lexical units to "scraps"
Section 14.5 contains the productions for the context sensitive grammar

Section 15 implements the parser; the most important routine being
translate_cases() (its name in the C-Web file) - as a master "switch"
statement (or "case" statement in Pascal) in section 15.7.

By the way the "open" case (its subcases are in 15.19), "math" subcase
(its sub-subcases iare in 15.20), "var_head" sub-subcase has a bug in
it. The "intro" sub-sub-subcase listed a transition to rule 31,
instead of to rule 33. (I want my money Knuth! :))

I believe it's possible to convert it all to a context-free grammar,
albeit with the possible inclusion of a few semantic constraints. Why
Knuth chose to write everything this way - as borderline obfuscated
code that cannot be scaled upwards or sideways or integrated in other
existing code - is beyond me. But it is not maintainable, and
heavily-laden with Technical Debt; notably, its *critical* reliance on
the dated assumption that the Future would be Pascal, along with all
the other assumptions and - more importantly - the now-unnecessary
restrictions that came out of that.

Much of the very design of the entire Web framework's very conception
and design was premised on the assumed necessity of those
restrictions; and the whole thing can be done on a much simpler
foundation, when remade in more up-to-date terms (relatively speaking)
*natively* in C or C++. Among other things, there isn't a need for any
Web-like framework. You can just simply use ordinary comments. I know,
because I did so: I rewrote the entire set of Web files in my local
copy doing just that. When a baseline is established and it is
regression-validated I'll put a copy up on GitHub.

A follow-up to the additional comments at the end of the article:
>Knuths TeX book is an abomination, describing lexing and parsing
>as mouth, gullet and stomach nonsense.

I know. It's literally a woven and tangled mess - both the book and the code.

>[Well, he invented most of what we know about parsing, he gets to
>explain it any way he wants. Chapters 7 and 8 describe the syntax
>operationally. -John]

Discovery. Not invention. Mathematics is not invented, it is
discovered (and in this case: only a partial and incomplete discovery).

And that, too, is a complete tangle that we had to remake from bottom
up. Now, finally with recent publications [2-5] establishing the
foundations for the new algebraic framework ... along with another,
currently in submission, that may come out in 2021, for the remaking
alluded to in [3] of the 1963 algebraic formulation by Chomsky and
Schuetzenberger [1] that lies at the foundation of this all, we're now
finally in a position to refactor both the theory itself and
everything that's based on it or is an application of it; literally
remaking the entire stack from bottom up.

References:
[1] Chomsky, N., Schuetzenberger, M.: "The algebraic theory of context
free languages". In: Braffort, P., Hirschberg, D. (eds.) Computer
Programming and Formal Systems, pp. 118=E2=80=93161. North-Holland, Amsterdam
(1963)
[2] H. Lei=C3=9F et al: "C-dioids and =CE=BC-continuous Chomsky algebras". In:
Desharnais, J., et al. (eds.) RAMiCS 2018. LNCS, vol. 11194, pp.
21=E2=80=9336. Springer, Cham (2018)
[3] M. Hopkins et al: "Coequalizers and Tensor Products for Continuous
Idempotent Semirings". In: Desharnais, J., et al. (eds.) RAMiCS 2018.
LNCS, vol. 11194, pp. 37-52. Springer, Cham (2018)
[4] M.Hopkins: "The algebraic approach I: the algebraization of the
Chomsky hierarchy". In: Berghammer, R., M=C3=B6ller, B., Struth, G. (eds.)
RelMiCS 2008. LNCS, vol. 4988, pp. 155=E2=80=93172. Springer, Heidelberg
(2008).
[5] N.B.B. Grathwohl et al: "Infinitary axiomatization of the
equational theory of context-free languages". In: Baelde, D., Carayol,
A. (eds.) Fixed Points in Computer Science (FICS 2013). EPTCS, vol.
126, pp. 44=E2=80=9355 (2013)

Back to comp.compilers | Previous | NextNext in thread | Find similar


Thread

Re: TeX syntax? Rock Brentwood <rockbrentwood@gmail.com> - 2021-04-04 14:08 -0500
  Re: TeX syntax? gah4 <gah4@u.washington.edu> - 2021-04-05 04:21 -0700
    Re: TeX syntax? gah4 <gah4@u.washington.edu> - 2021-04-05 13:29 -0700
    Re: TeX syntax? ... In-stream/parallel parsing & intelligent handling of macros Rock Brentwood <rockbrentwood@gmail.com> - 2021-04-09 15:18 -0700
      Re: macros of yore, was TeX syntax? gah4 <gah4@u.washington.edu> - 2021-04-09 18:39 -0700

csiph-web