From: Rock Brentwood
Newsgroups: comp.compilers
Subject: Re: TeX syntax?
Date: Sun, 4 Apr 2021 14:08:30 -0500
Message-ID: <21-04-002@comp.compilers>
References: <07-02-024@comp.compilers>
Keywords: syntax

[ This is a followup to a thread from 2007. ]

>I've looked high and low without success. Where can i find
>something resembling the BNF of Knuth's TeX typesetting syntax?

It's in the file weave.web, section 14.

The syntax for TeX was written as a context-sensitive translation grammar,
suitable for a streaming translator, rather than a context-free grammar. It
may be possible to convert it to one (either directly or as a context-free
enveloping grammar with semantic constraints). That's a matter that may be
worth looking into. But in its present form, there is no tree-building
required or involved: it can stream.

The distinction is analogous to that between SSDTs and SDTs ([S]SDT =
[simple-]syntax-directed translations). SSDTs can be streamed, SDTs require
stacking or treeing values and, in effect, SDT = SSDT + value-stacking/treeing.

TeX is written in Web which is essentially Pascal + hyper-linked comments. It
is also in C-Web on the main TeX distribution site, which is C + hyper-linked
comments. They *can* be converted directly to more normal programs with the
comments embedded.
I did so in the local versions of my older MiKTeX distribution, but haven't
regression-tested it yet, since I haven't established a working baseline to
work off of.

The syntax is in, and is an essential part of, the weave.web file. In detail:
  Section 14.1 describes the framework used for the syntax
  Section 14.2 lists the "category codes" used
  Section 14.3 lists additional post-processed lexical units used
  Section 14.4 describes a processing layer from lexical units to "scraps"
  Section 14.5 contains the productions for the context-sensitive grammar

Section 15 implements the parser; the most important routine being
translate_cases() (its name in the C-Web file) - as a master "switch"
statement (or "case" statement in Pascal) in section 15.7.

By the way, the "open" case (its subcases are in 15.19), "math" subcase (its
sub-subcases are in 15.20), "var_head" sub-subcase has a bug in it. The
"intro" sub-sub-subcase listed a transition to rule 31, instead of to rule 33.
(I want my money, Knuth! :))

I believe it's possible to convert it all to a context-free grammar, albeit
with the possible inclusion of a few semantic constraints.

Why Knuth chose to write everything this way - as borderline obfuscated code
that cannot be scaled upwards or sideways or integrated in other existing code
- is beyond me. But it is not maintainable, and it is heavily laden with
technical debt; notably, its *critical* reliance on the dated assumption that
the future would be Pascal, along with all the other assumptions and - more
importantly - the now-unnecessary restrictions that came out of that.

Much of the Web framework's conception and design was premised on the assumed
necessity of those restrictions; and the whole thing can be done on a much
simpler foundation, when remade in more up-to-date terms (relatively speaking)
*natively* in C or C++. Among other things, there isn't a need for any
Web-like framework. You can simply use ordinary comments.
I know, because I did so: I rewrote the entire set of Web files in my local
copy doing just that. When a baseline is established and it is
regression-validated I'll put a copy up on GitHub.

A follow-up to the additional comments at the end of the article:

>Knuths TeX book is an abomination, describing lexing and parsing
>as mouth, gullet and stomach nonsense.

I know. It's literally a woven and tangled mess - both the book and the code.

>[Well, he invented most of what we know about parsing, he gets to
>explain it any way he wants. Chapters 7 and 8 describe the syntax
>operationally. -John]

Discovery. Not invention. Mathematics is not invented, it is discovered (and
in this case: only a partial and incomplete discovery).

And that, too, is a complete tangle that we had to remake from the bottom up.
Now, with recent publications [2-5] establishing the foundations for the new
algebraic framework ... along with another, currently in submission, that may
come out in 2021, for the remaking alluded to in [3] of the 1963 algebraic
formulation by Chomsky and Schuetzenberger [1] that lies at the foundation of
all this, we're finally in a position to refactor both the theory itself and
everything that's based on it or is an application of it; literally remaking
the entire stack from the bottom up.

References:

[1] Chomsky, N., Schuetzenberger, M.: "The algebraic theory of context free
    languages". In: Braffort, P., Hirschberg, D. (eds.) Computer Programming
    and Formal Systems, pp. 118–161. North-Holland, Amsterdam (1963)

[2] H. Leiß et al: "C-dioids and μ-continuous Chomsky algebras". In:
    Desharnais, J., et al. (eds.) RAMiCS 2018. LNCS, vol. 11194, pp. 21–36.
    Springer, Cham (2018)

[3] M. Hopkins et al: "Coequalizers and Tensor Products for Continuous
    Idempotent Semirings". In: Desharnais, J., et al. (eds.) RAMiCS 2018.
    LNCS, vol. 11194, pp. 37–52. Springer, Cham (2018)

[4] M. Hopkins: "The algebraic approach I: the algebraization of the Chomsky
    hierarchy". In: Berghammer, R., Möller, B., Struth, G. (eds.) RelMiCS
    2008. LNCS, vol. 4988, pp. 155–172. Springer, Heidelberg (2008)

[5] N.B.B. Grathwohl et al: "Infinitary axiomatization of the equational
    theory of context-free languages". In: Baelde, D., Carayol, A. (eds.)
    Fixed Points in Computer Science (FICS 2013). EPTCS, vol. 126, pp. 44–55
    (2013)
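
P.S. To make the SSDT/SDT distinction from the top of this post concrete, here
is a minimal sketch in Python. It is not taken from weave.web or from any of
the papers above. The toy expression grammar and every name in it (tokenize,
Cursor, stream_postfix, swapped_postfix) are made up for illustration, and
"simple" is assumed to mean the usual textbook sense: the nonterminals appear
in the same order on the input and output side of every rule. The first
translator (E -> E + T gives "E T +") only ever emits, so it can stream. The
second (E -> E + T gives "T E +", right operand first) has to hold the
translation of E as a value until T is done.

```python
import re

def tokenize(text):
    """Split a toy arithmetic expression into names, numbers, operators, parens."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()]", text)

class Cursor:
    """One-token-lookahead reader over a token list (hypothetical helper)."""
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

NOT_AN_OPERAND = {"+", "-", "*", "/", ")"}

def stream_postfix(text, emit):
    """Simple SDT: infix -> postfix. Actions only call emit(); nothing is returned."""
    cur = Cursor(tokenize(text))

    def expr():
        term()
        while cur.peek() in ("+", "-"):
            op = cur.take()
            term()
            emit(op)              # operator goes out after both operands

    def term():
        factor()
        while cur.peek() in ("*", "/"):
            op = cur.take()
            factor()
            emit(op)

    def factor():
        tok = cur.take()
        if tok == "(":
            expr()
            if cur.take() != ")":
                raise SyntaxError("expected ')'")
        elif tok in NOT_AN_OPERAND:
            raise SyntaxError("unexpected " + tok)
        else:
            emit(tok)             # operand goes out the moment it is read

    expr()
    if cur.peek() is not None:
        raise SyntaxError("trailing input")

def swapped_postfix(text):
    """Non-simple SDT: right operand first. Sub-translations are carried as values."""
    cur = Cursor(tokenize(text))

    def expr():
        left = term()
        while cur.peek() in ("+", "-"):
            op = cur.take()
            right = term()
            left = right + left + [op]   # reordering forces us to hold 'left'
        return left

    def term():
        left = factor()
        while cur.peek() in ("*", "/"):
            op = cur.take()
            right = factor()
            left = right + left + [op]
        return left

    def factor():
        tok = cur.take()
        if tok == "(":
            inner = expr()
            if cur.take() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if tok in NOT_AN_OPERAND:
            raise SyntaxError("unexpected " + tok)
        return [tok]

    result = expr()
    if cur.peek() is not None:
        raise SyntaxError("trailing input")
    return result
```

With out = [] and stream_postfix("a + b * (c - d)", out.append), out ends up
as a b c d - * +, produced one token at a time. swapped_postfix on the same
input returns d c - b * a +, but only once the whole expression has been read.
The recursion still uses a control stack in both cases; the difference is
only whether translated values have to ride along on it.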