From: "matt.ti...@gmail.com"
Newsgroups: comp.compilers
Subject: Re: The remarkable similarities between Flex/Lex and XSLT
Date: Sat, 25 Jun 2022 09:20:54 -0700 (PDT)
Organization: Compilers Central
Message-ID: <22-06-078@comp.compilers>
References: <22-06-073@comp.compilers>
Keywords: lex, history
Posted-Date: 25 Jun 2022 12:44:44 EDT

On Friday, 24 June 2022 at 09:00:44 UTC-4, Roger L Costello wrote:
> Hi Folks,
>
> XSLT is a language for processing XML documents.
>
> There are remarkable similarities between Flex/Lex and XSLT. Lex was created
> 47 years ago, long before XSLT. One wonders if some members of the XSLT 1.0
> Working Group were Lex users and were influenced by its concepts?

It's not really about a single tool like Lex.

Before XML there was SGML, which XML was supposed to "simplify". SGML
included a schema language (DTD), which defines the hierarchical structure
of a document using regular expressions over elements. There was also a
strange unnecessary constraint on these expressions called "ambiguity",
which *everybody* who wrote SGML software needed to understand, and so the
idea of applying formal language techniques to SGML was inevitable.

Long before XSLT, there were a variety of attempts to define languages that
would allow users to specify an automatic translation from SGML into
printed form.
Many of these languages were context-free grammars at their core, with
translation rules as actions. This is called "syntax-directed translation"
and was a well-known concept long before that.

With SGML, though, the problem of syntax-directed translation is different
than it is in other contexts, and more difficult in many ways. The basic
structures in the input are very easy to parse -- elements are delimited,
after all -- but the input was semantically marked-up text and the output
was a published document that had to follow all the ambiguously-defined
stylistic rules that people use when they actually do typography. This
meant that complicated grammars, over *element trees* instead of linear
text, and lots of other ideas, needed to be applied. Lots of companies put
a lot of work into it.

So by the time XSLT came around, everyone on the committee was already
familiar with a lot of this history from SGML processing, which was based
on a lot of work rooted in the same formal language theory that goes into
lexers and parsers, and that is why some of XSLT looks a lot like Lex.

Unfortunately, XSLT kind of sucks. When the standard was written, the
problem itself had not really been solved by industry in a really
acceptable way (and it still hasn't been!), and the W3C committee fell into
the trap of trying to innovate instead of codifying best practice.
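
For anyone who wants the two ideas above made concrete, here is a rough sketch in Python. It is not taken from any real SGML or XSLT tool. The element names, the CONTENT_MODELS table and the RULES table are all made up for illustration. It shows a DTD-style content model written as a regular expression over child element names, and a rule table that pairs an element name with an action, which is roughly the pattern/action shape that Lex rules and XSLT templates share.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models, in the spirit of DTD declarations such as
#   <!ELEMENT section (title, para+)>
# Each model is a regular expression over the names of the child elements
# (each name followed by one space).
CONTENT_MODELS = {
    "doc": re.compile(r"(section )+"),
    "section": re.compile(r"title (para )+"),
}


def valid(elem):
    """Check every element's children against its content model, if it has one."""
    model = CONTENT_MODELS.get(elem.tag)
    children = "".join(child.tag + " " for child in elem)
    if model is not None and not model.fullmatch(children):
        return False
    return all(valid(child) for child in elem)


# Hypothetical translation rules: element name -> action on the translated body.
# Compare a Lex rule (pattern, action) or an XSLT template (match, body).
RULES = {
    "doc": lambda body: body,
    "section": lambda body: body + "\n",
    "title": lambda body: body.upper() + "\n",
    "para": lambda body: "  " + body + "\n",
}


def translate(elem):
    """Translate children first, then apply the rule for this element.

    Text that follows a child element (its tail) is ignored in this sketch.
    """
    body = (elem.text or "").strip() + "".join(translate(c) for c in elem)
    return RULES[elem.tag](body)


doc = ET.fromstring(
    "<doc><section><title>Intro</title><para>Hello.</para></section></doc>"
)
if valid(doc):
    print(translate(doc), end="")
```

Real typographic output needs far more context than a per-element rule can see, which is the hard part described above. This only shows the basic shape.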