From: mjbishop@fastmail.com (matthew bishop)
Newsgroups: comp.compilers
Subject: alpha release "pp" pattern parsing language and machine
Date: Sat, 17 Aug 2019 21:45:17 +0000 (UTC)
Organization: Aioe.org NNTP Server
Message-ID: <19-08-001@comp.compilers>
Keywords: parse, available
Posted-Date: 18 Aug 2019 14:33:46 EDT

Hello,

I have written a small scripting language for parsing and compiling mainly context-free languages. I had an idea for parsing languages using a stack/tape (string array) combination that stays in synchronisation through "push" and "pop" commands. The idea seems to work.

For example at http://bumble.sourceforge.net/books/gh/eg/exp.tolisp.pss is a script that translates simple arithmetic expressions into a lisp-like syntax. Also, at http://bumble.sourceforge.net/books/gh/compilable.c.pss is a script which translates parse-scripts into compilable c code (so the scripts can be compiled to standalone executables).

The script language and virtual machine are implemented at http://bumble.sourceforge.net/books/gh/object/

The system reads the input stream one character at a time and constructs and compiles parse tokens. The machine and language were mainly inspired by "sed", both its strengths and weaknesses.

Here is a small snippet showing the relationship between an ebnf rule and some code in a parse-language script.

    # ebnf rule: commandset := commandset, command ;
    "commandset*command*" { clear; add "commandset*" ; push; }

The script snippet above implements the given ebnf rule. But the script language can also compile the "attributes" of the grammar.

Complete scripts can be written on the command line, as with sed. E.g.:

    pp -e 'read; [aeiou] { add "(vowel)"; } print; clear;' -i "abcde"

(output is) "a(vowel)bcde(vowel)"

I use the -i switch here to provide input because the "pp" tool is also a debugger at the moment (you can step through and view the compiled program and machine state) and so can't accept input from piped stdin. In the future I will separate "pp" into 2 tools: one which is a debugger and script interpreter, and another which just runs (interprets) the script. The executable "pp" currently includes a script interpreter and viewer/debugger.

The script language can implement itself (!). For example, http://bumble.sourceforge.net/books/gh/compile.pss contains a working implementation (compiler) of the script language written as a parse-script (it is boot-strapped by the "assembler" program /books/gh/asm.pp).

The idea uses a stack to maintain the parse tokens, the "tape" to maintain and compile the "attributes", and a "workspace" as a text accumulator to manipulate tokens and attributes.

This is a small idea, but I think it has potential. I would like to know what other people think about it. The idea is so simple, and seems so effective, that I doubt that no-one else has implemented it before.

This is an open source software project. The code is in an "alpha" stage. Useful scripts can be written, run, viewed, and debugged (with the "pp" executable) or compiled with the /books/gh/compilable.c.pss script. But the code needs to be reorganised (the struct Program object, for example, should not be a member variable of the struct Machine object). Also, there is a malloc segmentation fault bug that I need to track down.
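
To make the stack/tape/workspace idea above more concrete, here is a minimal sketch in Python. It is not the pp code and does not claim to match its internals: the class name Machine, its fields, and the helpers reduce_commandset and mark_vowels are hypothetical names chosen for illustration. It assumes that "push" moves one "*"-terminated token from the workspace onto the stack and advances the tape pointer, and that "pop" does the reverse, which is only my reading of "stays in synchronisation".

```python
class Machine:
    """Toy model of the stack / tape / workspace machine (hypothetical names)."""

    def __init__(self):
        self.workspace = ""   # text accumulator
        self.stack = []       # parse tokens, each ending in "*"
        self.tape = [""]      # attribute cells, one per stack position
        self.cell = 0         # index of the current tape cell

    def add(self, text):
        self.workspace += text

    def clear(self):
        self.workspace = ""

    def push(self):
        """Move the first workspace token onto the stack; advance the tape."""
        head, star, rest = self.workspace.partition("*")
        if not star:
            return False              # no complete token in the workspace
        self.stack.append(head + star)
        self.workspace = rest
        self.cell += 1
        if self.cell == len(self.tape):
            self.tape.append("")      # grow the tape on demand
        return True

    def pop(self):
        """Move the top stack token to the front of the workspace; rewind the tape."""
        if not self.stack:
            return False
        self.workspace = self.stack.pop() + self.workspace
        self.cell -= 1
        return True


def reduce_commandset(m):
    """Mimic: "commandset*command*" { clear; add "commandset*"; push; }"""
    m.pop()
    m.pop()
    if m.workspace == "commandset*command*":
        m.clear()
        m.add("commandset*")
    # Push back whatever tokens are in the workspace, reduced or not.
    while m.push():
        pass


def mark_vowels(text):
    """Mimic: read; [aeiou] { add "(vowel)"; } print; clear;"""
    out = []
    for ch in text:                    # read one character
        workspace = ch
        if workspace in "aeiou":       # [aeiou] class test
            workspace += "(vowel)"     # add
        out.append(workspace)          # print, then clear
    return "".join(out)


m = Machine()
m.add("commandset*command*")
while m.push():                        # two tokens go onto the stack
    pass
reduce_commandset(m)
print(m.stack, m.cell)                 # ['commandset*'] 1
print(mark_vowels("abcde"))            # a(vowel)bcde(vowel)
```

In this sketch the tape pointer always equals the stack depth, so the attribute for a token can be stored in the cell at the position the token occupies on the stack. Whether pp keeps the two in step in exactly this way is an assumption.
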
Also, I need to think of a good name for the system and debugger (or just leave it as "pp", for pattern parser).

I would appreciate any feedback or contributions to this project. I would also be interested to hear if anyone knows of a similar system that has already been implemented.

Regards,
Matthew