Path: csiph.com!1.us.feeder.erje.net!3.us.feeder.erje.net!feeder.erje.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: luser droog Newsgroups: comp.compilers Subject: Re: Using a C struct to represent a node in a parse tree ... how many fields in the struct? Date: Wed, 27 Apr 2022 21:22:51 -0700 (PDT) Organization: Compilers Central Lines: 104 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <22-04-023@comp.compilers> References: <22-04-002@comp.compilers> <22-04-006@comp.compilers> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="67869"; mail-complaints-to="abuse@iecc.com" Keywords: parse, analysis Posted-Date: 28 Apr 2022 00:30:34 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com In-Reply-To: <22-04-006@comp.compilers> Xref: csiph.com comp.compilers:2989 On Friday, April 8, 2022 at 4:09:26 PM UTC-5, Kaz Kylheku wrote: > On 2022-04-08, Roger L Costello wrote: > > Hi Folks, > > > > The compiler book [1] that I am reading says this: > > > > The most obvious way to represent the information gained from lexical and > > syntax analysis is as a tree along the lines of the parse tree. In C this is > > suitably handled using a simple struct for each node: > > > > struct node > > { > > int nodetype ; > > struct node *field1 ; > > struct node *field2 ; > > struct node *field3 ; > > struct node *field4 ; > > struct node *field5 ; > > } ; > > > > But, but, but, ..what if a node requires more than 5 fields; how is that > > handled? > The fields are all nodes so you can easily have a linked list: > > struct node > { > int nodetype; > struct node *car; > struct node *cdr; > } > > The Lisp family of languages handle all syntax using nothing but binary > cells like this, including nodes with arbitrary numbers of arguments. > This is mostly how I do it. I've just started a new draft in C, so it's the appropriate size (and scope) to share as an example. One trick I've found useful in previous versions is the extra 'data' member in the symbol struct. If the output of the tokenizer is a stream of symbols, then the data for each symbol can be something useful like the original string representation and/or file position. #define PC10OBJECT_H typedef union object object; typedef object list; typedef object suspension; typedef object parser; typedef object boolean; typedef object operator; typedef operator predicate; typedef object fSuspension( object ); typedef object fParser( object, list ); typedef object fOperator( object, object ); typedef boolean fPredicate( object, object ); typedef object fBinaryOperator( object, object ); typedef enum { INVALID, INT, LIST, SUSPENSION, PARSER, OPERATOR, SYMBOL, STRING, VOID } tag; union object { tag t; struct { tag t; int i; } Int; struct { tag t; struct list *ref; } List; struct { tag t; struct suspension *ref; } Suspension; struct { tag t; struct parser *ref; } Parser; struct { tag t; struct operator *ref; } Operator; struct { tag t; struct symbol *ref; } Symbol; struct { tag t; struct string *ref; } String; struct { tag t; void *ptr; } Void; }; struct list { object a, b; // or car and cdr if you like }; struct suspension { object env; // an association list of (, ) pairs fSuspension *f; }; struct parser { object env; fParser *f; }; struct operator { object env; fOperator *f; }; struct symbol { int code; char *printname; object data; }; struct string { char *ptr; int disposable; // if yes, the GC should call free(ptr) }; extern boolean T_, NIL_;