Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: Kaz Kylheku <480-992-1380@kylheku.com> Newsgroups: comp.compilers Subject: Re: Using a C struct to represent a node in a parse tree ... how many fields in the struct? Date: Fri, 8 Apr 2022 19:49:10 -0000 (UTC) Organization: A noiseless patient Spider Lines: 69 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <22-04-006@comp.compilers> References: <22-04-002@comp.compilers> Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="51733"; mail-complaints-to="abuse@iecc.com" Keywords: parse, design Posted-Date: 08 Apr 2022 17:09:23 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:2974 On 2022-04-08, Roger L Costello wrote: > Hi Folks, > > The compiler book [1] that I am reading says this: > > The most obvious way to represent the information gained from lexical and > syntax analysis is as a tree along the lines of the parse tree. In C this is > suitably handled using a simple struct for each node: > > struct node > { > int nodetype ; > struct node *field1 ; > struct node *field2 ; > struct node *field3 ; > struct node *field4 ; > struct node *field5 ; > } ; > > But, but, but, ..what if a node requires more than 5 fields; how is that > handled? The fields are all nodes so you can easily have a linked list: struct node { int nodetype; struct node *car; struct node *cdr; } The Lisp family of languages handle all syntax using nothing but binary cells like this, including nodes with arbitrary numbers of arguments. You can also also have an N-ary tree, compactly allocated, using the C struct hack (which is now official and called "flexible array member"): struct node { int nodetype; int nfields; struct node* fields[]; }; When the node is allocated, extra room is requested for required number of fields, which are then accessed as nodeptr->field[0], and so on. If it's ever necessary to dynamically grow that after a node has come into existence, and is referenced in multiple places in the program, the above representation has disadvantages. But you can easily switch to: struct node { int nodetype; int nfields; struct node** fields; }; where fields is a separately allocated array of pointers. The access expressions look the same. nodeptr->field[0] -- TXR Programming Language: http://nongnu.org/txr Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal