Path: csiph.com!1.us.feeder.erje.net!3.us.feeder.erje.net!feeder.erje.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: luser droog <luser.droog@gmail.com>
Newsgroups: comp.compilers
Subject: Re: Using a C struct to represent a node in a parse tree ... how many fields in the struct?
Date: Wed, 27 Apr 2022 21:22:51 -0700 (PDT)
Organization: Compilers Central
Lines: 104
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-04-023@comp.compilers>
References: <22-04-002@comp.compilers> <22-04-006@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="67869"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, analysis
Posted-Date: 28 Apr 2022 00:30:34 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-04-006@comp.compilers>
Xref: csiph.com comp.compilers:2989

On Friday, April 8, 2022 at 4:09:26 PM UTC-5, Kaz Kylheku wrote:
> On 2022-04-08, Roger L Costello <cost...@mitre.org> wrote:
> > Hi Folks,
> >
> > The compiler book [1] that I am reading says this:
> >
> > The most obvious way to represent the information gained from lexical and
> > syntax analysis is as a tree along the lines of the parse tree. In C this is
> > suitably handled using a simple struct for each node:
> >
> > struct node
> > {
> > int nodetype ;
> > struct node *field1 ;
> > struct node *field2 ;
> > struct node *field3 ;
> > struct node *field4 ;
> > struct node *field5 ;
> > } ;
> >
> > But, but, but, ..what if a node requires more than 5 fields; how is that
> > handled?
> The fields are all nodes so you can easily have a linked list:
>
> struct node
> {
> int nodetype;
> struct node *car;
> struct node *cdr;
> }
>
> The Lisp family of languages handle all syntax using nothing but binary
> cells like this, including nodes with arbitrary numbers of arguments.
>

This is mostly how I do it. I've just started a new draft in C, so it's the
appropriate size (and scope) to share as an example. One trick I've found
useful in previous versions is the extra 'data' member in the symbol struct.
If the output of the tokenizer is a stream of symbols, then the data for each
symbol can be something useful like the original string representation and/or
file position.


#define PC10OBJECT_H

typedef union object object;
typedef object list;
typedef object suspension;
typedef object parser;
typedef object boolean;
typedef object operator;
typedef operator predicate;
typedef object fSuspension( object );
typedef object fParser( object, list );
typedef object fOperator( object, object );
typedef boolean fPredicate( object, object );
typedef object fBinaryOperator( object, object );

typedef enum {
  INVALID, INT, LIST, SUSPENSION, PARSER, OPERATOR, SYMBOL, STRING, VOID
} tag;

union object { tag t;
      struct { tag t; int i; } Int;
      struct { tag t; struct list *ref; } List;
      struct { tag t; struct suspension *ref; } Suspension;
      struct { tag t; struct parser *ref; } Parser;
      struct { tag t; struct operator *ref; } Operator;
      struct { tag t; struct symbol *ref; } Symbol;
      struct { tag t; struct string *ref; } String;
      struct { tag t; void *ptr; } Void;
};

struct list {
  object a, b; // or car and cdr if you like
};

struct suspension {
  object env; // an association list of (<symbol>, <value>) pairs
  fSuspension *f;
};

struct parser {
  object env;
  fParser *f;
};

struct operator {
  object env;
  fOperator *f;
};

struct symbol {
  int code;
  char *printname;
  object data;
};

struct string {
  char *ptr;
  int disposable; // if yes, the GC should call free(ptr)
};

extern boolean T_, NIL_;