Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: Kaz Kylheku <864-117-4973@kylheku.com> Newsgroups: comp.compilers Subject: Re: bison parser : retrieving values from recursive pattern Date: Fri, 07 Jul 2023 01:14:04 -0000 Organization: Compilers Central Sender: johnl@iecc.com Approved: comp.compilers@iecc.com Message-ID: <23-07-002@comp.compilers> References: <23-07-001@comp.compilers> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="50127"; mail-complaints-to="abuse@iecc.com" Keywords: yacc, design Posted-Date: 06 Jul 2023 21:17:02 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:3499 On 2023-07-06, Archana Deshmukh wrote: > Hello, > > I have a following rule > > num : >| integer comma num >| integer closeroundbkt >| integer closesquarebkt > Recognizing close brackets in a different rule from the open ones is not absolutely off the table, but it's a code smell. Consider a nice grammar like list : '(' items ')' | '(' ')' | '[' items ']' | '[' ']' items : items ',' item | item ; item : list | num | type | decl decl : keyword ':' oper list keyword : KW_main | KW_data | KW_param_1 type : TYPE_float32 | ... oper : OPER_r | OPER_or I'd make all the symbols just one token type SYMBOL, and deal with it all semantically later in the pipeline. I.e. the over-generated grammar would allow nonsense like @data(%float32: foo[(1, 2, 3, 4), param_1], main: ...) This would be checked for validity semantically; that the right kinds of symbols are in the right positions in the shape. list : '(' items ')' | '(' ')' | '[' items ']' | '[' ']' items : items ',' item | item ; item : list | num | SYMBOL | decl decl : SYMBOL ':' SYMBOL list Lisp teaches us that reserved keywords are largely inflexible and counterproductive. Make your SYMBOl objects interned, and give them a type like "struct symbol *". Interned means that when the same symbol occurs more than once, the parser returns the same pointer: SYMBOL { $$ = intern($1); } /* $1 is the yytext lexeme */ The first time intern("foo") is called it creates and return s a symbol sym such that sym->name is foo (a strdup-ed copy) The second time intern("foo") is called, it returns exactly the same object! In your program you can have initialization like this: struct symbol *float32_s; void global_init(void) { float32_s = intern("float32"); ... } Then when the parser sees float32, it will produce the same pointer. The upshot is that you never have to compare strings. If you want to check, is x the float32 symbol, you just use the == operator; void foo(struct symbol *x) { if (X == float32_s) { // we are looking at the float32 symbol } } Because symbols are just pointers, they are also fast to hash. A hash table which maps symbols to other things just has to hash the 4 or 8 byte pointer, not the string. This can be done in a few bit operations. Important global properties about symbols can be stored in the struct symbol itself. For instance float32 is a type, so there can be a sym->is_type property, which is true for float32. Then you can easily check whether some list has a type symbol in a certain position. First check there is a symbol and if so, that it is one with the is_type property true.