Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.compilers > #3499

Re: bison parser : retrieving values from recursive pattern

From Kaz Kylheku <864-117-4973@kylheku.com>
Newsgroups comp.compilers
Subject Re: bison parser : retrieving values from recursive pattern
Date 2023-07-07 01:14 +0000
Organization Compilers Central
Message-ID <23-07-002@comp.compilers> (permalink)
References <23-07-001@comp.compilers>

Show all headers | View raw


On 2023-07-06, Archana Deshmukh <desharchana19@gmail.com> wrote:
> Hello,
>
> I have a following rule
>
> num :
>| integer comma num
>| integer closeroundbkt
>| integer closesquarebkt
>

Recognizing close brackets in a different rule from the open ones is
not absolutely off the table, but it's a code smell.

Consider a nice grammar like

  list : '(' items ')'
       | '(' ')'
       | '[' items ']'
       | '[' ']'

  items : items ',' item
        | item
        ;

  item : list | num | type | decl

  decl : keyword ':' oper list

  keyword : KW_main | KW_data | KW_param_1

  type : TYPE_float32 | ...

  oper : OPER_r | OPER_or

I'd make all the symbols just one token type SYMBOL, and deal with it
all semantically later in the pipeline.

I.e. the over-generated grammar would allow nonsense like

  @data(%float32: foo[(1, 2, 3, 4), param_1], main: ...)

This would be checked for validity semantically; that the right
kinds of symbols are in the right positions in the shape.


  list : '(' items ')'
       | '(' ')'
       | '[' items ']'
       | '[' ']'

  items : items ',' item
        | item
        ;

  item : list | num | SYMBOL | decl

  decl : SYMBOL ':' SYMBOL list

Lisp teaches us that reserved keywords are largely inflexible
and counterproductive.

Make your SYMBOl objects interned, and give them a type like
"struct symbol *". Interned means that when the same symbol
occurs more than once, the parser returns the same pointer:

   SYMBOL { $$ = intern($1); } /* $1 is the yytext lexeme */

The first time intern("foo") is called it creates and return
s a symbol sym such that sym->name is foo (a strdup-ed copy)
The second time intern("foo") is called, it returns exactly
the same object!

In your program you can have initialization like this:

  struct symbol *float32_s;

  void global_init(void)
  {
     float32_s = intern("float32");

     ...
  }

Then when the parser sees float32, it will produce
the same pointer.

The upshot is that you never have to compare strings.
If you want to check, is x the float32 symbol, you just use
the == operator;

  void foo(struct symbol *x)
  {
    if (X == float32_s) {
      // we are looking at the float32 symbol

    }

  }

Because symbols are just pointers, they are also fast to hash.
A hash table which maps symbols to other things just has
to hash the 4 or 8 byte pointer, not the string. This can
be done in a few bit operations.

Important global properties about symbols can be stored
in the struct symbol itself. For instance float32 is
a type, so there can be a sym->is_type property,
which is true for float32. Then you can easily check
whether some list has a type symbol in a certain position.
First check there is a symbol and if so, that it is
one with the is_type property true.

Back to comp.compilers | Previous | NextPrevious in thread | Find similar


Thread

bison parser : retrieving values from recursive pattern Archana Deshmukh <desharchana19@gmail.com> - 2023-07-06 02:12 -0700
  Re: bison parser : retrieving values from recursive pattern Kaz Kylheku <864-117-4973@kylheku.com> - 2023-07-07 01:14 +0000

csiph-web