


bison parser : retrieving values from recursive pattern

Started by: Archana Deshmukh <desharchana19@gmail.com>
First post: 2023-07-06 02:12 -0700
Last post: 2023-07-07 01:14 +0000
Articles: 2 (2 participants)



Contents

  bison parser : retrieving values from recursive pattern Archana Deshmukh <desharchana19@gmail.com> - 2023-07-06 02:12 -0700
    Re: bison parser : retrieving values from recursive pattern Kaz Kylheku <864-117-4973@kylheku.com> - 2023-07-07 01:14 +0000

#3498 — bison parser : retrieving values from recursive pattern

From: Archana Deshmukh <desharchana19@gmail.com>
Date: 2023-07-06 02:12 -0700
Subject: bison parser : retrieving values from recursive pattern
Message-ID: <23-07-001@comp.compilers>
Hello,

I have the following rule:

num :
| integer comma num
| integer closeroundbkt
| integer closesquarebkt


I need to parse data like:
efg @main(%data: r[(1, 2, 4, 4), float32], %param_1: or[(2, 1, 5, 5), float32], %param_2: or[(20), float32], %param_3: or[(5, 2, 5, 5), float32], %param_4: or[(50), float32], %param_5: or[(50, 80), float32], %param_6: Tensor[(50), float32], %param_7: or[(10, 50), float32], %param_8: or[(20), float32]

I also need to retrieve these values and store them in a list.

Retrieving and storing values for patterns like
| integer closeroundbkt
| integer closesquarebkt

is simple.

However, I cannot find a way to retrieve and store the recursive numbers from the pattern

| integer comma num

Sometimes there are 2 numbers, as in (50, 80); sometimes there are 4, as in (1, 2, 4, 4). How should I handle this?

Any suggestions are welcome.

Best Regards,
Archana Deshmukh
[For a list of numbers in parens I would do something like this:

parennumlist: '(' numlist ')' ;

numlist: integer
 | numlist ',' integer ;

For the bracketed lists:

bracketlist: '[' parennumlist ',' datatype ']' ;

datatype: FLOAT32 | ... whatever other types there are ... ;

The usual way to do a variable-length list is a recursive rule with one alternative
for a single item and another alternative that adds an item.  Any book about compiler design should
give advice on writing grammar rules, or my "flex & bison" has example grammars that
include lists. -John]
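To actually retrieve the values, each reduction of `numlist` can carry a pointer to a growing list as its semantic value. Below is a minimal sketch in C, assuming a `%union` with hypothetical members `ival` and `list`, and hypothetical helpers `intlist_new`, `intlist_push` and `intlist_free`; the Bison actions they are meant for are shown in the leading comment.

```c
/*
 * Hypothetical Bison declarations and actions using these helpers:
 *
 *   %union { long ival; struct intlist *list; }
 *   %token <ival> integer
 *   %type  <list> numlist parennumlist
 *
 *   parennumlist: '(' numlist ')'   { $$ = $2; } ;
 *   numlist: integer                { $$ = intlist_new(); intlist_push($$, $1); }
 *          | numlist ',' integer    { $$ = $1; intlist_push($$, $3); }
 *          ;
 */
#include <assert.h>
#include <stdlib.h>

struct intlist {
    long *items;   /* the numbers, in source order */
    size_t len;    /* how many are stored */
    size_t cap;    /* allocated capacity */
};

/* Create an empty list (used when the first integer is reduced). */
struct intlist *intlist_new(void)
{
    struct intlist *l = calloc(1, sizeof *l);
    if (l == NULL)
        abort();
    return l;
}

/* Append one value, doubling the buffer when it is full. */
void intlist_push(struct intlist *l, long v)
{
    assert(l != NULL);
    if (l->len == l->cap) {
        size_t ncap = l->cap ? l->cap * 2 : 4;
        long *p = realloc(l->items, ncap * sizeof *p);
        if (p == NULL)
            abort();
        l->items = p;
        l->cap = ncap;
    }
    l->items[l->len++] = v;
}

/* Release a list once its values have been consumed. */
void intlist_free(struct intlist *l)
{
    if (l != NULL) {
        free(l->items);
        free(l);
    }
}
```

Because `numlist` is left-recursive, Bison reduces `numlist: integer` once for the first number and then `numlist ',' integer` once for each further number, so `(50, 80)` produces a list of length 2 and `(1, 2, 4, 4)` one of length 4 with no special casing. Left recursion also keeps the parser stack from growing with the length of the list, which is why it is usually preferred over `integer ',' numlist`.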



#3499

From: Kaz Kylheku <864-117-4973@kylheku.com>
Date: 2023-07-07 01:14 +0000
Message-ID: <23-07-002@comp.compilers>
In reply to: #3498
On 2023-07-06, Archana Deshmukh <desharchana19@gmail.com> wrote:
> Hello,
>
> I have a following rule
>
> num :
>| integer comma num
>| integer closeroundbkt
>| integer closesquarebkt
>

Recognizing close brackets in a different rule from the open ones is
not absolutely off the table, but it's a code smell.

Consider a nice grammar like

  list : '(' items ')'
       | '(' ')'
       | '[' items ']'
       | '[' ']'
       ;

  items : items ',' item
        | item
        ;

  item : list | num | type | decl ;

  decl : keyword ':' oper list ;

  keyword : KW_main | KW_data | KW_param_1 ;

  type : TYPE_float32 | ... ;

  oper : OPER_r | OPER_or ;

I'd make all the symbols just one token type SYMBOL, and deal with it
all semantically later in the pipeline.

I.e. the over-generated grammar would allow nonsense like

  @data(%float32: foo[(1, 2, 3, 4), param_1], main: ...)

This would be checked for validity semantically; that the right
kinds of symbols are in the right positions in the shape.


  list : '(' items ')'
       | '(' ')'
       | '[' items ']'
       | '[' ']'
       ;

  items : items ',' item
        | item
        ;

  item : list | num | SYMBOL | decl ;

  decl : SYMBOL ':' SYMBOL list ;

Lisp teaches us that reserved keywords are largely inflexible
and counterproductive.
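With the over-generated grammar, each reduction can simply build a generic tree node and leave all checking for later. A minimal sketch in C; the `node` type, its `kind` tags and the constructor names are hypothetical, and `SYMBOL` values are kept as plain strings here for brevity (interning is discussed below).

```c
/*
 * Hypothetical Bison actions for the SYMBOL grammar, assuming NUM
 * carries a long and SYMBOL carries a char *:
 *
 *   list  : '(' items ')'   { $$ = $2; $$->bracket = '('; }
 *         | '(' ')'         { $$ = node_list('('); }
 *         | '[' items ']'   { $$ = $2; $$->bracket = '['; }
 *         | '[' ']'         { $$ = node_list('['); }
 *         ;
 *   items : items ',' item  { $$ = $1; node_append($$, $3); }
 *         | item            { $$ = node_list(0); node_append($$, $1); }
 *         ;
 *   decl  : SYMBOL ':' SYMBOL list
 *           { $$ = node_decl(node_symbol($1), node_symbol($3), $4); } ;
 */
#include <assert.h>
#include <stdlib.h>

enum node_kind { N_LIST, N_NUM, N_SYMBOL, N_DECL };

struct node {
    enum node_kind kind;
    char bracket;         /* N_LIST: '(' or '[' */
    long num;             /* N_NUM: the value */
    const char *name;     /* N_SYMBOL: the spelling (not copied) */
    struct node **kids;   /* N_LIST: elements; N_DECL: name, oper, list */
    size_t nkids;
};

static struct node *node_new(enum node_kind kind)
{
    struct node *n = calloc(1, sizeof *n);
    if (n == NULL)
        abort();
    n->kind = kind;
    return n;
}

struct node *node_num(long v)
{
    struct node *n = node_new(N_NUM);
    n->num = v;
    return n;
}

/* Stores the pointer only; the caller keeps the string alive. */
struct node *node_symbol(const char *name)
{
    struct node *n = node_new(N_SYMBOL);
    n->name = name;
    return n;
}

struct node *node_list(char bracket)
{
    struct node *n = node_new(N_LIST);
    n->bracket = bracket;
    return n;
}

/* Append a child; grows the array one slot at a time (fine for short lists). */
void node_append(struct node *parent, struct node *kid)
{
    assert(parent != NULL && kid != NULL);
    struct node **k = realloc(parent->kids, (parent->nkids + 1) * sizeof *k);
    if (k == NULL)
        abort();
    k[parent->nkids++] = kid;
    parent->kids = k;
}

/* decl : SYMBOL ':' SYMBOL list  becomes  DECL(name, oper, list) */
struct node *node_decl(struct node *name, struct node *oper, struct node *list)
{
    struct node *n = node_new(N_DECL);
    node_append(n, name);
    node_append(n, oper);
    node_append(n, list);
    return n;
}
```

The parser never decides whether `float32` belongs in a given slot; it only records shape. Whether a `DECL` has the expected kinds of children is a separate pass over this tree.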

Make your SYMBOL objects interned, and give them a type like
"struct symbol *". Interned means that when the same symbol
occurs more than once, the parser returns the same pointer:

   SYMBOL { $$ = intern($1); } /* $1: copy of yytext made by the lexer */

The first time intern("foo") is called, it creates and returns
a symbol sym such that sym->name is "foo" (a strdup-ed copy).
The second time intern("foo") is called, it returns exactly
the same object!
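A minimal `intern` in C, assuming a fixed-size chained hash table keyed by the string. The table size, the FNV-1a string hash and the `next` field are illustrative choices, not something the post prescribes; `is_type` is the kind of property described further on.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct symbol {
    char *name;            /* private copy of the spelling */
    int is_type;           /* example of a global property */
    struct symbol *next;   /* next symbol in the same hash bucket */
};

#define NBUCKETS 256

static struct symbol *buckets[NBUCKETS];

/* 32-bit FNV-1a hash of a string; only used while interning. */
static uint32_t hash_str(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s != '\0') {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Return the unique symbol for name, creating it on first use. */
struct symbol *intern(const char *name)
{
    assert(name != NULL);
    size_t b = hash_str(name) % NBUCKETS;

    for (struct symbol *s = buckets[b]; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;              /* seen before: same pointer as last time */

    size_t len = strlen(name) + 1;
    struct symbol *s = calloc(1, sizeof *s);
    if (s == NULL || (s->name = malloc(len)) == NULL)
        abort();
    memcpy(s->name, name, len);    /* strdup-style private copy */
    s->next = buckets[b];
    buckets[b] = s;
    return s;
}
```

String comparison happens only inside `intern`, roughly once per lexeme; everything downstream compares `struct symbol *` values. Symbols are never freed in this sketch, on the assumption that the symbol table lives as long as the program.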

In your program you can have initialization like this:

  struct symbol *float32_s;

  void global_init(void)
  {
     float32_s = intern("float32");

     ...
  }

Then when the parser sees float32, it will produce
the same pointer.

The upshot is that you never have to compare strings outside of intern.
To check whether x is the float32 symbol, you just use
the == operator:

  void foo(struct symbol *x)
  {
    if (x == float32_s) {
      // we are looking at the float32 symbol

    }

  }

Because symbols are just pointers, they are also fast to hash.
A hash table which maps symbols to other things just has
to hash the 4- or 8-byte pointer, not the string. This can
be done in a few bit operations.
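A sketch of such a pointer hash in C, assuming a table of `2^bits` buckets. The right shift discards low address bits that allocation alignment usually leaves at zero, and the multiply by a 64-bit golden-ratio constant (Fibonacci hashing) spreads the rest into the high bits; the constant and the shift of 3 are conventional choices, not taken from the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map a symbol pointer to a bucket index in a table of 2^bits buckets.
 * Only the address is hashed; the symbol's name is never read. */
static size_t ptr_hash(const void *p, unsigned bits)
{
    assert(bits > 0 && bits <= 32);
    uint64_t x = (uint64_t)(uintptr_t)p;
    x >>= 3;                               /* drop typical alignment zeros */
    x *= UINT64_C(0x9E3779B97F4A7C15);     /* Fibonacci hashing multiplier */
    return (size_t)(x >> (64 - bits));     /* keep the top `bits` bits */
}
```

The result is a shift, a multiply and a shift, independent of how long the symbol's name is, and the same pointer always lands in the same bucket.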

Important global properties about symbols can be stored
in the struct symbol itself. For instance float32 is
a type, so there can be a sym->is_type property,
which is true for float32. Then you can easily check
whether some list has a type symbol in a certain position.
First check there is a symbol and if so, that it is
one with the is_type property true.
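A sketch of that check in C, assuming a minimal hypothetical node type for parsed list elements (a `kind` tag plus either a number, a symbol pointer or nested elements) and a `struct symbol` whose `is_type` flag was set during initialization. The hypothetical `valid_shape_decl` accepts a shape like `[(1, 2, 4, 4), float32]`: a bracketed list of exactly two elements, the first a parenthesized list of numbers, the second a symbol whose `is_type` is true.

```c
#include <assert.h>
#include <stddef.h>

struct symbol {
    const char *name;
    int is_type;                 /* set once at startup, e.g. for float32 */
};

enum kind { K_NUM, K_SYMBOL, K_LIST };

struct node {
    enum kind kind;
    char bracket;                /* K_LIST: '(' or '[' */
    long num;                    /* K_NUM */
    struct symbol *sym;          /* K_SYMBOL */
    const struct node **kids;    /* K_LIST: elements */
    size_t nkids;
};

/* First check there is a symbol, then that it has the is_type property. */
static int is_type_symbol(const struct node *n)
{
    return n->kind == K_SYMBOL && n->sym->is_type;
}

/* Accept [ (num, ...), <type symbol> ] and reject anything else. */
int valid_shape_decl(const struct node *n)
{
    assert(n != NULL);
    if (n->kind != K_LIST || n->bracket != '[' || n->nkids != 2)
        return 0;
    const struct node *dims = n->kids[0];
    if (dims->kind != K_LIST || dims->bracket != '(')
        return 0;
    for (size_t i = 0; i < dims->nkids; i++)
        if (dims->kids[i]->kind != K_NUM)
            return 0;
    return is_type_symbol(n->kids[1]);
}
```

With this, the nonsense example `foo[(1, 2, 3, 4), param_1]` parses fine but is rejected here, because `param_1` is a symbol without the `is_type` property.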




