| Started by | Archana Deshmukh <desharchana19@gmail.com> |
|---|---|
| First post | 2023-07-06 02:12 -0700 |
| Last post | 2023-07-07 01:14 +0000 |
| Articles | 2 — 2 participants |
bison parser : retrieving values from recursive pattern Archana Deshmukh <desharchana19@gmail.com> - 2023-07-06 02:12 -0700
Re: bison parser : retrieving values from recursive pattern Kaz Kylheku <864-117-4973@kylheku.com> - 2023-07-07 01:14 +0000
| From | Archana Deshmukh <desharchana19@gmail.com> |
|---|---|
| Date | 2023-07-06 02:12 -0700 |
| Subject | bison parser : retrieving values from recursive pattern |
| Message-ID | <23-07-001@comp.compilers> |
Hello,
I have the following rule
num :
| integer comma num
| integer closeroundbkt
| integer closesquarebkt
I need to parse data like
efg @main(%data: r[(1, 2, 4, 4), float32], %param_1: or[(2, 1, 5, 5), float32], %param_2: or[(20), float32], %param_3: or[(5, 2, 5, 5), float32], %param_4: or[(50), float32], %param_5: or[(50, 80), float32], %param_6: Tensor[(50), float32], %param_7: or[(10, 50), float32], %param_8: or[(20), float32]
I also need to retrieve these values and store them in a list.
Retrieving and storing values for patterns like
| integer closeroundbkt
| integer closesquarebkt
is simple.
However, I am not able to find a way to retrieve and store recursive numbers from pattern
| integer comma num
Sometimes there are 2 numbers, as in (50, 80), and sometimes 4 numbers, as in (1, 2, 4, 4). How do I handle this?
Any suggestions are welcome.
Best Regards,
Archana Deshmukh
[For a list of numbers in parens I would do something like this:
parennumlist: '(' numlist ')' ;
numlist: integer
| numlist ',' integer ;
For the bracketed lists:
bracketlist: '[' parennumlist ',' datatype ']' ;
datatype: FLOAT32 | ... whatever other types there are ... ;
The usual way to do a variable length list is to make a recursive rule with one alternative
for a single item and another alternative that adds an item. Any book about compiler design should
give advice on writing grammar rules or my "flex & bison" has example grammars that
include lists. -John]
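To actually collect the numbers, each alternative of the list rule can carry an action that builds up a list value. The following is only a sketch in C: `struct intlist`, `intlist_new`, `intlist_append` and `intlist_free` are hypothetical helpers, not part of bison, and the grammar actions in the trailing comment assume a `%union` with `ival` and `list` members.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable list of ints; the names are illustrative only. */
struct intlist {
    int *vals;
    size_t len, cap;
};

/* Create a list holding one value: meant for the base case `numlist: integer`. */
struct intlist *intlist_new(int first)
{
    struct intlist *l = malloc(sizeof *l);
    if (l == NULL)
        abort();
    l->cap = 4;
    l->len = 1;
    l->vals = malloc(l->cap * sizeof *l->vals);
    if (l->vals == NULL)
        abort();
    l->vals[0] = first;
    return l;
}

/* Append one value, doubling the buffer when it is full:
   meant for the recursive case `numlist ',' integer`. */
struct intlist *intlist_append(struct intlist *l, int v)
{
    assert(l != NULL);
    if (l->len == l->cap) {
        int *grown = realloc(l->vals, 2 * l->cap * sizeof *grown);
        if (grown == NULL)
            abort();
        l->vals = grown;
        l->cap *= 2;
    }
    l->vals[l->len++] = v;
    return l;
}

void intlist_free(struct intlist *l)
{
    if (l != NULL) {
        free(l->vals);
        free(l);
    }
}

/*
 * Assumed declarations and actions in the .y file:
 *
 *   %union { int ival; struct intlist *list; }
 *   %token <ival> integer
 *   %type  <list> numlist parennumlist
 *
 *   numlist: integer               { $$ = intlist_new($1); }
 *          | numlist ',' integer   { $$ = intlist_append($1, $3); }
 *          ;
 *   parennumlist: '(' numlist ')'  { $$ = $2; }
 *               ;
 */
```

With a left-recursive rule like this, the values arrive in source order, so (50, 80) and (1, 2, 4, 4) both end up as lists of the right length without any special casing.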
| From | Kaz Kylheku <864-117-4973@kylheku.com> |
|---|---|
| Date | 2023-07-07 01:14 +0000 |
| Message-ID | <23-07-002@comp.compilers> |
| In reply to | #3498 |
On 2023-07-06, Archana Deshmukh <desharchana19@gmail.com> wrote:
> Hello,
>
> I have a following rule
>
> num :
>| integer comma num
>| integer closeroundbkt
>| integer closesquarebkt
>
Recognizing close brackets in a different rule from the open ones is
not absolutely off the table, but it's a code smell.
Consider a nice grammar like
list : '(' items ')'
     | '(' ')'
     | '[' items ']'
     | '[' ']'
     ;
items : items ',' item
      | item
      ;
item : list | num | type | decl ;
decl : keyword ':' oper list ;
keyword : KW_main | KW_data | KW_param_1 ;
type : TYPE_float32 | ... ;
oper : OPER_r | OPER_or ;
I'd make all the symbols just one token type SYMBOL, and deal with it
all semantically later in the pipeline.
I.e. the over-generated grammar would allow nonsense like
@data(%float32: foo[(1, 2, 3, 4), param_1], main: ...)
This would be checked for validity semantically; that the right
kinds of symbols are in the right positions in the shape.
list : '(' items ')'
     | '(' ')'
     | '[' items ']'
     | '[' ']'
     ;
items : items ',' item
      | item
      ;
item : list | num | SYMBOL | decl ;
decl : SYMBOL ':' SYMBOL list ;
Lisp teaches us that reserved keywords are largely inflexible
and counterproductive.
Make your SYMBOL objects interned, and give them a type like
"struct symbol *". Interned means that when the same symbol
occurs more than once, the parser returns the same pointer:
SYMBOL { $$ = intern($1); } /* $1 is the yytext lexeme */
The first time intern("foo") is called, it creates and returns
a symbol sym such that sym->name is "foo" (a strdup-ed copy).
The second time intern("foo") is called, it returns exactly
the same object!
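A minimal sketch of such an intern function, under some assumptions: the `struct symbol` layout and the global `symtab` list are made up for illustration, and a linear search stands in for the hash table that a real implementation would probably key on the name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout; a real symbol would likely carry more fields. */
struct symbol {
    char *name;           /* private copy of the lexeme */
    struct symbol *next;  /* next entry in the global symbol list */
};

static struct symbol *symtab;  /* all symbols interned so far */

/* Return the unique symbol for name, creating it on first use. */
struct symbol *intern(const char *name)
{
    struct symbol *sym;

    assert(name != NULL);

    /* Already interned? Hand back the existing object. */
    for (sym = symtab; sym != NULL; sym = sym->next)
        if (strcmp(sym->name, name) == 0)
            return sym;

    /* First occurrence: allocate the symbol and copy the name. */
    sym = malloc(sizeof *sym);
    if (sym == NULL)
        abort();
    sym->name = malloc(strlen(name) + 1);
    if (sym->name == NULL)
        abort();
    strcpy(sym->name, name);

    sym->next = symtab;
    symtab = sym;
    return sym;
}
```

The string comparison happens only here, at intern time; everything downstream of the lexer can compare pointers.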
In your program you can have initialization like this:
struct symbol *float32_s;

void global_init(void)
{
    float32_s = intern("float32");
    ...
}
Then when the parser sees float32, it will produce
the same pointer.
The upshot is that you never have to compare strings.
If you want to check, is x the float32 symbol, you just use
the == operator:
void foo(struct symbol *x)
{
    if (x == float32_s) {
        // we are looking at the float32 symbol
    }
}
Because symbols are just pointers, they are also fast to hash.
A hash table which maps symbols to other things just has
to hash the 4 or 8 byte pointer, not the string. This can
be done in a few bit operations.
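As a rough illustration of hashing the pointer rather than the string, here is one possible hash in C. The bucket count `NBUCKETS`, the function name `hash_ptr` and the particular shift amounts are arbitrary choices for this sketch, not something prescribed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64  /* assumed table size; must be a power of two */

/* Map a pointer to a bucket index using a couple of shifts and an xor. */
size_t hash_ptr(const void *p)
{
    uintptr_t x = (uintptr_t)p;

    /* The mask below only works for power-of-two table sizes. */
    assert((NBUCKETS & (NBUCKETS - 1)) == 0);

    /* Heap pointers are usually aligned, so the lowest bits carry
       little information: shift them away and mix in higher bits. */
    x = (x >> 4) ^ (x >> 16);

    return (size_t)(x & (NBUCKETS - 1));
}
```

The cost is independent of how long the symbol's name is, which is the point being made: the name was examined once, when the symbol was interned.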
Important global properties about symbols can be stored
in the struct symbol itself. For instance float32 is
a type, so there can be a sym->is_type property,
which is true for float32. Then you can easily check
whether some list has a type symbol in a certain position.
First check there is a symbol and if so, that it is
one with the is_type property true.
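Such a positional check could look roughly like the following. The `struct item` representation of parsed list elements and the function `has_type_at` are assumptions made for this sketch (nested lists are left out for brevity); only the `is_type` property comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct symbol {
    const char *name;
    bool is_type;       /* true for type symbols such as float32 */
};

/* Assumed representation of one parsed list element. */
enum item_kind { ITEM_NUM, ITEM_SYMBOL };

struct item {
    enum item_kind kind;
    union {
        int num;             /* valid when kind == ITEM_NUM */
        struct symbol *sym;  /* valid when kind == ITEM_SYMBOL */
    } u;
};

/* Is there a type symbol at position pos of an n-element list? */
bool has_type_at(const struct item *items, size_t n, size_t pos)
{
    assert(items != NULL);

    if (pos >= n)
        return false;                   /* no such position */
    if (items[pos].kind != ITEM_SYMBOL)
        return false;                   /* something is there, but not a symbol */
    return items[pos].u.sym->is_type;   /* a symbol: is it a type? */
}
```

For a shape like [(20), float32], a semantic pass could call this with the position where the element type is expected and report an error when it returns false.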