Groups > comp.compilers > #1988 > unrolled thread

Parser Reversed

Started by	Hans-Peter Diettrich <DrDiettrich1@netscape.net>
First post	2018-03-11 08:32 +0100
Last post	2018-03-13 12:21 +0100
Articles	5 — 3 participants

Back to article view | Back to comp.compilers

  Parser Reversed Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2018-03-11 08:32 +0100
    Re: Parser Reversed "Matt P. Dziubinski" <matdzb@gmail.com> - 2018-03-11 15:08 +0100
      Re: Parser Reversed Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2018-03-13 11:23 +0100
    Re: Parser Reversed Kaz Kylheku <157-073-9834@kylheku.com> - 2018-03-12 21:00 +0000
      Re: Parser Reversed Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2018-03-13 12:21 +0100

#1988 — Parser Reversed

From	Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Date	2018-03-11 08:32 +0100
Subject	Parser Reversed
Message-ID	<18-03-038@comp.compilers>

A grammar can be used to *check* for valid sentences of a language, but
it also can be used to *create* valid sentences. For a pretty printer or
decompiler test I need a sentence generator for logical expressions. For
now the language can be restricted to AND, OR, variables and (kind of)
parentheses. Later on NOT and XOR can be added. RPN is one alternative
for the "kind of parentheses", eliminating the need for a specific
operator precedence.

Now I'm looking for possible implementations of such a generator, in
addition to my own ideas. So far the output can be anything, e.g. source
code or machine code, or some tree (AST...).

Any ideas or references to such projects?

TIA
   DoDi

[toc] | [next] | [standalone]

#1989

From	"Matt P. Dziubinski" <matdzb@gmail.com>
Date	2018-03-11 15:08 +0100
Message-ID	<18-03-040@comp.compilers>
In reply to	#1988

On 3/11/2018 08:32, Hans-Peter Diettrich wrote:
> A grammar can be used to *check* for valid sentences of a language, but
> it also can be used to *create* valid sentences. For a pretty printer or
> decompiler test I need a sentence generator for logical expressions. For
> now the language can be restricted to AND, OR, variables and (kind of)
> parentheses. Later on NOT and XOR can be added. RPN is one alternative
> for the "kind of parentheses", eliminating the need for a specific
> operator precedence.
>
> Now I'm looking for possible implementations of such a generator, in
> addition to my own ideas. So far the output can be anything, e.g. source
> code or machine code, or some tree (AST...).
>
> Any ideas or references to such projects?

Hi!

Csmith comes to mind: https://embed.cs.utah.edu/csmith/

Reference: Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. PLDI
2011. "Finding and Understanding Bugs in C Compilers"
Paper: http://www.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf
LtU post: http://lambda-the-ultimate.org/node/4241

Summary (from the paper): "The shape of a program generated by Csmith is
governed by a grammar for a subset of C. A program is a collection of
type, variable, and function definitions; a function body is a block; a
block contains a list of declarations and a list of statements; and a
statement is an expression, control-flow construct (e.g., `if`,
`return`, `goto`, or `for`), assignment, or block. Assignments are
modeled as statementsbnot expressionsbwhich reflects the most common
idiom for assignments in C code. We leverage our grammar to produce
other idiomatic code as well: in particular, we include a statement kind
that represents a loop iterating over an array. The grammar is
implemented by a collection of hand-coded C++ classes."

You may also want to take a look at the following:

* "Effect-Driven QuickChecking of Compilers" (notably, the following
goes substantially further than relying solely on the grammar grammar by
making use of the type system -- more in the paper):

Code (Effect-Driven Compiler Tester): https://github.com/jmid/efftester
Paper: http://janmidtgaard.dk/papers/Midtgaard-al%3AICFP17-full.pdf
Talk: https://podcasts.ox.ac.uk/effect-driven-quickchecking-compilers

* "Structure-aware fuzzing for Clang and LLVM with libprotobuf-mutator"
- Kostya Serebryany, Vitaly Buka and Matt Morehouse - 2017 LLVM
Developersb Meeting
https://www.youtube.com/watch?v=U60hC16HEDY
https://llvm.org/devmtg/2017-10/#talk8

See: https://llvm.org/docs/FuzzingLLVM.html
In particular:
https://github.com/llvm-mirror/clang/tree/master/tools/clang-fuzzer

"This directory contains two utilities for fuzzing Clang: clang-fuzzer
and clang-proto-fuzzer.  Both use libFuzzer to generate inputs to clang
via coverage-guided mutation.

The two utilities differ, however, in how they structure inputs to
Clang. clang-fuzzer makes no attempt to generate valid C++ programs and
is therefore primarily useful for stressing the surface layers of Clang
(i.e. lexer, parser). clang-proto-fuzzer uses a protobuf class to
describe a subset of the C++ language and then uses libprotobuf-mutator
to mutate instantiations of that class, producing valid C++ programs in
the process.  As a result, clang-proto-fuzzer is better at stressing
deeper layers of Clang and LLVM."

For further reference, perhaps the following compiler correctness
resources (literature & software) can also be of help:
https://github.com/MattPD/cpplinks/blob/master/compilers.correctness.md

Best,

Matt P. Dziubinski

[toc] | [prev] | [next] | [standalone]

#1999

From	Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Date	2018-03-13 11:23 +0100
Message-ID	<18-03-053@comp.compilers>
In reply to	#1989

Am 11.03.2018 um 15:08 schrieb Matt P. Dziubinski:
> On 3/11/2018 08:32, Hans-Peter Diettrich wrote:
>> A grammar can be used to *check* for valid sentences of a language, but
>> it also can be used to *create* valid sentences. For a pretty printer or
>> decompiler test I need a sentence generator for logical expressions. For
>> now the language can be restricted to AND, OR, variables and (kind of)
>> parentheses. Later on NOT and XOR can be added. RPN is one alternative
>> for the "kind of parentheses", eliminating the need for a specific
>> operator precedence.
>>
>> Now I'm looking for possible implementations of such a generator, in
>> addition to my own ideas. So far the output can be anything, e.g. source
>> code or machine code, or some tree (AST...).
>>
>> Any ideas or references to such projects?
>
> Hi!
>
> Csmith comes to mind: https://embed.cs.utah.edu/csmith/

Thanks for all the links :-)


> * "Effect-Driven QuickChecking of Compilers" (notably, the following
> goes substantially further than relying solely on the grammar grammar by
> making use of the type system -- more in the paper):
>
> Code (Effect-Driven Compiler Tester): https://github.com/jmid/efftester
> Paper: http://janmidtgaard.dk/papers/Midtgaard-al%3AICFP17-full.pdf
> Talk: https://podcasts.ox.ac.uk/effect-driven-quickchecking-compilers

Here I'm absolutely lost with the notation :-(

DoDi

[toc] | [prev] | [next] | [standalone]

#1992

From	Kaz Kylheku <157-073-9834@kylheku.com>
Date	2018-03-12 21:00 +0000
Message-ID	<18-03-044@comp.compilers>
In reply to	#1988

On 2018-03-11, Hans-Peter Diettrich <DrDiettrich1@netscape.net> wrote:
> A grammar can be used to *check* for valid sentences of a language, but
> it also can be used to *create* valid sentences.

Grammars are *defined* in terms of generation.  That's why the rules are
called "production rules" not "recognition rules".

Using grammars to generate sentences randomly is algorithmically far
simpler than the cunning required to parse sentences according to a
grammar.

It's a common homework exercise in the first week of compiler courses.
(Sometimes called compiler courses, though rarely do compilers
get written.)

If you can use a high level, dynamic language, you can probably
write the code on a napkin.

  let L := { S }   # let L be a list containing the start symbol

  while (at least one non-terminal symbol remains in L) {
    choose a non-terminal symbol N from L
    choose from the grammar a production rule R which has N on its left
    replace S by the right side of the rule
  }

  L now holds a random sentence

Note that if the grammar can generate infinite sentences, then this
algorithm has no guarantee of termination.

Some cunning could be built into such that, at a given configured
sentence length, it starts preferring the rules which lead to the
generation of terminal symbols.

If a given chosen N has two production rules like

  N -> N t   # t is a terminal symbol
    -> t

then the second rule will be preferred (by some heuristic like choose
the rule with maximal terminal symbols and minimal non-terminals).

Speaking of termination, again, the generative POV dictates
the terminology: terminal symbols terminate the generation!

If the grammar isn't context free (two or more symbols appear on
the left hand side of a rule) then you have a more complicated
job. Basically you have to gather all the left hand patterns
from the grammar, and then look for occurences of these patterns
in the sentential form L. From these occurrences, pick one
and expand it. Something like that.

[toc] | [prev] | [next] | [standalone]

#1998

From	Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Date	2018-03-13 12:21 +0100
Message-ID	<18-03-051@comp.compilers>
In reply to	#1992

Am 12.03.2018 um 22:00 schrieb Kaz Kylheku:
> On 2018-03-11, Hans-Peter Diettrich <DrDiettrich1@netscape.net> wrote:
>> A grammar can be used to *check* for valid sentences of a language, but
>> it also can be used to *create* valid sentences.
>
> Grammars are *defined* in terms of generation.  That's why the rules are
> called "production rules" not "recognition rules".

Isn't that depending on the grammar notation?

> Using grammars to generate sentences randomly is algorithmically far
> simpler than the cunning required to parse sentences according to a
> grammar.

Right, at least as long as semantical correctness is not of interest.

> It's a common homework exercise in the first week of compiler courses.
> (Sometimes called compiler courses, though rarely do compilers
> get written.)

I'm near 50 years past that stage ;-)

> If you can use a high level, dynamic language, you can probably
> write the code on a napkin.
>
>    let L := { S }   # let L be a list containing the start symbol
>
>    while (at least one non-terminal symbol remains in L) {
>      choose a non-terminal symbol N from L
>      choose from the grammar a production rule R which has N on its left
>      replace S by the right side of the rule
>    }
>
>    L now holds a random sentence

Ah, double randomization is a nice trick :-)

> Note that if the grammar can generate infinite sentences, then this
> algorithm has no guarantee of termination.

Right. I thought of giving weights to the alternatives of a NT, and
using multiple sets of weights for initial, continued and final stage of
the generation.

> Some cunning could be built into such that, at a given configured
> sentence length, it starts preferring the rules which lead to the
> generation of terminal symbols.

Fine, with eps (empty) counting as a terminal symbol.

> If a given chosen N has two production rules like
>
>    N -> N t   # t is a terminal symbol
>      -> t
>
> then the second rule will be preferred (by some heuristic like choose
> the rule with maximal terminal symbols and minimal non-terminals).

In my simple grammar case this will be sufficient. Other grammars may
deserve some initial analysis.

> Speaking of termination, again, the generative POV dictates
> the terminology: terminal symbols terminate the generation!

Point taken :-)

> If the grammar isn't context free (two or more symbols appear on
> the left hand side of a rule) then you have a more complicated
> job. Basically you have to gather all the left hand patterns
> from the grammar, and then look for occurences of these patterns
> in the sentential form L. From these occurrences, pick one
> and expand it. Something like that.

This is where replacing the (sequentially) first matching pattern may
not cover all cases of interest.

Thanks for the many hints :-)

DoDi

[toc] | [prev] | [standalone]

csiph-web

Parser Reversed

Contents

#1988 — Parser Reversed

#1989

#1999

#1992

#1998