Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #1988 > unrolled thread
| Started by | Hans-Peter Diettrich <DrDiettrich1@netscape.net> |
|---|---|
| First post | 2018-03-11 08:32 +0100 |
| Last post | 2018-03-13 12:21 +0100 |
| Articles | 5 — 3 participants |
Back to article view | Back to comp.compilers
Parser Reversed Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2018-03-11 08:32 +0100
Re: Parser Reversed "Matt P. Dziubinski" <matdzb@gmail.com> - 2018-03-11 15:08 +0100
Re: Parser Reversed Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2018-03-13 11:23 +0100
Re: Parser Reversed Kaz Kylheku <157-073-9834@kylheku.com> - 2018-03-12 21:00 +0000
Re: Parser Reversed Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2018-03-13 12:21 +0100
| From | Hans-Peter Diettrich <DrDiettrich1@netscape.net> |
|---|---|
| Date | 2018-03-11 08:32 +0100 |
| Subject | Parser Reversed |
| Message-ID | <18-03-038@comp.compilers> |
A grammar can be used to *check* for valid sentences of a language, but it also can be used to *create* valid sentences. For a pretty printer or decompiler test I need a sentence generator for logical expressions. For now the language can be restricted to AND, OR, variables and (kind of) parentheses. Later on NOT and XOR can be added. RPN is one alternative for the "kind of parentheses", eliminating the need for a specific operator precedence. Now I'm looking for possible implementations of such a generator, in addition to my own ideas. So far the output can be anything, e.g. source code or machine code, or some tree (AST...). Any ideas or references to such projects? TIA DoDi
[toc] | [next] | [standalone]
| From | "Matt P. Dziubinski" <matdzb@gmail.com> |
|---|---|
| Date | 2018-03-11 15:08 +0100 |
| Message-ID | <18-03-040@comp.compilers> |
| In reply to | #1988 |
On 3/11/2018 08:32, Hans-Peter Diettrich wrote: > A grammar can be used to *check* for valid sentences of a language, but > it also can be used to *create* valid sentences. For a pretty printer or > decompiler test I need a sentence generator for logical expressions. For > now the language can be restricted to AND, OR, variables and (kind of) > parentheses. Later on NOT and XOR can be added. RPN is one alternative > for the "kind of parentheses", eliminating the need for a specific > operator precedence. > > Now I'm looking for possible implementations of such a generator, in > addition to my own ideas. So far the output can be anything, e.g. source > code or machine code, or some tree (AST...). > > Any ideas or references to such projects? Hi! Csmith comes to mind: https://embed.cs.utah.edu/csmith/ Reference: Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. PLDI 2011. "Finding and Understanding Bugs in C Compilers" Paper: http://www.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf LtU post: http://lambda-the-ultimate.org/node/4241 Summary (from the paper): "The shape of a program generated by Csmith is governed by a grammar for a subset of C. A program is a collection of type, variable, and function definitions; a function body is a block; a block contains a list of declarations and a list of statements; and a statement is an expression, control-flow construct (e.g., `if`, `return`, `goto`, or `for`), assignment, or block. Assignments are modeled as statementsbnot expressionsbwhich reflects the most common idiom for assignments in C code. We leverage our grammar to produce other idiomatic code as well: in particular, we include a statement kind that represents a loop iterating over an array. The grammar is implemented by a collection of hand-coded C++ classes." You may also want to take a look at the following: * "Effect-Driven QuickChecking of Compilers" (notably, the following goes substantially further than relying solely on the grammar grammar by making use of the type system -- more in the paper): Code (Effect-Driven Compiler Tester): https://github.com/jmid/efftester Paper: http://janmidtgaard.dk/papers/Midtgaard-al%3AICFP17-full.pdf Talk: https://podcasts.ox.ac.uk/effect-driven-quickchecking-compilers * "Structure-aware fuzzing for Clang and LLVM with libprotobuf-mutator" - Kostya Serebryany, Vitaly Buka and Matt Morehouse - 2017 LLVM Developersb Meeting https://www.youtube.com/watch?v=U60hC16HEDY https://llvm.org/devmtg/2017-10/#talk8 See: https://llvm.org/docs/FuzzingLLVM.html In particular: https://github.com/llvm-mirror/clang/tree/master/tools/clang-fuzzer "This directory contains two utilities for fuzzing Clang: clang-fuzzer and clang-proto-fuzzer. Both use libFuzzer to generate inputs to clang via coverage-guided mutation. The two utilities differ, however, in how they structure inputs to Clang. clang-fuzzer makes no attempt to generate valid C++ programs and is therefore primarily useful for stressing the surface layers of Clang (i.e. lexer, parser). clang-proto-fuzzer uses a protobuf class to describe a subset of the C++ language and then uses libprotobuf-mutator to mutate instantiations of that class, producing valid C++ programs in the process. As a result, clang-proto-fuzzer is better at stressing deeper layers of Clang and LLVM." For further reference, perhaps the following compiler correctness resources (literature & software) can also be of help: https://github.com/MattPD/cpplinks/blob/master/compilers.correctness.md Best, Matt P. Dziubinski
[toc] | [prev] | [next] | [standalone]
| From | Hans-Peter Diettrich <DrDiettrich1@netscape.net> |
|---|---|
| Date | 2018-03-13 11:23 +0100 |
| Message-ID | <18-03-053@comp.compilers> |
| In reply to | #1989 |
Am 11.03.2018 um 15:08 schrieb Matt P. Dziubinski: > On 3/11/2018 08:32, Hans-Peter Diettrich wrote: >> A grammar can be used to *check* for valid sentences of a language, but >> it also can be used to *create* valid sentences. For a pretty printer or >> decompiler test I need a sentence generator for logical expressions. For >> now the language can be restricted to AND, OR, variables and (kind of) >> parentheses. Later on NOT and XOR can be added. RPN is one alternative >> for the "kind of parentheses", eliminating the need for a specific >> operator precedence. >> >> Now I'm looking for possible implementations of such a generator, in >> addition to my own ideas. So far the output can be anything, e.g. source >> code or machine code, or some tree (AST...). >> >> Any ideas or references to such projects? > > Hi! > > Csmith comes to mind: https://embed.cs.utah.edu/csmith/ Thanks for all the links :-) > * "Effect-Driven QuickChecking of Compilers" (notably, the following > goes substantially further than relying solely on the grammar grammar by > making use of the type system -- more in the paper): > > Code (Effect-Driven Compiler Tester): https://github.com/jmid/efftester > Paper: http://janmidtgaard.dk/papers/Midtgaard-al%3AICFP17-full.pdf > Talk: https://podcasts.ox.ac.uk/effect-driven-quickchecking-compilers Here I'm absolutely lost with the notation :-( DoDi
[toc] | [prev] | [next] | [standalone]
| From | Kaz Kylheku <157-073-9834@kylheku.com> |
|---|---|
| Date | 2018-03-12 21:00 +0000 |
| Message-ID | <18-03-044@comp.compilers> |
| In reply to | #1988 |
On 2018-03-11, Hans-Peter Diettrich <DrDiettrich1@netscape.net> wrote:
> A grammar can be used to *check* for valid sentences of a language, but
> it also can be used to *create* valid sentences.
Grammars are *defined* in terms of generation. That's why the rules are
called "production rules" not "recognition rules".
Using grammars to generate sentences randomly is algorithmically far
simpler than the cunning required to parse sentences according to a
grammar.
It's a common homework exercise in the first week of compiler courses.
(Sometimes called compiler courses, though rarely do compilers
get written.)
If you can use a high level, dynamic language, you can probably
write the code on a napkin.
let L := { S } # let L be a list containing the start symbol
while (at least one non-terminal symbol remains in L) {
choose a non-terminal symbol N from L
choose from the grammar a production rule R which has N on its left
replace S by the right side of the rule
}
L now holds a random sentence
Note that if the grammar can generate infinite sentences, then this
algorithm has no guarantee of termination.
Some cunning could be built into such that, at a given configured
sentence length, it starts preferring the rules which lead to the
generation of terminal symbols.
If a given chosen N has two production rules like
N -> N t # t is a terminal symbol
-> t
then the second rule will be preferred (by some heuristic like choose
the rule with maximal terminal symbols and minimal non-terminals).
Speaking of termination, again, the generative POV dictates
the terminology: terminal symbols terminate the generation!
If the grammar isn't context free (two or more symbols appear on
the left hand side of a rule) then you have a more complicated
job. Basically you have to gather all the left hand patterns
from the grammar, and then look for occurences of these patterns
in the sentential form L. From these occurrences, pick one
and expand it. Something like that.
[toc] | [prev] | [next] | [standalone]
| From | Hans-Peter Diettrich <DrDiettrich1@netscape.net> |
|---|---|
| Date | 2018-03-13 12:21 +0100 |
| Message-ID | <18-03-051@comp.compilers> |
| In reply to | #1992 |
Am 12.03.2018 um 22:00 schrieb Kaz Kylheku:
> On 2018-03-11, Hans-Peter Diettrich <DrDiettrich1@netscape.net> wrote:
>> A grammar can be used to *check* for valid sentences of a language, but
>> it also can be used to *create* valid sentences.
>
> Grammars are *defined* in terms of generation. That's why the rules are
> called "production rules" not "recognition rules".
Isn't that depending on the grammar notation?
> Using grammars to generate sentences randomly is algorithmically far
> simpler than the cunning required to parse sentences according to a
> grammar.
Right, at least as long as semantical correctness is not of interest.
> It's a common homework exercise in the first week of compiler courses.
> (Sometimes called compiler courses, though rarely do compilers
> get written.)
I'm near 50 years past that stage ;-)
> If you can use a high level, dynamic language, you can probably
> write the code on a napkin.
>
> let L := { S } # let L be a list containing the start symbol
>
> while (at least one non-terminal symbol remains in L) {
> choose a non-terminal symbol N from L
> choose from the grammar a production rule R which has N on its left
> replace S by the right side of the rule
> }
>
> L now holds a random sentence
Ah, double randomization is a nice trick :-)
> Note that if the grammar can generate infinite sentences, then this
> algorithm has no guarantee of termination.
Right. I thought of giving weights to the alternatives of a NT, and
using multiple sets of weights for initial, continued and final stage of
the generation.
> Some cunning could be built into such that, at a given configured
> sentence length, it starts preferring the rules which lead to the
> generation of terminal symbols.
Fine, with eps (empty) counting as a terminal symbol.
> If a given chosen N has two production rules like
>
> N -> N t # t is a terminal symbol
> -> t
>
> then the second rule will be preferred (by some heuristic like choose
> the rule with maximal terminal symbols and minimal non-terminals).
In my simple grammar case this will be sufficient. Other grammars may
deserve some initial analysis.
> Speaking of termination, again, the generative POV dictates
> the terminology: terminal symbols terminate the generation!
Point taken :-)
> If the grammar isn't context free (two or more symbols appear on
> the left hand side of a rule) then you have a more complicated
> job. Basically you have to gather all the left hand patterns
> from the grammar, and then look for occurences of these patterns
> in the sentential form L. From these occurrences, pick one
> and expand it. Something like that.
This is where replacing the (sequentially) first matching pattern may
not cover all cases of interest.
Thanks for the many hints :-)
DoDi
[toc] | [prev] | [standalone]
Back to top | Article view | comp.compilers
csiph-web