From: "Matt P. Dziubinski"
Newsgroups: comp.compilers
Subject: Re: Parser Reversed
Date: Sun, 11 Mar 2018 15:08:54 +0100
Organization: http://www.wit.edu.pl
Message-ID: <18-03-040@comp.compilers>
References: <18-03-038@comp.compilers>
Keywords: parse, tools, question
Posted-Date: 12 Mar 2018 16:18:26 EDT

On 3/11/2018 08:32, Hans-Peter Diettrich wrote:
> A grammar can be used to *check* for valid sentences of a language, but
> it also can be used to *create* valid sentences. For a pretty printer or
> decompiler test I need a sentence generator for logical expressions. For
> now the language can be restricted to AND, OR, variables and (kind of)
> parentheses. Later on NOT and XOR can be added. RPN is one alternative
> for the "kind of parentheses", eliminating the need for a specific
> operator precedence.
>
> Now I'm looking for possible implementations of such a generator, in
> addition to my own ideas. So far the output can be anything, e.g. source
> code or machine code, or some tree (AST...).
>
> Any ideas or references to such projects?

Hi!

Csmith comes to mind: https://embed.cs.utah.edu/csmith/

Reference:
Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. PLDI 2011.
"Finding and Understanding Bugs in C Compilers"
Paper: http://www.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf
LtU post: http://lambda-the-ultimate.org/node/4241

Summary (from the paper):
"The shape of a program generated by Csmith is governed by a grammar for a
subset of C. A program is a collection of type, variable, and function
definitions; a function body is a block; a block contains a list of
declarations and a list of statements; and a statement is an expression,
control-flow construct (e.g., `if`, `return`, `goto`, or `for`),
assignment, or block. Assignments are modeled as statements -- not
expressions -- which reflects the most common idiom for assignments in C
code. We leverage our grammar to produce other idiomatic code as well: in
particular, we include a statement kind that represents a loop iterating
over an array. The grammar is implemented by a collection of hand-coded
C++ classes."

You may also want to take a look at the following:

* "Effect-Driven QuickChecking of Compilers" (notably, this work goes
  substantially further than relying solely on the grammar by making use
  of the type system -- more in the paper):
  Code (Effect-Driven Compiler Tester): https://github.com/jmid/efftester
  Paper: http://janmidtgaard.dk/papers/Midtgaard-al%3AICFP17-full.pdf
  Talk: https://podcasts.ox.ac.uk/effect-driven-quickchecking-compilers

* "Structure-aware fuzzing for Clang and LLVM with libprotobuf-mutator" -
  Kostya Serebryany, Vitaly Buka and Matt Morehouse - 2017 LLVM
  Developers' Meeting
  https://www.youtube.com/watch?v=U60hC16HEDY
  https://llvm.org/devmtg/2017-10/#talk8
  See: https://llvm.org/docs/FuzzingLLVM.html
  In particular:
  https://github.com/llvm-mirror/clang/tree/master/tools/clang-fuzzer

"This directory contains two utilities for fuzzing Clang: clang-fuzzer and
clang-proto-fuzzer. Both use libFuzzer to generate inputs to clang via
coverage-guided mutation.

The two utilities differ, however, in how they structure inputs to Clang.
clang-fuzzer makes no attempt to generate valid C++ programs and is
therefore primarily useful for stressing the surface layers of Clang (i.e.
lexer, parser). clang-proto-fuzzer uses a protobuf class to describe a
subset of the C++ language and then uses libprotobuf-mutator to mutate
instantiations of that class, producing valid C++ programs in the process.
As a result, clang-proto-fuzzer is better at stressing deeper layers of
Clang and LLVM."

For further reference, perhaps the following compiler correctness resources
(literature & software) can also be of help:
https://github.com/MattPD/cpplinks/blob/master/compilers.correctness.md

Best,
Matt P. Dziubinski
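For the restricted language in the original question (AND, OR, variables,
parentheses or RPN), the grammar-driven idea behind the tools above can be
sketched in a few lines. This is only an illustrative sketch, not code from
Csmith or any of the linked projects: the grammar, the names `gen_expr`,
`to_infix`, `to_rpn` and `evaluate`, the variable set and the 0.3 leaf
probability are all assumptions made up for the example.

```python
import random

# Hypothetical grammar for the restricted language in the question:
#   expr ::= VAR | expr AND expr | expr OR expr
OPERATORS = ("AND", "OR")
VARIABLES = ("a", "b", "c")


def gen_expr(rng, max_depth):
    """Return a random expression tree: a variable name or (op, left, right)."""
    # Force a leaf once the depth budget is spent, so generation terminates.
    # The 0.3 leaf probability is an arbitrary choice for this sketch.
    if max_depth <= 0 or rng.random() < 0.3:
        return rng.choice(VARIABLES)
    op = rng.choice(OPERATORS)
    return (op, gen_expr(rng, max_depth - 1), gen_expr(rng, max_depth - 1))


def to_infix(tree):
    """Render fully parenthesized infix, so no precedence rules are needed."""
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    return f"({to_infix(left)} {op} {to_infix(right)})"


def to_rpn(tree):
    """Render postfix (RPN), which needs no parentheses at all."""
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    return f"{to_rpn(left)} {to_rpn(right)} {op}"


def evaluate(tree, env):
    """Reference evaluator, usable as an oracle when testing a pretty printer."""
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    lhs, rhs = evaluate(left, env), evaluate(right, env)
    return (lhs and rhs) if op == "AND" else (lhs or rhs)


if __name__ == "__main__":
    rng = random.Random(2018)  # fixed seed so a failing case can be replayed
    for _ in range(5):
        tree = gen_expr(rng, max_depth=3)
        print(to_infix(tree), "|", to_rpn(tree))
```

Generating a tree first and rendering it afterwards keeps the generator
independent of the output syntax, so source text, RPN, or some other
backend can all be driven from the same random tree. Adding NOT or XOR
later would mean one more production in `gen_expr` and one more case in
each renderer.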