Groups > comp.compilers > #3560 > unrolled thread

Figuring out grammars from examples

Started by	John R Levine <johnl@taugh.com>
First post	2024-04-12 18:39 -0400
Last post	2024-04-15 02:17 +0100
Articles	3 — 2 participants

Back to article view | Back to comp.compilers

  Figuring out grammars from examples John R Levine <johnl@taugh.com> - 2024-04-12 18:39 -0400
    Re: Figuring out grammars from examples Derek <derek@shape-of-code.com> - 2024-04-13 12:06 +0100
      Re: Figuring out grammars from examples Derek <derek@shape-of-code.com> - 2024-04-15 02:17 +0100

#3560 — Figuring out grammars from examples

From	John R Levine <johnl@taugh.com>
Date	2024-04-12 18:39 -0400
Subject	Figuring out grammars from examples
Message-ID	<24-04-001@comp.compilers>

There's been a surprising amount of work on taking language strings and
feeding them to a box that infers a grammar for them.  One of the harder
bits is figuring out nesting constructs, which in the past has often
required hints.

In this paper, a Visibly Pushdown Grammar is large but more tractable
subset of context free grammars.

V-Star: Learning Visibly Pushdown Grammars from Program Inputs
Xiaodong Jia, Gang Tan

Accurate description of program inputs remains a critical challenge in the
field of programming languages. Active learning, as a well-established
field, achieves exact learning for regular languages. We offer an
innovative grammar inference tool, V-Star, based on the active learning of
visibly pushdown automata. V-Star deduces nesting structures of program
input languages from sample inputs, employing a novel inference mechanism
based on nested patterns. This mechanism identifies token boundaries and
converts languages such as XML documents into VPLs. We then adapted
Angluin's L-Star, an exact learning algorithm, for VPA learning, which
improves the precision of our tool. Our evaluation demonstrates that
V-Star effectively and efficiently learns a variety of practical grammars,
including S-Expressions, JSON, and XML, and outperforms other
state-of-the-art tools.

https://arxiv.org/abs/2404.04201v1

Regards,
John Levine, johnl@taugh.com, Taughannock Networks, Trumansburg NY
Please consider the environment before reading this e-mail. https://jl.ly

[toc] | [next] | [standalone]

#3561

From	Derek <derek@shape-of-code.com>
Date	2024-04-13 12:06 +0100
Message-ID	<24-04-002@comp.compilers>
In reply to	#3560

John,

> including S-Expressions, JSON, and XML, and outperforms other
> state-of-the-art tools.

They did not compare their tool against LLMs, which these days
outperform non-humans on language related tasks.
[I would like to see some actual data. In my experience, LLMs are
impressive, confident, and frequently wrong. -John]

[toc] | [prev] | [next] | [standalone]

#3563

From	Derek <derek@shape-of-code.com>
Date	2024-04-15 02:17 +0100
Message-ID	<24-04-004@comp.compilers>
In reply to	#3561

John,

> [I would like to see some actual data. In my experience, LLMs are
> impressive, confident, and frequently wrong. -John]

LLM's performance on fact recall is poor.

It seems to be much better than other tools when dealing with
grammars. I could not find the example I was looking for, but here are
two others: https://arxiv.org/abs/2305.19234
https://szopa.medium.com/teaching-chatgpt-to-speak-my-sons-invented-language-9d109c0a0f05

My own experience using local (i.e., very small)
models
https://shape-of-code.com/2024/02/25/extracting-named-entities-from-a-change-log-using-an-llm/

[toc] | [prev] | [standalone]

csiph-web

Figuring out grammars from examples

Contents

#3560 — Figuring out grammars from examples

#3561

#3563