Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #3560 > unrolled thread
| Started by | John R Levine <johnl@taugh.com> |
|---|---|
| First post | 2024-04-12 18:39 -0400 |
| Last post | 2024-04-15 02:17 +0100 |
| Articles | 3 — 2 participants |
Back to article view | Back to comp.compilers
Figuring out grammars from examples John R Levine <johnl@taugh.com> - 2024-04-12 18:39 -0400
Re: Figuring out grammars from examples Derek <derek@shape-of-code.com> - 2024-04-13 12:06 +0100
Re: Figuring out grammars from examples Derek <derek@shape-of-code.com> - 2024-04-15 02:17 +0100
| From | John R Levine <johnl@taugh.com> |
|---|---|
| Date | 2024-04-12 18:39 -0400 |
| Subject | Figuring out grammars from examples |
| Message-ID | <24-04-001@comp.compilers> |
There's been a surprising amount of work on taking language strings and feeding them to a box that infers a grammar for them. One of the harder bits is figuring out nesting constructs, which in the past has often required hints. In this paper, a Visibly Pushdown Grammar is large but more tractable subset of context free grammars. V-Star: Learning Visibly Pushdown Grammars from Program Inputs Xiaodong Jia, Gang Tan Accurate description of program inputs remains a critical challenge in the field of programming languages. Active learning, as a well-established field, achieves exact learning for regular languages. We offer an innovative grammar inference tool, V-Star, based on the active learning of visibly pushdown automata. V-Star deduces nesting structures of program input languages from sample inputs, employing a novel inference mechanism based on nested patterns. This mechanism identifies token boundaries and converts languages such as XML documents into VPLs. We then adapted Angluin's L-Star, an exact learning algorithm, for VPA learning, which improves the precision of our tool. Our evaluation demonstrates that V-Star effectively and efficiently learns a variety of practical grammars, including S-Expressions, JSON, and XML, and outperforms other state-of-the-art tools. https://arxiv.org/abs/2404.04201v1 Regards, John Levine, johnl@taugh.com, Taughannock Networks, Trumansburg NY Please consider the environment before reading this e-mail. https://jl.ly
[toc] | [next] | [standalone]
| From | Derek <derek@shape-of-code.com> |
|---|---|
| Date | 2024-04-13 12:06 +0100 |
| Message-ID | <24-04-002@comp.compilers> |
| In reply to | #3560 |
John, > including S-Expressions, JSON, and XML, and outperforms other > state-of-the-art tools. They did not compare their tool against LLMs, which these days outperform non-humans on language related tasks. [I would like to see some actual data. In my experience, LLMs are impressive, confident, and frequently wrong. -John]
[toc] | [prev] | [next] | [standalone]
| From | Derek <derek@shape-of-code.com> |
|---|---|
| Date | 2024-04-15 02:17 +0100 |
| Message-ID | <24-04-004@comp.compilers> |
| In reply to | #3561 |
John, > [I would like to see some actual data. In my experience, LLMs are > impressive, confident, and frequently wrong. -John] LLM's performance on fact recall is poor. It seems to be much better than other tools when dealing with grammars. I could not find the example I was looking for, but here are two others: https://arxiv.org/abs/2305.19234 https://szopa.medium.com/teaching-chatgpt-to-speak-my-sons-invented-language-9d109c0a0f05 My own experience using local (i.e., very small) models https://shape-of-code.com/2024/02/25/extracting-named-entities-from-a-change-log-using-an-llm/
[toc] | [prev] | [standalone]
Back to top | Article view | comp.compilers
csiph-web