| Path | csiph.com!weretis.net!feeder9.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end |
|---|---|
| From | John R Levine <johnl@taugh.com> |
| Newsgroups | comp.compilers |
| Subject | Paper: LLM Translation of Compiler Intermediate Representation |
| Date | Tue, 12 May 2026 11:34:57 -0400 |
| Organization | Compilers Central |
| Sender | johnl%iecc.com |
| Approved | comp.compilers@iecc.com |
| Message-ID | <26-05-002@comp.compilers> |
| MIME-Version | 1.0 |
| Content-Type | text/plain; charset="UTF-8" |
| Injection-Info | gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="41016"; mail-complaints-to="abuse@iecc.com" |
| Keywords | GCC, LLVM |
| Posted-Date | 12 May 2026 11:35:54 EDT |
| X-submission-address | compilers@iecc.com |
| X-moderator-address | compilers-request@iecc.com |
| X-FAQ-and-archives | http://compilers.iecc.com |
| Xref | csiph.com comp.compilers:3732 |
They use an LLM to translate between GCC and LLVM intermediate representation, a famously hard task, and claim success even though one table says it's at best 84% correct.

Abstract:

GCC and LLVM underpin much of modern software infrastructure, relying on distinct Intermediate Representations (IRs) to drive optimizations and code generation. However, the semantic and structural differences between these IRs create significant barriers to cross-toolchain interaction, limiting the reuse of compiler frontends, backends, and optimization pipelines across programming languages and compilation ecosystems. Traditional rule-based translators have attempted to bridge this gap, but their complexity and maintenance cost have hindered practical adoption. Large Language Models (LLMs) offer a data-driven alternative, capable of learning complex mappings between heterogeneous compiler IRs directly from sufficiently representative examples.

To explore this approach, this paper presents IRIS-14B, a 14-billion-parameter transformer model fine-tuned to translate GIMPLE (as emitted by GCC) into LLVM IR (as emitted by LLVM). The model is trained on paired IRs extracted from C sources and evaluated on GIMPLE-to-LLVM IR translation of IRs derived from real-world C code and competitive programming problems. To the best of our knowledge, IRIS-14B is the first model trained explicitly for IR-to-IR translation. It outperforms widely used models in accuracy, including the largest state-of-the-art open models available today (13 to 1,000 billion parameters), by up to 44 percentage points.
The proposed transformation supports the integration of LLMs as complementary components within hybrid neuro-symbolic compiler architectures, where models such as IRIS-14B act as interoperability layers enabling cross-toolchain workflows without modifying existing compiler passes, while traditional compiler infrastructure continues to perform deterministic compilation and optimization.

https://arxiv.org/abs/2605.08247

Regards,
John Levine, johnl@taugh.com, Taughannock Networks, Trumansburg NY
Please consider the environment before reading this e-mail. https://jl.ly