Path: csiph.com!weretis.net!feeder9.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: John R Levine Newsgroups: comp.compilers Subject: Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust Date: Fri, 09 May 2025 12:27:11 -0400 Organization: Compilers Central Sender: johnl%iecc.com Approved: comp.compilers@iecc.com Message-ID: <25-05-004@comp.compilers> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="9767"; mail-complaints-to="abuse@iecc.com" Keywords: C, Rust, optimize Posted-Date: 09 May 2025 12:30:28 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:3647 Automated tools translate C to Rust but produce lousy Rust code because of C's loose pointer semantics. They use an LLM to improve it somewhat. Abstract There has been a growing interest in translating C code to Rust due to Rust's robust memory and thread safety guarantees. Tools such as C2RUST enable syntax-guided transpilation from C to semantically equivalent Rust code. However, the resulting Rust programs often rely heavily on unsafe constructs--particularly raw pointers--which undermines Rust's safety guarantees. This paper aims to improve the memory safety of Rust programs generated by C2RUST by eliminating raw pointers. Specifically, we propose a peephole raw pointer rewriting technique that lifts raw pointers in individual functions to appropriate Rust data structures. Technically, PR2 employs decision-tree-based prompting to guide the pointer lifting process. Additionally, it leverages code change analysis to guide the repair of errors introduced during rewriting, effectively addressing errors encountered during compilation and test case execution. We implement PR2 as a prototype and evaluate it using gpt-4o-mini on 28 real-world C projects. The results show that PR2 successfully eliminates 13.22% of local raw pointers across these projects, significantly enhancing the safety of the translated Rust code. On average, PR2 completes the transformation of a project in 5.44 hours, at an average cost of $1.46. https://arxiv.org/abs/2505.04852 Regards, John Levine, johnl@taugh.com, Taughannock Networks, Trumansburg NY Please consider the environment before reading this e-mail. https://jl.ly