| Started by | John R Levine <johnl@taugh.com> |
|---|---|
| First post | 2025-05-09 12:27 -0400 |
| Last post | 2025-05-16 17:57 +0000 |
| Articles | 8 — 7 participants |
Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust John R Levine <johnl@taugh.com> - 2025-05-09 12:27 -0400
Re: Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust Derek <derek-nospam@shape-of-code.com> - 2025-05-13 21:30 +0100
Re: Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust arnold@freefriends.org - 2025-05-14 08:21 +0000
Re: Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust Kaz Kylheku <643-408-1753@kylheku.com> - 2025-05-14 20:01 +0000
Re: Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust anton@mips.complang.tuwien.ac.at - 2025-05-15 07:48 +0000
Re: Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust George Neuner <gneuner2@comcast.net> - 2025-05-15 11:52 -0400
Re: Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust cross@spitfire.i.gajendra.net - 2025-05-16 15:42 +0000
Re: Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust Kaz Kylheku <643-408-1753@kylheku.com> - 2025-05-16 17:57 +0000
| From | John R Levine <johnl@taugh.com> |
|---|---|
| Date | 2025-05-09 12:27 -0400 |
| Subject | Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust |
| Message-ID | <25-05-004@comp.compilers> |
Automated tools translate C to Rust but produce lousy Rust code because of C's loose pointer semantics. They use an LLM to improve it somewhat.

Abstract

There has been a growing interest in translating C code to Rust due to Rust's robust memory and thread safety guarantees. Tools such as C2RUST enable syntax-guided transpilation from C to semantically equivalent Rust code. However, the resulting Rust programs often rely heavily on unsafe constructs--particularly raw pointers--which undermines Rust's safety guarantees. This paper aims to improve the memory safety of Rust programs generated by C2RUST by eliminating raw pointers. Specifically, we propose a peephole raw pointer rewriting technique that lifts raw pointers in individual functions to appropriate Rust data structures. Technically, PR2 employs decision-tree-based prompting to guide the pointer lifting process. Additionally, it leverages code change analysis to guide the repair of errors introduced during rewriting, effectively addressing errors encountered during compilation and test case execution. We implement PR2 as a prototype and evaluate it using gpt-4o-mini on 28 real-world C projects. The results show that PR2 successfully eliminates 13.22% of local raw pointers across these projects, significantly enhancing the safety of the translated Rust code. On average, PR2 completes the transformation of a project in 5.44 hours, at an average cost of $1.46.

https://arxiv.org/abs/2505.04852

Regards,
John Levine, johnl@taugh.com, Taughannock Networks, Trumansburg NY
Please consider the environment before reading this e-mail. https://jl.ly
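To make "lifting a raw pointer to a Rust data structure" concrete, here is a minimal sketch of the before and after shapes. It is a hypothetical example: the function names `sum_raw` and `sum_slice` are made up for illustration, and neither function is taken from the paper or from actual C2RUST output.

```rust
// Hypothetical illustration only; not output of C2RUST or PR2.

/// Roughly the shape a mechanical translation of
/// `int sum(const int *p, size_t n)` might take: a raw pointer
/// plus a separate length, with an unsafe body.
unsafe fn sum_raw(p: *const i32, n: usize) -> i32 {
    let mut total = 0;
    let mut i = 0;
    while i < n {
        // The pointer carries no bounds information, so the caller
        // must guarantee that p[0..n] is valid to read.
        total += unsafe { *p.add(i) };
        i += 1;
    }
    total
}

/// The same computation with the pointer lifted to a slice.
/// A slice carries its own length, so no unsafe code is needed.
fn sum_slice(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

fn main() {
    let data = [1, 2, 3, 4];
    let raw = unsafe { sum_raw(data.as_ptr(), data.len()) };
    let lifted = sum_slice(&data);
    assert_eq!(raw, lifted);
    println!("raw = {raw}, lifted = {lifted}");
}
```

The second form moves the "pointer is valid for n elements" obligation from every caller into the type, which is the kind of safety gain the abstract is counting when it reports eliminated raw pointers.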
| From | Derek <derek-nospam@shape-of-code.com> |
|---|---|
| Date | 2025-05-13 21:30 +0100 |
| Message-ID | <25-05-005@comp.compilers> |
| In reply to | #3647 |
All,

> Automated tools translate C to Rust but produce lousy Rust code because of
> C's loose pointer semantics. They use an LLM to improve it somewhat.

Developers could always stay with C and switch on all the pointer+array bounds checking that GCC/LLVM have been supporting for some years (30 in the case of GCC).

I have been trying to find out how many products written in Rust actually ship with the checking still switched on.

Way back when, most products written in Pascal used to ship with the checking switched off, so that customers did not see the strange errors+program termination.

I suspect that the same is happening with Rust. If so, how does using Rust make the code safer than using C without any checking switched on?
| From | arnold@freefriends.org |
|---|---|
| Date | 2025-05-14 08:21 +0000 |
| Message-ID | <25-05-006@comp.compilers> |
| In reply to | #3648 |
In article <25-05-005@comp.compilers>,
Derek <derek-nospam@shape-of-code.com> wrote:
>I suspect that the same is happening with Rust. If so, how does using
>Rust make the code safer than using C without any checking switched
>on?

Rust catches many problems at compile time. I am not at all a Rust expert, or even a novice, but I don't think Rust does runtime bounds checking, since it relies on compiler analysis instead.
| From | Kaz Kylheku <643-408-1753@kylheku.com> |
|---|---|
| Date | 2025-05-14 20:01 +0000 |
| Message-ID | <25-05-007@comp.compilers> |
| In reply to | #3649 |
On 2025-05-14, arnold@freefriends.org <arnold@freefriends.org> wrote:
> In article <25-05-005@comp.compilers>,
> Derek <derek-nospam@shape-of-code.com> wrote:
>>I suspect that the same is happening with Rust. If so, how does using
>>Rust make the code safer than using C without any checking switched
>>on?
>
> Rust catches many problems at compile time. I am not at all a Rust
> expert, or even a novice, but I don't think Rust does runtime
> bounds checking, since it relies on compiler analysis instead.

How would it be safe if you could write a Rust program that asks the user to input a random decimal number, and then uses it as an index to access an array, without any check?

The compiler will eliminate bounds checks at compile time if it can infer they are unnecessary; e.g. a loop sets up a dummy variable to step over the correct range, and does not mess with it otherwise.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
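Both cases in this post can be sketched in a few lines of Rust. This is a minimal illustration with made-up function names (`lookup`, `sum_all`); the user's input is passed in as a string rather than read from the terminal so the example stays self-contained. Whether the optimizer actually removes the check in the loop depends on the compiler version and optimization level, so treat that comment as the usual outcome rather than a guarantee.

```rust
use std::panic;

/// Parse a user-supplied string as an index and read `a` at that position.
/// Plain `a[i]` indexing is checked at run time: an out-of-range index
/// panics instead of reading arbitrary memory.
fn lookup(a: &[i32], input: &str) -> i32 {
    let i: usize = input.trim().parse().expect("not a non-negative integer");
    a[i]
}

/// A loop whose index variable only ever steps over 0..a.len().
/// Here the optimizer can usually prove i < a.len() and drop the check.
fn sum_all(a: &[i32]) -> i32 {
    let mut total = 0;
    for i in 0..a.len() {
        total += a[i];
    }
    total
}

fn main() {
    let a = [10, 20, 30];
    println!("a[1] = {}", lookup(&a, "1"));

    // Index 7 is out of range; catch the panic to show the check fired.
    // (The default panic message still goes to stderr.)
    let result = panic::catch_unwind(|| lookup(&a, "7"));
    println!("out-of-range lookup panicked: {}", result.is_err());

    println!("sum = {}", sum_all(&a));
}
```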
| From | anton@mips.complang.tuwien.ac.at |
|---|---|
| Date | 2025-05-15 07:48 +0000 |
| Message-ID | <25-05-008@comp.compilers> |
| In reply to | #3650 |
Kaz Kylheku <643-408-1753@kylheku.com> writes:
>On 2025-05-14, arnold@freefriends.org <arnold@freefriends.org> wrote:
>> [Rust] relies on compiler analysis instead.
>
>How would it be safe if you could write a Rust program that asks the
>user to input a random decimal number, and then uses it as an index to
>access an array, without any check?

I don't know if Rust does it this way, but it could reject a program that does a[i] if it cannot prove that i is an allowed index for a. For your example, a program like this would be rejected:

    input i
    print a[i]

(using what little I remember from BASIC syntax because I don't know the Rust syntax:-). If you want the compiler to accept it, you could write

    input i
    if i < length[a] then
      print a[i]
    else
      print "index out of range"
    endif

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

[I believe that Rust does runtime checks unless it can prove at compile time that they're not needed. It has a fancy exception system to catch access violations. -John]
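For comparison, the second BASIC-style fragment translates fairly directly to Rust. The sketch below uses hypothetical function names (`describe_explicit`, `describe_get`). Rust does not reject the unguarded `a[i]` form at compile time; it compiles and panics at run time if the index is out of range. The standard slice method `get` comes closer to the scheme described above, because it returns an `Option` and the caller has to handle the out-of-range case before touching the element.

```rust
/// Explicit test, as in the pseudocode: check the index, then access.
fn describe_explicit(a: &[i32], i: usize) -> String {
    if i < a.len() {
        a[i].to_string()
    } else {
        String::from("index out of range")
    }
}

/// Same behaviour using `get`, which returns `None` for a bad index,
/// so the out-of-range case cannot be forgotten.
fn describe_get(a: &[i32], i: usize) -> String {
    match a.get(i) {
        Some(v) => v.to_string(),
        None => String::from("index out of range"),
    }
}

fn main() {
    let a = [10, 20, 30];
    for i in [0, 2, 5] {
        println!("{i}: {} / {}", describe_explicit(&a, i), describe_get(&a, i));
    }
}
```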
| From | George Neuner <gneuner2@comcast.net> |
|---|---|
| Date | 2025-05-15 11:52 -0400 |
| Message-ID | <25-05-009@comp.compilers> |
| In reply to | #3649 |
On Wed, 14 May 2025 08:21:51 +0000, arnold@freefriends.org wrote:
>In article <25-05-005@comp.compilers>,
>Derek <derek-nospam@shape-of-code.com> wrote:
>>I suspect that the same is happening with Rust. If so, how does using
>>Rust make the code safer than using C without any checking switched
>>on?
>
>Rust catches many problems at compile time. I am not at all a Rust
>expert, or even a novice, but I don't think Rust does runtime
>bounds checking, since it relies on compiler analysis instead.

Debug builds in Rust may do considerable runtime checking depending on what the code is trying to do. There is a small amount of checking done even in release builds. There are always some things that can't be checked at compile time.
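One concrete instance of the debug/release difference, offered as an illustration rather than as what the poster had in mind: with the default Cargo profiles, integer overflow on `+` panics in a debug build and wraps in a release build, while slice bounds checks stay on in both. Code that should not depend on the build mode can say what it wants explicitly. The function names below are hypothetical.

```rust
/// Addition that reports overflow in every build mode.
fn add_checked(a: u8, b: u8) -> Option<u8> {
    a.checked_add(b)
}

/// Addition that wraps around in every build mode.
fn add_wrapping(a: u8, b: u8) -> u8 {
    a.wrapping_add(b)
}

/// Element access that reports a bad index as `None` instead of panicking.
fn third(a: &[i32]) -> Option<i32> {
    a.get(2).copied()
}

fn main() {
    println!("{:?}", add_checked(255, 1));  // overflow is reported, not hidden
    println!("{}", add_wrapping(255, 1));   // overflow wraps by request
    println!("{:?}", third(&[1, 2]));       // slice too short
}
```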
| From | cross@spitfire.i.gajendra.net |
|---|---|
| Date | 2025-05-16 15:42 +0000 |
| Message-ID | <25-05-011@comp.compilers> |
| In reply to | #3649 |
In article <25-05-006@comp.compilers>, <arnold@freefriends.org> wrote:
>In article <25-05-005@comp.compilers>,
>Derek <derek-nospam@shape-of-code.com> wrote:
>>I suspect that the same is happening with Rust. If so, how does using
>>Rust make the code safer than using C without any checking switched
>>on?
>
>Rust catches many problems at compile time. I am not at all a Rust
>expert, or even a novice, but I don't think Rust does runtime
>bounds checking, since it relies on compiler analysis instead.

Other way 'round, mostly. Array bounds checking is performed at runtime, but if the compiler can prove that the bounds check is superfluous (trivial example: the index is the constant 0 for a non-empty array) then it can elide the code that does the check.

Someone has put together a nice document demonstrating some of the more useful techniques: https://github.com/Shnatsel/bounds-check-cookbook/

- Dan C.
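Two commonly used ways of giving the compiler enough information to drop per-access checks are sketched below. These are generic examples with made-up function names, not excerpts from the linked document, and whether a given check is actually removed depends on the compiler and optimization level.

```rust
/// Re-slice once up front. The `[..4]` costs one length check (and panics
/// if the input is shorter); after that the slice is known to have exactly
/// four elements, so the constant indices need no further checks.
fn sum_first_four(a: &[u32]) -> u32 {
    let a = &a[..4];
    a[0] + a[1] + a[2] + a[3]
}

/// Use iterators instead of indices. `zip` stops at the shorter input,
/// so there is no index that could go out of range in the first place.
fn dot(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(a, b)| a * b).sum()
}

fn main() {
    println!("{}", sum_first_four(&[1, 2, 3, 4, 5]));
    println!("{}", dot(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0]));
}
```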
| From | Kaz Kylheku <643-408-1753@kylheku.com> |
|---|---|
| Date | 2025-05-16 17:57 +0000 |
| Message-ID | <25-05-012@comp.compilers> |
| In reply to | #3654 |
On 2025-05-16, cross@spitfire.i.gajendra.net <cross@spitfire.i.gajendra.net> wrote:
> In article <25-05-006@comp.compilers>, <arnold@freefriends.org> wrote:
>>In article <25-05-005@comp.compilers>,
>>Derek <derek-nospam@shape-of-code.com> wrote:
>>>I suspect that the same is happening with Rust. If so, how does using
>>>Rust make the code safer than using C without any checking switched
>>>on?
>>
>>Rust catches many problems at compile time. I am not at all a Rust
>>expert, or even a novice, but I don't think Rust does runtime
>>bounds checking, since it relies on compiler analysis instead.
>
> Other way 'round, mostly. Array bounds checking is performed at
> runtime, but if the compiler can prove that the bounds check is
> superfluous (trivial example: the index is the constant 0 for a
> non-empty array) then it can elide the code that does the check.

The logic doesn't even have to be specific to array bounds checking. If we know that "i" is in the range 0 to 9, then the test in "if (i < 10) S;" is dead code, whether appearing literally that way in the source code, or whether such a test is generated for an array access.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
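A small Rust sketch of the same point, with a hypothetical function `digit_name`: after `n % 10` the value of `i` is known to lie in 0 to 9, so both the hand-written test and the bounds check implied by indexing a ten-element array are redundant for the same reason. An optimizer doing value-range analysis is free to drop either one; whether a particular compiler does so is not something this example demonstrates.

```rust
const NAMES: [&str; 10] = [
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
];

/// Name of the last decimal digit of `n`.
fn digit_name(n: usize) -> &'static str {
    let i = n % 10; // i is always in 0..=9 from here on

    // Hand-written test: always true given the range of i.
    if i < 10 {
        // Indexing a 10-element array implies the same `i < 10` test,
        // which is redundant for the same reason.
        NAMES[i]
    } else {
        // Never reached; kept only to mirror the "if (i < 10) S;" shape.
        "unreachable"
    }
}

fn main() {
    println!("{}", digit_name(7));
    println!("{}", digit_name(123));
}
```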