
Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust

Started by	John R Levine <johnl@taugh.com>
First post	2025-05-09 12:27 -0400
Last post	2025-05-16 17:57 +0000
Articles	8 — 7 participants


  Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust John R Levine <johnl@taugh.com> - 2025-05-09 12:27 -0400
    Re: Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust Derek <derek-nospam@shape-of-code.com> - 2025-05-13 21:30 +0100
      Re: Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust arnold@freefriends.org - 2025-05-14 08:21 +0000
        Re: Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust Kaz Kylheku <643-408-1753@kylheku.com> - 2025-05-14 20:01 +0000
          Re: Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust anton@mips.complang.tuwien.ac.at - 2025-05-15 07:48 +0000
        Re: Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust George Neuner <gneuner2@comcast.net> - 2025-05-15 11:52 -0400
        Re: Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust cross@spitfire.i.gajendra.net - 2025-05-16 15:42 +0000
          Re: Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust Kaz Kylheku <643-408-1753@kylheku.com> - 2025-05-16 17:57 +0000

#3647 — Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust

From	John R Levine <johnl@taugh.com>
Date	2025-05-09 12:27 -0400
Subject	Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust
Message-ID	<25-05-004@comp.compilers>

Automated tools translate C to Rust but produce lousy Rust code because of
C's loose pointer semantics.  They use an LLM to improve it somewhat.

Abstract
There has been a growing interest in translating C code to Rust due to
Rust's robust memory and thread safety guarantees. Tools such as C2RUST
enable syntax-guided transpilation from C to semantically equivalent Rust
code. However, the resulting Rust programs often rely heavily on unsafe
constructs--particularly raw pointers--which undermines Rust's safety
guarantees. This paper aims to improve the memory safety of Rust programs
generated by C2RUST by eliminating raw pointers. Specifically, we propose
a peephole raw pointer rewriting technique that lifts raw pointers in
individual functions to appropriate Rust data structures. Technically, PR2
employs decision-tree-based prompting to guide the pointer lifting
process. Additionally, it leverages code change analysis to guide the
repair of errors introduced during rewriting, effectively addressing
errors encountered during compilation and test case execution. We
implement PR2 as a prototype and evaluate it using gpt-4o-mini on 28
real-world C projects. The results show that PR2 successfully eliminates
13.22% of local raw pointers across these projects, significantly
enhancing the safety of the translated Rust code. On average, PR2
completes the transformation of a project in 5.44 hours, at an average
cost of $1.46.

https://arxiv.org/abs/2505.04852
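To make "lifting raw pointers to appropriate Rust data structures" concrete, here is a minimal hand-written sketch of the kind of before/after change involved. It is not taken from the paper or from actual C2RUST output, the function names are made up, and it assumes the common case where a C function receives a pointer plus an element count.

```rust
// Illustration only: roughly how a syntax-directed translation of
// `int sum(const int *p, size_t n)` keeps the raw pointer, forcing `unsafe`.
unsafe fn sum_raw(p: *const i32, n: usize) -> i32 {
    let mut total = 0;
    let mut i = 0;
    while i < n {
        // SAFETY: the caller promises p points to n valid i32s;
        // nothing here checks that promise.
        total += unsafe { *p.add(i) };
        i += 1;
    }
    total
}

// After lifting the pointer+length pair to a slice: the length travels
// with the data, no `unsafe` is needed, and any indexing is bounds-checked.
fn sum_slice(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

fn main() {
    let data = [1, 2, 3, 4];
    // SAFETY: data.as_ptr() points to data.len() initialized i32s.
    let a = unsafe { sum_raw(data.as_ptr(), data.len()) };
    let b = sum_slice(&data);
    assert_eq!(a, b);
    println!("{a}"); // 10
}
```

The point of the lifted version is that the caller can no longer pass a length that disagrees with the buffer.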

Regards,
John Levine, johnl@taugh.com, Taughannock Networks, Trumansburg NY
Please consider the environment before reading this e-mail. https://jl.ly


#3648

From	Derek <derek-nospam@shape-of-code.com>
Date	2025-05-13 21:30 +0100
Message-ID	<25-05-005@comp.compilers>
In reply to	#3647

All,

> Automated tools translate C to Rust but produce lousy Rust code because of
> C's loose pointer semantics.  They use an LLM to improve it somewhat.

Developers could always stay with C and switch on all the
pointer+array bounds checking that GCC/LLVM have been supporting for
some years (30 in the case of gcc).

I have been trying to find out how many products written in Rust
actually ship with the checking still switched on.

Way back when, most products written in Pascal used to ship with the
checking switched off, so that customers did not see the strange
errors+program termination.

I suspect that the same is happening with Rust. If so, how does using
Rust make the code safer than using C without any checking switched
on?


#3649

From	arnold@freefriends.org
Date	2025-05-14 08:21 +0000
Message-ID	<25-05-006@comp.compilers>
In reply to	#3648

In article <25-05-005@comp.compilers>,
Derek  <derek-nospam@shape-of-code.com> wrote:
>I suspect that the same is happening with Rust. If so, how does using
>Rust make the code safer than using C without any checking switched
>on?

Rust catches many problems at compile time.  I am not at all a Rust
expert, or even a novice, but I don't think Rust does runtime
bounds checking, since it relies on compiler analysis instead.


#3650

From	Kaz Kylheku <643-408-1753@kylheku.com>
Date	2025-05-14 20:01 +0000
Message-ID	<25-05-007@comp.compilers>
In reply to	#3649

On 2025-05-14, arnold@freefriends.org <arnold@freefriends.org> wrote:
> In article <25-05-005@comp.compilers>,
> Derek  <derek-nospam@shape-of-code.com> wrote:
>>I suspect that the same is happening with Rust. If so, how does using
>>Rust make the code safer than using C without any checking switched
>>on?
>
> Rust catches many problems at compile time.  I am not at all a Rust
> expert, or even a novice, but I don't think Rust does runtime
> bounds checking, since it relies on compiler analysis instead.

How would it be safe if you could write a Rust program that asks the
user to input a random decimal number, and then uses it as an index to
access an array, without any check?

The compiler will eliminate bounds checks at compile time if it can
infer they are unnecessary; e.g. when a loop sets up an index variable
to step over exactly the valid range, and does not mess with it otherwise.
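A minimal Rust sketch of the loop case described above (function names are illustrative). Whether the checks actually disappear is up to the optimizer in a release build; the only way to be sure is to inspect the generated code.

```rust
// Indexed loop: each v[i] is nominally bounds-checked, but `i` only
// steps over 0..v.len() and the body never modifies it, so the optimizer
// can usually prove every check redundant and remove it.
fn sum_indexed(v: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..v.len() {
        total += v[i];
    }
    total
}

// Iterator form: no index exists, so there is no check to elide at all.
fn sum_iter(v: &[u64]) -> u64 {
    v.iter().sum()
}

fn main() {
    let v = vec![3, 1, 4, 1, 5];
    assert_eq!(sum_indexed(&v), sum_iter(&v));
    println!("{}", sum_iter(&v)); // 14
}
```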

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca


#3651

From	anton@mips.complang.tuwien.ac.at
Date	2025-05-15 07:48 +0000
Message-ID	<25-05-008@comp.compilers>
In reply to	#3650

Kaz Kylheku <643-408-1753@kylheku.com> writes:
>On 2025-05-14, arnold@freefriends.org <arnold@freefriends.org> wrote:
>> [Rust] relies on compiler analysis instead.
>
>How would it be safe if you could write a Rust program that asks the
>user to input a random decimal number, and then uses it as an index to
>access an array, without any check?

I don't know if Rust does it this way, but it could reject a program
that does a[i] if it cannot prove that i is an allowed index for a.
For your example, a program like this would be rejected:

input i
print a[i]

(using what little I remember from BASIC syntax because I don't know the Rust
syntax:-).  If you want the compiler to accept it, you could write

input i
if i < length[a] then
  print a[i]
else
  print "index out of range"
endif
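For comparison, a rough Rust rendering of the second program (a sketch; `describe` and `describe_get` are made-up names). Note that rustc does not reject the first form: `a[i]` with an unverified `i` compiles, and an out-of-range value causes a panic at run time. The explicit test, or `slice::get`, which returns an `Option`, corresponds to the second form.

```rust
use std::io;

// The second program above: test the index before using it.
fn describe(a: &[i32], i: usize) -> String {
    if i < a.len() {
        a[i].to_string()
    } else {
        "index out of range".to_string()
    }
}

// The same logic with slice::get, which returns None rather than panicking.
fn describe_get(a: &[i32], i: usize) -> String {
    match a.get(i) {
        Some(x) => x.to_string(),
        None => "index out of range".to_string(),
    }
}

fn main() {
    let a = [10, 20, 30];
    let mut line = String::new();
    io::stdin().read_line(&mut line).expect("failed to read stdin");
    // Negative or non-numeric input does not parse as usize.
    match line.trim().parse::<usize>() {
        Ok(i) => {
            let s = describe(&a, i);
            assert_eq!(s, describe_get(&a, i));
            println!("{s}");
        }
        Err(_) => println!("not a valid index"),
    }
}
```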

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[I believe that Rust does runtime checks unless it can prove at compile time that they're not needed.
It has a fancy exception system to catch access violations. -John]
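A minimal sketch of what that runtime check looks like in practice (the helper name is made up). Rust has no exceptions in the C++/Java sense; a failed bounds check causes a panic, which by default unwinds the thread. `std::panic::catch_unwind` is used below only to observe that panic; it works only under the default `panic = "unwind"` strategy, and the default panic hook still prints a message to stderr.

```rust
use std::panic;

// Plain indexing compiles even when `i` is unchecked; if i >= a.len()
// the bounds check fails and the code panics. catch_unwind converts that
// unwinding panic into an Err so it can be observed here.
fn index_catching_panic(a: &[i32], i: usize) -> Option<i32> {
    panic::catch_unwind(|| a[i]).ok()
}

fn main() {
    let a = [10, 20, 30];
    assert_eq!(index_catching_panic(&a, 1), Some(20));
    assert_eq!(index_catching_panic(&a, 7), None); // out of range: panicked

    // Idiomatic code avoids the panic entirely and handles the None.
    assert_eq!(a.get(7), None);
    println!("out-of-bounds access detected at run time");
}
```

Using `catch_unwind` for ordinary control flow is not idiomatic; it appears here only to make the check visible.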


#3652

From	George Neuner <gneuner2@comcast.net>
Date	2025-05-15 11:52 -0400
Message-ID	<25-05-009@comp.compilers>
In reply to	#3649

On Wed, 14 May 2025 08:21:51 +0000, arnold@freefriends.org wrote:

>In article <25-05-005@comp.compilers>,
>Derek  <derek-nospam@shape-of-code.com> wrote:
>>I suspect that the same is happening with Rust. If so, how does using
>>Rust make the code safer than using C without any checking switched
>>on?
>
>Rust catches many problems at compile time.  I am not at all a Rust
>expert, or even a novice, but I don't think Rust does runtime
>bounds checking, since it relies on compiler analysis instead.

Debug builds in Rust may do considerable runtime checking depending on
what the code is trying to do.

There is a small amount of checking done even in release builds. There
are always some things that can't be checked at compile time.
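One concrete case of the debug/release difference, as a sketch: with Cargo's default profiles, integer overflow checks are on in debug builds (`overflow-checks = true` in the dev profile) and off in release builds, where arithmetic silently wraps, while slice bounds checks stay on in both. Code that must behave the same either way can say what it wants explicitly (the helper name below is made up):

```rust
// `a + b` on u8 would panic on overflow in a default debug build
// ("attempt to add with overflow") but wrap in a default release build.
// The explicit methods below behave identically under both profiles.
fn add_u8_three_ways(a: u8, b: u8) -> (Option<u8>, u8, u8) {
    (
        a.checked_add(b),    // None on overflow
        a.wrapping_add(b),   // modulo 256
        a.saturating_add(b), // clamps at u8::MAX
    )
}

fn main() {
    assert_eq!(add_u8_three_ways(250, 10), (None, 4, 255));
    assert_eq!(add_u8_three_ways(1, 2), (Some(3), 3, 3));

    // Bounds checks are not controlled by overflow-checks:
    // out-of-range indexing panics in release builds too.
    let v = [1u8, 2, 3];
    assert_eq!(v.get(3), None);
    println!("ok");
}
```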


#3654

From	cross@spitfire.i.gajendra.net
Date	2025-05-16 15:42 +0000
Message-ID	<25-05-011@comp.compilers>
In reply to	#3649

In article <25-05-006@comp.compilers>,  <arnold@freefriends.org> wrote:
>In article <25-05-005@comp.compilers>,
>Derek  <derek-nospam@shape-of-code.com> wrote:
>>I suspect that the same is happening with Rust. If so, how does using
>>Rust make the code safer than using C without any checking switched
>>on?
>
>Rust catches many problems at compile time.  I am not at all a Rust
>expert, or even a novice, but I don't think Rust does runtime
>bounds checking, since it relies on compiler analysis instead.

Other way 'round, mostly.  Array bounds checking is performed at
runtime, but if the compiler can prove that the bounds check is
superfluous (trivial example: the index is the constant 0 for a
non-empty array) then it can elide the code that does the check.
Someone has put together a nice document demonstrating some of
the more useful techniques:

https://github.com/Shnatsel/bounds-check-cookbook/
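As a small example of the kind of technique such a cookbook covers, here is a sketch of hoisting a bounds check out of a loop by reslicing first. The function name is made up, and whether the per-iteration checks are actually removed is up to the optimizer.

```rust
// Looping `for i in 0..n { v[i] }` directly needs a check per iteration,
// since nothing tells the compiler that n <= v.len().
// Reslicing first performs one check up front (panicking here if
// n > v.len()); after that prefix.len() == n, so each prefix[i] with
// i < n is provably in bounds and the optimizer can typically drop it.
fn sum_prefix(v: &[u32], n: usize) -> u32 {
    let prefix = &v[..n];
    let mut total = 0;
    for i in 0..n {
        total += prefix[i];
    }
    total
}

fn main() {
    let v = [2, 4, 6, 8];
    assert_eq!(sum_prefix(&v, 3), 12);
    assert_eq!(sum_prefix(&v, 0), 0);
    println!("{}", sum_prefix(&v, 4)); // 20
}
```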

	- Dan C.


#3655

From	Kaz Kylheku <643-408-1753@kylheku.com>
Date	2025-05-16 17:57 +0000
Message-ID	<25-05-012@comp.compilers>
In reply to	#3654

On 2025-05-16, cross@spitfire.i.gajendra.net <cross@spitfire.i.gajendra.net> wrote:
> In article <25-05-006@comp.compilers>,  <arnold@freefriends.org> wrote:
>>In article <25-05-005@comp.compilers>,
>>Derek  <derek-nospam@shape-of-code.com> wrote:
>>>I suspect that the same is happening with Rust. If so, how does using
>>>Rust make the code safer than using C without any checking switched
>>>on?
>>
>>Rust catches many problems at compile time.  I am not at all a Rust
>>expert, or even a novice, but I don't think Rust does runtime
>>bounds checking, since it relies on compiler analysis instead.
>
> Other way 'round, mostly.  Array bounds checking is performed at
> runtime, but if the compiler can prove that the bounds check is
> superfluous (trivial example: the index is the constant 0 for a
> non-empty array) then it can elide the code that does the check.

The logic doesn't even have to be specific to array bounds checking.

If we know that "i" is in the range 0 to 9, then the test in
"if (i < 10) S;" is redundant and the statement reduces to plain S,
whether it appears literally that way in the source code, or whether
such a test is generated for an array access.
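The same point in Rust, as a sketch (the function name is made up): once `i = x % 10`, the explicit test and the implicit bounds check on a 10-element array are the same provably-true condition, so both are candidates for removal.

```rust
fn lookup(table: &[u8; 10], x: usize) -> u8 {
    let i = x % 10; // i is now known to lie in 0..=9
    if i < 10 {
        // Always taken: the test is redundant in exactly the sense above,
        // and the bounds check hidden in table[i] is the same test again.
        table[i]
    } else {
        unreachable!("i % 10 is always below 10")
    }
}

fn main() {
    let table = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90];
    assert_eq!(lookup(&table, 3), 30);
    assert_eq!(lookup(&table, 47), 70);
    println!("{}", lookup(&table, 12345)); // 50
}
```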

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
