(Initial, incomplete commit for draft pull request) This completely changes the way Verus erases ghost code for compilation and the way Verus checks tracked variable lifetimes in ghost code. In the long run, this should allow us to completely replace the old ghost code erasure with the proposed ghost block extension to rustc. In the short run, this should allow us to replace the old ghost code erasure with a macro-based erasure that would remove the last major dependency on our fork of rustc, hopefully allowing us to use a standard rustc build. It also simplifies the way programmers use tracked
variables (at least compared to the current rules for tracked
).
I'll first review the old erasure mechanism, then discuss the long-term proposal, then the short-term proposal.
Note: lifetime_generate.rs is still incomplete in this draft commit.
Old Erasure Mechanism
The old mechanism was based on the observation that only exec code gets compiled, and only exec and proof code get borrow checked:
exec/proof/spec code --> type-check, mode-check
exec/proof code --> type-check, mode-check --> lifetime (borrow) check
exec code --> type-check, mode-check --> lifetime (borrow) check --> compile
Based on this, the old mechanism erased spec code after type checking and then erased proof code after lifetime checking, leaving only exec code at the end to compile. However, while this erasure was simple in principle, it was difficult to erase code in Rust's HIR/THIR/MIR formats without making major changes to rustc. Therefore, the actual implementation ran rustc three times, erasing different amounts of code near the rustc front end (in Rust's AST format) in each of the three runs. This had many drawbacks:
- Running rustc multiple times was slow. In particular, users only get verification feedback after the 2nd run, doubling the startup latency for verification.
- Mode checking operated on Rust HIR, while ghost code erasure operated on Rust AST. To connect AST to HIR, we used spans, which was a hack.
- We needed our own fork of rustc.
- Running rustc three times with slightly different AST meant that type inference was not always consistent between the three runs. Users often had to add extra type annotations to make sure that type checking succeeded in all three runs.
Ghost blocks
There has been a recent proposal to add ghost code as a Rust feature. This proposal is syntactic: ghost code is marked explicitly in the source code by writing the ghost code in "ghost blocks". In addition, ghost variables may be represented using new types (e.g. x: Ghost<t>
instead of ghost x: t
). This proposal would have the advantage of requiring only one run of rustc, rather than three.
To prepare for this proposal, Verus has been switching over the last year to a more coarse-grained, syntactic style of expressing ghost code. For example, calls from exec code to proof lemmas are now written inside of explicit proof
blocks. In fact, the syntactic distinction between exec and non-exec code may help make the code more readable. However, we've also moved to a style that syntactically distinguishes between proof code and spec code, and this seems to be just annoying. In particular, proof (tracked) arguments to functions have to be marked as tracked
, as in let z = f(tracked x, tracked y);
rather than just let z = f(x, y);
. The mode checker can easily infer these tracked
annotations, but unfortunately, the mode checking runs after the syntactic splitting into exec/proof/spec code, when it's too late for the mode inference to help:
all code --> split into exec/proof/spec --> type-check, mode-check --> lifetime (borrow) check --> compile
Long-term proposal
To make programming in Verus more pleasant, this pull request reduces the reliance on syntax to distinguish between the different code modes. Syntax is used to distinguish between exec and non-exec code (e.g. by using proof
blocks), but not to distinguish between proof and spec code inside a proof
block. This allows programmers to write let z = f(x, y);
rather than let z = f(tracked x, tracked y);
.
This pull request assumes the simplest possible treatment of lifetime checking for non-exec code in the new ghost code Rust proposal: it assumes that the proposal skips lifetime checking of non-exec code entirely. This means that Verus needs to have a separate way to check lifetimes of proof code. This separate checking is the main purpose of this pull request. Roughly, this separate checking looks like the following:
all code --> split into exec/non-exec --> type-check, mode-check --+--> lifetime (borrow) check exec code --> compile
|
+--> lifetime (borrow) check proof code
In this design, rustc performs type checking of all code, Verus performs mode checking of all code, rustc performs lifetime checking and compilation of exec code, and Verus performs lifetime checking of proof code. (Verus will only need to check proof code that manipulates tracked data; proofs that work only on spec data trivially satisfy lifetime constraints.)
This pull request implements Verus's lifetime checking for proof code. This checking no longer uses erase.rs and erase_rewrite.rs; these will eventually be deleted entirely. Instead, the checking works by constructing synthetic Rust code to abstractly represent the various actions that the proof code performs, such as variable declarations, function calls, and pattern matching. It then sends this synthetic code to a separate invocation of rustc. (We could also consider using some other checker, such as Polonius, although generating synthetic Rust code is fairly easy.) Since this synthetic code generation occurs after mode inference, it can use the infered modes to guide the synthetic code generation.
This approach should solve the drawbacks listed above:
- It only requires running rustc once on the original source code. (It does internally invoke rustc a second time, but only on a relatively small amount of synthetic code with no macros. Furthermore, when verifying only one function at a time, we could configure Verus to only run the lifetime checking only on the function being verified.)
- Both the mode checking and proof lifetime checking operate on Rust HIR/THIR, not on Rust AST. There's no need to connect AST to HIR with spans.
- Assuming the ghost code proposal becomes part of Rust, there will be no need to use our own fork of rustc. (Until this happens, we might need our own fork of rustc to implement the ghost code proposal. However, see the short-term proposal below for a way to use the standard rustc even without the ghost code proposal.)
- Type checking happens only once, so there's no chance for multiple runs of rustc to encounter different type inference results. Full type information is available to when generating the synthetic code, so it doesn't need to rely on type inference.
Short-term proposal
The long-term proposal relies on the ghost code proposal being implemented and adopted. Before that happens, we can still use this pull request's proof-mode lifetime checking in a slightly hackier way to get some of the benefits of the long-term proposal. In particular, the short-term proposal should be able to run with standard rustc, without relying on a separate fork of rustc, and the short-term proposal will allow programmers to write let z = f(x, y);
rather than let z = f(tracked x, tracked y);
. The short-term proposal still requires running rustc multiple times on the original source code, although only twice instead of three times:
exec code --> type-check, mode-check --> lifetime (borrow) check exec code --> compile
all code --> type-check, mode-check --+
|
+--> lifetime (borrow) check proof code
To avoid needing a fork of rustc, the short-term proposal uses the syntax macro to erase non-exec code, rather than relying on erase.rs and erase_rewrite.rs.
This pull request implements the short-term proposal in its entirety (when using the new command-line option --erasure macro
), except that there are still a few leftover unrelated dependencies of Verus on our rustc fork, so it doesn't completely get us off the fork. But the remaining dependencies should be relatively easy to remove.
Possible alternative (even longer-term): MIR rewriting
There may be other ways to implement erasure and lifetime checking for Verus. For example, if rustc exposes an official interface to read and transform "stable" MIR code, we might be able to use MIR transformation to erase spec code and proof code. However, this support is likely to be further in the future. It would also mean we'd have to perform MIR transformations based on HIR mode checking, although we might be able to rerun the mode checking at the MIR level. I think we should continue to think about alternatives like this, but my hope is that this pull request can improve our lives in the short term regardless of future alternatives.
Ghost types
Right now, our ghost types (Ghost<T>
and Tracked<T>
) are somewhat annoying because programmers have to explicitly dereference them to retrieve the underlying T
ghost value. This is an orthogonal issue to this pull request, and this pull request does not address it. However, as part of the long-term solution above, but separately from this pull request, we could try to integrate some automatic dereferencing of ghost types into the ghost code proposal, so that the Rust type checker could automatically add the necessary dereferencing. As far as I recall, something like this was part of the original discussion for ghost code, although I don't think the details were ever settled.
Implementation details
Order of operations: long-term proposal
As discussed above, in the long-term proposal, rustc would only run once on the original source code. Here's what would happen during that one run:
- Parsing, AST construction, macro expansion (including the Verus syntax macro)
- Rust AST -> HIR
- Rust's type checking
- Rust HIR/THIR -> MIR conversion
- Rust's lifetime/borrow checking for non-ghost (exec) code
- Verus HIR/THIR -> VIR conversion
- Verus mode checking
- Verus lifetime/borrow checking for proof code (by generating synthetic code and running rustc)
- Verus SMT solving
- Rust compilation to machine code (if --compile is set)
Order of operations: short-term proposal
The order of operations in the short-term proposal is a little messier because we have to run rustc twice on the original source code, once erasing ghost code and once keeping ghost code. This causes some tension in the order of operations: it would be cleanest to run one rustc invocation entirely, and then the other entirely, but this would be slow. For example, if we ran the ghost-erase rustc and then the ghost-keep rustc, then the user would receive no feedback about verification until parsing and macro expansion for both rustc invocations had completed. On the other hand, if we ran the ghost-keep rustc and then the ghost-erase rustc, the user would receive no feedback about lifetime/borrow checking errors in exec functions until verification had completed. So for better latency, the implementation interleaves the two rustc invocations (marked GHOST for keeping ghost and EXEC for erasing ghost):
- GHOST: Parsing, AST construction, macro expansion (including the Verus syntax macro, keeping ghost code)
- GHOST: Rust AST -> HIR
- GHOST: Rust's type checking
- GHOST: Verus HIR/THIR -> VIR conversion
- GHOST: Verus mode checking
- GHOST: Verus lifetime/borrow checking for both proof and exec code (by generating synthetic code and running rustc), but delaying any lifetime/borrow checking error messages until EXEC has a chance to run
- GHOST: If there were no lifetime/borrow checking errors, run Verus SMT solving
- EXEC: Parsing, AST construction, macro expansion (including the Verus syntax macro, erasing ghost code)
- EXEC: Rust AST -> HIR
- EXEC: Rust's type checking
- EXEC: Rust HIR/THIR -> MIR conversion
- EXEC: Rust's lifetime/borrow checking for non-ghost (exec) code
- GHOST: If there were no EXEC errors, but there were GHOST lifetime/borrow checking errors, print the GHOST lifetime/borrow checking errors
- EXEC: Rust compilation to machine code (if --compile is set)
To avoid having to run the EXEC lifetime/borrow checking before verification, which would add latency before verification, the short-term implementation takes advantage of the synthetic-code lifetime/borrow checking to perform a quick early test for lifetime/borrow checking on all code (even pure exec code). If this detects any lifetime/borrow errors, the implementation skips verification and gives the EXEC rustc a chance to run, on the theory that the EXEC rustc will generate better error messages for any exec lifetime/borrow errors. The GHOST lifetime/borrow errors are printed only if the EXEC rustc finds no errors. Note that the long-term proposal avoids this complex interleaving; rustc only runs once on the original source code, and it naturally prints the regular exec lifetime/borrow checking errors first without so much effort. The long-term proposal also doesn't require the synthetic code generation for all functions; it only requires it for proof code.
Synthetic code generation
The --log-all
or --log-lifetime
options will print the generated synthetic code for lifetime/borrow checking to a file in the .verus-log
directory. For the following example source code:
struct S {
}
spec fn fspec(i: u32, s1: S, s2: S) -> u32 {
i
}
proof fn fproof(i: u32, tracked s1: S, tracked s2: S) -> u32 {
i
}
fn fexec(i: u32, s1: S, s2: S) -> u32 {
i
}
proof fn test_proof(i: u32, tracked s: S) {
let j = fspec(i, s, s);
let k = fproof(i, s, s);
}
fn test_exec(i: u32, s: S)
requires i < 100
{
proof {
let j = fspec(i, s, s);
let k = fproof(i, s, s);
}
let k = fexec(i, s, s);
let n = fexec(i, s, s);
}
the following crate-lifetime.rs
file is generated when logging is enabled:
#![allow(non_camel_case_types)]
fn f<A, B>(a: A) -> B { unimplemented!() }
fn notdet() -> bool { unimplemented!() }
struct T_0_S {
}
fn x_3_f_fexec_3(
x_1_i: u32,
x_4_s1: T_0_S,
x_5_s2: T_0_S,
) -> u32
{
x_1_i
}
fn x_8_f_test_proof_8(
x_6_s: T_0_S,
)
{
f::<_, u32>((x_6_s, x_6_s, ));
}
fn x_12_f_test_exec_12(
x_1_i: u32,
x_6_s: T_0_S,
)
{
{
f::<_, u32>((x_6_s, x_6_s, ));
};
let x_9_k = f::<_, u32>((x_1_i, x_6_s, x_6_s, ));
let x_10_n = f::<_, u32>((x_1_i, x_6_s, x_6_s, ));
}
Note that the synthetic code erases all spec code, keeping only exec and proof code. Also note that the synthetic code is simpler than the original source code in various ways, keeping only what's necessary for lifetime/borrow checking. For example, all function calls are redirected to the same function f
, which takes the function arguments as a tuple.
Error reporting
As discussed in the order of operations above, the implementation lets rustc print any exec-mode lifetime/borrow errors first. In the example above, this results in the following output:
error[E0382]: use of moved value: `s`
--> .\rust_verify\example\features.rs:210:25
|
203 | fn test_exec(i: u32, s: S)
| - move occurs because `s` has type `S`, which does not implement the `Copy` trait
...
210 | let k = fexec(i, s, s);
| - ^ value used here after move
| |
| value moved here
error[E0382]: use of moved value: `s`
--> .\rust_verify\example\features.rs:211:22
|
203 | fn test_exec(i: u32, s: S)
| - move occurs because `s` has type `S`, which does not implement the `Copy` trait
...
210 | let k = fexec(i, s, s);
| - value moved here
211 | let n = fexec(i, s, s);
| ^ value used here after move
error[E0382]: use of moved value: `s`
--> .\rust_verify\example\features.rs:211:25
|
203 | fn test_exec(i: u32, s: S)
| - move occurs because `s` has type `S`, which does not implement the `Copy` trait
...
211 | let n = fexec(i, s, s);
| - ^ value used here after move
| |
| value moved here
error: aborting due to 3 previous errors
If we change the example by commenting out the exec-mode error lines, then we will see the ghost-mode lifetime/borrow errors from the synthetic code:
error: use of moved value: `s`
--> .\rust_verify\example\features.rs:198:37
|
198 | proof fn test_proof(i: u32, tracked s: S) {
| ^
199 | let j = fspec(i, s, s);
200 | let k = fproof(i, s, s);
| ^ ^
error: use of moved value: `s`
--> .\rust_verify\example\features.rs:203:22
|
203 | fn test_exec(i: u32, s: S)
| ^
...
208 | let k = fproof(i, s, s);
| ^ ^
error: aborting due to 2 previous errors
The error messages here aren't quite as detailed, but hopefully they should be adequate. The ghost-mode messages are generated by running rustc on the synthetic code, with rustc configured to generate errors in JSON format, capturing the JSON errors and parsing them to retrieve the line/column information and error messages, converting the synthetic line/column information back into spans for the original source code, and then sending the error messages and spans to the rustc diagnostics for the original source code. It's possible that if we used, say, Polonius, instead of rustc, to do the lifetime/borrow checking, we wouldn't have to jump through so many hoops to manage the line/column/span information in the errors, but I haven't looked into this.