Tools to feature more lenient Polonius-based borrow-checker patterns in stable Rust

Overview

Though this be madness, yet there is method in 't.

stable-rust-stands-atop-dead-zpolonius

More context
  1. Hamlet:

    For yourself, sir, shall grow old as I am – if, like a crab, you could go backward.

  2. Polonius:

    Though this be madness, yet there is method in 't.

  3. Polonius, eventually:

    polonius-lying-dead

::polonius-the-crab

Tools to feature more lenient Polonius-based borrow-checker patterns in stable Rust.

Repository Latest version Documentation MSRV unsafe internal no_std compatible License CI

Rationale: limitations of the NLL borrow checker

See the following issues:

All these examples boil down to the following canonical instance:

#![forbid(unsafe_code)]
use ::std::{
    collections::HashMap,
};

/// Typical example of lack-of-Polonius limitation: get_or_insert pattern.
/// See https://nikomatsakis.github.io/rust-belt-rust-2019/#72
fn get_or_insert(
    map: &mut HashMap<u32, String>,
) -> &String {
    if let Some(v) = map.get(&22) {
        return v;
    }
    map.insert(22, String::from("hi"));
    &map[&22]
}
error message
 error[E0502]: cannot borrow `*map` as mutable because it is also borrowed as immutable
  --> src/lib.rs:53:5
   |
14 |     map: &mut HashMap<u32, String>,
   |          - let's call the lifetime of this reference `'1`
15 | ) -> &String {
16 |     if let Some(v) = map.get(&22) {
   |                      --- immutable borrow occurs here
17 |         return v;
   |                - returning this value requires that `*map` be borrowed for `'1`
18 |     }
19 |     map.insert(22, String::from("hi"));
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ mutable borrow occurs here

Explanation

Click to see

Now, this pattern is known to be sound / a false positive from the current borrow checker, NLL.

  • The technical reason behind it is the named / in-function-signature lifetime involved in the borrow: contrary to a fully-in-body anonymous borrow, borrows that last for a "named" / outer-generic lifetime are deemed to last until the end of the function, across all possible codepaths (even those unreachable whence the borrow starts).

    • a way to notice this difference is to, when possible, rewrite the function as a macro. By virtue of being syntactically inlined, it will involve anonymous lifetimes and won't cause any trouble.

Workarounds

So "jUsT uSe UnSaFe" you may suggest. But this is tricky:

  • does your use-case really fit this canonical example?

    • or a variant: will it still fit it as the code evolves / in face of code refactorings?
  • even when we know "we can use unsafe", actually using it is subtle and error-prone. Since &mut borrows are often involved in this situation, one may accidentally end up transmuting a & reference to a &mut reference, which is always UB.

  • both of these issues lead to a certain completely legitimate allergy to unsafe_code, and the very reassuring #![forbid(unsafe_code)]-at-the-root-of-the-crate pattern.

Non-unsafe albeit cumbersome workarounds for lack-of-Polonius issues

Click to see
  • if possible, reach for a dedicated API. For instance, the get_or_insert() example can be featured using the .entry() API:

    #![forbid(unsafe_code)]
    use ::std::{
        collections::HashMap,
    };
    
    fn get_or_insert(
        map: &mut HashMap<u32, String>,
    ) -> &String {
        map.entry(22).or_insert_with(|| String::from("hi"))
    }

    Sadly, the reality is that you won't always have such convenient APIs at your disposal.

  • otherwise, you can perform successive non-idiomatic lookups to avoid holding the borrow for too long:

    #![forbid(unsafe_code)]
    use ::std::{
        collections::HashMap,
    };
    
    fn get_or_insert(
        map: &mut HashMap<u32, String>,
    ) -> &String {
        // written like this to show the "transition path" from previous code
        let should_insert =
            if let Some(_discarded) = map.get(&22) {
                false
            } else {
                true
            }
        ;
        // but `should_insert` can obviously be shortened down to `map.get(&22).is_none()`
        // or, in this very instance, to `map.contains_key(&22).not()`.
        if should_insert {
            map.insert(22, String::from("hi"));
        }
        map.get(&22).unwrap() // or `&map[&22]`
    }
  • finally, related to the "this only happens with concrete named lifetimes" issue, a clever non-unsafe albeit cumbersome way to circumvent the limitation is to use CPS / callbacks / a scoped API:

    #![forbid(unsafe_code)]
    use ::std::{
        collections::HashMap,
    };
    
    fn with_get_or_insert<R>(
        map: &mut HashMap<u32, String>,
        yield_:     impl FnOnce(
    /* -> */ &String
                    ) -> R ) -> R
    {
        if let Some(v) = map.get(&22) {
            yield_(v)
        } else {
            map.insert(22, String::from("hi"));
            yield_(&map[&22])
        }
    }

While you should try these workarounds first and see how they apply to your codebase, sometimes they're not applicable or way too cumbersome compared to "a tiny bit of unsafe".

In that case, as with all the cases of known-to-be-sound unsafe patterns, the ideal solution is to factor it out down to its own small and easy to review crate or module, and then use the non-unsafe fn API thereby exposed 👌 .

Enters ::polonius-the-crab

polonius-the-crab

Explanation of its implementation

Click to see

So, back to that "safety encapsulation" idea:

  1. let's find a canonical instance of this borrow checker issue that is known to be sound and accepted under Polonius;

  2. and tweak it so that it can then be re-used as a general-purpose tool for most of these issues.

And if we stare at the borrow checker issues above, we can see there are two defining ingredients:

  • An explicit generic lifetime parameter (potentially elided);
  • A branch, where one of the branches returns based on that borrow, whilst the other is no longer interested in it.

The issue is then that that second branch ought to get back access to the stuff borrowed in the first branch, but the current borrow checker denies it.

That's where we'll sprinkle some correctly-placed unsafe to make the "borrow checker look the other way" just for a moment, the right moment.

This thus gives us (in pseudo-code first):

fn polonius<'r, T> (
    borrow: &'r mut T,
    branch:
        impl // generic type to apply to all possible scopes.
        for<'any> // <- higher-order lifetime ensures the `&mut T` infected with it…
        FnOnce(&'any mut T)      // …can only escape the closure…
                    // vvvv        … through its return type and its return type only.
          -> Option< _<'any> > // <- The `Some` / `None` discriminant represents the branch info.
                  // ^^^^^^^
                  // some return type allowed to depend on `'any`.
                  // For instance, in the case of `get_or_insert`, this could
                  // have been `&'any String` (or `Option<&'any String>`).
                  // Bear with me for the moment and tolerate this pseudo-code.
    ,
) -> Result< // <- we "forward the branch", but with data attached to the fallback one (`Err(…)`).
        _<'r>, // <- "plot twist": `'any` above was `'r` !
        &'r mut T, // <- through Arcane Magic™ we get to transmute the `None` into an `Err(borrow)`
    >
{
    let tentative_borrow = &mut *borrow; // reborrow
    if let Some(dependent) = branch(tentative_borrow) {
        /* within this branch, the reborrow needs to last for `'r` */
        return Ok(dependent);
    }
    /* but within this branch, the reborrow needs to have ended: only Polonius supports that kind of logic */

    // give the borrow back
    Err(borrow) // <- without Polonius this is denied
}

This function, ignoring that generic unspecified _<'…> return type in pseudo-code, does indeed represent a canonical example of the borrow checker issue (without -Zpolonius, it will reject the Err(borrow) line saying that borrow needs to be borrowed for 'r so that dependent is, and that 'r spans until any end of function (the borrow checker bug).

Whereas with -Zpolonius it is accepted.

The ArcaneMagic™

The correct use of unsafe, here, to palliate the lack of -Zpolonius, is to change:

let tentative_borrow = &mut *borrow; // reborrow

into:

let tentative_borrow = unsafe { &mut *(borrow as *mut _) }; // reborrow

where unsafe { &mut *(thing as *mut _) } is the canonical way to perform lifetime(-of-the-borrow) extension: the lifetime of that &mut borrow is then no longer tied, in any way, to 'r nor to *borrow.

  • Some of you might have been tempted to use mem::transmute. While that does indeed work, it is a strictly more flexible API, which in the case of unsafe, means it's a strictly more dangerous API. With transmute, for instance, when the borrowee has lifetime parameters of its own, those may be erased as well, whereas a downgrade-to-pointer-and-upgrade-back-to-ref operation is guaranteed to "erase" only the outer lifetime of the borrow, leaving the inner type untouched: definitely safer.

The borrow checker no longer holds our hand, as far as overlapped usage of borrow and tentative_borrow is concerned (which would be UB). It is now up to us to ensure no runtime path can ever lead to such borrows overlapping.

And indeed they don't, as the simple branch showcases:

  • in the Some branch, the dependent is still borrowing tentative_borrow, and thus, *borrow. But we do not use borrow anymore in that branch, nor in the caller's body, as long as dependent is used. Indeed, signature-wise, we do tell that that dependent return value, of type _<'r>, is borrowing from *borrow, due to that repetition of the 'r name.

  • in the None branch, there is no dependent, and tentative_borrow isn't used anymore, so it is sound to refer to borrow again.

In other words:

Though this be unsafe, yet there is soundness in 't.

As an extra precaution, this crate does even guard that usage of unsafe through a cfg-opt-out, so that when using -Zpolonius, the unsafe is removed, and yet the body of the function, as well as its signature, compiles fine (this is further enforced in CI through a special test).

Generalizing it

None becomes <Err>

It turns out that we don't have to restrict the branch to returning no data on None, and that we can use it as a "channel" through which to smuggle non-borrowing data.

This leads to replacing Option< _<'any> > with Result< _<'any>, Err >

  • Notice how the Err cannot depend on 'any since it can't name it (generic parameter introduced before the 'any quantification ever gets introduced).
The FnOnceReturningAnOption trick is replaced with a HKT pattern

Indeed, a FnOnceReturningAnOption-based signature would be problematic on the caller's side, since:

  • it infers the higher-order-'any-infected return type of the closure through the actual closure instance being fed;

  • but a closure only gets to be higher-order when the API it is fed to explicitly requires it to

So this leads to a situation where both the caller and callee expect each other to disambiguate what the higher-order return value of the closure should be, leading to no higher-orderness to begin with and/or to type inference errors.

  • Note that the hrtb! macro from https://docs.rs/higher-order-closure, or the actual for<…>-closures RFC such crate polyfills, would help in that regard. But the usage then becomes, imho, way more convoluted than any of the aforementioned workarounds, defeating the very purpose of this crate.

So that _<'any> is achieved in another manner. Through HKTs, that is, through "generic generics" / "generics that are, themselves, generic":

//! In pseudo-code:
fn polonius<'r, T, Ret<'_>> (
    borrow: &'r mut T,
    branch: impl FnOnce(&'_ mut T) -> Option<Ret<'_>>,
) -> Result<
        Ret<'r>,
        &'r mut T,
    >

This cannot directly be written in Rust, but you can define a trait representing the <'_>-ness of a type (HKT in this crate), and with it, use as WithLifetime<'a>::T as the "feed <'a>" operator:

//! Real code!
fn polonius<'r, T, Ret : HKT> (
    borrow: &'r mut T,
    branch: impl FnOnce(&'_ mut T) -> Option< <Ret as WithLifetime<'_>>::T >,
) -> Result<
        <Ret as WithLifetime<'r>>::T,
        &'r mut T,
    >

We have reached the definition of the actual fn polonius exposed by this very crate!

Now, a HKT type is still cumbersome to use. If we go back to that get_or_insert example that was returning a &'_ String, we'd need to express that "generic type" representing <'lt> => &'lt String, such as:

/// Pseudo-code (`StringRef` is not a type, `StringRef<'…>` is).
type StringRef<'any> = &'any String;

/// Real HKT code: make `StringRef` a fully-fledged stand-alone type
struct StringRef;
/// And now express the `<'lt> => &'lt String` relationship:
impl<'lt> WithLifetime <'lt>
   for StringRef // is:  ⇓
{                     // ⇓
                      // ⇓
    type T =         &'lt String    ;
}

Putting it altogether: get_or_insert with no .entry() nor double-lookup

So this crate exposes a "raw" polonius() function that has the unsafe in its body, and which is quite powerful at tackling these lack-of-polonius related issues.

And yet, it is cumbersome to use:

use ::polonius_the_crab::polonius;

fn get_or_insert (
    map: &'_ mut ::std::collections::HashMap<i32, String>,
) -> &'_ String
{
    #![forbid(unsafe_code)] // No unsafe code in this function: VICTORY!!

    enum StringRef {}
    impl<'lt> ::polonius_the_crab::WithLifetime<'lt> for StringRef {
        type T = &'lt String;
    }

    match polonius::<StringRef, _, _, _>(map, |map| map.get(&22).ok_or(())) {
        | Ok(ret) => {
            // no second-lookup!
            ret
        },
        // we get the borrow back (we had to give the original one to `polonius()`)
        | Err((map, ())) => {
            map.insert(22, String::from("…"));
            &map[&22]
        },
    }
}

Hence why this crate also offers

Convenient macros for ergonomic usage 😗 👌

Mainly, a polonius! entry point, within which you can use polonius_return! to early return the dependent value, or a polonius_break! to instead "break" / leave the polonius! block with a non-dependent value (notice how the branch nature of this borrow checker limitation is kept in the very bones of the API).

  • The polonius! macro requires that a 'polonius-infected return type be used (the HKT marker, for those having followed the implementation).

This leads to the following get_or_insert usage:

Using Polonius The Crab for Fun And Profit™

polonius-the-crab

#![forbid(unsafe_code)]
use {
    ::polonius_the_crab::{
        prelude::*,
    },
    ::std::{
        collections::HashMap,
    },
};

/// Typical example of lack-of-Polonius limitation: get_or_insert pattern.
/// See https://nikomatsakis.github.io/rust-belt-rust-2019/#72
fn get_or_insert(
    mut map: &mut HashMap<u32, String>,
) -> &String {
    // Who needs the entry API?
    polonius!(|map| -> &'polonius String {
        if let Some(v) = map.get(&22) {
            polonius_return!(v);
        }
    });
    map.insert(22, String::from("hi"));
    &map[&22]
}
You might also like...
Leetcode Solutions in Rust, Advent of Code Solutions in Rust and more

RUST GYM Rust Solutions Leetcode Solutions in Rust AdventOfCode Solutions in Rust This project demostrates how to create Data Structures and to implem

Rust 核心库和标准库的源码级中文翻译,可作为 IDE 工具的智能提示 (Rust core library and standard library translation. can be used as IntelliSense for IDE tools)

Rust 标准库中文版 这是翻译 Rust 库 的地方, 相关源代码来自于 https://github.com/rust-lang/rust。 如果您不会说英语,那么拥有使用中文的文档至关重要,即使您会说英语,使用母语也仍然能让您感到愉快。Rust 标准库是高质量的,不管是新手还是老手,都可以从中

Unify your game sources in one place and aquire more of them, using modules made by the community.

Project Black Pearl Unify your game sources in one place by using modules made by the community. What is Project Black Pearl? Project Black Pearl (or

sodiumoxide clone, now with less chlorides and more oxides

sodiumoxide2 Less Chlorides, more Oxides! This package is a fork of sodiumoxide, using rust-native crypto. It's not intended to be easy to use or impl

Harvest Moon: (More) Friends of Mineral Town event script compiler

mary This is a script compiler for Harvest Moon: Friends of Mineral Town and Harvest Moon: More Friends of Mineral Town for the GBA. The end goal is f

Conversion Tools API Rust client

ConversionTools Rust This Conversion Tools API Rust client allows you to use the site API and convert files faster and more conveniently. Site Convers

Tools for managing GitHub block lists

GitHub block list management Octocrabby is a small set of command-line tools and Octocrab extensions that are focused on managing block lists on GitHu

Fast fail2ban-like tools for parsing nginx logs

Fast2ban This is simple fail2ban-like replacement written in Rust. Usage: ./fast2ban # reads default config.toml from current directory ./fast2ban co

A Rust no-std (de)compressor based on PAQ

MASHI まし A 100% no-std compatible Rust implementation of a PAQ style arithmetic coding, context mixing compressor. Its intended use case is compressin

Comments
  • Could you provide an example of loop break with dependent value?

    Could you provide an example of loop break with dependent value?

    Unlike examples for the polonius!(), and the polonius_loop!() where the dependent value is returned from function immediately, the suggestion on the polonius_break!() page seem to be rather vague. A naive attempt to implement it results in getting either a mismatched lifetimes error, or the same double borrow error.

    Could you please provide a full example for this use case, similar to the examples for the other ones?

    enhancement help wanted question 
    opened by NekoiNemo 7
  • Unexpected

    Unexpected "unreachable call" warning in macro `polonius_loop` usage

    Crate version:

    [dependencies]
    polonius-the-crab = "0.3.0"
    

    rustc version:

    rustc 1.65.0-nightly (015a824f2 2022-08-22)
    

    Show case:

    use polonius_the_crab::prelude::*;
    
    trait ContexIterator<'context, Cx> {
        type Item;
    
        fn next(&mut self) -> Option<(&mut Cx, Self::Item)>;
    
        fn find<F>(&mut self, f: &mut F) -> Option<(&Cx, Self::Item)>
        where
            F: FnMut(&mut Cx, &Self::Item) -> bool,
        {
            let mut s = self;
            polonius_loop!(|s| -> Option<(&'polonius Cx, Self::Item)> {
                if let Some((cx, item)) = s.next() {
                    if f(cx, &item) {
                        polonius_return!(Some((cx, item)));
                    }
                } else {
                    polonius_break!();
                }
            });
            None
        }
    }
    
    #[cfg(test)]
    mod tests {
        use crate::ContexIterator;
    
        #[derive(Debug)]
        pub struct Iter<'context, 'source, Cx, I> {
            inner: &'source [I],
            context: &'context mut Cx,
            index: usize,
        }
    
        impl<'context, 'source, Cx, I> ContexIterator<'context, Cx> for Iter<'context, 'source, Cx, I> {
            type Item = &'source I;
    
            fn next(&mut self) -> Option<(&mut Cx, Self::Item)> {
                let item = self
                    .inner
                    .get(self.index)
                    .map(|item| (&mut *self.context, item));
                self.index += 1;
                item
            }
        }
    
        #[test]
        fn test() {
            let v = vec![1, 2, 3];
            let mut i = Iter {
                inner: &v,
                context: &mut (),
                index: 0,
            };
            println!("{:?}", i.find(&mut |_, i| **i % 2 == 0));
        }
    }
    

    cargo check output:

     ↳ cargo check
    warning: unreachable call
      --> src/lib.rs:13:9
       |
    13 | /         polonius_loop!(|s| -> Option<(&'polonius Cx, Self::Item)> {
    14 | |             if let Some((cx, item)) = s.next() {
    15 | |                 if f(cx, &item) {
    16 | |                     polonius_return!(Some((cx, item)));
    ...  |
    20 | |             }
    21 | |         });
       | |          ^
       | |          |
       | |__________unreachable call
       |            any code following this expression is unreachable
       |
       = note: `#[warn(unreachable_code)]` on by default
       = note: this warning originates in the macro `$crate::polonius_break_dependent` which comes from the expansion of the macro `polonius_loop` (in Nightly builds, run with -Z macro-backtrace for more info)
    
    warning: `iter` (lib) generated 1 warning
    
    bug 
    opened by ethe 0
Releases(v0.3.1)
  • v0.3.1(Aug 24, 2022)

    What's Changed

    • Remove unnecessary type inference nudge in polonius_loop! by @danielhenrymantilla in https://github.com/danielhenrymantilla/polonius-the-crab.rs/pull/10

    Full Changelog: https://github.com/danielhenrymantilla/polonius-the-crab.rs/compare/v0.3.0...v0.3.1

    Source code(tar.gz)
    Source code(zip)
  • v0.3.0(Jul 28, 2022)

    What's Changed

    • Changed polonius…! macro signatures to enable the creation of a polonius_break_dependent! macro by @danielhenrymantilla in https://github.com/danielhenrymantilla/polonius-the-crab.rs/pull/8

    • Doc improvements

    Full Changelog: https://github.com/danielhenrymantilla/polonius-the-crab.rs/compare/v0.2.0...v0.3.0

    Source code(tar.gz)
    Source code(zip)
  • v0.2.1(May 9, 2022)

  • v0.2.0(May 8, 2022)

    What's Changed

    • Rename polonius_break! to exit_polonius!
    • Remove the need to reintroduce generic parameters in polonius! macro
    • loop convenience support (polonius_{loop,break,continue}!) by @danielhenrymantilla in https://github.com/danielhenrymantilla/polonius-the-crab.rs/pull/4

    Full Changelog: https://github.com/danielhenrymantilla/polonius-the-crab.rs/compare/d64ca52d26a9ecbba3c16c78a694e4e76b3c49b0...999af4db3f26fd4e5803febfe21f140cfc240a57

    Source code(tar.gz)
    Source code(zip)
  • v0.1.2(Apr 29, 2022)

  • v0.1.1(Apr 29, 2022)

    What's Changed

    • Fix clippy::self_assignment lint by @danielhenrymantilla in https://github.com/danielhenrymantilla/polonius-the-crab.rs/pull/2

    Full Changelog: https://github.com/danielhenrymantilla/polonius-the-crab.rs/compare/v0.1.0...v0.1.1

    Source code(tar.gz)
    Source code(zip)
  • v0.1.0(Apr 29, 2022)

Owner
Daniel Henry-Mantilla
🦀 Feeling crabulous 🦀 https://danielhenrymantilla.github.io
Daniel Henry-Mantilla
syntax-level async join enabling branching control flow and shared mutable borrow

enjoin enjoin's async join macros operate at the syntax level. It allows you to... break, continue, and return out of async code running in a join for

Wisha W. 15 Apr 16, 2023
Generic abstractions for combining and nesting reduction patterns for iterables.

reductor Generic abstractions for combining and nesting reduction patterns for iterables. Docs: https//docs.rs/reductor Before & After: Before fn proc

Yotam Ofek 2 Jul 9, 2022
Rust crate implementing short & stable ids based on timestamps

Lexicoid Short & stable IDs based on timestamps. Heavily inspired by Short, friendly base32 slugs from timestamps by @brandur. Install Install with ca

Luciano Mammino 6 Jan 29, 2023
Build database expression type checker and vectorized runtime executor in type-safe Rust

Typed Type Exercise in Rust Build database expression type checker and vectorized runtime executor in type-safe Rust. This project is highly inspired

Andy Lok 89 Dec 27, 2022
A simple path traversal checker made with Rust. Useful for APIs that serve dynamic files.

Path trav A simple path traversal checker made with Rust. Useful for APIs that serve dynamic files. Note: this is a security tool. If you see somethin

Gátomo 3 Nov 21, 2022
Simple and fast proxy checker that include protocol validation;

Open Proxies ⭐️ Leave me a start please ⭐️ it will motivate me to continue maintaining and adding futures About | Technologies | Requirements | Starti

kmoz000 3 Nov 29, 2022
(lifetime) GATs in stable Rust

::nougat Use (lifetime-)GATs on stable rust. Example #![forbid(unsafe_code)] #[macro_use] extern crate nougat; #[gat] trait LendingIterator { ty

Daniel Henry-Mantilla 45 Dec 26, 2022
A traditional web forum built in Rust with modern technology to be fast, secure, scalable, and stable.

Volksforo A traditional web forum built in Rust with modern technology to be fast, secure, scalable, and stable. Stack Rust actix-web askama ScyllaDB

Josh 5 Mar 21, 2023
Rust libraries for Bluesky's AT Protocol services. NOT STABLE (yet)

ATrium ATrium is a collection of Rust libraries designed to work with the AT Protocol, providing a versatile and coherent ecosystem for developers. Th

Yoshihiro Sugi 43 Jun 25, 2023
Toggle parallelism with feature flags!

maybe_parallel_iterator Write your code once. Then toggle between sequential and parallel iterators with a feature flag! let a: Vec<i32> = (0..100).co

Finn Bear 2 May 30, 2022