A lightweight Datalog engine in Rust

Overview

datafrog

Datafrog is a lightweight Datalog engine intended to be embedded in other Rust programs.

Datafrog has no runtime, and relies on you to build and repeatedly apply the update rules. It tries to help you do this correctly. As an example, here is how you might write a reachability query using Datafrog (minus the part where we populate the nodes and edges initial relations).

extern crate datafrog;
use datafrog::Iteration;

fn main() {
    // Prepare initial values, ..
    let nodes: Vec<(u32,u32)> = vec![
        // ..
    ];
    let edges: Vec<(u32,u32)> = vec![
        // ..
    ];

    // Create a new iteration context, ..
    let mut iteration = Iteration::new();

    // .. some variables, ..
    let nodes_var = iteration.variable::<(u32,u32)>("nodes");
    let edges_var = iteration.variable::<(u32,u32)>("edges");

    // .. load them with some initial values, ..
    nodes_var.insert(nodes.into());
    edges_var.insert(edges.into());

    // .. and then start iterating rules!
    while iteration.changed() {
        // nodes(a,c)  <-  nodes(a,b), edges(b,c)
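        // `from_join` matches on the first element of each (key, value) tuple,
        // so a fact nodes(a,b) is stored as (b,a), keyed by b; the derived fact
        // nodes(a,c) is likewise produced as (c,a).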
        nodes_var.from_join(&nodes_var, &edges_var, |_b, &a, &c| (c,a));
    }

    // extract the final results.
    let reachable: Vec<(u32,u32)> = nodes_var.complete();
}

If you'd like to read more about how it works, check out this blog post.

Authorship

Datafrog was initially developed by Frank McSherry and was later transferred to the rust-lang-nursery organization. Thanks Frank!

Comments
  • Proposal: extend API to support more operations on relations [breaking change]

    While doing some work using datafrog, I got annoyed about the need to create intermediate variables and things instead of being able to work directly with relations. I also noticed that Relation::from, which currently takes an iterator and collects to an intermediate vector, introduces a certain amount of inefficiency. (As an example, from_leapjoin seems to be making an extra vector on each iteration.)

    This branch aims to correct both things:

    • It makes Relation::from equivalent to Relation::from_vec instead of Relation::from_iter
      • it also implements FromIterator for Relation, so you can do let foo: Relation = iter.collect()
      • (this is the breaking change)
    • It extends Relation with a number of methods, like Relation::from_join, Relation::from_antijoin, and Relation::from_leapjoin
    • It extends Variable with an extend method, equivalent to .insert(Relation::from_iter(...))
    • It extends Variable::from_join with the ability to take one argument that is a relation
      • the tuples in the relation are treated as "stable" tuples
      • if you have two relations to join, you can use Relation::from_join instead

    One side effect of this change is that you can often pull computation out from the "while loop" of the iteration and instead just construct Relation tuples directly. It seems to me that this must be more efficient, but @frankmcsherry I'd be curious if you think I am wrong.
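
    For illustration, here is roughly what the extended API would allow (a sketch based on my reading of the branch; exact signatures may differ):

    use datafrog::Relation;

    fn main() {
        // Build Relations directly from iterators (FromIterator is part of this proposal) ...
        let edges: Relation<(u32, u32)> = (0..3u32).map(|i| (i, i + 1)).collect();
        let nodes: Relation<(u32, u32)> = edges.iter().map(|&(a, b)| (b, a)).collect();

        // ... and join two Relations directly, outside any Iteration, when no
        // fixed point is needed.
        let one_hop: Relation<(u32, u32)> = Relation::from_join(&nodes, &edges, |_b, &a, &c| (c, a));
        assert_eq!(one_hop.len(), 2);
    }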

    cc @lqd, @frankmcsherry

    opened by nikomatsakis 19
  • Allow `Relation<Key>` to join with `Variable<(Key, Value)>` directly

    Polonius has to create new intermediate variables with types like Variable<(Key, ())> from Relation<Key> so that they can be joined with variables like Variable<(Key, Value)>. This happens once in the naive variant (see below) and twice in the optimized one. This isn't actually necessary, since we can legally transmute a Relation<Key> to a Relation<(Key, ())> according to the unsafe-code guidelines. This saves a bit of memory and some time copying tuples around, although these relations aren't very large on any of the inputs bundled with Polonius. Mostly, it makes the code clearer by removing the variables used solely for re-indexing.
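
    For illustration, the kind of re-indexing boilerplate this removes looks roughly like the following (a sketch with made-up names and u32 keys, not actual Polonius code):

    use datafrog::{Iteration, Relation};

    fn main() {
        let mut iteration = Iteration::new();

        // A Relation<Key> (here Key = u32) that we only want to use inside a join.
        let live_origins: Relation<u32> = vec![1u32, 2, 3].into();

        // Today's workaround: re-index it as a Variable<(Key, ())> ...
        let live_origins_var = iteration.variable::<(u32, ())>("live_origins");
        live_origins_var.insert(live_origins.iter().map(|&o| (o, ())).collect::<Vec<_>>().into());

        // ... purely so it can be joined with a Variable<(Key, Value)>.
        let origin_contains_loan = iteration.variable::<(u32, u32)>("origin_contains_loan");
        let errors = iteration.variable::<(u32, u32)>("errors");
        origin_contains_loan.insert(vec![(1u32, 10u32), (4, 40)].into());

        while iteration.changed() {
            errors.from_join(&live_origins_var, &origin_contains_loan, |&origin, _unit, &loan| {
                (origin, loan)
            });
        }
        assert_eq!(errors.complete().len(), 1); // only origin 1 is both live and loan-bearing
    }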

    ~~To do this safely, I had to change how Relation implements JoinInput. This actually fixes a bug in Relation::from_antijoin (which we don't use), but it's not obviously correct. See the second commit for details.~~

    I changed the signature of JoinInput so its semantics could remain the same. Relation::from_antijoin is still broken, however.

    opened by ecstatic-morse 6
  • Improve example code provided in 'README' file

    The example code provided in the readme calls variable.complete() where it should be calling nodes_var.complete().

    It could also make it more obvious what kind of types nodes and edges are supposed to be and where they are supposed to come from.

    opened by regexident 6
  • Datafrog does not support early termination

    It seems that the following assertions ensure that datafrog will not give partial results.

    https://github.com/rust-lang/datafrog/blob/5455139ef26ee36acf229eb8abf45b1dfa453774/src/variable.rs#L307

    This is often useful for completeness, but unproductive in real-world situations where there are often unbounded numbers of solutions to problems and 'good enough' is fine (e.g. estimation, path finding, etc.).

    Are patches accepted here? I'd like to add an incomplete method which would just be the 'complete' method without the assertions, and then replace the body of complete with just the assertions and the result of incomplete.
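
    For illustration, the proposed split might look roughly like this (a sketch against src/variable.rs with simplified internals, not an actual patch):

    pub fn complete(self) -> Relation<Tuple> {
        assert!(self.recent.borrow().is_empty());
        assert!(self.to_add.borrow().is_empty());
        self.incomplete()
    }

    /// Like `complete`, but without asserting that iteration has finished,
    /// so it may return only the tuples derived so far.
    pub fn incomplete(self) -> Relation<Tuple> {
        // Merge whatever stable batches exist into a single Relation.
        let mut result: Relation<Tuple> = Vec::new().into();
        while let Some(batch) = self.stable.borrow_mut().pop() {
            result = result.merge(batch);
        }
        result
    }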

    opened by Cypher1 5
  • Add a helper function for "passthrough" leapers

    See https://github.com/rust-lang/polonius/pull/178#issuecomment-901315380 for background. This is a simple way to combine rules 7 and 8 of the naive analysis. I haven't audited the other rulesets to see where it could be used.

    Frankly, this seems a little too cute to me. The fact that you need this esoteric thing to avoid a runtime panic is not good API design. At the very least, we should try to make the "no proposers" case panic at compile-time, but that's easier said than done.

    r? @lqd

    opened by ecstatic-morse 5
  • Unify owned and ref values [WIP]

    This PR attempts to unify treefrog implementations to work both with owned and reference value types.

    Previously, the signatures of the Leaper trait involved Leaper<'a, Tuple, Val: 'a> and spoke of vectors of &'a Val elements. These references are potentially unnecessarily complicated if Val is simple and supports clone, so we change this to be

    /// Methods to support treefrog leapjoin.
    pub trait Leaper<Tuple, Val> {
        /// Estimates the number of proposed values.
        fn count(&mut self, prefix: &Tuple) -> usize;
        /// Populates `values` with proposed values.
        fn propose(&mut self, prefix: &Tuple, values: &mut Vec<Val>);
        /// Restricts `values` to proposed values.
        fn intersect(&mut self, prefix: &Tuple, values: &mut Vec<Val>);
    }
    

    where the Val type can be arbitrary. Now, we have multiple implementations of Leaper where appropriate, both

        impl<'a, Key, Val, Tuple, Func> Leaper<Tuple, &'a Val> for ...
        impl<'a, Key, Val, Tuple, Func> Leaper<Tuple, Val> for ... 
    

    The FilterWith and FilterAnti variants had no constraint on their Val type, and so we just removed the requirement that it be a reference and got a more general implementation (yay). Neither looks at the value field; each makes its decision based only on the key parts of the record.

    The potential fall-out is that there are now multiple implementations of Leaper, and it may be less obvious which is intended when you casually try to leapjoin_into.

    The from_leapjoin method on Variable now has the signature

        pub fn from_leapjoin<SourceTuple: Ord, Val: Ord>(
            &self,
            source: &Variable<SourceTuple>,
            leapers: &mut [&mut dyn Leaper<SourceTuple, Val>],
            logic: impl FnMut(&SourceTuple, Val) -> Tuple,
        )
    

    which allows Val to be either &'a Val or an owned Val as appropriate; it is an open question whether this will allow anyone to actually use this elegantly.

    opened by frankmcsherry 5
  • Specialize creating Relations from Vecs

    While this makes 🐸 rely on an unstable feature, it can help avoid consuming Vecs and collecting them immediately afterwards.

    For example, it should prevent 81k such cases in Polonius' clap benchmark, for the datafrogopt variant. (It's not a huge time saving in this case, but still.)

    Thoughts? (Especially if we'd want to add a rust-toolchain file, or still support stable by having different functions for IntoIterator and Vec.)
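
    For context, a rough sketch of the kind of specialization this relies on (a hypothetical helper trait, not the actual patch; requires the unstable specialization feature):

    #![feature(specialization)]

    /// Hypothetical helper: produce the sorted, deduplicated Vec backing a
    /// Relation, reusing the allocation when the input is already a Vec.
    trait IntoSortedVec<T: Ord> {
        fn into_sorted_vec(self) -> Vec<T>;
    }

    impl<T: Ord, I: IntoIterator<Item = T>> IntoSortedVec<T> for I {
        default fn into_sorted_vec(self) -> Vec<T> {
            // Generic path: collect into a fresh Vec first.
            let mut v: Vec<T> = self.into_iter().collect();
            v.sort();
            v.dedup();
            v
        }
    }

    impl<T: Ord> IntoSortedVec<T> for Vec<T> {
        fn into_sorted_vec(mut self) -> Vec<T> {
            // Specialized path: sort in place, no extra collect.
            self.sort();
            self.dedup();
            self
        }
    }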

    opened by lqd 5
  • Micro-optimize `binary_search`

    We spend about 30% of cycles doing binary search in ExtendWith::count while running clap-rs/app-parser-{{impl}}-add_defaults. This indicates to me that we need to explore different storage for the cfg_edge relation, but a maximally optimal binary_search is beneficial regardless.

    This PR switches to get_unchecked to eliminate a bounds check that the optimizer cannot remove on its own. This is also done in the standard library implementation.

    It also takes advantage of a lesser-known invariant of Vec, that its capacity cannot exceed isize::MAX bytes, to compute the midpoint using fewer instructions (there's a fun article from the early internet about this).
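
    For illustration, both tricks together look roughly like this (a sketch, not the actual datafrog implementation):

    /// Index of the first element >= `target` in a sorted slice.
    fn lower_bound<T: Ord>(slice: &[T], target: &T) -> usize {
        let (mut lo, mut hi) = (0, slice.len());
        while lo < hi {
            // No overflow: Vec caps its allocation at isize::MAX bytes, so for
            // non-zero-sized elements `hi <= isize::MAX as usize`.
            let mid = (lo + hi) / 2;
            // SAFETY: `lo <= mid < hi <= slice.len()`, so `mid` is in bounds;
            // this skips the bounds check the optimizer could not remove.
            if unsafe { slice.get_unchecked(mid) } < target {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        lo
    }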

    As a result, the aforementioned test case on a Ryzen 4700U laptop goes from this:

     Performance counter stats for 'target/release/polonius -a DatafrogOpt inputs/clap-rs/app-parser-{{impl}}-add_defaults':
    
       691,011,608,622      instructions:u                                              
    
          99.967872602 seconds time elapsed
    
          96.724862000 seconds user
           3.172634000 seconds sys
    

    to this:

     Performance counter stats for 'target/release/polonius -a DatafrogOpt inputs/clap-rs/app-parser-{{impl}}-add_defaults':
    
       623,280,710,902      instructions:u                                              
    
          92.983953355 seconds time elapsed
    
          89.483132000 seconds user
           3.404928000 seconds sys
    

    Wall time is highly variable and may differ on your platform.

    cc @shepmaster (since they seem to be interested in this sort of thing)

    opened by ecstatic-morse 4
  • Add Variable statistics

    This adds simple profiling statistics in the spirit of #5, but inspired by Soufflé's profiler: adding the statistics per identifiable round.

    For each round, there are the stable and recent tuple counts for each variable, as CSV, output to an io::Write of the user's choice.

    We can talk about what precise data (or format) we'd like to see here. In the future, we can also add a self-profiling option to tally up the time each join operation took to create the tuples.

    Here's how it looks right now
    Variable,Round,Stable count,Recent count
    "subset",1,0,10
    "requires",1,0,3
    "potential_errors",1,0,0
    "subset",2,10,0
    "requires",2,3,3
    "potential_errors",2,0,0
    "subset",3,10,0
    "requires",3,6,3
    "potential_errors",3,0,0
    
    opened by lqd 4
  • Execution Result is different from racket datalog engine

    datalog code

    #lang datalog
    edge(0, 1). edge(1, 2). edge(2, 3).
    path(X, Y) :- edge(X, Y).
    path(X, Y) :- edge(X, Z), path(Z, Y).
    
    

    When I execute the query path(A, B)?, I get

    path(2, 3).
    path(1, 2).
    path(0, 1).
    path(0, 2).
    path(0, 3).
    path(1, 3).
    

    rust code

    use datafrog::Iteration;
    
    fn main() {
        // Prepare initial values, ..
        let path: Vec<(u32, u32)> = vec![
            (0, 1),
            (1, 2),
            (2, 3), // ..
        ];
        let edges: Vec<(u32, u32)> = vec![
            (0, 1),
            (1, 2),
            (2, 3), // ..
        ];
    
        // Create a new iteration context, ..
        let mut iteration = Iteration::new();
    
        // .. some variables, ..
        let path_var = iteration.variable::<(u32, u32)>("path");
        let edges_var = iteration.variable::<(u32, u32)>("edges");
    
        // .. load them with some initial values, ..
        path_var.insert(path.into());
        edges_var.insert(edges.into());
    
        // .. and then start iterating rules!
        while iteration.changed() {
            // b: k, a: v1, c: v2 
            // path(a,c)  <-  edges(a,b), path(b,c)
            path_var.from_join(&edges_var, &path_var, |_b, &a, &c| (a, c));
        }
    
        // extract the final results.
        let reachable = path_var.complete();
        for (a, b) in reachable.iter() {
            println!("({}, {})", a, b);
        }
    }
    
    

    I get

    (0, 1)
    (1, 1)
    (1, 2)
    (2, 1)
    (2, 2)
    (2, 3)
    (3, 1)
    (3, 2)
    (3, 3)
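
    A likely explanation (my reading; not confirmed in this thread): from_join matches on the first element of each tuple, so the call above joins edges and path on their shared source node rather than on the middle node b. To implement path(a,c) <- edges(a,b), path(b,c), the edges variable needs to be keyed by b, e.g.:

    // Store each edge (a, b) reversed as (b, a), so the join key is b.
    edges_var.insert(edges.iter().map(|&(a, b)| (b, a)).collect::<Vec<_>>().into());

    while iteration.changed() {
        // key = b, value1 = a (from the reversed edges), value2 = c (from path).
        path_var.from_join(&edges_var, &path_var, |_b, &a, &c| (a, c));
    }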
    
    opened by IWANABETHATGUY 3
  • Split-up lib.rs

    This PR splits up the lib.rs file, introducing separate files for:

    • Relation<T> -> relation.rs
    • Iteration -> iteration.rs
    • Variable<T> -> variable.rs

    Splitting up the single 555-line file required making a select few previously private bindings pub(crate) instead. However, I think this greatly improves the accessibility of the project overall and is worth it.


    This is part #1 of a stacked pull-request:

    └── Split-up lib.rs 👈🏻
        └── Cleanup generics (#33)
            └── Support arbitrary tuples (#34)
    
    opened by regexident 3
  • Support joins on prefixes of arbitrary length

    Currently, datafrog requires that all joins occur on tuples of the form (K, V). This becomes unpleasant for relations with more than two elements, as well as for joins on several elements. I found this limitation annoying when first trying to write datafrog programs.

    Because datafrog is implemented on top of sorted data structures, we should be able to join on arbitrarily many tuple elements, so long as they are in order at the front of the tuple. This PR implements joins on arbitrary prefixes. As a result, you can specify all your variables and relations as a flat tuple, and wait to select what prefix you want for each individual join. For example, (A, B, C) could be joined with both (A, B, X) using (A, B) as the shared prefix and with (A, Y) using A as the shared prefix without cloning it.

    Unfortunately, this is a breaking change, since we must copy prefixes and suffixes now instead of passing references to K and V.

    opened by ecstatic-morse 0
  • Disjunctive (OR) *filters*

    @lqd opened rust-lang/polonius#157 a while ago, ~~which solves the Location::All problem in what I think is the "correct" way~~. Essentially, it transforms all occurrences of origin_live_at(O, P) in rule bodies into (origin_live_at(O, P) OR placeholder(O)). In other words, placeholder regions are live at all points in the CFG.

    Unfortunately it's actually kind of difficult to express that simple idea as a datafrog program with the current API. The PR manually expanded each rule body into two (one for each side of the OR), but this leads to code that is really difficult to maintain. Another option would be to create an intermediate Relation, origin_live_at_or_placeholder(O, P) defined below, to hold the disjunction. That would waste memory, however, and we would also need a new is_point relation that holds all possible points in the CFG.

    origin_live_at_or_placeholder(O, P) :- origin_live_at(O, P).
    origin_live_at_or_placeholder(O, P) :- placeholder(O), is_point(P).
    

    Ideally, we would express this disjunction directly as a leaper. This is possible, but is more work than you might expect. An impl that's generic over Leapers won't work, since (among other things) there's no way to implement a disjunctive intersect efficiently with the current Leaper interface. I think you could do something like the following, but you'd need to handle heterogeneous Value types as well:

    struct LeaperOr<A, B>(A, B);
    
    /* This is the combination of concrete leapers we need, though ideally all combinations would be implemented.*/
    impl<...> Leaper<...> for LeaperOr<ExtendWith<...>, FilterWith<...>> {
        /* ... */
    }
    

    Obviously, this doesn't scale, but maybe it's okay to implement just what we need for rust-lang/polonius#157? The alternative is to adjust the Leaper interface so that it composes better, but I don't see a straightforward way to do this without resorting to boxed trait objects, which is not an option since they would be dispatched in a tight loop (GATs + RPIT in trait fns would solve this, however).

    opened by ecstatic-morse 7
  • Clean up generics for a more consistent naming scheme

    Currently the project uses an inconsistent naming scheme for generic arguments, which this PR aims to unify:

    • K, Key -> Key
    • V1, Val1 -> Value1
    • V2, Val2 -> Value2
    • T1 -> Tuple1
    • T2 -> Tuple2
    opened by regexident 3
  •  bring this crate into conformance with compiler-team crate policy

    The compiler team is working on a standard policy around its external crates (see https://github.com/rust-lang/compiler-team/pull/19) -- we need to bring this crate into conformance, though the policy is not yet finalized, so it's not 100% clear what this means.

    opened by nikomatsakis 1
  • Unit tests

    We don't presently have any unit tests...for anything. I think we should create a "base layer" of unit tests where we generate some simple operations and some random inputs and check that they get the correct result, using some kind of naive computation to act as an oracle.

    Roughly as described here.
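
    For illustration, such an oracle-based test might look roughly like this (a sketch using the public API from the README; the naive loop is the oracle):

    use datafrog::Iteration;

    /// Oracle: naive transitive closure by repeated scanning.
    fn naive_closure(edges: &[(u32, u32)]) -> Vec<(u32, u32)> {
        let mut result: Vec<(u32, u32)> = edges.to_vec();
        loop {
            let mut added = Vec::new();
            for &(a, b) in &result {
                for &(c, d) in edges {
                    if b == c && !result.contains(&(a, d)) && !added.contains(&(a, d)) {
                        added.push((a, d));
                    }
                }
            }
            if added.is_empty() {
                break;
            }
            result.extend(added);
        }
        result.sort();
        result.dedup();
        result
    }

    #[test]
    fn join_matches_naive_closure() {
        let edges = vec![(0u32, 1u32), (1, 2), (2, 3), (1, 3)];

        let mut iteration = Iteration::new();
        let path = iteration.variable::<(u32, u32)>("path");
        // Key the edges by their destination so path(b,c) can be extended by edges(a,b).
        let edges_by_dst = iteration.variable::<(u32, u32)>("edges_by_dst");
        path.insert(edges.clone().into());
        edges_by_dst.insert(edges.iter().map(|&(a, b)| (b, a)).collect::<Vec<_>>().into());

        while iteration.changed() {
            // path(a,c) <- edges(a,b), path(b,c), joined on b.
            path.from_join(&edges_by_dst, &path, |_b, &a, &c| (a, c));
        }

        let datafrog_result: Vec<(u32, u32)> = path.complete().iter().cloned().collect();
        assert_eq!(datafrog_result, naive_closure(&edges));
    }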

    opened by nikomatsakis 0
  • Add simple profiling statistics

    It could be interesting to have simple statistics, for example behind a feature flag, of the number of tuples and the time it took to merge and create them, for Relations and/or Variables.

    There are commented-out Drop impls in the code, e.g. here, as an example of the way to add the final tuple-count statistic.

    Similarly, a Duration could be added to the merged relations, updating it in the operator functions, to display the time it took to create those tuples, as described a bit more here by Frank.
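
    For illustration, such a Drop impl might look roughly like this (a sketch only; the field names follow Variable, but the actual commented-out code in the crate may differ):

    impl<Tuple: Ord> Drop for Variable<Tuple> {
        fn drop(&mut self) {
            // Tally the tuples accumulated in the stable batches and report them.
            let stable_count: usize = self.stable.borrow().iter().map(|batch| batch.len()).sum();
            eprintln!("variable {:?}: {} stable tuples at drop", self.name, stable_count);
        }
    }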

    opened by lqd 0
Owner
The Rust Programming Language
Heavy - an opinionated, efficient, relatively lightweight, and tightly Lua-integrated game framework for Rust

Heavy - an opinionated, efficient, relatively lightweight, and tightly Lua-integrated game framework for Rust Slow down, upon the teeth of Orange Heav

Shea Leffler 12 Mar 18, 2022
A lightweight, cross-platform epub reader.

Pend Pend is a program for reading EPUB files. Check out the web demo! Preview Image(s) Installation Building Pend is simple & easy. You should be abl

bx100 11 Oct 17, 2022
A lightweight job framework for Bevy.

bevy_jobs A lightweight job framework for Bevy. Getting started Defining a job: pub struct FetchRequestJob { pub url: String, } impl bevy_jobs::J

Corey Farwell 3 Aug 31, 2022
Ktx is a lightweight terminal-ui utility for editing Kubernetes config

ktx Ktx is a lightweight terminal-ui utility for editing the kubectl config If you work with a large infrastructure where you have to jump between clu

Maxim 5 Aug 23, 2023
A lightweight standard for non-fungible assets.

Nifty Asset A lightweight standard for non-fungible assets. Contents Features Overview Packages Building Testing Programs Clients CLI License Warning

null 14 Mar 6, 2024
A refreshingly simple data-driven game engine built in Rust

What is Bevy? Bevy is a refreshingly simple data-driven game engine built in Rust. It is free and open-source forever! WARNING Bevy is still in the ve

Bevy Engine 21.1k Jan 4, 2023
RTS game/engine in Rust and WebGPU

What is this? A real time strategy game/engine written with Rust and WebGPU. Eventually it will be able to run in a web browser thanks to WebGPU. This

Thomas SIMON 258 Dec 25, 2022
unrust - A pure rust based (webgl 2.0 / native) game engine

unrust A pure rust based (webgl 2.0 / native) game engine Current Version : 0.1.1 This project is under heavily development, all api are very unstable

null 368 Jan 3, 2023
A no-frills Tetris implementation written in Rust with the Piston game engine, and Rodio for music.

rustris A no-frills Tetris implementation written in Rust with the Piston game engine, and Rodio for music. (C) 2020 Ben Cantrick. This code is distri

Ben Cantrick 17 Aug 18, 2022
Rustcraft is a simple Minecraft engine written in rust using wgpu.

Rustcraft is a simple Minecraft engine written in rust using wgpu.

Raphael Van Hoffelen 110 Dec 22, 2022
Rust implementation of Another World (aka Out of this world) game engine

RustyWorld Rust implementation of Another World (aka Out of this world) game engine. I wanted a fun project to challenge myself while learning Rust, a

Francesco Degrassi 3 Jul 1, 2021
Simple RUST game with the Bevy Engine

Simple RUST Game using the Bevy Engine YouTube videos for this code base: Episode 1 - Rust Game Development tutorial from Scratch with Bevy Engine Epi

null 150 Jan 7, 2023
Rust Game engine 3D (and 2D)

A feature-rich, production-ready, general purpose 2D/3D game engine written in Rust with a scene editor.

rg3d engine 5.4k Jan 9, 2023
Walleye is a chess engine written completely in rust.

Walleye is a UCI-compatible engine written using the classical alpha-beta style AI. It supports loading board positions from arbitrary FEN strings, Unicode pretty printing to the console, and UCI communication logs to help with debugging.

Mitchel Paulin 95 Dec 24, 2022
A feature-rich, production-ready, general purpose 2D/3D game engine written in Rust with a scene editor.

A feature-rich, production-ready, general purpose 2D/3D game engine written in Rust with a scene editor.

rg3d engine 5.4k Jan 4, 2023
A highly customizable snake clone made in Rust with the Bevy engine, named after the Japanese word for snake, 蛇.

Hebi ?? A highly customizable snake clone made in Rust with the Bevy engine, named after the Japanese word for snake, 蛇(へび). Configuration One of the

Elnu 79 Dec 7, 2022
2-player game made with Rust and "ggez" engine, based on "Conway's Game of Life"

fight-for-your-life A 2-player game based on the "Conway's Game of Life", made with Rust and the game engine "ggez". Create shapes on the grid that wi

Petros 3 Oct 25, 2021
A Simple Rust Game Engine For 2D

VeeBee VeeBee Is A Nice And Simple Game Engine For 2D. Features Sprites & Images! Btw There Is A Built In 2D Pack (But The Textures Are REALY BAD, Sor

null 2 Feb 2, 2022
Blackmarlin is a chess engine fully written in Rust.

Blackmarlin WIP UCI Chess Engine Blackmarlin is a chess engine fully written in Rust. Make sure to compile the chess engine with cargo build --release

null 50 Oct 31, 2022