A generic framework for on-demand, incrementalized computation. Inspired by adapton, glimmer, and rustc's query system.

Related tags

Computation salsa
Overview

salsa

Test Book Released API docs Crates.io

A generic framework for on-demand, incrementalized computation.

Obligatory warning

Very much a WORK IN PROGRESS at this point. Ready for experimental use but expect frequent breaking changes.

Credits

This system is heavily inspired by adapton, glimmer, and rustc's query system. So credit goes to Eduard-Mihai Burtescu, Matthew Hammer, Yehuda Katz, and Michael Woerister.

Key idea

The key idea of salsa is that you define your program as a set of queries. Every query is used like function K -> V that maps from some key of type K to a value of type V. Queries come in two basic varieties:

  • Inputs: the base inputs to your system. You can change these whenever you like.
  • Functions: pure functions (no side effects) that transform your inputs into other values. The results of queries is memoized to avoid recomputing them a lot. When you make changes to the inputs, we'll figure out (fairly intelligently) when we can re-use these memoized values and when we have to recompute them.

Want to learn more?

To learn more about Salsa, try one of the following:

Getting in touch

The bulk of the discussion happens in the issues and pull requests, but we have a zulip chat as well.

Issues
  • [POC] Prototype async query functions

    [POC] Prototype async query functions

    Wanted to check how much code would need to be added/duplicated to support async queries. This isn't a a full implementation (I haven't even tested to call async functions) but the normal sync implementation still work with this with what shouldn't be too much overhead.

    Not really interested in this until async/await gets to stable but figured a PR with this could still be useful.

    opened by Marwes 20
  • Include user-readable query keys in cycle errors

    Include user-readable query keys in cycle errors

    Previous output:

    ---- nameres::tests::macros::unexpanded_macro_should_expand_by_fixedpoint_loop stdout ----
    thread 'nameres::tests::macros::unexpanded_macro_should_expand_by_fixedpoint_loop' panicked at 'Internal error, cycle detected:
    
    DatabaseKeyIndex { group_index: 2, query_index: 11, key_index: 1 }
    ', /home/jonas/.cargo/registry/src/github.com-1ecc6299db9ec823/salsa-0.16.1/src/lib.rs:490:48
    

    Output after this PR:

    ---- nameres::tests::macros::unexpanded_macro_should_expand_by_fixedpoint_loop stdout ----
    thread 'nameres::tests::macros::unexpanded_macro_should_expand_by_fixedpoint_loop' panicked at 'Internal error, cycle detected:
    
    hygiene_frame(HirFileId(MacroFile(MacroFile { macro_call_id: LazyMacro(LazyMacroId(0)) })))
    ', /home/jonas/dev/salsa/src/lib.rs:492:35
    
    opened by jonas-schievink 17
  • [RFC] Dynamic databases

    [RFC] Dynamic databases

    This PR introduces an RFC describing a shift to dyn-capable databases. The primary goal is to make it so that query-group code can be compiled in the query-group crate, rather than waiting to be monomorphized in the final database crate.

    The main user-visible effect of this proposal is that it requires all salsa query group traits to be dyn-safe.

    Rendered form of the RFC

    Current status

    Largely implemented.

    Pending work:

    • [x] Refactor slots and dependencies and try to minimize code generic over Q in the derived impl -- the current branch doesn't match the RFC in that it still creates slots that store the values.
    • [x] Update the salsa book
    • [x] Test effectiveness perhaps?

    Pending updates the RFC text itself:

    • [x] Describe the 'static limitation on databases and how it might be overcome
    • [x] I removed the db.query(Q) methods entirely in favor of Q.in_db(&db). This does not require an extension trait and also works better with dyn Db coercions.
    • [x] Describe how we are adding salsa::Database and HasQueryGroup<G> as an automatic supertrait to each query-group trait.
    opened by nikomatsakis 15
  • Allow lifetime annotations in `database_storage!` and `query_group!`

    Allow lifetime annotations in `database_storage!` and `query_group!`

    Extend the database_storage! macro such that it allows for lifetimes to be added to the generated struct. This is necessary if the database itself has lifetime parameters that need to be threaded through to the queries being invoked.

    The main goal is to allow arena allocation to be used in your database struct, and to pass those references around as query inputs and outputs.

    Not sure this is the best solution -- I'd love to hear your thoughts.

    opened by fabianschuiki 11
  • adopt raw-id for interned keys

    adopt raw-id for interned keys

    Per @matklad's suggestion, create a RawId type and use it in place of u32 for interned keys.

    opened by nikomatsakis 11
  • make `fork` give you a `Frozen<DB>`

    make `fork` give you a `Frozen`

    This is the first step towards the "fewer footguns" API.

    Opening for discussion.

    Some to do items:

    • [x] Move set to a query_mut method that requires &mut self (https://github.com/salsa-rs/salsa/issues/79)
    • [x] Add tests for the "recursive read lock" case; I think i understand well enough now to be able to trigger the deadlock deterministically (and show that we are avoiding it)
    opened by nikomatsakis 11
  • feat: Allow queries to avoid panics on cycles

    feat: Allow queries to avoid panics on cycles

    The default behavior when salsa detects a cycle is to panic. This is fine behavior when the cycle occurs as a programmer error but in some instances cycles can be expected to occur. To allow this case while still keeping "programmer error" as the default assumption on cycles a salsa::cycle is added which can be attached to any function which could naturally appear in a cycle.

    The salsa::cycle takes a function which takes the keys of the query as well as description of all the queries that were part of the cycle (currently a slice of Debug formatted strings for simplicity's sake). When a cycle is detected salsa invokes the cycle function to produce the result for the query instead of panicking. Furthermore, every other query in the cycle will discard whatever value they produced normally and invoke their own cycle function (if a function in the cycle lacks the attribute salsa will panic as usual). All values produced from a cycle function is memoized in the same way as if the query executed normally.

    This has been tested to work well in gluon's import system which needs to error if modules import each other in a cycle (a programmer error for the one writing gluon programs, but not in the salsa query usage in gluon) https://github.com/gluon-lang/gluon/pull/683 . The only nasty part is that the cycle information needs to be parsed out of the Debug formatted string but currently I don't have a good abstraction for solving that.

    cc #6

    opened by Marwes 11
  • Track cycles via `Runtime` instead of storage?

    Track cycles via `Runtime` instead of storage?

    Currently, we detect cycles by storing QueryState::InProgress inside of MemoizedStorage.

    I think this could be problematic for two reasons:

    • we won't detect cycles for transparent queries
    • if MemoizedStorage is shared by concurrently executing queries, we can get CycleDetected if two threads execute the same query simultaneously.

    I think that perhaps we need to move cycle tracking from storage to Runtime? The fix I have in mind is

    • remove QueryState and use map: RwLock<FxHashMap<Q::Key, Q::Value>> in the storage,
    • add in_progress: FxHashSet<QC::QueryDescriptor>, to Runtime, which stores the same info as execution_stack, but with an O(1) "contains" check.
    opened by matklad 10
  • Cancellation and parallel queries

    Cancellation and parallel queries

    This is a WIP branch that implements most of cancellation and parallel queries. The history is still a bit messy and I've not implemented cycle detection yet, but hopefully you get the basic idea.

    The branch introduces two key public APIs:

    • You can fork a salsa runtime -- and your database can implement a ParallelDatabase trait that offers a fork method. The intention is that the other forked database should be sent to another thread (not doing so can result in deadlock).
    • We now support concurrent queries and operations on inputs. If you execute set while some other thread is actively executing a query, you will block waiting for them to finish, but you will also signal to them that they should terminate by setting an internal flag.
      • Queries can then poll the is_current_revision_cancelled method. If they observe a true value, then they know that their results no longer matter, and they can go ahead and return whatever value they deem appropriate. Those results will be recomputed in the next revision.

    My intention is that these are the basic primitives and that atop them we would add convenient ways to do parallel execution and cancellation:

    • Cancellable queries, that automatically return a Cancallable::cancel value of some kind.
    • Ways to apply a query to many keys in parallel.
    • Futures to let a query execute concurrently and later demand its value.

    The last two will require figuring out some sort of thread-pool story -- I'd prefer if we don't bake one into salsa, which is why the current building blocks don't actually manage the threads themselves.

    opened by nikomatsakis 10
  • Implement

    Implement "opinionated cancellation"

    This implements the design described in RFC https://github.com/salsa-rs/salsa/pull/262.

    Currently, the in_par_get_set_cancellation test is failing. I'm not completely sure what it is trying to test (the comment on the test does not match its behavior), so I couldn't fix it.

    opened by jonas-schievink 10
  • Cleanup and fix cycle handling

    Cleanup and fix cycle handling

    This PR is...kind of long. It reshapes the core logic of salsa to fix various bugs in cycle handling and generally simplify how we handle cross-thread coordination.

    Best read commit by commit: every commit passes all tests, afaik.

    The core bug I was taking aim at was the fact that, when you invoke maybe_changed_since, you can sometimes wind up detecting a cycle without having pushed some of the relevant queries onto the stack. This is now fixed.

    From a user's POV, ~~nothing changes from this PR~~, there are only minimal changes to the public interface. The biggest one is that recover functions now get a &salsa::Cycle which has methods for accessing the participants; the other is that queries that are participating in cycle fallback will use unwinding to avoid executing past the point where the cycle is discovered. Otherwise, things work the same as before:

    • If you encounter a cycle and all participant queries are marked with #[salsa::recover], then they will take on the recovery value. (At the moment, they continue executing after the cycle is observed, but their final result is ignored; I plan to change this in a follow-up PR, or maybe some future commit to this PR.)
    • If you encounter a cycle and some or all participants are NOT marked with #[salsa::recover], then the code panics. This is treated like any other panic, cancelling all other work.

    Along the way, I made... a few... other changes:

    • Cross-thread handling is simplified. When we block on another thread, it no longer sends us a final result. Instead, it just gets re-awoken and then it retries the original request. This is helpful when you encounter cycles in maybe_changed_since vs read, but it's also more compatible with some of the directions I have in mind.
    • Cycle detection is simplified and more centrally coordinated. Previously, when a cycle was detected, we would mark all the participants on the current thread, but then we would mark other threads 'lazilly'. Now, threads move ownership of their stack into the shared dep graph when they block, so that we can mark all the stack frames at once. This also means less cloning on blocking, so it should be mildly more efficient.
    • The code is DRY-er, since maybe_changed_since has been re-implemented in terms of the same core building blocks as read (probe and friends). I originally tried to unify them, but I realized that they behave somewhat differently from one another and both of them make sense. (In particular, we want to be able to free values with the LRU cache while still checking if they are up to date.)

    Ah, I realize now that I had planned to write a bunch of docs in the salsa book before I landed this. Well, I'm going to open the PR anyway, as I've let this branch go far too long.

    r? @matklad

    opened by nikomatsakis 21
  • RFC: Parallel/Chalk friendly caching

    RFC: Parallel/Chalk friendly caching

    Motivation

    Two-fold:

    • To integrate chalk caching with salsa more deeply, we need to be able to handle cycles more gracefully.
      • In chalk, cycles are handled by iterating until a fixed-point is reached. This could be useful in other salsa applications as well.
      • For this to work, we need caching of provisional results that are produced during those iterations. These results should not be exposed outside the cycle.
      • In general, it would be simpler if we could move cycle handling to per-thread, although there are limits to how much we can do this.
    • We are moving towards a 'parallel by default' future for Salsa.
      • Eventual goal: Cache hit requires only reads (no locks).
      • Creates the option of queries where a cache hit requires no atomic writes.
      • Validation and execution require no locks and few atomic writes.
        • Although synchronization remains an option.
      • This RFC brings that within sight, although it is not achieved.

    Rendered form.

    Status

    Still a WIP! Feedback requested as I evolve the RFC and implementation.

    Implementation status

    • [x] Split global and local state
    • [ ] Rework global state to separate out the 'slot map', 'cached values', and 'synchronization state', as described in the notes
    opened by nikomatsakis 2
  • Expose the ability to update the value of an input query in place

    Expose the ability to update the value of an input query in place

    Related to, but separate from, #46 and #269.

    opened by 1tgr 4
  • Add `#[salsa::update]`

    Add `#[salsa::update]`

    Specifies a second function on a query that can see the previously cached value for the same query.

    Closes #46.

    Rendered RFC

    opened by 1tgr 9
  • Add failing test for cycle revalidation

    Add failing test for cycle revalidation

    The following test hits the unwrap in runtime.rs:415. I encountered this panic while working on rust-analyzer.

    As far as I understand, report_unexpected_cycle assumes the query we hit the cycle on has to be executing right now, but that's not the case here: it's just being revalidated (through validate_memoized_value). I'm not sure what the correct way to handle this is?

    opened by flodiebold 2
  • Access to query result /revision id/

    Access to query result /revision id/

    My use-case is an LSP server. I'd like to send diagnostics for particular documents, but only when diagnostics change. Say I have something like this:

    #[salsa::query_group(...)]
    trait Facts: salsa::Database {
      fn doc_diag(&self, doc_id: DocId) -> Arc<[DocDiag]>;
    }
    

    Ideally, I'd be able to access the revision let doc_diag_rev: RevisionId = db.doc_diag_revision(doc_id), compare it with the revision I reported previously and depending on this either send or not send the diagnostics to the client.

    Since salsa already track this kind of information, I thought I could just use it but I can't find the way to get access to these revisions.

    Am I missing something?

    opened by artempyanykh 3
  • API for defining query implementations

    API for defining query implementations

    I'm curious about the API for defining query implementations, which I find a little surprising and unergonomic.

    I'm thinking in particular about how one must define them as free functions that pollute namespace but that magically get wired into the (generated) storage implementations. Why not instead have users provide these in place of "default" implementations in their trait definitions?

    That is, instead of:

    #[salsa::query_group(HelloWorldStorage)]
    trait HelloWorld: salsa::Database {
        #[salsa::input]
        fn input_string(&self, key: ()) -> Arc<String>;
    
        fn length(&self, key: ()) -> usize;
    }
    
    fn length(db: &dyn HelloWorld, (): ()) -> usize {
        // Read the input string:
        let input_string = db.input_string(());
    
        // Return its length:
        input_string.len()
    }
    

    one would do:

    #[salsa::query_group(HelloWorldStorage)]
    trait HelloWorld: salsa::Database {
        #[salsa::input]
        fn input_string(&self, key: ()) -> Arc<String>;
    
        fn length(&self, key: ()) -> usize {
            // Read the input string:
            let input_string = self.input_string(());
    
            // Return its length:
            input_string.len()
        }
    }
    

    Furthermore, methods without any such default implementation could be assumed to be inputs (thus negating the need to explicitly mark them as such).

    opened by eggyal 0
  • panic on something

    panic on something

    I apologize for not knowing what caused this issue. Could be related to me opening or creating a file. I'm working on a repro

    Version: 1.51.1 (system setup) Commit: e5a624b788d92b8d34d1392e4c4d9789406efe8f Date: 2020-11-10T23:34:32.027Z Electron: 9.3.3 Chrome: 83.0.4103.122 Node.js: 12.14.1 V8: 8.3.110.13-electron.0 OS: Windows_NT x64 10.0.18363

    
    thread '<unnamed>' panicked at 'no value set for FileTextQuery(FileId(6478))', /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/salsa-0.16.0/src/input.rs:104:32
    stack backtrace:
       0: rust_begin_unwind
                 at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:483
       1: std::panicking::begin_panic_fmt
                 at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:437
       2: <salsa::input::InputStorage<Q> as salsa::plumbing::QueryStorageOps<Q>>::try_fetch::{{closure}}
       3: <salsa::input::InputStorage<Q> as salsa::plumbing::QueryStorageOps<Q>>::try_fetch
       4: <DB as base_db::SourceDatabaseExt>::file_text::__shim
       5: salsa::runtime::Runtime::execute_query_implementation
       6: salsa::derived::slot::Slot<Q,MP>::read_upgrade
       7: salsa::derived::slot::Slot<Q,MP>::read
       8: <salsa::derived::DerivedStorage<Q,MP> as salsa::plumbing::QueryStorageOps<Q>>::try_fetch
       9: <DB as ide_db::LineIndexDatabase>::line_index::__shim
      10: ide::Analysis::file_line_index
      11: rust_analyzer::handlers::publish_diagnostics
      12: core::ops::function::impls::<impl core::ops::function::FnMut<A> for &mut F>::call_mut
      13: <alloc::vec::Vec<T> as alloc::vec::SpecFromIter<T,I>>::from_iter
      14: <F as threadpool::FnBox>::call_box
    note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    
    
    opened by doivosevic 2
Extendable HPC-Framework for CUDA, OpenCL and common CPU

Collenchyma • Collenchyma is an extensible, pluggable, backend-agnostic framework for parallel, high-performance computations on CUDA, OpenCL and comm

Autumn 454 Nov 8, 2021
Statistical computation library for Rust

statrs Current Version: v0.13.0 Should work for both nightly and stable Rust. NOTE: While I will try to maintain backwards compatibility as much as po

Michael Ma 290 Nov 26, 2021
Extreme fast factor expression & computation library for quantitative trading in Python.

Extreme fast factor expression & computation library for quantitative trading in Python.

Weiyuan Wu 15 Sep 30, 2021
a prototype crate for creating modular and performant 3D CPU particle systems, inspired by Unity's Shuriken Particle System.

bevy_prototype_particles This is a prototype crate for creating modular and performant 3D CPU particle systems, inspired by Unity's Shuriken Particle

James Liu 12 Aug 10, 2021
A system clipboard command line tools which inspired by pbcopy & pbpaste but better to use.

rclip A command line tool which supports copy a file contents to the system clipboard or copy the contents of the system clipboard to a file. Install

yahaa 2 Nov 6, 2021
Generic abstractions for combining and nesting reduction patterns for iterables.

reductor Generic abstractions for combining and nesting reduction patterns for iterables. Docs: https//docs.rs/reductor Before & After: Before fn proc

Yotam Ofek 2 Sep 21, 2021
archive-rs provides a generic way of dealing with multiple archive and compression formats in Rust

archive-rs A Rust crate that aims to provide a generic way of dealing with multiple archive and compression formats by providing a generic abstraction

S.J.R. van Schaik 2 Nov 21, 2021
A generic connection pool for Rust

r2d2 A generic connection pool for Rust. Documentation Opening a new database connection every time one is needed is both inefficient and can lead to

Steven Fackler 977 Nov 26, 2021
Generic array types in Rust

generic-array This crate implements generic array types for Rust. Requires minumum Rust version of 1.36.0, or 1.41.0 for From<[T; N]> implementations

Bartłomiej Kamiński 282 Nov 17, 2021
Parameterized routing for generic resources in Rust

Usher Usher provides an easy way to construct parameterized routing trees in Rust. The nodes of these trees is naturally generic, allowing Usher to le

Isaac Whitfield 30 Sep 24, 2021
Websocket generic library for Bitwyre WS-API

Websocket Core (Rust) Websocket generic server library for: Periodic message broadcast Eventual (Pubsub) message broadcast Async request reply Authors

Bitwyre 11 Nov 12, 2021
Generic Automatic Differentiation library for Rust (aka "autograd")

GAD: Generic Automatic Differentiation for Rust This project aims to provide a general and extensible framework for tape-based automatic differentiati

Facebook Research 6 Nov 5, 2021
Generic tiling window manager library in Rust

Pop Tiler Generic tiling window manager library for Rust, using an architecture based on GhostCell. License Licensed under the GNU Lesser General Publ

Pop!_OS 56 Nov 21, 2021
Generic cellular automaton plugin for bevy.

Bevy Cellular Automaton bevy_life is a generic plugin for cellular automaton. From the classic 2D Conway's game of life to WireWorld and 3D rules, the

Félix Lescaudey de Maneville 6 Nov 20, 2021
Generic parser for competitive programming

This is a generic parser for competitive programming, it can be used to read structured data line-by-line or though derive macro in a higher level fashion.

Rémi Dupré 1 Nov 15, 2021
A Google-like web search engine that provides the user with the most relevant websites in accordance to his/her query, using crawled and indexed textual data and PageRank.

Mini Google Course project for the Architecture of Computer Systems course. Overview: Architecture: We are working on multiple components of the web c

Max 10 May 30, 2021
An expressjs inspired web framework for Rust

nickel.rs nickel.rs is a simple and lightweight foundation for web applications written in Rust. Its API is inspired by the popular express framework

null 2.9k Nov 26, 2021
Sauron is an html web framework for building web-apps. It is heavily inspired by elm.

sauron Guide Sauron is an web framework for creating fast and interactive client side web application, as well as server-side rendering for back-end w

Jovansonlee Cesar 1.4k Nov 23, 2021
A safe, extensible ORM and Query Builder for Rust

A safe, extensible ORM and Query Builder for Rust API Documentation: latest release – master branch Homepage Diesel gets rid of the boilerplate for da

Diesel 7.6k Nov 25, 2021
Apache Arrow DataFusion and Ballista query engines

DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

The Apache Software Foundation 1.4k Nov 24, 2021
Query LDAP and AD with SQL

SQLDAP Ever wanted to query AD or LDAP with SQL like queries ? I'm going to answer this question myself: yes ! Why ? Because I never could remember al

null 7 Oct 21, 2021
A program written in pure Rust to query music info from mpd and display it in a notification.

musinfo A program written in pure Rust to query music info from mpd and display it in a notification. Note: Cover art is expected to be placed at /tmp

Cpt.Howdy 11 Jul 19, 2021
A query builder that builds and typechecks queries at compile time

typed-qb: a compile-time typed "query builder" typed-qb is a compile-time, typed, query builder. The goal of this crate is to explore the gap between

ferrouille 3 Nov 28, 2021
Language Integrated Query in Rust.

Linq in Rust Language Integrated Query in Rust (created by declarative macros). Inspired by LINQ in .NET. What's LINQ This project is under developmen

StardustDL 67 Nov 19, 2021
Rust query string parser with nesting support

What is Queryst? This is a fork of the original, with serde and serde_json updated to 0.9 A query string parsing library for Rust inspired by https://

Stanislav Panferov 55 Oct 15, 2021
kubesql, Experimental tool to query K8s API using plain SQL

kubesql, an experimental tool for querying your Kubernetes API Server using simple and smallest SQL syntax.

Furkan Türkal 138 Nov 24, 2021
qsv - Performant CLI tool to query CSVs through SQL

qsv Performant CLI tool to query CSVs through SQL Installation After cloning the repository, you can install a binary locally using cargo install --pa

Dermot Haughey 3 Oct 28, 2021
Query textual streams with PromQL-like language

pq - query textual streams with PromQL Glossary Time Series - a stream of timestamped values, aka samples sharing the same metric name and, optionally

Ivan Velichko 223 Nov 30, 2021
Rust implementation of the legacy Master Server Query Protocol

msq-rs Rust library implementation of the legacy Master Server Query Protocol. Documentation crates.io Repository Release Notes Usage Add this to your

mtcw 5 Nov 26, 2021