An incremental programming language

Overview

License: MIT CI workflow pipeline status rustc Gitter chat

Differential Datalog (DDlog)

DDlog is a programming language for incremental computation. It is well suited for writing programs that continuously update their output in response to input changes. With DDlog, the programmer does not need to worry about writing incremental algorithms. Instead they specify the desired input-output mapping in a declarative manner, using a dialect of Datalog. The DDlog compiler then synthesizes an efficient incremental implementation. DDlog is based on Frank McSherry's excellent differential dataflow library.

DDlog has the following key properties:

  1. Relational: A DDlog program transforms a set of input relations (or tables) into a set of output relations. It is thus well suited for applications that operate on relational data, ranging from real-time analytics to cloud management systems and static program analysis tools.

  2. Dataflow-oriented: At runtime, a DDlog program accepts a stream of updates to input relations. Each update inserts, deletes, or modifies a subset of input records. DDlog responds to an input update by outputting an update to its output relations.

  3. Incremental: DDlog processes input updates by performing the minimum amount of work necessary to compute changes to output relations. This has significant performance benefits for many queries.

  4. Bottom-up: DDlog starts from a set of input facts and computes all possible derived facts by following user-defined rules, in a bottom-up fashion. In contrast, top-down engines are optimized to answer individual user queries without computing all possible facts ahead of time. For example, given a Datalog program that computes pairs of connected vertices in a graph, a bottom-up engine maintains the set of all such pairs. A top-down engine, on the other hand, is triggered by a user query to determine whether a pair of vertices is connected and handles the query by searching for a derivation chain back to ground facts. The bottom-up approach is preferable in applications where all derived facts must be computed ahead of time and in applications where the cost of initial computation is amortized across a large number of queries.

  5. In-memory: DDlog stores and processes data in memory. In a typical use case, a DDlog program is used in conjunction with a persistent database, with database records being fed to DDlog as ground facts and the derived facts computed by DDlog being written back to the database.

    At the moment, DDlog can only operate on databases that completely fit the memory of a single machine. We are working on a distributed version of DDlog that will be able to partition its state and computation across multiple machines.

  6. Typed: In its classical textbook form Datalog is more of a mathematical formalism than a practical tool for programmers. In particular, pure Datalog does not have concepts like types, arithmetics, strings or functions. To facilitate writing of safe, clear, and concise code, DDlog extends pure Datalog with:

    1. A powerful type system, including Booleans, unlimited precision integers, bitvectors, floating point numbers, strings, tuples, tagged unions, vectors, sets, and maps. All of these types can be stored in DDlog relations and manipulated by DDlog rules. Thus, with DDlog one can perform relational operations, such as joins, directly over structured data, without having to flatten it first (as is often done in SQL databases).

    2. Standard integer, bitvector, and floating point arithmetic.

    3. A simple procedural language that allows expressing many computations natively in DDlog without resorting to external functions.

    4. String operations, including string concatenation and interpolation.

    5. Syntactic sugar for writing imperative-style code using for/let/assignments.

  7. Integrated: while DDlog programs can be run interactively via a command line interface, its primary use case is to integrate with other applications that require deductive database functionality. A DDlog program is compiled into a Rust library that can be linked against a Rust, C/C++, Java, or Go program (bindings for other languages can be easily added). This enables good performance, but somewhat limits the flexibility, as changes to the relational schema or rules require re-compilation.

Documentation

Installation

Installing DDlog from a binary release

To install a precompiled version of DDlog, download the latest binary release, extract it from archive, add ddlog/bin to your $PATH, and set $DDLOG_HOME to point to the ddlog directory. You will also need to install the Rust toolchain (see instructions below).

If you're using OS X, you will need to override the binary's security settings through these instructions. Else, when first running the DDlog compiler (through calling ddlog), you will get the following warning dialog:

"ddlog" cannot be opened because the developer cannot be verified.
macOS cannot verify that this app is free from malware.

You are now ready to start coding in DDlog.

Compiling DDlog from sources

Installing dependencies manually

  • Haskell stack:
    wget -qO- https://get.haskellstack.org/ | sh
    
  • Rust toolchain v1.52.1 or later:
    curl https://sh.rustup.rs -sSf | sh
    . $HOME/.cargo/env
    rustup component add rustfmt
    rustup component add clippy
    
    Note: The rustup script adds path to Rust toolchain binaries (typically, $HOME/.cargo/bin) to ~/.profile, so that it becomes effective at the next login attempt. To configure your current shell run source $HOME/.cargo/env.
  • JDK, e.g.:
    apt install default-jdk
    
  • Google FlatBuffers library. Download and build FlatBuffers release 1.11.0 from github. Make sure that the flatc tool is in your $PATH. Additionally, make sure that FlatBuffers Java classes are in your $CLASSPATH:
    ./tools/install-flatbuf.sh
    cd flatbuffers
    export CLASSPATH=`pwd`"/java":$CLASSPATH
    export PATH=`pwd`:$PATH
    cd ..
    
  • Static versions of the following libraries: libpthread.a, libc.a, libm.a, librt.a, libutil.a, libdl.a, libgmp.a, and libstdc++.a can be installed from distro-specific packages. On Ubuntu:
    apt install libc6-dev libgmp-dev
    
    On Fedora:
    dnf install glibc-static gmp-static libstdc++-static
    

Building

To build the software once you've installed the dependencies using one of the above methods, clone this repository and set $DDLOG_HOME variable to point to the root of the repository. Run

stack build

anywhere inside the repository to build the DDlog compiler. To install DDlog binaries in Haskell stack's default binary directory:

stack install

To install to a different location:

stack install --local-bin-path 
   

   

To test basic DDlog functionality:

stack test --ta '-p path'

Note: this takes a few minutes

You are now ready to start coding in DDlog.

vim syntax highlighting

The easiest way to enable differential datalog syntax highlighting for .dl files in Vim is by creating a symlink from /tools/vim/syntax/dl.vim into ~/.vim/syntax/.

If you are using a plugin manager you may be able to directly consume the file from the upstream repository as well. In the case of Vundle, for example, configuration could look as follows:

call vundle#begin('~/.config/nvim/bundle')
...
Plugin 'vmware/differential-datalog', {'rtp': 'tools/vim'} <---- relevant line
...
call vundle#end()

Debugging with GHCi

To run the test suite with the GHCi debugger:

stack ghci --ghci-options -isrc --ghci-options -itest differential-datalog:differential-datalog-test

and type do main in the command prompt.

Building with profiling info enabled

stack clean

followed by

stack build --profile

or

stack test --profile
Comments
  • RFC: Streaming aggregates.

    RFC: Streaming aggregates.

    Motivation: we would like to aggregate streaming data without storing the entire monotonically growing collection. Examples:

    • Find the sum/min/average of all values in the stream for each key
    • Find the number of unique values in a given time window

    This functionality can be built on top of DD (see https://github.com/TimelyDataflow/differential-dataflow/issues/296). The idea is, given the previous value of the aggregate and the set of new elements at each timestamp, compute the new value of the aggregate and retract the new elements from the collection at the next iteration. This way the collection passed to the aggregation operator stays small and does not grow over time.

    The parameters to the new streaming aggregation operator are:

    • 0 or more variables to group the input relation by (similar to the group_by operator)
    • a closure that takes the previous value of the aggregate and a group of new values and computes the new aggregate.
    • initial value of the aggregate

    The syntax could be something like (we need a better operator name)

    <expr>.stream_aggregate((<group_by_vars>), |acc, group| <expr>, <init_expr>)
    

    e.g.,

    // Compute the sum of all x's for each y in the stream.
    R2(y, sum) :-
        R1(x: usize, y: string),
        var sum = x.stream_aggregate((y), |acc: usize, g: Group<usize>| {
            var new_acc = acc;
            for (v in g) {
                new_acc = new_acc+v
            };
            new_acc
        },
        0).
    

    There is a further complication for which I don't have a complete solution yet. Some aggregates are too expensive to update frequently, e.g., let's say we wanted to compute the set of all unique values of x for each y. The closure would then be:

    |acc: Set<usize>, g: Group<usize>| {
            // Set copy!
            var new_acc = acc;
            for (v in g) {
                new_acc.insert(v)
            };
            new_acc
        }).
    

    As the accumulated state grows, it gets expensive to copy the set each time. Note that an in-place update is not an option since the old accumulator value is an immutable reference to the contents of a collection. One solution is to use some kind of layered set representation consisting of a set of new values and a reference counted pointer to the base set. When the ref count drops to 1, the two are merged.

    An alternative (complementary?) approach is to not aggregate on each timestamp, e.g., we can tell DD to apply the closure every 100 timestamps, thus amortizing the cost of the expensive operation across many transactions. However this has usability issues, as aggregation frequency must be chosen in advance and the user will have a stale value of the aggregate for 100 iterations.

    EDIT: This RFC doesn't address sliding windows. We will likely need a separate primitive for them, but it should support fixed windows, including windows specified at runtime (e.g., the user can maintain the set of windows they care about as an input relation).

    EDIT the im crate supports a bunch of immutable data structures, which may help in the all-unique-values example.

    opened by ryzhyk 24
  • Concurrency improvements for workers

    Concurrency improvements for workers

    Depends on #900

    • Removed barriers from within the worker runtime, got a 20% speedup
    • Distribute update records even across workers, got a 15% speedup (compared to the barriers change, not master), scaling across threads now works much better

    This PR is best reviewed per-commit

    opened by Kixiron 21
  • DDlogRecord api: how to do updates?

    DDlogRecord api: how to do updates?

    I'm trying to implement SQL UPDATE queries in ddlog-jooq. Example query I want to support:

    update hosts set capacity = 11 where id = 'n1'

    ... where id is a primary key in the hosts table.

    What's the right way to pull this off using the DDlogRecord Java API? The only DDlogCommand.Kind types I see are Insert, DeleteKey and DeleteVal.

    If there is no way to issue an update, I suppose I could do a delete followed by an insert, but that would require me to also maintain an index by primary key on the java side. Right now, I only need to materialize a set of records that are part of each output relation.

    enhancement 
    opened by lalithsuresh 20
  •     rudimentary distributed datalog functionality -

    rudimentary distributed datalog functionality -

       spawn clones of the current program using thread instances
    
       allow ddlog instances to be chained together using @
    
       provide hook for rust functions to register to receive deltas for specific relations
    
       implement a broadcast bus to exhange cluster metadata facts
    
    dco-required 
    opened by convolvatron 19
  • differential_datalog refactors

    differential_datalog refactors

    The start of an effort to refactor the rust code used by ddlog, this PR fixes some lints, makes some documentation show up to rustdoc, splits some components of the former program.rs into separate files. Lots of work still to do (especially with Program::run and transitioning from strings-as-errors to an error enum), but I'm going to try and keep each PR small-ish and contained so they're easier to review

    This PR is best reviewed commit-by-commit

    opened by Kixiron 19
  • Make the command-line libraries optional

    Make the command-line libraries optional

    Currently the cmd_parser generated crate is included in every build of a generated project when they aren't needed for users who use the generated code solely as a library. The cmd_parser crate and the cli interface add a number of unnecessary dependencies for this kind of use case:

    • rustop
    • nom
    • num
    • rustyline

    Similarly the ovsdb-related libraries and generated crates, the generated main.rs and the generated C headers can all be toggled for users who just don't need them. ovsdb itself adds serde_json and uuid. All in all, this can significantly reduce compile times and the dependency load of users who just want a library.

    There's also more dependency culling that can be done, serde and abomination can be an optional features (like flatbuf) which removes the dependencies and the generated code which is a double-win, twox-hash doesn't seem to be used anywhere so it can probably be removed, num-traits could possibly be removed

    opened by Kixiron 17
  • Status of aggregations in Souffle converter

    Status of aggregations in Souffle converter

    We are investigating the performance of differential-datalog vs Souffle for Doop analyses. Aggregations cause an error in the Souffle converter. For example, this Doop Datalog line:

    _MethodLookup_ClosestInterface(?simplename, ?descriptor, ?type, ?method) :-
      _MethodLookup_MoreThanOne(?simplename, ?descriptor, ?type),
      ?minLen = min ?len : { _MethodLookup_WithLen(?simplename, ?descriptor, ?type, _, ?len) },
      _MethodLookup_WithLen(?simplename, ?descriptor, ?type, ?method, ?minLen),
      !_MethodLookup_ClassResolution(?simplename, ?descriptor, ?type, _).
    

    produces the following error when parsed via convert.py:

    parglare.exceptions.ParseError: Error at 1:60917:"Len = min *?len : { _" => Expected: ( but found <?(?)>
    

    In the tutorial, "Aggregation" is marked as "TODO", while in the language reference, it reads: "aggregate functions can only be defined in Rust and imported to a DDlog program as extern function."

    I understand at least syntax support is needed (convert.pg) but we are not sure how the Rust function plugs in the generated code for the example above.

    enhancement question 
    opened by gfour 17
  • Added the FromRecord and IntoRecord derive macros

    Added the FromRecord and IntoRecord derive macros

    • Created a derive macro for the FromRecord and IntoRecord traits to allow end users to easily implement FromRecord and IntoRecord automatically without any boilerplate. Declarative macros exist within differential_datalog, but they're difficult to learn, use and must be manually updated and so are not ideal as a user-facing API (Whether the derives should be used in generated code I'll leave to your judgment)

    These are the first two of (hopefully) 3 macros if all goes well: FromRecord, IntoRecord and Mutator

    opened by Kixiron 15
  • "Error: cannot convert -4 to u32" (via Souffle translator)

    The problem: running a Datalog analysis in Doop with DDlog fails with this error:

    Error: cannot convert -4 to u32
    

    I don't know if the error comes from subtraction or overflowing addition but there seems to be a misrepresentation of signed integers as unsigned ones. For example, in this test Souffle program:

    .decl X(n: number)
    .output X
    
    X(10).
    X(-1).
    

    the souffle.dump output shows -1 as the maximum 32-bit unsigned integer:

    RX{10}
    RX{4294967295}
    

    You can test it via Doop:

    export DDLOG_DIR=path/to/differential-datalog
    ./doop -i http://centauri.di.uoa.gr:8081/artifactory/Demo-benchmarks/test-resources/006-hello-world-1.2.jar -a micro --id hello-world --Xstats-none --via-ddlog -Ldebug
    
    opened by gfour 14
  • Major refactor: Move HDDlog to the differential_datalog crate

    Major refactor: Move HDDlog to the differential_datalog crate

    Moves the HDDlog struct from the generated code into the main differential_datalog crate

    Note: This is a very rough PR right now, I've only fixed up the immediate compile errors and none of the generated code or tests yet

    dco-required 
    opened by Kixiron 13
  • Loosen function type

    Loosen function type

    This is an incomplete PR for #983 to loosen the restriction on function types. For now I only changed four function types FilterMapFunc, FilterFunc, JoinFunc, ArrangeFunc for a review and will add another commit to change the rest of the function types if you guys think everything is good without breaking anything.

    opened by qishen 13
  • `ddlog -o` option has weird behavior

    `ddlog -o` option has weird behavior

    ddlog --help says the following about -o:

      -o DIR   --output-dir=DIR                 Output directory (default based on program name).
    

    To me, this means that the output directory is based on the program name by default. Sure enough, if I run ddlog -i foo.dl I end up with output in foo_ddlog. Cool.

    So, if I run ddlog -i foo.dl -o bar, I'll get an output directory not based on the program name but named bar instead, right? No. Actually it's still based on the program name, just nested inside the bar directory:

    [blp@sigxcpu]$ ddlog -i foo.dl -o bar
    Finished compiling "foo.dl" in 0.76s
    [blp@sigxcpu]$ ls bar/
    foo_ddlog
    [blp@sigxcpu]$ 
    

    This doesn't do what I expect. I don't know whether the implementation is wrong (so that the code should be changed to do what I expect) or the usage message is wrong (so that the usage message should be changed to something like "output base directory (default is .)".

    opened by blp 2
  • Type constraints for right-hand operand of shift operators

    Type constraints for right-hand operand of shift operators

    The following program:

    input relation X(x: bit<16>, y: bit<16>)
    output relation Y(x: bit<16>, y: bit<16>, z: bit<16>)
    Y(x, y, x << y) :- X(x, y).
    

    yields the following compiler error that I don't expect (with release v1.2.3 and earlier releases):

    error: /home/blp/nerpa/nerpa/ofp4/test.dl:3.6-3.7: Unsatisfiable bit-width constraint '16 = 0' in
    expected type: bit<16>
    actual type: bit<32>
    in
    expression 'y'
    Y(x, y, x << y) :- X(x, y).
         ^
    

    I can fix it by casting y to u32 in the shift expression, like this:

    Y(x, y, x << (y as u32)) :- X(x, y).
    

    I would prefer that << (and presumably >>) accept any integer type as its right-hand operand.

    opened by blp 3
  • Dev dependencies in prod release

    Dev dependencies in prod release

    The generated Cargo.toml files contain dependencies that don't look production ready.

    #differential-dataflow = "0.11.0"
    differential-dataflow = { git = "https://github.com/ddlog-dev/differential-dataflow", branch = "ddlog-4" }
    #timely = "0.11"
    timely = { git = "https://github.com/ddlog-dev/timely-dataflow", branch = "ddlog-4", default-features = false }
    

    I understand this might be done during development to fix a bug or add a feature, but now the forks have fallen behind. What would it take to re-point these dependencies to official releases on crates.io?

    opened by yhack 2
  • Stopped instance should return a meaningful error message [was: Error when trying to delete a relation].

    Stopped instance should return a meaningful error message [was: Error when trying to delete a relation].

    I'm using the Rust API to interact with a DDlog program and I keep getting the following error when I try to delete any relation at runtime (using just hddlog.apply_updates(&mut delete_updates.into_iter()).unwrap(); where delete_updates has updates of type Update::DeleteValue):

    thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: "failed to communicate with timely dataflow thread 0"'
    

    I am certain that the relation I want to delete is inserted beforehand (and I also don't have any problems with the initial insertions). Any thoughts as to what issues might be causing the failed to communicate with timely dataflow thread 0 error (which seems very unspecific), or how I could go about debugging to find the issue given I don't know much about timely dataflow?

    opened by tatiana-s 5
  • SQL to DDlog compiler cannot compile JOINs with aliases with the same name as the original identifier

    SQL to DDlog compiler cannot compile JOINs with aliases with the same name as the original identifier

    Suppose you have the following tables:

    create table t1(column1 integer, column2 boolean)
    create table t2(column1 integer)
    

    The following view fails to compile:

    create view v0 as SELECT DISTINCT t1.column1 as new_column 
                    from t1 as t1 
                    JOIN t2 as alias2 ON t1.column1 = alias2.column1
    

    with the following exception:

    com.vmware.ddlog.translator.TranslationException: Column 't1.column1' not in one of the joined relation
    t1.column1 line: 1 column: 94
    
    bug 
    opened by amytai 0
  • Reduce dependency bloat

    Reduce dependency bloat

    • Started trying to remove num from the dependency tree
    • Removed some of the "default dependencies" from generated crates, this should allow better compilation scheduling since crates don't have to wait on unnecessary dependencies
    • Discovered that some files are in CLRF form, this will be fixed in a later PR
    • Removed dependencies on fnv and twox-hash and replaced with a single dependence on xxhash-rust, it has zero dependencies and is very light
    • Removed old dependence on time for all generated crates, this reduced the dependency tree a good bit
    • Minor refactors to some massive match statements
    • Added some documentation to the DDVal/DDValue innards
    dco-required 
    opened by Kixiron 4
Releases(v1.2.3)
Owner
VMware
VMware
A simple programming language for everyone.

Slang A simple programming language for everyone, made with Rust. State In very early stages. Plan is to create a byte-code compiler and make that exe

Slang, Inc. 11 Jul 1, 2022
A programming language. Better mantra pending.

Dusk Dusk is a programming language I've been working on on and off for the past while now. It's still very much in its infancy (... a quick look thro

Kaylynn Morgan 14 Oct 24, 2022
Ruxnasm is an assembler for Uxntal — a programming language for the Uxn stack-machine by Hundred Rabbits

Ruxnasm is an assembler for Uxntal — a programming language for the Uxn stack-machine by Hundred Rabbits. Ruxnasm strives to be an alternative to Uxnasm, featuring more user-friendly error reporting, warnings, and helpful hints, reminiscent of those seen in modern compilers for languages such as Rust or Elm.

Karol Belina 44 Oct 4, 2022
Frame is a markdown language for creating state machines (automata) in 7 programming languages as well as generating UML documentation.

Frame Language Transpiler v0.5.1 Hi! So very glad you are interested in Frame. Frame system design markdown language for software architects and engin

Mark Truluck 35 Dec 31, 2022
beat saber is a strongly typed, self-documenting and highly performant programming language

beatsaber beat saber is a strongly typed, self-documenting and highly performant programming language. With beat saber we aimed to create a language t

Untitled 4 Jan 17, 2022
Analogous, indented syntax for the Rust programming language.

Note: After experimenting with this in the wild, I have found representing keywords as symbols to be far less readable in large codebases. Additionall

null 42 Oct 2, 2021
A cell-based esoteric programming language

Tape A cell-based esoteric programming language Tape is a cell-based, brainfuck-like programming language that has a readable syntax and a non-wasted

Gabriel Pacheco 2 Feb 23, 2022
Programming language from down under, inspired by this Reddit post.

aussie++ Programming language from down under, inspired by this Reddit post. View live demo here. Special thanks to MarkWhyBird, louis100, and others

Zack Radisic 562 Dec 27, 2022
Programming Language Inspired by Brainfuck

Brainsuck Brainfuck but not really... like... a better version of it. Installation Requirements: Rust version 1.50 or higher Linux curl https://raw.gi

Derin Önder Eren 27 Nov 18, 2022
A Star Wars inspired by programming language

The Force The Force is a gateway to abilities many believe are unnatural... Getting Started Install Rust. We also provide a Dev Container if you would

Matthew Booe 14 Dec 21, 2022
🤯 A brainf*ck-style programming language, but readable

?? Brainease Brainease is a brainf*ck-style programming language, but readable. $ cargo install brainease # brainease -f examples/hello.bz save 'H

Arthur Fiorette 11 Sep 30, 2022
Simple programming language that speaks the ones you already know!

Simple programming language that speaks the ones you already know!

LyonSyonII 2 Feb 12, 2022
simple, C-like programming language

CUP: C(ompiler) U(nder) P(rogress) A badly named, in-progress programming language just to learn how these things work. Wait, doesn't everyone write a

Mustafa Quraish 287 Dec 28, 2022
A fast & light weight Discord Client made with love using the Rust programming language.

LemonCord A fast & light-weight Discord Client written in Rust using the wry crate. Features Fast, light-weight, easy to use. 100% Open sourced. No su

Lemon Rose 5 Jan 30, 2023
A simple interpreter written in Rust programming language.

Interpreter in Rust A simple interpreter written in Rust programming language. Getting Started These instructions will get you a copy of the project u

Ismail Mosleh 5 Feb 17, 2023
Compiler & Interpreter for the (rather new and very experimental) Y programming language.

Y Lang Why would anyone build a new (and rather experimental) language with no real world use case. Design Y (pronounced as why) is based on the idea

Louis Meyer 8 Mar 5, 2023
The official programming language of the Hasso Plattner Institute.

Die HPI Programmiersprache Die offizielle Programmiersprache des HPI. Anmerkung: Dieses Projekt soll niemanden beleidigen oder bloßstellen, alles bitt

null 7 Oct 22, 2023
A language server implementation for the WGSL shading language

wgsl-analyzer wgsl-analyzer is a language server plugin for the WGSL Shading language. It comes with a VS Code plugin located in ./editors/code, but d

null 155 Jan 2, 2023
Starting with Rust programming, simple UAL

AVR-like Instruction set simulator Starting with Rust programming, simple UAL. Concepts learned: ownership structure patern matching data collection e

null 3 Dec 10, 2022