A general-purpose, transactional, relational database that uses Datalog and focuses on graph data and algorithms

cozo

A general-purpose, transactional, relational database that uses Datalog for query and focuses on graph data and algorithms.

Features

  • Relational database with Datalog as the query language
  • Recursive queries, especially recursion through (safe) aggregation, capable of expressing complex graph operations and algorithms
  • Fixed rules providing efficient whole-graph algorithms which integrate seamlessly with Datalog
  • Rich set of built-in functions and aggregations
  • Only a single executable, trivial to deploy and run
  • Embeddable, can run in the same process as the application
  • Easy to use from any programming language
  • Special support for Jupyter notebooks for integration with the Python data science ecosystem
  • Modern, clean, flexible syntax and informative error messages

Teasers

In the following examples, *route is a stored relation with two columns, src and dst, where each row represents a route between two airports.

Find airports reachable by one stop from Frankfurt Airport (code FRA):

?[dst] := *route{src: 'FRA', dst: stop}, 
          *route{src: stop, dst}
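The one-stop query is a join: the shared variable stop links the two *route atoms. The plain-Rust sketch below spells out that join on a small, made-up edge list. It only illustrates the semantics and is not how Cozo evaluates queries; the function name one_stop and the sample airports are hypothetical.

```rust
use std::collections::{BTreeSet, HashMap};

/// Airports reachable from `origin` with exactly one intermediate stop.
/// `routes` stands in for the stored relation *route{src, dst}.
fn one_stop<'a>(routes: &[(&'a str, &'a str)], origin: &str) -> BTreeSet<&'a str> {
    // Index routes by source, like a lookup on *route{src: ..., dst}.
    let mut by_src: HashMap<&str, Vec<&'a str>> = HashMap::new();
    for &(src, dst) in routes {
        by_src.entry(src).or_default().push(dst);
    }
    let mut result = BTreeSet::new();
    // First atom: *route{src: 'FRA', dst: stop}
    if let Some(stops) = by_src.get(origin) {
        for &stop in stops {
            // Second atom: *route{src: stop, dst}, joined on `stop`.
            if let Some(dsts) = by_src.get(stop) {
                result.extend(dsts.iter().copied());
            }
        }
    }
    // A set, because Datalog relations contain no duplicate rows.
    result
}

fn main() {
    // Hypothetical sample routes.
    let routes = [("FRA", "LHR"), ("LHR", "JFK"), ("FRA", "CDG"), ("CDG", "JFK"), ("CDG", "ATL")];
    println!("{:?}", one_stop(&routes, "FRA")); // {"ATL", "JFK"}
}
```

JFK is reachable through two different stops but appears once, matching the set semantics of the Datalog result.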

Find airports reachable from Frankfurt with any number of stops whose codes start with the letter A:

reachable[dst] := *route{src: 'FRA', dst}
reachable[dst] := reachable[src], *route{src, dst}
?[airport] := reachable[airport], starts_with(airport, 'A')
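The two reachable rules define a fixpoint: keep applying the recursive rule until no new airports appear. A common strategy for this in Datalog engines is semi-naive evaluation, which joins only the facts that are new in each round. The plain-Rust sketch below illustrates that strategy on hypothetical data; it is not Cozo's implementation, and the function name reachable is only borrowed from the rule for readability.

```rust
use std::collections::{BTreeSet, HashMap};

/// Semi-naive fixpoint for the two `reachable` rules:
///   reachable[dst] := *route{src: origin, dst}
///   reachable[dst] := reachable[src], *route{src, dst}
fn reachable<'a>(routes: &[(&'a str, &'a str)], origin: &str) -> BTreeSet<&'a str> {
    // Index *route by src so each join step is a lookup.
    let mut by_src: HashMap<&str, Vec<&'a str>> = HashMap::new();
    for &(src, dst) in routes {
        by_src.entry(src).or_default().push(dst);
    }
    // Base rule: airports with a direct route from `origin`.
    let mut all: BTreeSet<&'a str> = by_src.get(origin).into_iter().flatten().copied().collect();
    // `delta` holds only the facts first derived in the previous round.
    let mut delta = all.clone();
    while !delta.is_empty() {
        let mut next = BTreeSet::new();
        // Recursive rule: join only the new facts with *route.
        for src in &delta {
            for &dst in by_src.get(*src).into_iter().flatten() {
                if !all.contains(dst) {
                    next.insert(dst);
                }
            }
        }
        // The loop ends when a round derives nothing new (the fixpoint).
        all.extend(next.iter().copied());
        delta = next;
    }
    all
}

fn main() {
    // Hypothetical routes containing a cycle back to FRA.
    let routes = [("FRA", "AMS"), ("AMS", "ATL"), ("ATL", "AUS"), ("AUS", "FRA"), ("FRA", "BER")];
    // ?[airport] := reachable[airport], starts_with(airport, 'A')
    let answer: Vec<_> = reachable(&routes, "FRA")
        .into_iter()
        .filter(|a| a.starts_with('A'))
        .collect();
    println!("{:?}", answer); // ["AMS", "ATL", "AUS"]
}
```

The cycle terminates because an airport already in the result is never re-added; FRA itself is reachable here (via AUS) and is then removed by the starts_with filter.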

Compute the shortest path from Frankfurt to every airport reachable from it:

shortest_paths[dst, shortest(path)] := *route{src: 'FRA', dst},
                                       path = ['FRA', dst]
shortest_paths[dst, shortest(path)] := shortest_paths[stop, prev_path], 
                                       *route{src: stop, dst},
                                       path = append(prev_path, dst)
?[dst, path] := shortest_paths[dst, path]
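The shortest aggregation keeps, for each dst, the shortest of the candidate path lists. Because these routes are unweighted, that is a path with the fewest hops, which a breadth-first search also finds. The Rust sketch below illustrates this equivalence on made-up data and is not how Cozo evaluates the rule. When several paths tie, the sketch keeps the first one it finds; which tied path Cozo's shortest keeps is not specified here.

```rust
use std::collections::{BTreeMap, HashMap, VecDeque};

/// For every airport reachable from `origin`, one path with the fewest hops,
/// mirroring what the recursive rule with `shortest(path)` derives.
fn shortest_paths<'a>(
    routes: &[(&'a str, &'a str)],
    origin: &'a str,
) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut by_src: HashMap<&str, Vec<&'a str>> = HashMap::new();
    for &(src, dst) in routes {
        by_src.entry(src).or_default().push(dst);
    }
    let mut best: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    let mut queue: VecDeque<&'a str> = VecDeque::new();
    // Base rule: path = [origin, dst] for each direct route.
    for &dst in by_src.get(origin).into_iter().flatten() {
        if !best.contains_key(dst) {
            best.insert(dst, vec![origin, dst]);
            queue.push_back(dst);
        }
    }
    // Recursive rule: extend a known shortest path by one route.
    // Breadth-first order means the first path found to a node has the fewest hops.
    while let Some(stop) = queue.pop_front() {
        let prev_path = best[stop].clone();
        for &dst in by_src.get(stop).into_iter().flatten() {
            if !best.contains_key(dst) {
                let mut path = prev_path.clone();
                path.push(dst); // path = append(prev_path, dst)
                best.insert(dst, path);
                queue.push_back(dst);
            }
        }
    }
    best
}

fn main() {
    // Hypothetical sample routes.
    let routes = [("FRA", "LHR"), ("LHR", "JFK"), ("FRA", "JFK"), ("JFK", "SFO")];
    for (dst, path) in shortest_paths(&routes, "FRA") {
        println!("{dst}: {path:?}");
    }
}
```

Running main prints JFK: ["FRA", "JFK"], LHR: ["FRA", "LHR"] and SFO: ["FRA", "JFK", "SFO"]; the two-hop route to JFK through LHR is discarded in favor of the direct one.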

Compute the shortest path again, but with a built-in algorithm:

starting[airport] := airport = 'FRA'
?[src, dst, cost, path] <~ ShortestPathDijkstra(*route[], starting[])
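ShortestPathDijkstra is a fixed rule, so the algorithm runs natively instead of through rule recursion. For reference, here is a textbook single-source Dijkstra in Rust using std::collections::BinaryHeap, returning a cost and a path per destination, loosely mirroring the [src, dst, cost, path] output above. The weighted edge triples (src, dst, cost) are an assumption of this sketch: *route in these teasers has only src and dst, and how the built-in assigns weights in that case is not covered here. This is an illustration, not Cozo's implementation.

```rust
use std::cmp::Reverse;
use std::collections::{BTreeMap, BinaryHeap, HashMap};

/// Textbook single-source Dijkstra over weighted edges (src, dst, cost).
/// Returns, for every reachable node, its minimum total cost and one cheapest path.
/// Costs must be non-negative for Dijkstra's algorithm to be correct.
fn dijkstra<'a>(
    edges: &[(&'a str, &'a str, u64)],
    start: &'a str,
) -> BTreeMap<&'a str, (u64, Vec<&'a str>)> {
    let mut adj: HashMap<&'a str, Vec<(&'a str, u64)>> = HashMap::new();
    for &(src, dst, cost) in edges {
        adj.entry(src).or_default().push((dst, cost));
    }
    let mut dist: BTreeMap<&'a str, u64> = BTreeMap::new();
    let mut prev: HashMap<&'a str, &'a str> = HashMap::new();
    // BinaryHeap is a max-heap; Reverse turns it into a min-heap on distance.
    let mut heap = BinaryHeap::new();
    dist.insert(start, 0);
    heap.push(Reverse((0u64, start)));
    while let Some(Reverse((d, node))) = heap.pop() {
        // Skip stale entries: a shorter distance to `node` was already found.
        if d > dist[node] {
            continue;
        }
        for &(next, cost) in adj.get(node).into_iter().flatten() {
            let candidate = d + cost;
            if dist.get(next).map_or(true, |&old| candidate < old) {
                dist.insert(next, candidate);
                prev.insert(next, node);
                heap.push(Reverse((candidate, next)));
            }
        }
    }
    // Rebuild each path by following predecessors back to `start`.
    dist.iter()
        .map(|(&node, &cost)| {
            let mut path = vec![node];
            let mut cur = node;
            while let Some(&p) = prev.get(cur) {
                path.push(p);
                cur = p;
            }
            path.reverse();
            (node, (cost, path))
        })
        .collect()
}

fn main() {
    // Hypothetical weighted routes.
    let edges: [(&str, &str, u64); 4] =
        [("FRA", "LHR", 1), ("LHR", "JFK", 1), ("FRA", "JFK", 3), ("JFK", "SFO", 1)];
    for (dst, (cost, path)) in dijkstra(&edges, "FRA") {
        println!("FRA -> {dst}: cost {cost}, path {path:?}");
    }
}
```

Unlike the hop-counting recursive version, the cheapest route to JFK here goes through LHR (cost 2) rather than using the direct edge (cost 3).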

Learning Cozo

  • Start with the Tutorial to learn the basics;
  • Continue with the Manual to understand the fine points.

Bug reports, discussions

If you encounter a bug, first search for past issues to see if it has already been reported. If not, open a new issue. Please provide sufficient information so that we can diagnose the problem faster.

Other discussions about Cozo should take place in GitHub Discussions.

Use cases

As Cozo is a general-purpose database, it can be used in situations where traditional databases such as PostgreSQL and SQLite are used. However, Cozo is designed to overcome several shortcomings of traditional databases, and hence fares especially well in specific situations:

  • You have a lot of interconnected relations and the usual queries need to relate many relations together. In other words, you need to query a complex graph.
    • An example is a system granting permissions to users for specific tasks. In this case, users may have roles, belong to an organization hierarchy, and tasks similarly have organizations and special provisions associated with them. The granting process itself may also be a complicated rule encoded as data within the database.
    • With a traditional database, the corresponding SQL tends to become an entangled web of nested queries, with many tables joined together, and maybe even some recursive CTEs thrown in. This is hard to maintain, and worse, the performance is unpredictable, since query optimizers generally fail once more than twenty or so tables are joined together.
    • With Cozo, on the other hand, Horn clauses make it easy to break the logic into smaller pieces and write clear, easily testable queries. Furthermore, the deterministic evaluation order makes identifying and solving performance problems easier.
  • Your data may be simple, even a single table, but it is inherently a graph.
    • We have seen an example in the Tutorial: the air route dataset, where the key relation contains the routes connecting airports.
    • In traditional databases, when you are given a new relation, you try to understand it by running aggregations on it to collect statistics: what is the distribution of values, how are the columns correlated, etc.
    • In Cozo you can do the same exploratory analysis, except now you also have graph algorithms that you can easily apply to understand things such as: which entity is the most connected, how the nodes are connected, and what community structure exists among the nodes.
  • Your data contains hidden structures that only become apparent when you identify the scales of the relevant structures.
    • Examples are most real networks, such as social networks, which have a very rich hierarchy of structures.
    • In a traditional database, you are limited to doing nested aggregations and filtering, i.e. a form of multifaceted data analysis. For example, you can analyze by gender, geography, job or combinations of them. For structures hidden in other ways, or if such categorizing tags are not already present in your data, you are out of luck.
    • With Cozo, you can now deal with emergent and fuzzy structures by using e.g. community detection algorithms, and collapse the original graph into a coarse-grained graph consisting of super-nodes and super-edges. The process can be iterated to gain insights into even higher-order emergent structures. This works even for a social network that has only edges and no categorizing tags on its nodes at all, and the discovered structures almost always have meanings that correlate with real-world events and organizations, for example, forms of collusion and crime rings. Also, from a performance perspective, coarse-graining is a required step in analyzing so-called big data, since many graph algorithms have high complexity and are only practical on small or medium-sized networks such as the coarse-grained ones.
  • You want to understand your live business data better by augmenting it into a knowledge graph.
    • For example, your sales database contains product, buyer, inventory, and invoice tables. The augmentation consists of external data about the entities in your data, in the form of layered taxonomies and ontologies.
    • This is inherently a graph-theoretic undertaking and traditional databases are not suitable. Usually, a dedicated graph processing engine is used, separate from the main database.
    • With Cozo, it is possible to keep your live data and knowledge graph analysis together, and importing new external data and doing analysis is just a few lines of code away. This ease of use means that you will do the analysis much more often, perhaps with a much wider scope.
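The coarse-graining step described above (collapsing communities into super-nodes and merging the edges between them into super-edges) can be sketched in a few lines of Rust. This sketch assumes the community labels have already been computed, for instance by a community detection algorithm, and are passed in as a plain map; the function name coarse_grain, the label type and the sample graph are all hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};

/// Collapse a directed graph into super-nodes (one per community) and
/// super-edges weighted by how many original edges run between two communities.
/// `community` must assign a label to every node that appears in `edges`.
fn coarse_grain(edges: &[(&str, &str)], community: &HashMap<&str, u32>) -> BTreeMap<(u32, u32), usize> {
    let mut super_edges = BTreeMap::new();
    for &(src, dst) in edges {
        let (a, b) = (community[src], community[dst]);
        // Edges inside a community are absorbed into its super-node.
        if a != b {
            *super_edges.entry((a, b)).or_insert(0) += 1;
        }
    }
    super_edges
}

fn main() {
    let edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("b", "e")];
    // Hypothetical labels, e.g. as produced by a community detection algorithm.
    let community: HashMap<&str, u32> =
        HashMap::from([("a", 0), ("b", 0), ("c", 0), ("d", 1), ("e", 1), ("f", 1)]);
    println!("{:?}", coarse_grain(&edges, &community)); // {(0, 1): 2}
}
```

The six-node graph shrinks to two super-nodes joined by one super-edge of weight 2; feeding the result back into community detection is how the process could be iterated toward higher-order structures.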

Status of the project

Cozo is very young and not production-ready yet, but we encourage you to try it out for your use case. Any feedback is welcome.

Versions before 1.0 do not promise syntax/API stability or storage compatibility. We promise that when you try to open database files created with an incompatible version, Cozo will at least refuse to start instead of silently corrupting your data.

Plans for development

In the near term, before we reach version 1.0:

  • Backup/restore functionality
  • Many, many more tests to ensure correctness
  • Benchmarks

Further down the road:

  • More tuning options
  • Streaming/reactive data
  • Extension system
    • The core of Cozo should be kept small at all times. Additional functionalities should be in extensions for the user to choose from.
    • What can be extended: datatypes, functions, aggregations, and fixed algorithms.
    • Extensions should be written in a compiled language such as Rust or C++ and compiled into a dynamic library, to be loaded by Cozo at runtime.
    • There will probably be a few "official" extension bundles, such as
      • arbitrary precision arithmetic
      • full-text "indexing" and searching
      • relations that can emulate spatial and other types of non-lexicographic indices
      • reading from external databases directly
      • more exotic graph algorithms

Ideas and discussions are welcome.

Storage engine

Cozo is written in Rust, with RocksDB as the storage engine (this may change in the future). We manually wrote the C++/Rust bindings for RocksDB with cxx.

Licensing

The contents of this project are licensed under AGPL-3.0 or later, except:

  • Files under cozorocks/ are licensed under MIT, or Apache-2.0, or BSD-3-Clause;
  • Files under docs/ are licensed under CC BY-SA 4.0.
Comments
  • Bulk ingestion

    Hi, I'm very excited to see this project, as it seems to fit perfectly what I will need very soon.

    My initial question concerns efficient ingestion of base facts. My current use case is provenance tracking combined with analytics results in distributed environments, so there will potentially be lots of largish chunks of records. Right now the only API I can see is building a "query" string with a list of parameters. Is there another option?

    It's not a showstopper for me right now, but it would be nice to know your current thinking regarding a roadmap.

    Thanks

    opened by maxott 5
  • Conda packages?

    Hi there, brilliant project, thanks a lot! Are there any plans to build and provide conda packages for cozo? I have never packaged a Python package that wraps Rust code, so I have no idea whether it's easily doable and I can just do it myself, or whether it requires an involved build setup...

    opened by makkus 4
  • Terminal REPL

    I just discovered Cozo and was disappointed to notice that the simplest way to use it is through a browser, which is not that simple at all (not to mention the security implications of leaving network sockets open, even on localhost).

    So I made a terminal-based REPL: https://paste.debian.net/1266001/

    It's basic, but it works. Type a space to enter multi-line editing.

    Are you interested in having that? Should I submit it as a binary in cozo? Or a separate crate?

    opened by rhn 3
  • Suggestions for a more elaborate Hello World example

    I suggest adding a slightly more advanced hello world example, to quickly get people to understand how to register facts and how to query them with an advanced query.

    I don't know (yet) how to represent this in cozo, but in Prolog, I often do something like this (I might make a syntax mistake or two, but I hope you get the idea):

    So, saving this into facts.pl

    parent(joseph, jakob).
    parent(jakob, isaac).
    parent(isaac, abraham).
    
    grandparent(Grandchild, Grandparent) :-
        parent(Grandchild, Middleperson),
        parent(Middleperson, Grandparent).
    

    ... and then running this in a swipl shell:

    ?- grandparent(jakob, Who).
    Who = abraham.
    
    ?- grandparent(joseph, Who).
    Who = isaac.
    
    ?- 
    

    To me, this shows:

    1. How to assert facts that are relations
    2. How to define queries based on relation facts
    3. How to query this query with different inputs

    Do you think something similar would make sense as a hello world example for cozo?

    opened by samuell 3
  • suggestion: don't unify variables named _ (underscore) in rule body

    Hello, I'd like to suggest not unifying instances of _ variables. That is, treat each instance of _ as separate.

    r1[] <- [[1, 'a'], [2, 'b']]
    r2[] <- [[2, 'B'], [3, 'C']]
    
    ?[l1, l2] := r1[_ , l1], r2[_ , l2]
    # actual behavior
    # ?[l1, l2] := r1[a , l1], r2[a , l2]
    # expected behavior
    # ?[l1, l2] := r1[a, l1], r2[b, l2]
    

    Some prior art: Rego, from Open Policy Agent, behaves this way: https://play.openpolicyagent.org/p/5qfwdxgG5k

    opened by ear7h 2
  • Performance issue (or infinite loop)

    I've tried a variation of this example from the tutorial:

    shortest[b, min(dist)] := *route{fr: 'LHR', to: b, dist} 
                              # Start with the airport 'LHR', retrieve a direct route from 'LHR' to b
    
    shortest[b, min(dist)] := shortest[c, d1], # Start with an existing shortest route from 'LHR' to c
                              *route{fr: c, to: b, dist: d2},  # Retrieve a direct route from c to b
                              dist = d1 + d2 # Add the distances
    
    ?[dist] := shortest['YPO', dist] # Extract the answer for 'YPO'. 
                                     # We chose it since it is the hardest airport to get to from 'LHR'.
    

    Changing it in this way:

    shortest[a, b, min(dist)] := *route{fr: a, to: b, dist} 
    shortest[a, b, min(dist)] := shortest[a, c, d1],
                                 *route{fr: c, to: b, dist: d2},
                                 dist = d1 + d2
    
    ?[dist] := shortest['LHR', 'YPO', dist]
    

    Although it should be an equivalent query, I don't get any result in a reasonable time in https://cozodb.github.io/wasm-demo/

    I don't think it is expected, am I missing something?

    opened by Abramo-Bagnara 2
  • `cozo-node` install fails with `TAR_BAD_ARCHIVE: Unrecognized archive format`

    When trying to do pnpm add cozo-node or npm install cozo-node, I'm getting a TAR_BAD_ARCHIVE: Unrecognized archive format error both under NodeJS v14 and v19. Full stacktrace:

    .../node_modules/cozo-node install$ node-pre-gyp install
    │ node-pre-gyp info it worked if it ends with ok
    │ node-pre-gyp info using [email protected]
    │ node-pre-gyp info using [email protected] | linux | x64
    │ node-pre-gyp info check checked for "/tmp/cozo-demo/node_modules/.pnpm/[email protected]/node_modules/cozo-node/native/6/index.node" (not found)
    │ node-pre-gyp http GET https://github.com/cozodb/cozo-lib-nodejs/releases/download/0.3.0/6-linux-x64.tar.gz
    │ node-pre-gyp ERR! install TAR_BAD_ARCHIVE: Unrecognized archive format 
    │ node-pre-gyp ERR! install error 
    │ node-pre-gyp ERR! stack Error: TAR_BAD_ARCHIVE: Unrecognized archive format
    │ node-pre-gyp ERR! stack     at Unpack.warn (/tmp/cozo-demo/node_modules/.pnpm/[email protected]/node_modules/tar/lib/warn-mixin.js:21:40)
    │ node-pre-gyp ERR! stack     at Unpack.warn (/tmp/cozo-demo/node_modules/.pnpm/[email protected]/node_modules/tar/lib/unpack.js:229:18)
    │ node-pre-gyp ERR! stack     at Unpack.<anonymous> (/tmp/cozo-demo/node_modules/.pnpm/[email protected]/node_modules/tar/lib/parse.js:83:14)
    │ node-pre-gyp ERR! stack     at Unpack.emit (events.js:412:35)
    │ node-pre-gyp ERR! stack     at Unpack.[emit] (/tmp/cozo-demo/node_modules/.pnpm/[email protected]/node_modules/tar/lib/parse.js:303:12)
    │ node-pre-gyp ERR! stack     at Unpack.[maybeEnd] (/tmp/cozo-demo/node_modules/.pnpm/[email protected]/node_modules/tar/lib/parse.js:426:17)
    │ node-pre-gyp ERR! stack     at Unpack.[consumeChunk] (/tmp/cozo-demo/node_modules/.pnpm/[email protected]/node_modules/tar/lib/parse.js:458:21)
    │ node-pre-gyp ERR! stack     at Unzip.<anonymous> (/tmp/cozo-demo/node_modules/.pnpm/[email protected]/node_modules/tar/lib/parse.js:372:29)
    │ node-pre-gyp ERR! stack     at Unzip.emit (events.js:412:35)
    │ node-pre-gyp ERR! stack     at Unzip.[emitEnd2] (/tmp/cozo-demo/node_modules/.pnpm/[email protected]/node_modules/minipass/index.js:524:23)
    │ node-pre-gyp ERR! System Linux 5.15.0-53-generic
    │ node-pre-gyp ERR! command "/usr/local/bin/node" "/tmp/cozo-demo/node_modules/.pnpm/@[email protected]/node_modules/@mapbox/node-pre-gyp/bin/node-pre-gyp" "install"
    │ node-pre-gyp ERR! cwd /tmp/cozo-demo/node_modules/.pnpm/[email protected]/node_modules/cozo-node
    │ node-pre-gyp ERR! node -v v14.21.2
    │ node-pre-gyp ERR! node-pre-gyp -v v1.0.10
    │ node-pre-gyp ERR! not ok 
    │ TAR_BAD_ARCHIVE: Unrecognized archive format
    
    opened by loveencounterflow 2
  • Document storage strategies/architecture

    I hunted around for this information briefly but couldn't find it. How are data and relations stored in cozo?

    The primary thing I'm wondering is whether Cozo is row-oriented or column-oriented. General-purpose databases have traditionally stored rows contiguously, but storing similar data types contiguously opens up big opportunities for compression and analytics workflows.

    opened by vlmutolo 2
  • Comparison with Surrealdb

    Cozo and SurrealDB https://surrealdb.com/ are two new and innovative databases, and both can learn from each other. I don't want to use decades-old relational databases for a new project, so I am looking for a high-level or, if possible, detailed comparison between Cozo and SurrealDB.

    opened by ansarizafar 1
  • Failed running rust example from readme

    Hi, great idea for a db! Looking forward to trying it out a bit more.

    I tried the Rust example from the readme and it's broken; this worked for me:

    use cozo::Db;
    use miette::Result;
    
    fn main() -> Result<()> {
        let db = Db::new("_test_db")?;
        println!("{}", db.run_script_str(r#"?[] <- [['hello', 'world!']]"#, ""));
        println!("{}", db.run_script_str(r#"?[] <- [['hello', 'world', $name]]"#, r#"{"name":"Rust"}"#));
        println!("{}", db.run_script_str(r#"?[a] <- [[1, 2]]"#, ""));
    
        Ok(())
    }
    
    opened by vladan 1
  • Usage of functions is not clear

    The list functions are documented, but it is not clear from the manual or the tutorial how to apply them.

    For example, with the list function... other than calling list(1, 2, 3) in the right-hand side of a rule, how would it be used?

    It doesn't seem to work as an aggregator.

    opened by mtnygard 1
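The grandparent rule in the hello-world suggestion above is a self-join of parent on the middle person. To show what such a rule derives, independent of any particular query syntax, here is a small plain-Rust sketch over the same three facts. The constant PARENT and the function grandparents are illustrative names, not part of Cozo or Prolog.

```rust
/// parent(Child, Parent) facts from the Prolog example.
const PARENT: [(&str, &str); 3] = [("joseph", "jakob"), ("jakob", "isaac"), ("isaac", "abraham")];

/// grandparent(Grandchild, Grandparent) :-
///     parent(Grandchild, Middleperson), parent(Middleperson, Grandparent).
fn grandparents(grandchild: &str) -> Vec<&'static str> {
    let mut out = Vec::new();
    for &(child, middle) in PARENT.iter() {
        if child != grandchild {
            continue;
        }
        // Join on the shared variable: the middle person must appear as a child.
        for &(c, gp) in PARENT.iter() {
            if c == middle {
                out.push(gp);
            }
        }
    }
    out
}

fn main() {
    println!("{:?}", grandparents("jakob"));  // ["abraham"]
    println!("{:?}", grandparents("joseph")); // ["isaac"]
}
```

Both results match the swipl session shown in the suggestion.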
Releases(v0.4.1)