New generation decentralized data warehouse and streaming data pipeline

kamu

Last update: Dec 22, 2022

Related tags

Data processing data-science sql spark jupyter blockchain open-data data-management flink data-as-code kamu open-data-fabric

Overview

World's first decentralized real-time data warehouse, on your laptop

Get Started

Watch this introductory video to see kamu in action.
Learn how to use kamu with this self-serve demo without needing to install anything.
Then follow the "Getting Started" section of our documentation to install the tool and try a bunch of examples.

About

kamu (pronounced kaˈmju) is an easy-to-use command-line tool for managing, transforming, and collaborating on structured data.

In short, it can be described as:

Decentralized data warehouse
A peer-to-peer stream processing data pipeline
Git for data
Blockchain-like ledger for data
Or even Kubernetes for data :)

Using kamu, any person or even the smallest organization can easily share structured data with the world. Data can be static or flow continuously. In all cases kamu will ensure that it stays:

Reproducible - i.e. you can ask the publisher "Give me the same exact data you gave me a year ago"
Verifiable - i.e. you can ask the publisher "Is this the exact data you had a year ago?"
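To make the "same exact data you gave me a year ago" idea concrete, here is a minimal sketch of an append-only dataset that can be queried "as of" a past moment. It is an illustration under stated assumptions, not kamu's actual storage format: `Record`, `AppendOnlyDataset`, and `as_of` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    system_time: datetime  # when the record was added to the dataset
    payload: dict


class AppendOnlyDataset:
    """Hypothetical dataset that only ever grows; history is never rewritten."""

    def __init__(self):
        self._records = []

    def append(self, system_time, payload):
        # Refuse to insert records "into the past", so old views stay stable.
        if self._records and system_time < self._records[-1].system_time:
            raise ValueError("system_time must not go backwards")
        self._records.append(Record(system_time, dict(payload)))

    def as_of(self, moment):
        # Everything the publisher had at `moment`, in insertion order.
        return [r.payload for r in self._records if r.system_time <= moment]
```

Because records are only appended, asking for the state as of last year returns the same answer no matter how much data has arrived since.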

Teams and data communities can then collaborate on cleaning, enriching, and aggregating data by building arbitrarily complex decentralized data pipelines. Following the "data as code" philosophy kamu doesn't let you touch data manually - instead, you transform it using Streaming SQL (we support multiple frameworks). This ensures that data supply chains are:

Autonomous - write a query once and run it forever, no more babysitting fragile batch workflows
Low latency - get accurate results immediately, as new data arrives
Transparent - see where every single piece of data came from, who transformed it, and how
Collaborative - collaborate on data just like on Open Source Software
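The "write once, run forever" property can be sketched in plain Python: a query is defined once as a piece of state plus an update rule, and each new batch only updates the result incrementally. This is a toy stand-in for a streaming SQL engine, not how kamu or its engines are implemented; `RunningTotals` and `ingest` are hypothetical names.

```python
from collections import defaultdict


class RunningTotals:
    """Toy equivalent of a continuously maintained
    `SELECT key, SUM(amount) ... GROUP BY key` query."""

    def __init__(self):
        self._totals = defaultdict(int)

    def ingest(self, batch):
        # Only the new rows are processed; earlier batches are never re-read.
        for key, amount in batch:
            self._totals[key] += amount
        # Return a copy so callers cannot mutate the query state.
        return dict(self._totals)
```

Each call to `ingest` returns the up-to-date result, so the same definition keeps producing answers as new data arrives.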

Data scientists, analysts, ML/AI researchers, and engineers can then:

Access fresh, clean, and trustworthy data in seconds
Easily keep datasets up-to-date
Safely reuse data created by the hard work of the community

The reuse is achieved by maintaining an unbreakable lineage and provenance trail in tamper-proof metadata, which lets you assess the trustworthiness of data, no matter how many hands and transformation steps it went through.
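One common way to make metadata tamper-evident is a hash chain, where every block commits to the hash of the block before it. The sketch below illustrates that general technique only; the block layout and the helper names (`block_hash`, `append_block`, `verify_chain`) are assumptions for illustration and do not describe the actual Open Data Fabric metadata format.

```python
import hashlib
import json


def block_hash(prev_hash, event):
    # Canonical JSON (sorted keys) so the same content always hashes the same.
    body = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, event):
    # Each new block records the hash of its predecessor (None for the first).
    prev = chain[-1]["hash"] if chain else None
    chain.append({"prev": prev, "event": event, "hash": block_hash(prev, event)})


def verify_chain(chain):
    # Recompute every hash; any edited event or re-linked block is detected.
    prev = None
    for block in chain:
        if block["prev"] != prev:
            return False
        if block["hash"] != block_hash(prev, block["event"]):
            return False
        prev = block["hash"]
    return True
```

Changing any earlier event changes its hash, which no longer matches what later blocks recorded, so a consumer who knows only the latest hash can check the whole history.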

In a larger context, kamu is a reference implementation of Open Data Fabric - a Web 3.0 protocol for providing timely, high-quality, and verifiable data for data science, smart contracts, web and applications.

Use Cases

In general, kamu is a great fit for cases where data is exchanged between several independent parties, and for (low to moderate frequency & volume) mission-critical data where a high degree of trustworthiness and protection from malicious actors is required.

Examples:

Open Data

To share data outside of your organization today you have limited options:

You can publish it on some open data portal, but lose ownership and control of your data
You can deploy and operate some open-source data portal (like CKAN or Dataverse), but you probably have neither time nor money to do so
You can self-host it as a CSV file on some simple HTTP/FTP server, but then you are making it extremely hard for others to discover and use your data

Let's acknowledge that for organizations that produce the most valuable data (governments, hospitals, NGOs), publishing data is not part of their business. They typically don't have the incentives, expertise, and resources to be good publishers.

This is why the goal of kamu is to make data publishing cheap and effortless:

It invisibly guides publishers towards best data management practices (preserving history, making data reproducible and verifiable)
Adds as little friction as exporting data to CSV
Lets you host your data on any storage (FTP, S3, GCS, etc.)
Lets you maintain full control and ownership of your data

As opposed to just the download counter you get on most data portals, kamu brings publishers closer to their communities, allowing them to see who uses their data and how. You no longer send data into "the ether", but create a closed feedback loop with your consumers.

Science & Research

One of the driving forces behind kamu's design was the ongoing reproducibility crisis in science, which we believe to a large extent is caused by our poor data management practices.

After incidents like the Surgisphere scandal, the sentiment in research is changing from assuming that all research is done in good faith to considering any research unreliable until proven otherwise.

Data portals like Dataverse, Dryad, Figshare, and Zenodo are helping reproducibility by archiving data, but this approach:

Results in hundreds of millions of poorly systematized datasets
Tends to produce research based on stale and long-outdated data
Creates lineage and provenance trail that is very manual and hard to trace (through published papers)

In kamu we believe that the majority of valuable data (weather, census, health records, financial core data) flows continuously, and most of the interesting insights lie around the latest data, so we designed it to bring reproducibility and verifiability to near real-time data.

When using kamu:

Your data projects are 100% reproducible using a built-in stable references mechanism
Your results can be reproduced and verified by others in minutes
All the data prep work (that often accounts for 80% of a data scientist's time) can be shared and reused by others
Your data projects will continue to function long after you've moved on, so the work done years ago can continue to produce valuable insights with minimal maintenance on your part
Continuously flowing datasets are much easier to systematize than the exponentially growing number of snapshots

Data-driven Journalism

Data-driven journalism is on the rise and has proven to be extremely effective. In a world of misinformation and extremely polarized opinions, data provides us with an anchoring point to discuss complex problems and analyze cause and effect. Data itself is non-partisan and has no secret agenda, and arguments around different interpretations of data are infinitely more productive than ones based on gut feelings.

Unfortunately, too often data has issues that undermine its trustworthiness. And even if the data is correct, it's very easy to pose a question about its sources that will take too long to answer - the data will be dismissed, and the gut feelings will step in.

This is why kamu's goal is to make data verifiably trustworthy and make answering provenance questions a matter of seconds. Only when data cannot be easily dismissed will we start to pay proper attention to it.

And once we agree that source data can be trusted, we can build analyses and real-time dashboards that keep track of complex issues like corruption, inequality, climate, epidemics, refugee crises, etc.

kamu prevents good research from going stale the moment it's published!

Business core data

kamu aims to be the most reliable data management solution that provides recent data while maintaining the highest degree of accountability and tamper-proof provenance, without you having to put all data in some central database.

We're developing it with financial and pharmaceutical use cases in mind, where audit and compliance could be fully automated through our system.

Note that we currently focus on mission-critical data and kamu is not well suited for IoT or other high-frequency and high-volume cases, but can be a good fit for insights produced from such data that influence your company's decisions and strategy.

Personal analytics

Being data geeks, we use kamu for data-driven decision-making even in our personal lives.

Actually, our largest data pipelines so far were created for personal finance:

collect and harmonize data from multiple bank accounts
convert currencies
analyze stock trading data
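As a rough illustration of the first two steps, the sketch below brings transactions from several accounts into one shape and converts them to a single base currency. The field names, the `harmonize` helper, and the exchange rates are made up for the example and are not taken from the actual pipelines.

```python
from decimal import Decimal


def harmonize(transactions, rates_to_base):
    """Convert each transaction to the base currency (hypothetical helper).

    `transactions` is a list of dicts with account, date, currency and amount
    (amount as a string); `rates_to_base` maps currency code -> Decimal rate.
    """
    harmonized = []
    for tx in transactions:
        rate = rates_to_base[tx["currency"]]
        amount = (Decimal(tx["amount"]) * rate).quantize(Decimal("0.01"))
        harmonized.append(
            {"account": tx["account"], "date": tx["date"], "amount_base": amount}
        )
    return harmonized
```

Using `Decimal` rather than floats avoids rounding surprises when the converted amounts are later summed.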

We also scrape a lot of websites to make smarter purchasing decisions. kamu lets us keep all this data up-to-date with minimal effort.

Features

kamu connects publishers and consumers of data through a decentralized network and lets people collaborate on extracting insight from data. It offers many perks for everyone who participates in this first-of-a-kind data supply chain:

For Data Publishers

Easily share your data with the world without moving it anywhere
Retain full ownership and control of your data
Close the feedback loop and see who uses your data and how
Provide real-time, verifiable and reproducible data that follows the best data management practices

For Data Scientists

Ingest any existing dataset from the web
Always stay up-to-date by pulling the latest updates from the data sources with just one command
Use stable data references to make your data projects fully reproducible
Collaborate on cleaning and improving data of existing datasets
Create derivative datasets by transforming, enriching, and summarizing data others have published
Write a query once and run it forever - our pipelines require nearly zero maintenance
Built-in support for GIS data
Share your results with others in a fully reproducible and reusable form

For Data Consumers

Download a dataset from a shared repository
Verify that all data comes from trusted sources using 100% accurate lineage
Audit the chain of transformations this data went through
Validate with a single command that downloaded data was not tampered with
Trust your data by knowing where every single bit of information came from with our fine-grained provenance

For Data Exploration

Explore data and run ad-hoc SQL queries (backed by the power of Apache Spark)
Launch a Jupyter notebook with one command
Join, filter, and shape your data using SQL
Visualize the result using your favorite library

Community

If you like what we're doing - support us by starring the repo, this helps us a lot!

Subscribe to our YouTube channel to get fresh tech talks and deep dives.

Stop by and say "hi" in our Discord Server - we're always happy to chat about data.

If you'd like to contribute, start here.

Comments

How to use ODS

Lots of open data sets use or are based on ODS (OpenDocument Spreadsheet), and organizations like the Dutch government are moving in that direction.

Example: Download: https://rijksfinancien.nl/bestanden/agentschappen_rijksfinancien_nl.ods

Is it possible to use this type of file?
enhancement usability

opened by JvD007 5
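While ODS is not confirmed here as a supported input format, one possible workaround is to convert the first sheet to CSV before ingesting it. The sketch below is an assumption-laden illustration, not a kamu feature: it relies only on the fact that an ODS file is a ZIP archive whose `content.xml` holds the sheet data, and `ods_first_sheet_rows` and `rows_to_csv` are hypothetical helper names. It ignores repeated rows, merged cells, and typed values, so real-world files may need a dedicated library.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def ods_first_sheet_rows(ods_bytes):
    """Return the non-empty rows of the first sheet as lists of strings."""
    with zipfile.ZipFile(io.BytesIO(ods_bytes)) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    sheet = root.find(f".//{{{TABLE_NS}}}table")
    if sheet is None:
        return []
    rows = []
    for row in sheet.iter(f"{{{TABLE_NS}}}table-row"):
        values = []
        for cell in row.findall(f"{{{TABLE_NS}}}table-cell"):
            paragraphs = cell.findall(f"{{{TEXT_NS}}}p")
            text = "\n".join("".join(p.itertext()) for p in paragraphs)
            # A single cell element can stand for several identical columns.
            repeat = int(cell.get(f"{{{TABLE_NS}}}number-columns-repeated", "1"))
            values.extend([text] * repeat)
        # Drop the trailing empty padding cells that spreadsheets usually add.
        while values and values[-1] == "":
            values.pop()
        if values:
            rows.append(values)
    return rows


def rows_to_csv(rows):
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

The resulting CSV text could then be saved to a file and referenced from a regular CSV-based dataset definition.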

`Thrift Server did not start: TimeoutError` error when launching SQL shell

And the error changed when I re-executed the same command:

(base) ➜  my-repo kamu sql
⠂ Starting Spark SQL shell
thread 'main' panicked at 'Thrift Server did not start: TimeoutError { duration: 60s, backtrace: <disabled> }', kamu-core/src/infra/explore/sql_shell_impl.rs:143:14
(base) ➜  my-repo kamu sql
⠚ Starting Spark SQL shell
thread 'main' panicked at 'Thrift server start script returned non-zero code: ExitStatusError(ExitStatusError(256))', kamu-core/src/infra/explore/sql_shell_impl.rs:130:18

Is there a way to restart the Thrift server and check the detailed log message?

bug need more info

opened by ihainan 3

Documentation and Installation question(s)
Some questions around installation and user documentation:

What do we need to install to get Kamu up and running without using Docker?

Is it possible to give examples of all the functions of kamu-cli? The help does not give the best answer about what you can do with it. Add and pull are clear, as are sql and notebook.

Some examples with Python, SparkR and maybe others

Create a dataset on S3 etc

TX, Jaco
documentation
opened by JvD007 3
[FeatureReq] : New engine - Apache Pulsar

Hey Folks,

The project looks awesome! I'd like to propose an app integration / new engine with Apache Pulsar. It's a streaming pub/sub platform with native support for local code for mutations/transforms of data. Each topic also supports registered AVRO schemas, with an understanding of data model revisions.

-J
enhancement need more info

opened by verbunk 3
Kamu EventTime column can not have nulls exception

I am facing an issue with the EventTime column, even though I have followed the date format as mentioned in the yaml file. Please help.

yaml file start

kind: DatasetSnapshot
version: 1
content:
  name: Hiding Name
  kind: root
  metadata:
    - kind: setPollingSource
      fetch:
        kind: url
        url: Hiding URL
      read:
        kind: csv
        separator: ","
        header: true
        nullValue: ""
      preprocess:
        kind: sql
        engine: spark
        query: >
          SELECT
            CAST(id as BIGINT) as id,
            CAST(UNIX_TIMESTAMP(date, "yyyy-MM-dd") as TIMESTAMP) as date,
            username as username,
            name as name,
            tweet as tweet,
            language as language,
            mentions as mentions,
            urls as urls,
            photos as photos,
            replies_count as replies_count,
            retweets_count as retweets_count,
            likes_count as likes,
            hashtags as hashtags,
            link as link,
            retweet as retweet,
            quote_url as quote_url,
            video as video,
            thumbnail as thumbnail,
            reply_to as reply_to
          FROM input
      merge:
        kind: ledger
        primaryKey:
          - id
    - kind: setVocab
      eventTimeColumn: date

yaml file end

Sample_data_to_upload.csv
need more info

opened by suresh852456 2
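A possible first check, offered as an assumption rather than a confirmed diagnosis: in Spark's default (non-ANSI) mode, `UNIX_TIMESTAMP` typically yields NULL for values that do not match the given pattern, which would then surface as nulls in the event time column. The sketch below scans CSV text for such rows before ingestion. `rows_with_bad_dates` is a hypothetical helper, and Python's `%Y-%m-%d` parsing is not identical to Spark's `yyyy-MM-dd`, so treat the result as a hint only.

```python
import csv
import io
from datetime import datetime


def rows_with_bad_dates(csv_text, column="date", fmt="%Y-%m-%d"):
    """Return (line_number, value) pairs whose date column fails to parse.

    Line numbers assume a header on line 1 and no multi-line fields.
    """
    bad = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):
        value = (row.get(column) or "").strip()
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            # Empty or differently formatted values would likely become NULL.
            bad.append((line_no, value))
    return bad
```

If this reports any rows, either the source data or the pattern in the `preprocess` query probably needs adjusting.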
[4/7] Failed to update root dataset

While testing kamu on a data set I got the following error. The kamu examples are working fine on my system.

$ kamu-cli pull hydro.input.3
[1/7] Checking for updates (hydro.input.3)
Downloading hydro.input.3: [00:00:00] [##################################################################################################################################] 25.74KB/25.74KB (119.21MB/s, 0s)
[4/7] Failed to update root dataset (hydro.input.3)
1 dataset(s) had errors

Summary of errors:

hydro.input.3: Ingest error: Engine error: Contract error: Engine did not write a response file, see log files for details:
/home/jaco/.kamu/run/spark-jeCbeTBOlV.out.txt
/home/jaco/.kamu/run/spark-jeCbeTBOlV.err.txt

The data set is located at http://localhost/cameraregister-utrecht-csv.csv. I have added the csv file and the yaml file in the zip file.

I can't find what is going wrong with the csv file or yaml

Any help is welcome

hydro-test.zip

opened by JvD007 2
Bump flatbuffers from 2.0.0 to 22.9.29
Bumps flatbuffers from 2.0.0 to 22.9.29.

Release notes

Sourced from flatbuffers's releases.

v22.9.29

Changelog

What's Changed

Moves swift package to root of repository so it can be used directly … by @mustiikhalil in google/flatbuffers#7548

Rust soundness fixes by @tustvold in google/flatbuffers#7518

[TS] Make strict compliant and improve typings by @bjornharrtell in google/flatbuffers#7549

FlatBuffers Version 22.9.29 by @dbaileychess in google/flatbuffers#7557

New Contributors

@tustvold made their first contribution in google/flatbuffers#7518

Full Changelog: https://github.com/google/flatbuffers/compare/v22.9.24...v22.9.29

v22.9.24

Change Log

What's Changed

Disable Android Build by @dbaileychess in google/flatbuffers#7494

update android multidex setting by @dbaileychess in google/flatbuffers#7495

Updates cocoapods version by @mustiikhalil in google/flatbuffers#7497

[ISSUE-6268] returns NaN insteadof nan by @3axap4eHko in google/flatbuffers#7498

[C#] Prepares for official Nuget release by @dbaileychess in google/flatbuffers#7496

[CMake]: fix breaking find_package change (#7499) by @clanghans in google/flatbuffers#7502

Fixes issue with cocoapods failing to be published because of docc by @mustiikhalil in google/flatbuffers#7505

[Android] Remove maven dependency of flatbuffers and use source folder by @paulovap in google/flatbuffers#7503

[Java][FlexBuffers] throwing exception for untyped fixed vectors by @paulovap in google/flatbuffers#7507

Moves all of the swift test code into tests/swift by @mustiikhalil in google/flatbuffers#7509

Install BuildFlatBuffers.cmake by @dbaileychess in google/flatbuffers#7519

[Java][Flexbuffers] Add API to add nullables into the buffer. by @paulovap in google/flatbuffers#7521

remove travis config by @dbaileychess in google/flatbuffers#7522

prevent force_align attribute on enums by @dbaileychess in google/flatbuffers#7523

enabled cpp17 tests in CI by @dbaileychess in google/flatbuffers#7524

Replace bash JavaTest.sh with mvn test by @nick-someone in google/flatbuffers#7500

Bump junit from 4.13 to 4.13.1 in /java by @dependabot in google/flatbuffers#7526

[TS/JS] Move TS tests to dedicated folder and deps upgrade by @bjornharrtell in google/flatbuffers#7508

UnPackTo disable merge by default by @dbaileychess in google/flatbuffers#7527

Fix conform by @hs3366677 in google/flatbuffers#7532

[C++] Rare bad buffer content alignment if sizeof(T) != alignof(T) by @Naios in google/flatbuffers#7520

Upgrade grpc to 1.49.0 and make sure it builds by @meteorcloudy in google/flatbuffers#7538

[Python] Python fixed size array by @joshua-smith8 in google/flatbuffers#7529

Emit internal enums when swift_implementation_only by @pauley-unsaturated in google/flatbuffers#7545

FlatBuffers Version 22.9.24 by @dbaileychess in google/flatbuffers#7547

New Contributors

@3axap4eHko made their first contribution in google/flatbuffers#7498

@nick-someone made their first contribution in google/flatbuffers#7500

@hs3366677 made their first contribution in google/flatbuffers#7532

@Naios made their first contribution in google/flatbuffers#7520

@meteorcloudy made their first contribution in google/flatbuffers#7538

... (truncated)

Changelog

Sourced from flatbuffers's changelog.

22.9.29 (Sept 29 2022)

Rust soundness fixes to avoid the crate from being labelled unsafe (#7518).

22.9.24 (Sept 24 2022)

20 Major releases in a row? Nope, we switched to a new versioning scheme that is based on date.

Python supports fixed size arrays now (#7529).

Behavior change in how C++ object API uses UnPackTo. The original intent of this was to reduce allocations by reusing an existing object to pack data into. At some point, this logic started to merge the states of the two objects instead of clearing the state of the packee. This change goes back to the original intention, the packed object is cleared when getting data packed into it (#7527).

Fixed a bug in C++ alignment that was using sizeof() instead of the intended AlignOf() for structs (#7520).

C# has an official Nuget package now (#7496).

2.0.8 (Aug 29 2022)

Fix for --keep-prefix that was generating the wrong include statements for C++ (#7469). The bug was introduced in 2.0.7.

Added the Verifier::Options option struct to allow specifying runtime configuration settings for the verifier (#7489). This allows skipping verification of nested flatbuffers, an on-by-default change that was introduced in 2.0.7. This deprecates the existing Verifier constructor, which may be removed in a future version.

Refactor of tests/test.cpp that led to ~10% speedup in compilation of the entire project (#7487).

2.0.7 (Aug 22 2022)

This is the first version with an explicit change log, so all the previous features will not be listed.

Verifier now checks that buffers are at least the minimum size required to be a flatbuffers (12 bytes). This includes nested flatbuffers, which previously could be declared valid at size 0.

Annotated binaries. Given a flatbuffer binary and a schema (or binary schema)

... (truncated)

Commits

c92e78a FlatBuffers Version 22.9.29 (#7557)

d243b90 [TS] Make strict compliant and improve typings (#7549)

374f8fb Rust soundness fixes (#7518)

dadbff5 Moves swift package to root of repository so it can be used directly … (#7548)

76ddae0 FlatBuffers Version 22.9.24 (#7547)

cfe157e Emit internal enums when swift_implementation_only (#7545)

4131158 [Python] Python fixed size array (#7529)

8804619 Upgrade grpc to 1.49.0 and make sure it builds (#7538)

72aa85a [C++] Rare bad buffer content alignment if sizeof(T) != alignof(T) (#7520)

bfceebb Fix conform (#7532)

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 1
Bump axum-core from 0.2.7 to 0.2.8
Bumps axum-core from 0.2.7 to 0.2.8.

Release notes

Sourced from axum-core's releases.

axum-core - v0.2.8

Security

breaking: Added default limit to how much data Bytes::from_request will consume. Previously it would attempt to consume the entire request body without checking its length. This meant if a malicious peer sent a large (or infinite) request body your server might run out of memory and crash.

The default limit is at 2 MB and can be disabled by adding the new DefaultBodyLimit::disable() middleware. See its documentation for more details.

This also applies to String which used Bytes::from_request internally.

(#1346)

#1346: tokio-rs/axum#1346

Commits

fcc1c9d axum-core: Version 0.2.8

95e21c1 Limit size of request bodies in Bytes extractor (#1362)

3990c3a axum-extra: Version 0.3.7 (#1236)

849abb1 axum: Version 0.5.15 (#1235)

3215709 Fix accidental breaking change to FailedToDeserializeQueryString (#1233)

e6a75a2 axum: Version 0.5.14 (#1193)

2d5ac3e Update changelog

5e2c782 Improve build times by generating less IR (#1192)

592dc5f Add ws example showing how to pass data to callback (#1185)

ab1ccf8 Serialize Json<T> to Bytes instead of Vec<u8> in IntoResponse (#1178)

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 1
Bump lz4-sys from 1.9.3 to 1.9.4
Bumps lz4-sys from 1.9.3 to 1.9.4.

Changelog

Sourced from lz4-sys's changelog.

1.24.0:

Update to lz4 1.9.4 (lz4-sys 1.9.4) - this fixes CVE-2021-3520, which was a security vulnerability in the core lz4 library

export the include directory of lz4 from build.rs

1.23.3 (March 5, 2022):

Update lz4 to 1.9.3

Add [de]compress_to_buffer to block API to allow reusing buffers (#16)

Windows static lib support

Support favor_dec_speed

Misc small fixes

1.23.2:

Update lz4 to 1.9.2

Remove dependency on skeptic (replace with build-dependency docmatic for README testing)

Move to Rust 2018 edition

1.23.0:

Update lz4 to v1.8.2

Add lz4 block mode api

1.22.0:

Update lz4 to v1.8.0

Remove lz4 redundant dependency to gcc #22 (thanks to Xidorn Quan)

1.21.1:

Fix always rebuild issue #21

1.21.0:

Fix smallest 11-byte stream decoding (thanks to Niklas Hambüchen)

Update lz4 to v1.7.5

1.20.0:

Split out separate sys package #16 (thanks to Thijs Cadier)

1.19.173:

Update lz4 to v1.7.3

1.19.131:

Update dependencies to work correctly when the build environment is changed via rustup override

1.18.131:

Implemented Send for Encoder/Decoder #15 (thanks to Maxime Lenoir)

... (truncated)

Commits

See full diff in compare view

dependencies
opened by dependabot[bot] 1

Fix panic on pulling non-existing dataset

$ kamu list
┌─────────────────────────────────┬──────┬────────┬─────────┬──────┐
│              Name               │ Kind │ Pulled │ Records │ Size │
├─────────────────────────────────┼──────┼────────┼─────────┼──────┤
│ com.cryptocompare.ohlcv.eth-usd │ Root │   -    │       - │    - │
└─────────────────────────────────┴──────┴────────┴─────────┴──────┘

$ kamu pull zzz
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', kamu-core/src/infra/pull_service_impl.rs:165:45
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

opened by sergiimk 1

Bump async-graphql from 4.0.5 to 4.0.6
Bumps async-graphql from 4.0.5 to 4.0.6.

Changelog

Sourced from async-graphql's changelog.

[4.0.6] 2022-07-21

Limit recursive depth to 256 by default

Commits

See full diff in compare view

dependencies
opened by dependabot[bot] 1
Bump tokio from 1.23.0 to 1.23.1
Bumps tokio from 1.23.0 to 1.23.1.

Release notes

Sourced from tokio's releases.

Tokio v1.23.1

This release forward ports changes from 1.18.4.

Fixed

net: fix Windows named pipe server builder to maintain option when toggling pipe mode (#5336).

#5336: tokio-rs/tokio#5336

Commits

1a997ff chore: prepare Tokio v1.23.1 release

a8fe333 Merge branch 'tokio-1.20.x' into tokio-1.23.x

ba81945 chore: prepare Tokio 1.20.3 release

763bdc9 ci: run WASI tasks using latest Rust

9f98535 Merge remote-tracking branch 'origin/tokio-1.18.x' into fix-named-pipes-1.20

9241c3e chore: prepare Tokio v1.18.4 release

699573d net: fix named pipes server configuration builder

See full diff in compare view

dependencies rust
opened by dependabot[bot] 0
Latest `kubo` breaks key-based IPNS sync

In kubo v0.15.0 and older, IPNS lookups via the HTTP gateway return an invalid status code (400), breaking the sync flow.

Upstream issue: https://github.com/ipfs/kubo/issues/9514
3rd party issue

opened by sergiimk 0
Bump certifi from 2022.9.24 to 2022.12.7 in /images/jupyter
Bumps certifi from 2022.9.24 to 2022.12.7.

Commits

9e9e840 2022.12.07

See full diff in compare view


dependencies python
opened by dependabot[bot] 0
Bump certifi from 2022.9.24 to 2022.12.7 in /images/demo/kamu
Bumps certifi from 2022.9.24 to 2022.12.7.

Commits

9e9e840 2022.12.07

See full diff in compare view


dependencies python
opened by dependabot[bot] 0

SELinux support

A user reported that kamu fails to pull a root dataset when installed on a fresh Fedora host:

[4/7] Failed to update root dataset (ca.bankofcanada.exchange-rates.daily)

Summary of errors:
ca.bankofcanada.exchange-rates.daily: Ingest error: Engine error: Process error: Process exited with code 1, see log files for details:
- .kamu/run/spark-DNSwZEEJZl.err.txt

Error: Partial failure

Spark logs:

Exception in thread "main" java.nio.file.AccessDeniedException: /opt/engine/in-out/request.yaml
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
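
The report does not state a root cause. As a first diagnostic, and assuming SELinux is involved as the issue title suggests, one could check whether the host runs in enforcing mode. This is a minimal sketch: the helper name `selinux_mode` is hypothetical and not part of kamu; it only reads the standard selinuxfs status file `/sys/fs/selinux/enforce`.

```python
from pathlib import Path


def selinux_mode(enforce_file="/sys/fs/selinux/enforce"):
    """Report the SELinux mode of the host (hypothetical helper, not part of kamu).

    The selinuxfs status file contains "1" when SELinux is enforcing and "0"
    when it is permissive. If the file is missing, SELinux is treated as disabled.
    """
    path = Path(enforce_file)
    if not path.exists():
        return "disabled"
    return "enforcing" if path.read_text().strip() == "1" else "permissive"
```

Calling `selinux_mode()` on the affected host shows whether enforcement is active; it does not by itself confirm that SELinux caused the `AccessDeniedException`.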

help wanted 3rd party issue

opened by sergiimk 2

CKAN examples

Are there any examples that use the CKAN API?

For example, I would like to use this open dataset:

https://ckan.dataplatform.nl/dataset/speeltoestellen/resource/a8e4bb02-f072-424a-8663-5a5f0fe7c7c0

API call: https://ckan.dataplatform.nl/api/3/action/datastore_search?resource_id=a8e4bb02-f072-424a-8663-5a5f0fe7c7c0&limit=5

This platform hosts 1900 open data sources, so it would be nice to have one example on GitHub.

Let me know.
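
The API call quoted above can be sketched in Python as follows. This is not a kamu example and does not show how kamu would ingest the data; it only illustrates the request itself. The function names are hypothetical, and the response layout (`success`, `result.records`) follows CKAN's `datastore_search` action.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Values copied from the API call in the issue.
CKAN_BASE = "https://ckan.dataplatform.nl"
RESOURCE_ID = "a8e4bb02-f072-424a-8663-5a5f0fe7c7c0"


def datastore_search_url(base_url, resource_id, limit=5):
    """Build a datastore_search URL for a single CKAN resource."""
    query = urlencode({"resource_id": resource_id, "limit": limit})
    return f"{base_url}/api/3/action/datastore_search?{query}"


def extract_records(payload):
    """Return the rows from an already parsed datastore_search response."""
    if not payload.get("success"):
        raise RuntimeError(f"CKAN request failed: {payload.get('error')}")
    return payload["result"]["records"]


def fetch_records(base_url, resource_id, limit=5):
    """Fetch up to `limit` rows over HTTP (needs network access)."""
    with urlopen(datastore_search_url(base_url, resource_id, limit)) as response:
        return extract_records(json.load(response))
```

Calling `fetch_records(CKAN_BASE, RESOURCE_ID)` would return the first five rows as a list of dictionaries, provided the endpoint is reachable.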
need more info

opened by JvD007 1

Releases (v0.104.0)

v0.104.0 (Dec 28, 2022)

v0.103.0 (Dec 25, 2022)

v0.102.2 (Dec 8, 2022)

v0.102.0 (Nov 18, 2022)

v0.101.0 (Nov 18, 2022)

v0.100.2 (Nov 17, 2022)

v0.100.1 (Nov 17, 2022)

v0.100.0 (Nov 14, 2022)

v0.99.0 (Oct 2, 2022)

v0.98.0 (Sep 7, 2022)

v0.97.1 (Aug 20, 2022)

v0.97.0 (Aug 5, 2022)

v0.96.0 (Jul 23, 2022)

v0.95.0 (Jul 15, 2022)

v0.94.0 (Jun 22, 2022)

v0.93.1 (Jun 17, 2022)

v0.93.0 (Jun 16, 2022)

v0.92.0 (Jun 8, 2022)

v0.91.0 (Jun 6, 2022)

v0.90.0 (Jun 4, 2022)

v0.89.0 (May 23, 2022)

v0.88.0 (May 20, 2022)

v0.87.0 (May 18, 2022)

v0.86.0 (May 17, 2022)

v0.85.1 (Apr 10, 2022)

v0.85.0 (Apr 10, 2022)

v0.84.1 (Apr 8, 2022)

v0.84.0 (Apr 8, 2022)

v0.83.0 (Mar 31, 2022)

v0.82.0 (Mar 28, 2022)

Owner

kamu

An example repository on how to start building graph applications on streaming data. Just clone and start building 💻 💪

A highly efficient daemon for streaming data from Kafka into Delta Lake

TensorBase is a new big data warehouse built with modern efforts.

A new arguably faster implementation of Apache Spark from scratch in Rust

This library provides a data view for reading and writing data in a byte array.

A Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture, built to make the Data Cloud easy

Perhaps the fastest and most memory efficient way to pull data from PostgreSQL into pandas and numpy. 🚀

A fast, powerful, flexible and easy to use open source data analysis and manipulation tool written in Rust

ConnectorX - Fastest library to load data from DB to DataFrames in Rust and Python

Provides a way to use enums to describe and execute ordered data pipelines. 🦀🐾

AppFlowy is an open-source alternative to Notion. You are in charge of your data and customizations

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing

Read specialized NGS formats as data frames in R, Python, and more.

High-performance runtime for data analytics applications

Rayon: A data parallelism library for Rust

Quickwit is a big data search engine.

DataFrame / Series data processing in Rust

A Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture, written in Rust