An example repository on how to start building graph applications on streaming data. Just clone and start building 💻 💪

Overview

Example Streaming App 🚀 🚀

This repository serves as a point of reference when developing a streaming application with Memgraph and a message broker such as Kafka.

Example Streaming App

KafkaProducer represents the source of your data. That can be transactions, queries, metadata or something different entirely. In this minimal example we propose using a special string format that is easy to parse. The data is sent from the KafkaProducer to Kafka under a topic aptly named topic. The Backend implements a KafkaConsumer. It takes data from Kafka, consumes it, but also queries Memgraph for graph analysis, feature extraction or storage.

Installation

Install Kafka and Memgraph using the instructions in the homonymous directories. Then choose a programming language from the list of supported languages and follow the instructions given there.

List of supported programming languages

How does it work exactly

KafkaProducer

The KafkaProducer in ./kafka/producer creates nodes with a label Person that are connected with edges of type CONNECTED_WITH. In this repository we provide a static producer that reads entries from a file and a stream producer that produces entries every X seconds.

Backend

The backend takes a message at a time from kafka, parses it with a csv parser as a line, converts it into a openCypher query and sends it to Memgraph. After storing a node in Memgraph the backend asks Memgraph how many adjacent nodes does it have and prints it to the terminal.

Memgraph

You can think of Memgraph as two separate components: a storage engine and an algorithm execution engine. First we create a trigger: an algorithm that will be run every time a node is inserted. This algorithm calculates and updates the number of neighbors of each affected node after every query is executed.

Comments
  • Updating neighbors doesn't work

    Updating neighbors doesn't work

    Bug Description When running kafka/producer/stream_producer.py, the neighbors property on each node always stays 0. Sometimes the node doesn't even contain the property.

    To Reproduce Just run property initialized Memgraph + any backend + stream_producer.py.

    Expected behavior As new nodes/edges come in, the neighbors should be incrementally updated.

    Desktop: OS: Ubuntu 20.04

    opened by gitbuda 2
  • Using the appropriate Memgraph Docker image

    Using the appropriate Memgraph Docker image

    If memgraph/memgraph:latest already exists on the machine, memgraph/run.sh memgraph won't actually use the latest Memgraph image.

    Probably the best would be to hardcode Memgraph version where everything is working as expected, e.g. v1.6.0.

    opened by gitbuda 2
  • Bump tmpl from 1.0.4 to 1.0.5 in /backend/node

    Bump tmpl from 1.0.4 to 1.0.5 in /backend/node

    Bumps tmpl from 1.0.4 to 1.0.5.

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Port to Memgraph 2.0 features

    Port to Memgraph 2.0 features

    • [x] Create kafka.transform module
    • [x] Fix or document problem with neighbors
    • [x] Polish Python backend
    • [x] Remove redundant code from all backends
    • [x] Update README page
    opened by gitbuda 0
  • fix memgraph version

    fix memgraph version

    closes #21

    I added the --pull always flag to the docker run command. It seems weird that docker verifies the hash multiple times before running:

     memgraph fix-memgraph-version ✗ bash run.sh memgraph
    19a3f4efbe827fb1bdb5d67241f424d5dec1820627b265b25e5eeda13e65c213
    Starting memgraph...
    latest: Pulling from memgraph/memgraph
    Digest: sha256:9a45e6ad18f28edcc1e8317eb225041a3c017b7841b43d994d90155c97efd905
    Status: Image is up to date for memgraph/memgraph:latest
    latest: Pulling from memgraph/memgraph
    Digest: sha256:9a45e6ad18f28edcc1e8317eb225041a3c017b7841b43d994d90155c97efd905
    Status: Image is up to date for memgraph/memgraph:latest
    latest: Pulling from memgraph/memgraph
    Digest: sha256:9a45e6ad18f28edcc1e8317eb225041a3c017b7841b43d994d90155c97efd905
    Status: Image is up to date for memgraph/memgraph:latest
    latest: Pulling from memgraph/memgraph
    Digest: sha256:9a45e6ad18f28edcc1e8317eb225041a3c017b7841b43d994d90155c97efd905
    Status: Image is up to date for memgraph/memgraph:latest
    
    opened by MasterMedo 0
  • fix errors

    fix errors

    • [x] fix stream producer
    • [x] print number of neighbours in all backends
    • [x] add a script that runs memgraph queries (why are there so many queries?)
    • ~~[ ] add documentation that explains all memgraph queries~~
    • [x] add create trigger on nodes that sets neighbors to 0
    • [x] fix update trigger not to do bfs (only immediate neighbours are needed)
    • [x] update drawing in readme (bigger arrows)
    opened by MasterMedo 0
  • init java backend

    init java backend

    • ~~[ ] add code docs~~
    • [x] decouple parts of code into functions/classes
    • [x] simplify pom.xml
    • ~~[ ] remove test class or add tests~~
    • ~~[ ] store kafka message split parts to a Map instead of using String.format()~~
    • [x] improve print messages
    • [x] add kafka consumer to try arguments along with the session if possible
    • ~~[ ] extract magic strings and numbers to global variables~~
    • ~~[ ] keep accepting messages even if error occurs after parsing a faulty message~~
    • ~~[ ] use csv reader instead of splitting~~
    • ~~[ ] print number of neighbours~~
    • ~~[ ] use gradle~~
    opened by MasterMedo 0
  • Add Node.js example

    Add Node.js example

    • [x] Init Express and Jest
    • [x] Init linter
    • [x] Fetch data from Kafka
    • [x] Store data to Memgraph
    • [x] Calculate something by using BFS
    • [x] Log to console and/or return data to the user
    • [x] Report bugs in Memgraph
      • Trigger recovery + containing query module call.
      • Trigger --> comment syntax.
      • Property map feature, e.g., (n:Node $map) (Memgraph does not support that yet).
    • [x] Improve count correctness (transitive is not working yet)
    • [x] Improve README installation / setup details
    opened by gitbuda 0
  • Add deployment example with Ansible

    Add deployment example with Ansible

    Combine Memgraph, Kafka, Docker, Docker Compose, and Ansible to showcase simple deployment for each backend. Similar, just more polished, to https://github.com/memgraph/orbicon.

    opened by gitbuda 0
Owner
Memgraph
Build modern, graph-based applications on top of your streaming data in minutes.
Memgraph
A highly efficient daemon for streaming data from Kafka into Delta Lake

A highly efficient daemon for streaming data from Kafka into Delta Lake

Delta Lake 172 Dec 23, 2022
High-performance runtime for data analytics applications

Weld Documentation Weld is a language and runtime for improving the performance of data-intensive applications. It optimizes across libraries and func

Weld 2.9k Dec 28, 2022
📊 Cube.js — Open-Source Analytics API for Building Data Apps

?? Cube.js — Open-Source Analytics API for Building Data Apps

Cube.js 14.4k Jan 8, 2023
This library provides a data view for reading and writing data in a byte array.

Docs This library provides a data view for reading and writing data in a byte array. This library requires feature(generic_const_exprs) to be enabled.

null 2 Nov 2, 2022
This is the repository for the group project assignment in the course "Project in Introduction to Computer Science" (DD1396), by the Inda21plusplus group.

Project-Delta This is the repository for the group project assignment in the course "Project in Introduction to Computer Science" (DD1396), by the Ind

null 9 May 24, 2022
A Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture, built to make the Data Cloud easy

A Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture, built to make the Data Cloud easy

Datafuse Labs 5k Jan 9, 2023
Xaynet represents an agnostic Federated Machine Learning framework to build privacy-preserving AI applications.

xaynet Xaynet: Train on the Edge with Federated Learning Want a framework that supports federated learning on the edge, in desktop browsers, integrate

XayNet 196 Dec 22, 2022
Bytewax is an open source Python framework for building highly scalable dataflows.

Bytewax Bytewax is an open source Python framework for building highly scalable dataflows. Bytewax uses PyO3 to provide Python bindings to the Timely

Bytewax 289 Jan 6, 2023
A rust library built to support building time-series based projection models

TimeSeries TimeSeries is a framework for building analytical models in Rust that have a time dimension. Inspiration The inspiration for writing this i

James MacAdie 12 Dec 7, 2022
Perhaps the fastest and most memory efficient way to pull data from PostgreSQL into pandas and numpy. 🚀

flaco Perhaps the fastest and most memory efficient way to pull data from PostgreSQL into pandas and numpy. ?? Have a gander at the initial benchmarks

Miles Granger 14 Oct 31, 2022
A fast, powerful, flexible and easy to use open source data analysis and manipulation tool written in Rust

fisher-rs fisher-rs is a Rust library that brings powerful data manipulation and analysis capabilities to Rust developers, inspired by the popular pan

Syed Vilayat Ali Rizvi 5 Aug 31, 2023
A fast, powerful, flexible and easy to use open source data analysis and manipulation tool written in Rust

fisher-rs fisher-rs is a Rust library that brings powerful data manipulation and analysis capabilities to Rust developers, inspired by the popular pan

null 5 Sep 6, 2023
ConnectorX - Fastest library to load data from DB to DataFrames in Rust and Python

ConnectorX enables you to load data from databases into Python in the fastest and most memory efficient way.

SFU Database Group 939 Jan 5, 2023
Provides a way to use enums to describe and execute ordered data pipelines. 🦀🐾

enum_pipline Provides a way to use enums to describe and execute ordered data pipelines. ?? ?? I needed a succinct way to describe 2d pixel map operat

Ben Greenier 0 Oct 29, 2021
AppFlowy is an open-source alternative to Notion. You are in charge of your data and customizations

AppFlowy is an open-source alternative to Notion. You are in charge of your data and customizations. Built with Flutter and Rust.

null 30.7k Jan 7, 2023
Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing

Apache Arrow Powering In-Memory Analytics Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enabl

The Apache Software Foundation 10.9k Jan 6, 2023
Read specialized NGS formats as data frames in R, Python, and more.

oxbow Read specialized bioinformatic file formats as data frames in R, Python, and more. File formats create a lot of friction for computational biolo

null 12 Jun 7, 2023
A high-performance, high-reliability observability data pipeline.

Quickstart • Docs • Guides • Integrations • Chat • Download What is Vector? Vector is a high-performance, end-to-end (agent & aggregator) observabilit

Timber 12.1k Jan 2, 2023
Rayon: A data parallelism library for Rust

Rayon Rayon is a data-parallelism library for Rust. It is extremely lightweight and makes it easy to convert a sequential computation into a parallel

null 7.8k Jan 8, 2023