Query textual streams with PromQL-like language

Ivan Velichko

Last update: Dec 23, 2022

Overview

pq - query textual streams with PromQL

Glossary

Time Series - a stream of timestamped values (aka samples) sharing the same metric name and, optionally, the same set of labels (i.e. a unique combination of key-value pairs).
Metric name - a human-readable name of a measurement, e.g. http_requests_total, content_length, etc.
Metric type - one of counter, gauge, histogram, or summary.
Label - a dimension of the measurement, e.g. method, url, etc.
Sample (aka data point) - a (value, timestamp) tuple. The value is always a float64 and the timestamp always has millisecond precision.
Instant vector - a type of expression evaluation result - a set of time series (vector) containing a single sample for each time series, all sharing the same timestamp.
Range vector - a type of expression evaluation result - a set of time series containing a range of data points over time for each time series.
Scalar and string - the two other types of expression evaluation results.
Vector selector - an expression of the form <metric_name>[{label1=value1[, label2=value2, ...]}][[time_duration]].
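
The glossary terms can be pictured as plain Rust types. This is a minimal sketch for illustration only, not pq's actual data model: every type and function name here (Sample, InstantVector, RangeVector, labels, latest_at) is an assumption, and latest_at is a deliberately simplified way to turn a range of samples into an instant vector.

```rust
use std::collections::BTreeMap;

/// Label set: key-value pairs identifying a time series. The metric name is
/// stored under the reserved `__name__` key (hypothetical layout).
type Labels = BTreeMap<String, String>;

/// A sample (data point): a float64 value plus a millisecond timestamp.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Sample {
    value: f64,
    timestamp_ms: i64,
}

/// Instant vector: one value per series, all sharing a single timestamp.
#[derive(Debug, Clone, PartialEq)]
struct InstantVector {
    instant_ms: i64,
    samples: Vec<(Labels, f64)>,
}

/// Range vector: a window of samples for each series.
#[derive(Debug, Clone, PartialEq)]
struct RangeVector {
    series: Vec<(Labels, Vec<Sample>)>,
}

/// Builds a label set for `metric` with the given extra labels.
fn labels(metric: &str, extra: &[(&str, &str)]) -> Labels {
    let mut set = Labels::new();
    set.insert("__name__".to_string(), metric.to_string());
    for (key, value) in extra {
        set.insert(key.to_string(), value.to_string());
    }
    set
}

/// Collapses a range vector into an instant vector by taking, for each
/// series, the latest sample that is not newer than `instant_ms`.
/// Series without such a sample are omitted. (Illustrative rule only.)
fn latest_at(range: &RangeVector, instant_ms: i64) -> InstantVector {
    let samples = range
        .series
        .iter()
        .filter_map(|(labels, points)| {
            points
                .iter()
                .filter(|s| s.timestamp_ms <= instant_ms)
                .max_by_key(|s| s.timestamp_ms)
                .map(|s| (labels.clone(), s.value))
        })
        .collect();
    InstantVector { instant_ms, samples }
}
```

With these types, a series is identified by its Labels value, a range vector keeps several Sample values per series, and an instant vector keeps exactly one value per series at a shared instant.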

Run

$ cargo test

$ cargo run -- -d '([^\s]+)\s(\w+)\s(\d+)' -t '0:%Y-%m-%dT%H:%M:%S' -l 1:name -m 2:age -- '-age{name=~"(bob|sarah)", name!~"b.*"}' <<EOF
2021-01-01T05:40:41 bob 42
2021-01-01T23:59:58 sarah 25
2021-01-02T00:00:02 bob 42
2021-01-02T00:00:03 sarah 26
EOF

# Expected output:
InstantVector(InstantVector { instant: 1609545598000, samples: [({"name": "sarah", "__name__": "age"}, -25.0)] })
InstantVector(InstantVector { instant: 1609545599000, samples: [] })
InstantVector(InstantVector { instant: 1609545600000, samples: [] })
InstantVector(InstantVector { instant: 1609545601000, samples: [] })
InstantVector(InstantVector { instant: 1609545602000, samples: [] })
InstantVector(InstantVector { instant: 1609545603000, samples: [({"name": "sarah", "__name__": "age"}, -26.0)] })
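
To make the example easier to follow, here is a rough std-only sketch of what the command asks for: split each line into a timestamp, a name label and an age metric, keep only the series matching both label matchers, and negate the value. It is not pq's implementation. Whitespace splitting stands in for the regex decoder, the matchers are hard-coded, timestamps are assumed to be UTC, and all names (Record, decode_line, query) are hypothetical. Unlike pq, it does not emit empty instant vectors for the seconds in between.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date
/// (Howard Hinnant's `days_from_civil` algorithm).
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, [0, 399]
    let mp = (month + 9) % 12; // month index with March = 0
    let doy = (153 * mp + 2) / 5 + day - 1; // day of year, [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Parses `%Y-%m-%dT%H:%M:%S` (assumed UTC) into epoch milliseconds.
/// Field ranges are not validated in this sketch.
fn parse_timestamp_ms(s: &str) -> Option<i64> {
    let (date, time) = s.split_once('T')?;
    let mut d = date.splitn(3, '-').map(|p| p.parse::<i64>().ok());
    let (year, month, day) = (d.next()??, d.next()??, d.next()??);
    let mut t = time.splitn(3, ':').map(|p| p.parse::<i64>().ok());
    let (hour, minute, second) = (t.next()??, t.next()??, t.next()??);
    let secs = days_from_civil(year, month, day) * 86_400 + hour * 3_600 + minute * 60 + second;
    Some(secs * 1_000)
}

/// One decoded input line (hypothetical type).
#[derive(Debug, PartialEq)]
struct Record {
    timestamp_ms: i64,
    name: String,
    age: f64,
}

/// Stand-in for the regex decoder: field 0 is the timestamp,
/// field 1 the `name` label, field 2 the `age` metric.
fn decode_line(line: &str) -> Option<Record> {
    let mut fields = line.split_whitespace();
    let timestamp_ms = parse_timestamp_ms(fields.next()?)?;
    let name = fields.next()?.to_string();
    let age = fields.next()?.parse::<f64>().ok()?;
    Some(Record { timestamp_ms, name, age })
}

/// Hard-coded equivalent of `-age{name=~"(bob|sarah)", name!~"b.*"}`.
fn query(records: &[Record]) -> Vec<(i64, String, f64)> {
    records
        .iter()
        .filter(|r| r.name == "bob" || r.name == "sarah") // name=~"(bob|sarah)"
        .filter(|r| !r.name.starts_with('b')) // name!~"b.*"
        .map(|r| (r.timestamp_ms, r.name.clone(), -r.age)) // unary minus
        .collect()
}
```

Fed the four input lines above, query keeps only the two sarah samples, at 1609545598000 and 1609545603000, with values -25.0 and -26.0. The bob lines pass the first matcher but are rejected by the second.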

Comments

Fix clippy warnings & errors

A follow-up to the issue I opened (https://github.com/iximiuz/pq/issues/3). This PR is a step towards a working CI/CD: it only fixes clippy warnings and errors.

Edit: I fixed the warning below. @iximiuz, please take a careful look to make sure I preserved the logic there.

Note:

One error still remains. I'm currently not sure how to solve it and would appreciate any help!

error: this loop never actually loops
   --> src/query/binary.rs:273:24
    |
273 |           let (lv, rv) = loop {
    |  ________________________^
274 | |             let (lv, rv) = match (self.left.peek(), self.right.peek()) {
275 | |                 (Some(InstantVector(lv)), Some(InstantVector(rv))) => (lv, rv),
276 | |                 (None, _) | (_, None) => return None,
...   |
301 | |             )));
302 | |         };
    | |_________^
    |
    = note: `#[deny(clippy::never_loop)]` on by default
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#never_loop
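
For readers unfamiliar with this lint, here is a small self-contained illustration of the pattern. It is not the code from src/query/binary.rs (most of that loop body is elided above), and the function names are made up. The first function has the shape clippy complains about: every path through the loop body either returns or breaks, so the loop can run at most once. The second is the same logic without the loop.

```rust
/// Shape that trips `clippy::never_loop`: the body always ends in
/// `return` or `break`, so there is never a second iteration.
#[allow(clippy::never_loop)]
fn pair_with_loop(left: Option<i32>, right: Option<i32>) -> Option<(i32, i32)> {
    let pair = loop {
        let (l, r) = match (left, right) {
            (Some(l), Some(r)) => (l, r),
            (None, _) | (_, None) => return None,
        };
        break (l, r);
    };
    Some(pair)
}

/// Equivalent logic with the `loop` removed.
fn pair_without_loop(left: Option<i32>, right: Option<i32>) -> Option<(i32, i32)> {
    let (l, r) = match (left, right) {
        (Some(l), Some(r)) => (l, r),
        (None, _) | (_, None) => return None,
    };
    Some((l, r))
}
```

Whether the real loop can be simplified the same way depends on the elided lines: if any branch is meant to continue to the next iteration, the fix is to restore that continue rather than to remove the loop.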

opened by yonipeleg33 7

pq programs with arbitrary number of filters

At the moment, a valid pq program must always start with a decoder step, followed by 0 or 1 mapping (transform) steps, then 0 or 1 query steps, potentially followed by a formatter. While I don't see how multiple query steps could be added, a more generic format like decode | map | map | ... | map | query | format would make the pq language much more expressive. In fact, the very first decode step would become just syntactic sugar for map { .0: decode }.

The change would require some moderate refactoring, of course.
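
One possible shape for such a generic pipeline, sketched with made-up names (Entry, Step, run_pipeline, decode_whitespace, rename) and no relation to pq's current internals: every step is a function from an entry to an optional entry, and the raw input line starts out in field .0, so decoding is just another map step.

```rust
use std::collections::BTreeMap;

/// An entry flowing through the pipeline: field name -> raw string value.
/// Field ".0" holds the undecoded input line.
type Entry = BTreeMap<String, String>;

/// One pipeline step; returning `None` drops the entry.
type Step = Box<dyn Fn(Entry) -> Option<Entry>>;

/// Runs every input line through all steps, in order.
fn run_pipeline(lines: &[&str], steps: &[Step]) -> Vec<Entry> {
    lines
        .iter()
        .filter_map(|line| {
            let mut entry = Entry::new();
            entry.insert(".0".to_string(), line.to_string());
            steps.iter().try_fold(entry, |e, step| step(e))
        })
        .collect()
}

/// Hypothetical decode step: splits field ".0" on whitespace
/// into fields ".1", ".2", and so on.
fn decode_whitespace() -> Step {
    Box::new(|mut entry: Entry| -> Option<Entry> {
        let raw = entry.get(".0")?.clone();
        for (i, part) in raw.split_whitespace().enumerate() {
            entry.insert(format!(".{}", i + 1), part.to_string());
        }
        Some(entry)
    })
}

/// Hypothetical map step: renames a field, dropping entries that lack it.
fn rename(from: &'static str, to: &'static str) -> Step {
    Box::new(move |mut entry: Entry| -> Option<Entry> {
        let value = entry.remove(from)?;
        entry.insert(to.to_string(), value);
        Some(entry)
    })
}
```

A program like decode | map | map then becomes vec![decode_whitespace(), rename(".2", "name"), rename(".3", "age")] passed to run_pipeline, and adding more map steps needs no changes to the runner.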

opened by iximiuz 0

Add Grok format support

ELK's Grok is a powerful and quite concise format for parsing arbitrary text and structuring it. pq partially reinvents it by combining a regex decoder with a map filter. But native Grok support would help adoption, since there are likely people already familiar with Grok.
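
As a rough idea of what native support could involve, the sketch below expands Grok-style %{NAME:field} references into regex source with named capture groups, which an existing regex decoder could then consume. The pattern table is a tiny, simplified stand-in for the real Grok pattern library, and builtin_pattern and expand_grok are hypothetical names.

```rust
/// Tiny illustrative pattern table. Real Grok ships a much larger library,
/// and these regexes are simplified stand-ins for its definitions.
fn builtin_pattern(name: &str) -> Option<&'static str> {
    match name {
        "WORD" => Some(r"\w+"),
        "INT" => Some(r"[+-]?\d+"),
        "NOTSPACE" => Some(r"\S+"),
        _ => None,
    }
}

/// Expands `%{NAME:field}` into a named group and `%{NAME}` into a
/// non-capturing group; everything else is copied through unchanged.
fn expand_grok(pattern: &str) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = pattern;
    while let Some(start) = rest.find("%{") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after.find('}').ok_or("unterminated %{...} reference")?;
        let reference = &after[..end];
        let (name, field) = match reference.split_once(':') {
            Some((name, field)) => (name, Some(field)),
            None => (reference, None),
        };
        let regex = builtin_pattern(name).ok_or_else(|| format!("unknown pattern: {}", name))?;
        match field {
            Some(field) => out.push_str(&format!("(?P<{}>{})", field, regex)),
            None => out.push_str(&format!("(?:{})", regex)),
        }
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}
```

For example, expand_grok("%{NOTSPACE:ts} %{WORD:name} %{INT:age}") produces a regex with three named groups, ts, name and age, so field names could replace the positional indices used by the -l and -m options in the Run example.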

opened by iximiuz 1

Improve string literal parser

At the moment, the string literal parser is naive. For instance, it doesn't handle escape sequences properly. nom has an example of a more robust string literal parser that can be adapted to suit pq use cases. However, the licensing aspect needs to be researched first: can nom's example simply be copied, does it have to be copied with the license notice, or can it not be copied at all?
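
To show what handling escape sequences involves, here is a hand-written, std-only sketch that sidesteps the licensing question because it is not derived from nom's example. It supports only double-quoted literals and a small, assumed set of escapes (\n, \t, \r, \\ and \"); the full set pq should accept is a separate decision. The function name is hypothetical.

```rust
/// Parses a double-quoted string literal at the start of `input`.
/// Returns the unescaped contents and the remaining, unparsed input.
fn parse_string_literal(input: &str) -> Result<(String, &str), String> {
    let mut chars = input.char_indices();
    match chars.next() {
        Some((_, '"')) => {}
        _ => return Err("expected opening quote".to_string()),
    }
    let mut out = String::new();
    while let Some((idx, c)) = chars.next() {
        match c {
            // Unescaped quote: the literal ends here.
            '"' => return Ok((out, &input[idx + 1..])),
            // Backslash: translate the next character.
            '\\' => {
                let (_, escaped) = chars.next().ok_or("dangling backslash")?;
                out.push(match escaped {
                    'n' => '\n',
                    't' => '\t',
                    'r' => '\r',
                    '\\' => '\\',
                    '"' => '"',
                    other => return Err(format!("unknown escape: \\{}", other)),
                });
            }
            other => out.push(other),
        }
    }
    Err("unterminated string literal".to_string())
}
```

The key difference from a naive scan for the next quote is that an escaped quote no longer terminates the literal, and malformed input (no closing quote, unknown escape) is reported as an error instead of being silently accepted.
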
good first issue

opened by iximiuz 0

Implement missing functions

PromQL has a few dozen functions, while pq currently implements just a small subset of them.

A good first issue could be adding the missing functions. The implementation should probably start with extending the parser, then add the computation logic, and of course include some tests.
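
For a feel of the computation part, the sketch below applies a simple per-sample math function to an instant vector. It is a standalone illustration with assumed types and names (InstantVector, lookup_function, apply_function), not pq's evaluator. Removing the __name__ label from the result is also an assumption, made here because the output no longer represents the original metric.

```rust
use std::collections::BTreeMap;

type Labels = BTreeMap<String, String>;

/// Hypothetical instant vector: one value per label set at one instant.
#[derive(Debug, Clone, PartialEq)]
struct InstantVector {
    instant_ms: i64,
    samples: Vec<(Labels, f64)>,
}

/// Looks up a per-sample function by name (a tiny subset for illustration).
fn lookup_function(name: &str) -> Option<fn(f64) -> f64> {
    let f: fn(f64) -> f64 = match name {
        "abs" => f64::abs,
        "ceil" => f64::ceil,
        "floor" => f64::floor,
        "sqrt" => f64::sqrt,
        _ => return None,
    };
    Some(f)
}

/// Applies the named function to every sample. The `__name__` label is
/// removed from the result; all other labels are kept as-is.
fn apply_function(name: &str, input: &InstantVector) -> Option<InstantVector> {
    let f = lookup_function(name)?;
    let samples = input
        .samples
        .iter()
        .map(|(labels, value)| {
            let mut labels = labels.clone();
            labels.remove("__name__");
            (labels, f(*value))
        })
        .collect();
    Some(InstantVector { instant_ms: input.instant_ms, samples })
}
```

Tests for a function added this way can be plain unit tests: build a small instant vector, call the function by name, and compare the values and labels of the result.
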
good first issue

opened by iximiuz 0

Add Basic CI/CD

Hello there!

Currently, running cargo clippy yields:

... error: aborting due to 4 previous errors; 56 warnings emitted

No formatting is enforced on new PRs.

Currently, there's no way of using pq without building it locally afaik, which can be a huge drawback.

Suggestion:

Fix all clippy warnings / errors

Add a GitHub Action that runs cargo clippy, cargo fmt, make test-all and make test-e2e on every PR and push to master

Add a GitHub Action that publishes to crates.io so that one can install it using cargo install pq (and consider packaging it for apt and the like; I suggested crates.io because I think it's the easiest option)

opened by yonipeleg33 11

Releases (v0.1-pre-alpha.2)

v0.1-pre-alpha.2 (Jul 27, 2021)

This is the very first release of the pq binary. The tool is under active development, so it should be considered as a preview version.
Source code (tar.gz)
Source code (zip)
pq-linux-v0.1-pre-alpha.2.zip (1.76 MB)
pq-macos-v0.1-pre-alpha.2.zip (1.06 MB)
pq-windows-v0.1-pre-alpha.2.zip (897.96 KB)

Owner

Ivan Velichko

Code for your life!

Markdown-like text parser.

1 Dec 7, 2021

Natural language detection library for Rust. Try demo online: https://www.greyblake.com/whatlang/

Whatlang Natural language detection for Rust with focus on simplicity and performance. Content Features Get started Documentation Supported languages

805 Dec 28, 2022

👄 The most accurate natural language detection library in the Rust ecosystem, suitable for long and short text alike

Table of Contents What does this library do? Why does this library exist? Which languages are supported? How good is it? Why is it better than other l

569 Jan 3, 2023

A fast, low-resource Natural Language Processing and Text Correction library written in Rust.

nlprule A fast, low-resource Natural Language Processing and Error Correction library written in Rust. nlprule implements a rule- and lookup-based app

496 Jan 8, 2023

Natural Language Processing for Rust

rs-natural Natural language processing library written in Rust. Still very much a work in progress. Basically an experiment, but hey maybe something c

211 Dec 28, 2022

Rust-tokenizer offers high-performance tokenizers for modern language models, including WordPiece, Byte-Pair Encoding (BPE) and Unigram (SentencePiece) models

rust-tokenizers Rust-tokenizer offers high-performance tokenizers for modern language models, including WordPiece, Byte-Pair Encoding (BPE) and Unigra

165 Jan 1, 2023

Simple, extendable and embeddable scripting language.

duckscript duckscript SDK CLI Simple, extendable and embeddable scripting language. Overview Language Goals Installation Homebrew Binary Release Ducks

356 Dec 24, 2022

A HDPSG-inspired symbolic natural language parser written in Rust

Treebender A symbolic natural language parsing library for Rust, inspired by HDPSG. What is this? This is a library for parsing natural or constructed

32 Dec 26, 2022

Rust-nlp is a library to use Natural Language Processing algorithm with RUST

nlp Rust-nlp Implemented algorithm Distance Levenshtein (Explanation) Jaro / Jaro-Winkler (Explanation) Phonetics Soundex (Explanation) Metaphone (Exp

34 Dec 20, 2022

lingua-rs Python binding. An accurate natural language detection library, suitable for long and short text alike.

lingua-py lingua-rs Python binding. An accurate natural language detection library, suitable for long and short text alike. Installation pip install l

7 Dec 30, 2022

The Reactive Extensions for the Rust Programming Language

This is an implementation of reactive streams, which, at the high level, is patterned off of the interfaces and protocols defined in http://reactive-s

468 Dec 20, 2022

Ultra-fast, spookily accurate text summarizer that works on any language

pithy 0.1.0 - an absurdly fast, strangely accurate, summariser Quick example: pithy -f your_file_here.txt --sentences 4 --help: Print this help messa

13 Oct 31, 2022

A Google-like web search engine that provides the user with the most relevant websites in accordance to his/her query, using crawled and indexed textual data and PageRank.

Mini Google Course project for the Architecture of Computer Systems course. Overview: Architecture: We are working on multiple components of the web c

11 Aug 10, 2022

PromQL Parser in Rust w/ native Node bindings

frawk is a small programming language for writing short programs processing textual data

Converts images into textual line art.

Access German-language public broadcasting live streams and archives on the Linux Desktop

Putting a brain behind `cat`🐈‍⬛ Integrating language models in the Unix commands ecosystem through text streams.

An object-relational in-memory cache, supports queries with an SQL-like query language.

Bind the Prisma ORM query engine to any programming language you like ❤️