Rust library to parse, deparse and normalize SQL queries using the PostgreSQL query parser

Related tags

Database pg_query.rs
Overview

pg_query.rs   Build Status Latest Version Docs Badge

This Rust library uses the actual PostgreSQL server source to parse SQL queries and return the internal PostgreSQL parse tree.

It also allows you to normalize queries (replacing constant values with ?) and parse these normalized queries into a parse tree again.

When you build this library, it builds parts of the PostgreSQL server source (see libpg_query), and then statically links it into this library.

This is slightly crazy, but is the only reliable way of parsing all valid PostgreSQL queries.

You can find further examples and a longer rationale for the original Ruby implementation here. The Rust version tries to have a very similar API.

Getting started

Add the following to your Cargo.toml

[dependencies]
pg_query = "0.6"

Examples

Parsing a query

use pg_query::ast::Node;

let result = pg_query::parse("SELECT * FROM contacts");
assert!(result.is_ok());
let result = result.unwrap();
assert!(matches!(*&result[0], Node::SelectStmt(_)));

Normalizing a query

let result = pg_query::normalize("SELECT 1 FROM x WHERE y = (SELECT 123 FROM a WHERE z = 'bla')").unwrap();
assert_eq!(result, "SELECT $1 FROM x WHERE y = (SELECT $2 FROM a WHERE z = $3)");

Fingerprinting a query

let result = pg_query::fingerprint("SELECT * FROM contacts.person WHERE id IN (1, 2, 3, 4);").unwrap();
assert_eq!(result.hex, "643d2a3c294ab8a7");

Truncating a query

let query = "INSERT INTO \"x\" (a, b, c, d, e, f) VALUES (?)";
let result = pg_query::parse(query).unwrap();
assert_eq!(result.truncate(32).unwrap(), "INSERT INTO x (...) VALUES (?)");

Credits

Thanks to Paul Mason for his work on pg_parse that this crate is based on.

After version 0.6.0, Paul donated the pg_query crate to the pganalyze team. pg_parse is a lighter alternative that focuses on query parsing, while pg_query aims for feaure parity with the Ruby gem.

Comments
  • use proper OUT_DIR for faster rebuilds

    use proper OUT_DIR for faster rebuilds

    According to the cargo book, any output from build scripts should be placed in the OUT_DIR directory. The current build script on main modifies assets in-place without respecting OUT_DIR, which might interfere with re-run logic from tools like cargo watch.

    This PR brings back the OUT_DIR, but tries to minimize the number of files brought over on each rebuild. In practice (on my machine), this has eliminated some of the flakiness around cargo watch when working with the main branch, and has reduced fresh build times from ~30s to ~10s.

    opened by NAlexPear 7
  • feat: split_with_parser, split_with_scanner

    feat: split_with_parser, split_with_scanner

    I'm noticing some weirdness in the split_with_scanner example. Malformed tokens seem to be getting dropped:

    assert_eq!(
        pg_query::split_with_scanner("select 1; asdf; select 3;").unwrap(),
        vec!["select 1", "asdf", "select 3"],
    ); // fails: actually produces ["select 1", "select 3"]
    

    Is this expected?

    Resolves #5.

    opened by SKalt 5
  • feature request: support pg_query_scan, pg_query_split, pg_query_split_with_scanner

    feature request: support pg_query_scan, pg_query_split, pg_query_split_with_scanner

    All of which are published in pg_query.h.

    I've got a viable draft of these functions over at skalt/pg_query_wrapper, and I'd be happy to adapt and PR them to this repo.

    opened by SKalt 1
  • Truncate bugfixes

    Truncate bugfixes

    This fixes accidental memory corruption leading to segfaults. When truncating a CTE we need to remove its nested target list truncation from the list of possibilities, otherwise we'll end up referencing an object that no longer exists. This PR uses std::mem::replace to return the previous SelectStmt so it can be removed from truncations before the next iteration.

    Two other bugfixes:

    • don't attempt truncating target lists when they're empty (accidentally adds a ... which just makes the query longer)
    • if node truncations aren't sufficient, apply simple truncation as a fallback on the node-truncated string instead of on the source string

    Review question: are there other truncation types where we could run into memory corruption? I tried subqueries in both SELECT and FROM and couldn't trigger another segfault.

    opened by seanlinsley 1
  • Release version 0.7.0

    Release version 0.7.0

    The version number had already been updated to 0.7.0 in Cargo.toml, but it hasn't been used on crates.io yet: https://crates.io/crates/pg_query.

    After merging this I will run cargo publish, and push a tag to this repo.

    opened by seanlinsley 0
  • Fix memory leaks in fingerprint and normalize

    Fix memory leaks in fingerprint and normalize

    • fingerprint is affected by https://github.com/pganalyze/libpg_query/pull/141
    • when an error occurs, normalize is returning early before pg_query_free_normalize_result was called
    opened by seanlinsley 0
  • Truncate on conflict bugfixes

    Truncate on conflict bugfixes

    Truncate: Simplify VALUES(...) lists
    
    Truncate: Correctly handle UPDATE and ON CONFLICT target lists
    
    These target list fields support MultiAssignRef nodes, which are not
    supported in regular SELECT target lists - thus requiring special handling.
    
    opened by lfittl 0
  • Investigate SEGV on unknown address

    Investigate SEGV on unknown address

    The parallel test as currently written opens 10,000 threads that each call out to libpg_query at the same time. This error happens rarely, and so far only on CI.

    https://github.com/pganalyze/pg_query.rs/runs/7419661487

    LeakSanitizer:DEADLYSIGNAL
    ==4846==ERROR: LeakSanitizer: SEGV on unknown address 0x7f7999ae3ff8 (pc 0x5607592a63f4 bp 0x7f7595e56e60 sp 0x7f7595e565f0 T16383)
    ==4846==The signal is caused by a READ memory access.
    
    Screen Shot 2022-07-19 at 6 43 13 PM
    opened by seanlinsley 1
  • rust-analyzer cannot figure out the type of Node

    rust-analyzer cannot figure out the type of Node

    This is likely more a problem with rust-analyzer than this library, but I want to raise awareness because this harms ergonomics and I'd like to understand if there's a work around you all have figured out.

    e.g. Even though type is Result<Vec<Node::SelectStmt, Error>> and this code compiles correctly, rust-analyzer provides Result<Vec<{unknown}, Error>> Screen Shot 2022-07-15 at 9 42 44 AM

    Anything that can be done to make this easier to work with?

    opened by rex-remind101 0
Owner
pganalyze
PostgreSQL Performance Monitoring
pganalyze
An object-relational in-memory cache, supports queries with an SQL-like query language.

qlcache An object-relational in-memory cache, supports queries with an SQL-like query language. Warning This is a rather low-level library, and only p

null 3 Nov 14, 2021
A Rust SQL query builder with a pleasant fluent API closely imitating actual SQL

Scooby An SQL query builder with a pleasant fluent API closely imitating actual SQL. Meant to comfortably build dynamic queries with a little bit of s

Aleksei Voronov 100 Nov 11, 2022
Gh-sql - Query GitHub Projects (beta) with SQL

gh-sql: Query GitHub Projects (beta) with SQL Installation gh extension install KOBA789/gh-sql Features SELECT items DELETE items UPDATE item fields

Hidekazu Kobayashi 108 Dec 7, 2022
postgres-ical - a PostgreSQL extension that adds features related to parsing RFC-5545 « iCalendar » data from within a PostgreSQL database

postgres-ical - a PostgreSQL extension that adds features related to parsing RFC-5545 « iCalendar » data from within a PostgreSQL database

Edgar Onghena 1 Feb 23, 2022
Distributed, version controlled, SQL database with cryptographically verifiable storage, queries and results. Think git for postgres.

SDB - SignatureDB Distributed, version controlled, SQL database with cryptographically verifiable storage, queries and results. Think git for postgres

Fremantle Industries 5 Apr 26, 2022
A query builder that builds and typechecks queries at compile time

typed-qb: a compile-time typed "query builder" typed-qb is a compile-time, typed, query builder. The goal of this crate is to explore the gap between

ferrouille 3 Jan 22, 2022
Run SQL queries on CSV files

zsql run SQL queries on csv files A terminal utility to easily run SQL queries on CSV files. zsql is shipped as a small single binary powered by rust

Zizaco 9 Jul 9, 2022
Running SQL-like queries on files.

filesql Running SQL-like queries on files. Features Supported: REPL Basic SQL expressions. INSERT clause. (which inserts data into another file) WHERE

Zhang Li 1 Nov 15, 2021
ReefDB is a minimalistic, in-memory and on-disk database management system written in Rust, implementing basic SQL query capabilities and full-text search.

ReefDB ReefDB is a minimalistic, in-memory and on-disk database management system written in Rust, implementing basic SQL query capabilities and full-

Sacha Arbonel 75 Jun 12, 2023
Query LDAP and AD with SQL

SQLDAP Ever wanted to query AD or LDAP with SQL like queries ? I'm going to answer this question myself: yes ! Why ? Because I never could remember al

null 9 Nov 15, 2022
SQL validator tool for BigQuery standard SQL.

bqvalid What bqvalid does bqvalid is the SQL validator tool for BigQuery standard SQL. bqvalid fails with error message if there's the expression that

null 10 Dec 25, 2022
Fully typed SQL query builder for Rust [deprecated]

What is Deuterium? Deuterium is a fancy SQL builder for Rust. It's designed to provide a DSL to easily build SQL queries in safe and typed way. Like R

Stanislav Panferov 169 Nov 20, 2022
Type-safe SQL query wrappers

fnsql   The fnsql crate provides simple type-safe optional wrappers around SQL queries. Instead of calling type-less .query() and .execute(), you call

Dan Aloni 9 Apr 29, 2022
A Toy Query Engine & SQL interface

Naive Query Engine (Toy for Learning) ?? This is a Query Engine which support SQL interface. And it is only a Toy for learn query engine only. You can

谭巍 45 Dec 21, 2022
Query system statistics with SQL.

pg_stat_sysinfo Collects system statistics. ---- CREATE EXTENSION pg_stat_sysinfo; CREATE EXTENSION ---- SELECT * FROM pg_stat_sysinfo_collect();

PostgresML 12 Jun 14, 2023
fast & easy CLI and vscode extension specialized to format MySQL INSERT queries.

insertfmt fast & easy CLI specialized to format MySQL INSERT queries. format queries so that they look like a table. NOTE: If you wanna use the VSCode

canalun 7 May 2, 2023
Rust - Build a CRUD API with SQLX and PostgreSQL

In this article, you'll learn how to build a CRUD API in Rust using SQLX, Actix-web, and PostgreSQL. Learning how to build a CRUD API as a developer will equip you with valuable skills for building robust, maintainable, and scalable applications.

CODEVO 5 Feb 20, 2023
Materialize simplifies application development with streaming data. Incrementally-updated materialized views - in PostgreSQL and in real time. Materialize is powered by Timely Dataflow.

Materialize is a streaming database for real-time applications. Get started Check out our getting started guide. About Materialize lets you ask questi

Materialize, Inc. 4.7k Jan 8, 2023
A tool for automated migrations for PostgreSQL, SQLite and MySQL.

Models Models is an implementation for a SQL migration management tool. It supports PostgreSQL, MySQL, and SQLite. Quick Start install the CLI by runn

null 45 Nov 16, 2022