HTTP Proxy based solution for real-time interception and prioritization of SQL queries.

Overview

starproxy

⚠️ starproxy is a prototype: Not currently used in production, but will likely be some day.

Table of Contents

Background

starproxy is a fully HTTP compliant proxy that is designed to sit between clients and a Trino/Starburst cluster. The motivation for developing a solution like this is laid out in some prior art below:

The most attractive items to us are probably:

  • enabling host based security
  • detecting "bad" queries and blocking/deprioritizing them with custom rules
  • load balancing across regions

How it works

First and foremost, starproxy is an http proxy implemented in rust using a combination of axum/hyper.

The choice of Rust for the backend is two-fold:

  1. Speed, we want to avoid adding appreciable measurable latency to end user requests. If a query is not subject to any rules/actions then the proxy should quickly forward the request to the destination as normal.
  2. Rust has excellent libraries for parsing SQL grammar, in this case we are using sqlparser-rs, which allows us to very quickly parse the AST of the query and use it as input for rules. This allows us to craft rules that understand the structure of the query, not just match some logic w/ regular expressions.

Diagram

  1. Parse the query AST, then check a variety of rules
  • inbound CIDR rule checking
  • checking for predicates in queries
  • identifying select * queries with no limit, among other rules
  1. If rules are violated they can be associated with actions, like tagging the query as low priority. This is done by modifying the request headers and injecting special tags.

Rules can also outright block requests by returning error status codes to the client directly.

Run && Development

To run starproxy locally, you first need to install the rust toolchain. Then, you can run

export STARPROXY_CONFIG_PATH=./config.json
export STARPROXY_UPSTREAM_URL=https://your.cluster.company.dev
cargo run

This will spin up the web app on port 3000, serving only HTTP.

⚠️ starproxy does not itself do SSL termination, you'll need an additional reverse proxy or hopefully your PaaS platform is providing an SSL cert (like a k8s Ingress or OpenShift Route). If you don't serve this proxy w/ SSL termination then the OAuthProvider for trino/starburst will likely not work for authentication. You've been warned!

Implemented Rules

For this prototype we've implemented a handful of rules:

  • WhereClause - If a query contains a specific table reference, ensure an associated where clause is in that query
  • SelectStarNoLimit - If a query contains a select *, enforce that a LIMIT is applied. If not, put it in low priority queue
  • CidrOrigin - A basic implmenetation of host based security, leveraging the x-forwarded-ip header
  • ScanEstimates - A basic implementation of a rule that runs an EXPLAIN on the query, and then takes action based on the projected cost.

An example config file is bundled with the repo here

If you have an idea for a rule that you'd like to see implemented, feel free to open a GitHub issue!

Future work

Write a rule + action that actually rewrites the body of the query, instead of modifying the request headers. If this is done, we should probably inject a warning so the client is notified that this ocurred.

Train a machine learning model (likely something like XGBoost) to do query performance classification using historical performance data. Identify likely offensive queries and proactively classify them without needing an admin to define rules.

Make "actions" more configurable. Right now actions are bound to rules at compile time, not at config generation time. Actions also deserve first class support in the config generation for stuff like parameters.

Allow wildcard or regex specifiers for tables to make rules more flexible.

Load balance / route queries between multiple clusters, to be used if the backing cluster footprint is HA or multi-az. It would be very nice to treat routing as an action, or even chain actions together. Like rewriting a query to add a limit and then sending it to a cluster designed to only support adhoc "exploration" type queries.

You might also like...
Murasaki is a Fast, Secure, and Reliable Webkit based web browser.
Murasaki is a Fast, Secure, and Reliable Webkit based web browser.

Murasaki is a Fast, Secure, and Reliable Webkit based web browser. Table of Contents Goals Status Usage License Goals Security: Be secure, and not com

A minimal and flexible blog generator based on GitHub Gists.
A minimal and flexible blog generator based on GitHub Gists.

gisture Utilizing GitHub Gists as a Blogging Platform A minimal and flexible blog generator based on GitHub Gists with SEO, Templating, Syntax Highlig

📝 Web-based, reactive Datalog notebooks for data analysis and visualization
📝 Web-based, reactive Datalog notebooks for data analysis and visualization

Percival is a declarative data query and visualization language. It provides a reactive, web-based notebook environment for exploring complex datasets, producing interactive graphics, and sharing results.

Layers, extractors and template engine wrappers for axum based Web MVC applications

axum-template Layers, extractors and template engine wrappers for axum based Web MVC applications Getting started Cargo.toml [dependencies] axum-templ

axum-serde is a library that provides multiple serde-based extractors and responders for the Axum web framework.

axum-serde 📑 Overview axum-serde is a library that provides multiple serde-based extractors / responses for the Axum web framework. It also offers a

A Blog & RSS system written in Rust based on Luke Smith's LB.
A Blog & RSS system written in Rust based on Luke Smith's LB.

OB - Oliver's Blog Script A Blog and RSS system written in Rust. Features Converts blog entries written in Markdown into HTML. ✍🏻 Keeps a rolling blo

A web application to configuration Caddy based on MoonZoon.

Cream A web application to configuration Caddy based on MoonZoon. MoonZoon is a Rust Fullstack Framework. Live demo Run on a local machine Check you'v

A GUI frontend in Rust based on web-view
A GUI frontend in Rust based on web-view

neutrino I am not working anymore on this project. If you want to become a maintainer of neutrino, please answer to this issue. Preamble Docs | Repo |

JLM: A research compiler based on the RVSDG IR

JLM: A research compiler based on the RVSDG IR Jlm is an experimental compiler/optimizer that consumes and produces LLVM IR. It uses the Regionalized

Owner
Will Eaton
python w/ a sprinkle of rust
Will Eaton
Real-time bidding API scaffold for MevWallet transactions

MevWallet RTB API This repo provides a standard interface for a real-time bidding API. Searchers run the API to provide bids to users. The bid represe

Blunt Instruments 17 Mar 23, 2023
Proof of concept writing a monolith BBS using Rust, GraphQL, WASM, and SQL. WILL BE ARCHIVED ONCE PROVEN

GraphQL Forum Important DO NOT even think about using this in production, lest your sanity be destroyed and credentials lost! Loosely following the aw

Rongcui Dong 25 Apr 25, 2023
The goal of this repository is to have a real world template of a Rust backend using Axum, Diesel, etc.

Axum Diesel Real-World Example A modular Rust backend template based on the Domain-Driven Design (DDD) architecture, utilizing the Axum and Diesel fra

Quentin Piot 11 Sep 22, 2023
A filter proxy for StatsD

statsd-filter-proxy-rs statsd-filter-proxy-rs is efficient and lightweight StatsD proxy that filters out unwanted metrics to a StatsD server. Why "If

Alan Ning 20 Mar 20, 2022
Rust HTTP API Template using PostgreSQL, Redis, RabbitMQ, and Hexagonal Architecture

Rust Template HTTP API Rust API Template using PostgreSQL, Redis, RabbitMQ, and Hexagonal Architecture The following template provides a basic structu

Paulo Bressan 7 Jun 9, 2023
A blazingly fast HTTP client with a magnificent request building syntax, made for humans.

?? glue Make requests, select JSON responses, nest them in other requests: A magnificent syntax for blazingly fast cli HTTP calls, made for humans. Ta

Michele Esposito 4 Dec 7, 2022
An async no_std HTTP server suitable for bare-metal environments, heavily inspired by axum

picoserve An async no_std HTTP server suitable for bare-metal environments, heavily inspired by axum. It was designed with embassy on the Raspberry Pi

Samuel Hicks 81 Oct 7, 2023
Rust Macro which loads files into the rust binary at compile time during release and loads the file from the fs during dev.

Rust Embed Rust Custom Derive Macro which loads files into the rust binary at compile time during release and loads the file from the fs during dev. Y

Peter 1k Jan 5, 2023
Rust I18n is use Rust codegen for load YAML file storage translations on compile time, and give you a t! macro for simply get translation texts.

Rust I18n Rust I18n is use Rust codegen for load YAML file storage translations on compile time, and give you a t! macro for simply get translation te

Longbridge 73 Dec 27, 2022
The simplest build-time framework for writing web apps with html templates and typescript

Encoped A build-time fast af tool to write static apps with html and TypeScript Features Template-based ESLint, Prettier and Rollup integration No ext

null 1 Dec 11, 2021