The Solana AccountsDb plugin for PostgreSQL database.

Overview

The solana-accountsdb-plugin-postgres crate implements a plugin storing account data to a PostgreSQL database to illustrate how a plugin can be developed to work with Solana validators using the Plugin Framework.

Configuration File Format

The plugin is configured using the input configuration file. An example configuration file looks like the following:

{
	"libpath": "/solana/target/release/libsolana_accountsdb_plugin_postgres.so",
	"host": "postgres-server",
	"user": "solana",
	"port": 5433,
	"threads": 20,
	"batch_size": 20,
	"panic_on_db_errors": true,
	"accounts_selector" : {
		"accounts" : ["*"]
	}
}

The host, user, and port control the PostgreSQL configuration information. For more advanced connection options, please use the connection_str field. Please see Rust Postgres Configuration.

To improve the throughput to the database, the plugin supports connection pooling using multiple threads, each maintaining a connection to the PostgreSQL database. The count of the threads is controlled by the threads field. A higher thread count usually offers better performance.

To further improve performance when saving large numbers of accounts at startup, the plugin uses bulk inserts. The batch size is controlled by the batch_size parameter. This can help reduce the round trips to the database.

The panic_on_db_errors can be used to panic the validator in case of database errors to ensure data consistency.

Account Selection

The accounts_selector can be used to filter the accounts that should be persisted.

For example, one can use the following to persist only the accounts with particular Base58-encoded Pubkeys,

    "accounts_selector" : {
         "accounts" : ["pubkey-1", "pubkey-2", ..., "pubkey-n"],
    }

Or use the following to select accounts with certain program owners:

    "accounts_selector" : {
         "owners" : ["pubkey-owner-1", "pubkey-owner-2", ..., "pubkey-owner-m"],
    }

To select all accounts, use the wildcard character (*):

    "accounts_selector" : {
         "accounts" : ["*"],
    }

Transaction Selection

transaction_selector, controls if and what transactions to store. If this field is missing, none of the transactions are stored.

For example, one can use the following to select only the transactions referencing accounts with particular Base58-encoded Pubkeys,

"transaction_selector" : {
    "mentions" : \["pubkey-1", "pubkey-2", ..., "pubkey-n"\],
}

The mentions field supports wildcards to select all transaction or all 'vote' transactions. For example, to select all transactions:

"transaction_selector" : {
    "mentions" : \["*"\],
}

To select all vote transactions:

"transaction_selector" : {
    "mentions" : \["all_votes"\],
}

Database Setup

Install PostgreSQL Server

Please follow PostgreSQL Ubuntu Installation on instructions to install the PostgreSQL database server. For example, to install postgresql-14,

/etc/apt/sources.list.d/pgdg.list' wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add - sudo apt-get update sudo apt-get -y install postgresql-14">
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get -y install postgresql-14

Control the Database Access

Modify the pg_hba.conf as necessary to grant the plugin to access the database. For example, in /etc/postgresql/14/main/pg_hba.conf, the following entry allows nodes with IPs in the CIDR 10.138.0.0/24 to access all databases. The validator runs in a node with an ip in the specified range.

host    all             all             10.138.0.0/24           trust

It is recommended to run the database server on a separate node from the validator for better performance.

Configure the Database Performance Parameters

Please refer to the PostgreSQL Server Configuration for configuration details. The referential implementation uses the following configurations for better database performance in the /etc/postgresql/14/main/postgresql.conf which are different from the default postgresql-14 installation.

max_connections = 200                  # (change requires restart)
shared_buffers = 1GB                   # min 128kB
effective_io_concurrency = 1000        # 1-1000; 0 disables prefetching
wal_level = minimal                    # minimal, replica, or logical
fsync = off                            # flush data to disk for crash safety
synchronous_commit = off               # synchronization level;
full_page_writes = off                 # recover from partial page writes
max_wal_senders = 0                    # max number of walsender processes

The sample scripts/postgresql.conf can be used for reference.

Create the Database Instance and the Role

Start the server:

sudo systemctl start postgresql@14-main

Create the database. For example, the following creates a database named 'solana':

sudo -u postgres createdb solana -p 5433

Create the database user. For example, the following creates a regular user named 'solana':

sudo -u postgres createuser -p 5433 solana

Verify the database is working using psql. For example, assuming the node running PostgreSQL has the ip 10.138.0.9, the following command will land in a shell where SQL commands can be entered:

psql -U solana -p 5433 -h 10.138.0.9 -w -d solana

Create the Schema Objects

Use the scripts/create_schema.sql

psql -U solana -p 5433 -h 10.138.0.9 -w -d solana -f scripts/create_schema.sql

After this, start the validator with the plugin by using the --accountsdb-plugin-config argument mentioned above.

Destroy the Schema Objects

To destroy the database objects, created by create_schema.sql, use drop_schema.sql. For example,

psql -U solana -p 5433 -h 10.138.0.9 -w -d solana -f scripts/drop_schema.sql

Capture Historical Account Data

To capture account historical data, in the configuration file, turn store_account_historical_data to true.

And ensure the database trigger is created to save data in the audit_table when records in account are updated, as shown in create_schema.sql,

CREATE FUNCTION audit_account_update() RETURNS trigger AS $audit_account_update$
    BEGIN
		INSERT INTO account_audit (pubkey, owner, lamports, slot, executable, rent_epoch, data, write_version, updated_on)
            VALUES (OLD.pubkey, OLD.owner, OLD.lamports, OLD.slot,
                    OLD.executable, OLD.rent_epoch, OLD.data, OLD.write_version, OLD.updated_on);
        RETURN NEW;
    END;

$audit_account_update$ LANGUAGE plpgsql;

CREATE TRIGGER account_update_trigger AFTER UPDATE OR DELETE ON account
    FOR EACH ROW EXECUTE PROCEDURE audit_account_update();

The trigger can be dropped to disable this feature, for example,

DROP TRIGGER account_update_trigger ON account;

Over time, the account_audit can accumulate large amount of data. You may choose to limit that by deleting older historical data.

For example, the following SQL statement can be used to keep up to 1000 of the most recent records for an account:

delete from account_audit a2 where (pubkey, write_version) in
    (select pubkey, write_version from
        (select a.pubkey, a.updated_on, a.slot, a.write_version, a.lamports,
            rank() OVER ( partition by pubkey order by write_version desc) as rnk
            from account_audit a) ranked
            where ranked.rnk > 1000)

Main Tables

The following are the tables in the Postgres database

Table Description
account Account data
block Block metadata
slot Slot metadata
transaction Transaction data
account_audit Account historical data

Performance Considerations

When a validator lacks sufficient compute power, the overhead of saving the account data can cause it to fall behind the network especially when all accounts or a large number of accounts are selected. The node hosting the PostgreSQL database need to be powerful enough to handle the database loads as well. It has been found using GCP n2-standard-64 machine type for the validator and n2-highmem-32 for the PostgreSQL node is adequate for handling transmiting all accounts while keeping up with the network. In addition, it is best to keep the validator and the PostgreSQL in the same local network to reduce latency. You may need to size the validator and database nodes differently if serving other loads.

Comments
  • No Batching on Transaction Inserts

    No Batching on Transaction Inserts

    Hi @lijunwangs

    First of all, thank you for your amazing work. I learnt so much from this repo.

    I have a question regarding batching on transaction inserts. For accounts, the code batches the SQL queries until it reaches a batch size. It doesn't however do this for transactions. I'm curious if this is for a reason. Is this to minimize latency as the insert won't be made until the batch gets filled? If that's the case, I'm curious why you didn't do this for accounts?

    opened by thekeviv 7
  • Pointers for running `cargo test` on macOS arm64

    Pointers for running `cargo test` on macOS arm64

    This is less issue and more FYI on how I got cargo test to pass on

    1. Setup the database:
    brew install postgresql
    brew services restart postgresql
    createdb -O solana solana
    psql -U solana -p 5432 -h localhost -w -d solana -f scripts/create_schema.sql
    
    1. Build the library
    cargo build --lib
    
    1. Make the farf directory (not 100% sure the tests don't entirely run in a temp directory)

    2. Patch up the test with some platform specific stuff (there's probably a better way to deal with this; just went brute force for expediency)

    diff --git a/tests/test_postgres_plugin.rs b/tests/test_postgres_plugin.rs
    index 617fef1..59345bb 100644
    --- a/tests/test_postgres_plugin.rs
    +++ b/tests/test_postgres_plugin.rs
    @@ -122,7 +122,7 @@ fn generate_accountsdb_plugin_config() -> (TempDir, PathBuf) {
    
         let config_content = r#"
         {
    -        "libpath": "libsolana_accountsdb_plugin_postgres.so",
    +        "libpath": "libsolana_accountsdb_plugin_postgres.dylib",
             "connection_str": "host=localhost user=solana password=solana port=5432",
             "threads": 20,
             "batch_size": 20,
    @@ -210,10 +210,12 @@ fn test_postgres_plugin() {
         solana_logger::setup_with_default(RUST_LOG_FILTER);
    
         unsafe {
    -        let lib = Library::new("libsolana_accountsdb_plugin_postgres.so");
    +        let filename = "libsolana_accountsdb_plugin_postgres.dylib";
    +        let lib = Library::new(filename);
             if lib.is_err() {
                 info!(
    -                "Failed to load the dynamic library libsolana_accountsdb_plugin_postgres.so {:?}",
    +                "Failed to load the dynamic library {} {:?}",
    +                filename,
                     lib
                 );
                 return;
    

    OS Specifics for anyone else that might be looking for this:

    uname -a
    Darwin zola.local 21.1.0 Darwin Kernel Version 21.1.0: Wed Oct 13 17:33:01 PDT 2021; root:xnu-8019.41.5~1/RELEASE_ARM64_T6000 arm64
    
    opened by levicook 6
  • PostgreSQL client cert support

    PostgreSQL client cert support

    Google Cloud SQL enforces the use of TLS client certs. The accounts plugin currently does not support specifying CA/cert file/key file, so it cannot securely connect to Google Cloud SQL directly.

    Until then cloud_sql_proxy is sufficient as a workaround.

    opened by terorie 2
  • Plugin causing validator to not keep up

    Plugin causing validator to not keep up

    Are you able to deploy the plugin and keep up with the network, I am consistently falling behind. If you are able to keep up would love to know what flags/settings you have set.

    opened by Dassy23 6
  • Graceful error handling

    Graceful error handling

    The plugin currently has two error handling modes: panic_on_db_errors or "log error and ignore".

    Occasional errors sending data to PostgreSQL are expected however, e.g. when the DB host reboots or if there is a network blip. Therefore in production "log error and ignore" is the only choice.

    Since Solana mainnet does several thousand updates per second to accounts this results in a flood of error messages in the log in the case of downtime.

    To reduce the error rate, I suggest the workers should back off exponentially with writes until Postgres is back up by introducing sleeps.

    opened by terorie 0
Owner
Lijun Wang
Lijun Wang
A Distributed SQL Database - Building the Database in the Public to Learn Database Internals

Table of Contents Overview Usage TODO MVCC in entangleDB SQL Query Execution in entangleDB entangleDB Raft Consensus Engine What I am trying to build

Sarthak Dalabehera 38 Jan 2, 2024
Teach your PostgreSQL database how to speak MongoDB Wire Protocol

“If it looks like MongoDB, swims like MongoDB, and quacks like MongoDB, then it probably is PostgreSQL.” ?? Discord | Online Demo | Intro Video | Quic

Felipe Coury 261 Jun 18, 2023
Materialize simplifies application development with streaming data. Incrementally-updated materialized views - in PostgreSQL and in real time. Materialize is powered by Timely Dataflow.

Materialize is a streaming database for real-time applications. Get started Check out our getting started guide. About Materialize lets you ask questi

Materialize, Inc. 4.7k Jan 8, 2023
🧰 The Rust SQL Toolkit. An async, pure Rust SQL crate featuring compile-time checked queries without a DSL. Supports PostgreSQL, MySQL, SQLite, and MSSQL.

SQLx ?? The Rust SQL Toolkit Install | Usage | Docs Built with ❤️ by The LaunchBadge team SQLx is an async, pure Rust† SQL crate featuring compile-tim

launchbadge 7.6k Dec 31, 2022
Native PostgreSQL driver for the Rust programming language

Rust-Postgres PostgreSQL support for Rust. postgres Documentation A native, synchronous PostgreSQL client. tokio-postgres Documentation A native, asyn

Steven Fackler 2.8k Jan 8, 2023
PostgreSQL procedural language handler for Clojure via SCI

pl/sci Status This is very much an experiment and I'm open to feedback on where to take this next. Build Requirements lein GraalVM CE 20.3.0 Java 11 c

Michiel Borkent 45 Nov 28, 2022
A Rust crate for writing servers that speak PostgreSQL's wire protocol

Convergence A Rust crate for writing servers that speak PostgreSQL's wire protocol. Additionally, the experimental convergence-arrow crate enables con

ReservoirDB 63 Jan 2, 2023
A tool for automated migrations for PostgreSQL, SQLite and MySQL.

Models Models is an implementation for a SQL migration management tool. It supports PostgreSQL, MySQL, and SQLite. Quick Start install the CLI by runn

null 45 Nov 16, 2022
Zenith substitutes PostgreSQL storage layer and redistributes data across a cluster of nodes

Zenith substitutes PostgreSQL storage layer and redistributes data across a cluster of nodes

null 5.7k Jan 6, 2023
Rust library to parse, deparse and normalize SQL queries using the PostgreSQL query parser

This Rust library uses the actual PostgreSQL server source to parse SQL queries and return the internal PostgreSQL parse tree.

pganalyze 37 Dec 18, 2022
Generate type-checked Rust from your PostgreSQL.

Cornucopia Generate type checked Rust from your SQL Install | Example Cornucopia is a small CLI utility resting on postgres designed to facilitate Pos

null 206 Dec 25, 2022
Rust - Build a CRUD API with SQLX and PostgreSQL

In this article, you'll learn how to build a CRUD API in Rust using SQLX, Actix-web, and PostgreSQL. Learning how to build a CRUD API as a developer will equip you with valuable skills for building robust, maintainable, and scalable applications.

CODEVO 5 Feb 20, 2023
Experimental blockchain database

A database for the blockchain. Design considerations API The database is a universal key-value storage that supports transactions. It does not support

Parity Technologies 172 Dec 26, 2022
Immutable Ordered Key-Value Database Engine

PumpkinDB Build status (Linux) Build status (Windows) Project status Usable, between alpha and beta Production-readiness Depends on your risk toleranc

null 1.3k Jan 2, 2023
Skybase is an extremely fast, secure and reliable real-time NoSQL database with automated snapshots and SSL

Skybase The next-generation NoSQL database What is Skybase? Skybase (or SkybaseDB/SDB) is an effort to provide the best of key/value stores, document

Skybase 1.4k Dec 29, 2022
Distributed transactional key-value database, originally created to complement TiDB

Website | Documentation | Community Chat TiKV is an open-source, distributed, and transactional key-value database. Unlike other traditional NoSQL sys

TiKV Project 12.4k Jan 3, 2023
small distributed database protocol

clepsydra Overview This is a work-in-progress implementation of a core protocol for a minimalist distributed database. It strives to be as small and s

Graydon Hoare 19 Dec 2, 2021
A user crud written in Rust, designed to connect to a MySQL database with full integration test coverage.

SQLX User CRUD Purpose This application demonstrates the how to implement a common design for CRUDs in, potentially, a system of microservices. The de

null 78 Nov 27, 2022
Rust version of the Haskell ERD tool. Translates a plain text description of a relational database schema to dot files representing an entity relation diagram.

erd-rs Rust CLI tool for creating entity-relationship diagrams from plain text markup. Based on erd (uses the same input format and output rendering).

Dave Challis 32 Jul 25, 2022