Zenith substitutes PostgreSQL storage layer and redistributes data across a cluster of nodes

Related tags

Database zenith
Overview

Zenith

Zenith substitutes PostgreSQL storage layer and redistributes data across a cluster of nodes

Architecture overview

A Zenith installation consists of Compute nodes and Storage engine.

Compute nodes are stateless PostgreSQL nodes, backed by zenith storage.

Zenith storage engine consists of two major components:

  • Pageserver. Scalable storage backend for compute nodes.
  • WAL service. The service that receives WAL from compute node and ensures that it is stored durably.

Pageserver consists of:

  • Repository - Zenith storage implementation.
  • WAL receiver - service that receives WAL from WAL service and stores it in the repository.
  • Page service - service that communicates with compute nodes and responds with pages from the repository.
  • WAL redo - service that builds pages from base images and WAL records on Page service request.

Running local installation

  1. Install build dependencies and other useful packages

On Ubuntu or Debian this set of packages should be sufficient to build the code:

apt install build-essential libtool libreadline-dev zlib1g-dev flex bison libseccomp-dev \
libssl-dev clang pkg-config libpq-dev

[Rust] 1.52 or later is also required.

To run the psql client, install the postgresql-client package or modify PATH and LD_LIBRARY_PATH to include tmp_install/bin and tmp_install/lib, respectively.

To run the integration tests (not required to use the code), install Python (3.6 or higher), and install python3 packages with pipenv using pipenv install in the project directory.

  1. Build zenith and patched postgres
git clone --recursive https://github.com/zenithdb/zenith.git
cd zenith
make -j5
  1. Start pageserver and postgres on top of it (should be called from repo root):
# Create repository in .zenith with proper paths to binaries and data
# Later that would be responsibility of a package install script
> ./target/debug/zenith init
pageserver init succeeded

# start pageserver
> ./target/debug/zenith start
Starting pageserver at '127.0.0.1:64000' in .zenith
Pageserver started

# start postgres on top on the pageserver
> ./target/debug/zenith pg start main
Starting postgres node at 'host=127.0.0.1 port=55432 user=stas'
waiting for server to start.... done

# check list of running postgres instances
> ./target/debug/zenith pg list
BRANCH	ADDRESS		LSN		STATUS
main	127.0.0.1:55432	0/1609610	running
  1. Now it is possible to connect to postgres and run some queries:
> psql -p55432 -h 127.0.0.1 -U zenith_admin postgres
postgres=# CREATE TABLE t(key int primary key, value text);
CREATE TABLE
postgres=# insert into t values(1,1);
INSERT 0 1
postgres=# select * from t;
 key | value
-----+-------
   1 | 1
(1 row)
  1. And create branches and run postgres on them:
# create branch named migration_check
> ./target/debug/zenith branch migration_check main
Created branch 'migration_check' at 0/1609610

# check branches tree
> ./target/debug/zenith branch
 main
 ┗━ @0/1609610: migration_check

# start postgres on that branch
> ./target/debug/zenith pg start migration_check
Starting postgres node at 'host=127.0.0.1 port=55433 user=stas'
waiting for server to start.... done

# this new postgres instance will have all the data from 'main' postgres,
# but all modifications would not affect data in original postgres
> psql -p55433 -h 127.0.0.1 -U zenith_admin postgres
postgres=# select * from t;
 key | value
-----+-------
   1 | 1
(1 row)

postgres=# insert into t values(2,2);
INSERT 0 1
  1. If you want to run tests afterwards (see below), you have to stop pageserver and all postgres instances you have just started:
> ./target/debug/zenith pg stop migration_check
> ./target/debug/zenith pg stop main
> ./target/debug/zenith stop

Running tests

git clone --recursive https://github.com/zenithdb/zenith.git
make # builds also postgres and installs it to ./tmp_install
cd test_runner
pytest

Documentation

Now we use README files to cover design ideas and overall architecture for each module and rustdoc style documentation comments. See also /docs/ a top-level overview of all available markdown documentation.

To view your rustdoc documentation in a browser, try running cargo doc --no-deps --open

Postgres-specific terms

Due to Zenith's very close relation with PostgreSQL internals, there are numerous specific terms used. Same applies to certain spelling: i.e. we use MB to denote 1024 * 1024 bytes, while MiB would be technically more correct, it's inconsistent with what PostgreSQL code and its documentation use.

To get more familiar with this aspect, refer to:

Join the development

Comments
  • Add wal backpressure performance tests

    Add wal backpressure performance tests

    Resolves #1889.

    This PR adds new tests to measure the WAL backpressure's performance under different workloads.

    Changes

    • add new performance tests in test_wal_backpressure.py
    • allow safekeeper's fsync to be configurable when running tests
    opened by aome510 43
  • Storage format rewrite

    Storage format rewrite

    Lots of stuff happening here, unfortunately in one big patch:

    1. Simplify Repository to a value-store

    Move the responsibility of tracking relation metadata, like which relations exist and what are their sizes, from Repository to a new module, pgdatadir_mapping.rs. The interface to Repository is now a simple key-value PUT/GET operations.

    It's still not any old key-value store though. A Repository is still responsible from handling branching, and every GET operation comes with an LSN.

    1. Store arbitrary key-ranges in the layer files

    The concept of a "segment" is gone. Each layer file can store an arbitrary range of Keys.

    1. Have two levels of delta layers.

    When the in-memory layer fills up, its dumped into a delta layer file, by the checkpoint operation. These files cover the whole key-range, as they the WAL across the whole key-space, since the last checkpoint. These files are named like this:

    000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__0000000001698C48-0000000001698CC1
    

    When 10 of these level-0 files have been accumulated, they are reshuffled to for level-1 files. They look like this:

    012222222233333333444444445500000000-012222222233333333444444445500002710__0000000000000000-0000000000027101
    012222222233333333444444445500002710-012222222233333333444444445500004E20__0000000000027101-000000000004E201
    012222222233333333444444445500004E20-012222222233333333444444445500007530__000000000004E201-0000000000075301
    012222222233333333444444445500007530-012222222233333333444444445500009C40__0000000000075301-000000000009C401
    012222222233333333444444445500009C40-01222222223333333344444444550000C350__000000000009C401-00000000000C3501
    

    The "level" of a file is not explicitly stored anywhere. The distinction is just in the key range that the file covers. Currently, the code treates the level-0 files specially, and tries to compact those, but it never touches level-1 files again. So this forms an LSM tree with a constant of 2 levels (plus the in-memory level).

    In addition to those, image-layers are created. Each image layer covers a range of keys, at a single LSN. E.g:

    012222222233333333444444445500009C40-01222222223333333344444444550000C350__000000000009C401
    

    These are created at a different schedule from the delta layers, with different heuristics.

    TODOs:

    • I'm not totally happy with the way that the "key space" that's in use is tracked. I don't think I got the abstraction quite right there. The way it now works is that the code in pgdatadir_mapping.rs periodically scans the metadata, and constructs a set of ranges (KeyPartitioning) that represent all the keys that are stored in the repository, at the latest LSN. It passes that KeyPartitioning object down to the Repository implementation, which uses it to decide how to physically partition the image and delta layers. It is also used to reclaim space after deletions: if a Key is missing from the KeyPartitioning, it is left out when the next image layer is created.

    • The code in layer_map.rs tracks which layers exist. It is used to find the right layer to satisfy a get-request, and also to determine if a layer can be garbage collected (i.e. if its LSN range is old enough and it's covered by later image layers). But it's very simplistic: it just stores all the layers in a Vec, and performs a linear search over it, for all those operations. If you have more than a few dozen layers, that becomes slow.

    • The heuristics for when to compact, when to create image layers, etc. are very simplistic: If there are 10 delta layers at "level 0", they are compacted (or reshuffled, if you will). That merges the files and splits them again by key-dimension rather than LSN-dimension. And if there are more than 3 delta layers on top of the latest image layer of a partition, a new image layer is created. These heuristics could be improved a lot I'm sure.

    • We don't try to materialize pages or garbage collect anything during compaction. If the GC horizon is short, that would save a lot of effort, as you could get rid of old garbage instead of "reshuffling" it in the level 0.

    • All the background tasks, checkpointing, compaction and image-creation, are handled by a single thread. That easily becomes a bottleneck, causing e.g. checkpointing to fall behind, causing high memory usage and overly large layer files at level 0. This is easy to see with pgbench -s1000 -i, for example.

    • Lots of performance optimizations would be possible and required. For example, the image and delta layers don't use the page cache, which makes compaction pretty slow as it reads all the existing versions from a delta layer and writes it back out. That is currently slow and shows up in 'perf' profile. (The code in 'main' is missing that too, but it's not as obviously bad there because we don't do anything like the "compaction" there)

    opened by hlinnaka 34
  • Check performance on big databases (clickhouse datasets)

    Check performance on big databases (clickhouse datasets)

    a) yandex metrica dataset (~8gb compressed) data: https://clickhouse.com/docs/en/getting-started/example-datasets/metrica/ schema(hits): https://clickhouse.com/docs/en/getting-started/tutorial/ queries: https://clickhouse.com/benchmark/dbms/

    b) github dataset (~75gb compressed) https://ghe.clickhouse.tech https://ghe.clickhouse.tech/#download-the-dataset

    UPD, the results are summarized in the last comment here: https://github.com/zenithdb/zenith/issues/906#issuecomment-1016445326

    opened by kelvich 32
  • Bug in GC

    Bug in GC

    Sometimes GC deletes layers which are still needed. Originally I faced with this problem when played on EC2 with large data size and pageserver is crashed because of disk space exhaustion. My first idea was that it is caused by garbage collecting layers beyond disk consistent LSN: https://github.com/zenithdb/zenith/pull/1004 But I failed to create test for it because deleted layers are restored by replaying WAL from safekeeper: https://github.com/zenithdb/zenith/pull/1043 Configuration I have used on EC2 has not safekeepers.

    But recently I was able to reproduce this problems locally without any restarts. Just run read-only pgbench with scale 100 and 10 client for a long time (1000 sec). I got this errors:

    pgbench: error: client 4 script 0 aborted in command 1 query 0: ERROR:  could not read block 13685 in rel 1663/13010/16404.0 from page server at lsn 1/ACE40AA8
    DETAIL:  page server returned error: tried to request a page version that was garbage collected. requested at 1/ACE40AA8 gc cutoff 1/ACE4C878
    progress: 770.0 s, 788.3 tps, lat 12.500 ms stddev 52.382
    progress: 780.0 s, 730.6 tps, lat 12.206 ms stddev 32.596
    pgbench: error: client 6 script 0 aborted in command 1 query 0: ERROR:  could not read block 134612 in rel 1663/13010/16396.0 from page server at lsn 1/B1CC5D48
    DETAIL:  page server returned error: tried to request a page version that was garbage collected. requested at 1/B1CC5D48 gc cutoff 1/B1CCFB40
    

    So something else is wrong in GC logic.

    t/bug c/storage/pageserver 
    opened by knizhnik 29
  • Implement layered storage format

    Implement layered storage format

    This evolved from https://github.com/zenithdb/zenith/pull/299, but there has so many changes from the original in-memory repository that I thought a new PR is in order.

    This replaces the RocksDB based implementation with an approach using "snapshot files" on disk, and in-memory btreemaps to hold the recent changes.

    This make the repository implementation a configuration option. You can choose 'layered' or 'rocksdb' in the "pageserver init" call, but there is no corresponding --repository-formt option in 'zenith init', so in practice you have to change the default in pageserver.rs if you want to test different implementations. The unit tests have been refactored to exercise both implementations, though. 'layered' is now the default.

    TODOs:

    • Push/pull is not implemented, causing 'test_history_inmemory' test in 'cargo test' to fail. I think we'll have to rethink how push/pull is implemented. It would make sense to copy the immutable files directly, instead of serializing them into a stream of changes. The test is marked as 'ignore' in this PR.
    opened by hlinnaka 28
  • Fix potential UB due to transmute

    Fix potential UB due to transmute

    This is a continuation of the discussion from #6, and some related issues that came up in #201 (comment).

    There is a pattern that has begun to emerge in postgres_ffi, that looks like this:

    pub fn encode_foo(rec: FooStruct) -> Bytes {
        let b: [u8; XLOG_SIZE_OF_FOO];
        b = unsafe { std::mem::transmute::<FooStruct, [u8; XLOG_SIZE_OF_FOO]>(rec) };
        Bytes::copy_from_slice(&b[..])
    }
    

    This pattern seems unwise in a few ordinary ways (manual computation of struct size, non portability due to byte order and padding, unnecessary use of Bytes), but it also has a critical problem:

    transmute is likely to lead to UB (undefined behavior).

    This is not something that should be allowed. Rust UB is orders of magnitude more dangerous than UB in C or C++. The only solution is zero tolerance for UB (or any other unsoundness).

    We should always be able to run our code inside Miri (which analyzes rust intermediate-representation for UB, memory unsafety, etc.) Any alarm from Miri should be treated as a critical bug.

    Things that have been identified as potential UB in our code:

    • transmute to struct containing bool.
    • transmute to/from struct containing implicit (compiler-added) padding for alignment.
    • transmute to/from struct with a length that is not a multiple of its alignment (also leads to padding, when included in arrays or other structs).
    opened by ericseppanen 27
  • Add test that repeatedly kills and restarts the pageserver

    Add test that repeatedly kills and restarts the pageserver

    This seems like a useful test. I'm marking it as Draft for now, because it failed a few times when I ran it on my laptop, with what seemed to be a corrupt layer file with 0 headers. We've seen that error in staging a few times, so maybe this will help to track it down.

    Another funny thing I noticed with this:

    test_output/test_pageserver_chaos/repo/tenants/e85da3a3e328435c935b743f3d9e7779/timelines/a193f1b07f7b8bac38d1a16afdc2913e:
    total 345436
    -rw-r----- 1 heikki heikki  5021696 May 18 11:34 000000000000000000000000000000000000-000000067F0000000100000A720000000021__0000000001698C48-000000000423FFD1
    -rw-r----- 1 heikki heikki  9953280 May 18 11:34 000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__0000000006DD0891-00000000072AFFA1
    -rw-r----- 1 heikki heikki   286720 May 18 11:34 000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000072AFFA1-00000000072CFF69
    -rw-r----- 1 heikki heikki  9592832 May 18 11:34 000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000072CFF69-000000000779EF89
    -rw-r----- 1 heikki heikki  9977856 May 18 11:34 000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__000000000779EF89-0000000007C78509
    -rw-r----- 1 heikki heikki  9863168 May 18 11:34 000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__0000000007C78509-00000000081455A9
    -rw-r----- 1 heikki heikki 10027008 May 18 11:34 000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000081455A9-0000000008624A39
    -rw-r----- 1 heikki heikki 10018816 May 18 11:34 000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__0000000008624A39-0000000008B04029
    -rw-r----- 1 heikki heikki  9936896 May 18 11:34 000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__0000000008B04029-0000000008FD8F19
    -rw-r----- 1 heikki heikki   294912 May 18 11:34 000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__0000000008FD8F19-0000000008FF8F09
    -rw-r----- 1 heikki heikki  9707520 May 18 11:34 000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__0000000008FF8F09-00000000094D3179
    -rw-r----- 1 heikki heikki  9871360 May 18 11:34 000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000094D3179-00000000099A7759
    

    A few of the files created are very small 286720 and 294912 bytes. The file seems intact, dump_layerfile read it OK. It's just small because the LSN range is very small. Why did the pageserver decide to flush out a tiny layer like that?

    opened by hlinnaka 25
  • run performance tests and show the results difference over time

    run performance tests and show the results difference over time

    This includes basic setup to run pgbench against arbitrary connection string. Possible targets are our staging environment and local installations. Currently pgbench invocations support two parameters, scale and duration. The question is what other options are possibly needed? Parameters are defined in a "matrix" to test all combinations, for tests I've used small scales and short durations. What scales and durations should we use in CI tests? The suggestion was to use 10 minute duration on every push to main, and one hour duration to run once a day. Thoughts?

    This also includes generation of a dead simple html page representing test results. Here is a screenshot: image

    Here we can see the results. One more question is that which threshold should be used to color the results? Because here the threshold is 0.05 and all the results were taken from the same revision, so it is rather noisy. 5-10% as a threshold? We can use different values for positive and negative thresholds.

    TODO:

    • [x] parametrize matrix via command line something like --durations "5,10,20"
    • [x] threshold for colors
    • [x] values for matrix to run in the CI

    Note, for simplicity it is supposed to put the results as a json files in a different repo and use guthub pages to serve generated output file. Probably this design wouldn't scale to thousands of revisions, but for now it is reasonable to use just that and do not support some highly available database. Also the page itself probably will be hard to use with large amount of revisions, so different runs should be splitted into tabs/separate pages

    opened by LizardWizzard 24
  • Replace etcd with custom neon_broker.

    Replace etcd with custom neon_broker.

    It is a simple and fast pub-sub message bus based on tonic. In this patch only safekeeper message is supported, but others can be easily added.

    Compilation now requires protoc to be installed. Installing protobuf-compiler package is fine for Debian/Ubuntu.

    opened by arssher 23
  • Tokio based walredo

    Tokio based walredo

    Re-implements the walredo process handling as an tokio managed process and task. I think the implementation is at feature parity but not all of these are easy to test:

    • [x] lazy start (without it, many tests will fail)
    • [x] kill and relaunch the process if any request fails (untested)
      • this only happens with timeouts, or io error, like with previous version
      • no checking for zero pages

    Timeout is now handled as "from the write starting to the read completing", which might not be correct with the request pipelining.

    #1700 is not handled, as in requests are not retried. That could be implemented.

    An interesting deadlock was discovered because of blocking code elsewhere: #2975. This was solved by adding a new runtime.

    While working on this, I noticed that tokio does not yet support vectored writes for child processes: https://github.com/tokio-rs/tokio/pull/5216 -- the implementation will benefit from upgrading to this future tokio version.

    This branch currently has primitive implementations of scale up multiprocess walredo controller and rebasing of #2880 (which hopefully didn't go too wrong). These are only up for discussion, as they share much of the code with "scale down".

    "Scale walredo" down to zero seemed like an interesting one to implement, I don't know if it's needed or what would be a sensible timeout for it. That could be moved to a follow-up PR as well.

    Cc: #2778.

    run-benchmarks 
    opened by koivunej 22
  • Refactor ObjectTags, intruducing a new concept called

    Refactor ObjectTags, intruducing a new concept called "relish"

    This clarifies - I hope - the abstractions between Repository and ObjectRepository. The ObjectTag struct was a mix of objects that could be accessed directly through the public Timeline interface, and also objects that were created and used internally by the ObjectRepository implementation and not supposed to be accessed directly by the callers. With the RelishTag separaate from ObjectTag, the distinction is more clear: RelishTag is used in the public interface, and ObjectTag is used internally between object_repository.rs and object_store.rs, and it contains the internal metadata object types.

    One awkward thing with the ObjectTag struct was that the Repository implementation had to distinguish between ObjectTags for relations, and track the size of the relation, while others were used to store "blobs". With the RelishTags, some relishes are considered "non-blocky", and the Repository implementation is expected to track their sizes, while others are stored as blobs. I'm not 100% happy with how RelishTag captures that either: it just knows that some relish kinds are blocky and some non-blocky, and there's an is_block() function to check that. But this does enable size-tracking for SLRUs, allowing us to treat them more like relations.

    This changes the way SLRUs are stored in the repository. Each SLRU segment, e.g. "pg_clog/0000", "pg_clog/0001", are now handled as a separate relish. This removes the need for the SLRU-specific put_slru_truncate() function in the Timeline trait. SLRU truncation is now handled by caling put_unlink() on the segment. This is more in line with how PostgreSQL stores SLRUs and handles their trunction.

    The SLRUs are "blocky", so they are accessed one 8k page at a time, and repository tracks their size. I considered an alternative design where we would treat each SLRU segment as non-blocky, and just store the whole file as one blob. Each SLRU segment is up to 256 kB in size, which isn't that large, so that might've worked fine, too. One reason I didn't do that is that it seems better to have the WAL redo routines be as close as possible to the PostgreSQL routines. It doesn't matter much in the repository, though; we have to track the size for relations anyway, so there's not much difference in whether we also do it for SLRUs.

    Review guide

    This is a big patch, but it's mostly very mechanical code churn, replacing ObjectTags with RelishTags. For the substance, please start from relishes.rs. It has comments explaining the concept.

    I ended up doing this when I started to rebase the 'layered-repo' branch over the non-rel changes in main. I didn't like how the SLRUs were treated specially, with the special way they were truncated, and this kind of ballooned from there. We had discussions on this earlier, when the ObjectTags were introduced in PR #268, but back then I was not able to articulate my criticism so clearly, and didn't have a good alternative in mind. I think this fixes some of the "layering violations" between the Repository/Timeline traits and the ObjectRepository/ObjectStore implementation of those traits.

    Does anyone come with a better name for a "relish"? Something that would capture "a relation, or one of the other files that we store in the repository"? I considered just "file", but then you might expect a more POSIX-like API. I also considered "object", but "object" is an overloaded term, and it would differ from the way the term "object" is currently used. (Currently, an ObjectTag refers to an individual page in a relation (or one of the other things that are not relations), whereas a relish represents the relation itself). I also considered just calling everything a relation, even things that are not relations in PostgreSQL, but that could also be confusing. I kind of like "relish", but I'm not sure how it sounds to others or if it the connotation to food is too strong :-).

    opened by hlinnaka 21
  • Remove code and test to generate flamegraph on GetPage requests.

    Remove code and test to generate flamegraph on GetPage requests.

    It was nice to have and useful at the time, but unfortunately the method used to gather the profiling data doesn't play nicely with 'async'. PR #3228 will turn 'get_page_at_lsn' function async, which will break the profiling support. Let's remove it, and re-introduce some kind of profiling later, using some different method, if we feel like we need it again.

    opened by hlinnaka 0
  • Minor cleanup of test_ondemand_download_timetravel test.

    Minor cleanup of test_ondemand_download_timetravel test.

    • Fix and improve comments
    • Rename 'physical_size' local variable to 'resident_size' for clarity.
    • Remove one 'unnecessary wait_for_upload' call. The 'wait_for_sk_commit_lsn_to_reach_remote_storage' call after shutting down compute is sufficient.
    opened by hlinnaka 1
  • Fix panics at compute_ctl:monitor

    Fix panics at compute_ctl:monitor

    Closes #1513

    It's possible to receive NULL values. I've removed the conditions to test it:

    main=> SELECT state, to_char(state_change, 'YYYY-MM-DD\"T\"HH24:MI:SS.US\"Z\"') AS state_change
                             FROM pg_stat_activity;
     state  |          state_change
    --------+---------------------------------
            |
            |
            |
     active | 2023-01-03"T"10:37:41.498769"Z"
            |
            |
            |
    (7 rows)
    

    to check that values are NULLs

    main=> SELECT state is null, to_char(state_change, 'YYYY-MM-DD\"T\"HH24:MI:SS.US\"Z\"') is null AS state_change
                             FROM pg_stat_activity;
     ?column? | state_change
    ----------+--------------
     t        | t
     t        | t
     t        | t
     f        | f
     t        | t
     t        | t
     t        | t
    (7 rows)
    
    opened by vadim2404 0
  • Preserve anyhow context during error conversion

    Preserve anyhow context during error conversion

    I have a test failure that shows

    
    Caused by:
        0: Failed to reconstruct a page image:
        1: Directory not empty (os error 39)
    

    but does not really show where exactly that happens. https://neon-github-public-dev.s3.amazonaws.com/reports/pr-3227/release/3823785365/index.html#categories/c0057473fc9ec8fb70876fd29a171ce8/7088dab272f2c7b7/?attachment=60fe6ed2add4d82d

    The PR aims to add more context in debugging that issue.

    opened by SomeoneToIgnore 0
  • Use Weak<Timeline> instead of Arc<Timeline> in walreceiver

    Use Weak instead of Arc in walreceiver

    Part of https://github.com/neondatabase/neon/pull/2899 and related issues about the timeline guard object. Generally, it's good to never leak Arc objects, especially in detached job loops that write into Arc<Timeline>. We're safe now since we're stopping walreceiver tasks manually using task_mgr api, but now it will be more safe.

    Part of https://github.com/neondatabase/neon/issues/2106 : to print the context of walreceiver in timeline's timeline.rs::wait_lsn method, I would need some reference to walreceiver (or a related channel) in the timeline. That changes the order of initialisation of the objects, and simpler way seems to create some Walreceiver struct and put it into Timeline, that would later be able to start or stop it. For all that to happen, having Weak around seems simpler.

    opened by SomeoneToIgnore 1
Owner
null
OBKV Table Client is Rust Library that can be used to access table data from OceanBase storage layer.

OBKV Table Client is Rust Library that can be used to access table data from OceanBase storage layer. Its access method is different from JDBC, it skips the SQL parsing layer, so it has significant performance advantage.

OceanBase 4 Nov 14, 2022
Plugin for macro-, mini-quad (quads) to save data in simple local storage using Web Storage API in WASM and local file on a native platforms.

quad-storage This is the crate to save data in persistent local storage in miniquad/macroquad environment. In WASM the data persists even if tab or br

ilya sheprut 9 Jan 4, 2023
Materialize simplifies application development with streaming data. Incrementally-updated materialized views - in PostgreSQL and in real time. Materialize is powered by Timely Dataflow.

Materialize is a streaming database for real-time applications. Get started Check out our getting started guide. About Materialize lets you ask questi

Materialize, Inc. 4.7k Jan 8, 2023
Open Data Access Layer that connect the whole world together

OpenDAL Open Data Access Layer that connect the whole world together. Status OpenDAL is in alpha stage and has been early adopted by databend. Welcome

Datafuse Labs 302 Jan 4, 2023
A Key-Value data storage system. - dorea db

Dorea DB ?? Dorea is a key-value data storage system. It is based on the Bitcask storage model Documentation | Crates.io | API Doucment 简体中文 | English

ZhuoEr Liu 112 Dec 2, 2022
SQLite compiled to WASM with pluggable data storage

wasm-sqlite SQLite compiled to WASM with pluggable data storage. Useful to save SQLite in e.g. Cloudflare Durable Objects (example: https://github.com

Markus Ast 36 Dec 7, 2022
Scalable and fast data store optimised for time series data such as financial data, events, metrics for real time analysis

OnTimeDB Scalable and fast data store optimised for time series data such as financial data, events, metrics for real time analysis OnTimeDB is a time

Stuart 2 Apr 5, 2022
🧰 The Rust SQL Toolkit. An async, pure Rust SQL crate featuring compile-time checked queries without a DSL. Supports PostgreSQL, MySQL, SQLite, and MSSQL.

SQLx ?? The Rust SQL Toolkit Install | Usage | Docs Built with ❤️ by The LaunchBadge team SQLx is an async, pure Rust† SQL crate featuring compile-tim

launchbadge 7.6k Dec 31, 2022
A tool for automated migrations for PostgreSQL, SQLite and MySQL.

Models Models is an implementation for a SQL migration management tool. It supports PostgreSQL, MySQL, and SQLite. Quick Start install the CLI by runn

null 45 Nov 16, 2022
Rust library to parse, deparse and normalize SQL queries using the PostgreSQL query parser

This Rust library uses the actual PostgreSQL server source to parse SQL queries and return the internal PostgreSQL parse tree.

pganalyze 37 Dec 18, 2022
Rust - Build a CRUD API with SQLX and PostgreSQL

In this article, you'll learn how to build a CRUD API in Rust using SQLX, Actix-web, and PostgreSQL. Learning how to build a CRUD API as a developer will equip you with valuable skills for building robust, maintainable, and scalable applications.

CODEVO 5 Feb 20, 2023
Native PostgreSQL driver for the Rust programming language

Rust-Postgres PostgreSQL support for Rust. postgres Documentation A native, synchronous PostgreSQL client. tokio-postgres Documentation A native, asyn

Steven Fackler 2.8k Jan 8, 2023
PostgreSQL procedural language handler for Clojure via SCI

pl/sci Status This is very much an experiment and I'm open to feedback on where to take this next. Build Requirements lein GraalVM CE 20.3.0 Java 11 c

Michiel Borkent 45 Nov 28, 2022
A Rust crate for writing servers that speak PostgreSQL's wire protocol

Convergence A Rust crate for writing servers that speak PostgreSQL's wire protocol. Additionally, the experimental convergence-arrow crate enables con

ReservoirDB 63 Jan 2, 2023
The Solana AccountsDb plugin for PostgreSQL database.

The solana-accountsdb-plugin-postgres crate implements a plugin storing account data to a PostgreSQL database to illustrate how a plugin can be develo

Lijun Wang 3 Jun 16, 2022
Generate type-checked Rust from your PostgreSQL.

Cornucopia Generate type checked Rust from your SQL Install | Example Cornucopia is a small CLI utility resting on postgres designed to facilitate Pos

null 206 Dec 25, 2022
Teach your PostgreSQL database how to speak MongoDB Wire Protocol

“If it looks like MongoDB, swims like MongoDB, and quacks like MongoDB, then it probably is PostgreSQL.” ?? Discord | Online Demo | Intro Video | Quic

Felipe Coury 261 Jun 18, 2023
Distributed, version controlled, SQL database with cryptographically verifiable storage, queries and results. Think git for postgres.

SDB - SignatureDB Distributed, version controlled, SQL database with cryptographically verifiable storage, queries and results. Think git for postgres

Fremantle Industries 5 Apr 26, 2022
Appendable and iterable key/list storage, backed by S3, written in rust

klstore Appendable and iterable key/list storage, backed by S3. General Overview Per key, a single writer appends to underlying storage, enabling many

Eric Thill 3 Sep 29, 2022