A high-performance storage engine for modern hardware and platforms.

Overview

PhotonDB

PhotonDB is designed from scratch to leverage the power of modern multi-core chips, storage devices, operating systems, and programming languages.

Features:

  • Latch-free data structures that scale to many cores.
  • Log-structured persistent stores, optimized for flash storage.
  • Asynchronous APIs and efficient file IO, powered by io_uring on Linux.
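
As a generic illustration of the first feature, the sketch below shows the latch-free pattern in plain Rust: threads retry a compare-and-swap instead of taking a lock. It is not PhotonDB code, and the function name `record_max` is hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Raises `slot` to `candidate` if `candidate` is larger, without taking a lock.
/// Returns the value that was in `slot` when the call decided what to do.
fn record_max(slot: &AtomicU64, candidate: u64) -> u64 {
    let mut current = slot.load(Ordering::Acquire);
    loop {
        if candidate <= current {
            // Nothing to do: the stored value is already at least as large.
            return current;
        }
        match slot.compare_exchange_weak(
            current,
            candidate,
            Ordering::AcqRel,
            Ordering::Acquire,
        ) {
            Ok(previous) => return previous,
            // Another thread changed the slot (or the CAS failed spuriously); retry.
            Err(actual) => current = actual,
        }
    }
}

fn main() {
    let slot = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (1..=8u64)
        .map(|i| {
            let slot = Arc::clone(&slot);
            thread::spawn(move || {
                record_max(&slot, i * 10);
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // Whatever the interleaving, the largest candidate wins.
    assert_eq!(slot.load(Ordering::Acquire), 80);
}
```

No thread ever blocks another here; a failed compare-and-swap only costs a retry.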

Design

Architecture

Progress

We have published the photondb crate v0.0.4. You can try some examples to see what it can do so far. Note that the current version is still too young to be used for anything serious.

Use the synchronous APIs:

[dependencies]
photondb = "0.0.4"

use photondb::{std::Table, Result, TableOptions};

fn main() -> Result<()> {
    let table = Table::open("/tmp/photondb", TableOptions::default())?;
    let key = vec![1];
    let val1 = vec![2];
    let val2 = vec![3];
    // Simple CRUD operations.
    table.put(&key, 1, &val1)?;
    table.delete(&key, 2)?;
    table.put(&key, 3, &val2)?;
    assert_eq!(table.get(&key, 1)?, Some(val1));
    assert_eq!(table.get(&key, 2)?, None);
    assert_eq!(table.get(&key, 3)?, Some(val2.clone()));
    let guard = table.pin();
    // Get the value without copy.
    assert_eq!(guard.get(&key, 3)?, Some(val2.as_slice()));
    // Iterate the tree page by page.
    let mut pages = guard.pages();
    while let Some(page) = pages.next()? {
        for (k, v) in page {
            println!("{:?} {:?}", k, v);
        }
    }
    Ok(())
}
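
The calls above pass a number as the second argument of `put`, `delete`, and `get`. Judging only from the assertions in the example, a read at a given number appears to see the newest write at or below it. The sketch below models that reading with a plain `BTreeMap`. `VersionedMap` is a hypothetical name, and none of this is PhotonDB's implementation.

```rust
use std::collections::BTreeMap;

/// Toy in-memory model: each key maps version -> Some(value) or None (tombstone).
#[derive(Default)]
struct VersionedMap {
    entries: BTreeMap<Vec<u8>, BTreeMap<u64, Option<Vec<u8>>>>,
}

impl VersionedMap {
    fn put(&mut self, key: &[u8], version: u64, value: &[u8]) {
        self.entries
            .entry(key.to_vec())
            .or_default()
            .insert(version, Some(value.to_vec()));
    }

    fn delete(&mut self, key: &[u8], version: u64) {
        // A delete is recorded as a tombstone at that version.
        self.entries
            .entry(key.to_vec())
            .or_default()
            .insert(version, None);
    }

    /// Returns the newest value written at or below `version`,
    /// or `None` if there is no such write or it is a tombstone.
    fn get(&self, key: &[u8], version: u64) -> Option<&[u8]> {
        let versions = self.entries.get(key)?;
        let (_, newest) = versions.range(..=version).next_back()?;
        newest.as_deref()
    }
}

fn main() {
    let mut map = VersionedMap::default();
    // Same sequence of operations as the example above.
    map.put(&[1], 1, &[2]);
    map.delete(&[1], 2);
    map.put(&[1], 3, &[3]);
    assert_eq!(map.get(&[1], 1), Some(&[2u8][..]));
    assert_eq!(map.get(&[1], 2), None);
    assert_eq!(map.get(&[1], 3), Some(&[3u8][..]));
}
```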

Use the asynchronous APIs:

[dependencies]
photondb = "0.0.4"
photonio = "0.0.5"

use photondb::{Result, Table, TableOptions};

#[photonio::main]
async fn main() -> Result<()> {
    let table = Table::open("/tmp/photondb", TableOptions::default()).await?;
    let key = vec![1];
    let val1 = vec![2];
    let val2 = vec![3];
    // Simple CRUD operations.
    table.put(&key, 1, &val1).await?;
    table.delete(&key, 2).await?;
    table.put(&key, 3, &val2).await?;
    assert_eq!(table.get(&key, 1).await?, Some(val1.clone()));
    assert_eq!(table.get(&key, 2).await?, None);
    assert_eq!(table.get(&key, 3).await?, Some(val2.clone()));
    let guard = table.pin();
    // Get the value without copy.
    assert_eq!(guard.get(&key, 3).await?, Some(val2.as_slice()));
    // Iterate the tree page by page.
    let mut pages = guard.pages();
    while let Some(page) = pages.next().await? {
        for (k, v) in page {
            println!("{:?} {:?}", k, v);
        }
    }
    Ok(())
}

Run the example with:

cargo +nightly-2022-10-01 run

Comments
  • panicked at 'The reclamied file 34 has active pages'

    [2022-11-08T12:24:23Z DEBUG photondb::page_store::strategy] Min decline rate strategy scores: [FileScore { score: 12937502.183856335, effective_rate: 0.9999016740837005, write_amplify: 1.6382464357250306e-6, file_id: 35 }, FileScore { score: 8510.157789679954, effective_rate: 0.9937804176113808, write_amplify: 3.208517710101981e-8, file_id: 31 }, FileScore { score: 25.21559370420057, effective_rate: 0.8765659312954175, write_amplify: 1.542835571968302e-7, file_id: 34 }]
    [2022-11-08T12:24:23Z INFO  photondb::page_store::jobs::gc] next version 37 pick 1 page files by strategy for gc
    [2022-11-08T12:24:23Z INFO  photondb::page_store::jobs::gc] Rewrite file 34 with 1 active pages, 21 dealloc pages, latest 1932 microseconds
    [2022-11-08T12:24:23Z INFO  photondb::page_store::jobs::gc] gc version finished, cost 2105 microseconds
    [2022-11-08T12:24:24Z INFO  photondb::page_store::write_buffer] Seal write buffer 36, 0 pending writers, allocated 127366104 bytes, usage 0.9489514231681824
    [2022-11-08T12:24:24Z DEBUG photondb::page_store::version] Install new buffer 37, buffer set range 36..38
    [2022-11-08T12:24:24Z DEBUG photondb::page_store::jobs::flush] write buffer 36 is flushable
    [2022-11-08T12:24:24Z DEBUG photondb::page_store::jobs::flush] begin flush version with next file id 37
    [2022-11-08T12:24:24Z INFO  photondb::page_store::jobs::flush] Flush write buffer 36 to page file, num_records: 26577 num_tombstone_records: 0 num_dealloc_pages: 24 num_skip_pages: 17747 data_size: 68751820
    [2022-11-08T12:24:24Z INFO  photondb::page_store::jobs::flush] Flush output file 36 with 1458578 bytes, 13 active pages, 24 dealloc pages, lasted 40412 microseconds

    thread 'tokio-runtime-worker' panicked at 'The reclamied file 34 has active pages', src/page_store/jobs/flush.rs:243:21
    stack backtrace:
       0: rust_begin_unwind
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/std/src/panicking.rs:556:5
       1: core::panicking::panic_fmt
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/panicking.rs:142:14
       2: photondb::page_store::jobs::flush::assert_files_are_deletable
                 at ./src/page_store/jobs/flush.rs:243:21
       3: photondb::page_store::jobs::flush::drain_obsoleted_files
                 at ./src/page_store/jobs/flush.rs:219:5
       4: photondb::page_store::jobs::flush::FlushCtx<E>::flush::{{closure}}
                 at ./src/page_store/jobs/flush.rs:101:31
       5: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/future/mod.rs:91:19
       6: photondb::page_store::jobs::flush::FlushCtx<E>::run::{{closure}}
                 at ./src/page_store/jobs/flush.rs:73:62
       7: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/future/mod.rs:91:19
    
    opened by w41ter 12
  • Does photondb support write ahead log?

    From a glance at the code, photondb does not support a WAL, and page store write buffer flushing is asynchronous. So after a page txn commit finishes, the page store doesn't guarantee that the modification is durable.

    Currently the table's get / put APIs take an lsn argument, but the write buffer isn't aware of the lsn, so it can't know a checkpoint_lsn (as in InnoDB) after a flush.

    Will photondb support a WAL in the future, and will the page store become aware of the lsn?
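
To make the question concrete, here is a minimal sketch of a write buffer that remembers the largest lsn it has accepted, so that a flush can publish a checkpoint lsn. All names (`LsnBuffer`, `Store`, `checkpoint_lsn`) are hypothetical, and this is not how photondb's page store is written. It also assumes buffers are flushed in the order they were filled.

```rust
/// Hypothetical write buffer that tracks the highest lsn it has accepted.
#[derive(Default)]
struct LsnBuffer {
    records: Vec<(u64, Vec<u8>)>,
    max_lsn: Option<u64>,
}

impl LsnBuffer {
    fn append(&mut self, lsn: u64, payload: &[u8]) {
        self.records.push((lsn, payload.to_vec()));
        self.max_lsn = Some(self.max_lsn.map_or(lsn, |m| m.max(lsn)));
    }
}

/// Hypothetical store: every record at or below `checkpoint_lsn` has been flushed.
#[derive(Default)]
struct Store {
    flushed: Vec<(u64, Vec<u8>)>,
    checkpoint_lsn: Option<u64>,
}

impl Store {
    /// Moves the buffer's records into the flushed set and advances the checkpoint.
    fn flush(&mut self, buffer: LsnBuffer) {
        self.flushed.extend(buffer.records);
        if let Some(lsn) = buffer.max_lsn {
            self.checkpoint_lsn = Some(self.checkpoint_lsn.map_or(lsn, |c| c.max(lsn)));
        }
    }
}

fn main() {
    let mut buffer = LsnBuffer::default();
    buffer.append(1, b"a");
    buffer.append(3, b"b");

    let mut store = Store::default();
    assert_eq!(store.checkpoint_lsn, None);
    store.flush(buffer);
    assert_eq!(store.checkpoint_lsn, Some(3));
}
```

With a log in place, recovery would only need to replay records whose lsn is above the checkpoint.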

    opened by SGZW 3
  • delete reference in Guard

    Iterating over table entries is not easy to solve because of the lifetime annotations in the codebase. I would like to start deleting unnecessary lifetime annotations in struct definitions so that it is easier and clearer to reason about the lifetimes of data structures without sacrificing safety or performance.

    This PR deletes the lifetime annotation in Guard.
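
One common way to drop a lifetime parameter is to hold a reference-counted handle instead of a borrow. The sketch below shows that general idea only; the PR itself may do it differently, and `Table`, `BorrowedGuard`, `OwnedGuard`, and `Entries` are hypothetical stand-ins rather than photondb's types.

```rust
use std::sync::Arc;

struct Table {
    name: String,
}

/// Borrowing guard: every struct that stores it must carry `'a` as well.
struct BorrowedGuard<'a> {
    table: &'a Table,
}

/// Owning guard: holds a reference-counted handle, so no lifetime parameter.
struct OwnedGuard {
    table: Arc<Table>,
}

/// An iterator can now store the guard without a lifetime parameter of its own.
struct Entries {
    guard: OwnedGuard,
    remaining: usize,
}

impl Iterator for Entries {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        Some(format!("{}#{}", self.guard.table.name, self.remaining))
    }
}

fn main() {
    let table = Arc::new(Table { name: "t".to_string() });

    let borrowed = BorrowedGuard { table: &table };
    assert_eq!(borrowed.table.name, "t");

    let entries = Entries {
        guard: OwnedGuard { table: Arc::clone(&table) },
        remaining: 2,
    };
    let items: Vec<String> = entries.collect();
    assert_eq!(items, vec!["t#1".to_string(), "t#0".to_string()]);
}
```

The trade-off in this variant is a reference-count update whenever a guard is created or dropped.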

    opened by ming535 3
  • panicked at 'File 20 is not exists'

    [2022-11-08T12:43:57Z DEBUG photondb::page_store::jobs::flush] write buffer 20 is not flushable, current buffer state BufferState { sealed: false, num_writer: 1, allocated: 12569928 } wait
    [2022-11-08T12:43:57Z INFO  photondb::page_store::write_buffer] Seal write buffer 20, 0 pending writers, allocated 125704360 bytes, usage 0.9365704655647278
    [2022-11-08T12:43:57Z DEBUG photondb::page_store::version] Install new buffer 21, buffer set range 20..22
    [2022-11-08T12:43:57Z DEBUG photondb::page_store::jobs::flush] write buffer 20 is flushable
    [2022-11-08T12:43:57Z DEBUG photondb::page_store::jobs::flush] begin flush version with next file id 21
    [2022-11-08T12:43:57Z INFO  photondb::page_store::jobs::flush] Flush write buffer 20 to page file, num_records: 8 num_tombstone_records: 0 num_dealloc_pages: 0 num_skip_pages: 0 data_size: 1144
    [2022-11-08T12:43:57Z INFO  photondb::page_store::jobs::flush] Flush output file 20 with 1456 bytes, 8 active pages, 0 dealloc pages, lasted 10698 microseconds
    [2022-11-08T12:43:57Z INFO  photondb::page_store::jobs::flush] Apply new file 20 dealloc pages []
    [2022-11-08T12:43:57Z DEBUG photondb::page_store::jobs::flush] Flush install version with new file 20, total 11818 microseconds
    [2022-11-08T12:43:57Z DEBUG photondb::page_store::version] Switch version from 20 to 21
    [2022-11-08T12:43:57Z DEBUG photondb::page_store::version] Release write buffer 20
    [2022-11-08T12:43:57Z DEBUG photondb::page_store::jobs::flush] Flush file 20 is success
    [2022-11-08T12:43:57Z DEBUG photondb::page_store::jobs::flush] write buffer 21 is not flushable, current buffer state BufferState { sealed: false, num_writer: 1, allocated: 12570656 } wait
    thread 'tokio-runtime-worker' panicked at 'File 20 is not exists', /home/patrick/workspace/montplex/photondb/src/page_store/page_txn.rs:85:17
    stack backtrace:
       0: rust_begin_unwind
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/std/src/panicking.rs:556:5
       1: core::panicking::panic_fmt
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/panicking.rs:142:14
       2: photondb::page_store::page_txn::Guard<E>::read_page::{{closure}}
                 at ./src/page_store/page_txn.rs:85:17
       3: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/future/mod.rs:91:19
       4: photondb::tree::TreeTxn<E>::walk_page::{{closure}}
                 at ./src/tree/mod.rs:261:59
       5: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/future/mod.rs:91:19
       6: photondb::tree::TreeTxn<E>::collect_consolidation_info::{{closure}}
                 at ./src/tree/mod.rs:670:9
       7: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/future/mod.rs:91:19
       8: photondb::tree::TreeTxn<E>::consolidate_page_impl::{{closure}}
                 at ./src/tree/mod.rs:603:58
       9: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/future/mod.rs:91:19
      10: photondb::tree::TreeTxn<E>::consolidate_page::{{closure}}
                 at ./src/tree/mod.rs:582:21
      11: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/future/mod.rs:91:19
      12: photondb::tree::TreeTxn<E>::try_rewrite_page::{{closure}}
                 at ./src/tree/mod.rs:572:36
      13: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/future/mod.rs:91:19
      14: photondb::tree::TreeTxn<E>::rewrite_page::{{closure}}
                 at ./src/tree/mod.rs:562:44
      15: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/future/mod.rs:91:19
      16: photondb::tree::<impl photondb::page_store::jobs::gc::RewritePage<E> for alloc::sync::Arc<photondb::tree::Tree>>::rewrite::{{closure}}
                 at ./src/tree/mod.rs:87:33
      17: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/future/mod.rs:91:19
      18: photondb::page_store::jobs::gc::GcCtx<E,R>::rewrite_active_pages::{{closure}}
                 at ./src/page_store/jobs/gc.rs:216:50
      19: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/future/mod.rs:91:19
      20: photondb::page_store::jobs::gc::GcCtx<E,R>::rewrite_file::{{closure}}
                 at ./src/page_store/jobs/gc.rs:176:13
      21: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/future/mod.rs:91:19
      22: photondb::page_store::jobs::gc::GcCtx<E,R>::rewrite_files::{{closure}}
                 at ./src/page_store/jobs/gc.rs:160:63
    
    
    opened by w41ter 3
  • Infinite loop when putting a kv pair bigger than the write buffer

        #[photonio::test]
        async fn crud() {
            env_logger::init();
    
            const OPTIONS: TableOptions = TableOptions {
                page_size: 64,
                page_chain_length: 2,
                page_store: PageStoreOptions {
                    write_buffer_capacity: 1 << 10,
                    max_write_buffers: 8,
                    use_direct_io: false,
                    max_space_amplification_percent: 10,
                    space_used_high: u64::MAX,
                },
            };
    
            let path = tempdir().unwrap();
            let table = Table::open(&path, OPTIONS).await.unwrap();
    
            let buf = 1u8.to_be_bytes().repeat(2048);
            table.put(&buf, 0, &buf).await.unwrap();
    
            table.close().await.unwrap();
        }
    

    It loops infinitely and outputs logs like:

    [2022-11-22T04:05:27Z INFO  photondb::page_store::write_buffer] Seal write buffer 2636, allocated 0 bytes, usage 0.0000
    [2022-11-22T04:05:27Z INFO  photondb::page_store::buffer_set] Stalling writes because we have 8 sealed write buffers (wait for flush)
    [2022-11-22T04:05:27Z INFO  photondb::page_store::page_file::file_builder] finish flush file: 2629
    [2022-11-22T04:05:27Z INFO  photondb::page_store::jobs::flush] Flush output file 2629 with 56 bytes, 0 active pages, 0 dealloc pages, lasted 5355 microseconds
    [2022-11-22T04:05:27Z INFO  photondb::page_store::jobs::flush] Flush write buffer 2630 to page file, num_records: 0 num_tombstone_records: 0 num_dealloc_pages: 0 num_skip_pages: 0 data_size: 0
    [2022-11-22T04:05:27Z INFO  photondb::page_store::write_buffer] Seal write buffer 2637, allocated 0 bytes, usage 0.0000
    [2022-11-22T04:05:27Z INFO  photondb::page_store::buffer_set] Stalling writes because we have 8 sealed write buffers (wait for flush)
    [2022-11-22T04:05:27Z INFO  photondb::page_store::page_file::file_builder] finish flush file: 2630
    [2022-11-22T04:05:27Z INFO  photondb::page_store::jobs::flush] Flush output file 2630 with 56 bytes, 0 active pages, 0 dealloc pages, lasted 5402 microseconds
    [2022-11-22T04:05:27Z INFO  photondb::page_store::jobs::flush] Flush write buffer 2631 to page file, num_records: 0 num_tombstone_records: 0 num_dealloc_pages: 0 num_skip_pages: 0 data_size: 0
    [2022-11-22T04:05:27Z INFO  photondb::page_store::write_buffer] Seal write buffer 2638, allocated 0 bytes, usage 0.0000
    [2022-11-22T04:05:27Z INFO  photondb::page_store::buffer_set] Stalling writes because we have 8 sealed write buffers (wait for flush)
    [2022-11-22T04:05:27Z INFO  photondb::page_store::page_file::file_builder] finish flush file: 2631
    [2022-11-22T04:40:53Z INFO  photondb::page_store::jobs::flush] Flush output file 2631 with 56 bytes, 0 active pages, 0 dealloc pages, lasted 2126118146 microseconds
    [2022-11-22T04:40:53Z INFO  photondb::page_store::jobs::flush] Flush write buffer 2632 to page file, num_records: 0 num_tombstone_records: 0 num_dealloc_pages: 0 num_skip_pages: 0 data_size: 0
    [2022-11-22T04:42:35Z INFO  photondb::page_store::page_file::file_builder] finish flush file: 2632
    [2022-11-22T04:42:35Z INFO  photondb::page_store::write_buffer] Seal write buffer 2639, allocated 0 bytes, usage 0.0000
    [2022-11-22T04:42:35Z INFO  photondb::page_store::buffer_set] Stalling writes because we have 8 sealed write buffers (wait for flush)
    [2022-11-22T04:42:35Z INFO  photondb::page_store::jobs::flush] Flush output file 2632 with 56 bytes, 0 active pages, 0 dealloc pages, lasted 102241569 microseconds
    [2022-11-22T04:42:35Z INFO  photondb::page_store::jobs::flush] Flush write buffer 2633 to page file, num_records: 0 num_tombstone_records: 0 num_dealloc_pages: 0 num_skip_pages: 0 data_size: 0
    [2022-11-22T04:42:47Z INFO  photondb::page_store::write_buffer] Seal write buffer 2640, allocated 0 bytes, usage 0.0000
    [2022-11-22T04:42:47Z INFO  photondb::page_store::buffer_set] Stalling writes because we have 8 sealed write buffers (wait for flush)
    [2022-11-22T04:42:47Z INFO  photondb::page_store::page_file::file_builder] finish flush file: 2633
    [2022-11-22T04:42:47Z INFO  photondb::page_store::jobs::flush] Flush output file 2633 with 56 bytes, 0 active pages, 0 dealloc pages, lasted 11698799 microseconds
    [2022-11-22T04:42:47Z INFO  photondb::page_store::jobs::flush] Flush write buffer 2634 to page file, num_records: 0 num_tombstone_records: 0 num_dealloc_pages: 0 num_skip_pages: 0 data_size: 0
    [2022-11-22T04:42:48Z INFO  photondb::page_store::write_buffer] Seal write buffer 2641, allocated 0 bytes, usage 0.0000
    [2022-11-22T04:42:48Z INFO  photondb::page_store::buffer_set] Stalling writes because we have 8 sealed write buffers (wait for flush)
    [2022-11-22T04:42:48Z INFO  photondb::page_store::page_file::file_builder] finish flush file: 2634
    [2022-11-22T04:42:48Z INFO  photondb::page_store::jobs::flush] Flush output file 2634 with 56 bytes, 0 active pages, 0 dealloc pages, lasted 898807 microseconds
    [2022-11-22T04:42:48Z INFO  photondb::page_store::jobs::flush] Flush write buffer 2635 to page file, num_records: 0 num_tombstone_records: 0 num_dealloc_pages: 0 num_skip_pages: 0 data_size: 0
    [2022-11-22T04:42:49Z INFO  photondb::page_store::write_buffer] Seal write buffer 2642, allocated 0 bytes, usage 0.0000
    [2022-11-22T04:42:49Z INFO  photondb::page_store::buffer_set] Stalling writes because we have 8 sealed write buffers (wait for flush)
    [2022-11-22T04:42:49Z INFO  photondb::page_store::page_file::file_builder] finish flush file: 2635
    [2022-11-22T04:42:49Z INFO  photondb::page_store::jobs::flush] Flush output file 2635 with 56 bytes, 0 active pages, 0 dealloc pages, lasted 835919 microseconds
    [2022-11-22T04:42:49Z INFO  photondb::page_store::jobs::flush] Flush write buffer 2636 to page file, num_records: 0 num_tombstone_records: 0 num_dealloc_pages: 0 num_skip_pages: 0 data_size: 0
    [2022-11-22T04:42:50Z INFO  photondb::page_store::write_buffer] Seal write buffer 2643, allocated 0 bytes, usage 0.0000
    [2022-11-22T04:42:50Z INFO  photondb::page_store::buffer_set] Stalling writes because we have 8 sealed write buffers (wait for flush)
    [2022-11-22T04:42:50Z INFO  photondb::page_store::page_file::file_builder] finish flush file: 2636
    [2022-11-22T04:42:50Z INFO  photondb::page_store::jobs::flush] Flush output file 2636 with 56 bytes, 0 active pages, 0 dealloc pages, lasted 850552 microseconds
    [2022-11-22T04:42:50Z INFO  photondb::page_store::jobs::flush] Flush write buffer 2637 to page file, num_records: 0 num_tombstone_records: 0 num_dealloc_pages: 0 num_skip_pages: 0 data_size: 0
    [2022-11-22T04:42:51Z INFO  photondb::page_store::write_buffer] Seal write buffer 2644, allocated 0 bytes, usage 0.0000
    [2022-11-22T04:42:51Z INFO  photondb::page_store::buffer_set] Stalling writes because we have 8 sealed write buffers (wait for flush)
    [2022-11-22T04:42:51Z INFO  photondb::page_store::page_file::file_builder] finish flush file: 2637
    [2022-11-22T04:42:51Z INFO  photondb::page_store::jobs::flush] Flush output file 2637 with 56 bytes, 0 active pages, 0 dealloc pages, lasted 1130855 microseconds
    [2022-11-22T04:42:51Z INFO  photondb::page_store::jobs::flush] Flush write buffer 2638 to page file, num_records: 0 num_tombstone_records: 0 num_dealloc_pages: 0 num_skip_pages: 0 data_size: 0
    [2022-11-22T04:43:09Z INFO  photondb::page_store::write_buffer] Seal write buffer 2645, allocated 0 bytes, usage 0.0000
    [2022-11-22T04:43:09Z INFO  photondb::page_store::buffer_set] Stalling writes because we have 8 sealed write buffers (wait for flush)
    [2022-11-22T04:43:09Z INFO  photondb::page_store::page_file::file_builder] finish flush file: 2638
    [2022-11-22T04:43:09Z INFO  photondb::page_store::jobs::flush] Flush output file 2638 with 56 bytes, 0 active pages, 0 dealloc pages, lasted 17724840 microseconds
    [2022-11-22T04:43:09Z INFO  photondb::page_store::jobs::flush] Flush write buffer 2639 to page file, num_records: 0 num_tombstone_records: 0 num_dealloc_pages: 0 num_skip_pages: 0 data_size: 0
    [2022-11-22T04:43:11Z INFO  photondb::page_store::write_buffer] Seal write buffer 2646, allocated 0 bytes, usage 0.0000
    [2022-11-22T04:43:11Z INFO  photondb::page_store::buffer_set] Stalling writes because we have 8 sealed write buffers (wait for flush)
    [2022-11-22T04:43:11Z INFO  photondb::page_store::page_file::file_builder] finish flush file: 2639
    [2022-11-22T04:43:11Z INFO  photondb::page_store::jobs::flush] Flush output file 2639 with 56 bytes, 0 active pages, 0 dealloc pages, lasted 2502016 microseconds
    [2022-11-22T04:43:11Z INFO  photondb::page_store::jobs::flush] Flush write buffer 2640 to page file, num_records: 0 num_tombstone_records: 0 num_dealloc_pages: 0 num_skip_pages: 0 data_size: 0
    [2022-11-22T04:43:27Z INFO  photondb::page_store::page_file::file_builder] finish flush file: 2640
    [2022-11-22T04:43:29Z INFO  photondb::page_store::jobs::flush] Flush output file 2640 with 56 bytes, 0 active pages, 0 dealloc pages, lasted 18111455 microseconds
    [2022-11-22T04:43:32Z INFO  photondb::page_store::write_buffer] Seal write buffer 2647, allocated 0 bytes, usage 0.0000
    [2022-11-22T04:43:32Z INFO  photondb::page_store::buffer_set] Stalling writes because we have 8 sealed write buffers (wait for flush)
    [2022-11-22T04:43:38Z INFO  photondb::page_store::jobs::flush] Flush write buffer 2641 to page file, num_records: 0 num_tombstone_records: 0 num_dealloc_pages: 0 num_skip_pages: 0 data_size: 0
    [2022-11-22T04:43:41Z INFO  photondb::page_store::page_file::file_builder] finish flush file: 2641
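
The log above shows buffers being sealed with 0 bytes allocated over and over, which suggests the record can never fit in a fresh buffer. One possible guard, sketched under that assumption, rejects a request up front when it exceeds the buffer capacity instead of sealing and retrying. `Buffer`, `AllocError`, and `alloc` are hypothetical names, not photondb's API.

```rust
#[derive(Debug, PartialEq)]
enum AllocError {
    /// The record can never fit, so sealing and retrying would loop forever.
    TooLarge { requested: usize, capacity: usize },
    /// The record would fit in an empty buffer; seal this one and retry on the next.
    Full,
}

struct Buffer {
    capacity: usize,
    allocated: usize,
}

impl Buffer {
    fn new(capacity: usize) -> Self {
        Buffer { capacity, allocated: 0 }
    }

    /// Reserves `size` bytes and returns the offset of the reservation.
    fn alloc(&mut self, size: usize) -> Result<usize, AllocError> {
        if size > self.capacity {
            return Err(AllocError::TooLarge {
                requested: size,
                capacity: self.capacity,
            });
        }
        if size > self.capacity - self.allocated {
            return Err(AllocError::Full);
        }
        let offset = self.allocated;
        self.allocated += size;
        Ok(offset)
    }
}

fn main() {
    // Same capacity as `write_buffer_capacity: 1 << 10` in the test above.
    let mut buffer = Buffer::new(1 << 10);
    assert_eq!(buffer.alloc(512), Ok(0));
    // Fits in an empty buffer, so the caller may seal and retry.
    assert_eq!(buffer.alloc(1024), Err(AllocError::Full));
    // A 2048-byte key plus a 2048-byte value can never fit.
    assert_eq!(
        buffer.alloc(4096),
        Err(AllocError::TooLarge { requested: 4096, capacity: 1024 })
    );
}
```

Distinguishing the two errors lets the caller retry only in the case where a retry can succeed.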
    
    opened by zojw 2
  • segmentation fault in SortedPageIter

    (gdb) bt
    #0  __memcmp_avx2_movbe () at ../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:248
    #1  0x00005632e6470eb6 in <u8 as core::slice::cmp::SliceOrd>::compare (left=..., right=...) at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/slice/cmp.rs:198
    #2  0x00005632e647107d in core::slice::cmp::<impl core::cmp::Ord for [T]>::cmp (self=..., other=...) at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/slice/cmp.rs:41
    #3  0x00005632e63477c2 in <photondb::page::data::Key as core::cmp::Ord>::cmp (self=0x7fe0380182b0, other=0x7fe038018248) at photondb/src/page/data.rs:18
    #4  0x00005632e6379842 in <photondb::page::iter::OrderedIter<I> as core::cmp::Ord>::cmp (self=0x7fe038018270, other=0x7fe038018208) at photondb/src/page/iter.rs:158
    #5  0x00005632e63798f4 in <photondb::page::iter::OrderedIter<I> as core::cmp::PartialOrd>::partial_cmp (self=0x7fe038018270, other=0x7fe038018208) at photondb/src/page/iter.rs:176
    #6  0x00005632e6378ec4 in core::cmp::PartialOrd::le (self=0x7fe038018270, other=0x7fe038018208) at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/cmp.rs:1139
    #7  0x00005632e6387551 in <core::cmp::Reverse<T> as core::cmp::PartialOrd>::le (self=0x7fe038018208, other=0x7fe038018270) at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/cmp.rs:616
    #8  0x00005632e63896dc in core::cmp::impls::<impl core::cmp::PartialOrd<&B> for &A>::le () at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/alloc/src/collections/binary_heap.rs:598
    #9  alloc::collections::binary_heap::BinaryHeap<T>::sift_down_range (self=0x7fe051841bc0, pos=0, end=63) at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/alloc/src/collections/binary_heap.rs:598
    #10 0x00005632e638a3ad in alloc::collections::binary_heap::BinaryHeap<T>::sift_down (self=0x7fe051841bc0, pos=0) at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/alloc/src/collections/binary_heap.rs:628
    #11 0x00005632e635b00b in <alloc::collections::binary_heap::PeekMut<T> as core::ops::drop::Drop>::drop (self=0x7fe0518407d8)
        at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/alloc/src/collections/binary_heap.rs:297
    #12 0x00005632e63543db in core::ptr::drop_in_place<alloc::collections::binary_heap::PeekMut<core::cmp::Reverse<photondb::page::iter::OrderedIter<photondb::page::sorted_page::SortedPageIter<photondb::page::data::Key,photondb::page::data::Value>>>>> () at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/ptr/mod.rs:487
    #13 0x00005632e637a49c in <photondb::page::iter::MergingIter<I> as core::iter::traits::iterator::Iterator>::next (self=0x7fe051841bc0) at photondb/src/page/iter.rs:260
    #14 0x00005632e63a14f9 in <photondb::tree::page::MergingPageIter<K,V> as core::iter::traits::iterator::Iterator>::next (self=0x7fe051841bc0) at photondb/src/tree/page.rs:93
    #15 0x00005632e63a118a in <&mut I as core::iter::traits::iterator::Iterator>::next (self=0x7fe051840978) at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/iter/traits/iterator.rs:3878
    #16 0x00005632e6380bcb in <photondb::tree::page::MergingLeafPageIter as core::iter::traits::iterator::Iterator>::next (self=0x7fe051841bc0) at photondb/src/tree/page.rs:156
    #17 0x00005632e63a11ba in <&mut I as core::iter::traits::iterator::Iterator>::next (self=0x7fe051840b80) at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/iter/traits/iterator.rs:3878
    #18 0x00005632e633d2fd in photondb::page::sorted_page::SortedPageBuilder<I>::build (self=..., page=0x7fe051841b30) at photondb/src/page/sorted_page.rs:72
    #19 0x00005632e5ffa9d6 in photondb::tree::TreeTxn<E>::consolidate_page_impl::{{closure}} () at photondb/src/tree/mod.rs:608
    #20 0x00005632e60308b3 in <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll (self=..., cx=0x7fe05184fd68)
        at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/future/mod.rs:91
    #21 0x00005632e5ff779b in photondb::tree::TreeTxn<E>::consolidate_page::{{closure}} () at photondb/src/tree/mod.rs:582
    #22 0x00005632e6030433 in <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll (self=..., cx=0x7fe05184fd68)
        at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/future/mod.rs:91
    #23 0x00005632e5ffcd58 in photondb::tree::TreeTxn<E>::consolidate_and_restructure_page::{{closure}} () at photondb/src/tree/mod.rs:685
    #24 0x00005632e602a3cf in <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll (self=..., cx=0x7fe05184fd68)
        at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/future/mod.rs:91
    #25 0x00005632e6000455 in photondb::tree::TreeTxn<E>::try_write::{{closure}} () at photondb/src/tree/mod.rs:182
    #26 0x00005632e602bd5f in <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll (self=..., cx=0x7fe05184fd68)
        at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/future/mod.rs:91
    #27 0x00005632e5ffe22a in photondb::tree::TreeTxn<E>::write::{{closure}} () at photondb/src/tree/mod.rs:131
    #28 0x00005632e60300df in <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll (self=..., cx=0x7fe05184fd68)
        at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/future/mod.rs:91
    #29 0x00005632e5feced1 in photondb::raw::table::Table<E>::put::{{closure}} () at photondb/src/raw/table.rs:74
    #30 0x00005632e602f33c in <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll (self=..., cx=0x7fe05184fd68)
        at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/future/mod.rs:91
    #31 0x00005632e614e5e3 in photondb_tools::stress::write_task::{{closure}} () at photondb-tools/src/stress/mod.rs:289
    #32 0x00005632e602f7fc in <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll (self=..., cx=0x7fe05184fd68)
        at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/future/mod.rs:91
    #33 0x00005632e60b7e7d in photonio_uring::task::raw::poll::{{closure}} () at /home/robi/.cargo/git/checkouts/photonio-4d2b22930312944b/8aab9a4/photonio-uring/src/task/raw.rs:209
    #34 0x00005632e5fee233 in <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=..., _args=())
        at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/panic/unwind_safe.rs:271
    #35 0x00005632e601b5cd in std::panicking::try::do_call (data=0x7fe05184fc70 "0 ") at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/std/src/panicking.rs:464
    #36 0x00005632e601c2ab in __rust_try ()
    #37 0x00005632e601ac15 in std::panicking::try (f=...) at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/std/src/panicking.rs:428
    #38 0x00005632e60b97fa in std::panic::catch_unwind (f=...) at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/std/src/panic.rs:137
    #39 0x00005632e60b7b19 in photonio_uring::task::raw::poll (head=0x7fe05184fee8) at /home/robi/.cargo/git/checkouts/photonio-4d2b22930312944b/8aab9a4/photonio-uring/src/task/raw.rs:209
    #40 0x00005632e64862b0 in photonio_uring::task::raw::Head::poll (self=0x7fe04c001ff0, this=0x7fe05184fee8) at src/task/raw.rs:30
    #41 0x00005632e647870a in photonio_uring::task::Task::poll (self=0x7fe05184fee8) at src/task/mod.rs:61
    #42 0x00005632e647ca1d in photonio_uring::runtime::worker::Local::poll (self=0x7fe0518502b8) at src/runtime/worker.rs:82
    #43 0x00005632e647c110 in photonio_uring::runtime::worker::Local::run (self=0x7fe0518502b8) at src/runtime/worker.rs:53
    #44 0x00005632e647d7a3 in photonio_uring::runtime::worker::enter::{{closure}} () at src/runtime/worker.rs:154
    #45 0x00005632e6479b04 in scoped_tls::ScopedKey<T>::set (self=0x5632e6841e58 <photonio_uring::runtime::worker::CURRENT>, t=0x7fe0518502b8, f=...)
        at /home/robi/.cargo/registry/src/mirrors.tuna.tsinghua.edu.cn-df7c3c540f42cdbd/scoped-tls-1.0.1/src/lib.rs:137
    #46 0x00005632e647d735 in photonio_uring::runtime::worker::enter (local=<error reading variable: Cannot access memory at address 0x7fdfc4532c74>) at src/runtime/worker.rs:154
    #47 0x00005632e647d6a3 in photonio_uring::runtime::worker::Worker::launch::{{closure}} () at src/runtime/worker.rs:129
    #48 0x00005632e6485373 in std::sys_common::backtrace::__rust_begin_short_backtrace (f=...) at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/std/src/sys_common/backtrace.rs:122
    #49 0x00005632e6479684 in std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}} () at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/std/src/thread/mod.rs:514
    #50 0x00005632e6485334 in <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=..., _args=())
        at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/panic/unwind_safe.rs:271
    #51 0x00005632e647e379 in std::panicking::try::do_call (data=0x7fe051851140 "\005") at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/std/src/panicking.rs:464
    #52 0x00005632e647ec5b in __rust_try ()
    #53 0x00005632e647e1e3 in std::panicking::try (f=...) at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/std/src/panicking.rs:428
    #54 0x00005632e64814b0 in std::panic::catch_unwind (f=...) at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/std/src/panic.rs:137
    #55 0x00005632e6479446 in std::thread::Builder::spawn_unchecked_::{{closure}} () at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/std/src/thread/mod.rs:513
    #56 0x00005632e647523f in core::ops::function::FnOnce::call_once{{vtable-shim}} () at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/ops/function.rs:251
    #57 0x00005632e662b773 in <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once () at library/alloc/src/boxed.rs:1938
    #58 <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once () at library/alloc/src/boxed.rs:1938
    #59 std::sys::unix::thread::Thread::new::thread_start () at library/std/src/sys/unix/thread.rs:108
    #60 0x00007fe0527a4609 in start_thread (arg=<optimized out>) at pthread_create.c:477
    #61 0x00007fe052574133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
    

    git hash: 8140ae7b796b375302502e9a9d024a4041f4a83f workload: stress

    opened by zojw 2
  • page_store: skip dealloc_page_addr that point to inactive files during flush

    It seems we can skip dealloc page hints that point to already-cleaned files during flush, but I am not sure whether some other problem is what causes those "dealloc page hint" records to be produced.

    ptal @w41ter

    opened by zojw 2
  • panicked at 'assertion failed: !score.is_nan()'

    thread 'photonio-worker/6' panicked at 'assertion failed: !score.is_nan()', src/page_store/strategy/mod.rs:54:9
    stack backtrace:
       0: rust_begin_unwind
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/std/src/panicking.rs:556:5
       1: core::panicking::panic_fmt
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/panicking.rs:142:14
       2: core::panicking::panic
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/panicking.rs:48:5
       3: <photondb::page_store::strategy::MinDeclineRateStrategy as photondb::page_store::strategy::GcPickStrategy>::collect
                 at /home/robi/Code/rust/photondb/src/page_store/strategy/mod.rs:54:9
       4: photondb::page_store::jobs::gc::GcCtx<E,R>::gc::{{closure}}
                 at /home/robi/Code/rust/photondb/src/page_store/jobs/gc.rs:83:13
       5: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/future/mod.rs:91:19
       6: photondb::page_store::jobs::gc::GcCtx<E,R>::run::{{closure}}
                 at /home/robi/Code/rust/photondb/src/page_store/jobs/gc.rs:65:30
       7: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/future/mod.rs:91:19
       8: photonio_uring::task::raw::poll::{{closure}}
                 at /home/robi/.cargo/git/checkouts/photonio-4d2b22930312944b/fa68ae8/photonio-uring/src/task/raw.rs:202:65
       9: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/core/src/panic/unwind_safe.rs:271:9
      10: std::panicking::try::do_call
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/std/src/panicking.rs:464:40
      11: std::panicking::try
                 at /rustc/8ce3204af9463db3192ea1eb31c45c2f6d4b5ae6/library/std/src/panicking.rs:428:19
    
    opened by zojw 2
  • unit test failed `buffer_set_write_buffer_switch_release`

    failures:
    
    ---- page_store::version::tests::buffer_set_write_buffer_switch_release stdout ----
    thread 'page_store::version::tests::buffer_set_write_buffer_switch_release' panicked at 'assertion failed: `(left == right)`
      left: `3`,
     right: `1`', src/page_store/version.rs:590:9
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    
    opened by huachaohuang 2
  • panicked at '`async fn` resumed after completion', src/page_store/jobs/gc.rs:63:61

    This is intermittent and produces a panic; marking it for later investigation.

    thread 'photonio-worker/0' panicked at 'async fn resumed after completion', /photondb/src/page_store/jobs/gc.rs:63:61

    rust_panic (@rust_panic:7)
    std::panicking::rust_panic_with_hook (@std::panicking::rust_panic_with_hook:112)
    std::panicking::begin_panic_handler::{{closure}} (@std::panicking::begin_panic_handler::{{closure}}:40)
    std::sys_common::backtrace::__rust_end_short_backtrace (@std::sys_common::backtrace::__rust_end_short_backtrace:10)
    rust_begin_unwind (@rust_begin_unwind:30)
    core::panicking::panic_fmt (@core::panicking::panic_fmt:13)
    core::panicking::panic (@core::panicking::panic:16)
    photondb::page_store::jobs::gc::GcCtx<E,R>::run::{{closure}} (/home/robi/Code/rust/photondb/src/page_store/jobs/gc.rs:63)
    <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll (@<core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll:22)
    photonio_uring::task::raw::poll::{{closure}} (/home/robi/.cargo/git/checkouts/photonio-4d2b22930312944b/3298981/photonio-uring/src/task/raw.rs:205)
    <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (@<core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once:7)
    std::panicking::try::do_call (@std::panicking::try::do_call:29)
    __rust_try (@__rust_try:10)
    std::panicking::try (@std::panicking::try:22)
    std::panic::catch_unwind (@std::panic::catch_unwind:9)
    photonio_uring::task::raw::poll (/home/robi/.cargo/git/checkouts/photonio-4d2b22930312944b/3298981/photonio-uring/src/task/raw.rs:205)
    photonio_uring::task::raw::Head::poll (/home/robi/.cargo/git/checkouts/photonio-4d2b22930312944b/3298981/photonio-uring/src/task/raw.rs:30)
    photonio_uring::task::Task::poll (/home/robi/.cargo/git/checkouts/photonio-4d2b22930312944b/3298981/photonio-uring/src/task/mod.rs:61)
    photonio_uring::runtime::worker::Local::poll (/home/robi/.cargo/git/checkouts/photonio-4d2b22930312944b/3298981/photonio-uring/src/runtime/worker.rs:82)
    photonio_uring::runtime::worker::Local::run (/home/robi/.cargo/git/checkouts/photonio-4d2b22930312944b/3298981/photonio-uring/src/runtime/worker.rs:53)
    photonio_uring::runtime::worker::enter::{{closure}} (/home/robi/.cargo/git/checkouts/photonio-4d2b22930312944b/3298981/photonio-uring/src/runtime/worker.rs:154)
    scoped_tls::ScopedKey<T>::set (/home/robi/.cargo/registry/src/mirrors.tuna.tsinghua.edu.cn-df7c3c540f42cdbd/scoped-tls-1.0.0/src/lib.rs:137)
    photonio_uring::runtime::worker::enter (/home/robi/.cargo/git/checkouts/photonio-4d2b22930312944b/3298981/photonio-uring/src/runtime/worker.rs:154)
    photonio_uring::runtime::worker::Worker::launch::{{closure}} (/home/robi/.cargo/git/checkouts/photonio-4d2b22930312944b/3298981/photonio-uring/src/runtime/worker.rs:129)
    std::sys_common::backtrace::__rust_begin_short_backtrace (@std::sys_common::backtrace::__rust_begin_short_backtrace:10)
    std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}} (@std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}:10)
    <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (@<core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once:10)
    std::panicking::try::do_call (@std::panicking::try::do_call:34)
    __rust_try (@__rust_try:10)
    std::panicking::try (@std::panicking::try:27)
    std::panic::catch_unwind (@std::panic::catch_unwind:12)
    std::thread::Builder::spawn_unchecked_::{{closure}} (@std::thread::Builder::spawn_unchecked_::{{closure}}:83)
    core::ops::function::FnOnce::call_once{{vtable.shim}} (@core::ops::function::FnOnce::call_once{{vtable.shim}}:6)
    <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once (@std::sys::unix::thread::Thread::new::thread_start:15)
    <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once (@std::sys::unix::thread::Thread::new::thread_start:13)
    std::sys::unix::thread::Thread::new::thread_start (@std::sys::unix::thread::Thread::new::thread_start:11)
    start_thread (@start_thread:48)
    __clone (@__clone:26)
    
    question 
    opened by zojw 2
  • MS patents

    Hi guys. Are you aware of this patent: https://patents.google.com/patent/US20130346725A1/en ? It seems the algorithm you are trying to implement is owned by MS, and the patent is still active.

    opened by laa 2
  • lack of LockFile to avoid reopen the same path at the same time

    Opening the same path twice corrupts the manifest:

    [INFO  photondb::page_store::manifest] roll manifest to "/tmp/cmp_test/MANIFEST_1"
    [INFO  photondb::raw::tree_test] open table
    [INFO  photondb::page_store::manifest] clean obsolete manifest: "/tmp/cmp_test/MANIFEST_1" , None
    [INFO  photondb::page_store::manifest] set manifest to 1
    thread 'photonio-worker/0' panicked at 'open manifest fail 1: Os { code: 2, kind: NotFound, message: "No such file or directory" }', photondb/src/page_store/manifest.rs:174:18
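One possible shape for such a guard, sketched with the standard library only. The `DirLock` type and the `LOCK` file name are hypothetical and are not part of photondb; the sketch relies on `OpenOptions::create_new`, which fails if the file already exists.

```rust
use std::fs::{self, File, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical guard: holds `<dir>/LOCK` for as long as the table is open.
pub struct DirLock {
    path: PathBuf,
    _file: File,
}

impl DirLock {
    /// Fails with `AlreadyExists` if another opener already holds the lock.
    pub fn acquire(dir: &Path) -> io::Result<DirLock> {
        let path = dir.join("LOCK");
        // `create_new` creates the file atomically and errors if it exists.
        let file = OpenOptions::new()
            .write(true)
            .create_new(true)
            .open(&path)?;
        Ok(DirLock { path, _file: file })
    }
}

impl Drop for DirLock {
    fn drop(&mut self) {
        // Best effort: release the lock when the guard goes away.
        let _ = fs::remove_file(&self.path);
    }
}
```

A second `DirLock::acquire` on the same directory returns an error instead of rolling the manifest underneath the first opener. Note that this simple scheme leaves a stale `LOCK` file behind after a crash; an advisory OS-level file lock would avoid that, at the cost of platform-specific code.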
    
    opened by zojw 0
  • Some thoughts on PageTxn#alloc_page's contention

    Some thoughts on PageTxn#alloc_page's contention.

    Right now, every Table#put uses PageTxn#alloc_page.

    All of these calls eventually go through WriteBuffer#alloc_size, which contends on BufferState. As a result, all write requests end up serialized.

    The problem is that WriteBuffer is used as both a Cache Layer and the Flush Buffer in the LSS layer from the paper.

    I am not sure if this is in the plan, but it seems that we should:

    1. Separate WriteBuffer into a Cache Layer and a Flush Buffer.
    2. Copy data from the Cache Layer into the Flush Buffer when a "flush" is needed (an SMO change, a user-requested flush if we have the API, or a background flush).
    3. Let normal writes go to the Cache Layer directly without going through the Flush Buffer.
    opened by ming535 5
  • some random thought on add RMW

    I think it's useful to support RMW (Read Modify Write) for testing purposes. The API could look like this:

        pub async fn rmw(&self, key: &[u8], lsn: u64, value: &[u8], f: fn(&[u8], Option<&[u8]>) -> Option<Vec<u8>>) -> Result<()>;
    

    f is a function that takes the key's old value and returns the new value. rmw then sets the new value for this key+lsn.

    The purpose of this API is mainly to test the "Atomicity" of tree operations. We can enable it under a special feature flag or test mode.

    To test the "Atomicity" of concurrent tree operations, we can devise a client using this API.

    For example, we can have several threads write to the tree concurrently, with the "value" laid out/encoded like this (using the same key and lsn):

        last_written_thread_id,[thread_id:checksum],[thread_id:checksum]...,[thread_id:checksum]
    

    Each thread calls rmw on the same "key" and modifies a different part ("thread_id:checksum") of the "value".

    The f for each thread:

    1. calculates the checksum of the whole "value" and puts it into its own slot ([thread_id:checksum])
    2. changes the "last_written_thread_id" to itself.

    Another thread reads this "key" and:

    1. uses the "last_written_thread_id" to find the "thread_id:checksum"
    2. calculates the checksum checker_checksum using the same f
    3. verifies that checker_checksum is the same as the checksum in "thread_id:checksum".
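The scheme above can be sketched in memory as follows. Everything here is an assumption made for illustration: the `Value` struct stands in for the encoded byte layout, the checksum is FNV-1a computed over every field except the writer's own slot, and a `Mutex` stands in for the atomic rmw that the tree would provide.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical decoded "value": the last writer plus one checksum slot per thread.
#[derive(Clone, Debug)]
pub struct Value {
    pub last_writer: usize,
    pub slots: Vec<u64>,
}

/// FNV-1a over the value, skipping the slot owned by `owner` (assumed rule).
pub fn checksum(v: &Value, owner: usize) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    let mut mix = |x: u64| {
        for b in x.to_le_bytes() {
            h ^= b as u64;
            h = h.wrapping_mul(0x100000001b3);
        }
    };
    mix(v.last_writer as u64);
    for (i, s) in v.slots.iter().enumerate() {
        if i != owner {
            mix(*s);
        }
    }
    h
}

/// The `f` each writer would pass to rmw: claim the value, then stamp its own slot.
pub fn update(old: &Value, me: usize) -> Value {
    let mut new = old.clone();
    new.last_writer = me;
    new.slots[me] = checksum(&new, me);
    new
}

/// The checker: recompute the last writer's checksum and compare.
pub fn verify(v: &Value) -> bool {
    v.slots[v.last_writer] == checksum(v, v.last_writer)
}

/// Stand-in for concurrent rmw calls on one key; the mutex makes each update indivisible.
pub fn run_writers(threads: usize, rounds: usize) -> Value {
    let cell = Arc::new(Mutex::new(Value { last_writer: 0, slots: vec![0; threads] }));
    let handles: Vec<_> = (0..threads)
        .map(|me| {
            let cell = Arc::clone(&cell);
            thread::spawn(move || {
                for _ in 0..rounds {
                    let mut guard = cell.lock().unwrap();
                    let next = update(&guard, me);
                    *guard = next;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let v = cell.lock().unwrap().clone();
    v
}
```

If the read-modify-write were not atomic, two writers could interleave and leave a slot whose checksum no longer matches the rest of the value, which is exactly what `verify` detects.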

    We can also add some random garbage into the "value" to make the page more likely to split (the current implementation seems to split a page based on the size of the page rather than the number of entries in it).

    opened by ming535 1
  • Implement sqlite varint encoding for the page format

    I think the sqlite varint encoding is better than the protobuf one. We can introduce it for the page format in the future.

    https://github.com/mohae/sqlite4-varint-bench
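For reference, a minimal sketch of the SQLite4-style varint, written from the format as I understand it from the SQLite4 documentation; the constants should be checked against that documentation before relying on this. The function names are hypothetical and are not part of photondb.

```rust
/// Encode `v` using the SQLite4-style variable-length integer format.
pub fn encode_varint(v: u64) -> Vec<u8> {
    if v <= 240 {
        return vec![v as u8];
    }
    if v <= 2287 {
        let r = v - 240;
        return vec![(r / 256 + 241) as u8, (r % 256) as u8];
    }
    if v <= 67823 {
        let r = v - 2288;
        return vec![249, (r / 256) as u8, (r % 256) as u8];
    }
    // Larger values: a tag byte 250..=255 followed by 3..=8 big-endian bytes.
    let be = v.to_be_bytes();
    let n = 8 - (v.leading_zeros() / 8) as usize; // number of significant bytes
    let mut out = Vec::with_capacity(n + 1);
    out.push(247 + n as u8);
    out.extend_from_slice(&be[8 - n..]);
    out
}

/// Decode one varint from the front of `buf`; returns (value, bytes consumed).
pub fn decode_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let a0 = *buf.first()? as u64;
    match a0 {
        0..=240 => Some((a0, 1)),
        241..=248 => {
            let a1 = *buf.get(1)? as u64;
            Some((240 + 256 * (a0 - 241) + a1, 2))
        }
        249 => {
            let a1 = *buf.get(1)? as u64;
            let a2 = *buf.get(2)? as u64;
            Some((2288 + 256 * a1 + a2, 3))
        }
        _ => {
            let n = (a0 - 247) as usize; // 3..=8 payload bytes
            let payload = buf.get(1..1 + n)?;
            let v = payload.iter().fold(0u64, |acc, b| (acc << 8) | *b as u64);
            Some((v, n + 1))
        }
    }
}
```

Unlike the protobuf encoding, the first byte alone tells the decoder how many bytes follow, so decoding needs no per-byte continuation check.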

    opened by huachaohuang 1
ForestDB - A Fast Key-Value Storage Engine Based on Hierarchical B+-Tree Trie

ForestDB is a key-value storage engine developed by Couchbase Caching and Storage Team, and its main index structure is built from Hierarchic

null 1.2k Dec 26, 2022
An LSM storage engine designed to significantly reduce I/O amplification written in safe rust (Under active development)

VelarixDB is an LSM-based storage engine designed to significantly reduce IO amplification, resulting in better performance and durability for storage

gifted_dl 14 Sep 25, 2024
High-performance, lock-free local and concurrent object memory pool with automated allocation, cleanup, and verification.

Opool: Fast lock-free concurrent and local object pool Opool is a high-performance Rust library that offers a concurrent and local object pool impleme

Khashayar Fereidani 8 Jun 3, 2023
High performance and distributed KV store w/ REST API. 🦀

About Lucid KV High performance and distributed KV store w/ REST API. Introduction Lucid is a high performance, secure and distributed key-value s

Lucid ᵏᵛ 306 Dec 28, 2022
The rust client for CeresDB. CeresDB is a high-performance, distributed, schema-less, cloud native time-series database that can handle both time-series and analytics workloads.

The rust client for CeresDB. CeresDB is a high-performance, distributed, schema-less, cloud native time-series database that can handle both time-series and analytics workloads.

null 12 Nov 18, 2022
A prototype of a high-performance KV database built with Rust.

async-redis A prototype of a high-performance KV database built with Rust. Author: 3andero 11/10/2021 Overview The project starts as a fork of mini-re

null 3 Nov 29, 2022
Quick Pool: High Performance Rust Async Resource Pool

Quick Pool High Performance Rust Async Resource Pool Usage DBCP Database Backend Adapter Version PostgreSQL tokio-postgres qp-postgres Example use asy

Seungjae Park 13 Aug 23, 2022
🔥🌲 High-performance Merkle key/value store

merk High-performance Merkle key/value store Merk is a crypto key/value store - more specifically, it's a Merkle AVL tree built on top of RocksDB (Fac

Nomic 189 Dec 13, 2022
Rust High Performance compile-time ORM(RBSON based)

WebSite | 简体中文 | Showcase | 案例 A highly Performant,Safe,Dynamic SQL(Compile time) ORM framework written in Rust, inspired by Mybatis and MybatisPlus.

rbatis 1.7k Jan 7, 2023
A high-performance, distributed, schema-less, cloud native time-series database

CeresDB is a high-performance, distributed, schema-less, cloud native time-series database that can handle both time-series and analytics workloads.

null 1.8k Dec 30, 2022
ReadySet is a lightweight SQL caching engine written in Rust that helps developers enhance the performance and scalability of existing applications.

ReadySet is a SQL caching engine designed to help developers enhance the performance and scalability of their existing database-backed applications. W

ReadySet 1.7k Jan 8, 2023
X-Engine: A SQL Engine built from scratch in Rust.

XNGIN (pronounced "X Engine") This is a personal project to build a SQL engine from scratch. The project name is inspired by Nginx, which is a very po

Jiang Zhe 111 Dec 15, 2022
Zenith substitutes PostgreSQL storage layer and redistributes data across a cluster of nodes

Zenith substitutes PostgreSQL storage layer and redistributes data across a cluster of nodes

null 5.7k Jan 6, 2023
Distributed, version controlled, SQL database with cryptographically verifiable storage, queries and results. Think git for postgres.

SDB - SignatureDB Distributed, version controlled, SQL database with cryptographically verifiable storage, queries and results. Think git for postgres

Fremantle Industries 5 Apr 26, 2022
Appendable and iterable key/list storage, backed by S3, written in rust

klstore Appendable and iterable key/list storage, backed by S3. General Overview Per key, a single writer appends to underlying storage, enabling many

Eric Thill 3 Sep 29, 2022
PRQL is a modern language for transforming data — a simpler and more powerful SQL

PRQL Pipelined Relational Query Language, pronounced "Prequel". PRQL is a modern language for transforming data — a simpler and more powerful SQL. Lik

PRQL 6.5k Jan 5, 2023
LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values. Authors: Sanjay Ghem

Google 31.5k Jan 1, 2023
A Key-Value data storage system. - dorea db

Dorea DB Dorea is a key-value data storage system. It is based on the Bitcask storage model Documentation | Crates.io | API Document 简体中文 | English

ZhuoEr Liu 112 Dec 2, 2022
Forage is for Storage

Forage Forage is for Storage Remote storage: Open storage channels to a remote storage provider over Tor Lightweight: Platform-optimized using Blake3-

FuzzrNet 11 Dec 26, 2022