cryo is the easiest way to extract blockchain data to parquet, csv, or json

Overview

❄️🧊 cryo 🧊❄️

cryo is the easiest way to extract blockchain data to parquet, csv, or json

cryo is also extremely flexible, with many different options to control how data is extracted + filtered + formatted

cryo is an early WIP; please report bugs and feedback to the issue tracker

Note that cryo's default settings send requests too aggressively for use with 3rd party RPC providers. When using such a provider, set --requests-per-second and --max-concurrent-requests to impose ratelimits. These settings will be handled automatically in a future release.
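When cryo is scripted against a 3rd party provider, the two ratelimit flags can be attached programmatically. The following is a minimal sketch, not part of cryo itself: the helper name build_cryo_command is hypothetical, and the values 50 and 5 are placeholders rather than recommended settings. Only the flag names are taken from the cryo --help output.

```python
import shlex


def build_cryo_command(datasets, blocks=None, rpc=None,
                       requests_per_second=None, max_concurrent_requests=None):
    """Build an argument list for a cryo invocation (hypothetical helper)."""
    args = ["cryo", *datasets]
    if blocks is not None:
        args += ["--blocks", blocks]
    if rpc is not None:
        args += ["--rpc", rpc]
    # Ratelimit flags as listed under "Acquisition Options" in cryo --help.
    if requests_per_second is not None:
        args += ["--requests-per-second", str(requests_per_second)]
    if max_concurrent_requests is not None:
        args += ["--max-concurrent-requests", str(max_concurrent_requests)]
    return args


# Placeholder limits; check your provider's plan for real values.
command = build_cryo_command(
    ["logs"],
    blocks="16M:17M",
    requests_per_second=50,
    max_concurrent_requests=5,
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run once cryo is installed and on the PATH.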

Example Usage

use as cryo <dataset> [OPTIONS]

Example                                                                  Command
Extract all logs from block 16,000,000 to block 17,000,000               cryo logs -b 16M:17M
Extract blocks, transactions, and traces missing from current directory  cryo blocks txs traces
Extract to csv instead of parquet                                        cryo blocks txs traces --csv
Extract only certain columns                                             cryo blocks --include-columns number timestamp
Dry run to view output schemas or expected work                          cryo storage_diffs --dry
Extract all USDC events                                                  cryo logs --contract 0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48

cryo uses ETH_RPC_URL env var as the data source unless --rpc <url> is given
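The precedence described above (an explicit --rpc value wins, otherwise ETH_RPC_URL is used) can be mirrored in a wrapper script. This is a minimal sketch of that rule only; the function name resolve_rpc_url and the error raised when neither source is set are assumptions, and cryo's own behavior in that case is not documented here.

```python
import os


def resolve_rpc_url(cli_rpc=None, env=None):
    """Pick the RPC url the way the README describes (hypothetical helper)."""
    env = os.environ if env is None else env
    if cli_rpc:
        # An explicit --rpc <url> overrides the environment variable.
        return cli_rpc
    url = env.get("ETH_RPC_URL")
    if not url:
        # Assumption: treat a missing data source as an error in the wrapper.
        raise ValueError("no RPC url: pass --rpc or set ETH_RPC_URL")
    return url
```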

Datasets

cryo can extract the following datasets from EVM nodes:

  • blocks
  • transactions (alias = txs)
  • logs (alias = events)
  • traces (alias = call_traces)
  • state_diffs (alias for storage_diffs + balance_diffs + nonce_diffs + code_diffs)
  • balance_diffs
  • code_diffs
  • storage_diffs
  • nonce_diffs
  • vm_traces (alias = opcode_traces)

Installation

Method 1: install from source

git clone https://github.com/paradigmxyz/cryo
cd cryo
cargo install --path ./crates/cli

This method requires having rust installed. See rustup for instructions.

Method 2: install from crates.io

cargo install cryo_cli

This method requires having rust installed. See rustup for instructions.

Make sure that ~/.cargo/bin is on your PATH. One way to do this is by adding the line export PATH="$HOME/.cargo/bin:$PATH" to your ~/.bashrc or ~/.profile.

Data Schema

Many cryo cli options will affect output schemas by adding/removing columns or changing column datatypes.

cryo will always print out data schemas before collecting any data. To view these schemas without collecting data, use --dry to perform a dry run.

JSON-RPC

cryo currently obtains all of its data using the JSON-RPC protocol standard.

dataset       blocks per request  results per block  method
Blocks        1                   1                  eth_getBlockByNumber
Transactions  1                   multiple           eth_getBlockByNumber
Logs          multiple            multiple           eth_getLogs
Traces        1                   multiple           trace_block
State Diffs   1                   multiple           trace_replayBlockTransactions
Vm Traces     1                   multiple           trace_replayBlockTransactions
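To make the methods in the table above concrete, here is a sketch of the JSON-RPC 2.0 request bodies they correspond to. It only builds the payloads and sends nothing. The helper names are hypothetical, and the specific parameter choices (full transaction objects for the transactions dataset, the "stateDiff" and "vmTrace" trace types) are assumptions about how such requests are typically formed, not a description of cryo's internals.

```python
import json


def rpc_request(method, params, request_id=1):
    """Wrap a method call in a JSON-RPC 2.0 envelope."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def block_request(block_number, full_transactions=False):
    # Second parameter: True returns full transaction objects, False only hashes.
    return rpc_request("eth_getBlockByNumber", [hex(block_number), full_transactions])


def logs_request(from_block, to_block, address=None):
    # One eth_getLogs call can span several blocks, unlike the per-block methods.
    log_filter = {"fromBlock": hex(from_block), "toBlock": hex(to_block)}
    if address is not None:
        log_filter["address"] = address
    return rpc_request("eth_getLogs", [log_filter])


def trace_block_request(block_number):
    return rpc_request("trace_block", [hex(block_number)])


def replay_request(block_number, trace_types):
    # trace_types is a list such as ["stateDiff"] or ["vmTrace"].
    return rpc_request("trace_replayBlockTransactions", [hex(block_number), trace_types])


usdc = "0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48"
print(json.dumps(logs_request(16_000_000, 16_000_009, address=usdc), indent=2))
```

Block numbers are encoded as hex quantities, which is why 16,000,000 appears as 0xf42400 in the printed filter.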

cryo uses ethers-rs to perform JSON-RPC requests, so it can be used with any chain that ethers-rs is compatible with. This includes Ethereum, Optimism, Arbitrum, Polygon, BNB, and Avalanche.

A future version of cryo will be able to bypass JSON-RPC and query node data directly.

CLI Options

output of cryo --help:

cryo extracts blockchain data to parquet, csv, or json

Usage: cryo [OPTIONS] <DATATYPE>...

Arguments:
  <DATATYPE>...  datatype(s) to collect, one or more of:
                 - blocks
                 - transactions  (alias = txs)
                 - logs          (alias = events)
                 - traces        (alias = call_traces)
                 - state_diffs   (= balance + code + nonce + storage diffs)
                 - balance_diffs
                 - code_diffs
                 - nonce_diffs
                 - storage_diffs
                 - vm_traces     (alias = opcode_traces)

Options:
  -h, --help     Print help
  -V, --version  Print version

Content Options:
  -b, --blocks <BLOCKS>              Block numbers, see syntax below [default: 0:latest]
  -a, --align                        Align block chunk boundaries to regular intervals
                                     e.g. (1000, 2000, 3000) instead of (1106, 2106, 3106)
      --reorg-buffer <N_BLOCKS>      Reorg buffer, save blocks only when they are this old,
                                     can be a number of blocks [default: 0]
  -i, --include-columns [<COLS>...]  Columns to include alongside the default output
  -e, --exclude-columns [<COLS>...]  Columns to exclude from the default output
      --columns [<COLS>...]          Use these columns instead of the default
      --hex                          Use hex string encoding for binary columns
  -s, --sort [<SORT>...]             Columns(s) to sort by

Source Options:
  -r, --rpc <RPC>                    RPC url [default: ETH_RPC_URL env var]
      --network-name <NETWORK_NAME>  Network name [default: use name of eth_getChainId]

Acquisition Options:
  -l, --requests-per-second <limit>  Ratelimit on requests per second
      --max-concurrent-requests <M>  Global number of concurrent requests
      --max-concurrent-chunks <M>    Number of chunks processed concurrently
      --max-concurrent-blocks <M>    Number blocks within a chunk processed concurrently
  -d, --dry                          Dry run, collect no data

Output Options:
  -c, --chunk-size <CHUNK_SIZE>      Number of blocks per file [default: 1000]
      --n-chunks <N_CHUNKS>          Number of files (alternative to --chunk-size)
  -o, --output-dir <OUTPUT_DIR>      Directory for output files [default: .]
      --file-suffix <FILE_SUFFIX>    Suffix to attach to end of each filename
      --overwrite                    Overwrite existing files instead of skipping them
      --csv                          Save as csv instead of parquet
      --json                         Save as json instead of parquet
      --row-group-size <GROUP_SIZE>  Number of rows per row group in parquet file
      --n-row-groups <N_ROW_GROUPS>  Number of rows groups in parquet file
      --no-stats                     Do not write statistics to parquet files
      --compression <NAME [#]>...    Set compression algorithm and level [default: lz4]

Dataset-specific Options:
      --contract <CONTRACT>          [logs] filter logs by contract address
      --topic0 <TOPIC0>              [logs] filter logs by topic0 [aliases: event]
      --topic1 <TOPIC1>              [logs] filter logs by topic1
      --topic2 <TOPIC2>              [logs] filter logs by topic2
      --topic3 <TOPIC3>              [logs] filter logs by topic3
      --log-request-size <N_BLOCKS>  [logs] Number of blocks per log request [default: 1]


Block specification syntax
- can use numbers                    --blocks 5000 6000 7000
- can use ranges                     --blocks 12M:13M 15M:16M
- numbers can contain { _ . K M B }  5_000 5K 15M 15.5M
- omitting range end means latest    15.5M: == 15.5M:latest
- omitting range start means 0       :700 == 0:700
- minus on start means minus end     -1000:7000 == 6000:7000
- plus sign on end means plus start  15M:+1000 == 15M:15.001M
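
The block specification rules above can be worked through with a small parser. This is an illustrative Python sketch of the documented syntax, not cryo's actual implementation (which is written in Rust). The function names are hypothetical, the value of latest must be supplied by the caller, and whether range ends are inclusive is left unspecified.

```python
from decimal import Decimal

SUFFIXES = {"K": 10**3, "M": 10**6, "B": 10**9}


def parse_number(text):
    """Parse numbers such as 5_000, 5K, 15M or 15.5M into an int."""
    text = text.replace("_", "")
    multiplier = 1
    if text and text[-1].upper() in SUFFIXES:
        multiplier = SUFFIXES[text[-1].upper()]
        text = text[:-1]
    value = Decimal(text) * multiplier
    if value != value.to_integral_value():
        raise ValueError(f"not a whole block number: {text!r}")
    return int(value)


def parse_end(text, latest):
    # An omitted end, or the literal word "latest", means the latest block.
    return latest if text in ("", "latest") else parse_number(text)


def parse_block_spec(spec, latest):
    """Return (start, end) for one --blocks token, given the latest block."""
    if ":" not in spec:
        number = parse_number(spec)
        return (number, number)
    start_text, end_text = spec.split(":", 1)
    if start_text.startswith("-"):
        # Minus on start: count back from the end of the range.
        end = parse_end(end_text, latest)
        return (end - parse_number(start_text[1:]), end)
    start = 0 if start_text == "" else parse_number(start_text)
    if end_text.startswith("+"):
        # Plus on end: count forward from the start of the range.
        return (start, start + parse_number(end_text[1:]))
    return (start, parse_end(end_text, latest))


for token in ["5K", "12M:13M", "15.5M:", ":700", "-1000:7000", "15M:+1000"]:
    print(token, parse_block_spec(token, latest=17_000_000))
```

Decimal is used instead of float so that values like 15.001M convert to exact integers.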
Comments
  • Fix install command in docs

    Motivation

    ➜  cryo git:(main) cargo install --path .
    error: found a virtual manifest at `/Users/erikreppel/dev/cryo/Cargo.toml` instead of a package manifest
    

    Install command in README fails

    Solution

    Install the cli directly

    ➜  cryo git:(main) cargo install --path ./crates/cli
      Installing cryo_cli v0.1.0 (/Users/erikreppel/dev/cryo/crates/cli)
        Updating crates.io index
      Downloaded byte-slice-cast v1.2.2
      Downloaded phf_macros v0.11.2
      Downloaded bytemuck v1.13.1
      ...
      Installing /Users/erikreppel/.cargo/bin/cryo
       Installed package `cryo_cli v0.1.0 (/Users/erikreppel/dev/cryo/crates/cli)` (executable `cryo`)
    ➜  cryo git:(main) cryo
    error: the following required arguments were not provided:
      <DATATYPE>...
    
    Usage: cryo <DATATYPE>...
    
    For more information, try '--help'.
    

    PR Checklist

    • [ ] Added Tests
    • [ ] Added Documentation
    • [ ] Breaking changes
    opened by erikreppel 1
  • feat: add `rustfmt.toml`

    Introduces the same rustfmt settings as we use in Foundry, Reth, and Alloy, and formats the code.

    This config features some nice things like comment formatting, import "merging", and import reordering.

    opened by onbjerg 0
  • Tip: Update Rust to fix `send: no filter connected` error

    I ran cargo install cryo_cli and got this error:

    error: failed to compile `cryo_cli v0.1.0`, intermediate artifacts can be found at `/var/folders/c7/s3bvshs91jzf3ck0vp8hs1640000gn/T/cargo-installwLikBg`
    
    Caused by:
      failed to download from `https://crates.io/api/v1/crates/zstd-safe/6.0.5+zstd.1.5.4/download`
    
    Caused by:
      [2] Failed initialization ([CONN-1-0] send: no filter connected)
    

    See related issue in Cargo https://github.com/rust-lang/cargo/issues/12202.

    Running rustup update fixed everything on my side.

    opened by p0n1 0
  • Gracefully handle ratelimits for 3rd party providers

    Is your feature request related to a problem? Please describe.

    Cryo can fail with errors when a provider's ratelimits are too low for its request volume. It should handle these ratelimits gracefully, without end users needing to worry about them.

    Describe the solution you'd like

    Ethers.rs has a variety of knobs for adjusting to such limits and failure modes. Cryo itself also has many knobs for controlling different concurrency behaviors. Better defaults and heuristics on these knobs should fix the problem.

    opened by sslivkoff 1
  • Use MockProvider for better tests, using reth as an example

    Is your feature request related to a problem? Please describe.

    Test coverage is currently poor and doesn't cover many failure modes.

    Describe the solution you'd like

    MockProvider should be used to create rich tests under a variety of response scenarios. Reth uses this approach extensively and can be used for inspiration.

    opened by sslivkoff 0
  • Bypass JSON RPC using Reth Database Bindings

    Is your feature request related to a problem? Please describe.

    JSON-RPC adds a large amount of overhead. Connecting to a node database directly could yield significant performance improvements.

    Describe the solution you'd like

    Reth is an obvious first node to integrate with given: it is also in rust, it provides db bindings, and it is performance-oriented.

    It can hook into the cryo codebase as alternative implementations of the collect methods in cryo_freeze::types::Dataset

    opened by sslivkoff 0
Owner
Paradigm
A research-driven technology investment firm.