# ❄️🧊 cryo 🧊❄️
`cryo` is the easiest way to extract blockchain data to parquet, csv, or json

`cryo` is also extremely flexible, with many different options to control how data is extracted + filtered + formatted

`cryo` is an early WIP, please report bugs + feedback to the issue tracker

Note that `cryo`'s default settings will slam a node too hard for use with 3rd party RPC providers. Instead, `--requests-per-second` and `--max-concurrent-requests` should be used to impose ratelimits. Such settings will be handled automatically in a future release.
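In the meantime, a rate-limited run against a hosted provider might look like this sketch (the endpoint URL and limit values are placeholders, not recommendations; tune them to your provider's quota):

```bash
# rate-limited extraction suitable for a 3rd party RPC provider
# (the URL and limit values below are illustrative placeholders)
cryo logs -b 16M:16.1M \
    --rpc https://your-provider.example/your-api-key \
    --requests-per-second 20 \
    --max-concurrent-requests 5
```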
## Example Usage

use as `cryo <dataset> [OPTIONS]`
| Example | Command |
| --- | --- |
| Extract all logs from block 16,000,000 to block 17,000,000 | `cryo logs -b 16M:17M` |
| Extract blocks, logs, or traces missing from current directory | `cryo blocks txs traces` |
| Extract to csv instead of parquet | `cryo blocks txs traces --csv` |
| Extract only certain columns | `cryo blocks --include number timestamp` |
| Dry run to view output schemas or expected work | `cryo storage_diffs --dry` |
| Extract all USDC events | `cryo logs --contract 0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48` |
`cryo` uses the `ETH_RPC_URL` env var as the data source unless `--rpc <url>` is given
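A typical session sets the env var once and then omits `--rpc` (a minimal sketch; the localhost URL is a placeholder for your own node or provider endpoint):

```bash
# point cryo at a node once per shell session
export ETH_RPC_URL=http://localhost:8545
# later invocations read ETH_RPC_URL automatically
cryo blocks -b 16M:16.001M
```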
## Datasets

cryo can extract the following datasets from EVM nodes:
- blocks
- transactions (alias = txs)
- logs (alias = events)
- traces (alias = call_traces)
- state_diffs (alias for storage_diffs + balance_diffs + nonce_diffs + code_diffs; see the sketch after this list)
- balance_diffs
- code_diffs
- storage_diffs
- nonce_diffs
- vm_traces (alias = opcode_traces)
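Since `state_diffs` is just an alias for the four component diff datasets, the two dry runs below should describe the same outputs (a sketch; the block range is arbitrary):

```bash
# state_diffs expands to the four component diff datasets
cryo state_diffs -b 16M:16.01M --dry
# equivalent explicit form
cryo storage_diffs balance_diffs nonce_diffs code_diffs -b 16M:16.01M --dry
```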
## Installation

### Method 1: install from source

```bash
git clone https://github.com/paradigmxyz/cryo
cd cryo
cargo install --path ./crates/cli
```
This method requires having rust installed. See rustup for instructions.
### Method 2: install from crates.io

```bash
cargo install cryo_cli
```
This method requires having rust installed. See rustup for instructions.
Make sure that `~/.cargo/bin` is on your `PATH`. One way to do this is by adding the line `export PATH="$HOME/.cargo/bin:$PATH"` to your `~/.bashrc` or `~/.profile`.
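Once `PATH` is set, a quick check confirms the binary is reachable (`--version` is listed under CLI Options below):

```bash
# make cargo-installed binaries visible in the current shell
export PATH="$HOME/.cargo/bin:$PATH"
# confirm the install worked; prints the installed version
cryo --version
```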
## Data Schema

Many `cryo` cli options will affect output schemas by adding/removing columns or changing column datatypes.

`cryo` will always print out data schemas before collecting any data. To view these schemas without collecting data, use `--dry` to perform a dry run.
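For instance, combining `--dry` with the column options lets you preview a reshaped schema before committing to a long extraction (the block range is arbitrary; `number` and `timestamp` are columns of the blocks dataset):

```bash
# preview the default blocks schema without collecting any data
cryo blocks -b 16M:16.01M --dry
# preview a trimmed schema containing only two columns
cryo blocks -b 16M:16.01M --columns number timestamp --dry
```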
## JSON-RPC

`cryo` currently obtains all of its data using the JSON-RPC protocol standard.
| dataset | blocks per request | results per block | method |
| --- | --- | --- | --- |
| Blocks | 1 | 1 | `eth_getBlockByNumber` |
| Transactions | 1 | multiple | `eth_getBlockByNumber` |
| Logs | multiple | multiple | `eth_getLogs` |
| Traces | 1 | multiple | `trace_block` |
| State Diffs | 1 | multiple | `trace_replayBlockTransactions` |
| Vm Traces | 1 | multiple | `trace_replayBlockTransactions` |
`cryo` uses ethers.rs to perform JSON-RPC requests, so it can be used with any chain that ethers-rs is compatible with. This includes Ethereum, Optimism, Arbitrum, Polygon, BNB, and Avalanche.
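Pointing `cryo` at one of these chains is just a matter of swapping the RPC endpoint, roughly as sketched here (the URL is a placeholder for whatever endpoint you actually use; `--network-name` overrides the name otherwise derived from `eth_getChainId`):

```bash
# extract blocks from a non-Ethereum EVM chain
# (the endpoint URL below is a placeholder)
cryo blocks -b 40M:40.01M \
    --rpc https://polygon-rpc.example \
    --network-name polygon
```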
A future version of `cryo` will be able to bypass JSON-RPC and query node data directly.
## CLI Options

output of `cryo --help`:
```
cryo extracts blockchain data to parquet, csv, or json

Usage: cryo [OPTIONS] <DATATYPE>...

Arguments:
  <DATATYPE>...  datatype(s) to collect, one or more of:
                 - blocks
                 - transactions  (alias = txs)
                 - logs          (alias = events)
                 - traces        (alias = call_traces)
                 - state_diffs   (= balance + code + nonce + storage diffs)
                 - balance_diffs
                 - code_diffs
                 - nonce_diffs
                 - storage_diffs
                 - vm_traces     (alias = opcode_traces)

Options:
  -h, --help     Print help
  -V, --version  Print version

Content Options:
  -b, --blocks <BLOCKS>              Block numbers, see syntax below [default: 0:latest]
  -a, --align                        Align block chunk boundaries to regular intervals
                                     e.g. (1000, 2000, 3000) instead of (1106, 2106, 3106)
      --reorg-buffer <N_BLOCKS>      Reorg buffer, save blocks only when they are this old,
                                     can be a number of blocks [default: 0]
  -i, --include-columns [<COLS>...]  Columns to include alongside the default output
  -e, --exclude-columns [<COLS>...]  Columns to exclude from the default output
      --columns [<COLS>...]          Use these columns instead of the default
      --hex                          Use hex string encoding for binary columns
  -s, --sort [<SORT>...]             Column(s) to sort by

Source Options:
  -r, --rpc <RPC>                    RPC url [default: ETH_RPC_URL env var]
      --network-name <NETWORK_NAME>  Network name [default: use name of eth_getChainId]

Acquisition Options:
  -l, --requests-per-second <limit>  Ratelimit on requests per second
      --max-concurrent-requests <M>  Global number of concurrent requests
      --max-concurrent-chunks <M>    Number of chunks processed concurrently
      --max-concurrent-blocks <M>    Number of blocks within a chunk processed concurrently
  -d, --dry                          Dry run, collect no data

Output Options:
  -c, --chunk-size <CHUNK_SIZE>      Number of blocks per file [default: 1000]
      --n-chunks <N_CHUNKS>          Number of files (alternative to --chunk-size)
  -o, --output-dir <OUTPUT_DIR>      Directory for output files [default: .]
      --file-suffix <FILE_SUFFIX>    Suffix to attach to end of each filename
      --overwrite                    Overwrite existing files instead of skipping them
      --csv                          Save as csv instead of parquet
      --json                         Save as json instead of parquet
      --row-group-size <GROUP_SIZE>  Number of rows per row group in parquet file
      --n-row-groups <N_ROW_GROUPS>  Number of row groups in parquet file
      --no-stats                     Do not write statistics to parquet files
      --compression <NAME [#]>...    Set compression algorithm and level [default: lz4]

Dataset-specific Options:
      --contract <CONTRACT>          [logs] filter logs by contract address
      --topic0 <TOPIC0>              [logs] filter logs by topic0 [aliases: event]
      --topic1 <TOPIC1>              [logs] filter logs by topic1
      --topic2 <TOPIC2>              [logs] filter logs by topic2
      --topic3 <TOPIC3>              [logs] filter logs by topic3
      --log-request-size <N_BLOCKS>  [logs] Number of blocks per log request [default: 1]

Block specification syntax
- can use numbers                    --blocks 5000 6000 7000
- can use ranges                     --blocks 12M:13M 15M:16M
- numbers can contain { _ . K M B }  5_000 5K 15M 15.5M
- omitting range end means latest    15.5M: == 15.5M:latest
- omitting range start means 0       :700 == 0:700
- minus on start means minus end     -1000:7000 == 6000:7000
- plus sign on end means plus start  15M:+1000 == 15M:15.001M
```
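Putting the syntax together, a few dry runs that exercise these rules without collecting any data (block values are arbitrary):

```bash
cryo blocks -b 5000 6000 7000 --dry   # explicit block numbers
cryo blocks -b 12M:13M 15M:16M --dry  # multiple ranges
cryo blocks -b -1000:7000 --dry       # minus on start: same as 6000:7000
cryo blocks -b 15M:+1000 --dry        # plus on end: same as 15M:15.001M
```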