Merge together and efficiently time-sort compressed .pcap files stored in AWS S3 object storage (or locally) to stdout for pipelined processing.

Overview

Work in progress - crate stream-merge

Merge together and efficiently time-sort compressed .pcap files stored in AWS S3 object storage (or locally) to stdout for pipelined processing. High performance and parallel implementation for > 10 Gbps playback throughput with large numbers of files (~4k).

merge_pcaps

A utility executable for merging a collection of .pcap files to stdout in time-sequence.

Note: All input .pcap files must be time-ordered (common case).

  • Stream thousands of huge (bigger than can be downloaded to your system) compressed .pcap files from AWS S3 to another machine, which orchestrates parallel downloading and file prioritization, parallel decompression, and time-ordering of packets.

  • Scalable high performance. Architected for efficient use of resources (compute and memory, even with large numbers of files) and a target > 1.2GB/s output throughput (a modest target which this implementation can outperform with Zstd-compressed .pcap.zst files stored in S3).

    In my specific target usage of this utility, > 4,000 .pcap.gz files (hundreds of gigabytes) stored in S3 need to be decompressed and merged together to stdout. An interesting note about my environment, however, is that the .pcap files I merge together are segmented into some number of groupings of hour-long recordings of packet data. The tool is general purpose, fast and scalable (unlike mergecap), but it is also especially designed to efficiently handle large numbers of input files, some of which will not be needed until late in the playback.

  • Extensible decompression infrastructure. Currently supports Gzip and Zstd, but additional formats could likely be added very quickly and simply (reach out!).

You might also like...
Merge multiple Juniper object definitions into a single object type.

juniper-compose Merge multiple Juniper object definitions into a single object type. crates.io | docs | github Motivation You are building a GraphQL s

Attempt to summarize text from `stdin`, using a large language model (locally and offline), to `stdout`

summarize-cli Attempt to summarize text from stdin, using a large language model (locally and offline), to stdout. cargo build --release target/releas

A blazingly fast Insertion Sort and Quick Sort visualizer built with Rust and WASM.
A blazingly fast Insertion Sort and Quick Sort visualizer built with Rust and WASM.

sortysort A blazingly fast Insertion Sort and Quick Sort visualizer built with Rust and WASM. Try it in your browser from here Testing locally cargo r

Yet another sort crate, porting Golang sort package to Rust.

IndexSort IndexSort Yet another sort crate (in place), porting Golang's standard sort package to Rust. Installation [dependencies] indexsort = "0.1.0"

Easy c̵̰͠r̵̛̠ö̴̪s̶̩̒s̵̭̀-t̶̲͝h̶̯̚r̵̺͐e̷̖̽ḁ̴̍d̶̖̔ ȓ̵͙ė̶͎ḟ̴͙e̸̖͛r̶̖͗ë̶̱́ṉ̵̒ĉ̷̥e̷͚̍ s̷̹͌h̷̲̉a̵̭͋r̷̫̊ḭ̵̊n̷̬͂g̵̦̃ f̶̻̊ơ̵̜ṟ̸̈́ R̵̞̋ù̵̺s̷̖̅ţ̸͗!̸̼͋

Rust S̵̓i̸̓n̵̉ I̴n̴f̶e̸r̵n̷a̴l mutability! Howdy, friendly Rust developer! Ever had a value get m̵̯̅ð̶͊v̴̮̾ê̴̼͘d away right under your nose just when

Replay packets from pcap -file to network interface

pktreplay can be used to read packets from pcap file or interface and write them into interface. By default packets are written with the same rate they have been saved into the pcap file, or, when reading from interface, as fast as they are received.

Convenience library for reading and writing compressed files/streams

compress_io Convenience library for reading and writing compressed files/streams The aim of compress_io is to make it simple for an application to sup

Aws-sdk-rust - AWS SDK for the Rust Programming Language

The AWS SDK for Rust This repo contains the new AWS SDK for Rust (the SDK) and its public roadmap. Please Note: The SDK is currently released as a dev

Rs.aws-login - A command line utility to simplify logging into AWS services.

aws-login A command line utility to simplify logging into AWS accounts and services. $ aws-login use ? Please select a profile to use: › ❯ dev-read

write licenses to stdout

licensor write licenses to stdout About Write a license to standard output given its SPDX ID. A name for the copyright holder can optionally be provid

A tool that uses the credentials stored in 1password as an environment variable.

openv A tool that uses the credentials stored in 1password as an environment variable. Requirements 1password command-line tool Getting Started # Sign

A command-line tool to export FoundationDB stored protocol buffers to ClickhouseDB

fdb-ch-proto-export A command-line tool to export FoundationDB stored Protocol buffers to ClickhouseDB. Installation N/A Usage fdb-ch [command] fdb-c

Pipelined Relational Query Language, pronounced "Prequel"

PRQL Pipelined Relational Query Language, pronounced "Prequel". PRQL is a modern language for transforming data — a simpler and more powerful SQL. Lik

Parallel pipelined map over iterators.

plmap Parallel pipelined map over iterators for rust. Documentation docs.rs/plmap Example Parallel pipelined mapping: // Import the iterator extension

UnlimCloud provides unlimited cloud storage for your files, utilizing Telegram as the storage solution
UnlimCloud provides unlimited cloud storage for your files, utilizing Telegram as the storage solution

UnlimCloud provides unlimited cloud storage for your files, utilizing Telegram as the storage solution. Simply log in using your Telegram ID, and you are good to go.

Finding all pairs of similar documents time- and memory-efficiently

Finding all pairs of similar documents This software provides time- and memory-efficient all pairs similarity searches in documents. Problem definitio

Parse, edit and merge Prometheus metrics exposition format

promerge Promerge provides minimalistic and easy to use API to parse and manipulate Prometheus metrics. A simple usecase could be collecting metrics f

It is a backup tool that creates backups and stores them on an object storage
It is a backup tool that creates backups and stores them on an object storage

Hold My Backup It is a backup tool that creates backups and stores them on an object storage. By default it uses minio but you can use AWS: S3 as well

The core innovation of EinsteinDB is its merge-append-commit protocol

The core innovation of EinsteinDB is its merge-append-commit protocol. The protocol is friendly and does not require more than two parties to agree on the order of events. It is also relativistic and can tolerate clock and network lag and drag.

Owner
null
ezio offers an easy to use IO API for reading and writing to files and stdio

ezio - a crate for easy IO ezio offers an easy to use IO API for reading and writing to files and stdio. ezio includes utilities for generating random

Nick Cameron 98 Dec 21, 2022
ergonomic paths and files in rust

path_abs: ergonomic paths and files in rust. This library aims to provide ergonomic path and file operations to rust with reasonable performance. See

Rett Berg 45 Oct 29, 2022
Find files with SQL-like queries

Find files with SQL-like queries

null 3.4k Dec 29, 2022
fftp is the "Fast File Transport Protocol". It transfers files quickly between computers on a network with low overhead.

fftp fftp is the "Fast File Transport Protocol". It transfers files quickly between computers on a network with low overhead. Motivation FTP uses two

leo 4 May 12, 2022
Collects accurate files while running in parallel through directories. (Simple, Fast, Powerful)

collectfiles Collects accurate files while running in parallel through directories. (Simple, Fast, Powerful) | Docs | Latest Note | [dependencies] col

Doha Lee 2 Jun 1, 2022
A tool for analyzing the size of dependencies in compiled Golang binary files, providing insights into their impact on the final build.

gsv A simple tool to view the size of a Go compiled binary. Build on top of bloaty. Usage First, you need to compile your Go program with the followin

null 70 Apr 12, 2023
Expanding opportunities standard library std::fs and std::io

fs_extra A Rust library that provides additional functionality not present in std::fs. Documentation Migrations to 1.x.x version Key features: Copy fi

Denis Kurilenko 163 Dec 30, 2022
Supertag is a tag-based filesystem, written in Rust, for Linux and MacOS

Supertag is a tag-based filesystem, written in Rust, for Linux and MacOS. It provides a tag-based view of your files by removing the hierarchy constraints typically imposed on files and folders. In other words, it allows you to think about your files not as objects stored in folders, but as objects that can be filtered by folders.

Andrew Moffat 539 Dec 24, 2022
rswatch 🔎 is simple, efficient and reliable file watcher.

rswatch File watcher rswatch is a simple, reliable and efficient file watcher, it can watch a single file or a directory and run arbitrary commands wh

Eugene 3 Sep 23, 2022