Open Zignatures Database

Overview

The openZign project

A database of zignatures and other binary identification data, built for fun and to aid reverse-engineering tasks. It is collected from various data sources:

  • vx-underground collection (>2TB decompressed)
  • BinKit dataset (>200GB decompressed)
  • Std-libs from statically compiled languages (golang, rust)
  • Benign windows binaries
  • ?

Note: This is still under heavy development. This README serves primarily to organize my thoughts.

Project Structure

oz-fila

Helper util to mass-analyse binary artifacts (executables, libraries, ...) in a directory. The result is one JSON file per binary containing the analysis information produced by radare2.
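
The per-file loop of a tool like oz-fila could be sketched as follows. This is a minimal illustration, not oz-fila's code: it does not call radare2 at all and only fills in the `name`, `sha256`, `size` and a crude `magic` field (first four bytes as hex). The function names `describe_binary` and `analyse_directory` and the "one file named `<sha256>.json`" layout are assumptions.

```python
import hashlib
import json
from pathlib import Path


def describe_binary(path):
    """Collect basic metadata for one file.

    Hypothetical stand-in for the radare2 analysis: only name, sha256,
    size and a crude `magic` (first four bytes as hex) are computed.
    """
    data = path.read_bytes()
    return {
        "name": path.name,
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "magic": data[:4].hex(),  # e.g. "7f454c46" for an ELF file
    }


def analyse_directory(src, out):
    """Write one JSON file per regular file under `src` into `out`, named by SHA-256."""
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        info = describe_binary(path)
        target = out / (info["sha256"] + ".json")
        target.write_text(json.dumps(info, indent=2))
        written.append(target)
    return written
```

Usage: `analyse_directory(Path("samples"), Path("out"))`. Naming each output after the sample hash makes re-runs idempotent; keep `out` outside `src` so the JSON files are not picked up as samples.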

oz-indexer

Helper util to index and search the JSON files created by oz-fila.
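
The real indexer uses tantivy; the toy inverted index below only illustrates the idea of indexing the JSON documents and answering keyword queries. It assumes the JSON carries `sha256`, `strings` and `imports` keys (hypothetical field names) and uses AND semantics for multi-word queries.

```python
import json
import re
from collections import defaultdict

# Lowercase alphanumeric runs: "kernel32.dll!VirtualAlloc" -> kernel32, dll, virtualalloc
TOKEN_RE = re.compile(r"[a-z0-9_]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class ToyIndex:
    """Minimal inverted index over the `strings` and `imports` fields of a JSON document."""

    FIELDS = ("strings", "imports")

    def __init__(self):
        self.postings = defaultdict(set)  # token -> set of sha256 values

    def add(self, doc):
        doc_id = doc["sha256"]
        for field in self.FIELDS:
            for value in doc.get(field, []):
                for token in tokenize(value):
                    self.postings[token].add(doc_id)

    def add_json(self, text):
        self.add(json.loads(text))

    def search(self, query):
        """AND semantics: sorted IDs of documents containing every query token."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set(self.postings.get(tokens[0], ()))
        for token in tokens[1:]:
            hits &= self.postings.get(token, set())
        return sorted(hits)
```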

oz-api

Since the index gets quite big, the final goal is to provide some kind of HTTP/REST API (reminiscent of IDA's Lumina server).
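
As a rough illustration of such an API, here is a standard-library HTTP endpoint. The `/search?q=` route, the JSON response shape and the in-memory `INDEX` dict are all hypothetical placeholders for whatever oz-api ends up exposing.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory data: token -> SHA-256 hashes of matching samples.
INDEX = {"virtualalloc": ["aa"], "kernel32": ["aa", "bb"]}


class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/search":
            self.send_error(404)
            return
        query = parse_qs(url.query).get("q", [""])[0].lower()
        body = json.dumps({"query": query, "hits": INDEX.get(query, [])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence the default per-request logging to stderr


def make_server(port=8000):
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), SearchHandler)
```

Start it with `make_server().serve_forever()` and query `http://127.0.0.1:8000/search?q=kernel32`.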

(TODO) r2 plugin

Provide an r2 plugin for convenience.

Indexing

The first indexing attempt uses the tantivy search engine library, which appears to handle large data volumes quite well.

Indexing is not yet continuous or automated (it literally takes weeks to analyze and index everything on my consumer-grade desktop hardware).

Facets

  1. First level: classification of the binary sample (Malware, Library, Various)
  2. Second level: CPU architecture (x86, arm, ...)
  3. Third level: OS, lang, machine, format, bintype
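
tantivy models hierarchical facets as slash-separated paths such as `/malware/x86`. The sketch below mimics that idea in plain Python to show how the three levels nest and how a drill-down count works; the helper names and the lowercasing of components are assumptions, not the project's schema.

```python
from collections import Counter


def facet_path(classification, arch, *details):
    """Build a facet path such as /malware/x86/windows from the facet levels."""
    parts = [classification, arch, *details]
    for part in parts:
        if not part or "/" in part:
            raise ValueError(f"invalid facet component: {part!r}")
    return "/" + "/".join(part.lower() for part in parts)


def _components(path):
    return [c for c in path.split("/") if c]


def facet_matches(path, prefix):
    """True if `path` equals `prefix` or lies below it (compared per component)."""
    base = _components(prefix)
    return _components(path)[: len(base)] == base


def facet_counts(paths, prefix):
    """Count documents per direct child of `prefix`, i.e. one drill-down step."""
    base = _components(prefix)
    counts = Counter()
    for path in paths:
        comps = _components(path)
        if comps[: len(base)] == base and len(comps) > len(base):
            counts[comps[len(base)]] += 1
    return dict(counts)
```

Comparing per component (rather than with a string prefix) keeps `/mal` from matching `/malware/...`.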

Fields

  • Strings, Links, Imports, Yara: Default indexer
  • name, sha256, magic, size, error

Zignatures, Segments, Sections

Indexed separately. The parent document has a multi-valued field containing the child document IDs.
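
A minimal sketch of this parent/child layout, using an in-memory dict instead of tantivy: children (zignatures, sections; segments would work the same way) are stored first, then the binary's document records their IDs in multi-valued fields. The field names `zignature_ids` and `section_ids` and the `ToyStore` class are hypothetical.

```python
from itertools import count


class ToyStore:
    """In-memory stand-in for the index: integer doc IDs -> documents."""

    def __init__(self):
        self.docs = {}
        self._next_id = count(1)

    def add(self, doc):
        doc_id = next(self._next_id)
        self.docs[doc_id] = doc
        return doc_id


def add_binary(store, binary, zignatures, sections):
    """Index children first, then the parent with multi-valued ID fields."""
    zig_ids = [store.add({"kind": "zignature", **z}) for z in zignatures]
    sec_ids = [store.add({"kind": "section", **s}) for s in sections]
    parent = {"kind": "binary", **binary, "zignature_ids": zig_ids, "section_ids": sec_ids}
    return store.add(parent)


def children(store, parent_id, field):
    """Resolve the child IDs stored in `field` of the parent document."""
    return [store.docs[i] for i in store.docs[parent_id][field]]


def parents_of(store, child_id, field):
    """Reverse lookup: which binaries reference this child document?"""
    return [i for i, d in store.docs.items() if child_id in d.get(field, [])]
```

The reverse lookup is what a zignature search needs: find matching zignature documents, then the binaries that contain them.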

Zignatures

The masked zignature should be what you search for. It is still open whether it's better to just split at the mask bytes and use SimpleTokenizer, or to strip them off.

  • Name
  • Size
  • ssdeep
  • Entropy
  • bytes
  • mask
  • masked
  • bbsum
  • vars
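
The two options can be compared with a small sketch. It assumes `bytes` and `mask` are equal-length hex strings where a mask byte of `ff` marks a significant byte and anything else is treated as a wildcard; that encoding is an assumption here, not taken from the source.

```python
def masked_runs(bytes_hex, mask_hex):
    """Return the contiguous runs of significant bytes as hex strings."""
    data = bytes.fromhex(bytes_hex)
    mask = bytes.fromhex(mask_hex)
    if len(data) != len(mask):
        raise ValueError("bytes and mask must have the same length")
    runs, current = [], bytearray()
    for b, m in zip(data, mask):
        if m == 0xFF:
            current.append(b)  # significant byte: extend the current run
        elif current:
            runs.append(current.hex())  # wildcard byte: close the current run
            current = bytearray()
    if current:
        runs.append(current.hex())
    return runs


def split_tokens(bytes_hex, mask_hex):
    """Option A: one token per run, suitable for a whitespace-style tokenizer."""
    return masked_runs(bytes_hex, mask_hex)


def stripped(bytes_hex, mask_hex):
    """Option B: drop the masked bytes and keep one concatenated string."""
    return "".join(masked_runs(bytes_hex, mask_hex))
```

Stripping loses the position of the wildcards, so unrelated byte sequences can collapse to the same string; splitting keeps the runs separate at the cost of more, shorter tokens.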

Segments & Sections

  • Name
  • ssdeep

Ideas and Todos

  • Index ESIL and assembly (how to avoid duplicates with what is already in zignatures?)
  • Use KV store (rkv/tikv/sled) for documents and use tantivy only for search index
  • Some improvements:
    • Add a timestamp to see when the document was indexed
    • Handle "special" cases (Code inside APK, unpack packed samples)
    • Collect the whole binary code instead of only code recognized as functions (zaF)
  • Proper documentation
  • Tweak user experience (a simple default search query probably doesn't provide good results)
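
The split between a KV store for documents and a search index for lookups, plus the indexing timestamp, could look roughly like this. A plain dict of serialized bytes stands in for rkv/tikv/sled, and a whitespace token index stands in for tantivy; `indexed_at` and all class and function names are hypothetical.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


class DocStore:
    """Stand-in for a KV store such as sled: sha256 -> serialized document bytes."""

    def __init__(self):
        self._kv = {}

    def put(self, key, doc):
        self._kv[key] = json.dumps(doc).encode()

    def get(self, key):
        raw = self._kv.get(key)
        return None if raw is None else json.loads(raw)


class SearchIndex:
    """Holds only token -> keys; full documents live in the DocStore."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, key, text):
        for token in text.lower().split():
            self._postings[token].add(key)

    def lookup(self, token):
        return sorted(self._postings.get(token.lower(), set()))


def ingest(store, index, doc, now=None):
    """Store the full document with an indexing timestamp; index only its strings."""
    if now is None:
        now = datetime.now(timezone.utc)
    key = doc["sha256"]
    store.put(key, {**doc, "indexed_at": now.isoformat()})
    index.add(key, " ".join(doc.get("strings", [])))
    return key


def search(store, index, token):
    """Resolve index hits to full documents via the KV store."""
    return [store.get(k) for k in index.lookup(token)]
```

Keeping the index limited to keys keeps it small, while the KV store can hold large per-binary documents without bloating the search index.
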
You might also like...
🐸Slippi DB ingests Slippi replays and puts the data into a SQLite database for easier parsing.

The primary goal of this project is to make it easier to analyze large amounts of Slippi data. Its end goal is to create something similar to Ballchasing.com but for Melee.

A cross-platform terminal database tool written in Rust

gobang is currently in alpha A cross-platform terminal database tool written in Rust Features Cross-platform support (macOS, Windows, Linux) Mu

Pure rust embeddable key-value store database.

MHdb is a pure Rust database implementation, based on dbm. See crate documentation. Changelog v1.0.3 Update Cargo.toml v1.0.2 Update Cargo.toml v1.0.1

influxdb provides an asynchronous Rust interface to an InfluxDB database.

influxdb influxdb provides an asynchronous Rust interface to an InfluxDB database. This crate supports insertion of strings already in the InfluxDB Li

Yet Another Kev-Value DataBase

Yet Another Kev-Value DataBase Extremely simple (simplest possible?) single-file BTree-based key-value database. Build for fun and learning: goal is t

Skytable is an extremely fast, secure and reliable real-time NoSQL database with automated snapshots and TLS

Skytable is an effort to provide the best of key/value stores, document stores and columnar databases, that is, simplicity, flexibility and queryability at scale. The name 'Skytable' exemplifies our vision to create a database that has limitless possibilities. Skytable was previously known as TerrabaseDB (and then Skybase) and is also nicknamed "STable", "Sky" and "SDB" by the community.

RefineDB - A strongly-typed document database that runs on any transactional key-value store.

FeOphant - A SQL database server written in Rust and inspired by PostgreSQL.

A PostgreSQL inspired SQL database written in Rust.

GlueSQL is a SQL database library written in Rust

GlueSQL is a SQL database library written in Rust. It provides a parser (sqlparser-rs), execution layer, and optional storage (sled) packaged into a single library.

Owner: Cyrill Leutwiler (Open Source - Open Mind)

Open source training courses about distributed databases and distributed systems

Welcome to learn Talent Plan Courses! Talent Plan is an open source training program initiated by PingCAP. It aims to create or combine some open sour

PingCAP 8.3k Dec 30, 2022
Experimental blockchain database

A database for the blockchain. Design considerations API The database is a universal key-value storage that supports transactions. It does not support

Parity Technologies 172 Dec 26, 2022
Immutable Ordered Key-Value Database Engine

PumpkinDB Build status (Linux) Build status (Windows) Project status Usable, between alpha and beta Production-readiness Depends on your risk toleranc

null 1.3k Jan 2, 2023
Skybase is an extremely fast, secure and reliable real-time NoSQL database with automated snapshots and SSL

Skybase The next-generation NoSQL database What is Skybase? Skybase (or SkybaseDB/SDB) is an effort to provide the best of key/value stores, document

Skybase 1.4k Dec 29, 2022
Distributed transactional key-value database, originally created to complement TiDB

Website | Documentation | Community Chat TiKV is an open-source, distributed, and transactional key-value database. Unlike other traditional NoSQL sys

TiKV Project 12.4k Jan 3, 2023
small distributed database protocol

clepsydra Overview This is a work-in-progress implementation of a core protocol for a minimalist distributed database. It strives to be as small and s

Graydon Hoare 19 Dec 2, 2021
A user crud written in Rust, designed to connect to a MySQL database with full integration test coverage.

SQLX User CRUD Purpose This application demonstrates how to implement a common design for CRUDs in, potentially, a system of microservices. The de

null 78 Nov 27, 2022
Rust version of the Haskell ERD tool. Translates a plain text description of a relational database schema to dot files representing an entity relation diagram.

erd-rs Rust CLI tool for creating entity-relationship diagrams from plain text markup. Based on erd (uses the same input format and output rendering).

Dave Challis 32 Jul 25, 2022
AgateDB is an embeddable, persistent and fast key-value (KV) database written in pure Rust

AgateDB is an embeddable, persistent and fast key-value (KV) database written in pure Rust. It is designed as an experimental engine for the TiKV project, and will bring aggressive optimizations for TiKV specifically.

TiKV Project 535 Jan 9, 2023
A programmable document database inspired by CouchDB written in Rust

PliantDB PliantDB aims to be a Rust-written, ACID-compliant, document-database inspired by CouchDB. While it is inspired by CouchDB, this project will

Khonsu Labs 718 Dec 31, 2022