Open Zignatures Database

Cyrill Leutwiler

Last update: Sep 19, 2021


Overview

The openZign project

A database of zignatures and other binary identification data, built for fun and to aid reverse-engineering tasks. The data is collected from various sources:

vx-underground collection (>2TB decompressed)
BinKit dataset (>200GB decompressed)
Standard libraries of statically compiled languages (Go, Rust)
Benign Windows binaries
?

Note: This is still under heavy development. This README serves primarily to organize my thoughts.

Project Structure

oz-fila

Helper utility to mass-analyse binary artifacts (executables, libraries, ...) from a directory. The result is one JSON file per binary, containing analysis information from radare2.
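The one-JSON-file-per-binary workflow can be sketched roughly as follows. This is only an illustration, not the actual oz-fila code: `analyse` is a hypothetical stand-in for the radare2 analysis step and records just the `name`, `sha256` and `size` fields, and naming each report after the hash is an assumption.

```python
import hashlib
import json
from pathlib import Path


def analyse(path: Path) -> dict:
    """Hypothetical stand-in for the radare2 analysis step."""
    data = path.read_bytes()
    return {
        "name": path.name,
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
    }


def analyse_directory(src: Path, out: Path) -> list:
    """Write one JSON report per regular file found under src."""
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # sorted() collects the file list up front, before any report is written.
    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        report = analyse(path)
        # Assumption: naming the report after the hash, so identical
        # binaries collapse into a single file.
        target = out / f"{report['sha256']}.json"
        target.write_text(json.dumps(report, indent=2))
        written.append(target)
    return written
```

Calling `analyse_directory(Path("samples"), Path("reports"))` would then leave one `<sha256>.json` file per sample in `reports`.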

oz-indexer

Helper util to index and search the JSON files created by oz-fila.

oz-api

Since the index gets quite big, the final goal is to provide some kind of HTTP/REST API (similar in spirit to the IDA Lumina server).

(TODO) r2 plugin

Provide an r2 plugin for convenience.

Indexing

The first attempt at indexing uses the tantivy search library. It looks like it can handle large data volumes quite well.

Indexing is not yet continuous or automated (it literally takes weeks to analyze and index everything on my consumer-grade desktop hardware).

Facets

Level 1: Classification of the binary sample (Malware, Library, Various)
Level 2: CPU architecture (x86, arm, ...)
Level 3: OS, lang, machine, format, bintype
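Hierarchical facets are commonly written as slash-separated paths, so the facet levels listed above could be combined as in the sketch below. The helper names are hypothetical, and using only the OS for the last level is a simplifying assumption; how openZign actually lays out its facets is not specified here.

```python
def facet_path(classification: str, arch: str, os_name: str) -> str:
    """Join the three levels into one hierarchical facet path."""
    parts = [classification, arch, os_name]
    return "/" + "/".join(p.strip().lower() for p in parts)


def matches(facet: str, prefix: str) -> bool:
    """True if `facet` equals `prefix` or lies below it in the hierarchy."""
    prefix = prefix.rstrip("/")
    # Compare whole path components, so "/malware/x86" does not
    # accidentally match "/malware/x86_64/...".
    return facet == prefix or facet.startswith(prefix + "/")
```

With such paths, filtering on a prefix like `/malware/x86` selects every sample below that level without touching other architectures.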

Fields

Strings, Links, Imports, Yara: Default indexer
name, sha256, magic, size, error

Zignatures, Segments, Sections

Indexed separately. A multi-valued field holds the child document IDs.
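A minimal sketch of this parent/child layout, assuming a report is a dict with `zignatures`, `segments` and `sections` lists. The function name, the `children` field name and the ID scheme are hypothetical and only serve to illustrate the idea.

```python
def split_documents(report: dict) -> tuple:
    """Split one analysis report into a parent document and child documents."""
    children = []
    for kind in ("zignatures", "segments", "sections"):
        for i, item in enumerate(report.get(kind, [])):
            # Hypothetical ID scheme: <sha256>:<kind>:<position>.
            child_id = f"{report['sha256']}:{kind}:{i}"
            children.append({**item, "id": child_id, "kind": kind})
    parent = {
        "sha256": report["sha256"],
        "name": report["name"],
        # Multi-valued field pointing at the separately indexed children.
        "children": [c["id"] for c in children],
    }
    return parent, children
```

The parent stays small this way, and a hit on a child document can be traced back to its binary through the hash embedded in the ID.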

Zignatures

The masked zignature should be what you want to search for. It is still open whether it is better to split at the masked bytes and use SimpleTokenizer, or to strip them off.

Name
Size
ssdeep
Entropy
bytes
mask
masked
bbsum
vars
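The relation between the `bytes`, `mask` and `masked` fields, and the two tokenization options mentioned above, can be sketched as follows. This assumes that `bytes` and `mask` have equal length and that a mask byte of `0x00` means "ignore this byte"; the function names are hypothetical.

```python
def apply_mask(data: bytes, mask: bytes) -> bytes:
    """AND each byte with its mask byte; masked-out bits become zero."""
    if len(data) != len(mask):
        raise ValueError("bytes and mask must have the same length")
    return bytes(b & m for b, m in zip(data, mask))


def split_tokens(data: bytes, mask: bytes) -> list:
    """Option 1: split into hex tokens at fully masked-out bytes."""
    tokens, current = [], bytearray()
    for b, m in zip(data, mask):
        if m == 0x00:
            # A masked-out byte ends the current token, if there is one.
            if current:
                tokens.append(current.hex())
                current = bytearray()
        else:
            current.append(b & m)
    if current:
        tokens.append(current.hex())
    return tokens


def strip_masked(data: bytes, mask: bytes) -> str:
    """Option 2: drop fully masked-out bytes and keep one hex string."""
    return bytes(b & m for b, m in zip(data, mask) if m != 0x00).hex()
```

For the bytes `e8112233445589c3` with mask `ff00000000ffffff`, option 1 yields the tokens `e8` and `5589c3`, while option 2 yields the single string `e85589c3`. Splitting keeps the position of the gap visible to the search engine, whereas stripping produces one longer term.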

Segments & Sections

Name
ssdeep

Ideas and Todos

Index ESIL and assembly (how to avoid duplicates with what is already in zignatures?)
Use KV store (rkv/tikv/sled) for documents and use tantivy only for search index
Some improvements:
- Add a timestamp to see when the document was indexed
- Handle "special" cases (Code inside APK, unpack packed samples)
- Collect the whole binary code instead of only code recognized as functions (zaF)
Proper documentation
Tweak user experience (a simple default search query probably doesn't provide good results)
