A Rust library to find near-duplicate video files

Overview

Video Duplicate Finder

vid_dup_finder finds near-duplicate video files on disk. It detects videos whose frames look similar and whose lengths are roughly the same (within ~5%).
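The "roughly the same length" rule can be pictured as a relative-difference check. This is a minimal sketch, not vid_dup_finder's code: the helper name `durations_roughly_equal` is hypothetical, and measuring the difference against the longer of the two durations is an assumption about how the ~5% tolerance is applied.

```rust
/// Hypothetical helper: true when two durations differ by at most
/// `tolerance` (a fraction, e.g. 0.05 for ~5%) of the longer one.
/// Assumption: the tolerance is relative to the longer video.
fn durations_roughly_equal(a_secs: f64, b_secs: f64, tolerance: f64) -> bool {
    let longer = a_secs.max(b_secs);
    if longer <= 0.0 {
        // Avoid dividing by zero: only identical (zero) durations match.
        return a_secs == b_secs;
    }
    (a_secs - b_secs).abs() / longer <= tolerance
}

fn main() {
    // 20 s apart out of 620 s is about 3.2%, inside a 5% tolerance.
    println!("{}", durations_roughly_equal(600.0, 620.0, 0.05));
    // 60 s apart out of 660 s is about 9.1%, outside it.
    println!("{}", durations_roughly_equal(600.0, 660.0, 0.05));
}
```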

vid_dup_finder works with most common video file formats (any format supported by FFmpeg).

How it works

Video Duplicate Finder extracts several frames from the first minute of each video. From these frames it creates a "perceptual hash" made up of a spatial and a temporal component:

  • The spatial component describes the parts of each frame that are bright and dark. It is generated using the pHash algorithm described here.
  • The temporal component describes the parts of each frame that are brighter or darker than in the previous frame. It is calculated directly from the bits of the spatial hash.
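A rough sketch of the two components follows. It assumes each frame has already been reduced to 64 coefficients (in pHash these typically come from the low-frequency corner of a DCT of a downscaled grayscale frame; that stage is omitted), so only the median-threshold step is shown. Deriving temporal bits by XOR-ing consecutive spatial hashes is one simple way to flag regions that switched between bright and dark; it is an assumption, not necessarily how the library computes its temporal component. `spatial_bits` and `temporal_bits` are hypothetical names.

```rust
// Hypothetical sketch, NOT vid_dup_finder's implementation.
// Assumption: each frame is already reduced to 64 coefficients
// (the DCT stage of pHash is omitted).

/// Spatial component: bit i is set when coefficient i is above the
/// frame's median, i.e. that part of the frame counts as "bright".
fn spatial_bits(coeffs: &[f64; 64]) -> u64 {
    let mut sorted = *coeffs;
    sorted.sort_by(f64::total_cmp);
    // Median of 64 values: average of the two middle elements.
    let median = (sorted[31] + sorted[32]) / 2.0;

    let mut bits = 0u64;
    for (i, &c) in coeffs.iter().enumerate() {
        if c > median {
            bits |= 1u64 << i;
        }
    }
    bits
}

/// Temporal component (assumption): computed only from spatial bits by
/// flagging positions whose bright/dark state changed since the previous frame.
fn temporal_bits(spatial: &[u64]) -> Vec<u64> {
    spatial.windows(2).map(|pair| pair[0] ^ pair[1]).collect()
}

fn main() {
    // Two synthetic frames: a gradient, then its mirror image.
    let frame_a: [f64; 64] = std::array::from_fn(|i| i as f64);
    let frame_b: [f64; 64] = std::array::from_fn(|i| (63 - i) as f64);

    let spatial = vec![spatial_bits(&frame_a), spatial_bits(&frame_b)];
    let temporal = temporal_bits(&spatial);
    for (i, s) in spatial.iter().enumerate() {
        println!("frame {i} spatial: {s:016x}");
    }
    println!("temporal: {:016x}", temporal[0]);
}
```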

The resulting hashes can then be compared by their Hamming distance; shorter distances indicate more similar videos.
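Comparing two hashes by Hamming distance amounts to XOR-ing them and counting the set bits. The sketch below assumes hashes are stored as fixed-size arrays of `u64` words, matching the `[u64; HASH_QWORDS]` type that appears in the VideoHash API; `hamming_distance` is a hypothetical helper, and the distance at which two videos count as duplicates is left to the caller.

```rust
/// Hamming distance between two hashes stored as arrays of u64 words:
/// the number of bit positions at which they differ.
fn hamming_distance<const N: usize>(a: &[u64; N], b: &[u64; N]) -> u32 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x ^ y).count_ones())
        .sum()
}

fn main() {
    let h1 = [0b1011u64, 0];
    let h2 = [0b0001u64, u64::MAX];
    // 0b1011 ^ 0b0001 has 2 set bits; 0 ^ u64::MAX has 64.
    println!("distance = {}", hamming_distance(&h1, &h2)); // prints "distance = 66"
}
```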

Requirements

FFmpeg must be installed on your system and accessible from the command line. You can install it as follows:

  • Debian-based systems: # apt install ffmpeg
  • Yum-based systems: # yum install ffmpeg
  • Windows:
    1. Download the correct installer from https://ffmpeg.org/download.html
    2. Run the installer and install ffmpeg to any directory
    3. Add that directory to the PATH environment variable
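To confirm that ffmpeg is reachable from the command line, a program can simply try to launch it. A minimal sketch using only the Rust standard library; `is_on_path` is a hypothetical helper, and it relies on `ffmpeg -version` exiting successfully when ffmpeg is installed.

```rust
use std::process::{Command, Stdio};

/// Returns true if `program` can be launched via PATH and exits
/// successfully when passed `-version` (ffmpeg supports this flag).
fn is_on_path(program: &str) -> bool {
    Command::new(program)
        .arg("-version")
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        // A spawn error (e.g. program not found) counts as "not available".
        .unwrap_or(false)
}

fn main() {
    if is_on_path("ffmpeg") {
        println!("ffmpeg found on PATH");
    } else {
        eprintln!("ffmpeg not found: install it and add its directory to PATH");
    }
}
```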

Limitations

vid_dup_finder will find duplicates if minor changes have been made to the video, such as resizing, small colour corrections, small crops or faint watermarks. It will not find duplicates if there are larger changes (flipping or rotation, embedding in a corner of a different video, etc.).

To save processing time on large datasets, vid_dup_finder uses only frames from the first 30 seconds of each video. As a result, it may return false positives for videos of the same length that share the same first 30 seconds (for example, a series of cartoons with a fixed intro sequence).
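The false-positive case follows from where frames are sampled: if every frame is taken from a fixed opening window, two videos with identical openings yield identical frames. A sketch of evenly spaced sampling, with an illustrative frame count and window length (the real number of frames and their spacing are not documented here; `sample_timestamps` is hypothetical):

```rust
/// Hypothetical: `n` evenly spaced timestamps (seconds) within the first
/// `window_secs` of a video, shortened if the video itself is shorter.
fn sample_timestamps(duration_secs: f64, window_secs: f64, n: usize) -> Vec<f64> {
    let span = duration_secs.min(window_secs);
    let step = span / n as f64;
    (0..n).map(|i| i as f64 * step).collect()
}

fn main() {
    // Two episodes with the same 30-second intro but different content later:
    // both are sampled at exactly the same points, all inside the intro.
    let ep1 = sample_timestamps(1320.0, 30.0, 5);
    let ep2 = sample_timestamps(1330.0, 30.0, 5);
    assert_eq!(ep1, ep2);
    println!("{:?}", ep1); // prints "[0.0, 6.0, 12.0, 18.0, 24.0]"
}
```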

False Positives

Because this library only checks the first 30 seconds of each video, two videos that are the same length and share their first 30 seconds of content will be reported as a false match. This may occur with TV episodes that share the same opening credits.

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Comments
  • Add additional API for VideoHash

    Today I finished integrating this library into czkawka; the results can be seen here: https://github.com/qarmin/czkawka/pull/460

    It was surprisingly easy to do (most of the time went into preparing the GUI).

    In the app I use a custom system to save the cache to text files, so I need access to every field of VideoHash. I found that this is currently not possible, so I had to create a custom fork.

    The changes in this fork are visible here (I ran the Rust formatter, so the diff contains many formatting-only changes): https://github.com/qarmin/vid_dup_finder_lib/commit/a4809772aea8f73c9a22da6fb43df50bfdd1b31d. It would be good to integrate these or similar changes into the project, because that would let me work with the upstream version of this library and always get the newest features.

    Since I load hashes from a file, I need to create a VideoHash object manually, so I added this function:

            /// Builds a VideoHash from previously saved data (e.g. a cache file).
            pub fn with_start_data(
                duration: u32,
                src_path: impl AsRef<Path>,
                hash: [u64; HASH_QWORDS],
                num_frames: u32,
            ) -> Self {
                VideoHash {
                    hash,
                    num_frames,
                    src_path: src_path.as_ref().to_path_buf(),
                    duration,
                }
            }
    

    Also, there is no way to get the hash or num_frames from a VideoHash, so I added these functions:

        /// Returns the raw hash words.
        pub fn hash(&self) -> &[u64; HASH_QWORDS] {
            &self.hash
        }

        /// Returns the stored frame count.
        pub fn num_frames(&self) -> u32 {
            self.num_frames
        }
    
    

    The last thing is that the HASH_QWORDS constant is not public. I need to check whether the number of loaded hash elements equals the number of required hash elements, so access to this constant would be really helpful (for now I have hard-coded the number 19).
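A minimal sketch of that length check, for a hypothetical cache format that stores one hash per line as whitespace-separated `u64` words. `EXPECTED_HASH_QWORDS` is a hand-copied stand-in for the private `HASH_QWORDS` constant, using the value 19 mentioned in the comment; `parse_hash` is hypothetical. Converting the `Vec<u64>` into a fixed-size array with `try_into` rejects lines of the wrong length.

```rust
use std::convert::TryInto;

/// Assumption: stand-in for the private HASH_QWORDS constant, using the
/// value 19 that the comment above hard-codes.
const EXPECTED_HASH_QWORDS: usize = 19;

/// Parses a whitespace-separated line of u64 words (hypothetical cache
/// format) into a fixed-size hash, rejecting lines with the wrong length.
fn parse_hash(line: &str) -> Result<[u64; EXPECTED_HASH_QWORDS], String> {
    let words: Vec<u64> = line
        .split_whitespace()
        .map(|w| w.parse::<u64>().map_err(|e| format!("bad word {:?}: {}", w, e)))
        .collect::<Result<_, _>>()?;
    let len = words.len();
    // Vec<u64> -> [u64; N] fails (returning the Vec) if the length differs.
    words
        .try_into()
        .map_err(|_| format!("expected {} words, found {}", EXPECTED_HASH_QWORDS, len))
}

fn main() {
    let good = (0..19).map(|i| i.to_string()).collect::<Vec<_>>().join(" ");
    match parse_hash(&good) {
        Ok(hash) => println!("loaded hash, last word = {}", hash[EXPECTED_HASH_QWORDS - 1]),
        Err(e) => eprintln!("{}", e),
    }
    // A truncated cache entry is rejected instead of being silently misread.
    if let Err(e) = parse_hash("1 2 3") {
        eprintln!("{}", e);
    }
}
```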

    opened by qarmin