Rust library to detect bots using a user-agent string

Overview

isbot

Rust library to detect bots using a user-agent string.

Features

  • Focused on speed, simplicity, and ensuring real browsers don't get falsely identified as bots
  • Tested on over 12k bot user-agents and 180k browser user-agents - updated bot and browser lists are downloaded as part of the integration test suite
  • Easy to plug in as middleware to Actix, Rocket, or other Rust web frameworks
  • Includes a default collection of 300+ known bot user-agent regular expressions at compile time
  • Allows user-agent patterns to be manually added and removed at runtime

Usage

Add this to your Cargo.toml:

[dependencies]
isbot = "0.1.0"

The example below uses the default bot patterns to correctly identify the Googlebot-Image user-agent as a bot and the Opera user-agent as a browser.

use isbot::Bots;

let bots = Bots::default();

assert_eq!(bots.is_bot("Googlebot-Image/1.0"), true);
assert_eq!(bots.is_bot("Opera/9.60 (Windows NT 6.0; U; en) Presto/2.1.1"), false);

Middleware: Actix or Rocket

isbot can be added as middleware to enable global or per-handler rejections of known bots.

Actix Example

There are multiple ways to use isbot in Actix. One way is to pass a Bots instance into the app data as state. For example:

use actix_web::{web, App};
use isbot::Bots;

struct AppState {
    bots: Bots,
}

let state = AppState {
    bots: Bots::default(),
};
let app = App::new()
    .app_data(web::Data::new(state))
    .route("/", web::get().to(index));

Request handlers can use the Bots data to filter out bots:

async fn index(req: HttpRequest, data: web::Data<AppState>) -> HttpResponse {
    if let Some(user_agent) = get_user_agent(req.headers()) {
        if data.bots.is_bot(user_agent) {
            return HttpResponse::Forbidden().body("Bots not allowed");
        }
    }
    HttpResponse::Ok().body("Home")
}
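
The handler above calls get_user_agent, which is not defined in this README. As a rough, framework-independent sketch of what such a helper might do, the version below assumes the headers are available as plain name/value string pairs. That slice type is an assumption made for illustration; with Actix the helper would read from the request's header map instead, and the helper in the project's examples may look different.

```rust
/// Hypothetical helper: returns the first User-Agent header value, if any.
/// Header names are compared case-insensitively, since HTTP header names
/// are not case-sensitive.
fn get_user_agent<'a>(headers: &[(&str, &'a str)]) -> Option<&'a str> {
    headers
        .iter()
        .find(|(name, _)| name.eq_ignore_ascii_case("user-agent"))
        .map(|(_, value)| *value)
}
```

Returning an Option keeps the "no User-Agent header" case explicit, which matches the if let Some(...) check in the handler.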

Another option is to add the middleware with the Actix wrap_fn function and globally reject all requests from bots. Both examples, plus other options and details, can be seen in the test examples.
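
The idea behind global rejection can be sketched without any web framework: wrap a handler in a function that checks the user-agent first and only calls the handler for non-bots. The sketch below is not the Actix wrap_fn API. The Response alias, the reject_bots name, and the closure signatures are all hypothetical and exist only to illustrate the control flow.

```rust
/// A response reduced to a status code and body, for illustration only.
type Response = (u16, String);

/// Hypothetical wrapper: returns a new handler that answers 403 when
/// `is_bot` flags the user-agent, and otherwise delegates to `handler`.
fn reject_bots<B, H>(is_bot: B, handler: H) -> impl Fn(Option<&str>) -> Response
where
    B: Fn(&str) -> bool,
    H: Fn() -> Response,
{
    move |user_agent: Option<&str>| match user_agent {
        // A user-agent is present and it is classified as a bot.
        Some(ua) if is_bot(ua) => (403, "Bots not allowed".to_string()),
        // Missing user-agent or a non-bot: run the wrapped handler.
        _ => handler(),
    }
}
```

In a real application the is_bot argument would be a closure around a Bots instance, for example |ua| bots.is_bot(ua), so every route shares the same pattern list.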

Customizing

Bot user-agent patterns can be customized by adding or removing patterns, using the append and remove methods.

Add bot pattern

To add new bot patterns, use append to specify an array of regular expression patterns. For example:

let mut bots = isbot::Bots::default();

assert_eq!(bots.is_bot("Mozilla/5.0 (CustomNewTestB0T /1.2)"), false);

bots.append(&[r"CustomNewTestB0T\s/\d\.\d"]);

assert_eq!(bots.is_bot("Mozilla/5.0 (CustomNewTestB0T /1.2)"), true);

Remove bots

To remove bot patterns, use remove and specify an array of existing patterns to remove. For example, to remove the Chrome Lighthouse user-agent pattern to indicate it is not a bot:

let mut bots = isbot::Bots::default();

bots.remove(&["Chrome-Lighthouse"]);

assert_eq!(bots.is_bot("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36 Chrome-Lighthouse"), false);

Custom Bot list

The default user-agent regular expression patterns are managed in the bot_regex_patterns.txt file.

If you don't want to use the default bot patterns, you can supply your own list. The include-default-bots feature is enabled by default, so the patterns defined in bot_regex_patterns.txt are included in the library at compile time.

To exclude them, disable the default features by setting default-features to false in your Cargo.toml dependency definition, then supply your own bot regular expressions. For example:

[dependencies]
isbot = { version = "0.1.0", default-features = false }

Then use Bots::new() to supply a newline-delimited list of regular expressions. For example:

use isbot::Bots;

let custom_user_agent_patterns = r#"
^Googlebot-Image/
bingpreview/"#;

let bots = Bots::new(custom_user_agent_patterns);
assert_eq!(bots.is_bot("Googlebot-Image/1.0"), true);
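
Note that the raw string in the example starts with a line break, so the list begins with an empty line. To sanity-check a custom list before passing it to Bots::new, it can help to see which entries it actually contains. The helper below is a hypothetical, standard-library-only sketch for that purpose; it is not how the crate parses the list, and the name split_patterns is made up for this example.

```rust
/// Hypothetical helper: splits a newline-delimited pattern list into
/// individual entries, trimming whitespace and skipping empty lines.
fn split_patterns(list: &str) -> Vec<&str> {
    list.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect()
}
```

Applied to the list above, this yields the two entries ^Googlebot-Image/ and bingpreview/.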

Testing

Some of the test fixture data is downloaded from multiple sources to ensure the latest user-agents are validated.

To download the latest test data fixtures, run the download_fixture_data.rs executable:

cargo run --bin download_fixture_data --features="download-fixture-data"

This will update files in the fixtures directory.

Unit and integration tests

To run all unit and integration tests:

cargo test

Actix tests

To validate changes to the Actix examples run the following:

cargo test --example actix_example

Rocket tests

To validate changes to the Rocket examples run the following:

cargo test --example rocket_example

Philosophy

Bot detection is a gray area since there is no clear line between a bot user-agent and a real browser user-agent. Some libraries focus on broadly classifying bots and trying to identify as many as possible, with the risk that real users' browsers may be caught and falsely flagged as bots.

This library's focus is on identifying known bots while primarily ensuring no real users or browsers are falsely flagged. All of the bot user-agent patterns are validated against a large number of real browsers and bot patterns to eliminate false positives.

For example, the user-agent string below is identified as both a bot and a real browser by various libraries and data sources:

Mozilla/5.0 (Linux; Android 4.2.1; CUBOT GT99 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 Mobile Safari/535.19
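
To see why a string like this gets misclassified, consider a deliberately naive check that flags any user-agent containing "bot". This is only an illustration of the false-positive problem, not how this library or any specific library works; naive_is_bot is a made-up name.

```rust
/// The CUBOT phone user-agent quoted above.
const CUBOT_UA: &str = "Mozilla/5.0 (Linux; Android 4.2.1; CUBOT GT99 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 Mobile Safari/535.19";

/// Naive check for illustration: flags any user-agent that contains
/// the substring "bot", ignoring ASCII case.
fn naive_is_bot(user_agent: &str) -> bool {
    user_agent.to_ascii_lowercase().contains("bot")
}
```

Here naive_is_bot(CUBOT_UA) returns true because the device name CUBOT contains "bot", even though the request comes from a regular mobile browser. Validating patterns against a large set of real browser user-agents is meant to catch this kind of collision.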

Credits

There are many excellent bot detection libraries available for other languages and awesome developers maintaining bot and user-agent identification data. This library draws inspiration from many of them, especially:

Library                                          Language
https://github.com/omrilotan/isbot               JavaScript
https://github.com/JayBizzle/Crawler-Detect/     PHP
https://github.com/matomo-org/device-detector    PHP
https://github.com/fnando/browser                Ruby
https://github.com/biola/Voight-Kampff           Ruby

The following data sources are used directly or as inspiration for the static test data and downloaded user-agent identification:

Data Source        Notes
user-agents.net    User-Agents Database
myip.ms            List of IP addresses of Known Web Bots & Spiders in Myip.ms Database
monperrus          Collection of user-agents used by robots, crawlers, and spiders
ua-core            Regex file necessary to build language ports of Browserscope's user-agent parser

Contributing

See the Contributing guide.

License

isbot is distributed under the terms of the MIT license. See LICENSE for details.

Releases

  • v0.1.3 (Jun 18, 2022)
  • v0.1.1 (Apr 3, 2022)
  • v0.1.0 (Mar 27, 2022): initial version of library
    • Tested on over 12k bot user-agents and 180k browser user-agents
    • Updated bot and browser lists are downloaded as part of the integration test suite
    • Contains over 300 regular expression patterns to identify bots
    • Includes example for Actix and Rocket
    • Allows user-agent patterns to be manually added and removed at runtime

Owner

Bryan Morgan