A simple Bayesian spam classifier written in Rust.

Overview

bayespam

A simple Bayesian spam classifier.

About

Bayespam is inspired by Naive Bayes classifiers, a popular statistical technique for e-mail filtering.

The message to be identified is split into simple words, also called tokens. These tokens are compared against the whole corpus of messages (spam or not) to determine how frequently each token appears in both categories.
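The tokenizing and counting step can be sketched with the standard library alone. This is an illustration under assumptions, not Bayespam's own code: the `Counts` type, the `tokenize` rules (lowercase, keep alphanumeric characters only) and the `record` helper are all hypothetical names chosen for this example, and the crate's real tokenizer may behave differently.

```rust
use std::collections::HashMap;

/// How often a token has been seen in spam and in ham (hypothetical type).
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct Counts {
    spam: u32,
    ham: u32,
}

/// Hypothetical tokenizer: lowercase each whitespace-separated word and keep
/// only its alphanumeric characters. The crate's real rules may differ.
fn tokenize(message: &str) -> Vec<String> {
    message
        .split_whitespace()
        .map(|word| {
            word.chars()
                .filter(|c| c.is_alphanumeric())
                .collect::<String>()
                .to_lowercase()
        })
        .filter(|token| !token.is_empty())
        .collect()
}

/// Add one labeled message to the frequency table.
fn record(table: &mut HashMap<String, Counts>, message: &str, is_spam: bool) {
    for token in tokenize(message) {
        let counts = table.entry(token).or_default();
        if is_spam {
            counts.spam += 1;
        } else {
            counts.ham += 1;
        }
    }
}
```

After recording one spam and one ham message that both contain "today", the entry for that token holds one count in each category, while a token seen only in spam has a ham count of zero.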

A probabilistic formula is then used to calculate the probability that the message is spam. When the probability is high enough, the classifier categorizes the message as likely spam; otherwise, as likely ham. The probability threshold is fixed at 0.8 by default.
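One common way to turn token counts into a single probability is shown below. It is a minimal sketch and an assumption about the general technique, not the exact formula Bayespam uses: `token_rating`, `spam_probability` and `is_spam` are hypothetical names, the add-one smoothing and equal priors are choices made for this example, and whether the real threshold comparison is strict is also assumed.

```rust
/// Default decision threshold described in the text.
const SPAM_THRESHOLD: f64 = 0.8;

/// Smoothed estimate of P(spam | token) from how often the token was seen in
/// spam and ham. Add-one smoothing keeps the estimate strictly between 0 and 1.
fn token_rating(spam_count: u32, ham_count: u32) -> f64 {
    (spam_count as f64 + 1.0) / ((spam_count + ham_count) as f64 + 2.0)
}

/// Combine per-token ratings into one probability, assuming independent tokens
/// and equal priors: P = prod(p) / (prod(p) + prod(1 - p)).
/// Sums of logarithms replace the products to avoid floating-point underflow.
fn spam_probability(token_counts: &[(u32, u32)]) -> f64 {
    let mut log_spam = 0.0_f64;
    let mut log_ham = 0.0_f64;
    for &(spam_count, ham_count) in token_counts {
        let p = token_rating(spam_count, ham_count);
        log_spam += p.ln();
        log_ham += (1.0 - p).ln();
    }
    1.0 / (1.0 + (log_ham - log_spam).exp())
}

/// Classify as spam when the probability exceeds the threshold
/// (strict comparison is an assumption of this sketch).
fn is_spam(token_counts: &[(u32, u32)]) -> bool {
    spam_probability(token_counts) > SPAM_THRESHOLD
}
```

Each `(spam_count, ham_count)` pair describes one token of the message. With this formula, three tokens seen once in spam and never in ham give 8/9 (about 0.89), which is above the 0.8 threshold, while a message with no known tokens scores 0.5.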

Documentation

Learn more about Bayespam here: https://docs.rs/bayespam.

Usage

Add to your Cargo.toml manifest:

[dependencies]
bayespam = "1.1.0"

Use a pre-trained model

Add a model.json file to your package root. Then, you can use it to score and identify messages:

extern crate bayespam;

use bayespam::classifier;

fn main() -> Result<(), std::io::Error> {
    // Identify a typical spam message
    let spam = "Lose up to 19% weight. Special promotion on our new weightloss.";
    let score = classifier::score(spam)?;
    let is_spam = classifier::identify(spam)?;
    println!("{:.4?}", score);
    println!("{:?}", is_spam);

    // Identify a typical ham message
    let ham = "Hi Bob, can you send me your machine learning homework?";
    let score = classifier::score(ham)?;
    let is_spam = classifier::identify(ham)?;
    println!("{:.4?}", score);
    println!("{:?}", is_spam);

    Ok(())
}

$> cargo run
0.9993
true
0.6311
false

Train your own model and save it as JSON into a file

You can train a new model from scratch and save it as JSON to reload later:

extern crate bayespam;

use bayespam::classifier::Classifier;
use std::fs::File;

fn main() -> Result<(), std::io::Error> {
    // Create a new classifier with an empty model
    let mut classifier = Classifier::new();

    // Train the classifier with a new spam example
    let spam = "Don't forget our special promotion: -30% on men shoes, only today!";
    classifier.train_spam(spam);

    // Train the classifier with a new ham example
    let ham = "Hi Bob, don't forget our meeting today at 4pm.";
    classifier.train_ham(ham);

    // Identify a typical spam message
    let spam = "Lose up to 19% weight. Special promotion on our new weightloss.";
    let score = classifier.score(spam);
    let is_spam = classifier.identify(spam);
    println!("{:.4}", score);
    println!("{}", is_spam);

    // Identify a typical ham message
    let ham = "Hi Bob, can you send me your machine learning homework?";
    let score = classifier.score(ham);
    let is_spam = classifier.identify(ham);
    println!("{:.4}", score);
    println!("{}", is_spam);

    // Serialize the model and save it as JSON into a file
    let mut file = File::create("my_super_model.json")?;
    classifier.save(&mut file, false)?;

    Ok(())
}

$> cargo run
0.9999
true
0.0001
false
$> cat my_super_model.json
{"token_table":{"forget":{"ham":1,"spam":1},"only":{"ham":0,"spam":1},"meeting":{"ham":1,"spam":0},"our":{"ham":1,"spam":1},"dont":{"ham":1,"spam":1},"bob":{"ham":1,"spam":0},"men":{"ham":0,"spam":1},"today":{"ham":1,"spam":1},"shoes":{"ham":0,"spam":1},"special":{"ham":0,"spam":1},"promotion:":{"ham":0,"spam":1}}}
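The saved file is a single `token_table` object that maps each token to its ham and spam counts. The sketch below mirrors that layout with standard-library types and writes it out by hand. It is only an illustration of the data structure: `Model`, `Counter`, `escape` and `to_json` are hypothetical names, this is not the crate's serialization code, and a `BTreeMap` is used here purely so that the key order is deterministic.

```rust
use std::collections::BTreeMap;

/// Ham and spam counts for one token, mirroring the `{"ham":..,"spam":..}` objects.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct Counter {
    ham: u32,
    spam: u32,
}

/// Hypothetical model type: one counter per token.
/// A BTreeMap keeps keys sorted, so the output is deterministic.
#[derive(Debug, Default)]
struct Model {
    token_table: BTreeMap<String, Counter>,
}

/// Escape backslashes and double quotes inside a token.
/// Control characters are not handled in this sketch.
fn escape(token: &str) -> String {
    token.replace('\\', "\\\\").replace('"', "\\\"")
}

impl Model {
    /// Render the model as compact JSON in the same shape as the file above.
    fn to_json(&self) -> String {
        let entries: Vec<String> = self
            .token_table
            .iter()
            .map(|(token, c)| {
                format!("\"{}\":{{\"ham\":{},\"spam\":{}}}", escape(token), c.ham, c.spam)
            })
            .collect();
        format!("{{\"token_table\":{{{}}}}}", entries.join(","))
    }
}
```

In real code a JSON library would normally take care of escaping and parsing; the hand-written writer is here only to make the shape of the file explicit.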

Contribution

Contributions via issues or pull requests are appreciated.

License

Bayespam is distributed under the terms of the MIT License.
