Multilingual implementation of the RAKE algorithm for Rust

Overview

RAKE.rs

The library provides a multilingual implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm for Rust.

How to Use

  • Add rake to the dependencies section of Cargo.toml:
rake = "0.3"
  • Import modules:
use rake::*;
  • Load a stop-word list, create a new instance of the Rake struct, and run it on the text:
let text = "a long text";
let sw = StopWords::from_file("path/to/stop_words_list.txt").unwrap();
let r = Rake::new(sw);
let keywords = r.run(text);
  • Iterate over keywords:
keywords.iter().for_each(
    |&KeywordScore {
        ref keyword,
        ref score,
    }| println!("{}: {}", keyword, score),
);
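
For intuition about what a RAKE run computes, here is a minimal standalone sketch that uses only the standard library. It is not this crate's implementation: the function names `candidate_phrases` and `score_phrases` are made up for illustration, tokenization is plain whitespace splitting, and repeated phrases are not deduplicated. It assumes the classic RAKE scoring, where a phrase's score is the sum of degree/frequency over its words.

```rust
use std::collections::{HashMap, HashSet};

/// Split text into candidate phrases: runs of non-stop-words that are
/// bounded by stop words or punctuation. Words are lowercased.
/// (Hypothetical helper, not part of the rake crate.)
fn candidate_phrases(text: &str, stop_words: &HashSet<&str>) -> Vec<Vec<String>> {
    let mut phrases = Vec::new();
    // Punctuation always ends a phrase.
    for chunk in text.split(|c: char| matches!(c, '.' | ',' | '!' | '?' | ':' | ';')) {
        let mut current: Vec<String> = Vec::new();
        for word in chunk.split_whitespace() {
            let word = word.to_lowercase();
            if stop_words.contains(word.as_str()) {
                // A stop word also ends the current phrase.
                if !current.is_empty() {
                    phrases.push(std::mem::take(&mut current));
                }
            } else {
                current.push(word);
            }
        }
        if !current.is_empty() {
            phrases.push(current);
        }
    }
    phrases
}

/// Score each phrase as the sum of degree(w) / frequency(w) over its words,
/// and return the phrases sorted by descending score.
fn score_phrases(phrases: &[Vec<String>]) -> Vec<(String, f64)> {
    let mut freq: HashMap<&str, f64> = HashMap::new();
    let mut degree: HashMap<&str, f64> = HashMap::new();
    for phrase in phrases {
        for word in phrase {
            *freq.entry(word.as_str()).or_insert(0.0) += 1.0;
            // Degree counts co-occurrences within the phrase, including the word itself.
            *degree.entry(word.as_str()).or_insert(0.0) += phrase.len() as f64;
        }
    }
    let mut scored: Vec<(String, f64)> = phrases
        .iter()
        .map(|p| {
            let score: f64 = p
                .iter()
                .map(|w| degree[w.as_str()] / freq[w.as_str()])
                .sum();
            (p.join(" "), score)
        })
        .collect();
    // Stable sort: phrases with equal scores keep their original order.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored
}
```

Calling `candidate_phrases` with a text and a stop-word set, then passing the result to `score_phrases`, gives a ranked list comparable in spirit to the keywords printed above. Multi-word phrases tend to rank higher because each of their words has a larger degree.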

Comments
  • remove lazy_static dependency

    I was getting version requirement clashes, with some crates wanting lazy_static 1.4.0 and this one wanting 1.3.0. I thought I'd come and bump the version, but it seems the dependency isn't even required after a few internal changes.

    opened by dten 1
  • WordDegree and WordFrequency ranking metrics added

    Added multiple ranking metrics, the same ones used in the rake-nltk Python implementation: https://csurfer.github.io/rake-nltk/_build/html/advanced.html#to-control-the-metric-for-ranking

    opened by jaroslavgratz 0
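
As a rough illustration of how the three metrics named in this request differ, the sketch below computes per-word scores from already-extracted candidate phrases. The `Metric` enum and `word_scores` function are hypothetical names chosen for this example and are not taken from the pull request or the crate. It assumes that a word's degree is the total length of the phrases it appears in, counting the word itself.

```rust
use std::collections::HashMap;

/// Hypothetical metric selector, named after the three options discussed above.
#[derive(Clone, Copy)]
enum Metric {
    DegreeToFrequencyRatio,
    WordDegree,
    WordFrequency,
}

/// Compute a per-word score from candidate phrases under the chosen metric.
fn word_scores<'a>(phrases: &[Vec<&'a str>], metric: Metric) -> HashMap<&'a str, f64> {
    let mut freq: HashMap<&str, f64> = HashMap::new();
    let mut degree: HashMap<&str, f64> = HashMap::new();
    for phrase in phrases {
        for &word in phrase {
            // Frequency: how many times the word occurs in any phrase.
            *freq.entry(word).or_insert(0.0) += 1.0;
            // Degree: co-occurrences within the phrase, including the word itself.
            *degree.entry(word).or_insert(0.0) += phrase.len() as f64;
        }
    }
    freq.iter()
        .map(|(&word, &f)| {
            let d = degree[word];
            let score = match metric {
                Metric::DegreeToFrequencyRatio => d / f,
                Metric::WordDegree => d,
                Metric::WordFrequency => f,
            };
            (word, score)
        })
        .collect()
}
```

The ratio favors words that mostly occur inside longer phrases, the degree favors words that co-occur often overall, and the frequency simply favors common words.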
  • Tokenization of 's

    The punctuation regex includes the apostrophe, so it splits "foo's" into two separate phrases. I'm seeing "s something" in keywords.

    I think it could be fixed by using less smart splitting:

        text.split(|c: char| match c {
            '.' | ',' | '!' | '?' | ':' | ';' | '(' | ')' | '{' | '}' => true,
            _ => false,
        })
        .filter(|s| !s.is_empty())
        .for_each(|s| {
            let mut phrase = Vec::new();
            s.split(|c: char| !c.is_alphanumeric() && c != '\'' && c != '’')
                .filter(|s| !s.is_empty())
                .for_each(|word| {
                    let word = word.trim_matches(|c: char| !c.is_alphanumeric());
    
    bug 
    opened by kornelski 1
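
The splitting idea from this issue can be sketched as a small standalone function. This is only an illustration under stated assumptions: `tokenize` is a hypothetical name, stop-word handling is left out, and the snippet is not the fix that was applied to the crate. It splits on sentence punctuation first, then on any character that is neither alphanumeric nor an apostrophe, so "foo's" stays one word.

```rust
/// Split text into phrases at sentence punctuation, then into words,
/// keeping apostrophes inside words such as "foo's".
/// (Hypothetical helper for illustration only.)
fn tokenize(text: &str) -> Vec<Vec<&str>> {
    text.split(|c: char| matches!(c, '.' | ',' | '!' | '?' | ':' | ';' | '(' | ')' | '{' | '}'))
        .map(|chunk| {
            chunk
                // Word boundary: anything that is not alphanumeric or an apostrophe.
                .split(|c: char| !c.is_alphanumeric() && c != '\'' && c != '’')
                // Strip quotes wrapping a word, but keep apostrophes inside it.
                .map(|word| word.trim_matches(|c: char| !c.is_alphanumeric()))
                .filter(|word| !word.is_empty())
                .collect::<Vec<&str>>()
        })
        .filter(|phrase| !phrase.is_empty())
        .collect()
}
```

With this approach, a possessive like "dog's" is kept whole, while quotes around a word are trimmed away because `trim_matches` only removes characters at the ends.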