fastText Rust binding

Overview

fasttext-rs

Installation

Add it to your Cargo.toml:

[dependencies]
fasttext = "0.6"

Add extern crate fasttext to your crate root and you're good to go!

License

This work is released under the MIT license. A copy of the license is provided in the LICENSE file.

Comments
  • Ability to load compressed models

    Hi! Thank you for this project.

    I was curious whether you had any plans to implement support for compressed models?

    • They are described in detail in this blog post: https://towardsdatascience.com/compressing-unsupervised-fasttext-models-eb212e9919ca
    • Here are several of the models mentioned in the blog post: https://github.com/avidale/compress-fasttext/releases/tag/gensim-4-draft
    • This is the Python library that provides support for loading compressed models: https://github.com/avidale/compress-fasttext
    opened by cjrh 2
  • 'error: failed to add native library' on Apple M1

    Failed to compile fasttext on Apple M1.

    The error message is:

       Compiling fasttext v0.6.0
    error: failed to add native library xxx/target/debug/build/cfasttext-sys-b23c8188e065dcaa/out/build/libcfasttext_static.a: file too small to be an archive
    

    My toolchain versions

    $ rustc -V
    rustc 1.56.0-nightly (ad981d58e 2021-08-08)
    
    $ rustup show
    Default host: aarch64-apple-darwin
    rustup home:  /Users/xxx/.rustup
    
    nightly-aarch64-apple-darwin (default)
    rustc 1.56.0-nightly (ad981d58e 2021-08-08)
    
    $ cmake --version
    cmake version 3.21.0
    
    CMake suite maintained and supported by Kitware (kitware.com/cmake).
    

    Any suggestion?

    Thanks a lot!

    opened by fiag 2
  • Program panics if any null byte is encountered.

    When the prediction functions are called with a string containing a null byte, the library panics because of an unwrap() on a CString creation. Null bytes do occur in real-world input, and although their presence could be checked upstream, I don't think a crash is the intended behaviour.

    This could be fixed by removing the unwrap(), changing the return types where applicable, and mapping the error to a String.

    I'd be glad to implement such a fix :)

    bug 
    opened by Uinelj 2
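The fix proposed in the null-byte issue above can be sketched with the standard library alone. This is a minimal illustration, not the crate's actual code: the helper name to_c_string is hypothetical, and the only real API used is std::ffi::CString::new, which returns an error when the input contains an interior NUL byte.

```rust
use std::ffi::CString;

/// Hypothetical helper: converts text into a C string and reports an
/// interior NUL byte as an error message instead of panicking.
fn to_c_string(text: &str) -> Result<CString, String> {
    // `CString::new` fails on interior NUL bytes; map that error to a String
    // so the caller can decide how to handle it.
    CString::new(text).map_err(|e| format!("invalid input: {}", e))
}

fn main() {
    // Input with an embedded NUL byte is rejected rather than causing a panic.
    match to_c_string("hello\0world") {
        Ok(_) => println!("converted"),
        Err(msg) => println!("rejected: {}", msg),
    }

    // Ordinary text still converts normally.
    assert!(to_c_string("plain text").is_ok());
}
```

A function that currently unwraps the conversion could instead propagate this Result with the ? operator, at the cost of changing its return type.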
  • Fail to compile for Android target armeabi-v7a

    I have the following error when I try to compile for armeabi-v7a:

    [2022-09-23T19:42:47Z INFO  cargo_ndk::cli] Building armeabi-v7a (armv7-linux-androideabi)
       Compiling cfasttext-sys v0.7.0
    error: failed to run custom build command for `cfasttext-sys v0.7.0`
    
    Caused by:
      process didn't exit successfully: `/rust/target/debug/build/cfasttext-sys-34b28a9c9dd21f71/build-script-build` (exit status: 101)
      --- stdout
      CMAKE_TOOLCHAIN_FILE_armv7-linux-androideabi = None
      CMAKE_TOOLCHAIN_FILE_armv7_linux_androideabi = None
      TARGET_CMAKE_TOOLCHAIN_FILE = None
      CMAKE_TOOLCHAIN_FILE = None
      CMAKE_GENERATOR_armv7-linux-androideabi = None
      CMAKE_GENERATOR_armv7_linux_androideabi = None
      TARGET_CMAKE_GENERATOR = None
      CMAKE_GENERATOR = None
      CMAKE_PREFIX_PATH_armv7-linux-androideabi = None
      CMAKE_PREFIX_PATH_armv7_linux_androideabi = None
      TARGET_CMAKE_PREFIX_PATH = None
      CMAKE_PREFIX_PATH = None
      CMAKE_armv7-linux-androideabi = None
      CMAKE_armv7_linux_androideabi = None
      TARGET_CMAKE = None
      CMAKE = None
      running: "cmake" "/.cargo/registry/src/github.com-1ecc6299db9ec823/cfasttext-sys-0.7.0/cfasttext" "-DCMAKE_INSTALL_PREFIX=/IdeaProjects/rust/target/armv7-linux-androideabi/debug/build/cfasttext-sys-97389a1d85d6c5c7/out" "-DCMAKE_C_FLAGS= -DANDROID -ffunction-sections -fdata-sections -fPIC" "-DCMAKE_C_COMPILER=/Android/Sdk/ndk/22.1.7171670/toolchains/llvm/prebuilt/linux-x86_64/bin/armv7a-linux-androideabi21-clang" "-DCMAKE_CXX_FLAGS= -DANDROID -ffunction-sections -fdata-sections -fPIC" "-DCMAKE_CXX_COMPILER=/Android/Sdk/ndk/22.1.7171670/toolchains/llvm/prebuilt/linux-x86_64/bin/armv7a-linux-androideabi21-clang++" "-DCMAKE_ASM_FLAGS= -DANDROID -ffunction-sections -fdata-sections -fPIC" "-DCMAKE_ASM_COMPILER=/Android/Sdk/ndk/22.1.7171670/toolchains/llvm/prebuilt/linux-x86_64/bin/armv7a-linux-androideabi21-clang" "-DCMAKE_BUILD_TYPE=Debug"
      -- The C compiler identification is Clang 11.0.5
      -- The CXX compiler identification is Clang 11.0.5
      -- Check for working C compiler: /Android/Sdk/ndk/22.1.7171670/toolchains/llvm/prebuilt/linux-x86_64/bin/armv7a-linux-androideabi21-clang
      -- Check for working C compiler: /Android/Sdk/ndk/22.1.7171670/toolchains/llvm/prebuilt/linux-x86_64/bin/armv7a-linux-androideabi21-clang -- works
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- Check for working CXX compiler: /Android/Sdk/ndk/22.1.7171670/toolchains/llvm/prebuilt/linux-x86_64/bin/armv7a-linux-androideabi21-clang++
      -- Check for working CXX compiler: /Android/Sdk/ndk/22.1.7171670/toolchains/llvm/prebuilt/linux-x86_64/bin/armv7a-linux-androideabi21-clang++ -- works
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Configuring done
      -- Generating done
      -- Build files have been written to: /IdeaProjects/rust/target/armv7-linux-androideabi/debug/build/cfasttext-sys-97389a1d85d6c5c7/out/build
      running: "cmake" "--build" "." "--target" "cfasttext_static" "--config" "Debug" "--parallel" "8"
      Scanning dependencies of target objlib
      Scanning dependencies of target fasttext-static
      [ 35%] Building CXX object fasttext/CMakeFiles/fasttext-static.dir/src/args.cc.o
      [ 35%] Building CXX object fasttext/CMakeFiles/fasttext-static.dir/src/main.cc.o
      [  5%] Building CXX object fasttext/CMakeFiles/fasttext-static.dir/src/autotune.cc.o
      [ 35%] Building CXX object fasttext/CMakeFiles/fasttext-static.dir/src/dictionary.cc.o
      [ 35%] Building CXX object fasttext/CMakeFiles/fasttext-static.dir/src/densematrix.cc.o
      [ 41%] Building CXX object fasttext/CMakeFiles/fasttext-static.dir/src/fasttext.cc.o
      [ 41%] Building CXX object fasttext/CMakeFiles/fasttext-static.dir/src/loss.cc.o
      [ 47%] Building CXX object CMakeFiles/objlib.dir/lib/cfasttext.cc.o
      [ 47%] Built target objlib
    
      --- stderr
      CMake Warning:
        Manually-specified variables were not used by the project:
    
          CMAKE_ASM_COMPILER
          CMAKE_ASM_FLAGS
    
    
      make: warning: -j8 forced in submake: resetting jobserver mode.
      clang++clang++clang++clang++clang++clang++: clang++error: : : : : : errorerrorerrorerror: : the clang compiler does not support '-march=native'the clang compiler does not support '-march=native': : 
      the clang compiler does not support '-march=native'
      the clang compiler does not support '-march=native'
      the clang compiler does not support '-march=native'
    
      error: the clang compiler does not support '-march=native'
      : error: the clang compiler does not support '-march=native'
      make[3]: *** [fasttext/CMakeFiles/fasttext-static.dir/build.make:141: fasttext/CMakeFiles/fasttext-static.dir/src/main.cc.o] Error 1
      make[3]: *** Waiting for unfinished jobs....
      make[3]: *** [fasttext/CMakeFiles/fasttext-static.dir/build.make:63: fasttext/CMakeFiles/fasttext-static.dir/src/args.cc.o] Error 1
      make[3]: *** [fasttext/CMakeFiles/fasttext-static.dir/build.make:76: fasttext/CMakeFiles/fasttext-static.dir/src/autotune.cc.o] Error 1
      make[3]: *** [fasttext/CMakeFiles/fasttext-static.dir/build.make:115: fasttext/CMakeFiles/fasttext-static.dir/src/fasttext.cc.o] Error 1
      make[3]: *** [fasttext/CMakeFiles/fasttext-static.dir/build.make:128: fasttext/CMakeFiles/fasttext-static.dir/src/loss.cc.o] Error 1
      make[3]: *** [fasttext/CMakeFiles/fasttext-static.dir/build.make:89: fasttext/CMakeFiles/fasttext-static.dir/src/densematrix.cc.o] Error 1
      make[3]: *** [fasttext/CMakeFiles/fasttext-static.dir/build.make:102: fasttext/CMakeFiles/fasttext-static.dir/src/dictionary.cc.o] Error 1
      make[2]: *** [CMakeFiles/Makefile2:242: fasttext/CMakeFiles/fasttext-static.dir/all] Error 2
      make[2]: *** Waiting for unfinished jobs....
      make[1]: *** [CMakeFiles/Makefile2:113: CMakeFiles/cfasttext_static.dir/rule] Error 2
      make: *** [Makefile:164: cfasttext_static] Error 2
      thread 'main' panicked at '
      command did not execute successfully, got: exit status: 2
    
      build script failed, must exit now', /.cargo/registry/src/github.com-1ecc6299db9ec823/cmake-0.1.48/src/lib.rs:975:5
      note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    [2022-09-23T19:42:50Z INFO  cargo_ndk::cli] If the build failed due to a missing target, you can run this command:
    [2022-09-23T19:42:50Z INFO  cargo_ndk::cli] 
    [2022-09-23T19:42:50Z INFO  cargo_ndk::cli]     rustup target install armv7-linux-androideabi
    
    opened by vincent-herlemont 1
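The Android log above shows the NDK clang rejecting '-march=native'. One possible workaround, sketched here under assumptions, is to pass that flag only when the build is not a cross-compilation. The function extra_cxx_flags is hypothetical and is not part of cfasttext-sys; the sketch also assumes the flag is added by the bundled CMake configuration and could be made conditional. Cargo does set the TARGET and HOST environment variables for build scripts; the fallback host triple below is a guess based on the linux-x86_64 toolchain path in the log.

```rust
use std::env;

/// Hypothetical helper: returns extra C++ flags for the given target and
/// host triples. `-march=native` is only meaningful when compiling for the
/// machine that runs the compiler, so it is skipped when cross-compiling.
fn extra_cxx_flags(target: &str, host: &str) -> Vec<&'static str> {
    if target == host {
        vec!["-march=native"]
    } else {
        Vec::new()
    }
}

fn main() {
    // Inside a build script Cargo provides TARGET and HOST. The fallbacks
    // are illustrative values only (the host triple is an assumption).
    let target = env::var("TARGET").unwrap_or_else(|_| "armv7-linux-androideabi".to_string());
    let host = env::var("HOST").unwrap_or_else(|_| "x86_64-unknown-linux-gnu".to_string());

    println!("extra flags: {:?}", extra_cxx_flags(&target, &host));
}
```

This heuristic is incomplete: a native build can still fail if the local compiler does not accept the flag for that architecture, so a more robust approach would probe the compiler before adding it.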
  • FastText struct shouldn't implement clone

    Hello,

    First of all, thanks a lot for your work on this crate!

    Cloning doesn't play well with the custom drop logic. I think it's better to remove Clone from the derive and let users share the struct via Arc/Rc; otherwise this can easily happen:

    use fasttext::FastText;
    
    fn main() {
        let mut ft = FastText::new();
        ft.load_model("cooking.model.bin").unwrap();
    
        let cloned = ft.clone();
        drop(cloned);
    
        // boom!
        ft.is_quant();
    }
    

    Running it crashes with: 'cargo run' terminated by signal SIGSEGV (Address boundary error)

    bug 
    opened by caio 1
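The suggestion in the Clone issue above, sharing one instance through Arc or Rc instead of cloning it, can be illustrated with a self-contained stand-in. The Model type below is hypothetical and only mimics a wrapper that frees a native handle in drop; it is not the crate's FastText struct.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a wrapper that owns a native handle.
/// It deliberately does NOT implement `Clone`: a derived clone would copy
/// the raw pointer, and each copy would free it again in `drop`.
struct Model {
    handle: *mut i32,
}

impl Model {
    fn new(value: i32) -> Self {
        // Simulate a native allocation owned by this wrapper.
        Model { handle: Box::into_raw(Box::new(value)) }
    }

    fn value(&self) -> i32 {
        // Valid because the allocation is freed only in `drop`, which runs
        // once, when the last owner goes away.
        unsafe { *self.handle }
    }
}

impl Drop for Model {
    fn drop(&mut self) {
        // Reclaim the allocation exactly once.
        unsafe { drop(Box::from_raw(self.handle)) };
    }
}

fn main() {
    let model = Arc::new(Model::new(42));

    // `Arc::clone` copies the reference-counted pointer, not the Model.
    let shared = Arc::clone(&model);
    drop(shared);

    // The underlying handle is still alive, so this access is fine.
    println!("value after dropping one owner: {}", model.value());
}
```

Because dropping an Arc clone only decrements the reference count, the native handle is released once, after the last owner is gone.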
  • Ability to get all labels

    Hello,

    I've been wondering whether it would be possible to get the same functionality as the labels method/property on the Python wrapper. I see that you rely on cfasttext, and I haven't found anything there that could replicate this feature. I'd be glad to try to add this in Rust, but I don't know whether I'd have to touch the underlying C wrapper.

    opened by Uinelj 0
Owner
messense
Python Backend Developer at day, Rustacean at night.