Fast state-of-the-art tokenizers for Ruby

Overview

Tokenizers Ruby

Installation

Add this line to your application’s Gemfile:

gem "tokenizers"

Note: Rust is currently required for installation.

Getting Started

Load a pretrained tokenizer

tokenizer = Tokenizers.from_pretrained("bert-base-cased")

Encode

encoded = tokenizer.encode("I can feel the magic, can you?")
encoded.ids
encoded.tokens

Decode

tokenizer.decode(encoded.ids)
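
To make the relationship between `ids`, `tokens`, and `decode` concrete, here is a toy whitespace tokenizer in plain Ruby. It is only a sketch: `ToyTokenizer`, `ToyEncoding`, and the tiny vocabulary are hypothetical and not part of the gem, and pretrained tokenizers typically use subword models rather than whitespace splitting.

```ruby
# Toy illustration only: a whitespace tokenizer with a fixed vocabulary.
# ToyTokenizer and ToyEncoding are hypothetical names, not part of the gem.
ToyEncoding = Struct.new(:ids, :tokens)

class ToyTokenizer
  UNKNOWN = "[UNK]"

  # vocab is an array of tokens; each token's index doubles as its id.
  # The vocabulary must contain UNKNOWN so unseen words can be mapped.
  def initialize(vocab)
    @token_to_id = vocab.each_with_index.to_h
    @id_to_token = vocab
  end

  # Split on whitespace, replace unseen words, and look up each id.
  def encode(text)
    tokens = text.split.map { |t| @token_to_id.key?(t) ? t : UNKNOWN }
    ToyEncoding.new(tokens.map { |t| @token_to_id.fetch(t) }, tokens)
  end

  # Map ids back to tokens and join them with spaces.
  def decode(ids)
    ids.map { |id| @id_to_token.fetch(id) }.join(" ")
  end
end

toy = ToyTokenizer.new(["[UNK]", "can", "you", "feel", "the", "magic"])
encoded = toy.encode("can you feel the magic")
p encoded.ids                # [1, 2, 3, 4, 5]
p encoded.tokens             # ["can", "you", "feel", "the", "magic"]
puts toy.decode(encoded.ids) # can you feel the magic
```

The `ids` and `tokens` arrays always have the same length, and decoding the ids recovers the text only as far as the vocabulary allows: an unknown word comes back as `[UNK]`.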

Load a tokenizer from files

tokenizer = Tokenizers::CharBPETokenizer.new("vocab.json", "merges.txt")

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/ankane/tokenizers-ruby.git
cd tokenizers-ruby
bundle install
bundle exec ruby ext/tokenizers/extconf.rb && make
bundle exec rake download:files
bundle exec rake test

Comments
  • Use `rake-compiler` and `rb-sys` for build

    This PR changes the build system to use rb-sys and rake-compiler to build the gem. With this change, it will be possible to publish native binaries of the gem, so consumers will not have to maintain a Rust toolchain for deployment.

    The actual publishing of the gem is not handled by this PR, but the cross-build GitHub Action will build a set of *.gem files that can be pushed to RubyGems.

    ❤️

    opened by ianks 8
  • I get an error when installing

    Hello! Thanks for making this gem.

    But it seems to fail to install in my environment.

    gem install tokenizers
    

    I get the following error message:

    Building native extensions. This could take a while...
    ERROR:  Error installing tokenizers:
    	ERROR: Failed to build gem native extension.
    
        current directory: /home/kojix2/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/tokenizers-0.1.1/ext/tokenizers
    /home/kojix2/.rbenv/versions/3.1.2/bin/ruby -I /home/kojix2/.rbenv/versions/3.1.2/lib/ruby/3.1.0 -r ./siteconf20220909-19701-2a0rv0.rb extconf.rb
    
    current directory: /home/kojix2/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/tokenizers-0.1.1/ext/tokenizers
    make DESTDIR\= clean
    make: Nothing to be done for 'clean'.
    
    current directory: /home/kojix2/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/tokenizers-0.1.1/ext/tokenizers
    make DESTDIR\=
    cargo build --release --target-dir target
       Compiling libc v0.2.121
       Compiling cfg-if v1.0.0
       Compiling autocfg v1.1.0
       Compiling cc v1.0.73
       Compiling pkg-config v0.3.24
       Compiling proc-macro2 v1.0.36
       Compiling unicode-xid v0.2.2
       Compiling syn v1.0.89
       Compiling memchr v2.3.4
       Compiling lazy_static v1.4.0
       Compiling log v0.4.14
       Compiling version_check v0.9.4
       Compiling pin-project-lite v0.2.8
       Compiling bitflags v1.3.2
       Compiling bytes v1.1.0
       Compiling futures-core v0.3.21
       Compiling once_cell v1.10.0
       Compiling itoa v1.0.1
       Compiling futures-task v0.3.21
       Compiling typenum v1.15.0
       Compiling crossbeam-utils v0.8.8
       Compiling serde_derive v1.0.136
       Compiling serde v1.0.136
       Compiling foreign-types-shared v0.1.1
       Compiling fnv v1.0.7
       Compiling futures-util v0.3.21
       Compiling openssl v0.10.38
       Compiling ryu v1.0.9
       Compiling pin-utils v0.1.0
       Compiling unicode-width v0.1.9
       Compiling hashbrown v0.11.2
       Compiling native-tls v0.2.8
       Compiling futures-io v0.3.21
       Compiling slab v0.4.5
       Compiling futures-channel v0.3.21
       Compiling futures-sink v0.3.21
       Compiling tinyvec_macros v0.1.0
       Compiling matches v0.1.9
       Compiling httparse v1.6.0
       Compiling crc32fast v1.3.2
       Compiling radium v0.5.3
       Compiling percent-encoding v2.1.0
       Compiling adler v1.0.2
       Compiling strsim v0.9.3
       Compiling getrandom v0.1.16
       Compiling try-lock v0.2.3
       Compiling ident_case v1.0.1
       Compiling scopeguard v1.1.0
       Compiling openssl-probe v0.1.5
       Compiling ppv-lite86 v0.2.16
       Compiling regex-syntax v0.6.25
       Compiling rayon-core v1.9.1
       Compiling either v1.6.1
       Compiling lexical-core v0.7.6
       Compiling httpdate v1.0.2
       Compiling encoding_rs v0.8.30
       Compiling tower-service v0.3.1
       Compiling unicode-bidi v0.3.7
       Compiling static_assertions v1.1.0
       Compiling wyz v0.2.0
       Compiling tap v1.0.1
       Compiling serde_json v1.0.79
       Compiling funty v1.1.0
       Compiling byteorder v1.4.3
       Compiling arrayvec v0.5.2
       Compiling cpufeatures v0.2.2
       Compiling derive_builder v0.9.0
       Compiling ipnet v2.4.0
       Compiling fastrand v1.7.0
       Compiling remove_dir_all v0.5.3
       Compiling mime v0.3.16
       Compiling number_prefix v0.4.0
       Compiling base64 v0.13.0
       Compiling unicode-segmentation v1.9.0
       Compiling glob v0.3.0
       Compiling base64 v0.12.3
       Compiling number_prefix v0.3.0
       Compiling macro_rules_attribute-proc_macro v0.0.2
       Compiling vec_map v0.8.2
       Compiling strsim v0.8.0
       Compiling rutie v0.8.4
       Compiling ansi_term v0.12.1
       Compiling smallvec v1.8.0
       Compiling unicode_categories v0.1.1
       Compiling paste v1.0.6
       Compiling tracing-core v0.1.23
       Compiling memoffset v0.6.5
       Compiling indexmap v1.8.0
       Compiling miniz_oxide v0.4.4
       Compiling crossbeam-epoch v0.9.8
       Compiling rayon v1.5.1
       Compiling generic-array v0.14.5
       Compiling nom v6.2.1
       Compiling foreign-types v0.3.2
       Compiling http v0.2.6
       Compiling textwrap v0.11.0
       Compiling tinyvec v1.5.1
       Compiling openssl-sys v0.9.72
       Compiling bzip2-sys v0.1.11+1.0.8
       Compiling onig_sys v69.7.1
       Compiling esaxx-rs v0.1.7
       Compiling form_urlencoded v1.0.1
       Compiling itertools v0.8.2
       Compiling itertools v0.9.0
       Compiling macro_rules_attribute v0.0.2
       Compiling unicode-normalization-alignments v0.1.12
       Compiling tracing v0.1.32
       Compiling unicode-normalization v0.1.19
       Compiling aho-corasick v0.7.15
       Compiling num_cpus v1.13.1
       Compiling socket2 v0.4.4
       Compiling getrandom v0.2.5
       Compiling terminal_size v0.1.17
       Compiling time v0.1.43
       Compiling filetime v0.2.15
       Compiling xattr v0.2.2
       Compiling fs2 v0.4.3
       Compiling atty v0.2.14
       Compiling tempfile v3.3.0
       Compiling dirs-sys v0.3.7
       Compiling http-body v0.4.4
       Compiling mio v0.8.2
       Compiling want v0.3.0
       Compiling quote v1.0.16
       Compiling crossbeam-channel v0.5.4
       Compiling bitvec v0.19.6
       Compiling regex v1.4.6
       Compiling idna v0.2.3
       Compiling rand_core v0.6.3
       Compiling rand_core v0.5.1
       Compiling tar v0.4.38
       Compiling clap v2.34.0
       Compiling dirs v3.0.2
       Compiling tokio v1.17.0
       Compiling flate2 v1.0.22
       Compiling block-buffer v0.10.2
       Compiling crypto-common v0.1.3
       Compiling url v2.2.2
       Compiling rand_chacha v0.3.1
       Compiling rand_chacha v0.2.2
       Compiling console v0.15.0
       Compiling bzip2 v0.4.3
       Compiling crossbeam-deque v0.8.1
       Compiling digest v0.10.3
       Compiling rand v0.8.5
       Compiling rand v0.7.3
       Compiling tokio-util v0.6.9
       Compiling indicatif v0.16.2
       Compiling indicatif v0.15.0
       Compiling darling_core v0.10.2
       Compiling onig v6.3.1
       Compiling sha2 v0.10.2
       Compiling tokio-native-tls v0.3.0
       Compiling h2 v0.3.12
       Compiling thiserror-impl v1.0.30
       Compiling darling_macro v0.10.2
       Compiling darling v0.10.2
       Compiling derive_builder_core v0.9.0
       Compiling thiserror v1.0.30
       Compiling zip v0.5.13
       Compiling zip-extensions v0.6.1
       Compiling rayon-cond v0.1.0
       Compiling hyper v0.14.17
       Compiling serde_urlencoded v0.7.1
       Compiling spm_precompiled v0.1.3
       Compiling hyper-tls v0.5.0
       Compiling reqwest v0.11.10
       Compiling cached-path v0.5.3
       Compiling tokenizers v0.11.3
       Compiling tokenizers-ruby v0.1.0 (/home/kojix2/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/tokenizers-0.1.1)
        Finished release [optimized] target(s) in 1m 22s
    mv target/release/libtokenizers.so ../../lib/tokenizers/ext.so
    
    current directory: /home/kojix2/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/tokenizers-0.1.1/ext/tokenizers
    make DESTDIR\= install
    cargo build --release --target-dir target
        Finished release [optimized] target(s) in 0.09s
    mv target/release/libtokenizers.so ../../lib/tokenizers/ext.so
    mv: 'target/release/libtokenizers.so' and '../../lib/tokenizers/ext.so' are the same file
    make: *** [Makefile:3: install] Error 1
    
    make install failed, exit code 2
    
    Gem files will remain installed in /home/kojix2/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/tokenizers-0.1.1 for inspection.
    Results logged to /home/kojix2/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/extensions/x86_64-linux/3.1.0/tokenizers-0.1.1/gem_make.out
    

    However, I was able to build and try it by following the development steps:

    git clone https://github.com/ankane/tokenizers-ruby.git
    cd tokenizers-ruby
    bundle install
    bundle exec ruby ext/tokenizers/extconf.rb && make
    bundle exec rake download:files
    bundle exec rake test
    

    Tried GPT-2 with onnxruntime! It's working just fine!

    require "tokenizers"
    require "onnxruntime"
    require "numo/narray"
    
    tokenizer = Tokenizers.from_pretrained("gpt2")
    model = OnnxRuntime::Model.new("gpt2-lm-head-10.onnx")
    
    s = "Why do cats want to ride on the keyboard?"
    
    ids = tokenizer.encode(s).ids
    
    10.times do
      o = model.predict({ input1: [[ids]] })
      o = Numo::DFloat.cast(o["output1"][0])
      ids << o[true, -1, true].argmax
    end
    
    puts tokenizer.decode(ids)
    

    🐈 ⌨️ ❓

    Why do cats want to ride on the keyboard?
    
    The answer is that they do.
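
The loop in the snippet above is greedy decoding: at each step it takes the scores for the last position and appends the id with the highest score. The same idea can be sketched in plain Ruby without ONNX Runtime or Numo. Here `fake_model` is a hypothetical stand-in that returns one row of scores per input position, and `greedy_generate` is an illustrative helper name, not part of any library.

```ruby
# Hypothetical stand-in for a model: returns one row of scores per input
# position, each row holding one score per vocabulary id. This fake model
# always favors the id that follows the current one (wrapping around).
VOCAB_SIZE = 5
fake_model = lambda do |ids|
  ids.map do |id|
    Array.new(VOCAB_SIZE) { |v| v == (id + 1) % VOCAB_SIZE ? 1.0 : 0.0 }
  end
end

# Greedy decoding: repeatedly append the highest-scoring next id.
def greedy_generate(ids, steps, model)
  out = ids.dup
  steps.times do
    last_scores = model.call(out).last # scores for the final position
    # max_by returns a [score, index] pair; the index is the chosen id.
    out << last_scores.each_with_index.max_by { |score, _| score }.last
  end
  out
end

p greedy_generate([0, 1], 4, fake_model) # [0, 1, 2, 3, 4, 0]
```

Greedy decoding is deterministic and simple, but it always commits to the single most likely id, so the generated text can be repetitive.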
    
    opened by kojix2 5
  • Error compiling

    Getting the following compiler output:

    rustc --version
    rustc 1.61.0 (fe5b13d68 2022-05-18)
    
    gem install tokenizers
    Building native extensions. This could take a while...
    ERROR:  Error installing tokenizers:
    	ERROR: Failed to build gem native extension.
    
        current directory: /home/ur5us/.rvm/gems/ruby-3.0.4/gems/tokenizers-0.1.0/ext/tokenizers
    /home/ur5us/.rvm/rubies/ruby-3.0.4/bin/ruby -I /home/ur5us/.rvm/rubies/ruby-3.0.4/lib/ruby/site_ruby/3.0.0 -r ./siteconf20220623-143041-cdpust.rb extconf.rb
    
    current directory: /home/ur5us/.rvm/gems/ruby-3.0.4/gems/tokenizers-0.1.0/ext/tokenizers
    make DESTDIR\= clean
    cargo clean
    
    current directory: /home/ur5us/.rvm/gems/ruby-3.0.4/gems/tokenizers-0.1.0/ext/tokenizers
    make DESTDIR\=
    cargo build --release
       Compiling libc v0.2.121
       Compiling cfg-if v1.0.0
       Compiling autocfg v1.1.0
       Compiling cc v1.0.73
       Compiling pkg-config v0.3.24
       Compiling proc-macro2 v1.0.36
       Compiling unicode-xid v0.2.2
       Compiling syn v1.0.89
       Compiling memchr v2.3.4
       Compiling lazy_static v1.4.0
       Compiling log v0.4.14
       Compiling version_check v0.9.4
       Compiling pin-project-lite v0.2.8
       Compiling bitflags v1.3.2
       Compiling futures-core v0.3.21
       Compiling bytes v1.1.0
       Compiling once_cell v1.10.0
       Compiling itoa v1.0.1
       Compiling serde_derive v1.0.136
       Compiling futures-task v0.3.21
       Compiling crossbeam-utils v0.8.8
       Compiling typenum v1.15.0
       Compiling futures-util v0.3.21
       Compiling foreign-types-shared v0.1.1
       Compiling serde v1.0.136
       Compiling fnv v1.0.7
       Compiling openssl v0.10.38
       Compiling ryu v1.0.9
       Compiling pin-utils v0.1.0
       Compiling hashbrown v0.11.2
       Compiling slab v0.4.5
       Compiling crc32fast v1.3.2
       Compiling unicode-width v0.1.9
       Compiling futures-channel v0.3.21
       Compiling native-tls v0.2.8
       Compiling futures-io v0.3.21
       Compiling tinyvec_macros v0.1.0
       Compiling matches v0.1.9
       Compiling httparse v1.6.0
       Compiling futures-sink v0.3.21
       Compiling scopeguard v1.1.0
       Compiling radium v0.5.3
       Compiling percent-encoding v2.1.0
       Compiling strsim v0.9.3
       Compiling ident_case v1.0.1
       Compiling adler v1.0.2
       Compiling rayon-core v1.9.1
       Compiling getrandom v0.1.16
       Compiling ppv-lite86 v0.2.16
       Compiling try-lock v0.2.3
       Compiling regex-syntax v0.6.25
       Compiling openssl-probe v0.1.5
       Compiling httpdate v1.0.2
       Compiling lexical-core v0.7.6
       Compiling unicode-bidi v0.3.7
       Compiling encoding_rs v0.8.30
       Compiling tower-service v0.3.1
       Compiling either v1.6.1
       Compiling byteorder v1.4.3
       Compiling arrayvec v0.5.2
       Compiling tap v1.0.1
       Compiling wyz v0.2.0
       Compiling static_assertions v1.1.0
       Compiling serde_json v1.0.79
       Compiling funty v1.1.0
       Compiling fastrand v1.7.0
       Compiling cpufeatures v0.2.2
       Compiling mime v0.3.16
       Compiling base64 v0.13.0
       Compiling remove_dir_all v0.5.3
       Compiling derive_builder v0.9.0
       Compiling ipnet v2.4.0
       Compiling number_prefix v0.4.0
       Compiling ansi_term v0.12.1
       Compiling glob v0.3.0
       Compiling unicode-segmentation v1.9.0
       Compiling strsim v0.8.0
       Compiling macro_rules_attribute-proc_macro v0.0.2
       Compiling smallvec v1.8.0
       Compiling number_prefix v0.3.0
       Compiling rutie v0.8.3
       Compiling vec_map v0.8.2
       Compiling base64 v0.12.3
       Compiling unicode_categories v0.1.1
       Compiling paste v1.0.6
       Compiling tracing-core v0.1.23
       Compiling indexmap v1.8.0
       Compiling memoffset v0.6.5
       Compiling crossbeam-epoch v0.9.8
       Compiling miniz_oxide v0.4.4
       Compiling rayon v1.5.1
       Compiling generic-array v0.14.5
       Compiling nom v6.2.1
       Compiling foreign-types v0.3.2
       Compiling http v0.2.6
       Compiling textwrap v0.11.0
       Compiling openssl-sys v0.9.72
       Compiling bzip2-sys v0.1.11+1.0.8
       Compiling onig_sys v69.7.1
       Compiling esaxx-rs v0.1.7
       Compiling tinyvec v1.5.1
       Compiling form_urlencoded v1.0.1
       Compiling itertools v0.8.2
       Compiling itertools v0.9.0
       Compiling unicode-normalization-alignments v0.1.12
       Compiling macro_rules_attribute v0.0.2
       Compiling tracing v0.1.32
       Compiling aho-corasick v0.7.15
       Compiling unicode-normalization v0.1.19
       Compiling http-body v0.4.4
       Compiling num_cpus v1.13.1
       Compiling socket2 v0.4.4
       Compiling terminal_size v0.1.17
       Compiling getrandom v0.2.5
       Compiling time v0.1.43
       Compiling filetime v0.2.15
       Compiling xattr v0.2.2
       Compiling atty v0.2.14
       Compiling tempfile v3.3.0
       Compiling dirs-sys v0.3.7
       Compiling fs2 v0.4.3
       Compiling quote v1.0.16
       Compiling mio v0.8.2
       Compiling want v0.3.0
       Compiling crossbeam-channel v0.5.4
       Compiling bitvec v0.19.6
       Compiling regex v1.4.6
       Compiling idna v0.2.3
       Compiling rand_core v0.6.3
       Compiling rand_core v0.5.1
       Compiling tar v0.4.38
       Compiling clap v2.34.0
       Compiling dirs v3.0.2
       Compiling tokio v1.17.0
       Compiling flate2 v1.0.22
       Compiling block-buffer v0.10.2
       Compiling crypto-common v0.1.3
       Compiling rand_chacha v0.3.1
       Compiling rand_chacha v0.2.2
       Compiling url v2.2.2
       Compiling bzip2 v0.4.3
       Compiling console v0.15.0
       Compiling crossbeam-deque v0.8.1
       Compiling digest v0.10.3
       Compiling rand v0.8.5
       Compiling rand v0.7.3
       Compiling tokio-util v0.6.9
       Compiling indicatif v0.16.2
       Compiling indicatif v0.15.0
       Compiling darling_core v0.10.2
       Compiling onig v6.3.1
       Compiling sha2 v0.10.2
       Compiling h2 v0.3.12
       Compiling tokio-native-tls v0.3.0
       Compiling thiserror-impl v1.0.30
       Compiling darling_macro v0.10.2
       Compiling thiserror v1.0.30
       Compiling zip v0.5.13
       Compiling darling v0.10.2
       Compiling derive_builder_core v0.9.0
       Compiling zip-extensions v0.6.1
       Compiling rayon-cond v0.1.0
       Compiling hyper v0.14.17
       Compiling serde_urlencoded v0.7.1
       Compiling spm_precompiled v0.1.3
       Compiling hyper-tls v0.5.0
       Compiling reqwest v0.11.10
       Compiling cached-path v0.5.3
       Compiling tokenizers v0.11.3
       Compiling tokenizers-ruby v0.1.0 (/home/ur5us/.rvm/gems/ruby-3.0.4/gems/tokenizers-0.1.0)
    warning: `extern` fn uses type `AnyObject`, which is not FFI-safe
      --> src/lib.rs:63:101
       |
    63 |     fn tokenizers_from_pretrained(identifier: RString, revision: RString, auth_token: AnyObject) -> AnyObject {
       |                                                                                                     ^^^^^^^^^ not FFI-safe
       |
       = note: `#[warn(improper_ctypes_definitions)]` on by default
       = help: consider adding a `#[repr(C)]` or `#[repr(transparent)]` attribute to this struct
       = note: this struct has unspecified layout
    
    warning: `extern` fn uses type `AnyObject`, which is not FFI-safe
      --> src/lib.rs:88:52
       |
    88 |     fn bpe_new(vocab: RString, merges: RString) -> AnyObject {
       |                                                    ^^^^^^^^^ not FFI-safe
       |
       = help: consider adding a `#[repr(C)]` or `#[repr(transparent)]` attribute to this struct
       = note: this struct has unspecified layout
    
    warning: `extern` fn uses type `AnyObject`, which is not FFI-safe
       --> src/lib.rs:107:43
        |
    107 |     fn tokenizer_new(model: AnyObject) -> AnyObject {
        |                                           ^^^^^^^^^ not FFI-safe
        |
        = help: consider adding a `#[repr(C)]` or `#[repr(transparent)]` attribute to this struct
        = note: this struct has unspecified layout
    
    warning: `extern` fn uses type `AnyObject`, which is not FFI-safe
       --> src/lib.rs:137:43
        |
    137 |     fn tokenizer_encode(text: RString) -> AnyObject {
        |                                           ^^^^^^^^^ not FFI-safe
        |
        = help: consider adding a `#[repr(C)]` or `#[repr(transparent)]` attribute to this struct
        = note: this struct has unspecified layout
    
    warning: `extern` fn uses type `rutie::RString`, which is not FFI-safe
       --> src/lib.rs:147:40
        |
    147 |     fn tokenizer_decode(ids: Array) -> RString {
        |                                        ^^^^^^^ not FFI-safe
        |
        = help: consider adding a `#[repr(C)]` or `#[repr(transparent)]` attribute to this struct
        = note: this struct has unspecified layout
    
    warning: `extern` fn uses type `AnyObject`, which is not FFI-safe
       --> src/lib.rs:159:53
        |
    159 |     fn tokenizer_decoder_set(decoder: AnyObject) -> AnyObject {
        |                                                     ^^^^^^^^^ not FFI-safe
        |
        = help: consider adding a `#[repr(C)]` or `#[repr(transparent)]` attribute to this struct
        = note: this struct has unspecified layout
    
    warning: `extern` fn uses type `AnyObject`, which is not FFI-safe
       --> src/lib.rs:167:65
        |
    167 |     fn tokenizer_pre_tokenizer_set(pre_tokenizer: AnyObject) -> AnyObject {
        |                                                                 ^^^^^^^^^ not FFI-safe
        |
        = help: consider adding a `#[repr(C)]` or `#[repr(transparent)]` attribute to this struct
        = note: this struct has unspecified layout
    
    warning: `extern` fn uses type `AnyObject`, which is not FFI-safe
       --> src/lib.rs:175:59
        |
    175 |     fn tokenizer_normalizer_set(normalizer: AnyObject) -> AnyObject {
        |                                                           ^^^^^^^^^ not FFI-safe
        |
        = help: consider adding a `#[repr(C)]` or `#[repr(transparent)]` attribute to this struct
        = note: this struct has unspecified layout
    
    warning: `extern` fn uses type `rutie::Array`, which is not FFI-safe
       --> src/lib.rs:188:26
        |
    188 |     fn encoding_ids() -> Array {
        |                          ^^^^^ not FFI-safe
        |
        = help: consider adding a `#[repr(C)]` or `#[repr(transparent)]` attribute to this struct
        = note: this struct has unspecified layout
    
    warning: `extern` fn uses type `rutie::Array`, which is not FFI-safe
       --> src/lib.rs:198:29
        |
    198 |     fn encoding_tokens() -> Array {
        |                             ^^^^^ not FFI-safe
        |
        = help: consider adding a `#[repr(C)]` or `#[repr(transparent)]` attribute to this struct
        = note: this struct has unspecified layout
    
    warning: `extern` fn uses type `AnyObject`, which is not FFI-safe
       --> src/lib.rs:213:29
        |
    213 |     fn bpe_decoder_new() -> AnyObject {
        |                             ^^^^^^^^^ not FFI-safe
        |
        = help: consider adding a `#[repr(C)]` or `#[repr(transparent)]` attribute to this struct
        = note: this struct has unspecified layout
    
    warning: `extern` fn uses type `AnyObject`, which is not FFI-safe
       --> src/lib.rs:225:36
        |
    225 |     fn bert_pre_tokenizer_new() -> AnyObject {
        |                                    ^^^^^^^^^ not FFI-safe
        |
        = help: consider adding a `#[repr(C)]` or `#[repr(transparent)]` attribute to this struct
        = note: this struct has unspecified layout
    
    warning: `extern` fn uses type `AnyObject`, which is not FFI-safe
       --> src/lib.rs:237:33
        |
    237 |     fn bert_normalizer_new() -> AnyObject {
        |                                 ^^^^^^^^^ not FFI-safe
        |
        = help: consider adding a `#[repr(C)]` or `#[repr(transparent)]` attribute to this struct
        = note: this struct has unspecified layout
    
    warning: `tokenizers-ruby` (lib) generated 13 warnings
        Finished release [optimized] target(s) in 1m 51s
    mv target/release/libtokenizers.so lib/tokenizers/ext.so
    mv: cannot stat 'target/release/libtokenizers.so': No such file or directory
    make: *** [Makefile:3: install] Error 1
    
    make failed, exit code 2
    
    Gem files will remain installed in /home/ur5us/.rvm/gems/ruby-3.0.4/gems/tokenizers-0.1.0 for inspection.
    Results logged to /home/ur5us/.rvm/gems/ruby-3.0.4/extensions/x86_64-linux/3.0.0/tokenizers-0.1.0/gem_make.out
    
    opened by ur5us 1
  • Support to Ruby 3.2.0 (release 0.2.1)

    Hi! First of all, thanks for making and maintaining this gem! 🙌

    I was wondering when you expect to release the next version of the gem with support for Ruby 3.2.0. We're using it in a project, and it seems to be the only missing piece for us to upgrade.

    Thanks in advance!

    PS: I'm asking because I noticed that the commit with the changes has already been merged: https://github.com/ankane/tokenizers-ruby/commit/1f89484e7552b28b3e4dd9e5fccaf57d26d169f1

    opened by vickymadrid03 0
Owner
Andrew Kane