Hello!
Thanks for making this gem.
However, installation fails in my environment when I run:
gem install tokenizers
I get the following error message:
Building native extensions. This could take a while...
ERROR: Error installing tokenizers:
ERROR: Failed to build gem native extension.
current directory: /home/kojix2/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/tokenizers-0.1.1/ext/tokenizers
/home/kojix2/.rbenv/versions/3.1.2/bin/ruby -I /home/kojix2/.rbenv/versions/3.1.2/lib/ruby/3.1.0 -r ./siteconf20220909-19701-2a0rv0.rb extconf.rb
current directory: /home/kojix2/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/tokenizers-0.1.1/ext/tokenizers
make DESTDIR\= clean
make: Nothing to be done for 'clean'.
current directory: /home/kojix2/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/tokenizers-0.1.1/ext/tokenizers
make DESTDIR\=
cargo build --release --target-dir target
Compiling libc v0.2.121
Compiling cfg-if v1.0.0
Compiling autocfg v1.1.0
Compiling cc v1.0.73
Compiling pkg-config v0.3.24
Compiling proc-macro2 v1.0.36
Compiling unicode-xid v0.2.2
Compiling syn v1.0.89
Compiling memchr v2.3.4
Compiling lazy_static v1.4.0
Compiling log v0.4.14
Compiling version_check v0.9.4
Compiling pin-project-lite v0.2.8
Compiling bitflags v1.3.2
Compiling bytes v1.1.0
Compiling futures-core v0.3.21
Compiling once_cell v1.10.0
Compiling itoa v1.0.1
Compiling futures-task v0.3.21
Compiling typenum v1.15.0
Compiling crossbeam-utils v0.8.8
Compiling serde_derive v1.0.136
Compiling serde v1.0.136
Compiling foreign-types-shared v0.1.1
Compiling fnv v1.0.7
Compiling futures-util v0.3.21
Compiling openssl v0.10.38
Compiling ryu v1.0.9
Compiling pin-utils v0.1.0
Compiling unicode-width v0.1.9
Compiling hashbrown v0.11.2
Compiling native-tls v0.2.8
Compiling futures-io v0.3.21
Compiling slab v0.4.5
Compiling futures-channel v0.3.21
Compiling futures-sink v0.3.21
Compiling tinyvec_macros v0.1.0
Compiling matches v0.1.9
Compiling httparse v1.6.0
Compiling crc32fast v1.3.2
Compiling radium v0.5.3
Compiling percent-encoding v2.1.0
Compiling adler v1.0.2
Compiling strsim v0.9.3
Compiling getrandom v0.1.16
Compiling try-lock v0.2.3
Compiling ident_case v1.0.1
Compiling scopeguard v1.1.0
Compiling openssl-probe v0.1.5
Compiling ppv-lite86 v0.2.16
Compiling regex-syntax v0.6.25
Compiling rayon-core v1.9.1
Compiling either v1.6.1
Compiling lexical-core v0.7.6
Compiling httpdate v1.0.2
Compiling encoding_rs v0.8.30
Compiling tower-service v0.3.1
Compiling unicode-bidi v0.3.7
Compiling static_assertions v1.1.0
Compiling wyz v0.2.0
Compiling tap v1.0.1
Compiling serde_json v1.0.79
Compiling funty v1.1.0
Compiling byteorder v1.4.3
Compiling arrayvec v0.5.2
Compiling cpufeatures v0.2.2
Compiling derive_builder v0.9.0
Compiling ipnet v2.4.0
Compiling fastrand v1.7.0
Compiling remove_dir_all v0.5.3
Compiling mime v0.3.16
Compiling number_prefix v0.4.0
Compiling base64 v0.13.0
Compiling unicode-segmentation v1.9.0
Compiling glob v0.3.0
Compiling base64 v0.12.3
Compiling number_prefix v0.3.0
Compiling macro_rules_attribute-proc_macro v0.0.2
Compiling vec_map v0.8.2
Compiling strsim v0.8.0
Compiling rutie v0.8.4
Compiling ansi_term v0.12.1
Compiling smallvec v1.8.0
Compiling unicode_categories v0.1.1
Compiling paste v1.0.6
Compiling tracing-core v0.1.23
Compiling memoffset v0.6.5
Compiling indexmap v1.8.0
Compiling miniz_oxide v0.4.4
Compiling crossbeam-epoch v0.9.8
Compiling rayon v1.5.1
Compiling generic-array v0.14.5
Compiling nom v6.2.1
Compiling foreign-types v0.3.2
Compiling http v0.2.6
Compiling textwrap v0.11.0
Compiling tinyvec v1.5.1
Compiling openssl-sys v0.9.72
Compiling bzip2-sys v0.1.11+1.0.8
Compiling onig_sys v69.7.1
Compiling esaxx-rs v0.1.7
Compiling form_urlencoded v1.0.1
Compiling itertools v0.8.2
Compiling itertools v0.9.0
Compiling macro_rules_attribute v0.0.2
Compiling unicode-normalization-alignments v0.1.12
Compiling tracing v0.1.32
Compiling unicode-normalization v0.1.19
Compiling aho-corasick v0.7.15
Compiling num_cpus v1.13.1
Compiling socket2 v0.4.4
Compiling getrandom v0.2.5
Compiling terminal_size v0.1.17
Compiling time v0.1.43
Compiling filetime v0.2.15
Compiling xattr v0.2.2
Compiling fs2 v0.4.3
Compiling atty v0.2.14
Compiling tempfile v3.3.0
Compiling dirs-sys v0.3.7
Compiling http-body v0.4.4
Compiling mio v0.8.2
Compiling want v0.3.0
Compiling quote v1.0.16
Compiling crossbeam-channel v0.5.4
Compiling bitvec v0.19.6
Compiling regex v1.4.6
Compiling idna v0.2.3
Compiling rand_core v0.6.3
Compiling rand_core v0.5.1
Compiling tar v0.4.38
Compiling clap v2.34.0
Compiling dirs v3.0.2
Compiling tokio v1.17.0
Compiling flate2 v1.0.22
Compiling block-buffer v0.10.2
Compiling crypto-common v0.1.3
Compiling url v2.2.2
Compiling rand_chacha v0.3.1
Compiling rand_chacha v0.2.2
Compiling console v0.15.0
Compiling bzip2 v0.4.3
Compiling crossbeam-deque v0.8.1
Compiling digest v0.10.3
Compiling rand v0.8.5
Compiling rand v0.7.3
Compiling tokio-util v0.6.9
Compiling indicatif v0.16.2
Compiling indicatif v0.15.0
Compiling darling_core v0.10.2
Compiling onig v6.3.1
Compiling sha2 v0.10.2
Compiling tokio-native-tls v0.3.0
Compiling h2 v0.3.12
Compiling thiserror-impl v1.0.30
Compiling darling_macro v0.10.2
Compiling darling v0.10.2
Compiling derive_builder_core v0.9.0
Compiling thiserror v1.0.30
Compiling zip v0.5.13
Compiling zip-extensions v0.6.1
Compiling rayon-cond v0.1.0
Compiling hyper v0.14.17
Compiling serde_urlencoded v0.7.1
Compiling spm_precompiled v0.1.3
Compiling hyper-tls v0.5.0
Compiling reqwest v0.11.10
Compiling cached-path v0.5.3
Compiling tokenizers v0.11.3
Compiling tokenizers-ruby v0.1.0 (/home/kojix2/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/tokenizers-0.1.1)
Finished release [optimized] target(s) in 1m 22s
mv target/release/libtokenizers.so ../../lib/tokenizers/ext.so
current directory: /home/kojix2/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/tokenizers-0.1.1/ext/tokenizers
make DESTDIR\= install
cargo build --release --target-dir target
Finished release [optimized] target(s) in 0.09s
mv target/release/libtokenizers.so ../../lib/tokenizers/ext.so
mv: 'target/release/libtokenizers.so' and '../../lib/tokenizers/ext.so' are the same file
make: *** [Makefile:3: install] Error 1
make install failed, exit code 2
Gem files will remain installed in /home/kojix2/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/tokenizers-0.1.1 for inspection.
Results logged to /home/kojix2/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/extensions/x86_64-linux/3.1.0/tokenizers-0.1.1/gem_make.out
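The failing step is the second `mv`: by the time `make install` runs, the source and destination already refer to the same file, so `mv` exits with an error. My guess (an assumption, I have not confirmed it) is that the two paths end up as hard links to the same file after the first build. A minimal Ruby sketch of a guard that would tolerate this case is below. `install_extension` is a hypothetical helper for illustration, not part of the gem or its Makefile.

```ruby
require "fileutils"

# Hypothetical helper (not part of the tokenizers gem): copy the built
# library into place unless the destination is already the same file,
# for example a hard link to the source.
def install_extension(src, dst)
  # File.identical? is true only when both paths refer to the same file;
  # it returns false when dst does not exist yet.
  return :skipped if File.identical?(src, dst)

  FileUtils.mkdir_p(File.dirname(dst))
  FileUtils.cp(src, dst)
  :copied
end

# Example call, using the paths from the log above:
# install_extension("target/release/libtokenizers.so", "../../lib/tokenizers/ext.so")
```

Copying instead of moving also means a repeated run finds the source still in place, so the step can be run twice without failing.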
However, I was able to build and try it from source using the development setup:
git clone https://github.com/ankane/tokenizers-ruby.git
cd tokenizers-ruby
bundle install
bundle exec ruby ext/tokenizers/extconf.rb && make
bundle exec rake download:files
bundle exec rake test
I tried GPT-2 with onnxruntime, and it works just fine!
require "tokenizers"
require "onnxruntime"
require "numo/narray"
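# Load the pretrained GPT-2 tokenizer and a local GPT-2 ONNX model file
tokenizer = Tokenizers.from_pretrained("gpt2")
model = OnnxRuntime::Model.new("gpt2-lm-head-10.onnx")
s = "Why do cats want to ride on the keyboard?"
ids = tokenizer.encode(s).ids
# Greedy decoding: append the highest-scoring next token, 10 times
10.times do
  o = model.predict({ input1: [[ids]] })
  o = Numo::DFloat.cast(o["output1"][0])
  # Scores at the last position; argmax gives the next token id
  ids << o[true, -1, true].argmax
end
puts tokenizer.decode(ids)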
:cat2:
:keyboard:
:question:
Why do cats want to ride on the keyboard?
The answer is that they do.
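For anyone reading the loop above, the `o[true, -1, true].argmax` step is a greedy choice: take the scores at the last position and pick the index with the highest value. Here is a small sketch of the same idea on plain Ruby arrays, with no Numo needed. The `[1][sequence_length][vocab_size]` shape is my assumption based on the indexing above, and `greedy_next_id` and `toy_logits` are made-up names for illustration.

```ruby
# Greedy next-token choice on plain Ruby arrays.
# `logits` is assumed to be shaped [1][sequence_length][vocab_size].
def greedy_next_id(logits)
  last_position = logits[0][-1] # scores for the final position only
  # Pair each score with its index, keep the pair with the highest score,
  # and return its index (the token id).
  last_position.each_with_index.max_by { |score, _index| score }[1]
end

# Toy input: one sequence, two positions, a vocabulary of three tokens.
toy_logits = [[[0.1, 0.7, 0.2], [0.3, 0.1, 0.9]]]
next_id = greedy_next_id(toy_logits)
puts next_id
```

Only the last row matters here, because the earlier positions score tokens that are already in `ids`.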