A Japanese Morphological Analyzer written in pure Rust

Overview

Yoin - A Japanese Morphological Analyzer


yoin is a Japanese morphological analysis engine written in pure Rust.

yoin embeds the mecab-ipadic dictionary.

:) $ yoin
すもももももももものうち
すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS

Build & Install

yoin is available on crates.io.

CLI

:) $ cargo install yoin

# or

:) $ git clone https://github.com/agatan/yoin
:) $ cd yoin && cargo install --path .

Library

yoin can be included in your Cargo project like this:

[dependencies]
yoin = "*"

and write your code like this:

extern crate yoin;

Usage - CLI

By default, yoin reads lines from stdin, analyzes each line, and outputs results.

:) $ yoin
すもももももももものうち
すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
そこではなしは終わりになった
そこで	接続詞,*,*,*,*,*,そこで,ソコデ,ソコデ
はなし	名詞,一般,*,*,*,*,はなし,ハナシ,ハナシ
は	助詞,係助詞,*,*,*,*,は,ハ,ワ
終わり	動詞,自立,*,*,五段・ラ行,連用形,終わる,オワリ,オワリ
に	助詞,格助詞,一般,*,*,*,に,ニ,ニ
なっ	動詞,自立,*,*,五段・ラ行,連用タ接続,なる,ナッ,ナッ
た	助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
EOS
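Each output line consists of the surface form, a tab, and nine comma-separated IPADIC feature fields: part of speech, three part-of-speech subcategories, conjugation type, conjugation form, base form, reading, and pronunciation. EOS ends each sentence. In the は line above, for example, the reading is ハ while the pronunciation is ワ. The following Rust sketch parses one such line. The Morpheme struct and parse_line function are hypothetical helpers written for illustration, not part of yoin's API, and the sketch assumes exactly nine feature fields, as in every line shown here.

```rust
/// One analyzed token: the surface form plus the nine IPADIC feature fields.
/// Hypothetical helper type for illustration; not part of yoin's API.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
struct Morpheme {
    surface: String,
    pos: String,              // 品詞 (part of speech)
    pos_detail: [String; 3],  // 品詞細分類1..3 (subcategories)
    conjugation_type: String, // 活用型
    conjugation_form: String, // 活用形
    base_form: String,        // 原形
    reading: String,          // 読み
    pronunciation: String,    // 発音
}

/// Parses one output line such as "すもも\t名詞,一般,*,...".
/// Returns None for the "EOS" marker or for a malformed line.
fn parse_line(line: &str) -> Option<Morpheme> {
    if line == "EOS" {
        return None;
    }
    // Surface form and features are separated by a single tab.
    let (surface, features) = line.split_once('\t')?;
    let f: Vec<&str> = features.split(',').collect();
    if f.len() != 9 {
        return None;
    }
    Some(Morpheme {
        surface: surface.to_string(),
        pos: f[0].to_string(),
        pos_detail: [f[1].to_string(), f[2].to_string(), f[3].to_string()],
        conjugation_type: f[4].to_string(),
        conjugation_form: f[5].to_string(),
        base_form: f[6].to_string(),
        reading: f[7].to_string(),
        pronunciation: f[8].to_string(),
    })
}

fn main() {
    let m = parse_line("は\t助詞,係助詞,*,*,*,*,は,ハ,ワ").unwrap();
    // prints "は -> 助詞 (reading ハ, pronunciation ワ)"
    println!(
        "{} -> {} (reading {}, pronunciation {})",
        m.surface, m.pos, m.reading, m.pronunciation
    );
}
```

Returning Option keeps the sketch short; a fuller parser would distinguish the EOS marker from malformed input, for instance with an enum.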

Alternatively, yoin can read its input from a file:

:) $ cat input.txt
すもももももももものうち
:) $ yoin --file input.txt
すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
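In both modes the output has the same shape: for every input line, one line per token followed by EOS. The Rust sketch below shows one way to structure such a loop; it is not yoin's actual CLI code. The run function and the analyze closure are hypothetical, with the closure standing in for yoin's tokenizer, and taking generic BufRead and Write parameters lets the same function serve stdin, a file, or an in-memory buffer.

```rust
use std::io::{self, BufRead, Write};

/// Reads `input` line by line, analyzes each line with `analyze`, and writes
/// one "surface<TAB>features" line per token, followed by "EOS".
/// `analyze` is a hypothetical stand-in for yoin's tokenizer that returns
/// (surface, features) pairs.
fn run<R, W, F>(input: R, mut out: W, analyze: F) -> io::Result<()>
where
    R: BufRead,
    W: Write,
    F: Fn(&str) -> Vec<(String, String)>,
{
    for line in input.lines() {
        let line = line?;
        for (surface, features) in analyze(&line) {
            writeln!(out, "{}\t{}", surface, features)?;
        }
        writeln!(out, "EOS")?;
    }
    out.flush()
}

fn main() -> io::Result<()> {
    // Toy analyzer for demonstration only: the whole line becomes one token
    // with placeholder features. A real analyzer would segment the line.
    let toy = |s: &str| vec![(s.to_string(), "*".to_string())];

    // stdin mode would pass io::stdin().lock(); file mode would pass
    // io::BufReader::new(std::fs::File::open(path)?). A fixed string keeps
    // this demo self-contained.
    let input = io::Cursor::new("すもも\nもも\n");
    let stdout = io::stdout();
    run(input, stdout.lock(), toy)
}
```

Keeping the analyzer behind a closure also makes the loop easy to test without loading a dictionary.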

LICENSE

This software is under the MIT License and contains the MeCab-ipadic model. See LICENSE and NOTICE.txt for more details.


Comments
  • Use `u32` directly in FST execution

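    Previously, each `u32` was converted to a `[u8; 4]` to build the FST, and at run time the bytes were pushed into a buffer one at a time and converted back into a `u32` when accepted. For better performance, this change makes the FST use `u32` values from the start.

    I expected the dictionary to grow because the FST ends up with more node states, but it only went from about 4.5MB to 4.7MB.

    Currently, alignment is not considered at all when lowering to bytecode, so there is probably room to handle this more carefully.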

    opened by agatan
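The change described above replaces a byte-oriented encoding with direct u32 values. The Rust sketch below illustrates the difference under stated assumptions and is not yoin's code: encode and ByteBuffer model the old path (split a u32 into a [u8; 4], then push bytes one at a time and rebuild the u32 once four have arrived), and read_u32_at models reading a u32 directly from bytecode. Little-endian byte order is an assumption, since the PR does not state one, and read_u32_at copies the bytes into an array first, so it works at any offset regardless of alignment.

```rust
/// Old path, step 1: split a u32 into four bytes so it can be stored as a
/// sequence of byte-sized FST symbols. Little-endian order is an assumption.
fn encode(value: u32) -> [u8; 4] {
    value.to_le_bytes()
}

/// Old path, step 2: during execution, bytes arrive one at a time and are
/// buffered until four of them can be turned back into a u32.
struct ByteBuffer {
    bytes: [u8; 4],
    len: usize,
}

impl ByteBuffer {
    fn new() -> Self {
        ByteBuffer { bytes: [0; 4], len: 0 }
    }

    /// Pushes one byte; returns Some(value) when this byte completes a u32.
    fn push(&mut self, b: u8) -> Option<u32> {
        self.bytes[self.len] = b;
        self.len += 1;
        if self.len == 4 {
            self.len = 0;
            Some(u32::from_le_bytes(self.bytes))
        } else {
            None
        }
    }
}

/// New path (illustrative): read a u32 directly from a bytecode buffer.
/// Copying into a [u8; 4] avoids any alignment requirement on `offset`.
/// Panics if fewer than four bytes remain after `offset`.
fn read_u32_at(code: &[u8], offset: usize) -> u32 {
    let mut b = [0u8; 4];
    b.copy_from_slice(&code[offset..offset + 4]);
    u32::from_le_bytes(b)
}

fn main() {
    let value = 0x1234_5678u32;

    // Old path: encode, then feed the bytes back one by one.
    let mut buf = ByteBuffer::new();
    let mut decoded = None;
    for b in encode(value) {
        decoded = buf.push(b);
    }
    assert_eq!(decoded, Some(value));

    // New path: the value sits at an odd (unaligned) offset in the bytecode.
    let mut code = vec![0xFFu8];
    code.extend_from_slice(&value.to_le_bytes());
    assert_eq!(read_u32_at(&code, 1), value);
    println!("both paths decode {:#x}", value);
}
```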
Owner
Agata Naomichi
Software Engineer at B&W, Inc.