ik-analyzer for rust; chinese tokenizer for tantivy

Overview

Crates.io License Open Source Love Build Status

GitHub forks GitHub stars

ik-rs

ik-analyzer for Rust

support Tantivy

Usage

Chinese Segment

    let mut ik = IKSegmenter::new();
    let text = "中华人民共和国";
    let tokens = ik.tokenize(text, TokenMode::INDEX); // TokenMode::SEARCH
    for token in tokens {
        println!("{:?}", token);
    }

Usage for Tantivy

mod tests {
    use ik_rs::core::ik_segmenter::TokenMode;
    use ik_rs::IkTokenizer;
    use tantivy::Index;
    use tantivy::schema::{IndexRecordOption, Schema, TextFieldIndexing, TextOptions};

    #[test]
    fn it_works() {
        let mut schema_builder = Schema::builder();
        let text_field_indexing = TextFieldIndexing::default()
            .set_tokenizer("ik-index")
            .set_index_option(IndexRecordOption::WithFreqsAndPositions);
        let text_options = TextOptions::default()
            .set_indexing_options(text_field_indexing)
            .set_stored();
        schema_builder.add_text_field("title", text_options);
        let schema = schema_builder.build();
        let index = Index::create_in_ram(schema.clone());
        index
            .tokenizers()
            .register("ik-index", IkTokenizer::new(TokenMode::INDEX));
        index
            .tokenizers()
            .register("ik-search", IkTokenizer::new(TokenMode::SEARCH));
    }
}

Welcome rust developer and search engine developer join us, and maintain this project together!

you can PR or submit issue...

and star ⭐️ or fork this project to support me!

You might also like...
Strongly typed Elasticsearch DSL written in Rust

Strongly typed Elasticsearch DSL written in Rust This is an unofficial library and doesn't yet support all the DSL, it's still work in progress. Featu

Searching for plain-text files for lines that match a given string. Built with Rust.

Getting Started This is a minimal grep command-line utility built on Rust. It provides searching for plain-text files for lines that match a given str

A Rust API search engine

Roogle Roogle is a Rust API search engine, which allows you to search functions by names and type signatures. Progress Available Queries Function quer

Message authentication code algorithms written in pure Rust

RustCrypto: Message Authentication Codes Collection of Message Authentication Code (MAC) algorithms written in pure Rust. Supported Algorithms Algorit

Experimenting with Rust implementations of IP address lookup algorithms.

This repository contains some very rough experimentation with Rust implementations of IP address lookup algorithms, both my own and comparison with th

An example of web application by using Rust and Axum with Clean Architecture.

stock-metrics Stock price and stats viewer. Getting Started Middleware Launch the middleware by executing docker compose: cd local-middleware docker c

2 and 3-dimensional collision detection library in Rust.

2D Documentation | 3D Documentation | User Guide | Forum ⚠️ **This crate is now passively-maintained. It is being superseded by the Parry project.** ⚠

Shogun search - Learning the principle of search engine. This is the first time I've written Rust.

shogun_search Learning the principle of search engine. This is the first time I've written Rust. A search engine written in Rust. Current Features: Bu

Image search example by approximate nearest-neighbor library In Rust

rust-ann-search-example Image search example by approximate nearest-neighbor library In Rust use - tensorflow 0.17.0 - pretrain ResNet50 - hora (Ru

Releases(v0.3.1)
Owner
Shen Yanchao
Shen Yanchao
Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust

Tantivy is a full text search engine library written in Rust. It is closer to Apache Lucene than to Elasticsearch or Apache Solr in the sense it is no

tantivy 7.4k Dec 28, 2022
Tantivy is a full text search engine library written in Rust.

Tantivy is a full text search engine library written in Rust. It is closer to Apache Lucene than to Elasticsearch or Apache Solr in the sense it is no

Quickwit OSS 7.4k Dec 30, 2022
Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust

Tantivy is a full-text search engine library written in Rust. It is closer to Apache Lucene than to Elasticsearch or Apache Solr in the sense it is no

Quickwit OSS 7.5k Jan 9, 2023
⚡ Insanely fast, 🌟 Feature-rich searching. lnx is the adaptable deployment of the tantivy search engine you never knew you wanted. Standing on the shoulders of giants.

✨ Feature Rich | ⚡ Insanely Fast An ultra-fast, adaptable deployment of the tantivy search engine via REST. ?? Standing On The Shoulders of Giants lnx

lnx 679 Jan 1, 2023
⚡ Insanely fast, 🌟 Feature-rich searching. lnx is the adaptable deployment of the tantivy search engine you never knew you wanted. Standing on the shoulders of giants.

✨ Feature Rich | ⚡ Insanely Fast An ultra-fast, adaptable deployment of the tantivy search engine via REST. ?? Standing On The Shoulders of Giants lnx

lnx 0 Apr 25, 2022
A full-text search and indexing server written in Rust.

Bayard Bayard is a full-text search and indexing server written in Rust built on top of Tantivy that implements Raft Consensus Algorithm and gRPC. Ach

Bayard Search 1.8k Dec 26, 2022
AI-powered search engine for Rust

txtai: AI-powered search engine for Rust txtai executes machine-learning workflows to transform data and build AI-powered text indices to perform simi

NeuML 69 Jan 2, 2023
A full-text search engine in rust

Toshi A Full-Text Search Engine in Rust Please note that this is far from production ready, also Toshi is still under active development, I'm just slo

Toshi Search 3.8k Jan 7, 2023
Official Elasticsearch Rust Client

elasticsearch   Official Rust Client for Elasticsearch. Full documentation is available at https://docs.rs/elasticsearch The project is still very muc

elastic 574 Jan 5, 2023
🐎 Daac Horse: Double-Array Aho-Corasick in Rust

?? daachorse Daac Horse: Double-Array Aho-Corasick Overview A fast implementation of the Aho-Corasick algorithm using Double-Array Trie. Examples use

null 140 Dec 16, 2022