lingua-rs Python binding. An accurate natural language detection library, suitable for long and short text alike.

Overview

lingua-py

Installation

pip install linguars

Usage

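import linguars

# Build a detector once and reuse it for every call.
detector = linguars.LanguageDetector()

# Sample input: a Chinese sentence ("There is a traitor among us").
text = '我们中出了一个叛徒'
print(detector.detect(text))      # most likely language for the text
print(detector.confidence(text))  # confidence values for the candidate languages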
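
When several texts need to be classified, it may help to create the detector once and reuse it. The following is a minimal sketch under stated assumptions: `detect_many` and the `Detector` protocol are hypothetical names introduced here for illustration and are not part of linguars. The sketch relies only on the `detect` and `confidence` methods shown above and makes no assumption about what they return.

```python
from typing import Any, Dict, Iterable, Protocol


class Detector(Protocol):
    """The two methods used in the Usage snippet; return types left open."""

    def detect(self, text: str) -> Any: ...

    def confidence(self, text: str) -> Any: ...


def detect_many(detector: Detector, texts: Iterable[str]) -> Dict[str, Any]:
    """Run detector.detect() on each non-blank text, reusing one detector.

    detect_many is a hypothetical helper, not part of linguars.
    """
    results: Dict[str, Any] = {}
    for text in texts:
        if not text.strip():
            continue  # blank input carries no signal, so skip it
        results[text] = detector.detect(text)
    return results
```

With the package installed, a call would presumably look like `detect_many(linguars.LanguageDetector(), ['我们中出了一个叛徒', 'Hello, world'])`. Taking the detector as a parameter keeps the helper testable with a stand-in object.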

License

This work is released under the MIT license. A copy of the license is provided in the LICENSE file.

Comments
  • Bump lingua from 1.3.2 to 1.3.3

    Bumps lingua from 1.3.2 to 1.3.3.

    Changelog

    Sourced from lingua's changelog.

    Lingua 1.3.3 (released on 22 Feb 2022)

    Bug Fixes

    • This release updates outdated dependencies and fixes an incompatibility between different versions of the include_dir crate which are used in the main lingua crate and the language model crates.
    Commits
    • 21fde52 Prepare release 1.3.3
    • 6bfc067 Bump indoc from 1.0.3 to 1.0.4 (#40)
    • a70828d Update dependency 'include_dir' in lingua crate
    • 98ca845 Remove unnecessary model files for Chinese, Japanese and Korean
    • 448a599 Update dependency 'include_dir' in language model crates
    • de6ef3f Bump serde_json from 1.0.78 to 1.0.79 (#39)
    • 52642ba Bump fraction from 0.9.0 to 0.10.0 (#38)
    • 8e51dd7 Bump serde from 1.0.135 to 1.0.136 (#37)
    • 0358794 Bump serde_json from 1.0.75 to 1.0.78 (#35)
    • aa70cec Bump serde from 1.0.134 to 1.0.135 (#36)
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

    Dependabot will merge this PR once CI passes on it, as requested by @messense.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies rust 
    opened by dependabot[bot] 1
  • Bump regex from 1.5.4 to 1.7.0

    Bumps regex from 1.5.4 to 1.7.0.

    Changelog

    Sourced from regex's changelog.

    1.7.0 (2022-11-05)

    This release principally includes an upgrade to Unicode 15.

    1.6.0 (2022-07-05)

    This release principally includes an upgrade to Unicode 14.

    1.5.6 (2022-05-20)

    This release includes a few bug fixes, including a bug that produced incorrect matches when a non-greedy ? operator was used.

    1.5.5 (2022-03-08)

    This release fixes a security bug in the regex compiler. This bug permits a vector for a denial-of-service attack in cases where the regex being compiled is untrusted. There are no known problems where the regex is itself trusted, including in cases of untrusted haystacks.

    ... (truncated)

    dependencies rust 
    opened by dependabot[bot] 0
  • Bump pyo3 from 0.15.1 to 0.15.2

    Bumps pyo3 from 0.15.1 to 0.15.2.

    Release notes

    Sourced from pyo3's releases.

    PyO3 0.15.2

    This release is a backport of PyO3 0.16's support for PyPy 3.9.

    Thanks to @mejrs and @messense for the implementation work, and to @alex for testing it to build the cryptography package.

    Changelog

    Sourced from pyo3's changelog.

    [0.15.2] - 2022-04-14

    Packaging

    • Backport of PyPy 3.9 support from PyO3 0.16. #2262
    dependencies rust 
    opened by dependabot[bot] 0
  • Bump lingua from 1.3.3 to 1.4.0

    Bumps lingua from 1.3.3 to 1.4.0.

    Changelog

    Sourced from lingua's changelog.

    Lingua 1.4.0 (released on 08 Apr 2022)

    Features

    • The library can now be compiled to WebAssembly and be used in any JavaScript project. Big thanks to @martindisch for bringing this forward. (#14)

    Improvements

    • Some minor performance tweaks have been applied to the rule engine.
    dependencies rust 
    opened by dependabot[bot] 0
  • Bump pyo3 from 0.15.0 to 0.15.1

    Bumps pyo3 from 0.15.0 to 0.15.1.

    Release notes

    Sourced from pyo3's releases.

    PyO3 0.15.1

    This release is a set of bug fixes for some minor issues reported since PyO3 0.15's release. There are also some small additions for those storing PyIterator, PySequence, and PyMapping in Py smart pointers, and a PyTraceback type to ease interacting with Python tracebacks from Rust.

    For full details of all changes, see the CHANGELOG.

    Thank you to everyone who contributed code, documentation, design ideas, bug reports, and feedback. The following users' commits are included in this release:

    @dansvo @davidhewitt @KRunchPL @mejrs @messense @moriyoshi @saidvandeklundert @taiki-e

    Changelog

    Sourced from pyo3's changelog.

    [0.15.1] - 2021-11-19

    Added

    • Add implementations for Py::as_ref() and Py::into_ref() for Py<PySequence>, Py<PyIterator> and Py<PyMapping>. #1682
    • Add PyTraceback type to represent and format Python tracebacks. #1977

    Changed

    • #[classattr] constants with a known magic method name (which is lowercase) no longer trigger lint warnings expecting constants to be uppercase. #1969

    Fixed

    • Fix creating #[classattr] by functions with the name of a known magic method. #1969
    • Fix use of catch_unwind in allow_threads which can cause fatal crashes. #1989
    • Fix build failure on PyPy when abi3 features are activated. #1991
    • Fix mingw platform detection. #1993
    • Fix panic in __get__ implementation when accessing descriptor on type object. #1997
    Commits
    • eb5059a release: 0.15.1
    • 2f6ea2f Merge pull request #1999 from dansvo/guide-link-repair
    • d9a3f67 Fix broken relative markdown link in guide
    • e790d55 Merge pull request #1997 from davidhewitt/get-panic
    • 26ccc1a macros: fix panic in __get__ implementation
    • 45059cb Merge pull request #1990 from davidhewitt/allow-threads-unwind
    • 1df68e8 allow_threads: switch from catch_unwind to guard pattern
    • 3e16a2a Merge pull request #1995 from gertjanvanzwieten/fix-pycounter
    • 8e41483 Merge pull request #1991 from messense/pypy-abi3
    • 9ae7e31 Merge pull request #1977 from davidhewitt/traceback-type
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
  • Bump pyo3 from 0.14.5 to 0.15.0

    Bumps pyo3 from 0.14.5 to 0.15.0.

    Release notes

    Sourced from pyo3's releases.

    PyO3 0.15.0

    This release of PyO3 brings support for Python 3.10 and PyPy 3.8. In addition, new optional dependencies on anyhow and eyre have been added for easy integration of the popular error-handling libraries with Python code.

    A number of consistency improvements have been made to PyList, PyTuple and PySequence APIs. They now all exclusively use usize-based indexing, and now also support Rust's indexing operator.

    In this release #[pymethods] are now able to implement many magic methods such as __str__ and __repr__, removing the need for #[pyproto] macro implementations. For the 0.15 release series both #[pymethods] and #[pyproto] will be supported; #[pyproto] is expected to be deprecated in the future.

    For full details of all changes, see the CHANGELOG. For help with upgrading, see the migration guide.

    Thank you to everyone who contributed code, documentation, design ideas, bug reports, and feedback.

    Changelog

    Sourced from pyo3's changelog.

    [0.15.0] - 2021-11-03

    Packaging

    • pyo3's Cargo.toml now advertises links = "python" to inform Cargo that it links against libpython. #1819
    • Added optional anyhow feature to convert anyhow::Error into PyErr. #1822
    • Support Python 3.10. #1889
    • Added optional eyre feature to convert eyre::Report into PyErr. #1893
    • Support PyPy 3.8. #1948

    Added

    • Add PyList::get_item_unchecked and PyTuple::get_item_unchecked to get items without bounds checks. #1733
    • Support #[doc = include_str!(...)] attributes on Rust 1.54 and up. #1746
    • Add PyAny::py as a convenience for PyNativeType::py. #1751
    • Add implementation of std::ops::Index<usize> for PyList, PyTuple and PySequence. #1825
    • Add range indexing implementations of std::ops::Index for PyList, PyTuple and PySequence. #1829
    • Add PyMapping type to represent the Python mapping protocol. #1844
    • Add commonly-used sequence methods to PyList and PyTuple. #1849
    • Add as_sequence methods to PyList and PyTuple. #1860
    • Add support for magic methods in #[pymethods], intended as a replacement for #[pyproto]. #1864
    • Add abi3-py310 feature. #1889
    • Add PyCFunction::new_closure to create a Python function from a Rust closure. #1901
    • Add support for positional-only arguments in #[pyfunction]. #1925
    • Add PyErr::take to attempt to fetch a Python exception if present. #1957

    Changed

    • PyList, PyTuple and PySequence's APIs now accept only usize indices instead of isize. #1733, #1802, #1803
    • PyList::get_item and PyTuple::get_item now return PyResult<&PyAny> instead of panicking. #1733
    • PySequence::in_place_repeat and PySequence::in_place_concat now return PyResult<&PySequence> instead of PyResult<()>, which is needed in case of immutable sequences such as tuples. #1803
    • PySequence::get_slice now returns PyResult<&PySequence> instead of PyResult<&PyAny>. #1829
    • Deprecate PyTuple::split_from. #1804
    • Deprecate PyTuple::slice, new method PyTuple::get_slice added with usize indices. #1828
    • Deprecate FFI definitions PyParser_SimpleParseStringFlags, PyParser_SimpleParseStringFlagsFilename, PyParser_SimpleParseFileFlags when building for Python 3.9. #1830
    • Mark FFI definitions removed in Python 3.10 PyParser_ASTFromString, PyParser_ASTFromStringObject, PyParser_ASTFromFile, PyParser_ASTFromFileObject, PyParser_SimpleParseStringFlags, PyParser_SimpleParseStringFlagsFilename, PyParser_SimpleParseFileFlags, PyParser_SimpleParseString, PyParser_SimpleParseFile, Py_SymtableString, and Py_SymtableStringObject. #1830
    • #[pymethods] now handles magic methods similarly to #[pyproto]. In the future, #[pyproto] may be deprecated. #1864
    • Deprecate FFI definitions PySys_AddWarnOption, PySys_AddWarnOptionUnicode and PySys_HasWarnOptions. #1887
    • Deprecate #[call] attribute in favor of using fn __call__. #1929
    • Fix missing FFI definition _PyImport_FindExtensionObject on Python 3.10. #1942
    • Change PyErr::fetch to panic in debug mode if no exception is present. #1957

    Fixed

    • Fix building with a conda environment on Windows. #1873
    • Fix panic on Python 3.6 when calling Python::with_gil with Python initialized but threading not initialized. #1874
    • Fix incorrect linking to version-specific DLL instead of python3.dll when cross-compiling to Windows with abi3. #1880
    • Fix FFI definition for PyTuple_ClearFreeList incorrectly being present for Python 3.9 and up. #1887

    ... (truncated)

    Commits
    • 4774744 release: 0.15.0
    • 64df791 Merge pull request #1964 from PyO3/pymethods-args
    • 9ce363a guide: add hints for the signature of pymethods protos
    • 39d2b9d Merge pull request #1957 from davidhewitt/fetch-if-set
    • f801c19 err: add PyErr::take
    • 7b9ae8e Clean up Python documentation (#1963)
    • 0f92f28 Merge pull request #1958 from davidhewitt/pymethods-protos-arguments-cleanup
    • 6a3e1e7 macros: clean up protocol argument extraction a bit
    • bfe7086 Merge pull request #1954 from PyO3/feature-fix
    • 50df2c7 Merge pull request #1955 from PyO3/cargo-toml-deps
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
Owner
messense
Python Backend Developer at day, Rustacean at night.
Natural language detection library for Rust. Try demo online: https://www.greyblake.com/whatlang/

Whatlang Natural language detection for Rust with focus on simplicity and performance. Content Features Get started Documentation Supported languages

Sergey Potapov 805 Dec 28, 2022
A fast, low-resource Natural Language Processing and Text Correction library written in Rust.

nlprule A fast, low-resource Natural Language Processing and Error Correction library written in Rust. nlprule implements a rule- and lookup-based app

Benjamin Minixhofer 496 Jan 8, 2023
Ultra-fast, spookily accurate text summarizer that works on any language

pithy 0.1.0 - an absurdly fast, strangely accurate, summariser Quick example: pithy -f your_file_here.txt --sentences 4 --help: Print this help messa

Catherine Koshka 13 Oct 31, 2022
Semantic text segmentation. For sentence boundary detection, compound splitting and more.

NNSplit A tool to split text using a neural network. The main application is sentence boundary detection, but e. g. compound splitting for German is a

Benjamin Minixhofer 273 Dec 29, 2022
WriteForAll is a text file style checker, that compares text documents with editorial tips to make text better.

WriteForAll: tips to make text better WriteForAll is a text file style checker, that compares text documents with editorial tips to make text better.

Joel Parker Henderson 2 Dec 27, 2022
Rust-nlp is a library to use Natural Language Processing algorithm with RUST

nlp Rust-nlp Implemented algorithm Distance Levenshtein (Explanation) Jaro / Jaro-Winkler (Explanation) Phonetics Soundex (Explanation) Metaphone (Exp

Simon Paitrault 34 Dec 20, 2022
frawk is a small programming language for writing short programs processing textual data

frawk frawk is a small programming language for writing short programs processing textual data. To a first approximation, it is an implementation of t

Eli Rosenthal 1k Jan 7, 2023
Natural Language Processing for Rust

rs-natural Natural language processing library written in Rust. Still very much a work in progress. Basically an experiment, but hey maybe something c

Chris Tramel 211 Dec 28, 2022
A HDPSG-inspired symbolic natural language parser written in Rust

Treebender A symbolic natural language parsing library for Rust, inspired by HDPSG. What is this? This is a library for parsing natural or constructed

Theia Vogel 32 Dec 26, 2022
Text Expression Runner – Readable and easy to use text expressions

ter - Text Expression Runner ter is a cli to run text expressions and perform basic text operations such as filtering, ignoring and replacing on the c

Maximilian Schulke 72 Jul 31, 2022
fastText Rust binding

fasttext-rs fastText Rust binding Installation Add it to your Cargo.toml: [dependencies] fasttext = "0.6" Add extern crate fasttext to your crate root

messense 42 Oct 1, 2022
An efficient and powerful Rust library for word wrapping text.

Textwrap Textwrap is a library for wrapping and indenting text. It is most often used by command-line programs to format dynamic output nicely so it l

Martin Geisler 322 Dec 26, 2022
Text calculator with support for units and conversion

cpc calculation + conversion cpc parses and evaluates strings of math, with support for units and conversion. 128-bit decimal floating points are used

Kasper 82 Jan 4, 2023
Find and replace text in source files

Ruplacer Find and replace text in source files: $ ruplacer old new src/ Patching src/a_dir/sub/foo.txt -- old is everywhere, old is old ++ new is ever

Tanker 331 Dec 28, 2022
Source text parsing, lexing, and AST related functionality for Deno

Source text parsing, lexing, and AST related functionality for Deno.

Deno Land 90 Jan 1, 2023
Font independent text analysis support for shaping and layout.

lipi Lipi (Sanskrit for 'writing, letters, alphabet') is a pure Rust crate that provides font independent text analysis support for shaping and layout

Chad Brokaw 12 Sep 22, 2022
🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto.

🛥 python-vaporetto 🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto. Installation

null 17 Dec 22, 2022
Quickner is a new tool to quickly annotate texts for NER (Named Entity Recognition). It is written in Rust and accessible through a Python API.

Quickner ⚡ A simple, fast, and easy to use NER annotator for Python Quickner is a new tool to quickly annotate texts for NER (Named Entity Recognition

Omar MHAIMDAT 7 Mar 3, 2023
bottom encodes UTF-8 text into a sequence comprised of bottom emoji

bottom encodes UTF-8 text into a sequence comprised of bottom emoji (with , sprinkled in for good measure) followed by 👉👈. It can encode any valid UTF-8 - being a bottom transcends language, after all - and decode back into UTF-8.

Bottom Software Foundation 345 Dec 30, 2022