⚡ Insanely fast, 🌟 Feature-rich searching. lnx is the adaptable deployment of the tantivy search engine you never knew you wanted. Standing on the shoulders of giants.

Overview

lnx Logo

✨ Feature Rich | ⚡ Insanely Fast

An ultra-fast, adaptable deployment of the tantivy search engine via REST.

🌟 Standing On The Shoulders of Giants

lnx is built not to re-invent the wheel: it stands on top of the tokio-rs work-stealing runtime, axum (a lightweight abstraction over hyper-rs), and the raw compute power of the tantivy search engine.

Together these allow lnx to offer millisecond indexing on tens of thousands of document inserts at once (no more waiting around for things to get indexed!), per-index transactions, and the ability to process searches like just another hashtable lookup 😲

✨ Features

Although lnx is very new, it offers a wide range of features thanks to the ecosystem it stands on.

  • 🤓 Complex Query Parser.
  • ❤️ Typo-tolerant fuzzy queries.
  • ⚡️ Typo-tolerant fast-fuzzy queries (pre-computed spell correction).
  • 🔥 More-Like-This queries.
  • Order by fields.
  • Fast indexing.
  • Fast searching.
  • Several options for fine-grained performance tuning.
  • Multiple storage backends available for testing and developing.
  • Permission-based authorization access tokens.

Demo video

Performance

lnx lets you fine-tune the system to your particular use case. You can customise the async runtime threads, the concurrency thread pool, and the reader and writer threads, all on a per-index basis.

This gives you the ability to control in detail where your computing resources are going. Got a large dataset but a lower number of concurrent reads? Bump the reader threads in exchange for a lower max concurrency.
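
As a rough illustration, here is a minimal sketch of the per-index tuning fields, using the field names (and example values) that appear in the index payloads quoted in the issue reports further down this page; treat the numbers as placeholders rather than recommendations:

    {
        "name": "my-index",
        "writer_buffer": 144000000,
        "writer_threads": 6,
        "reader_threads": 6,
        "max_concurrency": 12
    }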

This allows you to get some very nice results and tune your application to your needs.

As a more detailed insight:

MeiliSearch

 INFO  lnxcli > starting benchmark system
 INFO  benchmark > starting runtime with 12 threads
 INFO  benchmark::meilisearch > MeiliSearch took 18.188s to process submitted documents
 INFO  benchmark              > Service ready! Beginning benchmark.
 INFO  benchmark              >      Concurrency @ 150 clients
 INFO  benchmark              >      Searching @ 50 sentences
 INFO  benchmark              >      Mode @ Standard
 INFO  benchmark::sampler     > General benchmark results:
 INFO  benchmark::sampler     >      Total Requests Sent: 7500
 INFO  benchmark::sampler     >      Average Requests/sec: 296.65
 INFO  benchmark::sampler     >      Average Latency: 505.654336ms
 INFO  benchmark::sampler     >      Max Latency: 725.2446ms
 INFO  benchmark::sampler     >      Min Latency: 10.085ms
 INFO  lnxcli                 > commands complete!

lnx (default fuzzy search)

 INFO  lnxcli > starting benchmark system
 INFO  benchmark > starting runtime with 12 threads
 INFO  benchmark::lnx > lnx took 785.402ms to process submitted documents
 INFO  benchmark      > Service ready! Beginning benchmark.
 INFO  benchmark      >      Concurrency @ 150 clients
 INFO  benchmark      >      Searching @ 50 sentences
 INFO  benchmark      >      Mode @ Standard
 INFO  benchmark::sampler > General benchmark results:
 INFO  benchmark::sampler >      Total Requests Sent: 7500
 INFO  benchmark::sampler >      Average Requests/sec: 914.84
 INFO  benchmark::sampler >      Average Latency: 163.962587ms
 INFO  benchmark::sampler >      Max Latency: 668.0729ms
 INFO  benchmark::sampler >      Min Latency: 2.5241ms
 INFO  lnxcli             > commands complete!

💔 Limitations

As much as lnx provides a wide range of features, it cannot do it all, being such a young system. Naturally, it has some limitations:

  • lnx is not distributed (yet), so it really does just scale vertically.
  • Simple but not too simple: lnx can't offer the same level of ease of use compared to MeiliSearch due to its schema-full nature and wide range of tuning options. With more tuning comes more settings, unfortunately.
  • No synonym support (yet).
  • No metrics (yet).

Comments
  • Schema metadata corrupted after restart

    Schema metadata corrupted after restart

    LNX: e9804944edc8a7c0af24ee3ba8397b87f1640b5f

    I'm trying to figure out how to reproduce this. After the panic I reported in #18 and after restarting lnx, I noticed that the index schema being returned from search queries was messed up. Searching seems to work as advertised (searching field_a:foo will return documents where foo is set in field_a), but the schema of the search results is messed up.

    For example, let's say I have a schema where I am only storing 3 fields (field_a, field_b, field_c), but I am indexing 6. Before the restart (example):

    $ curl 'http://localhost:4040/indexes/posts/search?query=field_a:foo&mode=normal&limit=50&order_by=-ts'
    {"data":{"count":40,"hits":[{"doc":{"field_a":["foo"],"field_b":[4],"field_c":[44]}, # etc
    

    Now that same query is returning:

    $ curl 'http://localhost:4040/indexes/posts/search?query=field_a:foo&mode=normal&limit=50&order_by=-ts'
    {"data":{"count":40,"hits":[{"doc":{"field_d":["foo"],"field_e":[4],"field_f":[44]}, # etc
    

    The values are correct, but the names of the keys are completely different.

    However, in trying to reproduce the error, it seems like my lnx install is corrupted. I tried to create an index like so:

    {
        "name": "corrupt",
    
        "writer_buffer": 144000000,
        "writer_threads": 12,
        "reader_threads": 12,
    
        "max_concurrency": 24,
        "search_fields": [
            "field_a"
        ],
    
        "storage_type": "filesystem",
        "set_conjunction_by_default": true,
        "use_fast_fuzzy": false,
        "strip_stop_words": false,
    
        "fields": {
            "field_a": {
                "type": "text",
                "stored": true
            },
            "field_b": {
               "type": "u64",
               "stored": true,
               "indexed": true,
               "fast": "single"
            },
            "field_c": {
               "type": "u64",
               "stored": true,
               "indexed": true,
               "fast": "single"
            },
            "field_d": {
                "type": "text",
                "stored": false
            },
            "field_e": {
                "type": "text",
                "stored": false
            },
            "field_f": {
                "type": "text",
                "stored": false
            },
            "version": {
                "type": "u64",
                "stored": false,
                "indexed": true,
                "fast": "single"
            }
        },
        "boost_fields": {}
    }
    

    then index these documents:

    [
        {"field_a":["foo"], "field_b":[4], "field_c":[44], "field_d":["macbook"], "field_e":["apple"], "field_f":["iphone"], "version":[1]},
        {"field_a":["bar"], "field_b":[5], "field_c":[55], "field_d":["laptop"], "field_e":["micrsoft"], "field_f":["galaxy"], "version":[2]},
        {"field_a":["redbull coke"], "field_b":[6], "field_c":[66], "field_d":["thinkpad"], "field_e":["netflix"], "field_f":["nexus"], "version":[3]},
        {"field_a":["vodka sprite"], "field_b":[7], "field_c":[77], "field_d":["ultrabook"], "field_e":["facebook"], "field_f":["blackberry"], "version":[4]},
        {"field_a":["ginger ale whiskey"], "field_b":[8], "field_c":[88], "field_d":["chomebook"], "field_e":["google"], "field_f":["oneplus"], "version":[5]}
    ]
    

    When I added them I got no error:

     $ curl -X POST -d@corrupt_data.json -H "Content-Type: application/json" http://localhost:4040/indexes/corrupt/documents?wait=true
    {"data":"added documents","status":200}
    

    But in the lnx logs I saw:

    Aug 30 01:23:56 torako lnx[635823]: [2021-08-30][01:23:56] | engine::index::writer | INFO  - [ WRITER @ corrupt ][ TRANSACTION 0 ] completed operation ADD-DOCUMENT
    Aug 30 01:23:56 torako lnx[635823]: [2021-08-30][01:23:56] | engine::index::writer | INFO  - [ WRITER @ corrupt ][ TRANSACTION 1 ] completed operation ADD-DOCUMENT
    Aug 30 01:23:56 torako lnx[635823]: [2021-08-30][01:23:56] | engine::index::writer | INFO  - [ WRITER @ corrupt ][ TRANSACTION 2 ] completed operation ADD-DOCUMENT
    Aug 30 01:23:56 torako lnx[635823]: [2021-08-30][01:23:56] | engine::index::writer | INFO  - [ WRITER @ corrupt ][ TRANSACTION 3 ] completed operation ADD-DOCUMENT
    Aug 30 01:23:56 torako lnx[635823]: [2021-08-30][01:23:56] | engine::index::writer | INFO  - [ WRITER @ corrupt ][ TRANSACTION 4 ] completed operation ADD-DOCUMENT
    Aug 30 01:23:56 torako lnx[635823]: thread 'thrd-tantivy-index3' panicked at 'Expected a u64/i64/f64 field, got Str("redbull coke") ', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.16.0/src/fastfield/mod.rs:208:14
    Aug 30 01:23:56 torako lnx[635823]: stack backtrace:
    Aug 30 01:23:56 torako lnx[635823]:    0: rust_begin_unwind
    Aug 30 01:23:56 torako lnx[635823]:              at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rs:515:5
    Aug 30 01:23:56 torako lnx[635823]:    1: std::panicking::begin_panic_fmt
    Aug 30 01:23:56 torako lnx[635823]:              at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rs:457:5
    Aug 30 01:23:56 torako lnx[635823]:    2: tantivy::indexer::index_writer::index_documents
    Aug 30 01:23:56 torako lnx[635823]: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    Aug 30 01:23:56 torako lnx[635823]: thread 'thrd-tantivy-index0' panicked at 'Expected a u64/i64/f64 field, got Str("foo") ', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.16.0/src/fastfield/mod.rs:208:14
    Aug 30 01:23:56 torako lnx[635823]: stack backtrace:
    Aug 30 01:23:56 torako lnx[635823]:    0: rust_begin_unwind
    Aug 30 01:23:56 torako lnx[635823]:              at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rs:515:5
    Aug 30 01:23:56 torako lnx[635823]:    1: std::thread 'panickingthrd-tantivy-index1::' panicked at 'begin_panic_fmtExpected a u64/i64/f64 field, got Str("bar")
    Aug 30 01:23:56 torako lnx[635823]: ',              at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.16.0/src/fastfield/mod.rs/rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rs::208457::145
    Aug 30 01:23:56 torako lnx[635823]:    2: tantivy::indexer::index_writer::index_documents
    Aug 30 01:23:56 torako lnx[635823]: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    Aug 30 01:23:56 torako lnx[635823]: stack backtrace:
    Aug 30 01:23:56 torako lnx[635823]:    0: rust_begin_unwind
    Aug 30 01:23:56 torako lnx[635823]:              at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rs:515:5
    Aug 30 01:23:56 torako lnx[635823]:    1: std::panicking::begin_panic_fmt
    Aug 30 01:23:56 torako lnx[635823]:              at thread '/rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rsthrd-tantivy-index4:' panicked at '457Expected a u64/i64/f64 field, got Str("vodka sprite") :', 5/root/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.16.0/src/fastfield/mod.rs
    Aug 30 01:23:56 torako lnx[635823]: :208: 14
    Aug 30 01:23:56 torako lnx[635823]: 2: tantivy::indexer::index_writer::index_documents
    Aug 30 01:23:56 torako lnx[635823]: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    Aug 30 01:23:56 torako lnx[635823]: stack backtrace:
    Aug 30 01:23:56 torako lnx[635823]: thread 'thrd-tantivy-index2' panicked at 'Expected a u64/i64/f64 field, got Str("ginger ale whiskey") ', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.16.0/src/fastfield/mod.rs:208: 14
    Aug 30 01:23:56 torako lnx[635823]:  0: rust_begin_unwind
    Aug 30 01:23:56 torako lnx[635823]:              at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rs:515:5
    Aug 30 01:23:56 torako lnx[635823]:    1: std::panicking::begin_panic_fmt
    Aug 30 01:23:56 torako lnx[635823]:              at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rs:457:5
    Aug 30 01:23:56 torako lnx[635823]:    2: tantivy::indexer::index_writer::index_documents
    Aug 30 01:23:56 torako lnx[635823]: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    Aug 30 01:23:56 torako lnx[635823]: stack backtrace:
    Aug 30 01:23:56 torako lnx[635823]:    0: rust_begin_unwind
    Aug 30 01:23:56 torako lnx[635823]:              at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rs:515:5
    Aug 30 01:23:56 torako lnx[635823]:    1: std::panicking::begin_panic_fmt
    Aug 30 01:23:56 torako lnx[635823]:              at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rs:457:5
    Aug 30 01:23:56 torako lnx[635823]:    2: tantivy::indexer::index_writer::index_documents
    Aug 30 01:23:56 torako lnx[635823]: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    

    I'm not sure if these two errors are related, but it seems lnx's understanding of the fields and tantivy's aren't in sync.

    opened by miyachan 12
  • Index not updating / adding document

    Index not updating / adding document

    So in this bug, you are right, that was the cause. I copied this example from the book and assumed I was hitting the same issue. What I am actually seeing is that when I index a document with a date field, I am no longer able to index any more documents.

    # curl -X DELETE  'http://localhost:4040/indexes/my-index'
    {"data":"index deleted","status":200}# 
    # cat a.json
    {
      "name": "my-index",
      "writer_buffer": 6000000,
      "writer_threads": 1,
      "reader_threads": 1,
      "max_concurrency": 10,
      "search_fields": [
        "title"
      ],
      "storage_type": "memory",
      "use_fast_fuzzy": false,
      "strip_stop_words": false,
       "set_conjunction_by_default": false,
      "fields": {
        "title": {
          "type": "text",
          "stored": true
        },
        "description": {
          "type": "text",
          "stored": true
        },
        "id": {
          "type": "u64",
          "indexed": true,
          "stored": true,
          "fast": "single"
        },
     "ts": {
                "type": "date",
                "stored": false,
                "indexed": true,
                "fast": "single"
            }
      },
      "boost_fields": {}
    }
    # curl -X POST -d@a.json -H "Content-Type: application/json" http://127.0.0.1:4040/indexes
    {"data":"index created","status":200}
    # cat c.json
    {
        "title": ["Hello, World2"],
    "id":[4]
    }
    # curl -X POST -d@c.json -H "Content-Type: application/json" http://localhost:4040/indexes/my-index/documents?wait=true
    {"data":"added documents","status":200}
    # curl -X POST 'http://localhost:4040/indexes/my-index/commit'
    {"data":"changes committed","status":200}
    # curl 'http://localhost:4040/indexes/my-index/search?query=*&mode=normal'
    {"data":{"count":1,"hits":[{"doc":{"id":[4],"title":["Hello, World2"]},"document_id":"8295453496340348446","ratio":1.0}],"time_taken":0.0001392010017298162},"status":200}
    # cat b.json
    {
        "title": ["Hello, World2"],
    "id":[4],
    "ts":[1630097583]
    }
    # curl -X POST -d@b.json -H "Content-Type: application/json" http://localhost:4040/indexes/my-index/documents?wait=true
    {"data":"added documents","status":200}
    # curl -X POST 'http://localhost:4040/indexes/my-index/commit'
    {"data":"changes committed","status":200}
    # curl 'http://localhost:4040/indexes/my-index/search?query=*&mode=normal'
    {"data":{"count":1,"hits":[{"doc":{"id":[4],"title":["Hello, World2"]},"document_id":"8295453496340348446","ratio":1.0}],"time_taken":0.0001936009939527139},"status":200}
    

    Adding a document with a date field doesn't produce an error, but seems to corrupt the index. In my original setup, I always had a date field, and I wasn't seeing any documents get indexed, which is why I assumed the two errors were the same. Once this happens, even documents without the ts field fail to be indexed.

    Originally posted by @miyachan in https://github.com/lnx-search/lnx/issues/14#issuecomment-907478178

    opened by ChillFish8 7
  • [BUG][0.8.0] index writer lock issue during k8s pod restart

    [BUG][0.8.0] index writer lock issue during k8s pod restart

    This issue is not present in 0.7.0, but is present in 0.8.0.

    Error log detail

    2022-01-16T14:30:18.000995Z ERROR error during lnx runtime: failed to load existing indexes due to error Failed to acquire Lockfile: LockBusy. Some("Failed to acquire index lock. If you are using a regular directory, this means there is already an `IndexWriter` working on this `Directory`, in this process or in a different process.")
    

    Reproduce step: restart or recreate the pod; k8s will send SIGTERM to the pod. This issue is not present in 0.7.0. In k8s I can add a preStop hook to execute kill -SIGINT $(pgrep lnx) to send CTRL-C to the lnx process, but if lnx has panicked or crashed, how can I unlock the writer index? For example, if there is a lock file, it could be removed before lnx starts: rm -rf index/some-lock-file && lnx

    bug storage 
    opened by Plasmatium 6
  • Delete Endpoint always fails

    Delete Endpoint always fails

    LNX Version: 8d38d386de3ecdb1bced4f55032673860feaccf4

    This is an issue that happened between e980494...8d38d386de3ecdb

    $ cat a.json
    {
      "name": "my-index",
      "writer_buffer": 6000000,
      "writer_threads": 1,
      "reader_threads": 1,
      "max_concurrency": 10,
      "search_fields": [
        "title"
      ],
      "storage_type": "memory",
      "use_fast_fuzzy": false,
      "strip_stop_words": false,
       "set_conjunction_by_default": false,
      "fields": {
        "title": {
          "type": "text",
          "stored": true
        },
        "description": {
          "type": "text",
          "stored": true
        },
        "id": {
          "type": "u64",
          "indexed": true,
          "stored": true,
          "fast": "single"
        },
     "ts": {
                "type": "date",
                "stored": true,
                "indexed": true,
                "fast": "single"
            }
      },
      "boost_fields": {}
    }
    $ curl -X POST -d@a.json -H "Content-Type: application/json" http://127.0.0.1:4040/indexes
    {"data":"index created","status":200}
    $ cat d.json
    {
        "id": {"type": "u64", "value": [4]}
    }
    $ curl -X DELETE -d@d.json -H "Content-Type: application/json" http://localhost:4040/indexes/my-index/documents?wait=true
    {"data":"invalid JSON body: Failed to parse the request body as JSON","status":400}
    

    Logs:

    Aug 30 19:01:42 torako lnx[866518]: [2021-08-30][19:01:42] | tantivy::indexer::segment_updater | INFO  - Running garbage collection
    Aug 30 19:01:42 torako lnx[866518]: [2021-08-30][19:01:42] | tantivy::directory::managed_directory | INFO  - Garbage collect
    Aug 30 19:01:42 torako lnx[866518]: [2021-08-30][19:01:42] | engine::index::writer | INFO  - [ WRITER @ my-index ][ TRANSACTION 4 ] completed operation COMMIT
    Aug 30 19:01:49 torako lnx[866518]: [2021-08-30][19:01:49] | engine::index::reader | INFO  - [ SEARCH @ my-index ] took 120.259µs with limit=20, mode=Normal and 1 results total
    Aug 30 19:07:26 torako lnx[866518]: [2021-08-30][19:07:26] | lnx::routes | WARN  - rejecting request due to invalid body: InvalidJsonBody(Error { inner: Error("invalid type: map, expected a string or u32", line: 1, column: 13) })
    Aug 30 19:07:51 torako lnx[866518]: [2021-08-30][19:07:51] | lnx::routes | WARN  - rejecting request due to invalid body: InvalidJsonBody(Error { inner: Error("invalid type: map, expected a string or u32", line: 1, column: 13) })
    
    opened by miyachan 6
  • Invalid queries eventually exhaust the executor ArrayQueue causing a panic

    Invalid queries eventually exhaust the executor ArrayQueue causing a panic

    Lnx Version: e9804944edc8a7c0af24ee3ba8397b87f1640b5f

    I'm trying to somehow reproduce this as I'm not sure how it occurred. I have a system which adds documents to the index and commits every 10s. I was executing searches against the system (specifically I was testing which queries might cause an error, not sure if this is related):

    $ curl 'http://localhost:4040/indexes/posts/search?query=text:f^oobar&mode=normal&limit=50&order_by=-ts'
    {"data":"Syntax Error","status":400}
    $ curl 'http://localhost:4040/indexes/posts/search?query=text:f`oobar&mode=normal&limit=50&order_by=-ts'
    {"data":"channel closed","status":400}
    

    logs:

    Aug 30 00:52:18 torako lnx[307715]: [2021-08-30][00:52:18] | tantivy::directory::managed_directory | INFO  - Deleted "8ae9f9e93c674678ae3e7ab694752231.fast"
    Aug 30 00:52:18 torako lnx[307715]: [2021-08-30][00:52:18] | tantivy::directory::managed_directory | INFO  - Deleted "31b0bab77e014d539022907d36eac93c.fieldnorm"
    Aug 30 00:52:18 torako lnx[307715]: [2021-08-30][00:52:18] | tantivy::directory::managed_directory | INFO  - Deleted "2dbb35c78423479290186b0fccb9b48e.fieldnorm"
    Aug 30 00:52:18 torako lnx[307715]: [2021-08-30][00:52:18] | tantivy::directory::managed_directory | INFO  - Deleted "90fcfd004ee34f3892332a95d9c260e1.fast"
    Aug 30 00:52:18 torako lnx[307715]: [2021-08-30][00:52:18] | engine::index::writer | INFO  - [ WRITER @ posts ][ TRANSACTION 210320997 ] completed operation DELETE-TERM
    Aug 30 00:52:18 torako lnx[307715]: [2021-08-30][00:52:18] | engine::index::writer | INFO  - [ WRITER @ posts ][ TRANSACTION 210320998 ] completed operation ADD-DOCUMENT
    Aug 30 00:52:18 torako lnx[307715]: [2021-08-30][00:52:18] | engine::index::writer | INFO  - [ WRITER @ posts ][ TRANSACTION 210320999 ] completed operation ADD-DOCUMENT
    Aug 30 00:52:18 torako lnx[307715]: [2021-08-30][00:52:18] | tantivy::directory::file_watcher | INFO  - Meta file "./lnx/index-data/posts/meta.json" was modified
    Aug 30 00:52:18 torako lnx[307715]: [2021-08-30][00:52:18] | engine::index::writer | INFO  - [ WRITER @ posts ][ TRANSACTION 210321000 ] completed operation DELETE-TERM
    Aug 30 00:52:18 torako lnx[307715]: [2021-08-30][00:52:18] | engine::index::writer | INFO  - [ WRITER @ posts ][ TRANSACTION 210321001 ] completed operation ADD-DOCUMENT
    Aug 30 00:52:19 torako lnx[307715]: [2021-08-30][00:52:19] | engine::index::writer | INFO  - [ WRITER @ posts ][ TRANSACTION 210321002 ] completed operation DELETE-TERM
    Aug 30 00:52:19 torako lnx[307715]: [2021-08-30][00:52:19] | engine::index::writer | INFO  - [ WRITER @ posts ][ TRANSACTION 210321003 ] completed operation ADD-DOCUMENT
    Aug 30 00:52:19 torako lnx[307715]: [2021-08-30][00:52:19] | engine::index::writer | INFO  - [ WRITER @ posts ][ TRANSACTION 210321004 ] completed operation ADD-DOCUMENT
    Aug 30 00:52:19 torako lnx[307715]: thread 'index-posts-worker-0' panicked at 'get executor', /root/lnx/engine/src/index/reader.rs:264:44
    Aug 30 00:52:19 torako lnx[307715]: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    Aug 30 00:52:19 torako lnx[307715]: [2021-08-30][00:52:19] | lnx::routes | WARN  - rejecting search index operation due to bad request: channel closed
    

    My index settings are like so:

    {
    "writer_buffer": 144000000,
    "writer_threads": 6,
    "reader_threads": 6,
    
    "max_concurrency": 12
    }
    

    Is it possible I exceeded the number of concurrent requests allowed?

    opened by miyachan 6
  • Querying returns no results?

    Querying returns no results?

    I can't seem to get any results from lnx. I'm using commit e9804944edc8a7c0af24ee3ba8397b87f1640b5f. I built lnx using cargo build --release, then started it with /usr/local/bin/lnx -p 4040.

    # cat a.json
    {
      "name": "my-index",
      "writer_buffer": 6000000,
      "writer_threads": 1,
      "reader_threads": 1,
      "max_concurrency": 10,
      "search_fields": [
        "title"
      ],
      "storage_type": "memory",
      "use_fast_fuzzy": true,
      "strip_stop_words": true,
      "fields": {
        "title": {
          "type": "text",
          "stored": true
        },
        "description": {
          "type": "text",
          "stored": true
        },
        "id": {
          "type": "u64",
          "indexed": true,
          "stored": true,
          "fast": "single"
        }
      },
      "boost_fields": {
        "title": 2,
        "description": 0.8
      }
    }
    # curl -X POST -d@a.json -H "Content-Type: application/json" http://127.0.0.1:4040/indexes
    {"data":"index created","status":200}
    # cat b.json
    {
        "title": ["Hello, World"],
        "description": ["Welcome to the next generation system."]
    }
    # curl -X POST -H "Content-Type: application/json" -d@b.json http://localhost:4040/indexes/my-index/documents?wait=true
    {"data":"added documents","status":200}
    # curl -X POST 'http://localhost:4040/indexes/my-index/commit'
    {"data":"changes committed","status":200}
    # curl 'http://localhost:4040/indexes/my-index/search?query=*'
    {"data":{"count":0,"hits":[],"time_taken":0.001682035974226892},"status":200}
    # curl 'http://localhost:4040/indexes/my-index/search?query=Hello'
    {"data":{"count":0,"hits":[],"time_taken":0.00014333099534269422},"status":200}
    

    I can't figure out what I'm doing wrong here.

    bug question 
    opened by miyachan 5
  • Mounting volume overwrites binary

    Mounting volume overwrites binary

    Hiya!

    I'm attempting to run LNX on Kubernetes as a stateful set, but running into an issue - the docs/example code suggest mounting a volume to etc/lnx, but doing this in Kubernetes causes the contents of that path to be replaced with the attached volume, which means the binary can't be found.

    Is there a way to parameterise the storage path so that it doesn't collide with the binary path? 🙂

    question storage 
    opened by FridgeSeal 4
  • Search Results cannot be sorted by a date field

    Search Results cannot be sorted by a date field

    Attempting to perform a search where the order_by field is a date leads to an error:

    {"data":"Schema error: 'Field \"ts\" is of type I64!=Date'","status":400}
    

    It looks like this is because the FieldValue is implied to be i64:

    https://github.com/lnx-search/lnx/blob/8d38d386de3ecdb1bced4f55032673860feaccf4/engine/src/index/reader.rs#L549-L552

    opened by miyachan 4
  • Data corrupted when using fast_search mode:

    Data corrupted when using fast_search mode: "Failed to open field \"title\"'s term dictionary in the compos (truncated...)"

    Using master/0.9 beta and an index with "use_fast_fuzzy": true results in corrupted data.

    Minimal example

    1. Create an index with use_fast_fuzzy set to true:

    {
        "override_if_exists": true,
        "index": {
            "name": "products",
            "storage_type": "tempdir",
            "fields": {
                "title": {
                    "type": "text",
                    "stored": true
                }
            },
            "search_fields": [],
            "boost_fields": {},
            "reader_threads": 1,
            "max_concurrency": 1,
            "writer_buffer": 300000,
            "writer_threads": 1,
            "set_conjunction_by_default": false,
            "use_fast_fuzzy": true,
            "strip_stop_words": false,
            "auto_commit": 0
        }
    } 
    

    2. Send a document missing one of the defined fields, or with the value set to an empty string, "-", or "_".

    e.g. POST /indexes/products/documents with body {"title":""}

    3. When I POST /indexes/products/commit I get an error message: {"status":400,"data":"Data corrupted: 'Data corruption: : Failed to open field "title"'s term dictionary in the compos (truncated...)

    Workaround / recovery

    To recover from this, I rebuild the index and add the document with a dummy value, or I rebuild the index without using "use_fast_fuzzy".

    Use case

    The example above is not the real use case; usually the titles on all my documents are set, but the documents have some fields that are optional.

    opened by keywan-ghadami 3
  • Why is every doc field wrapped in an array in search results?

    Why is every doc field wrapped in an array in search results?

    For example:

    {
      "status": 200,
      "data": {
        "hits": [
          {
            "doc": {
              "author": [
                "248b2e6a-7c36-4da3-bcc4-55a979eb57dc"
              ],
              "id": [
                18
              ],
              "title": [
                "title 01"
              ],
              "uuid": [
                "06dbf5c7-d313-413d-8f65-49aed93e4031"
              ]
            },
            "document_id": "1628525110829290421",
            "score": 1.542423
          },
          {
            "doc": {
              "author": [
                "248b2e6a-7c36-4da3-bcc4-55a979eb57dc"
              ],
              "id": [
                19
              ],
              "title": [
                "title 02"
              ],
              "uuid": [
                "8da05387-8727-4a27-baa7-265af7558c0c"
              ]
            },
            "document_id": "1493516234521670736",
            "score": 1.542423
          },
          {
            "doc": {
              "author": [
                "248b2e6a-7c36-4da3-bcc4-55a979eb57dc"
              ],
              "id": [
                20
              ],
              "title": [
                "title 03"
              ],
              "uuid": [
                "3bf64ee1-f2ac-46ce-8e45-0d25956b195c"
              ]
            },
            "document_id": "9603160257558085701",
            "score": 1.542423
          }
        ],
        "count": 3,
        "time_taken": 0.000578893
      }
    }
    

    I think it would make much more sense to return the doc as it was posted.

    enhancement 
    opened by kindlychung 3
  • Fuzzy field-based search with multiple terms

    Fuzzy field-based search with multiple terms

    Reading through the docs and the source code, it seems like you can specify which fields to search a specific term against, so you can issue a query like:

    {
        "query": [
             {"term": {"ctx": "Harry Potter", "fields": ["role"]}, "occur": "must"},
             {"term": {"ctx": "Daniel Radcliffe", "fields": ["actor"]}, "occur": "must"}
         ]
    }
    

    It would be really neat if it were possible to do the same for fuzzy queries so that something like this would be possible:

    {
       "query": [
            {"fuzzy": {"ctx": "Barry Potter", "fields": ["role"]}, "occur": "must"},
            {"fuzzy": {"ctx": "Daniel Radclif", "fields": ["actor"]}, "occur": "must"}
        ]
    }
    

    To put this in a little more perspective in terms of a use case, suppose documents describing movies, actors, and roles. I might have heard that the lead character of a movie is called Harry Potter but have no idea which movie this character belongs in, but I do want to know who the actor is.

    If I were to create an index with the fuzzy method, I could create an index across all 3 fields, but when I search for Harry Potter I will get a bunch of results for actors on account of the movie being called Harry Potter and the ...

    Alternatively I could create separate indexes for each of these and search the individual index, but then I run into the problem that once I do have more information (like say movie name or actor name), I would have to compute a likelihood score myself from results of searching multiple indexes.

    enhancement 0.10.0 
    opened by fliepeltje 2
  • Add #[repr(C)] to anything serialized with Rkyv.

    Add #[repr(C)] to anything serialized with Rkyv.

    We should mark everything that gets serialized and deserialized with rkyv with the C layout repr, otherwise this could cause us some issues later on if Rust's memory layout changes.

    opened by ChillFish8 0
  • Add index reader

    Add index reader

    Now that we have the ability to write and produce index segments, we want to create a reader which can open a segment, and handle the deletes and query execution.

    opened by ChillFish8 1
  • Add new tantivy directory for merging combined segments.

    Add new tantivy directory for merging combined segments.

    Currently, we can only combine two or more segments into one another, which, although it works, can make our index inefficient.

    We should create a directory that can read from the segment, and split it out into a temp directory (this can be done with the writer directory) then tell tantivy to process all of the deletes marked within the index (providing it is safe to do so, see below) and then re-export the directory to a new segment.

    Issue notes

    • Deletes are not attached to the index that actually contains the documents intended to be deleted, so we cannot blindly remove deletes.
    • Some deletes may occur after a new document has been inserted and should not affect the new documents.
    opened by ChillFish8 0
Releases(0.9.0-master)
  • 0.9.0-master(Oct 5, 2022)

    v0.9.0 Master

    This is a release cut from the current master branch before the 0.10 work begins.

    What's Changed

    • LNX-97: Add Cargo.lock by @ChillFish8 in https://github.com/lnx-search/lnx/pull/98
    • Fix non-stored fields appearing as nulls in JSON search response by @oka-tan in https://github.com/lnx-search/lnx/pull/100
    • Use snowball stemmer stop words for some languages by @saroh in https://github.com/lnx-search/lnx/pull/102

    New Contributors

    • @oka-tan made their first contribution in https://github.com/lnx-search/lnx/pull/100
    • @saroh made their first contribution in https://github.com/lnx-search/lnx/pull/102

    Full Changelog: https://github.com/lnx-search/lnx/compare/0.9.0...0.9.0-master

  • 0.9.0(Jun 25, 2022)

    Version 0.9.0

    This is a breaking release and will require you to re-index your data and re-create your schemas.

    It's been a little while since the last release. 0.9 isn't a huge release; however, a lot of work has gone into preparing for 0.10, which should hopefully add high availability to the search instances.

    What's New

    • Synonym support is now added! Finally! My life is complete! I can retire! On a more serious note: yes, it's added and can be adjusted using the /indexes/:index/synonyms endpoint via POST, GET and DELETE requests respectively. There's also a /indexes/:index/synonyms/clear DELETE endpoint which allows you to clear all synonyms. The syntax for adding synonyms is a semi-relational structure where you provide a list of strings in the format <word>,<word>:<synonym>,<synonym>,<synonym>, which will set all the words on the left of the : to have the given synonyms. This allows you to define synonyms fairly easily for related words, e.g. iphone,apple,phone:apple,phone,iphone (see the sketch after this list).
    • All loaded stop words can be viewed via the /indexes/:index/stopwords GET endpoint.
    • Returned documents are now converted to be in line with the defined schema, i.e. fields with multi set to false will be returned as single values, and if no value is set it will be returned as null rather than the field just being missing entirely.
    • You can now mark fields as required, which will cause lnx to reject any documents that are missing required fields (this will also reject fields that are provided but have empty values, i.e. "foo": []).
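
    As a sketch of the synonym endpoints described above: the endpoint paths and the relation-string format come from these notes, but the request body shape (a plain JSON array of relation strings) and the host/port are assumptions and may differ from the real API.

    # add synonyms (body shape assumed: a JSON array of "<word>,...:<synonym>,..." strings)
    curl -X POST -H "Content-Type: application/json" \
         -d '["iphone,apple,phone:apple,phone,iphone"]' \
         http://localhost:4040/indexes/products/synonyms

    # list the currently loaded synonyms
    curl http://localhost:4040/indexes/products/synonyms

    # clear all synonyms
    curl -X DELETE http://localhost:4040/indexes/products/synonyms/clear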

    What's Different

    • Fields now have a multi field attribute to set them as being multi-value. If they're not multi-value but multiple values are provided, the system will take the last value in the array.
    • The fast attribute for fields is now a bool rather than single/multi because, generally, it was a bit confusing for users to know when they wanted single or when they wanted multi; now this is an internal detail. You only have to worry about saying whether you want it to be a fast field or not.
    • Fast fuzzy now scores by edit distance and the BM25 score, making for much better relevancy when searching.
    • Fast fuzzy now uses traditional word -> terms lookups instead of the compound correction.

    What's Fixed

    • lnx will now return an error if you try to sort by multi-value fields; previously this was a panic if you had the fast-field cardinality set correctly.

    What's Changed

    • Improve schema validations and Query logic in https://github.com/lnx-search/lnx/pull/68
    • Implement schema conversion to returned docs and general improvements in https://github.com/lnx-search/lnx/pull/69
    • Cleanup query info, hint and add synonym support in https://github.com/lnx-search/lnx/pull/71
    • Altered the way fast-fuzzy queries are produced and scored https://github.com/lnx-search/lnx/pull/95
    • Moved from a custom fork of SymSpell to dedicated Compose repo https://github.com/lnx-search/lnx/pull/96
    • Added a fields attribute for fuzzy queries allowing for selective field searches, as part of https://github.com/lnx-search/lnx/pull/93

    New Contributors

    • @onerandomusername made their first contribution in https://github.com/lnx-search/lnx/pull/88

    Full Changelog: https://github.com/lnx-search/lnx/compare/0.8.1...0.9.0

  • 0.9.0-beta(Jan 23, 2022)

    Version 0.9.0-beta

    This is a beta version/pre-release of version 0.9.0 so that people are able to make use of and test some of the nicer quality-of-life changes like the new schema conversion, more sane defaulting behaviour and synonym support.

    This release won't have any documentation to go with it directly, as technically it's a pre-release. That being said, this is a breaking release and will require you to re-index your data and re-create your schemas.

    What's new

    • Synonym support is now added! Finally! My life is complete! I can retire! On a more serious note: yes, it's added and can be adjusted using the /indexes/:index/synonyms endpoint via POST, GET and DELETE requests respectively. There's also a /indexes/:index/synonyms/clear DELETE endpoint which allows you to clear all synonyms. The syntax for adding synonyms is a semi-relational structure where you provide a list of strings in the format <word>,<word>:<synonym>,<synonym>,<synonym>, which will set all the words on the left of the : to have the given synonyms. This allows you to define synonyms fairly easily for related words, e.g. iphone,apple,phone:apple,phone,iphone
    • All loaded stop words can be viewed via the /indexes/:index/stopwords GET endpoint.
    • Returned documents are now converted to be in line with the defined schema, i.e. fields with multi set to false will be returned as single values, and if no value is set it will be returned as null rather than the field just being missing entirely.
    • You can now mark fields as required, which will cause lnx to reject any documents that are missing required fields (this will also reject fields that are provided but have empty values, i.e. "foo": []).

    What's different

    • Fields now have a multi attribute to set them as being multi-value. If they're not multi-value but multiple values are provided, the system will take the last value in the provided array.
    • The fast attribute for fields is now a bool rather than single/multi. This is because, generally, it was a bit confusing for users to know when they wanted single or when they wanted multi; now this is an internal detail and you only have to worry about saying whether you want it to be a fast field or not.

    What's fixed

    • lnx will now return an error if you try to sort by multi-value fields; previously this was a panic if you had the fast-field cardinality set correctly.

    What's Changed

    • Improve schema validations and Query logic in https://github.com/lnx-search/lnx/pull/68
    • Implement schema conversion to returned docs and general improvements in https://github.com/lnx-search/lnx/pull/69
    • Cleanup query info, hint and add synonym support in https://github.com/lnx-search/lnx/pull/71

    Full Changelog: https://github.com/lnx-search/lnx/compare/0.8.1...0.9.0-beta

  • 0.8.1(Jan 18, 2022)

  • 0.8.0(Jan 12, 2022)

    Version 0.8.0

    Version 0.8.0 brings a considerable set of improvements, making it the best version to use for production applications. It includes several bug fixes, logging and debugging improvements, and fixes for some of the issues surrounding fast-fuzzy.

    What's new

    • Local docs removed: we no longer serve the local OpenAPI copy of the docs on the /docs endpoint. This became quite a burden to maintain and keep up to date in two places rather than just redirecting to https://docs.lnx.rs, which now supports previous versions via the ?version=<major>.<minor> flag, e.g. https://docs.lnx.rs?version=0.8
    • Queries now follow the type: { <context> } pattern for payloads, rather than having a base value field and then having each query kind inconsistently require additional context. See below for an example.
    • Delete by query mode: this will delete the matched documents based on your search query. This respects the limits and offsets you give the query, so to delete everything you may need to send several requests.
    • Delete specific document endpoint: this allows you to delete a document via DELETE /indexes/:index/documents/:document_id.
    • Indexes are now backed by sled for index metadata storage. This is incredibly useful going forward, ensuring atomic behaviour when writing to and from disk with the fast-fuzzy spell correction system. This is a breaking change; however, the data sub-folder of this new directory is a fully Tantivy-compatible index. Theoretically, you can mount a custom program directly to this folder in order to perform actions.
    • Fast-fuzzy garbage collection: this means that the frequency dictionaries are adjusted again when deleting documents, which should prevent potential relevancy loss when working with a system that has a high update rate. The adjustments are only made when calling commit or when auto-commit runs the operation.
    • Tracing information: we have moved from log to tracing, providing significantly more information for debugging and profiling in future releases; most notably this gives us the ability to add OpenTelemetry tracing later on.
    • JSON log files (--json-logs): lnx can now produce line-by-line JSON logs where each object is a new log event. This is an amazing addition for anyone using an ingestion system or wanting to parse the logs.
    • Pretty logs (--pretty-logs) for the extra flamboyant users. This makes reading the logs much easier / prettier, but at the cost of taking up quite a few more lines per event. Looks nice though.
    • Verbose logs (--verbose-logs) adds additional metadata to each log event, including thread-name and thread-id.
    • Disable ANSI colours (--disable-asni-logs): this is mostly required for logging to files or ingestion systems. Doesn't look as nice without the colours, though :(
    • Logging directory (--log-directory <dir>): this replaces the --log-file attribute and instead produces hourly log files in the directory, formatted using the aforementioned flags.
    • RUST_LOG env var support: for more control over what to log and what not to log, you can directly adjust the RUST_LOG env var, which will override the log-level flag or env var. By default lnx will set this to <log-level-flag>,compress=off,tantivy=info
    • Snapshot support added: this is essentially a wrapper around zipping the index files up and unzipping them again. You can set an automatic snapshot with the --snapshot-interval <interval> flag, take a single snapshot with the --snapshot subcommand, adjust the output directory with the --snapshot-directory <dir> flag, and finally load snapshots with the --load-snapshot <file> flag. Each snapshot is generated with the name format snapshot-<utc-timestamp>-lnx-v<lnx-version>; it's important to note that older snapshots may not be compatible with future lnx releases, although this should be avoided between most versions. (See the sketch after this list.)
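
    A hedged sketch of combining the new logging and snapshot flags described above; the flag names come from these notes, while the binary invocation, port and directory values are placeholders:

    # run lnx with line-by-line JSON logs written hourly to a directory, plus periodic snapshots
    lnx -p 4040 \
        --json-logs \
        --verbose-logs \
        --disable-asni-logs \
        --log-directory ./logs \
        --snapshot-interval <interval> \
        --snapshot-directory ./snapshots

    # take a one-off snapshot, or load an existing one
    lnx --snapshot
    lnx --load-snapshot ./snapshots/snapshot-<utc-timestamp>-lnx-v<lnx-version>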

    What's been fixed

    • lnx now handles interrupt signals better and cleans up indexes more reliably; this should prevent dangling locks in future.

    What's been removed

    • --log-file has been removed in favour of --log-directory powered by tracing.
    • The storage type memory has been deprecated/downgraded; the system will now treat it the same as tempdir until it's removed in future versions. This came out of the reasoning that realistically there's not much difference between the two, other than tempdir being more reliable on bigger indexes by allowing the OS to page in and out of disk.
    • The original --pretty-logs flag has been changed from disabling ANSI colours to enabling pretty logging. This used to be pretty unintuitive, but now it actually does what the name suggests.

    New Query Pattern

    Before, your query style would be:

    {
      "query": {
        "value": "foo",
        "kind": "fuzzy"
      }
    }
    

    Now it's:

    {
      "query": {
        "fuzzy": { "ctx": "foo" }
      }
    }
    

    For more info see https://docs.lnx.rs/?version=0.8#tag/Run-searches
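
    As a further hedged sketch of the same pattern with multiple clauses, pieced together from the multi-clause shape shown in the "Fuzzy field-based search with multiple terms" issue above (the field names and values are purely illustrative, and occur values other than must are not shown here):

    {
      "query": [
        {"term": {"ctx": "Harry Potter", "fields": ["role"]}, "occur": "must"},
        {"fuzzy": {"ctx": "Daniel Radclif"}, "occur": "must"}
      ]
    }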

  • 0.7.1(Dec 20, 2021)

    Version 0.7.1

    This adds no new features but does fix the docs not loading the new openapi spec and also reduces memory consumption for the fast-fuzzy system by about 20-40% depending on index size.

  • 0.7.0(Nov 21, 2021)

    Version 0.7.0

    0.7 brings with it a lot of quality of life changes, bug fixes and features. Unlike previous releases, this has backwards compatibility with 0.6.x systems.

    What's New

    • Sentence suggestion endpoint: this gives you the ability to suggest sentences based on the corpus data. This does not guarantee that the corrections will be correct according to the language; instead it will correct words to be in line with sentences within the corpus data itself. (This is not a Grammarly-style system.)
    • Multi-Field term handling: This allows you to now specify multiple fields for a single term by passing an array of field names rather than a single string on the term query kind.
    • Sensible Defaults: you can now skip the writer_threads and writer_buffer fields should you choose, and a sensible set of defaults will be calculated based on your current system's specs. This will typically allocate n threads, where n is the number of logical CPU cores, with 8 as the absolute maximum. The writer buffer will generally be 10% of your total memory or the bare minimum buffer size for the number of threads, whichever is higher. (See the sketch after this list.)
    • New Allocator: We now use MiMalloc allocator which not only means performance consistency across operating systems and containers but also adds a slight boost to performance.
    • Auto Commit: you can now set an auto_commit value on an index, in seconds, determining how much time with no new operations submitted should elapse before lnx automatically begins processing and committing documents. If this value is 0 (the default) this is disabled.
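
    As an illustration of the defaults and auto_commit described above, a hedged sketch of an index payload that simply omits writer_threads and writer_buffer; the field names are taken from the index payloads quoted in the issues above, and the values are placeholders:

    {
        "name": "my-index",
        "reader_threads": 1,
        "max_concurrency": 10,
        "search_fields": ["title"],
        "storage_type": "memory",
        "fields": {
            "title": {"type": "text", "stored": true}
        },
        "boost_fields": {},
        "auto_commit": 30
    }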

    What's Fixed

    • Index frequencies now save properly: The bug where index frequencies were not correctly being persisted to and from disk has been fixed.
    • lnx no longer uses 100% of one core per index: before, every index (or more specifically every writer-actor) would max out its given thread due to an infinite loop of checking the channels without blocking.

    Full Changelog: https://github.com/lnx-search/lnx/compare/0.6.2...0.7.0

  • 0.6.2(Oct 13, 2021)

    Fixes whitelisting indexes to given index tokens.

    Before, the system just completely ignored the field, which is reasonably insecure; in 0.6.2 this is fixed and any unauthorized token will be rejected with a 401.

  • 0.6.1(Oct 12, 2021)

    This patch fixes the 422 status not presenting itself on a validation error and reformats the codebase. This should have been in 0.6.0 but I was a bit too trigger happy with the publish button.

    No other changes have been made

  • 0.6.0(Oct 12, 2021)

    Version 0.6.0

    0.6 is the biggest update we've released since the initial launch of 0.1. It brings with it a complete redesign of the engine, the fast-fuzzy system and the server design. This has vastly improved the performance and, most importantly, the maintainability of the codebase; before 0.6 it was starting to show issues of too many things doing the same thing in different places. We also took the opportunity during the redesign to cut out several dependencies and requirements, e.g. sqlx with sqlite3 and axum, which, while they worked fine, added a lot of dependencies for a setup we didn't need them for, hence why they were dropped and moved to more lightweight or existing solutions (axum was moved to just hyper + routerify, and sqlx was replaced with tantivy's inbuilt storage system).

    What's New?

    The new engine brings many many breaking changes, but we believe they're worth it!

    • Facet fields - Hierarchical facets are now supported and can be added via the facet field type and accessed like a file path, e.g. /tools/hammers, via a Term query.
    • Term queries - These work similarly to the Normal mode except that they are not fed through the parser, so query values will be treated as literal strings. This is especially useful for the new facet fields.
    • Combination queries - You can now construct any combination of query kinds and adjust if they should, must or must not appear in matched documents etc... This allows you to truly create any range of queries you need. I cannot stress enough how awesome this feature is after playing around with it in testing.
    • Forgiving values - Lnx will now attempt to convert values into their required type if possible before rejecting a request so things like "3" become 3 for integer fields and DateTime can be converted from a UTC timestamp or formatted string.
    • Single or Multi-value inputs - Lnx now supports the ability to apply operations with a single value or multiple values; this includes things like delete queries, which now take all fields into consideration (although these are treated as an OR, not an AND).
    • Reversible results - You can now order your results in ascending or descending order.
    • Togglable search request logging - We understand that not everyone wants to completely disable info logging just to stop logging every search request so we've made it an optional flag to pass (--silent-search)

    What's Changed?

    It's not just new things being added! We've overhauled the existing designs as well!

    • Queries moved from GET -> POST - Queries are now done via POST requests so as to allow for the new combination query system.
    • Deletes now support multi-field and multi-value options - You can now delete by several fields and values at once rather than doing many individual requests.
    • Bulk insertion optimisations - Bulk documents are no longer handled one by one internally in channels which allows for mild performance improvements for large payloads.
    • Unit tests - Yes that's right, we actually have some now! And more on the way. This should hopefully help get us closer to release and make sure everything is running smoother.
    • Smaller binary size - Docker images and the like now come in at almost 3x smaller sizes (Only 3.4MB!)
    • Changeable stop words - You can now change which stop words are used on a per-index basis; otherwise a sensible multi-language set of defaults is used.
    • Fast fuzzy no longer uses pre-set dicts - This is a major change for our fast fuzzy system: using the document frequencies instead of pre-set defaults has reduced memory usage from a constant 1.3 - 2GB of memory to a couple of hundred MB on large indexes. (NOTE: this will increase the more documents are added with unique words, so it is a good idea to reload the index and re-upload docs every so often; this will lower resource usage and also improve relevancy.)
    • More in-depth permissions - Permissions have been moved to bit fields which allow for some more fine-grain control of access.

    What's Gone?

    Alas, not everything has stayed the same, with some of the framework changes some things were removed.

    • TLS support: ultimately it was decided that Lnx should be behind a reverse proxy / used internally anyway, so TLS does not serve many purposes.

    What's Changed

    • sync schema from the index rather than the loader by @ChillFish8 in https://github.com/lnx-search/lnx/pull/20
    • Custom executor handler by @ChillFish8 in https://github.com/lnx-search/lnx/pull/22
    • 0.6.0 engine by @ChillFish8 in https://github.com/lnx-search/lnx/pull/27

    Full Changelog: https://github.com/lnx-search/lnx/compare/0.5.0...0.6.0

  • 0.5.1(Aug 31, 2021)

  • 0.5.0(Aug 28, 2021)

    This update is a breaking change from 0.4.0 in the area of removing documents.

    What's new

    • Multi-value and single-value inserts are now supported, e.g. things like {"title": "foo"} now work.
    • Deleting documents now expects single-value entries, not multi-value. This doesn't change the behaviour, since only the first value was used in the old version, which led to some confusing behaviour.
    • Lax values are now supported, so things like date fields can be any of an i64 timestamp, a u64 timestamp or an RFC 3339 formatted string (see the sketch after this list).
    • If a field has an incompatible value according to the schema and it cannot be converted, an error is returned instead of the error only appearing in the logs while returning 200 OK.
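
    For illustration, both of the following documents should now be accepted for a date field (a hedged sketch; the field names follow the index examples in the issues above, and the exact set of accepted string formats beyond RFC 3339 is not covered here):

    [
        {"title": ["Hello, World"], "id": [1], "ts": [1630097583]},
        {"title": ["Hello, Again"], "id": [2], "ts": ["2021-08-27T12:00:00+00:00"]}
    ]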

    What's fixed

    • Date fields are now correctly handled on upload.
    • The writer-actor no longer panics if a value with an invalid type, different from the schema-defined type, is given in the document.
  • 0.4.1(Aug 27, 2021)

    This is a small fix that prevents the unexpected behaviour when fast-fuzzy is disabled on the server but attempted to be enabled on the index.

    Fixed

    • This fixes situations like #14.
  • 0.4.0(Aug 26, 2021)

    0.4 is out! This brings with it a massive set of performance improvements and relevancy options to tune to your liking 😄

    What's new

    • Fast-Fuzzy: a hyper-optimised mode for search-as-you-type experiences. This uses pre-computed spell correction for high-speed correction and improves performance by about 10x (both throughput and latency). This is an opt-in feature via --enable-fast-fuzzy on the server and then "use_fast_fuzzy": true on the index creation payload (see the sketch after this list).
    • Stop words: this was introduced to try to increase search relevancy; before, the system would match 17,000 results out of 20,000 just because you included words like the etc... Now, if the system detects more than 1 word, and providing they are not all stop words, any stop words will be removed from the query. (This can be toggled on a per-index basis using strip_stop_words, which defaults to false.)
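
    A hedged sketch of opting in: the server flag and the index payload field come from these notes and the issue examples above, while the port and the rest of the payload are placeholders:

    # start the server with fast-fuzzy enabled
    lnx -p 4040 --enable-fast-fuzzy

    # index creation payload opting in (other fields as in the index examples above)
    {
        "name": "my-index",
        "writer_buffer": 6000000,
        "writer_threads": 1,
        "reader_threads": 1,
        "max_concurrency": 10,
        "search_fields": ["title"],
        "storage_type": "memory",
        "use_fast_fuzzy": true,
        "strip_stop_words": true,
        "fields": {
            "title": {"type": "text", "stored": true}
        },
        "boost_fields": {}
    }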

    Breaking behaviour

    • Both fast-fuzzy and more-like-this queries now have much different performance characteristics, so some common results you might use for testing will now be invalid.
    • The system uses much higher memory when fast-fuzzy is enabled.

    Notes on relevancy

    • The fast fuzzy system is almost at the same level as the current default (Levenshtein distance) if not maybe a little better in places, especially in non-English languages.

    Details for nerds 🤓

    • We used the symspell algorithm along with pre-computed frequency dictionaries to do spell correction over Levenshtein distance which corrects entire sentences in the time it takes the traditional method to do one word.
    • The frequency dictionaries are made from traditional word dictionaries and the google n-gram corpus, merging these two gives us correctly spelt frequency dicts.
    • The jump in performance is roughly from 400 searches a second to 4000 searches a second (this was done on the small movies dataset, a larger dataset with around 2 million documents was also used which produced a similar growth in performance).
  • 0.4-alpha(Aug 25, 2021)

    This is the first experimental release of 0.4; it includes a couple of breaking changes and some new features.

    WARNING: This is an experimental build and should not be relied upon for production systems.

    What's new

    • Fast-Fuzzy: a hyper-optimised mode for search-as-you-type experiences. It uses pre-computed spell correction for high-speed correction, improving performance by roughly 10x (both throughput and latency). (This is an opt-in feature: pass --enable-fast-fuzzy to the server and set "use_fast_fuzzy": true in the index creation payload.)
    • Stop words: introduced to improve search relevancy. Previously the system could match 17,000 results out of 20,000 simply because the query included words like "the". Now, if the query contains more than one word and they are not all stop words, any stop words are removed from the query. (This is currently not configurable.)

    Breaking behaviour

    • Both fast-fuzzy and more-like-this queries now have very different performance and result characteristics, so reference results you may have used for testing will now be invalid.
    • The system uses considerably more memory when fast-fuzzy is enabled.

    Notes on relevancy

    • The fast-fuzzy system's relevancy is almost on par with the current default (Levenshtein distance), and in places a little better, especially in non-English languages.

    Details for nerds 🤓

    • We use the SymSpell algorithm with pre-computed frequency dictionaries instead of Levenshtein distance for spell correction; it corrects entire sentences in the time the traditional method takes to correct a single word.
    • The frequency dictionaries are built from traditional word dictionaries merged with the Google n-gram corpus, which gives us correctly spelt frequency dictionaries.
    • The jump in performance is roughly from 400 searches per second to 4,000 searches per second (measured on the small movies dataset; a larger dataset of around 2 million documents showed a similar improvement).
  • 0.3.0(Aug 20, 2021)

    What's changed

    • Getting a document directly has changed from an int-int format to a single u64 integer. This is returned as a string for compatibility with languages (such as JavaScript) where overflow may occur when parsing. The same change applies anywhere else you supply the document id (see the sketch after this list).
    • Searching via mode=more-like-this has changed from expecting a query parameter called ref_document to just document, and it now expects a document id.
    • Returned results now include a document_id instead of ref_address, to make its purpose easier to understand.
    • A specialised field name, _id, has been added; if you define this field in your schema, the system will ignore your definition and add its own.
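
    To make the id handling concrete, here is a hedged sketch: the document_id field, the document parameter, and mode=more-like-this come from the notes above, while the host, endpoint paths, and response shape are assumptions for illustration only.

      import requests

      BASE = "http://localhost:8000"   # placeholder host
      INDEX = "movies"                 # placeholder index name

      # `document_id` comes back as a string so JavaScript clients don't overflow
      # when parsing the underlying u64. The response shape here is assumed.
      results = requests.get(f"{BASE}/indexes/{INDEX}/search", params={"query": "hello"}).json()
      doc_id = results["hits"][0]["document_id"]   # e.g. "9223372036854775807" as a string

      # more-like-this now takes the id via a `document` parameter (was `ref_document`).
      requests.get(
          f"{BASE}/indexes/{INDEX}/search",
          params={"mode": "more-like-this", "document": doc_id},
      ).raise_for_status()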

    What's been fixed

    • Searching by document / more-like-this queries now work; previously this would panic if the searcher handling the request wasn't the original searcher that retrieved the document.
    • Getting a document directly no longer panics.
  • v0.2.0(Aug 18, 2021)

    The first release of lnx!

    This release includes:

    • Standard Queries, Fuzzy Queries, More-like-this queries.
    • Token-based authorization.
    • TLS support.
    • Multiple storage backend choices.
    • Order-by sorting.
  • v0.1.0(Aug 18, 2021)
