Quickwit - the next-gen search & analytics engine built for logs

Overview





Search more with less

The new way to manage your logs at any scale

Quickstart | Docs | Tutorials | Chat | Download


Disclaimer: you are reading the README for Quickwit 0.3, which will ship by the end of April 2022.

Quickwit is the next-gen search & analytics engine built for logs. It is a highly reliable & cost-efficient alternative to Elasticsearch.



💡 Features

  • Index data persisted on object storage
  • Ingest JSON documents with or without a strict schema
  • Elasticsearch-compatible Ingest & Aggregation API
  • Lightweight Embedded UI
  • Runs on a fraction of the resources: written in Rust, powered by the mighty tantivy
  • Works out of the box with sensible defaults
  • Optimized for multi-tenancy. Add and scale tenants with no overhead costs
  • Distributed search
  • Cloud-native: Kubernetes ready
  • Add and remove nodes in seconds
  • Decoupled compute & storage
  • Sleep like a log: all your indexed data is safely stored on object storage (AWS S3...)
  • Ingest your documents with exactly-once semantics
  • Kafka-native ingestion
  • Search stream API that notably unlocks full-text search in ClickHouse

🔮 Upcoming Features

  • Ingest your logs from your object storage
  • Distributed indexing
  • Support for tracing
  • Native support for OpenTelemetry

Uses & Limitations

When to use:

  • Your documents are immutable: application logs, system logs, access logs, user actions logs, audit trail, etc.
  • Your data has a time component. Quickwit includes optimizations and design choices specifically related to time.
  • You want a full-text search in a multi-tenant environment.
  • You want to index directly from Kafka.
  • You want to add full-text search to your ClickHouse cluster.
  • You ingest a tremendous amount of logs and don't want to pay huge bills.
  • You ingest a tremendous amount of data and you don't want to waste your precious time babysitting your cluster.

When not to use:

  • Your documents are mutable.
  • You need a low-latency search for e-commerce websites.
  • You provide a public-facing search with high QPS.
  • You want to re-score documents at query time.

Getting Started

Let's download and install Quickwit.

curl -L https://install.quickwit.io | sh

You can now move this executable directory wherever is sensible for your environment and, if you like, add it to your PATH environment variable. You can also install it via other means.

Take a look at our Quick Start to do amazing things, like Creating your first index or Adding some documents, or take a glance at our full Installation guide!

📚 Tutorials

💬 Community

🙋 FAQ

How is Quickwit different from traditional search engines like Elasticsearch or Solr?

The core difference and advantage of Quickwit is its architecture, which is built from the ground up for cloud and logs. Optimized IO paths make search on object storage sub-second. Thanks to truly decoupled compute and storage, search instances are stateless, so search nodes can be added or removed within seconds. Last but not least, we implemented highly reliable distributed search and exactly-once semantics during indexing so that all engineers can sleep at night.

How does Quickwit compare to Elastic in terms of cost?

We estimate that Quickwit can be up to 10x cheaper on average than Elastic. To understand how, check out our blog post about searching the web on AWS S3.

What license does Quickwit use?

Quickwit is open-source under the GNU Affero General Public License Version 3 - AGPLv3. Fundamentally, this means that you are free to use Quickwit for your project, as long as you don't modify Quickwit. If you do, you have to make the modifications public. We also provide a commercial license for enterprises to provide support and a voice on our roadmap.

What is Quickwit's business model?

Our business model relies on our commercial license. There is no plan to become SaaS in the near future.

🪄 Third-Party Integration


🤝 Contribute and spread the word

We are always super happy to have contributions: code, documentation, issues, feedback, or even saying hello on discord! Here is how you can get started:

And to thank you for your contributions, claim your swag by emailing us at hello at quickwit.io.

🔗 Reference

Comments
  • Bug in quickwit search stream `StorageDirectory only supports async reads`

    Copy-pasted from https://github.com/quickwit-oss/quickwit/discussions/1357#discussioncomment-2687107

    I am able to ingest data in Quickwit and search. However, when I search using the curl command, I get a read async error. What could go wrong here?

    heena@Clickhouse1:~/quickwit-v0.2.1$ ./quickwit index search --index hackernews_5 --query Ambulance
    2022-05-04T13:25:27.169Z ERROR quickwit_directories::storage_directory: path="6dc68fd1122c44a985ccf5348907c5f8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:25:27.171Z ERROR quickwit_directories::storage_directory: path="ccf34dbac4614904b1124b751756dab8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    {
      "numHits": 1,
      "hits": [
        {
          "by": [ "sgk284" ],
          "id": [ 2923885 ],
          "kids": [ 2923989, 2925247, 2924320, 2925442, 2924224, 2923994, 2924209, 2924702, 2925235, 2925010, 2924319, 2924638, 2925781, 2923943, 2924298 ],
          "score": [ 622 ],
          "text": [ "" ],
          "time": [ 1314251037 ],
          "title": [ "Icon Ambulance" ],
          "type": [ "story" ],
          "url": [ "https://plus.google.com/107117483540235115863/posts/gcSStkKxXTw" ]
        }
      ],
      "elapsedTimeMicros": 77324,
      "errors": [
        "SplitSearchError { error: \"Internal error:An IO error occurred: 'Unsupported operation. StorageDirectory only supports async reads: \"ccf34dbac4614904b1124b751756dab8.term\"'.\", split_id: \"01G26NHMCV1BAP61AS006H7A75\", retryable_error: true }",
        "SplitSearchError { error: \"Internal error:An IO error occurred: 'Unsupported operation. StorageDirectory only supports async reads: \"6dc68fd1122c44a985ccf5348907c5f8.term\"'.\", split_id: \"01G26NK8YX0DM4YSVH6J9YD1GN\", retryable_error: true }"
      ]
    }

    The output of the curl command searching for the same keyword:

    heena@Clickhouse1:~/quickwit-v0.2.1$ curl "http://0.0.0.0:7280/api/v1/hackernews_5/search/stream?query=Ambulance&outputFormat=csv&fastField=id"
    curl: (18) transfer closed with outstanding read data remaining
    heena@Clickhouse1:~/quickwit-v0.2.1$
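To reproduce the failing request from a script rather than by hand, the search stream URL can be assembled from its parts. This is a minimal sketch: the path and the `query`, `outputFormat` and `fastField` parameters are taken from the curl command in this report, while the function name `build_search_stream_url` is only illustrative and not part of Quickwit.

```python
from urllib.parse import urlencode


def build_search_stream_url(base_url: str, index_id: str, query: str,
                            fast_field: str, output_format: str = "csv") -> str:
    """Build a search stream URL shaped like the one used in the report.

    Hypothetical helper: only the URL layout comes from the issue text.
    """
    # urlencode escapes the query so spaces and special characters are safe.
    params = urlencode({
        "query": query,
        "outputFormat": output_format,
        "fastField": fast_field,
    })
    return f"{base_url}/api/v1/{index_id}/search/stream?{params}"


url = build_search_stream_url("http://0.0.0.0:7280", "hackernews_5", "Ambulance", "id")
print(url)
```

Building the URL this way keeps the three reported requests ("google", "google.com", "Ambulance") comparable, since only the query argument changes.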

    Attached the console logs when queried the commands ,This might be helpful

    2022-05-04T13:24:03.927Z ERROR search_adapter:leaf_search_stream:leaf_search_stream: quickwit_search::search_stream::leaf: Failed to send leaf search stream result. Stop sending. Cause: channel closed
    2022-05-04T13:24:13.927Z  INFO quickwit_serve::rest: search_stream index_id=hackernews_5 request=SearchStreamRequestQueryString { query: "google", search_fields: None, start_timestamp: None, end_timestamp: None, fast_field: "id", output_format: ClickHouseRowBinary, partition_by_field: None }
    2022-05-04T13:24:13.927Z  INFO search_adapter:leaf_search_stream: quickwit_search::service: leaf_search index="hackernews_5" splits=[SplitIdAndFooterOffsets { split_id: "01G26NHEB10T2DX37288EKX0SJ", split_footer_start: 270323695, split_footer_end: 278910648 }, SplitIdAndFooterOffsets { split_id: "01G26NHMCV1BAP61AS006H7A75", split_footer_start: 2678183120, split_footer_end: 2678792526 }, SplitIdAndFooterOffsets { split_id: "01G26NK8YX0DM4YSVH6J9YD1GN", split_footer_start: 349970236, split_footer_end: 350048435 }]
    2022-05-04T13:24:13.968Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NHMCV1BAP61AS006H7A75}:warmup: quickwit_directories::storage_directory: path="ccf34dbac4614904b1124b751756dab8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:24:13.969Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NK8YX0DM4YSVH6J9YD1GN}:warmup: quickwit_directories::storage_directory: path="6dc68fd1122c44a985ccf5348907c5f8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:24:13.970Z  INFO search_adapter:leaf_search_stream: quickwit_search::service: leaf_search index="hackernews_5" splits=[SplitIdAndFooterOffsets { split_id: "01G26NHEB10T2DX37288EKX0SJ", split_footer_start: 270323695, split_footer_end: 278910648 }, SplitIdAndFooterOffsets { split_id: "01G26NHMCV1BAP61AS006H7A75", split_footer_start: 2678183120, split_footer_end: 2678792526 }, SplitIdAndFooterOffsets { split_id: "01G26NK8YX0DM4YSVH6J9YD1GN", split_footer_start: 349970236, split_footer_end: 350048435 }]
    2022-05-04T13:24:13.972Z ERROR search_adapter:leaf_search_stream:leaf_search_stream: quickwit_search::search_stream::leaf: Failed to send leaf search stream result. Stop sending. Cause: channel closed
    2022-05-04T13:24:14.006Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NHMCV1BAP61AS006H7A75}:warmup: quickwit_directories::storage_directory: path="ccf34dbac4614904b1124b751756dab8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:24:14.006Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NK8YX0DM4YSVH6J9YD1GN}:warmup: quickwit_directories::storage_directory: path="6dc68fd1122c44a985ccf5348907c5f8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:24:14.007Z ERROR quickwit_serve::rest: Error when streaming search results. error=Internal error: `Internal error: `An IO error occurred: 'Unsupported operation. StorageDirectory only supports async reads: "ccf34dbac4614904b1124b751756dab8.term"'`.`.
    2022-05-04T13:24:14.009Z ERROR search_adapter:leaf_search_stream:leaf_search_stream: quickwit_search::search_stream::leaf: Failed to send leaf search stream result. Stop sending. Cause: channel closed
    2022-05-04T13:24:49.399Z  INFO quickwit_serve::rest: search_stream index_id=hackernews_5 request=SearchStreamRequestQueryString { query: "google.com", search_fields: None, start_timestamp: None, end_timestamp: None, fast_field: "id", output_format: Csv, partition_by_field: None }
    2022-05-04T13:24:49.400Z  INFO search_adapter:leaf_search_stream: quickwit_search::service: leaf_search index="hackernews_5" splits=[SplitIdAndFooterOffsets { split_id: "01G26NHEB10T2DX37288EKX0SJ", split_footer_start: 270323695, split_footer_end: 278910648 }, SplitIdAndFooterOffsets { split_id: "01G26NHMCV1BAP61AS006H7A75", split_footer_start: 2678183120, split_footer_end: 2678792526 }, SplitIdAndFooterOffsets { split_id: "01G26NK8YX0DM4YSVH6J9YD1GN", split_footer_start: 349970236, split_footer_end: 350048435 }]
    2022-05-04T13:24:49.442Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NHMCV1BAP61AS006H7A75}:warmup: quickwit_directories::storage_directory: path="ccf34dbac4614904b1124b751756dab8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:24:49.442Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NK8YX0DM4YSVH6J9YD1GN}:warmup: quickwit_directories::storage_directory: path="6dc68fd1122c44a985ccf5348907c5f8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:24:49.443Z  INFO search_adapter:leaf_search_stream: quickwit_search::service: leaf_search index="hackernews_5" splits=[SplitIdAndFooterOffsets { split_id: "01G26NHEB10T2DX37288EKX0SJ", split_footer_start: 270323695, split_footer_end: 278910648 }, SplitIdAndFooterOffsets { split_id: "01G26NHMCV1BAP61AS006H7A75", split_footer_start: 2678183120, split_footer_end: 2678792526 }, SplitIdAndFooterOffsets { split_id: "01G26NK8YX0DM4YSVH6J9YD1GN", split_footer_start: 349970236, split_footer_end: 350048435 }]
    2022-05-04T13:24:49.452Z ERROR search_adapter:leaf_search_stream:leaf_search_stream: quickwit_search::search_stream::leaf: Failed to send leaf search stream result. Stop sending. Cause: channel closed
    2022-05-04T13:24:49.494Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NHMCV1BAP61AS006H7A75}:warmup: quickwit_directories::storage_directory: path="ccf34dbac4614904b1124b751756dab8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:24:49.495Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NK8YX0DM4YSVH6J9YD1GN}:warmup: quickwit_directories::storage_directory: path="6dc68fd1122c44a985ccf5348907c5f8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:24:49.496Z ERROR quickwit_serve::rest: Error when streaming search results. error=Internal error: `Internal error: `An IO error occurred: 'Unsupported operation. StorageDirectory only supports async reads: "ccf34dbac4614904b1124b751756dab8.term"'`.`.
    2022-05-04T13:24:49.503Z ERROR search_adapter:leaf_search_stream:leaf_search_stream: quickwit_search::search_stream::leaf: Failed to send leaf search stream result. Stop sending. Cause: channel closed
    2022-05-04T13:26:29.659Z  INFO quickwit_serve::rest: search_stream index_id=hackernews_5 request=SearchStreamRequestQueryString { query: "Ambulance", search_fields: None, start_timestamp: None, end_timestamp: None, fast_field: "id", output_format: Csv, partition_by_field: None }
    2022-05-04T13:26:29.661Z  INFO search_adapter:leaf_search_stream: quickwit_search::service: leaf_search index="hackernews_5" splits=[SplitIdAndFooterOffsets { split_id: "01G26NHEB10T2DX37288EKX0SJ", split_footer_start: 270323695, split_footer_end: 278910648 }, SplitIdAndFooterOffsets { split_id: "01G26NHMCV1BAP61AS006H7A75", split_footer_start: 2678183120, split_footer_end: 2678792526 }, SplitIdAndFooterOffsets { split_id: "01G26NK8YX0DM4YSVH6J9YD1GN", split_footer_start: 349970236, split_footer_end: 350048435 }]
    2022-05-04T13:26:29.705Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NHMCV1BAP61AS006H7A75}:warmup: quickwit_directories::storage_directory: path="ccf34dbac4614904b1124b751756dab8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:26:29.706Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NK8YX0DM4YSVH6J9YD1GN}:warmup: quickwit_directories::storage_directory: path="6dc68fd1122c44a985ccf5348907c5f8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:26:29.707Z  INFO search_adapter:leaf_search_stream: quickwit_search::service: leaf_search index="hackernews_5" splits=[SplitIdAndFooterOffsets { split_id: "01G26NHEB10T2DX37288EKX0SJ", split_footer_start: 270323695, split_footer_end: 278910648 }, SplitIdAndFooterOffsets { split_id: "01G26NHMCV1BAP61AS006H7A75", split_footer_start: 2678183120, split_footer_end: 2678792526 }, SplitIdAndFooterOffsets { split_id: "01G26NK8YX0DM4YSVH6J9YD1GN", split_footer_start: 349970236, split_footer_end: 350048435 }]
    2022-05-04T13:26:29.713Z ERROR search_adapter:leaf_search_stream:leaf_search_stream: quickwit_search::search_stream::leaf: Failed to send leaf search stream result. Stop sending. Cause: channel closed
    2022-05-04T13:26:29.756Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NHMCV1BAP61AS006H7A75}:warmup: quickwit_directories::storage_directory: path="ccf34dbac4614904b1124b751756dab8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:26:29.757Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NK8YX0DM4YSVH6J9YD1GN}:warmup: quickwit_directories::storage_directory: path="6dc68fd1122c44a985ccf5348907c5f8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:26:29.757Z ERROR quickwit_serve::rest: Error when streaming search results. error=Internal error: `Internal error: `An IO error occurred: 'Unsupported operation. StorageDirectory only supports async reads: "ccf34dbac4614904b1124b751756dab8.term"'`.`.
    2022-05-04T13:26:29.761Z ERROR search_adapter:leaf_search_stream:leaf_search_stream: quickwit_search::search_stream::leaf: Failed to send leaf search stream result. Stop sending. Cause: channel closed
    
    bug 
    opened by fulmicoton 32
  • OOMs after repeated queries on larger amounts of data.

    Describe the bug

    We've now loaded Quickwit with 16.4 Billion records and have started to trigger some out of memory (OOM) failures. In addition we've also seen clustering issues so to isolate the OOMs we scaled down to a single search node.

    The test query matches 67 million records, but there is no sorting, timestamps or anything complicated in the query: just a single criterion (i.e. field:value) and max_hits=1.

    On a single searcher node this query will run successfully in about 38 seconds on the first run, 25 seconds on the second run and then consistently OOM on the third. Queries are not concurrent and no other queries are submitted between subsequent runs.

    In this case it's the kernel killing Quickwit since it's exceeding the memory limit allocated. The searcher is running in Kubernetes with 32 GB of RAM allocated.

    Configuration:

    This index currently has 1,340 splits, with 10M doc target per split.

    # quickwit --version
    Quickwit 0.3.0 (commit-hash: 6d07599)
    

    Memory and cache settings are the defaults.

    searcher:
      fast_field_cache_capacity: 10G
      split_footer_cache_capacity:  1G
      max_num_concurrent_split_streams: 100
    
    bug 
    opened by kstaken 27
  • Support Google cloud storage.

    We already support specifying a non-AWS endpoint. In theory everything should work just fine, but let's check that by indexing a few splits and deleting an index.

    bug enhancement 
    opened by fulmicoton 22
  • Update tutorial following change in Vector. ndjson => json + framing.method := "newline_delimited"

    When I send logs from Vector to Quickwit, I get this error:

    2022-09-27T05:47:01.785Z WARN {actor=quickwit_indexing::actors::indexing_service::IndexingService}:{msg_id=1}::{index=customer3 gen=0}:{actor=quickwit_indexing::actors::doc_processor::DocProcessor}:{msg_id=4}: quickwit_indexing::actors::doc_processor: err=NotJsonObject("[{"id":4152728738612")

    This is my Vector output on the console:

    {"id":415272873861226802,"wechat_name":"清醒"}

    My Vector sink config:

    [sinks.quick]
    type = "http"
    inputs = ["modify_t_customer"]
    encoding.codec = "json"
    uri = "http://127.0.0.1:7280/api/v1/customer3/ingest"

    When I changed my Vector config to:

    [sinks.quick]
    type = "http"
    inputs = ["modify_t_customer"]
    encoding.codec = "native_json"
    uri = "http://127.0.0.1:7280/api/v1/customer3/ingest"

    I got an error like this:

    2022-09-27T05:49:43.074Z WARN {actor=quickwit_indexing::actors::indexing_service::IndexingService}:{msg_id=1}::{index=customer3 gen=0}:{actor=quickwit_indexing::actors::doc_processor::DocProcessor}:{msg_id=169}: quickwit_indexing::actors::doc_processor: err=RequiredFastField("id")

    It looks like a problem with my Vector sink config.

    However, the Vector http sink only accepts one of: avro, gelf, json, logfmt, native, native_json, raw_message, text.

    ndjson is not in that list, yet the tutorial at https://quickwit.io/docs/tutorials/send-logs-from-vector-to-quickwit uses it:

    [sinks.quickwit_logs]
    type = "http"
    inputs = ["remap_syslog"]
    encoding.codec = "ndjson"
    uri = "http://host.docker.internal:7280/api/v1/otel-logs/ingest"

    What can I do?
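The `NotJsonObject("[{...")` error suggests that the request body arrived as one JSON array, whereas the issue title points to newline-delimited framing as the fix. The sketch below only illustrates the difference between the two body shapes, assuming the ingest endpoint expects one JSON object per line; `to_ndjson` is a hypothetical helper, not part of Vector or Quickwit.

```python
import json


def to_ndjson(events):
    """Serialize each event as one JSON object per line (hypothetical helper)."""
    return "".join(
        json.dumps(event, ensure_ascii=False, separators=(",", ":")) + "\n"
        for event in events
    )


events = [{"id": 415272873861226802, "wechat_name": "清醒"}]

# A batch encoded as a single JSON array starts with "[{", like the rejected payload.
array_body = json.dumps(events, ensure_ascii=False, separators=(",", ":"))

# Newline-delimited framing: every line is a standalone JSON object.
ndjson_body = to_ndjson(events)

print(array_body)
print(ndjson_body, end="")
```

In Vector terms, and going by the issue title, this corresponds to keeping `encoding.codec = "json"` and adding `framing.method = "newline_delimited"` to the sink.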

    bug tutorial 
    opened by yangshike 20
  • janitor supports incremental execution

    During our usage we found that postgres has a lot of Lock state transactions. This caused the pipeline to restart due to fetching connection timeouts. When I stopped the janitor execution, I noticed that there were a lot less transactions waiting. This might have something to do with the fact that janitor is getting a lot of splits, so maybe it could be executed in batches, with a fixed number of splits at a time.
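The batching idea proposed above can be sketched as follows. This is an illustration only, not the janitor's code: `in_batches` and the split identifiers are hypothetical, and the point is simply that each batch could be processed in its own short transaction instead of one long one.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical split ids; a real janitor would read them from the metastore.
split_ids = [f"split-{i}" for i in range(7)]
batches = list(in_batches(split_ids, 3))
for batch in batches:
    print(batch)
```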

    image enhancement 
    opened by guidao 19
  • Lost splits

    Describe the bug: I ran the same query twice. The first time it returned results; a few minutes later the same query failed.

    Expected behavior: Same query, same result.

    Configuration:

    1. quickwit --version: 0.3.1
    2. The index_config.yaml:

    ---
    version: 0 # File format version.

    index_id: traceback

    doc_mapping:
      field_mappings:
        - name: id
          type: u64
          fast: true
        - name: raw_content
          type: text
          tokenizer: default
          record: position

    search_settings:
      default_search_fields: [raw_content]

    sources:
      - source_id: source-kafka
        source_type: kafka
        params:
          topic: UserAction
          client_params:
            bootstrap.servers: $(KAFKA)
            group.id: FullText
            security.protocol: PLAINTEXT

    bug 
    opened by yangjinming1062 19
  • Indexer consumption Kafka error

    Describe the bug: I started 10 indexer nodes to consume Kafka data, and they reported the following error:

    2022-05-27T06:28:14.854Z  INFO {actor=quickwit_indexing::actors::indexing_server::IndexingServer}:{msg_id=1}::{msg_id=418922}:{index=clickhouse gen=319}:{actor=Packager}: quickwit_actors::sync_actor: actor-exit actor_id=Packager-nameless-H4KW exit_status=Killed
    2022-05-27T06:28:15.217Z ERROR {actor=quickwit_indexing::actors::indexing_server::IndexingServer}:{msg_id=1}::{msg_id=419295}: quickwit_actors::actor_handle: actor-exit-without-success actor="KafkaSource-long-9JuJ" exit_status=DownstreamClosed
    2022-05-27T06:28:15.217Z ERROR {actor=quickwit_indexing::actors::indexing_server::IndexingServer}:{msg_id=1}::{msg_id=419295}: quickwit_actors::actor_handle: actor-exit-without-success actor="Indexer-red-PtaJ" exit_status=Failure(Failed to add document.
    
    Caused by:
        An error occurred in a thread: 'An index writer was killed.. A worker thread encounterred an error (io::Error most likely) or panicked.')
    2022-05-27T06:28:15.217Z ERROR {actor=quickwit_indexing::actors::indexing_server::IndexingServer}:{msg_id=1}::{msg_id=419295}: quickwit_actors::actor_handle: actor-exit-without-success actor="Packager-nameless-H4KW" exit_status=Killed
    2022-05-27T06:28:15.217Z ERROR {actor=quickwit_indexing::actors::indexing_server::IndexingServer}:{msg_id=1}::{msg_id=419295}: quickwit_actors::actor_handle: actor-exit-without-success actor="Uploader-blue-HXWt" exit_status=Killed
    2022-05-27T06:28:15.217Z ERROR {actor=quickwit_indexing::actors::indexing_server::IndexingServer}:{msg_id=1}::{msg_id=419295}: quickwit_actors::actor_handle: actor-exit-without-success actor="Publisher-icy-tc56" exit_status=Failure(Failed to publish splits.
    
    Caused by:
        0: Publish checkpoint delta overlaps with the current checkpoint: IncompatibleCheckpointDelta { partition_id: PartitionId("0000000000"), current_position: Offset("00000000025977904298"), delta_position_from: Offset("00000000025977865375") }.
        1: IncompatibleChkptDelta at partition: PartitionId("0000000000") cur_pos:Offset("00000000025977904298") delta_pos:Offset("00000000025977865375"))
    2022-05-27T06:28:15.217Z ERROR {actor=quickwit_indexing::actors::indexing_server::IndexingServer}:{msg_id=1}::{msg_id=419295}: quickwit_actors::actor_handle: actor-exit-without-success actor="GarbageCollector-snowy-TxZ8" exit_status=Killed
    2022-05-27T06:28:15.217Z ERROR {actor=quickwit_indexing::actors::indexing_server::IndexingServer}:{msg_id=1}::{msg_id=419295}: quickwit_actors::actor_handle: actor-exit-without-success actor="MergeSplitDownloader-weathered-FyOq" exit_status=Killed
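The "Publish checkpoint delta overlaps with the current checkpoint" failure in the log above can be read as a simple position comparison. The sketch below is a simplified model written for illustration, not Quickwit's implementation: `apply_delta` and the exception class are hypothetical, and the end offset of the delta is an invented value because the log does not show it.

```python
class IncompatibleCheckpointDelta(Exception):
    """Raised when a delta starts before the current checkpoint position."""


def apply_delta(current_position: int, delta_from: int, delta_to: int) -> int:
    """Return the new checkpoint position, rejecting overlapping deltas.

    Simplified model: a delta that starts after the current position (a gap)
    is accepted here, which may differ from what Quickwit actually does.
    """
    if delta_from < current_position:
        overlap = current_position - delta_from
        raise IncompatibleCheckpointDelta(
            f"delta starts {overlap} offsets before the current checkpoint"
        )
    return delta_to


# Positions taken from the log for partition 0000000000.
current = 25977904298
delta_from = 25977865375
hypothetical_delta_to = delta_from + 50_000  # not in the log, for illustration only

try:
    apply_delta(current, delta_from, hypothetical_delta_to)
except IncompatibleCheckpointDelta as err:
    print(err)
```

With the logged values the delta begins 38923 offsets behind the checkpoint, so publishing it would index those offsets a second time; rejecting it is consistent with the exactly-once guarantee described in the README.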
    

    Configuration:

    1. quickwit 0.2.1
    2. quickwit.yaml
    version: 0
    node_id: $POD_NAME
    listen_address: 0.0.0.0
    rest_listen_port: 7280
    #peer_seeds:
    #  -
    #  -
    #data_dir: /data/quickwit
    metastore_uri: postgres://quickwit:[email protected]:5432/quickwit
    default_index_root_uri: s3://quickwit/indexes/
    
    3. index.json
    version: 0
    
    index_id: clickhouse
    
    index_uri: s3://quickwit/indexes/clickhouse
    
    doc_mapping:
      field_mappings:
        - name: id
          type: u64
          fast: true
        - name: created_at
          type: i64
          fast: true
        - name: _log_
          type: text
          tokenizer: default
          record: position
    
    indexing_settings:
      timestamp_field: created_at
    
    search_settings:
      default_search_fields: [_log_]
    
    sources:
      - source_id: quickwit
        source_type: kafka
        params:
          topic: production
          client_params:
            group.id: quickwit
            bootstrap.servers: 192.168.100.1:9092,192.168.100.2:9092,192.168.100.3:9082
    
    bug 
    opened by gnufree 19
  • Offer a way to select a subset of kafka partition in a kafka source

    The objective would be to allow having a larger indexing throughput by running K indexing pipelines for a single index.

    The selector could be k % N, or a list of partitions maybe?
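A modulo selector of the kind suggested above could look like the sketch below. It is an assumption about one possible design, not an existing Quickwit option: `partitions_for_pipeline` is a hypothetical name, and pipeline k of N simply takes every partition whose id is congruent to k modulo N.

```python
from typing import List, Sequence


def partitions_for_pipeline(partition_ids: Sequence[int],
                            pipeline_ordinal: int,
                            num_pipelines: int) -> List[int]:
    """Return the partitions assigned to one pipeline under a k % N rule."""
    if not 0 <= pipeline_ordinal < num_pipelines:
        raise ValueError("pipeline_ordinal must be in [0, num_pipelines)")
    return [p for p in partition_ids if p % num_pipelines == pipeline_ordinal]


partitions = list(range(8))  # hypothetical topic with 8 partitions
assignment = [partitions_for_pipeline(partitions, k, 3) for k in range(3)]
for k, assigned in enumerate(assignment):
    print(k, assigned)
```

Each partition lands in exactly one pipeline, so the K pipelines never consume the same offsets.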

    enhancement 
    opened by fulmicoton 17
  • add a chinese tokenizer

    Description

    This adds a simple tokenizer for CJK. Before, something like "你好世界" (hello world) would be a single token because it contains no whitespace. This means searching for "你好" would yield no result.

    A more intelligent tokenizer would probably split this into two tokens (hello, world). This tokenizer simply splits at each character, creating 4 tokens. This is much faster at indexing, but requires using a phrase query to match a word written as two or more characters.
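The per-character behaviour described above, and why a phrase query is then needed, can be sketched in a few lines. This is a simplified model and not the tokenizer added by the PR: it treats only the CJK Unified Ideographs block (U+4E00 to U+9FFF) as CJK, and `tokenize` and `matches_phrase` are hypothetical names.

```python
from typing import List


def is_cjk(ch: str) -> bool:
    # Simplifying assumption: only the CJK Unified Ideographs block counts.
    return "\u4e00" <= ch <= "\u9fff"


def tokenize(text: str) -> List[str]:
    """Emit one token per CJK character and one token per other alphanumeric run."""
    tokens: List[str] = []
    word: List[str] = []
    for ch in text:
        if is_cjk(ch):
            if word:
                tokens.append("".join(word))
                word = []
            tokens.append(ch)
        elif ch.isalnum():
            word.append(ch.lower())
        elif word:
            tokens.append("".join(word))
            word = []
    if word:
        tokens.append("".join(word))
    return tokens


def matches_phrase(doc_tokens: List[str], phrase_tokens: List[str]) -> bool:
    """True if phrase_tokens occur consecutively and in order in doc_tokens."""
    n = len(phrase_tokens)
    return n > 0 and any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


doc = tokenize("毛藝 is a gymnast")
print(tokenize("你好世界"))
print(matches_phrase(doc, tokenize("毛藝")), matches_phrase(doc, tokenize("藝毛")))
```

Because each character is its own token, a two-character name only matches when both tokens appear adjacent and in order, which is what the phrase query enforces.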

    fix #1979

    How was this PR tested?

    Some tests added for the tokenizer, and a manual test by indexing the wiki-articles-10000 dataset, using the new tokenizer for the body field and searching for "毛藝" (name of a Chinese gymnast), "毛" (first half), "藝" (2nd half) and "藝毛" (wrong order):

    • "毛藝": yields a doc both before and after this PR
    • "毛": yields a doc only after
    • "藝": yields a doc only after
    • "藝毛": yields nothing
    opened by trinity-1686a 16
  • Add CSV/RowBinary output format to Search API

    Is your feature request related to a problem? Please describe.

    We want Quickwit to be easily integrated into row-based engines like SQL databases.

    Describe the solution you'd like

    Exposing a CSV and a row binary format that a user can choose with a query param format would be sufficient.

    CSV format: https://datatracker.ietf.org/doc/html/rfc4180
    RowBinary format: to define.
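For the CSV half of this request, a single-column output could be produced as sketched below. This is only an illustration of RFC 4180 style output using Python's standard `csv` module; `render_csv` is a hypothetical name, the sample values are document ids borrowed from another report on this page, and the RowBinary side is left out because the request itself says it is still to be defined.

```python
import csv
import io
from typing import Iterable


def render_csv(values: Iterable[int]) -> str:
    """Render one value per record, using CRLF line endings as in RFC 4180."""
    buffer = io.StringIO()
    # The default "excel" dialect terminates records with "\r\n".
    writer = csv.writer(buffer)
    for value in values:
        writer.writerow([value])
    return buffer.getvalue()


body = render_csv([2923885, 2923989])
print(repr(body))
```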

    enhancement 
    opened by fmassot 15
  • Exact match doesn't seem to work

    Describe the bug

    The exact search doesn't seem to be working.

    Steps to reproduce (if applicable) Steps to reproduce the behavior:

     ▲ quickwit index search --index-id wikipedia --metastore-uri file://$(pwd)/wikipedia --query 'title:apollo AND 11' | jq '.hits[].title[]'
    "Apollo"
    "Apollo 11"
    "Apollo 8"
    "Apollo program"
    "Apollo 13"
    "Apollo 7"
    "Apollo 9"
    "Apollo 1"
    "Apollo 10"
    "Apollo 12"
    "Apollo 14"
    "Apollo 15"
    "Apollo 16"
    "Apollo 17"
    "List of Apollo astronauts"
    "Apollo, Pennsylvania"
    "Apollo 13 (film)"
    "Apollo Lunar Module"
    "Apollo Guidance Computer"
    "Apollo 4"
    

    Okay, so it seems we've found what we're looking for as the second result. However, since the article is literally named Apollo 11, we should be able to perform what (according to Quickwit's documentation) seems to be an exact search:

    ▲ quickwit index search --index-id wikipedia --metastore-uri file://$(pwd)/wikipedia --query 'title:"Apollo 11"' | jq '.hits[].title[]'
    

    Expected behavior

    The "Apollo 11" result should be showing up.

    System configuration:

    60f897c0f49b4a920948b2bb98ca081f5557ed22 built from source on Linux, rustc 1.56.1

    Additional context

    bug 
    opened by mrusme 14
  • Reduce UI bundle size.

    When building the UI, we get the following logs:

    #24 395.0 The bundle size is significantly larger than recommended.
    #24 395.0 Consider reducing it with code splitting: https://goo.gl/9VhYWB
    #24 395.0 You can also analyze the project dependencies: https://goo.gl/LeUzfb
    
    enhancement low-priority 
    opened by fmassot 0
  • Build macos binaries and docker amd64 + arm64 images

    Fix #1928.

    • to bypass the cross-compilation issue on macOS, I added the feature release-macos-feature-vendored-set for macOS builds. It deactivates the libsasl support, which is fine for macOS binaries.
    • I added an arm64 build for docker images.

    Nightly builds succeed: https://github.com/quickwit-oss/quickwit/actions/runs/3866977651/jobs/6591462839
    Docker images (failing due to a network issue but should work): https://github.com/quickwit-oss/quickwit/actions/runs/3867131208/jobs/6591714909

    opened by fmassot 0
  • Integrate pull request preview environments

    Is your feature request related to a problem? Please describe.

    I would like to support Quickwit by implementing Uffizzi preview environments. Disclaimer: I work on Uffizzi.

    Uffizzi is an open-source full-stack previews engine, and our platform is available completely free for Quickwit (and all open source projects). This will provide maintainers with preview environments of every PR in the cloud, which enables faster iterations and reduces time to merge. You can see the open source repos which are currently using Uffizzi over here

    Uffizzi is purpose-built for the task of previewing PRs and it integrates with your workflow to deploy preview environments in the background without any manual steps for maintainers or contributors.

    I can go ahead and create an Initial PoC for you right away if you think there is value in this proposal.

    • [ ] Initial PoC
    enhancement 
    opened by waveywaves 0
  • Large actor scheduler refactoring

    The scheduler is no longer an actor in the sense of the actor framework.

    It removes the necessity to create some fake scheduler mailbox to spawn the scheduler itself.

    The scheduler also now has an improved logic to simulate time shift.

    It only jumps forward when no actor has any work to do. Provided all of the processing is done in actors, the results should be rigorously the same as if someone used time::sleep... Only faster.
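The "jump forward only when idle" rule can be illustrated with a toy simulated clock. This sketch is not the scheduler from the PR: `SimulatedClock` and `call_later` are hypothetical names, and "work" is modelled as callbacks that run to completion before time is allowed to advance to the next deadline.

```python
import heapq


class SimulatedClock:
    """Toy scheduler whose time jumps straight to the next pending deadline."""

    def __init__(self):
        self.now = 0.0
        self._timers = []  # heap of (deadline, sequence number, callback)
        self._seq = 0      # tie-breaker so callbacks are never compared

    def call_later(self, delay, callback):
        heapq.heappush(self._timers, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        # The previous callback has fully returned (no work left), so it is
        # safe to jump to the earliest deadline instead of sleeping.
        while self._timers:
            deadline, _, callback = heapq.heappop(self._timers)
            self.now = deadline
            callback()


clock = SimulatedClock()
events = []
clock.call_later(3600.0, lambda: events.append(("hourly", clock.now)))
clock.call_later(1.0, lambda: events.append(("tick", clock.now)))
clock.run()
print(events)
```

The callbacks observe the same timestamps they would see with real sleeps, but the run completes immediately.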

    opened by fulmicoton 1
  • Fix duplicate fields in editor auto-completion

    Ensure an index is registered in the query editor component (Monaco editor) only once.

    Manually reproduced and tested. Closes https://github.com/quickwit-oss/quickwit/issues/2615

    opened by evanxg852000 0
Releases(v0.4.0)
Owner
Quickwit OSS
Quickwit OSS Project
Firecracker takes your HTTP logs and uses them to map your API flows and to detect anomalies in them.

Who is BLST and what do we do? BLST (Business Logic Security Testing) is a startup company that's developing an automatic penetration tester, replacin

BLST 692 Jan 2, 2023
A Rust library for creating and managing logs of arbitrary binary data

A Rust library for creating and managing logs of arbitrary binary data. Presently it's used to collect sensor data, but it should generally be helpful in cases where you need to store time-series data in a nearly (but not strictly) append-only fashion.

Yusuf Simonson 1 May 9, 2022
A cool log library built using rust-lang

RustLog A cool log library built using rust-lang Installation: Cargo.toml rustlog = { git = "https://github.com/krishpranav/rustlog" } log = "0.4.17"

Krisna Pranav 2 Jul 21, 2022
Quickwit is a big data search engine.

Quickwit This repository will host Quickwit, the big data search engine developed by Quickwit Inc. We will progressively polish and opensource our cod

Quickwit Inc. 2.9k Jan 7, 2023
The true next-gen L7 minecraft proxy and load balancer. Built in Rust.

Lure The true next-gen L7 minecraft proxy and load balancer. Built in Rust, Tokio and Valence. Why? Rust is a powerful programming language and a grea

Sammwy 67 Apr 16, 2023
Easy c̵̰͠r̵̛̠ö̴̪s̶̩̒s̵̭̀-t̶̲͝h̶̯̚r̵̺͐e̷̖̽ḁ̴̍d̶̖̔ ȓ̵͙ė̶͎ḟ̴͙e̸̖͛r̶̖͗ë̶̱́ṉ̵̒ĉ̷̥e̷͚̍ s̷̹͌h̷̲̉a̵̭͋r̷̫̊ḭ̵̊n̷̬͂g̵̦̃ f̶̻̊ơ̵̜ṟ̸̈́ R̵̞̋ù̵̺s̷̖̅ţ̸͗!̸̼͋

Rust S̵̓i̸̓n̵̉ I̴n̴f̶e̸r̵n̷a̴l mutability! Howdy, friendly Rust developer! Ever had a value get m̵̯̅ð̶͊v̴̮̾ê̴̼͘d away right under your nose just when

null 294 Dec 23, 2022
The next gen ls command

LSD (LSDeluxe) Table of Contents Description Screenshot Installation Configuration External Configurations Required Optional F.A.Q. Contributors Credi

Pierre Peltier 9k Jan 2, 2023
LSD (LSDeluxe) - The next gen ls command

LSD (LSDeluxe) Table of Contents Description Screenshot Installation Configuration External Configurations Required Optional F.A.Q. Contributors Credi

Pierre Peltier 8.9k Jan 1, 2023
Next-GEN Configuration Template Generation Language

Sap lang yet another configuration oriented language name comes from Sapphire which is the birthstone of september Language Feature the last expr of t

LemonHX 12 Aug 8, 2022
Next-GEN Configuration Template Generation Language

Sap lang yet another configuration oriented language name comes from Sapphire which is the birthstone of september Language Feature the last expr of t

Sap-Lang 12 Aug 8, 2022
A formal, politely verbose programming language for building next-gen reliable applications

vfpl Pronounced "Veepl", the f is silent A politely verbose programming language for building next-gen reliable applications Syntax please initialize

VFPL 4 Jun 27, 2022
xrd a next-gen server controller for TrackMania Forever and Nations ESWC

xrd is a next-gen server controller for TrackMania Forever and Nations ESWC that is designed to be hassle-free and easily updatable (with a bus factor of 0).

Autumn Leaf 6 Mar 26, 2022
SWC Transform to prefix logs. Useful for adding file and line number to logs

SWC Transform to prefix logs. Useful for adding file and line number to logs

William Tetlow 12 Jan 1, 2023
A Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture, built to make the Data Cloud easy

A Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture, built to make the Data Cloud easy

Datafuse Labs 5k Jan 9, 2023
Rapidly Search and Hunt through Windows Event Logs

Rapidly Search and Hunt through Windows Event Logs Chainsaw provides a powerful ‘first-response’ capability to quickly identify threats within Windows

F-Secure Countercept 1.8k Dec 31, 2022
Shogun search - Learning the principle of search engine. This is the first time I've written Rust.

shogun_search Learning the principle of search engine. This is the first time I've written Rust. A search engine written in Rust. Current Features: Bu

Yuxiang Liu 5 Mar 9, 2022
The cauet burger gen is a very powerful tool capable of generating cauet burgers ⚠ you may become obese if you use it too much; it is capable of wiping NASA off the map

Cauet-burger-generator The cauet burger gen is a very powerful tool capable of generating cauet burgers ⚠ you may become obese if you use it

Pinokaille 1 Apr 23, 2022
High-performance runtime for data analytics applications

Weld Documentation Weld is a language and runtime for improving the performance of data-intensive applications. It optimizes across libraries and func

Weld 2.9k Dec 28, 2022