Quickwit is a big data search engine.

Overview


Quickwit

This repository will host Quickwit, the big data search engine developed by Quickwit Inc. We will progressively polish and open-source our code over the coming months.

Stay tuned.

Comments
  • Bug in quickwit search stream `StorageDirectory only supports async reads`


    Copy-pasted from https://github.com/quickwit-oss/quickwit/discussions/1357#discussioncomment-2687107. I am able to ingest data into Quickwit and search it. However, when I search using a curl command, I get an async read error. What could be going wrong here?

    heena@Clickhouse1:~/quickwit-v0.2.1$ ./quickwit index search --index hackernews_5 --query Ambulance
    2022-05-04T13:25:27.169Z ERROR quickwit_directories::storage_directory: path="6dc68fd1122c44a985ccf5348907c5f8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:25:27.171Z ERROR quickwit_directories::storage_directory: path="ccf34dbac4614904b1124b751756dab8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    {
      "numHits": 1,
      "hits": [
        {
          "by": ["sgk284"],
          "id": [2923885],
          "kids": [2923989, 2925247, 2924320, 2925442, 2924224, 2923994, 2924209, 2924702, 2925235, 2925010, 2924319, 2924638, 2925781, 2923943, 2924298],
          "score": [622],
          "text": [""],
          "time": [1314251037],
          "title": ["Icon Ambulance"],
          "type": ["story"],
          "url": ["https://plus.google.com/107117483540235115863/posts/gcSStkKxXTw"]
        }
      ],
      "elapsedTimeMicros": 77324,
      "errors": [
        "SplitSearchError { error: \"Internal error:An IO error occurred: 'Unsupported operation. StorageDirectory only supports async reads: \"ccf34dbac4614904b1124b751756dab8.term\"'.\", split_id: \"01G26NHMCV1BAP61AS006H7A75\", retryable_error: true }",
        "SplitSearchError { error: \"Internal error:An IO error occurred: 'Unsupported operation. StorageDirectory only supports async reads: \"6dc68fd1122c44a985ccf5348907c5f8.term\"'.\", split_id: \"01G26NK8YX0DM4YSVH6J9YD1GN\", retryable_error: true }"
      ]
    }

    The output with the curl command searching for the same keyword:

    heena@Clickhouse1:~/quickwit-v0.2.1$ curl "http://0.0.0.0:7280/api/v1/hackernews_5/search/stream?query=Ambulance&outputFormat=csv&fastField=id"
    curl: (18) transfer closed with outstanding read data remaining

    Attached are the console logs from running these commands; this might be helpful.

    2022-05-04T13:24:03.927Z ERROR search_adapter:leaf_search_stream:leaf_search_stream: quickwit_search::search_stream::leaf: Failed to send leaf search stream result. Stop sending. Cause: channel closed
    2022-05-04T13:24:13.927Z  INFO quickwit_serve::rest: search_stream index_id=hackernews_5 request=SearchStreamRequestQueryString { query: "google", search_fields: None, start_timestamp: None, end_timestamp: None, fast_field: "id", output_format: ClickHouseRowBinary, partition_by_field: None }
    2022-05-04T13:24:13.927Z  INFO search_adapter:leaf_search_stream: quickwit_search::service: leaf_search index="hackernews_5" splits=[SplitIdAndFooterOffsets { split_id: "01G26NHEB10T2DX37288EKX0SJ", split_footer_start: 270323695, split_footer_end: 278910648 }, SplitIdAndFooterOffsets { split_id: "01G26NHMCV1BAP61AS006H7A75", split_footer_start: 2678183120, split_footer_end: 2678792526 }, SplitIdAndFooterOffsets { split_id: "01G26NK8YX0DM4YSVH6J9YD1GN", split_footer_start: 349970236, split_footer_end: 350048435 }]
    2022-05-04T13:24:13.968Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NHMCV1BAP61AS006H7A75}:warmup: quickwit_directories::storage_directory: path="ccf34dbac4614904b1124b751756dab8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:24:13.969Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NK8YX0DM4YSVH6J9YD1GN}:warmup: quickwit_directories::storage_directory: path="6dc68fd1122c44a985ccf5348907c5f8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:24:13.970Z  INFO search_adapter:leaf_search_stream: quickwit_search::service: leaf_search index="hackernews_5" splits=[SplitIdAndFooterOffsets { split_id: "01G26NHEB10T2DX37288EKX0SJ", split_footer_start: 270323695, split_footer_end: 278910648 }, SplitIdAndFooterOffsets { split_id: "01G26NHMCV1BAP61AS006H7A75", split_footer_start: 2678183120, split_footer_end: 2678792526 }, SplitIdAndFooterOffsets { split_id: "01G26NK8YX0DM4YSVH6J9YD1GN", split_footer_start: 349970236, split_footer_end: 350048435 }]
    2022-05-04T13:24:13.972Z ERROR search_adapter:leaf_search_stream:leaf_search_stream: quickwit_search::search_stream::leaf: Failed to send leaf search stream result. Stop sending. Cause: channel closed
    2022-05-04T13:24:14.006Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NHMCV1BAP61AS006H7A75}:warmup: quickwit_directories::storage_directory: path="ccf34dbac4614904b1124b751756dab8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:24:14.006Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NK8YX0DM4YSVH6J9YD1GN}:warmup: quickwit_directories::storage_directory: path="6dc68fd1122c44a985ccf5348907c5f8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:24:14.007Z ERROR quickwit_serve::rest: Error when streaming search results. error=Internal error: `Internal error: `An IO error occurred: 'Unsupported operation. StorageDirectory only supports async reads: "ccf34dbac4614904b1124b751756dab8.term"'`.`.
    2022-05-04T13:24:14.009Z ERROR search_adapter:leaf_search_stream:leaf_search_stream: quickwit_search::search_stream::leaf: Failed to send leaf search stream result. Stop sending. Cause: channel closed
    2022-05-04T13:24:49.399Z  INFO quickwit_serve::rest: search_stream index_id=hackernews_5 request=SearchStreamRequestQueryString { query: "google.com", search_fields: None, start_timestamp: None, end_timestamp: None, fast_field: "id", output_format: Csv, partition_by_field: None }
    2022-05-04T13:24:49.400Z  INFO search_adapter:leaf_search_stream: quickwit_search::service: leaf_search index="hackernews_5" splits=[SplitIdAndFooterOffsets { split_id: "01G26NHEB10T2DX37288EKX0SJ", split_footer_start: 270323695, split_footer_end: 278910648 }, SplitIdAndFooterOffsets { split_id: "01G26NHMCV1BAP61AS006H7A75", split_footer_start: 2678183120, split_footer_end: 2678792526 }, SplitIdAndFooterOffsets { split_id: "01G26NK8YX0DM4YSVH6J9YD1GN", split_footer_start: 349970236, split_footer_end: 350048435 }]
    2022-05-04T13:24:49.442Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NHMCV1BAP61AS006H7A75}:warmup: quickwit_directories::storage_directory: path="ccf34dbac4614904b1124b751756dab8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:24:49.442Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NK8YX0DM4YSVH6J9YD1GN}:warmup: quickwit_directories::storage_directory: path="6dc68fd1122c44a985ccf5348907c5f8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:24:49.443Z  INFO search_adapter:leaf_search_stream: quickwit_search::service: leaf_search index="hackernews_5" splits=[SplitIdAndFooterOffsets { split_id: "01G26NHEB10T2DX37288EKX0SJ", split_footer_start: 270323695, split_footer_end: 278910648 }, SplitIdAndFooterOffsets { split_id: "01G26NHMCV1BAP61AS006H7A75", split_footer_start: 2678183120, split_footer_end: 2678792526 }, SplitIdAndFooterOffsets { split_id: "01G26NK8YX0DM4YSVH6J9YD1GN", split_footer_start: 349970236, split_footer_end: 350048435 }]
    2022-05-04T13:24:49.452Z ERROR search_adapter:leaf_search_stream:leaf_search_stream: quickwit_search::search_stream::leaf: Failed to send leaf search stream result. Stop sending. Cause: channel closed
    2022-05-04T13:24:49.494Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NHMCV1BAP61AS006H7A75}:warmup: quickwit_directories::storage_directory: path="ccf34dbac4614904b1124b751756dab8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:24:49.495Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NK8YX0DM4YSVH6J9YD1GN}:warmup: quickwit_directories::storage_directory: path="6dc68fd1122c44a985ccf5348907c5f8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:24:49.496Z ERROR quickwit_serve::rest: Error when streaming search results. error=Internal error: `Internal error: `An IO error occurred: 'Unsupported operation. StorageDirectory only supports async reads: "ccf34dbac4614904b1124b751756dab8.term"'`.`.
    2022-05-04T13:24:49.503Z ERROR search_adapter:leaf_search_stream:leaf_search_stream: quickwit_search::search_stream::leaf: Failed to send leaf search stream result. Stop sending. Cause: channel closed
    2022-05-04T13:26:29.659Z  INFO quickwit_serve::rest: search_stream index_id=hackernews_5 request=SearchStreamRequestQueryString { query: "Ambulance", search_fields: None, start_timestamp: None, end_timestamp: None, fast_field: "id", output_format: Csv, partition_by_field: None }
    2022-05-04T13:26:29.661Z  INFO search_adapter:leaf_search_stream: quickwit_search::service: leaf_search index="hackernews_5" splits=[SplitIdAndFooterOffsets { split_id: "01G26NHEB10T2DX37288EKX0SJ", split_footer_start: 270323695, split_footer_end: 278910648 }, SplitIdAndFooterOffsets { split_id: "01G26NHMCV1BAP61AS006H7A75", split_footer_start: 2678183120, split_footer_end: 2678792526 }, SplitIdAndFooterOffsets { split_id: "01G26NK8YX0DM4YSVH6J9YD1GN", split_footer_start: 349970236, split_footer_end: 350048435 }]
    2022-05-04T13:26:29.705Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NHMCV1BAP61AS006H7A75}:warmup: quickwit_directories::storage_directory: path="ccf34dbac4614904b1124b751756dab8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:26:29.706Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NK8YX0DM4YSVH6J9YD1GN}:warmup: quickwit_directories::storage_directory: path="6dc68fd1122c44a985ccf5348907c5f8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:26:29.707Z  INFO search_adapter:leaf_search_stream: quickwit_search::service: leaf_search index="hackernews_5" splits=[SplitIdAndFooterOffsets { split_id: "01G26NHEB10T2DX37288EKX0SJ", split_footer_start: 270323695, split_footer_end: 278910648 }, SplitIdAndFooterOffsets { split_id: "01G26NHMCV1BAP61AS006H7A75", split_footer_start: 2678183120, split_footer_end: 2678792526 }, SplitIdAndFooterOffsets { split_id: "01G26NK8YX0DM4YSVH6J9YD1GN", split_footer_start: 349970236, split_footer_end: 350048435 }]
    2022-05-04T13:26:29.713Z ERROR search_adapter:leaf_search_stream:leaf_search_stream: quickwit_search::search_stream::leaf: Failed to send leaf search stream result. Stop sending. Cause: channel closed
    2022-05-04T13:26:29.756Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NHMCV1BAP61AS006H7A75}:warmup: quickwit_directories::storage_directory: path="ccf34dbac4614904b1124b751756dab8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:26:29.757Z ERROR search_adapter:leaf_search_stream:leaf_search_stream:leaf_search_stream_single_split{split_id=01G26NK8YX0DM4YSVH6J9YD1GN}:warmup: quickwit_directories::storage_directory: path="6dc68fd1122c44a985ccf5348907c5f8.term" msg="Unsupported operation. StorageDirectory only supports async reads"
    2022-05-04T13:26:29.757Z ERROR quickwit_serve::rest: Error when streaming search results. error=Internal error: `Internal error: `An IO error occurred: 'Unsupported operation. StorageDirectory only supports async reads: "ccf34dbac4614904b1124b751756dab8.term"'`.`.
    2022-05-04T13:26:29.761Z ERROR search_adapter:leaf_search_stream:leaf_search_stream: quickwit_search::search_stream::leaf: Failed to send leaf search stream result. Stop sending. Cause: channel closed
    
    bug 
    opened by fulmicoton 32
  • OOMs after repeated queries on larger amounts of data.


    Describe the bug

    We've now loaded Quickwit with 16.4 billion records and have started to trigger some out-of-memory (OOM) failures. We've also seen clustering issues, so to isolate the OOMs we scaled down to a single search node.

    The test query matches 67 million records, but there's no sorting, timestamps, or anything complicated in the query: just a single criterion of the form field:value, with max_hits=1.

    On a single searcher node this query runs successfully in about 38 seconds on the first run, 25 seconds on the second, and then consistently OOMs on the third. Queries are not concurrent, and no other queries are submitted between runs.

    In this case it's the kernel killing Quickwit, since it exceeds the memory limit allocated to it. The searcher runs in Kubernetes with 32 GB of RAM allocated.

    Configuration:

    This index currently has 1,340 splits, with a target of 10M docs per split.

    # quickwit --version
    Quickwit 0.3.0 (commit-hash: 6d07599)
    

    Memory and cache settings are the defaults.

    searcher:
      fast_field_cache_capacity: 10G
      split_footer_cache_capacity:  1G
      max_num_concurrent_split_streams: 100
    
    bug 
    opened by kstaken 27
  • Support Google cloud storage.


    We already support specifying a non-AWS endpoint. In theory everything should work just fine, but let's check that by indexing a few splits and deleting an index.

    bug enhancement 
    opened by fulmicoton 22
  • Update tutorial following change in Vector. ndjson => json + framing.method := "newline_delimited"

    When I send logs from Vector to Quickwit, I get this error:

    2022-09-27T05:47:01.785Z WARN {actor=quickwit_indexing::actors::indexing_service::IndexingService}:{msg_id=1}::{index=customer3 gen=0}:{actor=quickwit_indexing::actors::doc_processor::DocProcessor}:{msg_id=4}: quickwit_indexing::actors::doc_processor: err=NotJsonObject("[{"id":4152728738612")

    This is my Vector output on the console: {"id":415272873861226802,"wechat_name":"清醒"}

    My Vector sink config:

    [sinks.quick]
    type = "http"
    inputs = ["modify_t_customer"]
    encoding.codec = "json"
    uri = "http://127.0.0.1:7280/api/v1/customer3/ingest"

    When I changed my Vector config to:

    [sinks.quick]
    type = "http"
    inputs = ["modify_t_customer"]
    encoding.codec = "native_json"
    uri = "http://127.0.0.1:7280/api/v1/customer3/ingest"

    I got an error like this:

    2022-09-27T05:49:43.074Z WARN {actor=quickwit_indexing::actors::indexing_service::IndexingService}:{msg_id=1}::{index=customer3 gen=0}:{actor=quickwit_indexing::actors::doc_processor::DocProcessor}:{msg_id=169}: quickwit_indexing::actors::doc_processor: err=RequiredFastField("id")

    It looks like a problem with my Vector sink config.

    However, Vector's http sink only supports: `expected one of avro, gelf, json, logfmt, native, native_json, raw_message, text`.

    But I don't see ndjson in that list, even though the tutorial at https://quickwit.io/docs/tutorials/send-logs-from-vector-to-quickwit still uses it:

    [sinks.quickwit_logs]
    type = "http"
    inputs = ["remap_syslog"]
    encoding.codec = "ndjson"
    uri = "http://host.docker.internal:7280/api/v1/otel-logs/ingest"

    What can I do?
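    The issue title suggests the eventual fix: switch to the json codec plus newline-delimited framing. A sketch, reusing the sink from the report above (not verified against the current tutorial):

```toml
# Sketch of the updated sink config implied by the issue title:
# the json codec plus newline-delimited framing replaces the removed ndjson codec.
[sinks.quick]
type = "http"
inputs = ["modify_t_customer"]
encoding.codec = "json"
framing.method = "newline_delimited"
uri = "http://127.0.0.1:7280/api/v1/customer3/ingest"
```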

    bug tutorial 
    opened by yangshike 20
  • janitor supports incremental execution


    During our usage we found that Postgres had a lot of transactions in the Lock state. This caused the pipeline to restart due to connection-fetching timeouts. When I stopped the janitor, I noticed far fewer waiting transactions. This might be related to the janitor fetching a large number of splits at once, so maybe it could run in batches, processing a fixed number of splits at a time.

    enhancement 
    opened by guidao 19
  • Lost splits


    Describe the bug: Running the same query twice, the first run returned results, but when I ran it again a few minutes later the query failed (screenshot attached).

    Expected behavior: The same query should return the same result.

    Configuration:

    1. quickwit --version: 0.3.1
    2. The index_config.yaml:

    ---
    version: 0 # File format version.

    index_id: traceback

    doc_mapping:
      field_mappings:
        - name: id
          type: u64
          fast: true
        - name: raw_content
          type: text
          tokenizer: default
          record: position

    search_settings:
      default_search_fields: [raw_content]

    sources:
      - source_id: source-kafka
        source_type: kafka
        params:
          topic: UserAction
          client_params:
            bootstrap.servers: $(KAFKA)
            group.id: FullText
            security.protocol: PLAINTEXT

    bug 
    opened by yangjinming1062 19
  • Indexer consumption Kafka error


    Describe the bug: I started 10 indexer nodes to consume Kafka data and got the following errors:

    2022-05-27T06:28:14.854Z  INFO {actor=quickwit_indexing::actors::indexing_server::IndexingServer}:{msg_id=1}::{msg_id=418922}:{index=clickhouse gen=319}:{actor=Packager}: quickwit_actors::sync_actor: actor-exit actor_id=Packager-nameless-H4KW exit_status=Killed
    2022-05-27T06:28:15.217Z ERROR {actor=quickwit_indexing::actors::indexing_server::IndexingServer}:{msg_id=1}::{msg_id=419295}: quickwit_actors::actor_handle: actor-exit-without-success actor="KafkaSource-long-9JuJ" exit_status=DownstreamClosed
    2022-05-27T06:28:15.217Z ERROR {actor=quickwit_indexing::actors::indexing_server::IndexingServer}:{msg_id=1}::{msg_id=419295}: quickwit_actors::actor_handle: actor-exit-without-success actor="Indexer-red-PtaJ" exit_status=Failure(Failed to add document.
    
    Caused by:
        An error occurred in a thread: 'An index writer was killed.. A worker thread encounterred an error (io::Error most likely) or panicked.')
    2022-05-27T06:28:15.217Z ERROR {actor=quickwit_indexing::actors::indexing_server::IndexingServer}:{msg_id=1}::{msg_id=419295}: quickwit_actors::actor_handle: actor-exit-without-success actor="Packager-nameless-H4KW" exit_status=Killed
    2022-05-27T06:28:15.217Z ERROR {actor=quickwit_indexing::actors::indexing_server::IndexingServer}:{msg_id=1}::{msg_id=419295}: quickwit_actors::actor_handle: actor-exit-without-success actor="Uploader-blue-HXWt" exit_status=Killed
    2022-05-27T06:28:15.217Z ERROR {actor=quickwit_indexing::actors::indexing_server::IndexingServer}:{msg_id=1}::{msg_id=419295}: quickwit_actors::actor_handle: actor-exit-without-success actor="Publisher-icy-tc56" exit_status=Failure(Failed to publish splits.
    
    Caused by:
        0: Publish checkpoint delta overlaps with the current checkpoint: IncompatibleCheckpointDelta { partition_id: PartitionId("0000000000"), current_position: Offset("00000000025977904298"), delta_position_from: Offset("00000000025977865375") }.
        1: IncompatibleChkptDelta at partition: PartitionId("0000000000") cur_pos:Offset("00000000025977904298") delta_pos:Offset("00000000025977865375"))
    2022-05-27T06:28:15.217Z ERROR {actor=quickwit_indexing::actors::indexing_server::IndexingServer}:{msg_id=1}::{msg_id=419295}: quickwit_actors::actor_handle: actor-exit-without-success actor="GarbageCollector-snowy-TxZ8" exit_status=Killed
    2022-05-27T06:28:15.217Z ERROR {actor=quickwit_indexing::actors::indexing_server::IndexingServer}:{msg_id=1}::{msg_id=419295}: quickwit_actors::actor_handle: actor-exit-without-success actor="MergeSplitDownloader-weathered-FyOq" exit_status=Killed
    

    Configuration:

    1. quickwit 0.2.1
    2. quickwit.yaml
    version: 0
    node_id: $POD_NAME
    listen_address: 0.0.0.0
    rest_listen_port: 7280
    #peer_seeds:
    #  -
    #  -
    #data_dir: /data/quickwit
    metastore_uri: postgres://quickwit:[email protected]:5432/quickwit
    default_index_root_uri: s3://quickwit/indexes/
    
    3. index.json
    version: 0
    
    index_id: clickhouse
    
    index_uri: s3://quickwit/indexes/clickhouse
    
    doc_mapping:
      field_mappings:
        - name: id
          type: u64
          fast: true
        - name: created_at
          type: i64
          fast: true
        - name: _log_
          type: text
          tokenizer: default
          record: position
    
    indexing_settings:
      timestamp_field: created_at
    
    search_settings:
      default_search_fields: [_log_]
    
    sources:
      - source_id: quickwit
        source_type: kafka
        params:
          topic: production
          client_params:
            group.id: quickwit
            bootstrap.servers: 192.168.100.1:9092,192.168.100.2:9092,192.168.100.3:9082
    
    bug 
    opened by gnufree 19
  • Offer a way to select a subset of kafka partition in a kafka source


    The objective would be to allow having a larger indexing throughput by running K indexing pipelines for a single index.

    The selector could be k % N, or maybe a list of partitions?
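    A sketch of the "list of partitions" variant, where `partition_ids` is a purely hypothetical parameter name used for illustration (it does not exist in Quickwit today):

```yaml
# Hypothetical sketch: `partition_ids` is NOT an existing Quickwit parameter;
# it only illustrates assigning a subset of Kafka partitions to one pipeline.
sources:
  - source_id: source-kafka-shard-0
    source_type: kafka
    params:
      topic: UserAction
      partition_ids: [0, 3, 6, 9] # e.g. the k % N variant with N=3, k=0
      client_params:
        bootstrap.servers: localhost:9092
```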

    enhancement 
    opened by fulmicoton 17
  • add a chinese tokenizer


    Description

    This adds a simple tokenizer for CJK. Before, something like "你好世界" (hello world) would be a single token because it contains no whitespace. This means searching for "你好" would yield no result.

    A more intelligent tokenizer would probably split this into two tokens (hello, world). This tokenizer simply splits at each char, creating 4 tokens. This is much faster at indexing, but requires using a phrase query to match a word written as two or more chars.

    fix #1979

    How was this PR tested?

    Some tests added for the tokenizer, and a manual test by indexing the wiki-articles-10000 dataset, using the new tokenizer for the body field and searching for "毛藝" (name of a Chinese gymnast), "毛" (first half), "藝" (2nd half) and "藝毛" (wrong order):

    • "毛藝": yields a doc both before and after the change
    • "毛": yields a doc only after
    • "藝": yields a doc only after
    • "藝毛": yields nothing
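    The per-char behavior described above can be sketched like this (a minimal illustration, not the actual PR code):

```rust
/// Minimal sketch of a character-per-token CJK tokenizer: every non-whitespace
/// char becomes one token, so "你好世界" yields 4 tokens, and matching the word
/// "你好" then requires a phrase query over two adjacent tokens.
fn tokenize_per_char(text: &str) -> Vec<String> {
    text.chars()
        .filter(|c| !c.is_whitespace())
        .map(|c| c.to_string())
        .collect()
}

fn main() {
    let tokens = tokenize_per_char("你好世界");
    // Four single-char tokens instead of one whitespace-delimited token.
    assert_eq!(tokens, vec!["你", "好", "世", "界"]);
    println!("{:?}", tokens);
}
```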
    opened by trinity-1686a 16
  • Add CSV/RowBinary output format to Search API


    Is your feature request related to a problem? Please describe. We want Quickwit to be easily integrated into row-based engines like SQL databases.

    Describe the solution you'd like Exposing a CSV and a row binary format that a user can choose with a query param format would be sufficient.

    CSV format: https://datatracker.ietf.org/doc/html/rfc4180 RowBinary format: to define.

    enhancement 
    opened by fmassot 15
  • Exact match doesn't seem to work


    Describe the bug

    The exact search doesn't seem to be working.

    Steps to reproduce (if applicable) Steps to reproduce the behavior:

     ▲ quickwit index search --index-id wikipedia --metastore-uri file://$(pwd)/wikipedia --query 'title:apollo AND 11' | jq '.hits[].title[]'
    "Apollo"
    "Apollo 11"
    "Apollo 8"
    "Apollo program"
    "Apollo 13"
    "Apollo 7"
    "Apollo 9"
    "Apollo 1"
    "Apollo 10"
    "Apollo 12"
    "Apollo 14"
    "Apollo 15"
    "Apollo 16"
    "Apollo 17"
    "List of Apollo astronauts"
    "Apollo, Pennsylvania"
    "Apollo 13 (film)"
    "Apollo Lunar Module"
    "Apollo Guidance Computer"
    "Apollo 4"
    

    Okay, so it seems we've found what we're looking for as the second result. However, since the article is literally named Apollo 11, we should be able to perform what (according to Quickwit's documentation) seems to be an exact search:

    ▲ quickwit index search --index-id wikipedia --metastore-uri file://$(pwd)/wikipedia --query 'title:"Apollo 11"' | jq '.hits[].title[]'
    

    Expected behavior

    The "Apollo 11" result should be showing up.

    System configuration:

    60f897c0f49b4a920948b2bb98ca081f5557ed22 built from source on Linux, rustc 1.56.1

    Additional context
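    One possible explanation (an assumption on my part, not confirmed in this report): phrase queries only match when the field records token positions. In the doc mapping that would look something like:

```yaml
# Assumption for illustration: without `record: position` on the field,
# a phrase query like title:"Apollo 11" cannot match.
doc_mapping:
  field_mappings:
    - name: title
      type: text
      tokenizer: default
      record: position
```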

    bug 
    opened by mrusme 14
  • Integrate pull request preview environments


    Is your feature request related to a problem? Please describe. I would like to support Quickwit by implementing Uffizzi preview environments. Disclaimer: I work on Uffizzi.

    Uffizzi is an open-source full-stack preview engine, and our platform is available completely free for Quickwit (and all open-source projects). This will provide maintainers with preview environments of every PR in the cloud, which enables faster iterations and reduces time to merge. You can see the open-source repos currently using Uffizzi over here.

    Uffizzi is purpose-built for the task of previewing PRs and it integrates with your workflow to deploy preview environments in the background without any manual steps for maintainers or contributors.

    I can go ahead and create an initial PoC for you right away if you think there is value in this proposal.

    • [ ] Initial PoC
    enhancement 
    opened by waveywaves 0
  • Large actor scheduler refactoring


    The scheduler is no longer an actor in the sense of the actor framework.

    It removes the necessity to create some fake scheduler mailbox to spawn the scheduler itself.

    The scheduler also now has an improved logic to simulate time shift.

    It only jumps forward when no actor has any work to do. Provided all of the processing is done in actors, the results should be rigorously the same as if someone had used time::sleep, only faster.
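    The time-shift rule described above can be sketched as follows (a minimal illustration under stated assumptions, not the actual quickwit_actors code):

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Sketch: the simulated clock only jumps forward once no actor has pending
// work, landing directly on the next scheduled wake-up instead of sleeping.
struct SimulatedScheduler {
    now_ms: u64,
    // Wake-up deadlines of sleeping actors, soonest first (min-heap via Reverse).
    deadlines: BinaryHeap<Reverse<u64>>,
    // Messages still queued in actor mailboxes.
    pending_work: usize,
}

impl SimulatedScheduler {
    fn advance(&mut self) -> Option<u64> {
        // Never shift time while any actor still has work to do.
        if self.pending_work > 0 {
            return None;
        }
        // Otherwise jump straight to the earliest deadline.
        if let Some(Reverse(deadline)) = self.deadlines.pop() {
            self.now_ms = self.now_ms.max(deadline);
            return Some(self.now_ms);
        }
        None
    }
}

fn main() {
    let mut sched = SimulatedScheduler {
        now_ms: 0,
        deadlines: [Reverse(500), Reverse(100)].into_iter().collect(),
        pending_work: 1,
    };
    assert_eq!(sched.advance(), None); // work pending: no time shift
    sched.pending_work = 0;
    assert_eq!(sched.advance(), Some(100)); // jump to earliest deadline
    assert_eq!(sched.advance(), Some(500));
    println!("final simulated time: {} ms", sched.now_ms);
}
```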

    opened by fulmicoton 0
  • Fix duplicate fields in editor auto-completion


    Ensure an index is registered in the query editor component (Monaco editor) only once.

    Manually reproduced and tested. Closes https://github.com/quickwit-oss/quickwit/issues/2615

    opened by evanxg852000 0
  • full-text search with clickhouse


    When combined with ClickHouse (CK), based on the relevant guidance documents, does the data need to be stored in both CK and Quickwit at the same time to achieve fast full-text search? In my local test, the data is stored only in CK, and it cannot be queried through the API provided by Quickwit.

    enhancement 
    opened by qiyuxin 6
  • feature: added safe mode to delete index


    Description

    The Quickwit CLI has a subcommand for deleting an index, given its ID. The subcommand, except for dry-run, has no safety built in. I thought about adding a --yes flag, like other subcommands in the CLI, but I feared that this might disrupt existing clients' pipelines. This seems a more sensible choice because it doesn't interfere with existing processes and can be used only where necessary. It solves issue #2201.

    How was this PR tested?

    Added the relevant flags in the existing tests.

    opened by druzn3k 1
Releases(v0.4.0)
Owner
Quickwit Inc.