Making Postgres and Elasticsearch work together like it's 2021

Overview

logo

Making Postgres and Elasticsearch work together like it's 2021

cargo test --all Twitter Follow

Readme

ZomboDB brings powerful text-search and analytics features to Postgres by using Elasticsearch as an index type. Its comprehensive query language and SQL functions enable new and creative ways to query your relational data.

From a technical perspective, ZomboDB is a 100% native Postgres extension that implements Postgres' Index Access Method API. As a native Postgres index type, ZomboDB allows you to CREATE INDEX ... USING zombodb on your existing Postgres tables. At that point, ZomboDB takes over and fully manages the remote Elasticsearch index and guarantees transactionally-correct text-search query results.

ZomboDB is fully compatible with all of Postgres' query plan types and most SQL commands such as CREATE INDEX, COPY, INSERT, UPDATE, DELETE, SELECT, ALTER, DROP, REINDEX, (auto)VACUUM, etc.

It doesn’t matter if you’re using an Elasticsearch cloud provider or managing your own cluster -- ZomboDB communicates with Elasticsearch via its RESTful APIs so you’re covered either way.

ZomboDB allows you to use the power and scalability of Elasticsearch directly from Postgres. You don’t have to manage transactions between Postgres and Elasticsearch, asynchronous indexing pipelines, complex reindexing processes, or multiple data-access code paths -- ZomboDB does it all for you.

Quick Links

Features

Current Limitations

  • Only one ZomboDB index per table
  • ZomboDB indexes with predicates (ie, partial indexes) are not supported
  • CREATE INDEX CONCURRENTLY is not supported

These limitations may be addressed in future versions of ZomboDB.

System Requirements

Product Version
Postgres 10.x, 11.x, 12.x, 13.x
Elasticsearch 7.x

Sponsorship and Downloads

Please see https://github.com/sponsors/eeeebbbbrrrr for sponsorship details. Your sponsorship at any tier is greatly appreciated and helps keep ZomboDB moving forward.

Note that ZomboDB is only available in binary form for certain sponsor tiers.

When you become a sponsor at a tier that provides binary downloads, please request a download key from https://www.zombodb.com/services/. Please do the same if you sponsor a tier that provides access to ZomboDB's private Discord server.

Quick Overview

Note that this is just a quick overview. Please read the getting started tutorial for more details.

Create the extension:

CREATE EXTENSION zombodb;

Create a table:

CREATE TABLE products (
    id SERIAL8 NOT NULL PRIMARY KEY,
    name text NOT NULL,
    keywords varchar(64)[],
    short_summary text,
    long_description zdb.fulltext, 
    price bigint,
    inventory_count integer,
    discontinued boolean default false,
    availability_date date
);

-- insert some data

Create a ZomboDB index:

CREATE INDEX idxproducts 
          ON products 
       USING zombodb ((products.*)) 
        WITH (url='localhost:9200/');

Query it:

SELECT * 
  FROM products 
 WHERE products ==> '(keywords:(sports OR box) OR long_description:"wooden away"~5) AND price:[1000 TO 20000]';

Contact Information

History

The name is an homage to zombo.com and its long history of continuous self-affirmation.

Historically, ZomboDB began in 2013 by Technology Concepts & Design, Inc as a closed-source effort to provide transaction safe text-search on top of Postgres tables. While Postgres' "tsearch" features are useful, they're not necessarily adequate for 200 column-wide tables with 100M rows, each containing large text content.

Initially designed on-top of Postgres' Foreign Data Wrapper API, ZomboDB quickly evolved into an index type so that queries are MVCC-safe and standard SQL can be used to query and manage indices.

Elasticsearch was chosen as the backing search index because of its horizontal scaling abilities, performance, and general ease of use.

ZomboDB was open-sourced in July 2015 and has since been used in numerous production systems of various sizes and complexity.

License

Copyright 2018-2021 ZomboDB, LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments
  • Investigate mapping some

    Investigate mapping some "PostGIS" data types to ES "geo shape" type

    I'd like to see how difficult it might be to translate some of the basic PostGIS types into ES "geo shape" types, just for indexing purposes.

    I'm not aiming for complete compatibility with PostGIS (at least not right now), but being able to index some of its data types in a way that can be queried using ZDB's direct JSON QueryDSL syntax would likely be useful.

    enhancement (merged) v4.0 
    opened by eeeebbbbrrrr 92
  • Corruption in ES Index

    Corruption in ES Index

    So we were running the below query to verify some of the "health" of our index and noticed this:

    db=# select *,case when zdb=estimate and zdb=pg then 'ok' else 'ERROR' end as my_stat from
     (select count(*) as zdb from gc.cv_data where zdb('gc.cv_data',cv_data.ctid)==>'pk_data_id: 1 /TO/ 100000')z
     ,(select zdb_estimate_count as estimate from zdb_estimate_count('gc.cv_data','pk_data_id: 1 /TO/ 100000'))e
     ,(select count(*) as pg from gc.cv_data WHERE pk_data_id BETWEEN 1 AND 100000)p
     ,(select zdb_estimate_count as null_estimate from zdb_estimate_count('gc.cv_data','pk_data_id:NULL'))ne;
      zdb   | estimate |  pg   | null_estimate | my_stat
    --------+----------+-------+---------------+---------
     100021 |    99996 | 99996 |             0 | ERROR
    (1 row)
    
    Time: 241.582 ms
    

    The zdb count is off!?!

    We narrowed down one of the pk_data_id's:

    db=# SELECT ctid, xmin, xmax, pk_data_id FROM gc.cv_data WHERE pk_data_id = 97302;
       ctid   |   xmin    |   xmax    | pk_data_id
    ----------+-----------+-----------+------------
     (8733,4) | 147604115 | 188561882 |      97302
    (1 row)
    
    Time: 0.356 ms
    
    
    db=# SELECT ctid, xmin, xmax, pk_data_id FROM gc.cv_data WHERE zdb('gc.cv_data', ctid) ==> 'pk_data_id: 97302 /TO/ 97302';
         ctid     |   xmin    |   xmax    | pk_data_id
    --------------+-----------+-----------+------------
     (8733,4)     | 147604115 | 188561882 |      97302
     (1322553,11) | 188570594 | 188570594 |    9706852
    (2 rows)
    
    Time: 15.467 ms
    db=# SELECT ctid, xmin, xmax, pk_data_id FROM gc.cv_data WHERE zdb('gc.cv_data', ctid) ==> 'pk_data_id: 97302';
         ctid     |   xmin    |   xmax    | pk_data_id
    --------------+-----------+-----------+------------
     (8733,4)     | 147604115 | 188561882 |      97302
     (1322553,11) | 188570594 | 188570594 |    9706852
    (2 rows)
    
    Time: 12.808 ms
    

    Autovacuum/vacuum are up-to-date. What's going on here? What other info can we provide? This is BAD!

    bug v3.1.15 
    opened by taspotts 79
  • WIP:  ZomboDB v4.0  (Support Elasticsearch 2.4)

    WIP: ZomboDB v4.0 (Support Elasticsearch 2.4)

    This PR is based on the work done by @pashinin in PR #140.

    It's been upgraded to support ES 2.4.1 along with including all the changes in ZDB v3.1.

    I need to fix a few junit/regression tests that are failing and spend some time analyzing the performance of ES 2.4 relative to 1.7.5.

    opened by eeeebbbbrrrr 47
  • [question] how to use zdb.score with an index on a function

    [question] how to use zdb.score with an index on a function

    ZomboDB version: 3000.0.0-alpha3 Postgres version: 12.4 Elasticsearch version: 7.10.1

    Problem Description:

    when using zdb.score() with a zombodb index using a function, it returns 0 for each record

    Table Schema/Index Definition:

    To reproduce the problem:

    drop table if exists test_zombodb cascade;
    create table test_zombodb ( id integer not null primary key);
    
    insert into test_zombodb (id) values (1);
    
    drop type if exists test_zombodb_es cascade;
    create type test_zombodb_es as (id integer, text text);
    
    drop function if exists index(test_zombodb);
    
    create function index(test_zombodb test_zombodb) returns test_zombodb_es as $$
      select row(test_zombodb.id, 'some text')::test_zombodb_es
    $$ language sql immutable strict;
    
    drop index if exists idx_es_test_zombodb;
    
    create index idx_es_test_zombodb
      on test_zombodb using zombodb(index(test_zombodb))
      with (url='http://elasticsearch:9200/');
    
    select zdb.score(ctid), index(test_zombodb) from test_zombodb where index(test_zombodb) ==> 'text'
    

    Output from select zdb.index_mapping('index_name');:

    {"105155.107344.117161.117170": {"mappings": {"properties": {"id": {"type": "integer"}, "text": {"type": "text", "copy_to": ["zdb_all"], "analyzer": "zdb_standard", "fielddata": true, "index_prefixes": {"max_chars": 5, "min_chars": 2}}, "zdb_all": {"type": "text", "analyzer": "zdb_all_analyzer"}, "zdb_cmax": {"type": "integer"}, "zdb_cmin": {"type": "integer"}, "zdb_ctid": {"type": "long"}, "zdb_xmax": {"type": "long"}, "zdb_xmin": {"type": "long"}, "zdb_aborted_xids": {"type": "long"}}, "date_detection": false, "dynamic_templates": [{"strings": {"mapping": {"type": "keyword", "copy_to": "zdb_all", "normalizer": "lowercase", "ignore_above": 10922}, "match_mapping_type": "string"}}, {"dates_times": {"mapping": {"type": "date", "format": "strict_date_optional_time||epoch_millis||HH:mm:ss.S||HH:mm:ss.SX||HH:mm:ss.SS||HH:mm:ss.SSX||HH:mm:ss.SSS||HH:mm:ss.SSSX||HH:mm:ss.SSSS||HH:mm:ss.SSSSX||HH:mm:ss.SSSSS||HH:mm:ss.SSSSSX||HH:mm:ss.SSSSSS||HH:mm:ss.SSSSSSX", "copy_to": "zdb_all"}, "match_mapping_type": "date"}}, {"objects": {"mapping": {"type": "nested", "include_in_parent": true}, "match_mapping_type": "object"}}], "numeric_detection": false}}}
    
    bug (merged) v3000.0.0-alpha3 
    opened by mathroc 41
  • _all field misbehaves when used across index links

    _all field misbehaves when used across index links

    I have four tables, each with ZDB index created. I added "index links" and created view for search as stated in INDEX OPTIONS. And here is the situation: As soon as I search like this:

    select * from test_view where zdb ==> 'John'
    

    Everything is fine, I get all records where John is present no matter of index. But if I change select to look like this:

    select * from test_view where zdb ==> 'John Doe'
    

    Then I get only records where John and Doe are present in the same index and not the records where John is in one index and Doe in another.

    One way to solve this is to create materialized view and build index on it, but refreshing 2.5M records materialized view takes approximately 5 minutes, that's really bad.

    So, question is like this: can I create a view where ZDB query will search for records where words can be in any of linked inexes without explicitly defining fields for search?

    ZomboDB 3.1.3, PostgreSQL 9.5.5, ElasticSearch 1.7.6.

    bug v3.1.7 v3.1.11 
    opened by Real-Gecko 39
  • Returning large results is slow

    Returning large results is slow

    ZomboDB version: zombodb_debian-buster_pg12-3000.0.0-alpha1_amd64 (self-built) Postgres version: 12.4 Elasticsearch version: 7.7.1

    Problem Description:

    We'd like to know if there is a way to activate debug logging to know what's happening behind the scenes.

    Our problem is that we're trying to run a long query and we don't know where most of the time is spent. We tried to execute the ES subqueries independently and they're fast so I don't think ES is the problem. Plus, the same query is running up to 80 times slower in zombo than in ES. After the ES queries are done, postgres runs but using only ~5% of the cpu.

    We already tried zdb.log_level but it doesn't seem to change much.

    bug (merged) v3000.0.0-alpha3 
    opened by Jasopaum 37
  • Using additional conditions in WHERE together with zombodb

    Using additional conditions in WHERE together with zombodb

    ZomboDB version: v3000.0.11 Postgres version: 13.4 Elasticsearch version: 8.1.2

    Problem Description: Hello again! I'm trying to use a zombodb index with other btree type indexes in the one query. In EXPLAIN, I expect to see something of the two options below:

    1. Bitmap Index Scan between zombodb and btree indexes;
    2. Index Scan by zombodb index and then Filter instead of using btree index.

    However, I see a completely different situation instead. When trying to search only the zombodb index:

    EXPLAIN
    SELECT * FROM test_tbl
      WHERE test_fts_idx_func(col_a) ==> 'foo bar';
    

    Result:

    Index Scan using test_fts_idx on "test_tbl"  (cost=0.00..529103525.16 rows=6151442944 width=364)
      Index Cond: ("col_a" ==> '{""query_string"":{""query"":""foo bar""}}'::zdbquery)
    JIT:
      Functions: 4
      Options: Inlining true, Optimization true, Expressions true, Deforming true
    

    Everything is great now. When trying to use zombodb and btree index simultaneously (built by column col_b):

    EXPLAIN
    SELECT * FROM test_tbl
      WHERE test_fts_idx_func(col_a) ==> 'foo bar'
        AND col_b = 'foo bar';
    

    Result:

    Gather  (cost=1000.71..623028.95 rows=304674 width=364)
      Workers Planned: 48
      ->  Parallel Index Scan using test_fymd_idx on "test_tbl"  (cost=0.71..591561.55 rows=6347 width=364)
            Index Cond: ("col_b" = 'foo bar'::text)
            Filter: ("col_a" ==> '{""query_string"":{""query"":""foo bar""}}'::zdbquery)
    JIT:
      Functions: 6
      Options: Inlining true, Optimization true, Expressions true, Deforming true
    

    The zombodb index is not used. Same for additional conditions in WHERE, which do not affect other indexes (except zombodb):

    EXPLAIN
    SELECT * FROM test_tbl
      WHERE test_fts_idx_func(col_a) ==> 'foo bar'
        AND col_c = 5;
    

    Result:

    Gather  (cost=1000.00..505959721.55 rows=351657483 width=364)
      Workers Planned: 48
      ->  Parallel Seq Scan on "test_tbl"  (cost=0.00..470792973.25 rows=7326198 width=364)
            Filter: (("col_c" = 5) AND ("col_a" ==> '{""query_string"":{""query"":""foo bar""}}'::zdbquery))
    JIT:
      Functions: 4
      Options: Inlining true, Optimization true, Expressions true, Deforming true
    

    * test_fts_idx_func() here is used to index only one column according to the documentation.

    As we can see, the zombodb index is not involved in the query at all. Actually, the question is, is it even possible to use zombodb and btree indices together? Or, at least, is it possible to use a zombodb index with an additional condition in WHERE? The documentation says that this seems to be possible. However, I cannot achieve these results.

    bug (merged) v3000.0.12 
    opened by hyperion-cs 34
  • Intermittent index out of bounds error on write

    Intermittent index out of bounds error on write

    ZomboDB version: v3000.0.8 Postgres version: 13.7 Elasticsearch version: 7.17.0 Problem Description: We have an intermittent error that occurs on write.

    Error Message (if any):

    error: code=Some(200), {"error":null,"errors":true,"items":[{"update":{"error":{"type":"illegal_argument_exception","reason":"failed to execute script","caused_by":{"type":"script_exception","reason":"runtime error","script_stack":["java.base/jdk.internal.util.Preconditions.outOfBounds([Preconditions.java:64](http://preconditions.java:64/))","java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex([Preconditions.java:70](http://preconditions.java:70/))","java.base/jdk.internal.util.Preconditions.checkIndex([Preconditions.java:266](http://preconditions.java:266/))","java.base/java.util.Objects.checkIndex([Objects.java:359](http://objects.java:359/))","java.base/java.util.ArrayList.remove([ArrayList.java:504](http://arraylist.java:504/))","ctx._source.zdb_aborted_xids.remove(ctx._source.zdb_aborted_xids.indexOf(params.XID));","                                                                               ^---- HERE"],"script":"ctx._source.zdb_aborted_xids.remove(ctx._source.zdb_aborted_xids.indexOf(params.XID));","lang":"painless","position":{"offset":79,"start":0,"end":86},"caused_by":{"type":"index_out_of_bounds_exception","reason":"Index -1 out of bounds for length 2"}}}}}]}
    [1]     at Parser.parseErrorMessage (/home/brent/projects/steel-ui/steelhead-deploy/gosteelhead/node_modules/
    
    

    Table Schema/Index Definition: -- complex, not sure of a mwe.

    CREATE FUNCTION steelhead.search_recipe_node_idx(steelhead.recipe_node) RETURNS steelhead.search_recipe_node_idx_type IMMUTABLE STRICT LANGUAGE sql AS $$
    SELECT ROW (
               $1.id,
               $1.name,
               $1.description_markdown::text,
               $1.operator_input_json_schema::text,
               $1.derived_from,
               $1.treatment_id,
               'recipe node'
               )::steelhead.search_recipe_node_idx_type;
    $$;
    
    CREATE INDEX search_recipe_node_idx 
          ON steelhead.recipe_node 
          USING zombodb (steelhead.search_recipe_node_idx(recipe_node))
             WITH (url='http://elasticsearch:9200/');
    CREATE TYPE steelhead.search_work_order_idx_type AS (
        id integer, 
        name text, 
        id_in_domain text,
        customer_name text,
        received_order_id integer,
        recipe_id integer,
        product_id integer,
        created_at timestamptz,
        creator_id integer,
        custom_inputs text,
        entity text
    );  
    
    CREATE FUNCTION steelhead.search_work_order_idx(steelhead.work_order) RETURNS steelhead.search_work_order_idx_type IMMUTABLE STRICT LANGUAGE sql AS $$
    SELECT ROW (
               $1.id,
               $1.name,
               concat('WO',$1.id_in_domain::text),
               (select name from steelhead.customer where id = $1.customer_id),
               $1.received_order_id,
               $1.recipe_id,
               $1.product_id,
               $1.created_at,
               $1.creator_id,
                $1.custom_inputs,
               'work orders'
               )::steelhead.search_work_order_idx_type;
    $$;
    
    CREATE INDEX search_work_order_idx 
          ON steelhead.work_order 
          USING zombodb (steelhead.search_work_order_idx(work_order))
             WITH (url='http://elasticsearch:9200/');
    GRANT EXECUTE ON FUNCTION steelhead.search_work_order_idx to steelhead_authed;
    
    ALTER INDEX search_work_order_idx SET (options='parts_transfer:(id=<steelhead.parts_transfer.search_parts_transfer_idx>to_work_order_id),
            part_number:(parts_transfer.part_number_id=<steelhead.part_number.search_part_number_idx>id),
            creator:(creator_id=<steelhead.user.search_user_idx>id),
            received_order:(received_order_id=<steelhead.received_order.search_received_order_idx>id),
            ro_creator:(received_order.creator_id=<steelhead.user.search_user_idx>id),
            recipe:(recipe_id=<steelhead.recipe_node.search_recipe_node_idx>id),
            product:(product_id=<steelhead.product.search_product_idx>id)');
    
    

    Output from select zdb.index_mapping('index_name');: here is the output from the two indicies I would expect to be responsible.

    {"122906.123633.124008.130171": {"mappings": {"properties": {"id": {"type": "integer"}, "name": {"type": "text", "copy_to": ["zdb_all"], "analyzer": "zdb_standard", "fielddata": true}, "entity": {"type": "text", "copy_to": ["zdb_all"], "analyzer": "zdb_standard", "fielddata": true}, "zdb_all": {"type": "text", "analyzer": "zdb_all_analyzer"}, "zdb_cmax": {"type": "integer"}, "zdb_cmin": {"type": "integer"}, "zdb_ctid": {"type": "long"}, "zdb_xmax": {"type": "long"}, "zdb_xmin": {"type": "long"}, "derived_from": {"type": "integer"}, "treatment_id": {"type": "integer"}, "zdb_aborted_xids": {"type": "long"}, "description_markdown": {"type": "text", "copy_to": ["zdb_all"], "analyzer": "zdb_standard", "fielddata": true}, "operator_input_json_schema": {"type": "text", "copy_to": ["zdb_all"], "analyzer": "zdb_standard", "fielddata": true}}, "date_detection": false, "dynamic_templates": [{"strings": {"mapping": {"type": "keyword", "copy_to": "zdb_all", "normalizer": "lowercase", "ignore_above": 10922}, "match_mapping_type": "string"}}, {"dates_times": {"mapping": {"type": "keyword", "fields": {"date": {"type": "date", "format": "strict_date_optional_time||epoch_millis||HH:mm:ss.S||HH:mm:ss.SX||HH:mm:ss.SS||HH:mm:ss.SSX||HH:mm:ss.SSS||HH:mm:ss.SSSX||HH:mm:ss.SSSS||HH:mm:ss.SSSSX||HH:mm:ss.SSSSS||HH:mm:ss.SSSSSX||HH:mm:ss.SSSSSS||HH:mm:ss.SSSSSSX"}}, "copy_to": "zdb_all"}, "match_mapping_type": "date"}}, {"objects": {"mapping": {"type": "nested", "include_in_parent": true}, "match_mapping_type": "object"}}], "numeric_detection": false}}}
    
    {"122906.123633.124067.130198": {"mappings": {"properties": {"id": {"type": "integer"}, "name": {"type": "text", "copy_to": ["zdb_all"], "analyzer": "zdb_standard", "fielddata": true}, "entity": {"type": "text", "copy_to": ["zdb_all"], "analyzer": "zdb_standard", "fielddata": true}, "zdb_all": {"type": "text", "analyzer": "zdb_all_analyzer"}, "zdb_cmax": {"type": "integer"}, "zdb_cmin": {"type": "integer"}, "zdb_ctid": {"type": "long"}, "zdb_xmax": {"type": "long"}, "zdb_xmin": {"type": "long"}, "recipe_id": {"type": "integer"}, "created_at": {"type": "keyword", "fields": {"date": {"type": "date"}}, "copy_to": ["zdb_all"]}, "creator_id": {"type": "integer"}, "product_id": {"type": "integer"}, "id_in_domain": {"type": "text", "copy_to": ["zdb_all"], "analyzer": "zdb_standard", "fielddata": true}, "custom_inputs": {"type": "text", "copy_to": ["zdb_all"], "analyzer": "zdb_standard", "fielddata": true}, "customer_name": {"type": "text", "copy_to": ["zdb_all"], "analyzer": "zdb_standard", "fielddata": true}, "zdb_aborted_xids": {"type": "long"}, "received_order_id": {"type": "integer"}}, "date_detection": false, "dynamic_templates": [{"strings": {"mapping": {"type": "keyword", "copy_to": "zdb_all", "normalizer": "lowercase", "ignore_above": 10922}, "match_mapping_type": "string"}}, {"dates_times": {"mapping": {"type": "keyword", "fields": {"date": {"type": "date", "format": "strict_date_optional_time||epoch_millis||HH:mm:ss.S||HH:mm:ss.SX||HH:mm:ss.SS||HH:mm:ss.SSX||HH:mm:ss.SSS||HH:mm:ss.SSSX||HH:mm:ss.SSSS||HH:mm:ss.SSSSX||HH:mm:ss.SSSSS||HH:mm:ss.SSSSSX||HH:mm:ss.SSSSSS||HH:mm:ss.SSSSSSX"}}, "copy_to": "zdb_all"}, "match_mapping_type": "date"}}, {"objects": {"mapping": {"type": "nested", "include_in_parent": true}, "match_mapping_type": "object"}}], "numeric_detection": false}}}
    
    
    bug (merged) v3000.1.1 
    opened by bhalonen 33
  • Elasticsearch 5 support

    Elasticsearch 5 support

    Now that the awesome Lucene 6 release is out, Elasticsearch 5 is being prepared. So parallel to working on Postgres 9.5, ZomboDB support should be started in order to allow using these great releases together to provide the fastest full-text search for Postgres-based applications.

    opened by rleonhardt 30
  • #EXPAND with 2 tables

    #EXPAND with 2 tables

    When there are 2 tables in a view...using ES options to relate the indexes.... and the EXPAND field is in the secondary table... AND the initial criteria is on a field in the secondary table... the record set does not EXPAND.

    create schema test_expand;
    
    create table test_expand.data(pk_data bigint, data_family_group bigint, data_first_name text, constraint idx_test_expand_data_pkey primary key (pk_data));
    
    create table test_expand.var(pk_var bigint, var_family_group bigint, var_pets text, constraint idx_test_expand_var_pkey primary key (pk_var));
    
    insert into test_expand.data(pk_data, data_family_group, data_first_name) values(1,1,'mark'); 
    insert into test_expand.data(pk_data, data_family_group, data_first_name) values(2,1,'eric'); 
    insert into test_expand.data(pk_data, data_family_group, data_first_name) values(3,NULL,'terry'); 
    
    
    insert into test_expand.var(pk_var, var_family_group, var_pets) values(1,1,'dogs'); 
    insert into test_expand.var(pk_var, var_family_group, var_pets) values(2,1,'cats'); 
    insert into test_expand.var(pk_var, var_pets) values(3,'minions');
    
    CREATE INDEX es_test_expand_var ON test_expand.var USING zombodb (zdb('test_expand.var'::regclass, ctid), zdb(var.*)) 
    	WITH (url='http://###.##.##.##:####/', preference=_primary, shards='3', replicas='0');
    
    CREATE INDEX es_test_expand_data ON test_expand.data USING zombodb (zdb('test_expand.data'::regclass, ctid), zdb(data.*)) 
    	WITH (url='http://###.##.##.##:####/',options='pk_data = <var.es_test_expand_var>pk_var', preference=_primary, shards='3', replicas='0');
    	
    CREATE OR REPLACE VIEW test_expand.consolidated_record_view AS  SELECT data.pk_data
    	,data.data_family_group
    	,data.data_first_name
    	,var.var_family_group
    	,var.var_pets
        ,zdb('test_expand.data'::regclass, data.ctid) AS zdb
       FROM test_expand.data
         LEFT JOIN test_expand.var ON data.pk_data = var.pk_var;
    
    SELECT * FROM test_expand.consolidated_record_view;
    	 
    SELECT * FROM test_expand.consolidated_record_view where zdb==>'( (#expand<data_family_group=<this.index>data_family_group>( ( data_first_name = "MARK" ) AND )) )';
    SELECT * FROM test_expand.consolidated_record_view where zdb==>'( (#expand<var_family_group=<this.index>var_family_group>( ( data_first_name = "MARK" ) AND )) )';
    
    SELECT * FROM test_expand.consolidated_record_view where zdb==>'( (#expand<data_family_group=<this.index>data_family_group>( ( var_pets = "DOGS" ) AND )) )';
    SELECT * FROM test_expand.consolidated_record_view where zdb==>'( (#expand<var_family_group=<this.index>var_family_group>( ( var_pets = "DOGS" ) AND )) )';
    

    RESULTS:

    dev1_db=# SELECT * FROM test_expand.consolidated_record_view;
     pk_data | data_family_group | data_first_name | var_family_group | var_pets |  zdb
    ---------+-------------------+-----------------+------------------+----------+-------
           1 |                 1 | mark            |                1 | dogs     | (0,1)
           2 |                 1 | eric            |                1 | cats     | (0,2)
           3 |                   | terry           |                  | minions  | (0,3)
    (3 rows)
    
    dev1_db=# SELECT * FROM test_expand.consolidated_record_view where zdb==>'( (#expand<data_family_group=<this.index>data_family_group>( ( data_first_name = "MARK" ) AND )) )';
     pk_data | data_family_group | data_first_name | var_family_group | var_pets |  zdb
    ---------+-------------------+-----------------+------------------+----------+-------
           1 |                 1 | mark            |                1 | dogs     | (0,1)
           2 |                 1 | eric            |                1 | cats     | (0,2)
    (2 rows)
    
    dev1_db=# SELECT * FROM test_expand.consolidated_record_view where zdb==>'( (#expand<var_family_group=<this.index>var_family_group>( ( data_first_name = "MARK" ) AND )) )';
     pk_data | data_family_group | data_first_name | var_family_group | var_pets |  zdb
    ---------+-------------------+-----------------+------------------+----------+-------
           1 |                 1 | mark            |                1 | dogs     | (0,1)
           2 |                 1 | eric            |                1 | cats     | (0,2)
    (2 rows)
    
    dev1_db=# ^C
    dev1_db=# SELECT * FROM test_expand.consolidated_record_view where zdb==>'( (#expand<data_family_group=<this.index>data_family_group>( ( var_pets = "DOGS" ) AND )) )';
     pk_data | data_family_group | data_first_name | var_family_group | var_pets |  zdb
    ---------+-------------------+-----------------+------------------+----------+-------
           1 |                 1 | mark            |                1 | dogs     | (0,1)
           2 |                 1 | eric            |                1 | cats     | (0,2)
    (2 rows)
    
    dev1_db=# SELECT * FROM test_expand.consolidated_record_view where zdb==>'( (#expand<var_family_group=<this.index>var_family_group>( ( var_pets = "DOGS" ) AND )) )';
     pk_data | data_family_group | data_first_name | var_family_group | var_pets |  zdb
    ---------+-------------------+-----------------+------------------+----------+-------
           1 |                 1 | mark            |                1 | dogs     | (0,1)
    (1 row)
    
    bug v3.1.2 
    opened by MarkMatte 29
  • dsl subquery support

    dsl subquery support

    So I have this problem with accurate + fast counting. I have this table that keeps track of who is online right now ( basically just ids ) And this table is fairly small hundreds to thousands.

    And I have a table with all of the meta data about all of the users. Large table - Millions. This table has a zombodb index on it. When querying these tables individually, it is fine. I can apply a limit in ES and is generally not a problem.

    WHERE table ==> dsl.limit(
     100
    , dsl.bool(...)
    )
    

    If I need to join these two tables, I can't apply a limit through elastic search and it is returning everything in the index, on a table with 1M+ things, this is pretty slow, especailly when users first trigger the query, there are no filters to pass through to ES, so it basically just returns everything

    SELECT ou.*
    FROM online_users ou
    INNER JOIN user_attributes ua on
      ou.id = ua.user_id
    WHERE ua ==> '{"bool": {"must": []}'
    

    What I tried to do was pass IDs of the smaller table as subquery

    SELECT ou.*
    FROM online_users ou
    INNER JOIN user_attributes ua on ou.id = ua.user_id and ou.organization_id = $1
    WHERE ua ==>
     dsl.bool(
      dsl.must(
      	dsl.terms(
              'user_id'
            , (SELECT id FROM online_users where organzation_id = $1 )
           )
        )
      )
    

    but that doesn't work.

    documentation 
    opened by esatterwhite 26
  • Function zdb.tally ignores 'stem' parameter for date type

    Function zdb.tally ignores 'stem' parameter for date type

    ZomboDB version: 3000.1.7 Postgres version: 14 Elasticsearch version: 7.17.0

    Problem Description: New in version 3000.1.7 appears to provide zdb.tally for date field type in raw format that can simply be cast to date. However, the 'stem' parameter does not appear to work as expected.

    Error Message (if any): N/A

    Table Schema/Index Definition:

    CREATE TABLE termlist_issue (pkey serial8, date_combined date);
    CREATE INDEX idxtermlistissue ON termlist_issue USING zombodb ((termlist_issue.*)) WITH (url='http://172.20.40.142:8082/');
    
    INSERT INTO termlist_issue (date_combined) VALUES 
      ('2020-05-10'),('2021-08-01'),('2022-03-13'),('1999-12-31'),('1976-07-04');
    

    Reproduce steps:

    SELECT * FROM termlist_issue;
     pkey | date_combined
    ------+---------------
        1 | 2020-05-10
        2 | 2021-08-01
        3 | 2022-03-13
        4 | 1999-12-31
        5 | 1976-07-04
    (5 rows)
    
    -- Correct output, stem is '^.*'
    SELECT term::date as term, count, term AS exact_term
      FROM zdb.tally('termlist_issue'::regclass, 'date_combined', 'FALSE', '^.*', ''::zdbquery, 5000, 'term'::termsorderby); 
    
      term    | count |        exact_term
    ------------+-------+--------------------------
    1976-07-04 |     1 | 1976-07-04T00:00:00.000Z
    1999-12-31 |     1 | 1999-12-31T00:00:00.000Z
    2020-05-10 |     1 | 2020-05-10T00:00:00.000Z
    2021-08-01 |     1 | 2021-08-01T00:00:00.000Z
    2022-03-13 |     1 | 2022-03-13T00:00:00.000Z
    (5 rows)
    
    -- Stem is '^1.*' - expecting two rows
    SELECT term::date as term, count, term AS exact_term
     FROM zdb.tally('termlist_issue'::regclass, 'date_combined', 'FALSE', '^1.*', ''::zdbquery, 5000, 'term'::termsorderby);  
    
     term    | count |        exact_term
    ------------+-------+--------------------------
    1976-07-04 |     1 | 1976-07-04T00:00:00.000Z
    1999-12-31 |     1 | 1999-12-31T00:00:00.000Z
    2020-05-10 |     1 | 2020-05-10T00:00:00.000Z
    2021-08-01 |     1 | 2021-08-01T00:00:00.000Z
    2022-03-13 |     1 | 2022-03-13T00:00:00.000Z
    (5 rows)
    
    -- Attempt to use `.date` subfield just in case
    SELECT term::date as term, count, term AS exact_term
     FROM zdb.tally('termlist_issue'::regclass, 'date_combined.date', 'FALSE', '^1.*', ''::zdbquery, 5000, 'term'::termsorderby);
    
     term    | count |        exact_term
    ------------+-------+--------------------------
    1976-07-04 |     1 | 1976-07-04T00:00:00.000Z
    1999-12-31 |     1 | 1999-12-31T00:00:00.000Z
    2020-05-10 |     1 | 2020-05-10T00:00:00.000Z
    2021-08-01 |     1 | 2021-08-01T00:00:00.000Z
    2022-03-13 |     1 | 2022-03-13T00:00:00.000Z
    (5 rows) 
    
    opened by Shigoki 0
  • Nested Proximity Search Hit Highlighting Incorrect

    Nested Proximity Search Hit Highlighting Incorrect

    ZomboDB version: 3000.1.7 Postgres version: 14 Elasticsearch version: 7.17.0

    Problem Description: Searches using nested proximity search ie. '(foo w/7 (bar w/7 gumby))' retrieve rows correctly.

    The function zdb.highlight_document() does not appear to highlight terms according to the nested grouping boundaries.

    Error Message (if any): N/A

    Table Schema/Index Definition:

    CREATE TABLE highlight_issue AS SELECT 'Vallejo has produced film posters for numerous fantasy and action movies, including Knightriders (1981), Q (1982), and Barbarian Queen (1985). He has also illustrated posters for comedies, notably National Lampoons Vacation (1983), European Vacation (1985), Nothing But Trouble (1991) and Aqua Teen Hunger Force Colon Movie Film for Theaters (2007), co-created with Bell.[8]
    He created the 1978 Tarzan calendar.[citation needed] His sea serpent paintings hang in the queue of Loch Ness Monster, a rollercoaster at Busch Gardens Williamsburg.' AS t;
    
    CREATE INDEX idxhighlightissue ON highlight_issue USING zombodb ((highlight_issue.*)) WITH (url='http://<es url>/');
    

    Recreate:

    SELECT * FROM highlight_issue WHERE highlight_issue ==> 't: ( "film" w/7 ("movies" w/7 "barbarian"))';
    t
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Vallejo has produced film posters for numerous fantasy and action movies, including Knightriders (1981), Q (1982), and Barbarian Queen (1985). He has also illustrated posters for comedies, notably National Lampoons Vacation (1983), European Vacation (1985), Nothing But Trouble (1991) and Aqua Teen Hunger Force Colon Movie Film for Theaters (2007), co-created with Bell.[8]+
    He created the 1978 Tarzan calendar.[citation needed] His sea serpent paintings hang in the queue of Loch Ness Monster, a rollercoaster at Busch Gardens Williamsburg.
    (1 row)
    
    -- The term "film" is not within 7 of "barbarian" but should highlight as part of nested group
    WITH highlights AS MATERIALIZED (SELECT (
         zdb.highlight_document('highlight_issue'::regclass, json_build_object('t',t), 't: ( "film" w/7 ("movies" w/7 "barbarian"))'::TEXT)).* FROM highlight_issue)
    SELECT * FROM highlights;
    
    field_name | array_index |  term  |    type    | position | start_offset | end_offset |               query_clause
    ------------+-------------+--------+------------+----------+--------------+------------+-------------------------------------------
    t          |           0 | film   | <ALPHANUM> |        4 |           21 |         25 | t:("film" W/7 ("movies" W/7 "barbarian"))
    t          |           0 | movies | <ALPHANUM> |       11 |           66 |         72 | t:("film" W/7 ("movies" W/7 "barbarian"))
    (2 rows)
    
    -- Adjusted so that term "film" is within 25 of nested group and all three terms are highlighted
    WITH highlights AS MATERIALIZED (SELECT (
         zdb.highlight_document('highlight_issue'::regclass, json_build_object('t',t), 't: ( "film" w/25 ("movies" w/7 "barbarian"))'::TEXT)).* FROM highlight_issue)
    SELECT * FROM highlights;
    
    field_name | array_index |   term    |    type    | position | start_offset | end_offset |                query_clause
    ------------+-------------+-----------+------------+----------+--------------+------------+--------------------------------------------
    t          |           0 | film      | <ALPHANUM> |        4 |           21 |         25 | t:("film" W/25 ("movies" W/7 "barbarian"))
    t          |           0 | movies    | <ALPHANUM> |       11 |           66 |         72 | t:("film" W/25 ("movies" W/7 "barbarian"))
    t          |           0 | barbarian | <ALPHANUM> |       18 |          119 |        128 | t:("film" W/25 ("movies" W/7 "barbarian"))
    (3 rows)
    
    opened by Shigoki 0
  • support dense-vector

    support dense-vector

    It would be nice to support storage and query of postgresql vectors as dense-vectors in ES.

    Dense vector is very useful in many applications like image search, machine-learning, or recommendation systems.

    Currently, postgresql can be used to store float point vectors, but it can not perform a kNN query based on dot_product, l2_norm or cosine distances.

    opened by nick008a 0
  • Implement a method to estimate the number of rows in the index

    Implement a method to estimate the number of rows in the index

    ZomboDB version: 3000.0.12 Postgres version: 14.x Elasticsearch version: 8.3

    This issue has already been mentioned in a neighboring issue (by my mistake). Again:

    So, I need extremely fast mechanism (much faster than zdb.count(...) function) to estimate whether a query in ZDB will find more than 10'000 rows or not (don't be scared of this constant - it's inherent in the ES, I'll tell you about it next). Directly in ES this is quite simple: you need to send _search query with size equal to 0. Like this:

    curl -X GET "localhost:9200/16881.2200.527888473.530970621/_search?pretty" -H 'Content-Type: application/json' -d'
    {
        "size" : 0,
        "query":
        {
            "match":
            {
                "some_field":
                {
                    "query": "el paso 5",
                    "fuzzy_transpositions": false,
                    "auto_generate_synonyms_phrase_query": false,
                    "operator": "and"
                }
            }
        }
    }
    '
    

    Example response (insignificant data removed):

    {
      "hits" : {
        "total" : {
          "value" : 2653
        }
      }
    }
    

    In short, ES in this case returns either the exact count in hits.total.value or a constant of 10'000 if the number of rows found >= 10'000. Basically, it's like _count (aka zdb.count(...)), only the counter stops when it finds 10'000 rows. The 10'000 constant is actually the default value for the track_total_hits parameter (described here). Thus, the main difference from _count is that this method of estimating the number of rows works immediately on any number of rows in store.

    At the same time, if I try to make a query in ZDB with size equal to 0, we come across this behavior:

    Some(limit) if limit == 0 => {
        // with a limit of zero, we can avoid going to Elasticsearch at all
        // and just return a (mostly) None'd response
        return Ok(ElasticsearchSearchResponse {
            elasticsearch: None,
            limit: Some(0),
            offset: None,
            track_scores,
            should_sort_hits,
            scroll_id: None,
            shards: None,
            hits: None,
            fast_terms: None,
        });
    }
    

    Consequently, the problem:

    1. How can I query ES via ZDB with size 0 ?
    2. How do I get the value of hits.total.value from the request result?
    3. How to manage track_total_hits parameter (to change value 10'000 to any other)? I couldn't find anything about it in ZDB.

    By the way, this whole problem can be solved as follows: perhaps a function should be added to aggregate functions that returns the estimated number of rows in the request (according to the method I described) ?

    It seems to me that it would be logical and useful for many to have a special function for this. Especially for those who work with a very large collection of data. The definition of the function could be this:

    FUNCTION zdb.estimate_count(
    	index regclass,
    	track_total_hits integer DEFAULT 10000,
    	query zdbquery
          ) 
    RETURNS integer
    
    enhancement 
    opened by hyperion-cs 2
  • Multi table search in zombo

    Multi table search in zombo

    Zombo used to support multi table search https://github.com/zombodb/zombodb/pull/70

    How would you recommend going about doing something similar in the current version?

    Thanks.

    enhancement 
    opened by bhalonen 4
Releases(v3000.1.8)
Owner
ZomboDB
Making Postgres and Elasticsearch work together like it's 2020
ZomboDB
An Elasticsearch REST API client for Rust

elastic elastic is an efficient, modular API client for Elasticsearch written in Rust. The API is targeting the Elastic Stack 7.x. elastic provides st

null 249 Oct 18, 2022
📺 Netflix in Rust/ React-TS/ NextJS, Actix-Web, Async Apollo-GraphQl, Cassandra/ ScyllaDB, Async SQLx, Kafka, Redis, Tokio, Actix, Elasticsearch, Influxdb Iox, Tensorflow, AWS

Fullstack Movie Streaming Platform ?? Netflix in RUST/ NextJS, Actix-Web, Async Apollo-GraphQl, Cassandra/ ScyllaDB, Async SQLx, Spark, Kafka, Redis,

null 34 Apr 17, 2023
Open Data Access Layer that connect the whole world together

OpenDAL Open Data Access Layer that connect the whole world together. Status OpenDAL is in alpha stage and has been early adopted by databend. Welcome

Datafuse Labs 302 Jan 4, 2023
Distributed, version controlled, SQL database with cryptographically verifiable storage, queries and results. Think git for postgres.

SDB - SignatureDB Distributed, version controlled, SQL database with cryptographically verifiable storage, queries and results. Think git for postgres

Fremantle Industries 5 Apr 26, 2022
Rust library and daemon for easily starting postgres databases per-test without Docker

pgtemp pgtemp is a Rust library and cli tool that allows you to easily create temporary PostgreSQL servers for testing without using Docker. The pgtem

Harry Stern 165 Mar 22, 2024
Manage Redshift/Postgres privileges in GitOps style written in Rust

grant-rs An open-source project that aims to manage Postgres/Redshift database roles and privileges in GitOps style, written in Rust. Home | Documenta

Duyet Le 13 Nov 23, 2022
cogo rust coroutine database driver (Mysql,Postgres,Sqlite)

cdbc Coroutine Database driver Connectivity.based on cogo High concurrency,based on coroutine No Future<'q,Output=*>,No async fn, No .await , no Poll*

co-rs 10 Nov 13, 2022
rust-postgres support library for the r2d2 connection pool

r2d2-postgres Documentation rust-postgres support library for the r2d2 connection pool. Example use std::thread; use r2d2_postgres::{postgres::NoTls,

Steven Fackler 128 Dec 26, 2022
A Rust application that inserts Discogs data dumps into Postgres

Discogs-load A Rust application that inserts Discogs data dumps into Postgres. Discogs-load uses a simple state machine with the quick-xml Rust librar

Dylan 7 Dec 9, 2022
An easy-to-use, zero-downtime schema migration tool for Postgres

Reshape is an easy-to-use, zero-downtime schema migration tool for Postgres. It automatically handles complex migrations that would normally require downtime or manual multi-step changes.

Fabian Lindfors 1.4k Dec 25, 2022
postgres-ical - a PostgreSQL extension that adds features related to parsing RFC-5545 « iCalendar » data from within a PostgreSQL database

postgres-ical - a PostgreSQL extension that adds features related to parsing RFC-5545 « iCalendar » data from within a PostgreSQL database

Edgar Onghena 1 Feb 23, 2022
Command-line tool to make Rust source code entities from Postgres tables.

pg2rs Command-line tool to make Rust source code entities from Postgres tables. Generates: enums structs which can be then used like mod structs; use

Stanislav 10 May 20, 2022
The simplest implementation of LLM-backed vector search on Postgres.

pg_vectorize under development The simplest implementation of LLM-backed vector search on Postgres. -- initialize an existing table select vectorize.i

Tembo 5 Jul 25, 2023
Postgres Foreign Data Wrapper for Clerk.com API

Pre-requisites Postgres-15 Rust pgrx Getting Started To run the program locally, clone the repository git clone https://github.com/tembo-io/clerk_fdw.

Tembo 3 Aug 22, 2023
A Pub/Sub library for Rust backed by Postgres

Unisub Unisub is a Pub/Sub library for Rust, using Postgres as the backend. It offers a convenient way to publish and subscribe to messages across dif

Nick Rempel 12 Oct 6, 2023
Canary - Distributed systems library for making communications through the network easier, while keeping minimalism and flexibility.

Canary Canary is a distributed systems and communications framework, focusing on minimalism, ease of use and performance. Development of Canary utiliz

null 28 Nov 3, 2022
Tool to help you select inputs for making bitcoin transactions.

BDK Coin Selection bdk_coin_select is a tool to help you select inputs for making Bitcoin (ticker: BTC) transactions. It's got zero dependencies so yo

Bitcoin Dev Kit 5 Dec 5, 2023
The core innovation of EinsteinDB is its merge-append-commit protocol

The core innovation of EinsteinDB is its merge-append-commit protocol. The protocol is friendly and does not require more than two parties to agree on the order of events. It is also relativistic and can tolerate clock and network lag and drag.

EinsteinDB 1 Jan 23, 2022
CLI tool to work with Sled key-value databases.

sledtool CLI tool to work with Sled key-value databases. $ sledtool --help Usage: sledtool <dbpath> <command> [<args>] CLI tool to work with Sled da

Vitaly Shukela 27 Sep 26, 2022