Materialize simplifies application development with streaming data. Incrementally-updated materialized views - in PostgreSQL and in real time. Materialize is powered by Timely Dataflow.

Materialize, Inc.

Last update: Jan 8, 2023

Related tags

Database rust distributed-systems streaming sql database kafka postgresql-dialect

Overview

Materialize is a streaming database for real-time applications.

Get started

Check out our getting started guide.

About

Materialize lets you ask questions of your live data, which it answers and then maintains for you as your data continue to change. The moment you need a refreshed answer, you can get it in milliseconds. Materialize is designed to help you interactively explore your streaming data, perform data warehousing analytics against live relational data, or just increase the freshness and reduce the load of your dashboard and monitoring tasks.

Materialize focuses on providing correct and consistent answers with minimal latency. It does not ask you to accept either approximate answers or eventual consistency. Whenever Materialize answers a query, that answer is the correct result on some specific (and recent) version of your data. Materialize does all of this by recasting your SQL92 queries as dataflows, which can react efficiently to changes in your data as they happen. Materialize is powered by timely dataflow, which connects the times at which your inputs change with the times of answers reported back to you.

We support a large fraction of PostgreSQL, and are actively working on supporting more builtin PostgreSQL functions. Please file an issue if there's something that you expected to work that didn't!

Get data in

Materialize reads Avro, Protobuf, JSON, and newline-delimited text. Need something else? Just ask.

Materialize can read data from Kafka topics, Kinesis streams (in preview), or tail local files.

Transform, manipulate, and read your data

Once you've got the data in, define views and perform reads via the PostgreSQL protocol. Use your favorite PostgreSQL CLI, including the psql you probably already have on your system.

Materialize supports a comprehensive variety of SQL features, all using the PostgreSQL dialect and protocol:

Joins, Joins, Joins! Materialize supports multi-column join conditions, multi-way joins, self-joins, cross-joins, inner joins, outer joins, etc.
Delta-joins avoid intermediate state blowup compared to systems that can only plan nested binary joins - tested on joins of up to 64 relations.
Support for subqueries. Materialize's SQL optimizer performs subquery decorrelation out-of-the-box, avoiding the need to manually rewrite subqueries into joins.
Materialize supports streams that contain CDC data (currently supporting the Debezium format). Materialize can incrementally maintain views in the presence of arbitrary inserts, updates, and deletes. No asterisks.
All the aggregations. GROUP BY , MIN, MAX, COUNT, SUM, STDDEV, HAVING, etc.
ORDER BY
LIMIT
DISTINCT
JSON support in the PostgreSQL dialect including operators and functions like ->, ->>, @>, ?, jsonb_array_element, jsonb_each. Materialize automatically plans lateral joins for efficient jsonb_each support.
Nest views on views on views!
Multiple views that have overlapping subplans can share underlying indices for space and compute efficiency, so just declaratively define what you want, and we'll worry about how to efficiently maintain them.

Just show us what it can do!

Here's an example join query that works fine in Materialize, TPC-H query 15:

-- Views define commonly reused subqueries.
CREATE VIEW revenue (supplier_no, total_revenue) AS
    SELECT
        l_suppkey,
        SUM(l_extendedprice * (1 - l_discount))
    FROM
        lineitem
    WHERE
        l_shipdate >= DATE '1996-01-01'
        AND l_shipdate < DATE '1996-01-01' + INTERVAL '3' month
    GROUP BY
        l_suppkey;

-- Materialized views are maintained automatically, and can depend on non-materialized views.
CREATE MATERIALIZED VIEW tpch_q15 AS
  SELECT
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
FROM
    supplier,
    revenue
WHERE
    s_suppkey = supplier_no
    AND total_revenue = (
        SELECT
            max(total_revenue)
        FROM
            revenue
    )
ORDER BY
    s_suppkey

Stream inserts, updates, and deletes on the underlying tables (lineitem and supplier), and Materialize keeps the materialized view incrementally updated. You can type SELECT * FROM tpch_q15 and expect to see the current results immediately!

Get data out

Pull based: Use any PostgreSQL-compatible driver in any language/environment to make SELECT queries against your views. Tell them they're talking to a PostgreSQL database, they don't ever need to know otherwise.

Push based: Or configure Materialize to stream results to a Kafka topic as soon as the views change.

If you want to use an ORM, chat with us. They're surprisingly tricky.

Documentation

Check out our documentation.

License

Materialize is source-available and licensed under the BSL 1.1, converting to the open-source Apache 2.0 license after 4 years. As stated in the BSL, Materialize is free forever on a single node.

Materialize is also available as a paid cloud service with additional features such as high availability via multi-active replication.

How does it work?

Materialize is built on top of differential dataflow and timely dataflow, and builds on a decade of cutting-edge stream processing research.

For developers

Materialize is written entirely in Rust.

Developers can find docs at doc/developer, and Rust API documentation is hosted at https://mtrlz.dev/api/rust/. The Materialize development roadmap is divided up into roughly month-long milestones, and managed in GitHub.

Contributions are welcome. Prospective code contributors might find the good first issue tag useful. We value all contributions equally, but bug reports are more equal.

Credits

Materialize is lovingly crafted by a team of developers and one bot. Join us.

Comments

Error if subqueries contain more than one result
This PR demonstrates (perhaps incorrectly) how subqueries can be checked for multiple results, and yield errors if more than one record is returned for any key. I haven't worked in this code for a while, so it is possible that the random numbers I've picked are no longer the right ones, and there are better ways to do things.

The gist is that before returning results in a SELECT subquery, we also count the number of occurrences of each of the leading attributes. Any counts greater than one result in an error. The error could be enriched to present both the keys and the count, with some more attention.

This current makes MZ unrunnable, with the error

thread 'main' panicked at 'internal error: failed to load bootstrap view: pg_type error: more than one record produced in subquery', src/coord/src/catalog.rs:633:29

This change is
opened by frankmcsherry 77
persist: hook up an initial implementation of compaction
See the comments in trace.rs for an overview of how this is set up.

Compaction tests can quickly become a maintenance burden as it's difficult to balance invariant verification against needing lots of diffs when small compaction heuristics change. Invest early in datadriven tests (the same style and underlying library as testdrive).

In production initially disabled by default, but turned on by the MZ_PERSIST_COMPACTION_ENABLED env var for easy testing. In test, enabled by default.

Motivation

This PR adds a known-desirable feature.

Tips for reviewer

I'd recommend taking a quick scan of the entire PR's diff to get a high level picture (maybe start with the comment in trace.rs) and then going back commit by commit.

Testing

[x] This PR has adequate test coverage / QA involvement has been duly considered.

Release notes

This PR includes the following user-facing behavior changes:

N/A
opened by danhhz 45
Investigating fast mode
This is just a list of ideas to try out.

Try to simplify repro of slow mode:

[x] is peek load required? change closed-loop peeks to peek every 30s

[x] is write load required? turn off input to mysql

[ ] does it matter which queries we use? reduce to single query

Measure:

[ ] time peeks arrive on the wire

[ ] time peeks are read from the wire by protocol.rs

[ ] time peeks arrive on worker

[ ] time peeks are dispatched by the worker

[ ] time response is received by protocol.rs

[ ] time response arrives on the wire
opened by jamii 43
Materialize is stuck after some period with `reuse_topic=true`
What version of Materialize are you using?

v0.9.3

How did you install Materialize?

Docker image

What was the issue?

We are using Materialize for calculation states of entities depending on counters. Materialize gets ready counters from Kafka, joins with settings and calculates final state (exceeded or not). States results go into few sinks. There are 19 materialized views, 5 sources and 6 sinks (with reuse_topic=true) in Materialize. Materialize configured with -w 8 --disable-telemetry --experimental command.

So after some time (3 hours for -w 8), Materialize is stuck. We see in the logs only:

ERROR materialized::mux: error handling connection in http server: connection closed before message completed

psql can't connect to Materialize. Prometheus metrics are not available after this moment too.

We have noticed strange coordinator behavior in the Prometheus metrics:

As you can see, coordinator's CPU consumption is growing always. Growth dynamics depend on workers quantity. After it gets 100%, Materialize is stuck.

We have tried:

Restarting doesn't help (same error in the logs)

-w 16 doesn't help

--introspection-frequency 300s doesn't help

--introspection-frequency off doesn't help

Clearing all Kafka topics completely is one way to get Materialize to work.

Is the issue reproducible? If so, please provide reproduction instructions.

It's reproducible for our configuration. Wa have Kafka cluster with permanent counters streaming values.

Please attach any applicable log files.

ERROR materialized::mux: error handling connection in http server: connection closed before message completed
C-bug P1
opened by krpn 36
Document our release process

This is a proposal based on the conversations that we've had, and placed in github so that in the future, when we change it, we will be able to refer back to the reasons that we made the various decisions.

opened by quodlibetor 36
Decorrelation

This isn't actually done yet, but there are no regressions and I would like to merge the bits that are changing a lot in master (sql/lib.rs) and finish working on the remainder (expr/correlated.rs) with less merge hell.

Also the extra transforms apparently might be useful elsewhere.

opened by jamii 35
persist,sources: allow compaction on persisted sources
The idea behind this is:

keep track of what compaction the coordinator allows in RenderState

received AllowCompaction commands update that compaction frontier

pass a Rc<RefCell<>>> to that frontier to an operator that also looks at its input frontier and allows compaction to the combination (aka "minimum") of those two

An alternative solution would be to invoke allow_compaction() upon receipt of the AllowCompaction message. This would, however, require to keep the involved persistence write handles in RenderState, and the number and type of those handles will likely be different for different combinations of physical source (think Kafka etc.) and envelopes.

With the chosen solution we only need to update the allowed frontier centrally and the rendering code that then allows compaction can be tailored to the rendered source.

Fixes #9508

Tips/Notes for reviewer

I didn't yet add tests, but I wanted to get the basic idea out for review. If you agree, I'd add unit tests.

Also, merging this PR by itself would be incorrect, because we currently don't consider the since when restarting sources. This would surface when merging this PR together with https://github.com/MaterializeInc/materialize/pull/9659, which adds a check that persistence can serve the required as_of/since. The solution for this is https://github.com/MaterializeInc/materialize/pull/9656, which calculates an as_of/since when (re-)starting.

Checklist

I would add unit tests, and trust the combination of this PR plus the aforementioned PRs that add as_of checks together with Philip's persistence tests to cover this change.

[ ] This PR has adequate test coverage / QA involvement has been duly considered.

[N/A] This PR adds a release note for any user-facing behavior changes.

This change is
opened by aljoscha 34
[Epic] Postgres source could materialize replicated tables directly
Feature request

At the moment, a postgres source gets created with a homogenous collection of records containing table identifiers and packed up row payloads. The CREATE VIEWS command can create views that unpack them, but they are then inefficient to read (re-reading, filtering, and decoding the data for each use). It would seem that common users would then materialize these views back to persist, where it also seems the postgres source could have written them in the first place.

I propose that a "postgres source" (idk if is still "a" source) should performed the equivalent of today's CREATE SOURCE, CREATE VIEWS, and the multiple CREATE MATERIALIZED VIEW commands. This is very likely what the user wants and expects, but also removes various footguns (not doing the above) and bits of magic (CREATE VIEWS).

Tasks

[x] https://github.com/MaterializeInc/materialize/pull/15159 (catalog and storage pipeline infrastructure)

[x] https://github.com/MaterializeInc/materialize/pull/15190 [in review]

[x] https://github.com/MaterializeInc/materialize/pull/15247

[ ] Fine-grained control over column types [pending]

[ ] Update documentation [pending]

cc: @petrosagg, @benesch
C-feature A-STORAGE Roadmap-P2
opened by frankmcsherry 32

BI Demo doesn't work out of the box

What version of Materialize are you using?

Github commit a63d599b

How did you install Materialize?

[X] Docker image

Followed the instructions at https://materialize.io/docs/demos/business-intelligence/

What was the issue?

schema-registry isn't brought up in time:

$ ./mzcompose --mz-quiet -f ./mzcompose.yml up -d connector-mysql
Creating chbench_zookeeper_1 ... done
Creating chbench_kafka_1     ... done
Creating chbench_schema-registry_1 ... done
Creating chbench_connect_1         ... done
Creating chbench_control-center_1  ... done
Creating chbench_connector-mysql_1 ... done
C> waiting for connect:8083 119 118 117 116 115 114 113 112 111 110 109 107 106 105 104 103 102 101 success!
C> waiting for schema-registry:8081 29 28 27 26 25 24 22 21 20 18 17 16 15 14 12 11 9 7 6 5 3 1 0C> Failed! Unable to connect to schema-registry:8081

Wait for a minute, then re-run the same command:

$ ./mzcompose --mz-quiet -f ./mzcompose.yml run --service-ports peeker --only-initialize -c /etc/peeker/materialized-sources.toml -q loadtest
[2020-09-15T18:19:54Z INFO  peeker] startup 2020-09-15 18:19:54.740193700 UTC
[2020-09-15T18:19:54Z INFO  peeker] Allowing chbench to warm up for 0 seconds at 2020-09-15 18:19:54.743453 UTC
[2020-09-15T18:19:54Z INFO  peeker] Done warming up at 2020-09-15 18:19:54.743525900 UTC
[2020-09-15T18:19:54Z INFO  peeker] serving prometheus metrics on port 16875
[2020-09-15T18:19:55Z WARN  peeker] error trying to create source customer: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.customer-value' from registry: subject not found
[2020-09-15T18:19:55Z WARN  peeker] error trying to create source item: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.item-value' from registry: subject not found
[2020-09-15T18:19:55Z WARN  peeker] error trying to create source nation: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.nation-value' from registry: subject not found
[2020-09-15T18:19:55Z WARN  peeker] error trying to create source neworder: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.neworder-value' from registry: subject not found
[2020-09-15T18:19:55Z WARN  peeker] error trying to create source order: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.order-value' from registry: subject not found
[2020-09-15T18:19:55Z WARN  peeker] error trying to create source orderline: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.orderline-value' from registry: subject not found
[2020-09-15T18:19:56Z WARN  peeker] error trying to create source stock: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.stock-value' from registry: subject not found
[2020-09-15T18:19:56Z WARN  peeker] error trying to create source supplier: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.supplier-value' from registry: subject not found
[2020-09-15T18:19:56Z WARN  peeker] error trying to create source district: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.district-value' from registry: subject not found
[2020-09-15T18:19:56Z WARN  peeker] error trying to create source nation: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.nation-value' from registry: subject not found
[2020-09-15T18:19:56Z WARN  peeker] error trying to create source region: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.region-value' from registry: subject not found
[2020-09-15T18:19:56Z WARN  peeker] error trying to create source supplier: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.supplier-value' from registry: subject not found
[2020-09-15T18:19:56Z WARN  peeker] error trying to create source warehouse: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.warehouse-value' from registry: subject not found
[2020-09-15T18:19:59Z WARN  peeker] error trying to create source customer: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.customer-value' from registry: subject not found
[2020-09-15T18:19:59Z WARN  peeker] error trying to create source item: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.item-value' from registry: subject not found
[2020-09-15T18:20:00Z WARN  peeker] error trying to create source nation: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.nation-value' from registry: subject not found
[2020-09-15T18:20:00Z WARN  peeker] error trying to create source neworder: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.neworder-value' from registry: subject not found
[2020-09-15T18:20:00Z WARN  peeker] error trying to create source order: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.order-value' from registry: subject not found
[2020-09-15T18:20:01Z WARN  peeker] error trying to create source orderline: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.orderline-value' from registry: subject not found
[2020-09-15T18:20:01Z WARN  peeker] error trying to create source stock: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.stock-value' from registry: subject not found
[2020-09-15T18:20:01Z WARN  peeker] error trying to create source supplier: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.supplier-value' from registry: subject not found
[2020-09-15T18:20:02Z WARN  peeker] error trying to create source district: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.district-value' from registry: subject not found
[2020-09-15T18:20:03Z WARN  peeker] error trying to create source nation: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.nation-value' from registry: subject not found
[2020-09-15T18:20:05Z WARN  peeker] error trying to create source region: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.region-value' from registry: subject not found
[2020-09-15T18:20:08Z WARN  peeker] error trying to create source supplier: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.supplier-value' from registry: subject not found
[2020-09-15T18:20:10Z WARN  peeker] error trying to create source warehouse: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.warehouse-value' from registry: subject not found
[2020-09-15T18:20:16Z WARN  peeker] error trying to create source customer: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.customer-value' from registry: subject not found
[2020-09-15T18:20:19Z WARN  peeker] error trying to create source item: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.item-value' from registry: subject not found
[2020-09-15T18:20:23Z WARN  peeker] error trying to create source nation: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.nation-value' from registry: subject not found
[2020-09-15T18:20:27Z WARN  peeker] error trying to create source neworder: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.neworder-value' from registry: subject not found
[2020-09-15T18:20:30Z WARN  peeker] error trying to create source order: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.order-value' from registry: subject not found
[2020-09-15T18:20:32Z WARN  peeker] error trying to create source orderline: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.orderline-value' from registry: subject not found
[2020-09-15T18:20:36Z WARN  peeker] error trying to create source stock: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.stock-value' from registry: subject not found
[2020-09-15T18:20:41Z WARN  peeker] error trying to create source supplier: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.supplier-value' from registry: subject not found
[2020-09-15T18:20:44Z WARN  peeker] error trying to create source district: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.district-value' from registry: subject not found
[2020-09-15T18:21:04Z WARN  peeker] error trying to create source nation: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.nation-value' from registry: subject not found
[2020-09-15T18:21:12Z WARN  peeker] error trying to create source region: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.region-value' from registry: subject not found
[2020-09-15T18:21:21Z WARN  peeker] error trying to create source supplier: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.supplier-value' from registry: subject not found
[2020-09-15T18:21:37Z WARN  peeker] error trying to create source warehouse: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.warehouse-value' from registry: subject not found
[2020-09-15T18:21:56Z WARN  peeker] error trying to create source customer: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.customer-value' from registry: subject not found
[2020-09-15T18:22:08Z WARN  peeker] error trying to create source item: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.item-value' from registry: subject not found
[2020-09-15T18:22:27Z WARN  peeker] error trying to create source nation: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.nation-value' from registry: subject not found
[2020-09-15T18:23:16Z WARN  peeker] error trying to create source neworder: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.neworder-value' from registry: subject not found
[2020-09-15T18:24:03Z WARN  peeker] error trying to create source order: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.order-value' from registry: subject not found
[2020-09-15T18:24:30Z WARN  peeker] error trying to create source orderline: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.orderline-value' from registry: subject not found
[2020-09-15T18:25:07Z WARN  peeker] error trying to create source stock: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.stock-value' from registry: subject not found
[2020-09-15T18:25:20Z WARN  peeker] error trying to create source supplier: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.supplier-value' from registry: subject not found
[2020-09-15T18:25:33Z WARN  peeker] error trying to create source district: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.district-value' from registry: subject not found
[2020-09-15T18:25:44Z WARN  peeker] error trying to create source nation: db error: ERROR: fetching latest schema for subject 'debezium.tpcch.nation-value' from registry: subject not found

Note that I also see no sources:

[11:15][awang@MacBook-Pro-2:chbench (main ✓)]$ ./mzcompose run cli
==> Collecting mzbuild dependencies
...
mz> show sources;                                                                               
+-----------+
| SOURCES   |
|-----------|
+-----------+

Is the issue reproducible? If so, please provide reproduction instructions.

Please attach any applicable log files.

$ docker image ls | grep connector-mysql
chbench_connector-mysql                     latest                                     40fc4f653ac7        3 hours ago         658MB

$ docker-compose --version && docker --version
docker-compose version 1.26.2, build eefe0d31
Docker version 19.03.12, build 48a66213fe

This is on a 2019 Macbook Pro, 1.4 GHz Quad-Core Intel Core i5, 16 GB RAM.

C-bug

opened by awang 31

Join ordering based on input filters
This PR strives to order join inputs in Differential joins and inside each Delta path in a way that inputs that have more selective filters come first. This way, intermediate result sizes can get reduced sooner, so later join steps do less work. This means we have better chances that a filter is applied before some bad stuff (cross joins, new arrangements).

The PR also tweaks Differential ordering to make a smarter choice if the max-minning finds multiple equally good candidates.

(Note that this does no reordering between delta paths.)

As expected from @vmarcos' investigations, this PR has a huge impact in some cases: Speeds up TPC-H Q07 by 10.5x, and Q21 by 8x. This results in a 19% speedup over the sum of all TPC-H queries. (Note that for the speedup in Q07, https://github.com/MaterializeInc/materialize/pull/14725 is also needed.)

(I'm not considering predicates that span over multiple inputs, e.g. a+b=5 where a and b come from different inputs. Handling these could be a possible follow-up. For now, it seemed to me that handling these would be much more complicated and they occur less frequently, so the cost-benefit ratio of working on them is not so good.)

Motivation

This PR adds a known-desirable feature: https://github.com/MaterializeInc/materialize/issues/14512

Tips for reviewer

First of all, don't be alarmed by the large diff; 90+% of it are just json plan changes (edit: I think most of this will go away while addressing comments). Also, many source files have some minor changes just to treat the new field on MirRelationExpr::Join.

The main change is in join_implementation.rs:

I added the characteristics of the input filters as a new field on Characteristics. This field is used by Orderer through the Ord of Characteristics.

I also tweaked the code around the max-minning in the Differential ordering, see code comments there.

I needed to add a new field in MirRelationExpr::Join for recording the input filter characteristics. The problem that this new field is solving is that JoinImplementation lifts away some of the input filters on its first run, so subsequent runs can't see the filters on the inputs anymore. @frankmcsherry, Could you please check that this is ok as a temporary solution? What we could do in the long run, is to lift all filters up all the way, and make them available everywhere as attributes.

I didn't yet add explain printing for the new field:

The old explain code will soon be deleted.

I wasn't sure how to add it to the new explain, as the join format there is still in flux.

Note: I'm not looking at the equivalences of the MirRelationExpr::Join, because at this point in the pipeline, equivalences don't contain predicates that can be evaluated in the context of a single input, because PredicatePushdown already pushed these down as filters.

Checklist

[x] This PR has adequate test coverage / QA involvement has been duly considered.

I ran TPC-H, see above.

We have many join plans in slt tests. I inspected most of the changed plans manually, and didn't find any obviously harmful changes, but found many beneficial changes.

[ ] This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.

[x] This PR includes the following user-facing behavior changes:

Improve join orders for joins that have filters applied to their inputs, leading to an order of magnitude performance improvement in some cases with highly selective filters.

A-optimization A-COMPUTE
opened by ggevay 30
Control plane consistency
Metastore currently stores its data in ZooKeeper, which is... not great. The psuedo-filesystem that ZooKeeper chooses to expose makes life difficult when you want a strictly ordered stream of events across the whole filesystem, not just across one file or one directory. We're suffering at the moment because PEEK is not properly sequenced with CREATE DATAFLOW*, and therefore differential often sees a PEEK foo command come through before it sees the corresponding CREATE DATAFLOW foo ... command. Not to mention that we globally sequence things twice! ZooKeeper does one round of consensus, and then we do another round of exchanging with differential's sequencer.

The question, then, is what would be a suitable replacement for ZooKeeper? @frankmcsherry and @jamii have both proposed using a single-partition Kafka topic. For reasons I'll describe below, I'm not sold on that particular solution. Let me start by listing out what I think we need out of our metadata storage layer.

Consistency. We need to make sure that a dataflow D cannot be deleted while there exists another dataflow that depends on D. This effectively means that the metadata layer needs to support transactions, or that there is some means of electing a leader who is the sole arbiter of metadata updates.

Sequencing. The metadata storage should be able to produce an ordered log of all operations, including read commands, like PEEK, and write commands, like CREATE DATAFLOW.

Durability. If the server goes down, rebooting materialized should, at the very least, reinstall all dataflows that were previously installed. This is a prerequisite for durably storing computation state. Ideally, the metadata layer would be selectively durable. It doesn't need to store old versions of dataflows, for example, nor does it need to remember read-only commands.

A single-partition Kafka topic gives us sequencing and durability. @jamii mentioned that clever use of keys might even allow Kafka to garbage-collect old versions of dataflows and all but the last PEEK command, allowing Kafka to meet the stricter goal of selective durability.

The problem with a single-partition Kafka topic, AFAICT, is maintaining consistency. You can either validate consistency on the way in, or the way out. I think performing validation on the way in is totally doomed, because Kafka has no write-only-if-nothing-changed primitive. Put another way, let's say that two users simultaneously execute CREATE DATAFLOW bar SELECT * FROM foo; and DROP DATAFLOW bar;. Only one of these commands can succeed. Either the CREATE wins and the DROP fails, because bar is now depended upon by foo, or the DROP wins and the CREATE fails because bar no longer exists. But if you have two clients performing validation simultaneously, they won't know about the other command yet, and so they'll both decide that the command is valid and push the computed action into Kafka, where the computed action is either "definitely build this dataflow" or "definitely drop this dataflow", corresponding to today's dataflow::Command struct. And now whatever's reading from Kafka on the other end will explode.

So what about performing validation on the way out? In this world I think you push raw SQL into the Kafka topic. Then whatever reads from Kafka parses and validates the SQL, potentially rejecting the command with an error. This seems fundamentally viable, except that the durability is in the wrong place. When materialized crashes and you turn it back on, you'll have all these busted queries sitting in the Kafka topic that you have to run through. It's also not clear how you would use Kafka's automatic garbage collection in this model; you wouldn't want CREATE DATAFLOW foo AS <valid>; CREATE DATAFLOW foo AS <invalid> to only remember the invalid command (because it's later)!

There's also the problem that Kafka's strength is not minimizing latency, it's maximizing throughput, so pushing commands through Kafka in the first place might add unacceptable latency to the interactive experience.

FWIW, I fully accept that I may be suffering from insufficient imagination here, and I would love for someone to point out flaws in my reasoning! :)

So what does satisfy all three properties? As far as I can tell, both etcd3 and CockroachDB do, and not much else. CockroachDB isn't a great fit, given how resource hungry it is. etcd3 is actually intended for exactly our desired use case, but there aren't Rust bindings for its v3 API (and the v2 API doesn't actually meet our needs) and it's yet another service that would need to be deployed alongside materialized.

I'm pretty unsure of what the upshot is. One nuclear option is that we could punt on this problem until later, when we tackle active–active replication. That would allow us to rip out the ZooKeeper goop, and get a super-fast in-memory metadata store for tests. If we needed durability in the short term, we could just have one of the workers be responsible for saving metadata updates to disk.

Other, less nuclear options involve:

Continuing to hack our way around ZooKeeper. I think this is workable, but painful.

Switching to etcd3, with the intention of one day bundling it with materialized itself.

*I am apparently using CREATE DATAFLOW to mean either CREATE MATERIALIZED VIEW or CREATE DATA SOURCE.
C-musing
opened by benesch 30
Extract count from each `Accum`
Each reduce::Accum would track its own total: Diff counting the number of contributing records. This total is the same across all accumulations, and shouldn't need to be maintained independently if there are multiple accumulations.

Motivation

Tips for reviewer

Checklist

[ ] This PR has adequate test coverage / QA involvement has been duly considered.

[ ] This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.

[ ] This PR includes the following user-facing behavior changes:
opened by frankmcsherry 0
Panic using scope address
This PR demonstrates how to panic with the containing scope address, rather than the first digit of this address, at the same time avoiding arguments to the various rendering methods. The scope is available from all of these methods, and we just read its address out (providing potentially more information as the scope address can provide more detail).

I think this simplifies some logic, but I think actually moves us no closer to panic messages that are easy to interpret, and we should instead either 1. rely on better messages, or 2. not panic so much, or 3. not worry about identifying the scope when we panic (e.g. when we have invalid plans; probably more important to point out the invalid part of the plan than the scope in which its rendering was attempted).

Motivation

Tips for reviewer

Checklist

[ ] This PR has adequate test coverage / QA involvement has been duly considered.

[ ] This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.

[ ] This PR includes the following user-facing behavior changes:
opened by frankmcsherry 0
environmentd: remove unused TLS modes
We've committed to the cloud and will not be using anything besides that and local dev.

Motivation

This PR adds a feature that has not yet been specified.

Checklist

[ ] This PR has adequate test coverage / QA involvement has been duly considered.

[ ] This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.

[ ] This PR includes the following user-facing behavior changes:

n/a
opened by mjibson 0
[Epic] Re-introduce mzclt
Product Outcome

Improve activation rate

Improve power user experience

Spec

Design

Documentation

Must Haves

Nice to Haves

Backlog of features to be prioritized:

[ ] https://github.com/MaterializeInc/materialize/issues/14921

[ ] https://github.com/MaterializeInc/materialize/issues/16989

[ ] https://github.com/MaterializeInc/materialize/issues/16878

[ ] https://github.com/MaterializeInc/materialize/issues/16988

[ ] https://github.com/MaterializeInc/materialize/issues/16987

[ ] https://github.com/MaterializeInc/materialize/issues/16986

[ ] https://github.com/MaterializeInc/materialize/issues/16985

[ ] https://github.com/MaterializeInc/materialize/issues/16984

[ ] https://github.com/MaterializeInc/materialize/issues/16980

[ ] https://github.com/MaterializeInc/materialize/issues/16983

[ ] https://github.com/MaterializeInc/materialize/issues/16981

[ ] https://github.com/MaterializeInc/materialize/issues/16982

[ ] https://github.com/MaterializeInc/materialize/issues/16979

[ ] https://github.com/MaterializeInc/materialize/issues/16978

[ ] https://github.com/MaterializeInc/materialize/issues/15373

[ ] https://github.com/MaterializeInc/materialize/issues/15372

[ ] https://github.com/MaterializeInc/materialize/issues/15371

[ ] https://github.com/MaterializeInc/materialize/issues/14922

Blockers
opened by ggnall 0