Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter: support is null and not null expr #59

Merged
merged 7 commits into from Sep 6, 2020

Conversation

zier-one
Copy link
Contributor

@zier-one zier-one commented Sep 5, 2020

implement #50

@zier-one
Copy link
Contributor Author

zier-one commented Sep 5, 2020

@panarch PTAL

Copy link
Member

@panarch panarch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This looks very good starting, ok.
Now I also got what kind of comments I should mention, your work helps a lot, thanks.

Before comment, here is what Evaluated enum does.
I write explanation in here below, and also I'll add another issue to add comment to Evaluated enum.

There exists 5 types.
LiteralRef, Literal, StringRef, ValueRef and Value.
This many different types are for avoiding unnecessary clone() calls.

Literal vs Value

Literal is value from parser(sqlparser-rs) and Value is from this repo, executor. so...

  • LiteralRef
    100 in WHERE 100 IS NULL.
    Number 100 is required but it doesn't need to take ownership from parsed sql (AST).

  • Literal
    8 + 3 in WHERE 8 + 3 IS NULL.
    8 and 3 are LiteralRef but executor should take ownership of calculated value 8 + 3.
    So it is Literal.

  • ValueRef
    name in WHERE name IS NULL.
    name is fetched from storage so it is Value but no need of taking ownership.

  • Value
    name + 1 in WHERE name + 1 IS NULL.
    name is ValueRef and 1 is LiteralRef but it requires + operation.
    Then the result is Value! In this case, we know that the exact type of name + 1 so there's no need of keeping Literal type.

  • StringRef
    "Something" in WHERE "Something" IS NULL.
    It is similar with LiteralRef but sqlparser-rs handles "Something" string value in different way so Evaluated also takes this kind of seperated type.

So...
I'd like to request 3 changes.

  1. tests/nullable.rs test cases should reach all 10 possible match arms. (5 in IS NULL and 5 in IS NOT NULL)
    It looks good to add another function in tests/nullable.rs.
    And it is also not for testing blend so it would be enough by counting number of fetched rows.
    You can see that counting test case pattern in tests/nested_select.rs.

Expr::IsNull(expr) => Ok(match evaluate(expr)? {
    Evaluated::ValueRef(v) => !v.is_some(),
    Evaluated::Value(v) => !v.is_some(),
    _ => false,
}),

should be something like this below, it is required to handle all arms

Expr::IsNull(expr) => Ok(match evaluate(expr)? {
    Evaluated::ValueRef(v) => !v.is_some(),
    Evaluated::Value(v) => !v.is_some(),
    Evaluated::LiteralRef(v) ...
    Evaluated::Literal(v) ...
    Evaluated::StringRef(_) => false,
}),

And If it is possible, it's better not to expose inner types of enum Evaluated to outside.
filter.rs is outside of Evaluated so, let's move match codes into evaluated.rs.
You can add some method like which Value has is_some(), Evaluated also can have that sort of method.

@zier-one
Copy link
Contributor Author

zier-one commented Sep 6, 2020

Thanks for the thorough review. PTAL @panarch

@zier-one
Copy link
Contributor Author

zier-one commented Sep 6, 2020

I found an incompatible behavior with MySQL. In MySQL, the behavior regarding null is as follows:

Schema (MySQL v5.7)

CREATE TABLE Test (
    id INTEGER NULL,
    num INTEGER,
    name TEXT
)

Load Data

INSERT INTO Test (id, num, name) VALUES (NULL, 2, "Hello");
INSERT INTO Test (id, num, name) VALUES (1, 9, "World");
INSERT INTO Test (id, num, name) VALUES (3, 4, "Great");

Query 1

SELECT id, num FROM Test WHERE id + 1 IS NULL;
id num
null 2

Query 2

SELECT id + 1 FROM Test;
id + 1
null
2
4

Query 3

SELECT id, num FROM Test WHERE id = NULL;

No results.


Query 4

SELECT id, num FROM Test WHERE id IS NULL;
id num
null 2

But in GlueSQL, NULL + 1 = 1 and NULL = NULL is true. Do we need to change the behavior on null to be consistent with MySQL?

@panarch
Copy link
Member

panarch commented Sep 6, 2020

Wow, NULL has lots of stuff to think about, thanks for taking this carefully.
Existing code looks very good, and if you are ok then can I request small changes for nullable test codes?
Now tests/nullable.rs has lots of test cases so it would be good to simplify little bit, like this

And your second comment about MySQL compatible issue looks also very interesting.
My opinion is this.

  1. NULL IS NULL, NULL = NULL is ok.
    I think it's better to deal NULL as a type, not something systemic value.
    Nowadays SQL is not only for programmers but also marketers and many others.
    So, forcing them to understand something systemic issue is not affordable.
    Without that, GlueSQL itself already handles NULL as a type, not something special else.
    So naturally it's ok to see NULL IS NULL as true.

  2. NULL + 1 should be NULL.
    Good point, I think it should be fixed and current implementation I made is wrong.
    Existing implementation was simply because I didn't care about this case, I made that work during I'm working on aggregate task.
    This one is related to gluesql::data::Value and sqlparser::ast::Value as AstValue.
    I will simply divide using value for gluesql::data::Value and literal for sqlparser::ast::Value as AstValue.
    There exists 3 possible scenarios.

  • literal + literal
  • value + value
  • value + literal

But the last one, we don't need to care value + literal because it starts by converting literal to value. so it becomes value + value case.

Two things required to be changed.

  • literal + literal
    Those functions are defined in src/executor/evaluate/evaluated.rs
    literal_add, literal_subtract, literal_multiply and literal_divide.
    Existings only take AstValue::Number(_) type but it should also handle AstValue::Null.

  • value + value
    It's defined in src/data/value.rs.
    In impl Value, there exists 4 functions add, subtract, multiply and divide.
    These have wrong implementations, so you can simply fix it to show NULL + 1 to NULL.
    But the problem is, after you fix that, integration tests will fail. It is because of SUM aggregation.
    In src/executor/aggregate.rs,

"SUM" => {
    let value_to_sum = get_first_value(args)?;

    match aggregated.get(func) {
        Some((current, value)) => {
            if &index <= current {
                return Ok(aggregated);
            }

            Ok(aggregated.update(func, (index, value.add(value_to_sum)?)))
        }
        None => Ok(aggregated.update(func, (index, value_to_sum.clone()))),
    }
}

This uses value.add(value_to_sum)?, but after changing NULL + 1 behavior, it should be also fixed.
You may be able to adapt similar approach from COUNT implementation.


So my opinion is..

  • NULL = NULL is true => OK.
  • NULL + 1 should be NULL => Change required.

How do you think of this?

);
( $( $t:path )* ;) => (
select!()
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, thanks! this looks good, now select! can take empty result. 👍

@zier-one
Copy link
Contributor Author

zier-one commented Sep 6, 2020

I agree with this:

  • NULL = NULL is true => OK.
  • NULL + 1 should be NULL => Change required.

On the mysql compatibility issue, I think I can submit another PR to fix it.
In this PR, I will continue to refine the test cases, but will not modify the null related behavior

@panarch
Copy link
Member

panarch commented Sep 6, 2020

That sounds good, I also agree that it's good to divide NULL compatibility issue using another PR.
After you refine the test cases, then I'll merge it.

@zier-one
Copy link
Contributor Author

zier-one commented Sep 6, 2020

@panarch PTAL

@panarch
Copy link
Member

panarch commented Sep 6, 2020

Test cases looks perfect!
And sorry for the late comment but here is something that I'd like to get your opinion.

pub fn is_some(&self) -> bool {
    match self {
        Evaluated::ValueRef(v) => v.is_some(),
        Evaluated::Value(v) => v.is_some(),
        Evaluated::Literal(v) => *v != AstValue::Null,
        Evaluated::LiteralRef(v) => **v != AstValue::Null,
        Evaluated::StringRef(_v) => true,
    }
}

here, *v != AstValue::Null and **v != AstValue::Null uses dereferencing, and I usually prefer using & than *.
The reason is...
In this code we cannot check whether v in Evaluated::Literal(v) implements Copy trait or not.
So for making sure that *v doesn't call unnecessary clone(), we should check the code of AstValue.

/// Primitive SQL values such as number and string
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub enum Value {
    /// Numeric literal
    #[cfg(not(feature = "bigdecimal"))]
    Number(String),
   ...
...

In this case, it's ok. that sqlparser Value does not implement Copy trait.

But when we use v != &AstValue::Null, we don't need to worry about unnecessary clone() may happen or not, because it will not happen.
So, I think using referencing is better choice, how do you think?

@zier-one
Copy link
Contributor Author

zier-one commented Sep 6, 2020

@panarch I agree with you, and I've made the relevant changes.

@panarch
Copy link
Member

panarch commented Sep 6, 2020

Thanks! merging 👍

@panarch panarch merged commit 3fa4aef into gluesql:main Sep 6, 2020
@zier-one zier-one deleted the support_is_null_expr branch September 6, 2020 09:25
@panarch panarch mentioned this pull request Sep 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants