Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implements Feature 630 #634

Merged
merged 7 commits into from Jun 3, 2021
Merged

Conversation

dantengsky
Copy link
Member

@dantengsky dantengsky commented May 26, 2021

Summary

It's a baby step of integrating Store with Query, which implements

  • update metadata after appending data parts to the table
  • remote table read_plan
  • remote table read

Basic statements like insert into ... and select .. from ... could be executed now. (and lots of interesting things are left to do)

Changelog

  • Store: implementions for ITable read_plan and read a5c42b2

  • Query: implements RemoteTalbe's read_plan & read b55eacf

  • Adds remote flag to ReadDataSourcePlan deaea8e

  • Tweaks stateless test cases ed69c92

The following issues might be worthy of your concern:

  • Remove trait bound Sync from SendableDataBlockStream ed69c92

    Turns out, at least for now, we do not need this trait bound, and without Sync constraint, SendableDataBlockStream is more stream-combinator friendly.

  • Keep ITable::read_plan as a non-async method

    By using runtime of ctx (and channel).
    IMHO, change ITable::read_plan to async fn may be too harsh at this stage.

  • Add an extra flag to ReadDataSourcePlan and SourceTransform

    So that we could be aware of operating a remote table(and fetch remote table accordingly). It is a temp workaround, let's postpone it until the Catalog API is ready.
    SourceTransform::execute and FuseQueryContext are tweaked accordingly. pls see deaea8e

Related Issues

resolves #630

Test Plan

  • UT & Stateless Testes

Progress

  • Update meta
  • Flight Service
  • Store Client
  • Remote table - read_plan
  • Remote table - read (read partition)
  • Unit tests & integration tests
  • Multi-Node integration tests
  • Code GC
  • Squash commits

@databend-bot
Copy link
Member

Thanks for the contribution!
I have applied any labels matching special text in your title and description.

Please review the labels and make any necessary changes.

@dantengsky
Copy link
Member Author

Thanks for the contribution!
I have applied any labels matching special text in your title and description.

Please review the labels and make any necessary changes.

Hey miss bot, where did the rusted robot go?

@BohuTANG
Copy link
Member

Thanks for the contribution!
I have applied any labels matching special text in your title and description.
Please review the labels and make any necessary changes.

Hey miss bot, where did the rusted robot go?

https://github.com/rust-highfive

@databend-bot
Copy link
Member

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@codecov-commenter
Copy link

codecov-commenter commented May 31, 2021

Codecov Report

Merging #634 (4386ac9) into master (e4e4331) will decrease coverage by 0%.
The diff coverage is 45%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #634    +/-   ##
=======================================
- Coverage      78%     78%    -1%     
=======================================
  Files         310     311     +1     
  Lines       16896   17181   +285     
=======================================
+ Hits        13330   13451   +121     
- Misses       3566    3730   +164     
Impacted Files Coverage Δ
common/flights/src/store_do_get.rs 0% <0%> (ø)
common/planners/src/test.rs 84% <ø> (ø)
common/streams/src/stream.rs 100% <ø> (ø)
fusequery/query/src/datasources/datasource.rs 66% <0%> (-11%) ⬇️
fusequery/query/src/datasources/local/csv_table.rs 75% <ø> (ø)
...usequery/query/src/datasources/local/null_table.rs 73% <ø> (ø)
...query/query/src/datasources/local/parquet_table.rs 68% <ø> (ø)
...query/query/src/datasources/remote/remote_table.rs 0% <0%> (ø)
...ery/src/datasources/remote/remote_table_do_read.rs 0% <0%> (ø)
...ery/query/src/datasources/system/clusters_table.rs 79% <ø> (ø)
... and 26 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e4e4331...4386ac9. Read the comment docs.

rust-toolchain Outdated
@@ -1 +1 @@
1.52.1
nightly-2021-05-06
Copy link
Member

@zhang2014 zhang2014 Jun 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have some compile error that are not resolved in nightly-2021-05-06.

thread 'rustc' panicked at 'assertion failed: `(left == right)`
  left: `Some(Fingerprint(8537439170242672706, 4648092694241280842))`,
 right: `Some(Fingerprint(212923312148573672, 18320173992410164289))`: found unstable fingerprints for evaluate_obligation(6b017a932d0761bd-4ff2d1ea8919cfd7): Ok(EvaluatedToOk)', /rustc/bacf770f2983a52f31e3537db5f0fe1ef2eaa874/compiler/rustc_query_system/src/query/plumbing.rs:585:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/issues/new?labels=C-bug%2C+I-ICE%2C+T-compiler&template=ice.md

note: rustc 1.54.0-nightly (bacf770f2 2021-05-05) running on x86_64-apple-darwin

note: compiler flags: -C embed-bitcode=no -C split-debuginfo=unpacked -C debuginfo=2 -C incremental

note: some of the compiler flags provided by cargo are hidden

query stack during panic:
#0 [evaluate_obligation] evaluating trait selection obligation `common_datavalues::DataValue: std::marker::Send`
#1 [typeck] type-checking `tests::service::start_one_service`
end of query stack

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I have the same problem in my dev env, especially when using the cargo watch xxx tool, the error you mentioned keeps popping out.

Since that, I switch to stable toolchain 1.52.1 during daily dev.

But I am not sure if that's appropriate (in case somebody or tools is using nightly features). So I switch it back in the last commit.

Any suggestions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BohuTANG I've seen a gist that you created that keeps a similar error message, any idea about it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the later, grcov nightly Rust is required:
https://github.com/mozilla/grcov

But we can try to rollback to a stable nightly version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, nightly is ok.

Copy link
Member

@BohuTANG BohuTANG Jun 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, the master rust toolchain has changed to nightly-2021-06-01, seems more stable.

@dantengsky dantengsky force-pushed the feature-630 branch 2 times, most recently from d04cf38 to a4a96b4 Compare June 2, 2021 14:49
@drmingdrmer
Copy link
Member

@dantengsky ready for review? I received a review request but it is still a draft

Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not finished reviewing all of it yet.
Nice job! 👍 👍 👍

Comment on lines +171 to +175
pub async fn scan_partition(
&mut self,
db_name: String,
tbl_name: String,
scan_plan: &ScanPlan,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about scan or scan_table?

The subject is a table, not a partition.

BTW on the server-side, the corresponding function name has an extra s: scan_partitions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change method name "scan_partition" to "scan", and "read_partition" to "scan_table"?

Oh, wait, u reviewed last night? thanks for that bro. I've force pushed something for the "final touch", including changing the name of method "read" to "read_partition". any suggestion about the now names?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about scan_table_partitions and read_table_partition? since the former filters table partitions, the latter returns datablocks of a partition.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about scan_table_partitions and read_table_partition? since the former filters table partitions, the latter returns datablocks of a partition.

Nice to me.

@BohuTANG @sundy-li Is there any difference between a datablock and a partition in our code base?
If no, using a universal term would be better for stargazers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, wait, u reviewed last night? thanks for that bro. I've force pushed something for the "final touch"

Not that late for me :DDD

common/flights/src/store_client.rs Show resolved Hide resolved
pub push_down: PlanNode,
}

#[derive(serde::Serialize, serde::Deserialize, Clone, Debug)]
pub struct ScanPartitionsAction {
pub can_plan: ScanPlan,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub can_plan: ScanPlan,
pub scan_plan: ScanPlan,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

er... awkward typo

if let Err(_e) = tx
.send(flight_data_from_arrow_batch(&batch, &ipc_write_opt).1)
.await
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use log::error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, log::error is better

@dantengsky dantengsky marked this pull request as ready for review June 3, 2021 01:13
@dantengsky
Copy link
Member Author

@dantengsky ready for review? I received a review request but it is still a draft

It was too late last night, so I invited you but keep this PR in the draft state. But.... seems I'd rather invite you later ;-)

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

1 similar comment
@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@drmingdrmer drmingdrmer self-requested a review June 3, 2021 04:55
@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

5 similar comments
@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid exhausting the merge bot.

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

18 similar comments
@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@BohuTANG BohuTANG merged commit 62c6f85 into datafuselabs:master Jun 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Updates table's metadata after appending data to Store
7 participants