Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[decoupled-execution] process_sync_req now tries to advance a buffer item to aggregated #9207

Merged
merged 1 commit into from Sep 18, 2021

Conversation

yuxiamit
Copy link
Contributor

process_sync_request now tries to advance a buffer item to aggregated

Pop a prefix of buffer items if found an aggregated item.

Motivation

(Write your motivation for proposed changes here.)

Have you read the Contributing Guidelines on pull requests?

(Write your answer here.)

Test Plan

(Share your test plan here. If you changed code, please provide us with clear instructions for verifying that your changes work.)

Related PRs

(If this PR adds or changes functionality, please take some time to update the docs at https://github.com/diem/diem/tree/main/developers.diem.com, and link to your PR here.)

If targeting a release branch, please fill the below out as well

  • Justification and breaking nature (who does it affect? validators, full nodes, tooling, operators, AOS, etc.)
  • Comprehensive test results that demonstrate the fix working and not breaking existing workflows.
  • Why we must have it for V1 launch.
  • What workarounds and alternative we have if we do not push the PR.

@yuxiamit yuxiamit requested a review from a team as a code owner September 17, 2021 01:23
@bors-libra bors-libra added this to In Review in bors Sep 17, 2021
if signed_item
.commit_proof
.commit_info()
.match_ordered_only(ledger_info.commit_info())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be full match?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point.

let executed_item = *executed_item_box;
if executed_item
.commit_info
.match_ordered_only(ledger_info.commit_info())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point

.map(|eb| Arc::new(eb.clone()))
.collect::<Vec<Arc<ExecutedBlock>>>(),
);
if link_eq(&cursor, &aggregated_item_cursor) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer we break out the loop first then send the request, seems more readable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this function I want to easily distinguish successfully finding the item vs reaching the end of the list, so I let it send the request here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, it's not possible to reach the end of the list without hitting the cursor right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now it is not possible. Just in case of a bug in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems more natural to have something like

while curson.is_some() {
  if find {
    break 
  }
}
assert!(!blocks_to_persist.is_empty());
send_request

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocks_to_persist collects the blocks before (including) the aggregated item. If it does not find the item, blocks_to_persist contains all the blocks. I added a flag found and moved the code outside the loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, can't we just assert cursor.is_some() after the loop instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to move out buffer_item anyway, so I just put it into an option and check if buffer_item has something. The code is already updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still feel we can simplify this code, but I guess I can try it myself after it gets merged

}
unreachable!("Bad Aggregated Item Cursor: Item not found in the buffer.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this is unreachable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function assumes the aggregated item exists in the buffer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it only break after sending the request instead of return

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops sorry I meant to return after sending the request

let (attempted_item, aggregated, matching) =
buffer_item.try_advance_to_aggregated_with_ledger_info(ledger_info.clone());
set_elem(&cursor, attempted_item);
if matching {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if aggregated is enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are three cases:

  1. not matched, not aggregated, we continue to the next iteration
  2. matched, not aggregated, we end the loop since we already found the item
  3. matched and aggregated, we call try_persisting to dequeue the items.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean aggregated must be matched?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, but they share a break statement. So I put it inside matching.

@yuxiamit yuxiamit force-pushed the buffer-manager-sync-req-fix branch 4 times, most recently from ad8d221 to 4b7a885 Compare September 17, 2021 13:57
self.buffer.pop_front();
}

if let Some(buffer_item) = found_item {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can use expect here to avoid the if branch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good. I'll do this in the next PR.

zekun000
zekun000 previously approved these changes Sep 17, 2021
.map(|eb| Arc::new(eb.clone()))
.collect::<Vec<Arc<ExecutedBlock>>>(),
);
if link_eq(&cursor, &aggregated_item_cursor) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still feel we can simplify this code, but I guess I can try it myself after it gets merged

@yuxiamit
Copy link
Contributor Author

/land

@bors-libra bors-libra moved this from In Review to Queued in bors Sep 17, 2021
bors-libra pushed a commit that referenced this pull request Sep 17, 2021
…item to aggregated. Pop a prefix of buffer items if found an aggregated item.

Closes: #9207
@bors-libra bors-libra moved this from Queued to Testing in bors Sep 17, 2021
@bors-libra
Copy link
Contributor

💔 Test Failed - ci-test

@bors-libra bors-libra moved this from Testing to In Review in bors Sep 17, 2021
zekun000
zekun000 previously approved these changes Sep 18, 2021
@yuxiamit
Copy link
Contributor Author

/land

@bors-libra bors-libra moved this from In Review to Queued in bors Sep 18, 2021
…item to aggregated. Pop a prefix of buffer items if found an aggregated item.

Closes: diem#9207
@bors-libra bors-libra moved this from Queued to Testing in bors Sep 18, 2021
@github-actions
Copy link

Cluster Test Result

Test runner setup time spent 257 secs
Compatibility test results for land_6122cb78 ==> land_3e334126 (PR)
1. All instances running land_6122cb78, generating some traffic on network
2. First full node land_6122cb78 ==> land_3e334126, to validate new full node to old validator node traffic
3. First Validator node land_6122cb78 ==> land_3e334126, to validate storage compatibility
4. First batch validators (14) land_6122cb78 ==> land_3e334126, to test consensus and traffic between old full nodes and new validator node
5. First batch full nodes (14) land_6122cb78 ==> land_3e334126
6. Second batch validators (15) land_6122cb78 ==> land_3e334126, to upgrade rest of the validators
7. Second batch of full nodes (15) land_6122cb78 ==> land_3e334126, to finish the network upgrade, time spent 700 secs
all up : 1153 TPS, 3929 ms latency, 4500 ms p99 latency, no expired txns, time spent 249 secs
Logs: http://kibana.ct-0-k8s-testnet.aws.hlw3truzy4ls.com/app/kibana#/discover?_g=(time:(from:'2021-09-18T01:28:41Z',to:'2021-09-18T01:51:37Z'))
Dashboard: http://grafana.ct-0-k8s-testnet.aws.hlw3truzy4ls.com/d/performance/performance?from=1631928521000&to=1631929897000
Validator 1 logs: http://kibana.ct-0-k8s-testnet.aws.hlw3truzy4ls.com/app/kibana#/discover?_g=(time:(from:'2021-09-18T01:28:41Z',to:'2021-09-18T01:51:37Z'))&_a=(columns:!(log),query:(language:kuery,query:'kubernetes.pod_name:"val-1"'),sort:!(!('@timestamp',desc)))

Repro cmd:

./scripts/cti --tag land_6122cb78 --cluster-test-tag land_3e334126 -E BATCH_SIZE=15 -E UPDATE_TO_TAG=land_3e334126 --report report.json --suite land_blocking_compat

🎉 Land-blocking cluster test passed! 👌

@bors-libra bors-libra removed this from Testing in bors Sep 18, 2021
@bors-libra bors-libra merged commit 3e33412 into diem:main Sep 18, 2021
@bors-libra bors-libra temporarily deployed to Sccache September 18, 2021 01:52 Inactive
@bors-libra bors-libra temporarily deployed to Sccache September 18, 2021 01:52 Inactive
@bors-libra bors-libra temporarily deployed to Docker September 18, 2021 01:52 Inactive
@bors-libra bors-libra temporarily deployed to Sccache September 18, 2021 14:18 Inactive
@bors-libra bors-libra temporarily deployed to Docker September 18, 2021 14:18 Inactive
@bors-libra bors-libra temporarily deployed to Sccache September 19, 2021 14:18 Inactive
@bors-libra bors-libra temporarily deployed to Docker September 19, 2021 14:18 Inactive
@bors-libra bors-libra temporarily deployed to Sccache September 20, 2021 14:19 Inactive
@bors-libra bors-libra temporarily deployed to Docker September 20, 2021 14:19 Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants