
[forge] Always scale down cluster first before each test run #9210

Merged
merged 1 commit into diem:main from cleanup on Sep 17, 2021

Conversation

zihaoccc (Contributor)

Sometimes post-test cleanup accidentally leaves the cluster in a faulty state. We should therefore also scale the cluster down before each test run, so that a failed cleanup cannot affect subsequent runs.
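For context, the fix boils down to running the scale-down unconditionally at the start of every run instead of relying only on the previous run's cleanup. A minimal sketch of the idea in Rust, with a hypothetical `ClusterSwarm` handle standing in for forge's actual cluster abstraction (all names here are illustrative, not the real forge API):

```rust
use anyhow::Result; // sketch only; the concrete error type is incidental

/// Hypothetical stand-in for however forge manages the k8s test cluster;
/// not the actual forge API.
struct ClusterSwarm;

impl ClusterSwarm {
    async fn scale_down(&mut self) -> Result<()> { Ok(()) /* delete test pods */ }
    async fn scale_up(&mut self) -> Result<()> { Ok(()) /* provision test pods */ }
    async fn run_suite(&mut self) -> Result<()> { Ok(()) /* execute the tests */ }
}

async fn run_test(swarm: &mut ClusterSwarm) -> Result<()> {
    // Always start from a clean slate: if the previous run's post-test
    // cleanup failed partway through, its leftover pods get torn down
    // here instead of poisoning this run.
    swarm.scale_down().await?;
    swarm.scale_up().await?;
    swarm.run_suite().await?;
    // Cleanup is still attempted after the test, but a failure here is
    // now harmless: the next run scales down again before it starts.
    swarm.scale_down().await
}
```

The key property is idempotence: scaling down a cluster that is already clean is a no-op, so running the scale-down both before and after each test costs little and removes the dependency on the previous run's cleanup succeeding.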

Motivation

(Write your motivation for proposed changes here.)

Have you read the Contributing Guidelines on pull requests?

(Write your answer here.)

Test Plan

(Share your test plan here. If you changed code, please provide us with clear instructions for verifying that your changes work.)

Related PRs

(If this PR adds or changes functionality, please take some time to update the docs at https://github.com/diem/diem/tree/main/developers.diem.com, and link to your PR here.)

If targeting a release branch, please fill out the following as well:

  • Justification and breaking nature (who does it affect? validators, full nodes, tooling, operators, AOS, etc.)
  • Comprehensive test results that demonstrate the fix working and not breaking existing workflows.
  • Why we must have it for V1 launch.
  • What workarounds and alternatives we have if we do not push the PR.

@bors-libra bors-libra added this to In Review in bors Sep 17, 2021
@zihaoccc zihaoccc changed the title [forge] Always scale down first before test [forge] Always scale down cluster first before each test run Sep 17, 2021
bmwill previously approved these changes Sep 17, 2021
@zihaoccc (Contributor, Author)

/land

@bors-libra bors-libra moved this from In Review to Queued in bors Sep 17, 2021
bors-libra pushed a commit that referenced this pull request Sep 17, 2021
@bors-libra bors-libra moved this from Queued to Testing in bors Sep 17, 2021
@github-actions

Cluster Test Result

Test runner setup time spent 292 secs
Compatibility test results for land_3ba55ff1 ==> land_234ced83 (PR)
1. All instances running land_3ba55ff1, generating some traffic on network
2. First full node land_3ba55ff1 ==> land_234ced83, to validate new full node to old validator node traffic
3. First Validator node land_3ba55ff1 ==> land_234ced83, to validate storage compatibility
4. First batch validators (14) land_3ba55ff1 ==> land_234ced83, to test consensus and traffic between old full nodes and new validator node
Experiment `Compatibility test, phased upgrade to land_234ced83 in batches of 1, 14, 15` failed: `Experiment deadline reached`
Logs: http://kibana.ct-2-k8s-testnet.aws.hlw3truzy4ls.com/app/kibana#/discover?_g=(time:(from:'2021-09-17T20:26:47Z',to:'2021-09-17T20:50:04Z'))
Dashboard: http://grafana.ct-2-k8s-testnet.aws.hlw3truzy4ls.com/d/performance/performance?from=1631910407000&to=1631911804000
Validator 1 logs: http://kibana.ct-2-k8s-testnet.aws.hlw3truzy4ls.com/app/kibana#/discover?_g=(time:(from:'2021-09-17T20:26:47Z',to:'2021-09-17T20:50:04Z'))&_a=(columns:!(log),query:(language:kuery,query:'kubernetes.pod_name:"val-1"'),sort:!(!('@timestamp',desc)))

❗ Cluster Test failed - non-zero exit code for cti

Repro cmd:

./scripts/cti --tag land_3ba55ff1 --cluster-test-tag land_234ced83 -E BATCH_SIZE=15 -E UPDATE_TO_TAG=land_234ced83 --report report.json --suite land_blocking_compat

@bors-libra (Contributor)

💔 Test Failed - ci-test

bmwill previously approved these changes Sep 17, 2021
@zihaoccc (Contributor, Author)

/land

@bors-libra bors-libra moved this from In Review to Queued in bors Sep 17, 2021
@bors-libra bors-libra moved this from Queued to Testing in bors Sep 17, 2021
@github-actions

Cluster Test Result

Test runner setup time spent 252 secs
Compatibility test results for land_bb1991af ==> land_a9975df1 (PR)
1. All instances running land_bb1991af, generating some traffic on network
2. First full node land_bb1991af ==> land_a9975df1, to validate new full node to old validator node traffic
3. First Validator node land_bb1991af ==> land_a9975df1, to validate storage compatibility
4. First batch validators (14) land_bb1991af ==> land_a9975df1, to test consensus and traffic between old full nodes and new validator node
5. First batch full nodes (14) land_bb1991af ==> land_a9975df1
6. Second batch validators (15) land_bb1991af ==> land_a9975df1, to upgrade rest of the validators
7. Second batch of full nodes (15) land_bb1991af ==> land_a9975df1, to finish the network upgrade, time spent 670 secs
all up : 1178 TPS, 3856 ms latency, 4400 ms p99 latency, no expired txns, time spent 250 secs
Logs: http://kibana.ct-1-k8s-testnet.aws.hlw3truzy4ls.com/app/kibana#/discover?_g=(time:(from:'2021-09-17T22:38:22Z',to:'2021-09-17T23:00:47Z'))
Dashboard: http://grafana.ct-1-k8s-testnet.aws.hlw3truzy4ls.com/d/performance/performance?from=1631918302000&to=1631919647000
Validator 1 logs: http://kibana.ct-1-k8s-testnet.aws.hlw3truzy4ls.com/app/kibana#/discover?_g=(time:(from:'2021-09-17T22:38:22Z',to:'2021-09-17T23:00:47Z'))&_a=(columns:!(log),query:(language:kuery,query:'kubernetes.pod_name:"val-1"'),sort:!(!('@timestamp',desc)))

Repro cmd:

./scripts/cti --tag land_bb1991af --cluster-test-tag land_a9975df1 -E BATCH_SIZE=15 -E UPDATE_TO_TAG=land_a9975df1 --report report.json --suite land_blocking_compat

🎉 Land-blocking cluster test passed! 👌

@bors-libra bors-libra removed this from Testing in bors Sep 17, 2021
@bors-libra bors-libra merged commit a9975df into diem:main Sep 17, 2021
@bors-libra bors-libra temporarily deployed to Sccache September 17, 2021 23:01 Inactive
@bors-libra bors-libra temporarily deployed to Docker September 17, 2021 23:01 Inactive
@zihaoccc zihaoccc deleted the cleanup branch September 18, 2021 00:24