Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[test] Flaky tests #548

Closed
BohuTANG opened this issue May 14, 2021 · 26 comments · Fixed by #566
Closed

[test] Flaky tests #548

BohuTANG opened this issue May 14, 2021 · 26 comments · Fixed by #566
Labels
C-testing Category: testing

Comments

@BohuTANG
Copy link
Member

Tests issues

@BohuTANG BohuTANG added the C-testing Category: testing label May 14, 2021
@BohuTANG BohuTANG pinned this issue May 14, 2021
@BohuTANG
Copy link
Member Author

https://github.com/datafuselabs/datafuse/runs/2583034348

failures:

---- meta_service::raftmeta_test::test_meta_node_add_non_voter stdout ----
Error: transport error: error trying to connect: tcp connect error: Connection refused (os error 111)

Caused by:
    0: error trying to connect: tcp connect error: Connection refused (os error 111)
    1: tcp connect error: Connection refused (os error 111)
    2: Connection refused (os error 111)
thread 'meta_service::raftmeta_test::test_meta_node_add_non_voter' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/673d0db5e393e9c64897005b470bfeb6d5aec61b/library/test/src/lib.rs:193:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

cc @drmingdrmer

@BohuTANG
Copy link
Member Author

test api::rpc::flight_service_test::test_flight_create_get_table ... ok
May 18 16:35:22.868 ERROR async_raft::replication: error sending AppendEntries RPC to target error=transport error: error trying to connect: tcp connect error: Connection refused (os error 111)

cc @drmingdrmer

@drmingdrmer
Copy link
Member

test api::rpc::flight_service_test::test_flight_create_get_table ... ok
May 18 16:35:22.868 ERROR async_raft::replication: error sending AppendEntries RPC to target error=transport error: error trying to connect: tcp connect error: Connection refused (os error 111)

cc @drmingdrmer

@BohuTANG
This is not a bug: when testing restarting-node, sending to a closed node emits this log.

@BohuTANG
Copy link
Member Author

test api::rpc::flight_service_test::test_flight_create_get_table ... ok
May 18 16:35:22.868 ERROR async_raft::replication: error sending AppendEntries RPC to target error=transport error: error trying to connect: tcp connect error: Connection refused (os error 111)

cc @drmingdrmer

@BohuTANG
This is not a bug: when testing restarting-node, sending to a closed node emits this log.

Got

@BohuTANG BohuTANG changed the title Flaky tests [test] Flaky tests Jun 4, 2021
@BohuTANG
Copy link
Member Author

BohuTANG commented Jun 4, 2021

`---- meta_service::meta_service_impl_test::test_meta_server_incr_seq stdout ----
Error: transport error: error trying to connect: tcp connect error: Connection refused (os error 111)

Caused by:
0: error trying to connect: tcp connect error: Connection refused (os error 111)
1: tcp connect error: Connection refused (os error 111)
2: Connection refused (os error 111)
thread 'meta_service::meta_service_impl_test::test_meta_server_incr_seq' panicked at 'assertion failed: (left == right)
left: 1,
right: 0: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/657bc01888e6297257655585f9c475a0801db6d2/library/test/src/lib.rs:193:5
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

---- meta_service::meta_service_impl_test::test_meta_server_set_get stdout ----
Error: transport error: error trying to connect: tcp connect error: Connection refused (os error 111)

Caused by:
0: error trying to connect: tcp connect error: Connection refused (os error 111)
1: tcp connect error: Connection refused (os error 111)
2: Connection refused (os error 111)
thread 'meta_service::meta_service_impl_test::test_meta_server_set_get' panicked at 'assertion failed: (left == right)
left: 1,
right: 0: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/657bc01888e6297257655585f9c475a0801db6d2/library/test/src/lib.rs:193:5
`
https://github.com/datafuselabs/datafuse/runs/2747574962

cc @drmingdrmer

@drmingdrmer
Copy link
Member

@BohuTANG I saw it too last night. It needs to wait several milliseconds before connecting. Gonna fix it along with my next merge.
Thanks for reminding me!

@BohuTANG
Copy link
Member Author

BohuTANG commented Jun 5, 2021

@drmingdrmer

github collaboration is async, this comments just a notes when you see it, anytime is fine :)

@BohuTANG BohuTANG unpinned this issue Jun 7, 2021
@BohuTANG
Copy link
Member Author

New one:

failures:

---- meta_service::meta_service_impl_test::test_meta_cluster_write_on_non_leader stdout ----
Error: timeout wait for 1: current_leader -> 0 latest: RaftMetrics { id: 1, state: NonVoter, current_term: 0, last_log_index: 0, last_applied: 0, current_leader: None, membership_config: MembershipConfig { members: {1}, members_after_consensus: None } }
thread 'meta_service::meta_service_impl_test::test_meta_cluster_write_on_non_leader' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/657bc01888e6297257655585f9c475a0801db6d2/library/test/src/lib.rs:193:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    meta_service::meta_service_impl_test::test_meta_cluster_write_on_non_leader

test result: FAILED. 34 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 11.70

@BohuTANG
Copy link
Member Author

---- servers::mysql::mysql_handler_test::test_rejected_session_with_sequence stdout ----
Error: Code: 1000, displayText = Reject connection, cause: MySqlError { ERROR 1203 (42000): The current accept connection has exceeded mysql_handler_thread_num config }.

https://github.com/datafuselabs/datafuse/runs/2880840620#step:4:1160

cc @zhang2014

@BohuTANG
Copy link
Member Author

New one:

failures:

---- dfs::distributed_fs_test::test_distributed_fs_single_node_list stdout ----
Error: status: Unimplemented, message: "", details: [], metadata: MetadataMap { headers: {"date": "Fri, 25 Jun 2021 09:59:44 GMT", "content-type": "application/grpc"} }
thread 'dfs::distributed_fs_test::test_distributed_fs_single_node_list' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/657bc01888e6297257655585f9c475a0801db6d2/library/test/src/lib.rs:193:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    dfs::distributed_fs_test::test_distributed_fs_single_node_list

cc @drmingdrmer

@drmingdrmer
Copy link
Member

New one:

failures:

---- dfs::distributed_fs_test::test_distributed_fs_single_node_list stdout ----
Error: status: Unimplemented, message: "", details: [], metadata: MetadataMap { headers: {"date": "Fri, 25 Jun 2021 09:59:44 GMT", "content-type": "application/grpc"} }
thread 'dfs::distributed_fs_test::test_distributed_fs_single_node_list' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/657bc01888e6297257655585f9c475a0801db6d2/library/test/src/lib.rs:193:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    dfs::distributed_fs_test::test_distributed_fs_single_node_list

cc @drmingdrmer

@BohuTANG
Where did you see this error?

@BohuTANG
Copy link
Member Author

New one:

failures:

---- dfs::distributed_fs_test::test_distributed_fs_single_node_list stdout ----
Error: status: Unimplemented, message: "", details: [], metadata: MetadataMap { headers: {"date": "Fri, 25 Jun 2021 09:59:44 GMT", "content-type": "application/grpc"} }
thread 'dfs::distributed_fs_test::test_distributed_fs_single_node_list' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/657bc01888e6297257655585f9c475a0801db6d2/library/test/src/lib.rs:193:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    dfs::distributed_fs_test::test_distributed_fs_single_node_list

cc @drmingdrmer

@BohuTANG
Where did you see this error?

This happens in running unit-tests job, I have restart the job and the logs are gone.

@drmingdrmer
Copy link
Member

This happens in running unit-tests job, I have restart the job and the logs are gone.

Shall we turn on RUST_BACKTRACE=1 for CI? Error message like this one provided very little information.

@BohuTANG
Copy link
Member Author

This happens in running unit-tests job, I have restart the job and the logs are gone.

Shall we turn on RUST_BACKTRACE=1 for CI? Error message like this one provided very little information.

Sure

@BohuTANG BohuTANG unpinned this issue Jun 29, 2021
@zhang2014
Copy link
Member

The results are not sorted, and the different test results under different CPU nums
CC: @jyizheng

https://github.com/datafuselabs/datafuse/blob/59fa228c47d1f0f90a78335ff32f50360f47fb70/tests/suites/0_stateless/03_0009_select_exists.sql#L1-L4

@BohuTANG
Copy link
Member Author

BohuTANG commented Aug 12, 2021

Tests (macos-latest, stable, x86_64-apple-darwin, false) failed 7 hours ago in 26m 12s:

failures:

---- meta_service::raftmeta_test::test_meta_node_write_to_local_leader stdout ----
Error: timeout after 2s when  .state -> Follower latest: RaftMetrics { id: 2, state: NonVoter, current_term: 1, last_log_index: 5, last_applied: 5, current_leader: Some(0), membership_config: MembershipConfig { members: {0}, members_after_consensus: None }, snapshot: LogId { term: 0, index: 0 }, leader_metrics: None }
thread 'meta_service::raftmeta_test::test_meta_node_write_to_local_leader' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/e3b1c12becebd21d88ca9f4364d7db8d1d380c18/library/test/src/lib.rs:194:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- meta_service::raftmeta_test::test_meta_node_set_file stdout ----
Error: timeout after 2s when  .last_log_index -> 8 latest: RaftMetrics { id: 4, state: NonVoter, current_term: 1, last_log_index: 6, last_applied: 6, current_leader: Some(0), membership_config: MembershipConfig { members: {0}, members_after_consensus: None }, snapshot: LogId { term: 0, index: 0 }, leader_metrics: None }
thread 'meta_service::raftmeta_test::test_meta_node_set_file' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/e3b1c12becebd21d88ca9f4364d7db8d1d380c18/library/test/src/lib.rs:194:5


failures:
    meta_service::raftmeta_test::test_meta_node_set_file
    meta_service::raftmeta_test::test_meta_node_write_to_local_leader

test result: FAILED. 97 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out; finished in 53.27s

https://github.com/datafuselabs/datafuse/pull/1441/checks?check_run_id=3303472611#step:5:1024

CC: @drmingdrmer

@BohuTANG BohuTANG pinned this issue Aug 12, 2021
@PsiACE
Copy link
Member

PsiACE commented Aug 12, 2021

failures:

---- catalogs::impls::remote_meta_store_client_test::test_get_database stdout ----
thread 'catalogs::impls::remote_meta_store_client_test::test_get_database' panicked at 'assertion failed: `(left == right)`
  left: `40`,
 right: `3`', query/src/catalogs/impls/remote_meta_store_client_test.rs:317:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- catalogs::impls::remote_meta_store_client_test::test_get_table stdout ----
thread 'catalogs::impls::remote_meta_store_client_test::test_get_table' panicked at 'assertion failed: `(left == right)`
  left: `40`,
 right: `25`', query/src/catalogs/impls/remote_meta_store_client_test.rs:370:9

@dantengsky
Copy link
Member

test_get_table

I know some clues of these two failures, let me handle these

@BohuTANG
Copy link
Member Author

Failed in stateless query with cluster:

2021-08-18T05:25:23.9202385Z [2021-08-18T05:25:23Z INFO  datafuse_store::api::rpc::flight_service] Receive do_action: GetTableExt(GetTableExtReq { tbl_id: 2, tbl_ver: None })
2021-08-18T05:25:23.9214519Z [2021-08-18T05:25:23Z INFO  datafuse_store::api::rpc::flight_service] Receive do_action: GetTableExt(GetTableExtReq { tbl_id: 2, tbl_ver: None })
2021-08-18T05:25:23.9326344Z [2021-08-18T05:25:23Z INFO  datafuse_store::executor::action_handler] entering read
2021-08-18T05:25:23.9330865Z �[2mAug 18 05:25:23.932�[0m �[32m INFO�[0m datafuse_store::localfs::local_fs: read: ./_local_fs/db1/t1/ad73d5c4eb3b42e59eb19214129ce0df.parquet
2021-08-18T05:35:23.6207883Z [ FAIL ] - Timeout!

Full raw log:
https://pipelines.actions.githubusercontent.com/Zgqden0KhWGRXpIXPBTtTR2O0h9dPB76KbDLD2GlEOYbcWXWjs/_apis/pipelines/1/runs/27373/signedlogcontent/3?urlExpires=2021-08-18T05%3A57%3A09.6819992Z&urlSigningMethod=HMACV1&urlSignature=EH%2BTDPbfP4OtZ2LZWk%2FNnM%2B%2BODaPTUXwif3R5aW4ui8%3D

cc: @dantengsky

@dantengsky
Copy link
Member

@BohuTANG

Failed to find this issue in time, the URL of the raw log is expired;-(, can I find the logs somewhere else?

I run stateless-cluster-test several times locally, unfortunately, test failure was not reproduced.

Is this message generated by datafuse-test? and indicates that some test cases hang?

2021-08-18T05:35:23.6207883Z [ FAIL ] - Timeout!

@BohuTANG
Copy link
Member Author

@BohuTANG

Failed to find this issue in time, the URL of the raw log is expired;-(, can I find the logs somewhere else?

I run stateless-cluster-test several times locally, unfortunately, test failure was not reproduced.

Is this message generated by datafuse-test? and indicates that some test cases hang?

2021-08-18T05:35:23.6207883Z [ FAIL ] - Timeout!

This is triggred by Stateless Cluster Tests by github CI.
Let's skip it and catch next time.

@dantengsky
Copy link
Member

roger

@BohuTANG
Copy link
Member Author

Failed:

failures:

---- meta_service::raftmeta_test::test_meta_node_snapshot_replication stdout ----
Error: Raft is shutting down

Caused by:
    Raft is shutting down
thread 'meta_service::raftmeta_test::test_meta_node_snapshot_replication' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/e3b1c12becebd21d88ca9f4364d7db8d1d380c18/library/test/src/lib.rs:194:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

cc: @drmingdrmer

@drmingdrmer
Copy link
Member

Failed:

failures:

---- meta_service::raftmeta_test::test_meta_node_snapshot_replication stdout ----
Error: Raft is shutting down

Caused by:
    Raft is shutting down
thread 'meta_service::raftmeta_test::test_meta_node_snapshot_replication' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/e3b1c12becebd21d88ca9f4364d7db8d1d380c18/library/test/src/lib.rs:194:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

cc: @drmingdrmer

This should be fixed by #1546

BTW, this issue becomes toooooo looooong. 🤔

@BohuTANG
Copy link
Member Author

Failed:

failures:

---- meta_service::raftmeta_test::test_meta_node_snapshot_replication stdout ----
Error: Raft is shutting down

Caused by:
    Raft is shutting down
thread 'meta_service::raftmeta_test::test_meta_node_snapshot_replication' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/e3b1c12becebd21d88ca9f4364d7db8d1d380c18/library/test/src/lib.rs:194:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

cc: @drmingdrmer

This should be fixed by #1546

BTW, this issue becomes toooooo looooong. 🤔

Cool, we should open new one

@BohuTANG BohuTANG unpinned this issue Aug 22, 2021
@BohuTANG
Copy link
Member Author

New one: #1564

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-testing Category: testing
Projects
None yet
5 participants