Spent the evening collecting details of a fork that appears to have thrown the network into chaos, causing syncs to hang. Collecting details here instead of gitter:
Yeastplume @yeastplume 21:46
Right, trying to restart node stuck trying to sync, last block it accepts is 7356 -8a7eab34, everything after is an orphan according to the node.
First we heard of that block was:
Nov 21 19:05:21.578 DEBG Received block 8a7eab34 at 7356 from network, going to process.
Nov 21 19:05:21.579 DEBG Processing block 8a7eab34 at 7356 with 6 inputs and 12 outputs.
Nov 21 19:05:21.580 DEBG Block 8a7eab34 refused by chain: Orphan
.
.
.
Nov 21 19:05:21.688 DEBG locate_headers: common header: 1258fdb7
Nov 21 19:05:21.695 DEBG locate_headers: returning headers: 74
Nov 21 19:05:22.255 DEBG Received block 8a7eab34 at 7356 from network, going to process.
Nov 21 19:05:22.256 DEBG Processing block 8a7eab34 at 7356 with 6 inputs and 12 outputs.
Nov 21 19:05:22.257 DEBG Block 8a7eab34 refused by chain: Orphan
Nov 21 19:05:22.630 DEBG peer total_diff (ping/pong): 73.81.122.26:13414, 26860673 vs us 28873113
Nov 21 19:05:24.017 DEBG Received block 8a7eab34 at 7356 from network, going to process.
This one appears to be the missing link between me and the rest of the chain... when trying to sync it ends up in a loop here:
Nov 21 21:27:28.318 INFO Received unfit block header 6c022946 at 7354: already in store.
Nov 21 21:27:28.318 DEBG Processing header e4b4cd15 at 7355.
Nov 21 21:27:28.321 INFO Received unfit block header e4b4cd15 at 7355: already in store.
Nov 21 21:27:28.321 DEBG Processing header 8a7eab34 at 7356.
Nov 21 21:27:28.322 DEBG Validating block with cuckoo size 16
Nov 21 21:27:28.327 DEBG Processing header 822896e5 at 7357.
Nov 21 21:27:28.327 INFO Invalid block header 822896e5: Orphan.
It looks like even though we rejected that earlier as an Orphan, when trying to rebuild the chain we're trying to include it... is there an off-by-one error somewhere?
Yeastplume @yeastplume 21:54
And everything after that was where the orphan trouble started... it looks as if we're keeping that rogue block around somehow when trying to rebuild the list of headers.
Nov 21 19:05:31.197 DEBG cleaning V4(37.59.48.28:13414), not connected
Nov 21 19:05:31.814 DEBG peer total_diff (ping/pong): 174.138.116.153:13414, 28873113 vs us 28873113
Nov 21 19:05:32.737 DEBG Received block 822896e5 at 7357 from network, going to process.
Nov 21 19:05:32.737 DEBG Processing block 822896e5 at 7357 with 0 inputs and 1 outputs.
Nov 21 19:05:32.738 DEBG Block 822896e5 refused by chain: Orphan
Yeastplume @yeastplume 22:07
Or.. now looking into it more.... perhaps it that our longest peer is somehow returning a list of headers it can't validate either
Yeastplume @yeastplume 22:20
And from what seems like another peer now
Nov 21 22:15:37.217 DEBG Error returned, discarding sumtree extension.
Nov 21 22:15:37.217 INFO Rejected block 8a7eab34 at 7356: DuplicateCommitment(Commitment(0947cb1c2d22aa8196db0295e10397d80f61a565d700e9bbde0f6a3a7e1ff5cfcb))
Nov 21 22:15:37.217 DEBG Block 8a7eab34 refused by chain: DuplicateCommitment(Commitment(0947cb1c2d22aa8196db0295e10397d80f61a565d700e9bbde0f6a3a7e1ff5cfcb))
And that block was just after (at least 1) transaction 6 inputs, 12 outputs.
Yeastplume @yeastplume 22:35
From the other channel 51.255.202.83 has has the longest chain for a while, and any node trying to sync from it is ending up rejecting a forked chain. My issues on this node started at the same time that node caught up with and overtook mine:
Nov 21 19:05:20.809 DEBG peer total_diff (ping/pong): 51.255.202.83:13414, 28872857 vs us 28873113
.
Nov 21 19:05:21.578 DEBG Received block 8a7eab34 at 7356 from network, going to process.
Nov 21 19:05:21.579 DEBG Processing block 8a7eab34 at 7356 with 6 inputs and 12 outputs.
Nov 21 19:05:21.580 DEBG Block 8a7eab34 refused by chain: Orphan
.
Nov 21 19:05:34.996 DEBG peer total_diff (ping/pong): 51.255.202.83:13414, 28873949 vs us 28873113
It looks like it mined a rogue `8a7eab34' block with a commit I already had in my UTXO set somewhere, which I rejected then it accepted then became the longest chain.. then I switched to try to follow it instead, which means my version of that block is hosed for good.
Yeastplume @yeastplume 22:50
from another node:
Nov 21 20:05:21.565 DEBG Received block 8a7eab34 at 7356 from network, going to process.
Nov 21 20:05:21.565 DEBG Processing block 8a7eab34 at 7356 with 6 inputs and 12 outputs.
Nov 21 20:05:21.565 DEBG Validating block with cuckoo size 16
Nov 21 20:05:21.569 DEBG Starting new sumtree extension.
--
Nov 21 20:05:22.115 TRCE Hashing with additional data
Nov 21 20:05:22.116 TRCE Hashing with additional data
Nov 21 20:05:22.126 DEBG Block at 7356 with hash 8a7eab34 is valid, going to save and append.
Nov 21 20:05:22.130 INFO Updated head to 8a7eab34 at 7356.
Nov 21 20:05:22.130 DEBG Committing sumtree extension.
Nov 21 20:05:22.154 DEBG Sumtree extension done.
--
Nov 21 20:05:22.549 DEBG peer total_diff (ping/pong): 165.227.63.166:13414, 28795465 vs us 28873185
Nov 21 20:05:22.630 DEBG peer total_diff (ping/pong): 73.81.122.26:13414, 26860673 vs us 28873185
Nov 21 20:05:22.677 DEBG Received block 8a7eab34 at 7356 from network, going to process.
Nov 21 20:05:22.677 DEBG Processing block 8a7eab34 at 7356 with 6 inputs and 12 outputs.
Nov 21 20:05:22.677 DEBG Block 8a7eab34 at 7356 is unfit at this time: already known
Nov 21 20:05:22.677 DEBG Block 8a7eab34 refused by chain: Unfit("already known")
Nov 21 20:05:22.678 DEBG peer total_diff (ping/pong): 84.172.93.17:13414, 28873185 vs us 28873185
Nov 21 20:05:23.733 DEBG Received block 8a7eab34 at 7356 from network, going to process.