Description
Metis, a node on the ESN, will not sync with the other nodes on the network following a manual upgrade from 1.1.33 to 1.1.37. The logs report a hash mismatch error on the config ledger:
2017-09-29 11:49:17,887 | INFO | ledger_manager.py ( 601) | hasValidCatchupReplies | metis could not verify catchup reply CATCHUP_REP{'consProof': ['3SicutRJhiGma6ZGV7x1U8Dgd3ysSyRrrxdNVTtTpgQ1', 'BVa43Pg8SLAXi7NMvTfXDNXECA1SVSoihsfSDPkc4BNG', 'EJ7McLjzmQFxuvHm8d65zoQDREZJfvN3XHZ7Q5BzjSyi', '8HhxoviUnWyJ1RCEqCGJ19QPtmonuPKDfrH6GVxYnM9x', 'FLuA9mEmudY1tTBvhk18uapfF8cmRf6Cs8AAsx7NC49S', '4dicxCBSiGvjhppLZLGqLAdn7dpLKyr63oZgTNcsJ3E5'], 'ledgerId': 2, 'txns': {'3': {'reqId': 1505926023556578, 'identifier': '7VNYvJaxDraquhMC9YneziwmM9SZzR5KM24xWtm1jVh', 'txnTime': 1505926023, 'signature': '3Vr9bnzoxnSSUM3UZPMD2uXVmfdGDA5nVCrPSiawNPymE4pJentkGYkAAp2hbzTozuUMUuSunZVD8emCwf7F4ogh', 'data': {'version': '1.1.33', 'action': 'complete'}, 'type': '110'}}} since Inconsistency: first root hash does not match. Expected hash: b'99bbdf156bbfb1578944d380bd5f33996400330256f3b9a1398802c937e59ce1', computed hash: b'6ce3b31822cb39fbee6601df74abd6d1f5f6710d1cb1915f7da3f986cb834172'
Inspecting the config ledger confirms a mismatch on the second transaction, which was posted to the ledger a month ago, when an upgrade transaction with the --force flag was used to upgrade from 1.0.28 to 1.1.33. The node has been operational and processing domain transactions normally over the past month while running 1.1.33. Now that it fails to sync, the node can no longer accept transactions.
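The comparison described above can be sketched as follows. This is only an illustration: it assumes each config ledger has been exported to a list of serialized transactions in sequence order, and `first_divergence` is a hypothetical helper, not part of the node software.

```python
from itertools import zip_longest


def first_divergence(local_txns, reference_txns):
    """Return the 1-based sequence number of the first entry that differs
    between two ledgers, or None if they are identical.

    Both arguments are assumed to be lists of serialized transactions
    (for example, one string per ledger entry) in sequence order.
    """
    pairs = zip_longest(local_txns, reference_txns)
    for seq_no, (local, reference) in enumerate(pairs, start=1):
        # A missing entry on either side (None) also counts as a divergence.
        if local != reference:
            return seq_no
    return None
```

Running it on the metis config ledger and the config ledger of a presumed good node would report the first sequence number at which the two disagree.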
Theory:
A non-consensus transaction (or a duplicate transaction) was posted only to the metis config ledger a month ago during the 1.1.33 upgrade with the --force flag. The problem was not detected at the time, perhaps because nothing was written to the config ledger after that. It sat dormant until the node service was stopped and started during the 1.1.37 upgrade, which triggered a resync of all the ledgers. The config ledger resync failed because of the mismatch, and node functionality was then halted.
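To illustrate why a single divergent entry makes every later root hash comparison fail, here is a minimal Merkle root sketch. It assumes RFC 6962 style SHA-256 hashing (a 0x00 prefix for leaves, 0x01 for interior nodes); the node's actual hashing and transaction serialization may differ, and the transaction bytes below are placeholders, not real ledger data.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    # RFC 6962 style leaf hash (assumed, not confirmed for this ledger).
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # RFC 6962 style interior node hash.
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(leaves):
    """Compute the Merkle root over a list of serialized entries (bytes)."""
    n = len(leaves)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(leaves[0])
    # Split at the largest power of two strictly smaller than n.
    k = 1
    while k * 2 < n:
        k *= 2
    return node_hash(merkle_root(leaves[:k]), merkle_root(leaves[k:]))


# Placeholder entries: the second one differs on the "local" node only.
reference = [b"txn-1", b"txn-2", b"txn-3"]
local = [b"txn-1", b"txn-2-forced", b"txn-3"]

roots_match_before = merkle_root(reference[:1]) == merkle_root(local[:1])
roots_match_after = merkle_root(reference) == merkle_root(local)
```

In this sketch the roots agree while only the first entry is considered and disagree once the divergent second entry is included, even though the third entry is identical on both sides. That is consistent with the theory that the mismatch goes unnoticed until something forces the roots to be compared, as a resync does.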
Attached are the log, domain (partial) and config ledgers for metis, and the config ledger for one of the other, presumed good, nodes on the ESN. The domain ledger of metis matches the domain ledger of the other nodes.