  1. Indy Node
  2. INDY-893

Node fails to sync following upgrade


    Details

    • Type: Bug
    • Status: Complete
    • Priority: High
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Environment:

      ESN, running 1.1.37

    • Epic Link:
    • Sprint:
      INDY 17.21, INDY 17.22, INDY 17.23, INDY 17.24: Node Perf

      Description

      Metis, a node on the ESN, fails to sync with the other nodes on the network after a manual upgrade from 1.1.33 to 1.1.37. The logs report a hash mismatch error on the config ledger:

      2017-09-29 11:49:17,887 | INFO     | ledger_manager.py    ( 601) | hasValidCatchupReplies | metis could not verify catchup reply CATCHUP_REP{'consProof': ['3SicutRJhiGma6ZGV7x1U8Dgd3ysSyRrrxdNVTtTpgQ1', 'BVa43Pg8SLAXi7NMvTfXDNXECA1SVSoihsfSDPkc4BNG', 'EJ7McLjzmQFxuvHm8d65zoQDREZJfvN3XHZ7Q5BzjSyi', '8HhxoviUnWyJ1RCEqCGJ19QPtmonuPKDfrH6GVxYnM9x', 'FLuA9mEmudY1tTBvhk18uapfF8cmRf6Cs8AAsx7NC49S', '4dicxCBSiGvjhppLZLGqLAdn7dpLKyr63oZgTNcsJ3E5'], 'ledgerId': 2, 'txns': {'3': {'reqId': 1505926023556578, 'identifier': '7VNYvJaxDraquhMC9YneziwmM9SZzR5KM24xWtm1jVh', 'txnTime': 1505926023, 'signature': '3Vr9bnzoxnSSUM3UZPMD2uXVmfdGDA5nVCrPSiawNPymE4pJentkGYkAAp2hbzTozuUMUuSunZVD8emCwf7F4ogh', 'data': {'version': '1.1.33', 'action': 'complete'}, 'type': '110'}}} since Inconsistency: first root hash does not match. Expected hash: b'99bbdf156bbfb1578944d380bd5f33996400330256f3b9a1398802c937e59ce1', computed hash: b'6ce3b31822cb39fbee6601df74abd6d1f5f6710d1cb1915f7da3f986cb834172'
      

      Inspecting the config ledger confirms a mismatch on the second transaction, which was posted to the ledger a month ago, when an upgrade transaction with the --force flag was used to upgrade from 1.0.28 to 1.1.33. The node has been operational and processing domain transactions normally over the past month while running 1.1.33. Since this failure to sync, the node can no longer accept transactions.
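The comparison described above can be repeated with a short script. This is a minimal sketch, assuming each ledger dump holds one transaction per line; the actual layout of the attached files is not verified here, and `first_divergence` and `read_dump` are hypothetical helper names, not part of Indy Node.

```python
from itertools import zip_longest


def first_divergence(ledger_a, ledger_b):
    """Return the 1-based position of the first differing entry, or None.

    A ledger that is shorter than the other counts as diverging at the
    first position where it has no entry.
    """
    pairs = zip_longest(ledger_a, ledger_b)
    for position, (txn_a, txn_b) in enumerate(pairs, start=1):
        if txn_a != txn_b:
            return position
    return None


def read_dump(path):
    """Read a dump file into a list of non-empty, stripped lines.

    Assumption: one transaction per line.
    """
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

For example, `first_divergence(read_dump("metis_config_ledger-10-04-17.txt"), read_dump("jouer_config_ledger.txt"))` would give the position of the first transaction on which the two dumps disagree, provided the one-line-per-transaction assumption holds.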

      Theory:

      A non-consensus transaction (or a duplicate transaction) was posted only to the metis config ledger a month ago, during the 1.1.33 upgrade with the --force flag. The problem was not detected at the time, perhaps because nothing was written to the config ledger after that. It sat dormant until the node service was stopped and started during the 1.1.37 upgrade, which triggered a resync of all the ledgers. The config ledger resync failed due to the mismatch, and node functionality was then halted.
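To illustrate why a single divergent transaction could stay unnoticed and then block catchup, here is a minimal sketch of a Merkle root computation. It assumes an RFC 6962 style tree over SHA-256 and uses made-up byte strings as leaves; the real transaction serialization and the actual verification code in ledger_manager.py are not reproduced here.

```python
import hashlib


def merkle_root(leaves):
    """Compute an RFC 6962 style Merkle tree hash over a list of byte strings."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    if len(leaves) == 1:
        # Leaf hashes are prefixed with 0x00.
        return hashlib.sha256(b"\x00" + leaves[0]).digest()
    # Split at the largest power of two strictly smaller than len(leaves).
    split = 1
    while split * 2 < len(leaves):
        split *= 2
    left = merkle_root(leaves[:split])
    right = merkle_root(leaves[split:])
    # Interior node hashes are prefixed with 0x01.
    return hashlib.sha256(b"\x01" + left + right).digest()


# Hypothetical leaves: the two ledgers differ only in the second entry.
good = [b"txn-1", b"txn-2", b"txn-3"]
bad = [b"txn-1", b"txn-2-local-only", b"txn-3"]

same_prefix = merkle_root(good[:1]) == merkle_root(bad[:1])
same_root = merkle_root(good) == merkle_root(bad)
```

Here `same_prefix` is true and `same_root` is false: the roots agree only up to the last common transaction. A node holding the divergent entry would not see any error locally; the disagreement surfaces only when its root is checked against one supplied by other nodes, which is consistent with the theory that the problem appeared at resync.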

       

      Attached are the log, domain (partial) and config ledgers for metis, and the config ledger for one of the other, presumed good, nodes on the ESN.  The domain ledger of metis matches the domain ledger of the other nodes.

        Attachments

        1. jouer_config_ledger.txt
          9 kB
        2. journalctl-10-09-17.txt
          2.55 MB
        3. metis_config_ledger-10-04-17.txt
          2 kB
        4. metis_ledger_trans-10-04-17.txt
          215 kB
        5. metis_log_2017-09-29.txt
          4.61 MB
        6. metis-10-09-17.txt
          3.21 MB


              People

              Assignee:
              ashcherbakov Alexander Shcherbakov
              Reporter:
              mgbailey Mike Bailey
              Watchers:
              Alexander Shcherbakov, Dmitry Surnin, Kelly Wilson, Mike Bailey, Olga Zheregelya
