Indy Node / INDY-895

New nodes added to existing pool are unable to sync ledgers with the pool.

    Details

    • Type: Bug
    • Status: Complete
    • Priority: Highest
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Sprint: 14, INDY 17.21, INDY 17.22

      Description

      When adding a new validator node to an existing pool, the ledgers fail to sync. As a result, the new node cannot participate in consensus and counts as a failed node.

      The issue may be due to the use of the 'force=True' parameter in the POOL_UPGRADE transaction.

      Short version
      Set up a pool with indy-node 1.0.28 using the live pool configuration settings
      Upgrade to 1.1.37
      Add a new node to the pool from a fresh install of 1.1.37

      Steps
      1. Set up a pool using the provisional live build

      indy-plenum=1.0.21
      indy-anoncreds=1.0.8
      indy-node=1.0.28
      sovrin=1.0.3
      

      2. Before starting the pool, change the configuration to use the live transaction files. As the sovrin user, edit ".sovrin/sovrin_config.py" and add the following lines

      poolTransactionsFile = 'pool_transactions_live'
      domainTransactionsFile = 'transactions_live'
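      The edit above can also be scripted. A minimal sketch, assuming the config lives at "~/.sovrin/sovrin_config.py" and contains no existing overrides (the helper name and demo path are illustrative, not part of indy-node):

```python
import os
import tempfile

# Sketch only: append the live-ledger overrides from the step above to
# sovrin_config.py, skipping the write if an override is already present.
def apply_live_config(config_path):
    overrides = (
        "poolTransactionsFile = 'pool_transactions_live'\n"
        "domainTransactionsFile = 'transactions_live'\n"
    )
    existing = ""
    if os.path.exists(config_path):
        with open(config_path) as f:
            existing = f.read()
    if "poolTransactionsFile" not in existing:
        with open(config_path, "a") as f:
            f.write(overrides)

# Demo against a throwaway file rather than the real ~/.sovrin config
demo_cfg = os.path.join(tempfile.mkdtemp(), "sovrin_config.py")
apply_live_config(demo_cfg)
```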
      

      3. Start the sovrin-node service. The ledgers for pool and domain will be created in the following directories

      .sovrin/data/nodes/<node name>/pool_transactions_live
      .sovrin/data/nodes/<node name>/transactions_live
      

      4. Send a few transactions from the CLI to make sure the pool is working correctly.

      *Upgrade* to indy-node 1.1.37
      Note - The upgrade to 1.1.37 introduced serialized ledgers. Because of this significant change, all validator nodes in the pool had to upgrade simultaneously, so the instructions to those upgrading the pool included the use of the upgrade parameter "force=True".

      5. Send an upgrade transaction with the parameter "force=True", as in the example below.

      send POOL_UPGRADE name=upgradestable37 version=1.1.37 sha256=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 action=start schedule={'Gw6pDLhcBcoQesN72qfotTgFa7cbuqZpkX3Xo6pLhPhv':'2017-10-04T17:30:00.258870-06:00','8ECVSk179mjsjKRLWiQtssMLgp6EPhWXtaYyStWPSGAb': '2017-10-04T17:30:00.258870-06:00','DKVxG2fXXTU8yT5N7hGEbXB3dfdAnYv1JczDUHpmDxya':'2017-10-04T17:30:00.258870-06:00','4PS3EDQ3dW1tci1Bp6543CfuuebjFrg36kLAUcskGfaA':'2017-10-04T17:30:00.258870-06:00','4SWokCJWJc69Tn74VvLS6t2G2ucvXqM9FDMsWJjmsUxe':'2017-10-04T17:30:00.258870-06:00','Cv1Ehj43DDM5ttNBmC6VPpEfwXWwfGktHwjDJsTV5Fz8':'2017-10-04T17:30:00.258870-06:00','BM8dTooz5uykCbYSAAFwKNkYfT4koomBHsSWHTDtkjhW':'2017-10-04T17:30:00.258870-06:00','98VysG35LxrutKTNXvhaztPFHnx5u9kHtT7PnUGqDa8x':'2017-10-04T17:30:00.258870-06:00','6pfbFuX5tx7u3XKz8MNK4BJiHxvEcnGRBs1AQyNaiEQL':'2017-10-04T17:30:00.258870-06:00','HaNW78ayPK4b8vTggD4smURBZw7icxJpjZvCMLdUueiN':'2017-10-04T17:30:00.258870-06:00'} timeout=10 force=True
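      The schedule in the transaction above maps every validator DID to the same start time, since with "force=True" all nodes upgrade at once. A sketch of building such a schedule (the DIDs below are placeholders; in practice use the dest of every validator in the pool):

```python
from datetime import datetime, timedelta, timezone

# Placeholder validator DIDs (first two from the example transaction above)
node_dids = [
    "Gw6pDLhcBcoQesN72qfotTgFa7cbuqZpkX3Xo6pLhPhv",
    "8ECVSk179mjsjKRLWiQtssMLgp6EPhWXtaYyStWPSGAb",
]

# One shared ISO-8601 timestamp 30 minutes out, as in the example schedule
when = (datetime.now(timezone.utc) + timedelta(minutes=30)).isoformat()
schedule = {did: when for did in node_dids}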
      

      6. After the upgrade succeeded, the package versions on each node showed

      indy-plenum=1.1.27
      indy-anoncreds=1.0.10
      indy-node=1.1.37
      sovrin=1.1.6
      

      7. Send some transactions to make sure the pool is functioning. I sent 15 transactions.
      Note - At this point the pool was functioning and all nodes were in sync

      Add Node - Now install a new node to add to the pool
      8. Install the latest stable (indy-node 1.1.37) to a new machine.
      9. Initialize the node, but do not start the services
      10. From one of the nodes in the pool copy the following files to the .sovrin directory of the new node

      pool_transactions_live_genesis
      domain_transactions_live_genesis
      

      11. Before starting the node, change the configuration file to use the live transaction files. As the sovrin user, edit ".sovrin/sovrin_config.py" and add the following lines

      poolTransactionsFile = 'pool_transactions_live'
      domainTransactionsFile = 'domain_transactions_live'
      

      Note the difference here: after the upgrade, the transactions_live file was renamed to "domain_transactions_live" and the format changed to JSON
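      A quick sanity check of a copied genesis file, assuming the post-upgrade JSON format stores one transaction object per line (a common ledger layout; the demo file and contents are illustrative):

```python
import json
import os
import tempfile

# Sketch: parse a post-upgrade ledger/genesis file, assuming one JSON
# transaction per line; json.loads raises if any line is malformed.
def load_txns(path):
    txns = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                txns.append(json.loads(line))
    return txns

# Tiny self-contained demo in place of a real genesis file
demo = os.path.join(tempfile.mkdtemp(), "domain_transactions_live_genesis")
with open(demo, "w") as f:
    f.write('{"type": "1", "dest": "CA6NHp54iKYu4zTEobYKy7"}\n')
print(len(load_txns(demo)))  # 1
```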

      12. Now start the sovrin-node service
      13. You can verify the ledger has data using the read_ledger tool as the sovrin user.

      read_ledger --type domain
      

      14. From the CLI add a new steward for this node
      15. Using the CLI as the new steward, add the node with a NODE transaction like the one below

      send NODE dest=<base 58 Key> data={'client_port': 9702, 'client_ip': '<IP Address>', 'alias': 'ohioLiveQA11', 'node_ip': '<IP Address>', 'node_port': 9701, 'services': ['VALIDATOR']}
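      For reference, the data payload of the NODE transaction above written out as a Python dict (the IP values are placeholders, exactly as in the original command):

```python
# Payload of the NODE txn above; IPs are placeholders as in the original.
node_data = {
    'alias': 'ohioLiveQA11',
    'node_ip': '<IP Address>',
    'node_port': 9701,
    'client_ip': '<IP Address>',
    'client_port': 9702,
    'services': ['VALIDATOR'],  # register as a validator, not an observer
}
```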
      

      16. You should see the node show up in the CLI as connected.

      ISSUE
      The domain ledger will not sync.
      The other nodes show they are connected to the new node.
      The logs on the new node show the following error

      2017-10-05 22:36:31,736 | INFO     | ledger_manager.py    ( 601) | hasValidCatchupReplies | Node11 could not verify catchup reply CATCHUP_REP{'txns': {'16': {'txnTime': 1507156101, 'data': None, 'verkey': None, 'ref': None, 'type': '1', 'alias': None, 'enc': None, 'signature_type': None, 'role': None, 'dest': 'CA6NHp54iKYu4zTEobYKy7', 'reqId': 1507156101755400, 'identifier': 'V4SGRU86Z58d6TV7PBUe6f', 'hash': None, 'signature': '2tN1sHvPmc8bcd3YT2fpW8tHibqAr8JbovmKCmompzfbDjU45mPr6Q5D6ZXkKqfDJg6uA6zXUbSRMESxy2LVTEAz', 'raw': None}}, 'consProof': ['7MMFgPR4syqDpTjnpXe5guGLWuVSTeNUUG25nnG5M1Ho', 'GBS6VPdF21Rz13AbiAjStwLULmthUJPV4eKeLgC7Pa99'], 'ledgerId': 1} since Bad Merkle proof: second root hash does not match. Expected hash: b'5da495937529bcb7a9cff1135316250839296bc7e655fe32555e1c4444411b72' , computed hash: b'338aa575f9f71708bc45f9ece724c09886648b5cd19bf2de79a2bf6d6d2db73b'
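      The "Bad Merkle proof" error means the root hash computed over the transactions the new node received does not match the root the rest of the pool advertises. A toy illustration (not Plenum's actual tree code) of why any divergence in ledger contents, such as a differently serialized transaction, changes the root:

```python
import hashlib

def merkle_root(leaves):
    """Toy binary Merkle root over a list of strings (not Plenum's scheme)."""
    layer = [hashlib.sha256(l.encode()).digest() for l in leaves]
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])  # duplicate last node on odd layers
        layer = [hashlib.sha256(layer[i] + layer[i + 1]).digest()
                 for i in range(0, len(layer), 2)]
    return layer[0].hex()

pool_view = ["txn1", "txn2", "txn3"]
new_node_view = ["txn1", "txn2-reserialized", "txn3"]  # one txn differs

print(merkle_root(pool_view) == merkle_root(new_node_view))  # False
```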
      

      Node11.log

        Attachments

        1. _node1.txt
          3.03 MB
        2. _node2.txt
          3.10 MB
        3. _node3.txt
          3.10 MB
        4. _node4.txt
          3.09 MB
        5. _node5.txt
          1.36 MB
        6. _node6.txt
          887 kB
        7. icenode 3_log.txt
          182 kB
        8. journal.txt
          319 kB
        9. journalctl_1.1.41_migration
          8 kB
        10. migration_failure.PNG
          423 kB
        11. Node11.log
          22 kB


              People

              • Assignee: Artemkaaas Artem Ivanov
              • Reporter: krw910 Kelly Wilson
              • Watchers: Alexander Shcherbakov, Daniel Hardman, Kelly Wilson, Mike Bailey, Vladimir Shishkin
              • Votes: 0
