Details
Type: Bug
Status: Complete
Priority: Highest
Resolution: Done
14, INDY 17.21, INDY 17.22
Description
When a new validator node is added to an existing pool, its ledgers are unable to sync. Because of this, the new node cannot participate in consensus and counts as a failed node.
The issue may be due to the use of the 'force=True' parameter in the POOL_UPGRADE transaction.
Short version
Set up a pool with indy-node 1.0.28 using the live pool configuration settings
Upgrade to 1.1.37
Add a new node to the pool from a fresh install of 1.1.37
Steps
1. Set up a pool using the provisional live build:
indy-plenum=1.0.21 indy-anoncreds=1.0.8 indy-node=1.0.28 sovrin=1.0.3
2. Before starting the pool, change the configuration to use the live transaction files. As the sovrin user, edit ".sovrin/sovrin_config.py".
Add the following lines:
poolTransactionsFile = 'pool_transactions_live'
domainTransactionsFile = 'transactions_live'
3. Start the sovrin-node service. The pool and domain ledgers will be created in the following directories:
.sovrin/data/nodes/<node name>/pool_transactions_live
.sovrin/data/nodes/<node name>/transactions_live
4. Send a few transactions from the CLI to make sure the pool is working correctly.
Upgrade to indy-node 1.1.37
Note - The upgrade to 1.1.37 introduced serialized ledgers. Because of this significant change, all validator nodes in the pool had to upgrade simultaneously. The instructions given to those upgrading the pool included the use of the upgrade parameter "force=True".
5. Send an upgrade transaction with the parameter "force=True", as in the example below.
send POOL_UPGRADE name=upgradestable37 version=1.1.37 sha256=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 action=start schedule={'Gw6pDLhcBcoQesN72qfotTgFa7cbuqZpkX3Xo6pLhPhv':'2017-10-04T17:30:00.258870-06:00','8ECVSk179mjsjKRLWiQtssMLgp6EPhWXtaYyStWPSGAb': '2017-10-04T17:30:00.258870-06:00','DKVxG2fXXTU8yT5N7hGEbXB3dfdAnYv1JczDUHpmDxya':'2017-10-04T17:30:00.258870-06:00','4PS3EDQ3dW1tci1Bp6543CfuuebjFrg36kLAUcskGfaA':'2017-10-04T17:30:00.258870-06:00','4SWokCJWJc69Tn74VvLS6t2G2ucvXqM9FDMsWJjmsUxe':'2017-10-04T17:30:00.258870-06:00','Cv1Ehj43DDM5ttNBmC6VPpEfwXWwfGktHwjDJsTV5Fz8':'2017-10-04T17:30:00.258870-06:00','BM8dTooz5uykCbYSAAFwKNkYfT4koomBHsSWHTDtkjhW':'2017-10-04T17:30:00.258870-06:00','98VysG35LxrutKTNXvhaztPFHnx5u9kHtT7PnUGqDa8x':'2017-10-04T17:30:00.258870-06:00','6pfbFuX5tx7u3XKz8MNK4BJiHxvEcnGRBs1AQyNaiEQL':'2017-10-04T17:30:00.258870-06:00','HaNW78ayPK4b8vTggD4smURBZw7icxJpjZvCMLdUueiN':'2017-10-04T17:30:00.258870-06:00'} timeout=10 force=True
6. After a successful upgrade, each node reported the following package versions:
indy-plenum=1.1.27 indy-anoncreds=1.0.10 indy-node=1.1.37 sovrin=1.1.6
7. Send some transactions to make sure the pool is functioning. I sent 15 transactions.
Note - At this point the pool was functioning and all nodes were in sync.
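Because the upgrade to 1.1.37 required every validator to upgrade at the same moment, the schedule in step 5 repeats one timestamp for every node. The sketch below builds that command text from a list of node identifiers so the timestamps cannot drift apart. It is a hypothetical helper (build_pool_upgrade_cmd is not part of indy-node); it only formats a string, and it assumes, as the step 5 example suggests, that the CLI accepts a Python-style dict literal for schedule.

```python
from datetime import datetime, timedelta, timezone


def build_pool_upgrade_cmd(name, version, sha256, node_ids, start,
                           timeout=10, force=True):
    """Format a POOL_UPGRADE CLI command that schedules every node at one instant.

    Hypothetical helper: it only builds the command text shown in step 5 and
    does not talk to the pool. `start` must be a timezone-aware datetime.
    """
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    when = start.isoformat()
    # One shared timestamp for every validator, so all nodes upgrade together.
    schedule = {node_id: when for node_id in node_ids}
    parts = [
        "send POOL_UPGRADE",
        f"name={name}",
        f"version={version}",
        f"sha256={sha256}",
        "action=start",
        f"schedule={schedule!r}",  # Python dict literal, single-quoted keys
        f"timeout={timeout}",
    ]
    if force:
        parts.append("force=True")
    return " ".join(parts)


if __name__ == "__main__":
    # Values taken from the step 5 example (only two of the ten nodes shown).
    start = datetime(2017, 10, 4, 17, 30, 0, 258870,
                     tzinfo=timezone(timedelta(hours=-6)))
    print(build_pool_upgrade_cmd(
        "upgradestable37", "1.1.37",
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
        ["Gw6pDLhcBcoQesN72qfotTgFa7cbuqZpkX3Xo6pLhPhv",
         "8ECVSk179mjsjKRLWiQtssMLgp6EPhWXtaYyStWPSGAb"],
        start))
```

Generating the schedule from one datetime object, rather than pasting the timestamp ten times, is the main point of the design: a single typo in a hand-edited schedule would stagger that node's upgrade.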
Add Node - Now install a new node to add to the pool
8. Install the latest stable release (indy-node 1.1.37) on a new machine.
9. Initialize the node, but do not start the services.
10. From one of the nodes in the pool, copy the following files to the .sovrin directory of the new node:
pool_transactions_live_genesis
domain_transactions_live_genesis
11. Before starting the node, change the configuration file to use the live transaction files. As the sovrin user, edit ".sovrin/sovrin_config.py".
Add the following lines:
poolTransactionsFile = 'pool_transactions_live'
domainTransactionsFile = 'domain_transactions_live'
Note the difference here: after the upgrade, the transactions_live file was renamed to "domain_transactions_live" and its format changed to JSON.
12. Now start the sovrin-node service
13. As the sovrin user, you can verify that the ledger has data using the read_ledger tool:
read_ledger --type domain
14. From the CLI, add a new Steward for this node.
15. Using the CLI as the new Steward, add the node with the send NODE transaction, as shown below:
send NODE dest=<base 58 Key> data={'client_port': 9702, 'client_ip': '<IP Address>', 'alias': 'ohioLiveQA11', 'node_ip': '<IP Address>', 'node_port': 9701, 'services': ['VALIDATOR']}
16. You should see the node show up in the CLI as connected.
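Steps 2 and 11 set the same configuration keys but point domainTransactionsFile at different names depending on the installed version. The following sketch produces the lines for .sovrin/sovrin_config.py from a version string. live_ledger_config is a hypothetical helper, and the 1.1 cut-off is an assumption drawn only from the two versions in this report (1.0.28 and 1.1.37).

```python
def live_ledger_config(indy_node_version: str) -> str:
    """Return the lines to add to .sovrin/sovrin_config.py for the live ledgers.

    Assumption: the 1.0 -> 1.1 cut-off is inferred only from this report
    (1.0.28 uses 'transactions_live', 1.1.37 uses 'domain_transactions_live').
    """
    major, minor = (int(part) for part in indy_node_version.split(".")[:2])
    if (major, minor) >= (1, 1):
        domain_file = "domain_transactions_live"  # renamed, JSON-serialized ledger
    else:
        domain_file = "transactions_live"  # pre-upgrade name
    return (
        "poolTransactionsFile = 'pool_transactions_live'\n"
        f"domainTransactionsFile = '{domain_file}'\n"
    )


if __name__ == "__main__":
    print(live_ledger_config("1.1.37"), end="")
```

A fresh 1.1.37 node configured with the old 'transactions_live' name would look for a domain ledger under a name that, per the note in step 11, no longer applies after the upgrade.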
ISSUE
The domain ledger will not sync.
The other nodes show they are connected to the new node.
The logs on the new node show the following error:
2017-10-05 22:36:31,736 | INFO | ledger_manager.py ( 601) | hasValidCatchupReplies | Node11 could not verify catchup reply CATCHUP_REP{'txns': {'16': {'txnTime': 1507156101, 'data': None, 'verkey': None, 'ref': None, 'type': '1', 'alias': None, 'enc': None, 'signature_type': None, 'role': None, 'dest': 'CA6NHp54iKYu4zTEobYKy7', 'reqId': 1507156101755400, 'identifier': 'V4SGRU86Z58d6TV7PBUe6f', 'hash': None, 'signature': '2tN1sHvPmc8bcd3YT2fpW8tHibqAr8JbovmKCmompzfbDjU45mPr6Q5D6ZXkKqfDJg6uA6zXUbSRMESxy2LVTEAz', 'raw': None}}, 'consProof': ['7MMFgPR4syqDpTjnpXe5guGLWuVSTeNUUG25nnG5M1Ho', 'GBS6VPdF21Rz13AbiAjStwLULmthUJPV4eKeLgC7Pa99'], 'ledgerId': 1} since Bad Merkle proof: second root hash does not match. Expected hash: b'5da495937529bcb7a9cff1135316250839296bc7e655fe32555e1c4444411b72' , computed hash: b'338aa575f9f71708bc45f9ece724c09886648b5cd19bf2de79a2bf6d6d2db73b'
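The message "Bad Merkle proof: second root hash does not match" is raised while checking a Merkle consistency proof (the consProof field of the CATCHUP_REP). Its wording appears to match the Certificate Transparency Python verifier, on which plenum's ledger seems to be modeled. The sketch below is not plenum's code: it assumes RFC 6962 hashing (0x00 prefix for leaves, 0x01 for interior nodes) and follows the consistency-check algorithm of RFC 9162, section 2.1.4.2. All function names are hypothetical. It illustrates one possible reading of the log: if the bytes hashed for any transaction differ from what the rest of the pool hashed (for example, after a serialization change like the one noted in step 11), the recomputed root cannot match the expected one.

```python
import hashlib


# Hypothetical helpers; RFC 6962 domain separation is assumed.
def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def _split(n: int) -> int:
    """Largest power of two strictly less than n (requires n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def root(leaves: list) -> bytes:
    """Merkle tree hash over already-hashed leaves."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    if len(leaves) == 1:
        return leaves[0]
    k = _split(len(leaves))
    return node_hash(root(leaves[:k]), root(leaves[k:]))


def consistency_proof(m: int, leaves: list) -> list:
    """RFC 6962 PROOF(m, D[n]) for the first m of len(leaves) leaves."""
    def sub(m, d, complete):
        if m == len(d):
            return [] if complete else [root(d)]
        k = _split(len(d))
        if m <= k:
            return sub(m, d[:k], complete) + [root(d[k:])]
        return sub(m - k, d[k:], False) + [root(d[:k])]
    return sub(m, leaves, True)


def verify_consistency(first, second, first_root, second_root, proof):
    """Check that the tree of size `first` is a prefix of the tree of size `second`."""
    if not 0 < first <= second:
        raise ValueError("need 0 < first <= second")
    if first == second:
        if proof or first_root != second_root:
            raise ValueError("Bad Merkle proof: trees of equal size differ")
        return True
    path = list(proof)
    if first & (first - 1) == 0:  # power of two: first root is implicit in the proof
        path.insert(0, first_root)
    fn, sn = first - 1, second - 1
    while fn & 1:
        fn, sn = fn >> 1, sn >> 1
    fr = sr = path[0]
    for c in path[1:]:
        if sn == 0:
            raise ValueError("Bad Merkle proof: proof too long")
        if (fn & 1) or fn == sn:
            fr = node_hash(c, fr)
            sr = node_hash(c, sr)
            while fn != 0 and not (fn & 1):
                fn, sn = fn >> 1, sn >> 1
        else:
            sr = node_hash(sr, c)
        fn, sn = fn >> 1, sn >> 1
    if sn != 0:
        raise ValueError("Bad Merkle proof: proof too short")
    if fr != first_root:
        raise ValueError("Bad Merkle proof: first root hash does not match")
    if sr != second_root:
        raise ValueError("Bad Merkle proof: second root hash does not match")
    return True
```

For example, with nine leaves where only the sixth differs between two copies, checking a proof built from one copy against the other copy's root passes the first-root check (the first three leaves agree) and then fails with "second root hash does not match".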