Details
Type: Bug
Status: Complete
Priority: High
Resolution: Done
Sprint: INDY 18.01: Stability+, Sprint 18.02 Stability
Description
The pool lost its ability to reach consensus while a new node was performing a catch-up.
I don't have detailed logs, only logs at the info level.
Setup
I have a pool of 13 nodes with 5,022 transactions. I was adding 3 more nodes to the pool (Node14, Node15, Node16).
Steps
- I added Node14 and let it perform a catch-up before adding the next node.
- I added Node15 to the pool after Node14 was at 5,022 transactions.
- I then added Node16 after Node15 was at 5,022 transactions.
Other Info
- The catch-up is pretty fast, so I ran "read_ledger --type domain --count" roughly every 20 to 30 seconds to see when it had completed.
- The ledger tool was displaying an incorrect ledger count (this is a different issue) while Node16 was performing a catch-up. It jumped from 12 to 6,000 transactions (more than the pool has) and then to 9,337.
- The ledger on Node16 settled at 4,688 transactions and did not change.
- I sent a new transaction from the CLI on a different machine and the pool stopped taking transactions.
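The manual polling described above could be scripted roughly as follows. This is only a sketch: `wait_for_catchup` and `read_domain_ledger_count` are hypothetical helper names, and the output parsing assumes the count is the last whitespace-separated token printed by "read_ledger --type domain --count", which I have not verified. The comparison is an exact match rather than "at least", because the tool reported counts above the pool's real size during catch-up.

```python
import subprocess
import time
from typing import Callable, Optional


def read_domain_ledger_count() -> int:
    """Run the ledger tool once and return the reported transaction count.

    Assumption: the count is the last whitespace-separated token of stdout.
    """
    result = subprocess.run(
        ["read_ledger", "--type", "domain", "--count"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.split()[-1])


def wait_for_catchup(
    get_count: Callable[[], int],
    target: int,
    interval: float = 20.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Poll get_count until it equals target.

    Returns the number of polls used, or None if the target was never
    reached within max_polls (for example, a ledger stuck at 4,688).
    """
    for poll in range(1, max_polls + 1):
        # Exact match: transient readings above the target are ignored.
        if get_count() == target:
            return poll
        sleep(interval)
    return None
```

On the new node this would be called as `wait_for_catchup(read_domain_ledger_count, 5022)`. A `None` result corresponds to the Node16 case, where the count never reached the pool's size.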
Error
With only info-level logging, this is all I captured:
( 29) | discard | Node1 discarding message INSTANCE_CHANGE{'reason': 26, 'viewNo': 1} because Received instance change request with view no 1 which is not more than its view no 1
It appears that a view change might have been attempted while Node16 was performing a catch-up.
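To make it easier to find similar entries in the other nodes' logs, here is a small sketch that extracts the node name, the proposed view number, and the node's current view number from a discard line like the one above. The function names are hypothetical, the regular expressions are fitted only to this one captured line, and the "stale" rule is read off the wording of the log message (proposed view number not greater than the current one), not taken from the node's source code.

```python
import re
from typing import Optional, Tuple

# Fitted to the single log line captured in this report.
DISCARD_RE = re.compile(
    r"(?P<node>\w+) discarding message INSTANCE_CHANGE\{(?P<fields>[^}]*)\}"
    r".*?which is not more than its view no (?P<current>\d+)"
)
VIEW_NO_RE = re.compile(r"'viewNo':\s*(\d+)")


def parse_discard(line: str) -> Optional[Tuple[str, int, int]]:
    """Return (node, proposed_view_no, current_view_no), or None if no match."""
    match = DISCARD_RE.search(line)
    if match is None:
        return None
    view = VIEW_NO_RE.search(match.group("fields"))
    if view is None:
        return None
    return match.group("node"), int(view.group(1)), int(match.group("current"))


def is_stale_instance_change(proposed_view_no: int, current_view_no: int) -> bool:
    """Rule as implied by the log text: discard unless proposed > current."""
    return proposed_view_no <= current_view_no


line = (
    "( 29) | discard | Node1 discarding message "
    "INSTANCE_CHANGE{'reason': 26, 'viewNo': 1} because Received instance "
    "change request with view no 1 which is not more than its view no 1"
)
parsed = parse_discard(line)
```

For the captured line, `parsed` is `("Node1", 1, 1)`. Node1 was already in view 1 when it received the request for view 1, which suggests that at least some nodes had already moved to the new view.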