The pool lost its ability to reach consensus while a new node was performing a catch-up.
I don't have detailed logs, only logs at the info level.
I have a pool of 13 nodes with 5,022 transactions. I was adding 3 more nodes to the pool (Node14, Node15, Node16):
- I added Node14 and let it perform a catch-up before adding the next node
- I added Node15 to the pool after Node14 was at 5,022 transactions
- I then added Node16 after Node15 was at 5,022 transactions
- The catch-up is pretty fast, so I ran "read_ledger --type domain --count" roughly every 20 to 30 seconds to see when it had completed.
- While Node16 was performing its catch-up, the ledger tool displayed an incorrect ledger count (this is a different issue). The count jumped from 12 to 6,000 transactions (more than the pool has) and then to 9,337.
- The ledger on Node16 settled at 4,688 transactions and did not change.
- I sent a new transaction from the CLI on a different machine and the pool stopped taking transactions.
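For anyone trying to reproduce this, the manual polling in the steps above could be scripted so that every reported count is recorded. This is only a rough sketch: it assumes "read_ledger --type domain --count" prints the count as the first integer on stdout (I have not confirmed the exact output format), and the function names, the 20-second interval, and the poll limit are hypothetical choices, not part of the tool.

```python
import re
import subprocess
import time


def read_domain_count():
    """Run the ledger tool and return the reported domain ledger size.

    Assumption: the first integer printed on stdout is the count.
    """
    out = subprocess.run(
        ["read_ledger", "--type", "domain", "--count"],
        capture_output=True, text=True, check=True,
    ).stdout
    match = re.search(r"\d+", out)
    if match is None:
        raise ValueError(f"no count found in output: {out!r}")
    return int(match.group())


def wait_for_catchup(target, get_count=read_domain_count, interval=20.0,
                     max_polls=30, sleep=time.sleep):
    """Poll until the count equals target; return (reached, samples).

    Every observed count is kept in samples, so overshoots or a stalled
    value remain visible after the run.
    """
    samples = []
    for _ in range(max_polls):
        count = get_count()
        samples.append(count)
        if count == target:
            return True, samples
        sleep(interval)
    return False, samples
```

Calling wait_for_catchup(5022) on the new node before adding the next one would return the full list of observed counts, which makes jumps like the ones described above easy to attach to a report.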
With only info-level logging, this is all I captured.
It appears that a view change might have been attempted while Node16 was performing a catch-up.