Affects Version/s: None
Fix Version/s: None
Sprint: 18.07 Stability & Monitoring, 18.08 Stability-Monitoring, EV 18.11 Stability/ViewChange
Steps to Reproduce:
1. Run multiple load tests (reading and writing) against a 25-node pool.
2. Reach ~178k txns written.
3. Try to write some additional txns (via load script or CLI).
4. Check validator-info on all nodes.
One node (24) stopped at 49k txns and doesn't catch up. Another node (11) stopped at 123k txns and doesn't catch up. All other nodes have 175..178k txns. All nodes show 25/25 reachable hosts in validator-info, but the pool has no consensus. Restarting the whole pool doesn't help.
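For triage, the ledger sizes reported by validator-info can be compared mechanically. The snippet below is a minimal sketch, not part of any Indy tooling. The node names (`Node11`, `Node24`), the 178,000 placeholder for the healthy nodes, and the 5,000-txn tolerance are illustrative assumptions; the report only gives node numbers and approximate counts. It also assumes the standard BFT bound n >= 3f + 1 with a quorum of n - f, which may not match every quorum the pool actually uses.

```python
from statistics import median


def max_faulty(n: int) -> int:
    """Largest f satisfying n >= 3f + 1 (standard BFT bound, assumed here)."""
    return (n - 1) // 3


def lagging_nodes(ledger_sizes: dict[str, int], tolerance: int) -> dict[str, int]:
    """Return nodes whose txn count is more than `tolerance` below the pool median."""
    mid = median(ledger_sizes.values())
    return {name: size for name, size in ledger_sizes.items() if mid - size > tolerance}


# Hypothetical snapshot: 23 nodes near 178k txns, two stalled nodes as reported.
sizes = {f"Node{i}": 178_000 for i in range(1, 26)}
sizes["Node24"] = 49_000
sizes["Node11"] = 123_000

behind = lagging_nodes(sizes, tolerance=5_000)
f = max_faulty(len(sizes))
quorum = len(sizes) - f

print("lagging:", behind)
print("tolerated faults:", f, "quorum:", quorum)
```

Under these assumptions a 25-node pool tolerates 8 faulty nodes and needs 17 to agree, so two stalled nodes alone should not block ordering. That suggests the loss of consensus has an additional cause worth looking for in the logs.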
The journalctl output is attached. Node logs will be added to Google Drive.
Diagnose the issue, decide on a Plan of Attack, and raise the appropriate epics and stories that can be scheduled.