Resolution: Invalid: Environment Issue
v2.2.0, v2.3.0, v2.2.1, v2.2.2
Bring a cluster of 3 peers online and stress it with chaincode operations and data uploads: one or more follower peers fall out of sync with the leader and, once that happens, never catch up again.
My setup: a cluster of 3 machines running HLF 2.3.0 (also tested with 2.2.0), each hosting 1 orderer, 1 peer and 1 CA. All peers belong to the same organization.
The orderers and the peers each have a leader, and the peer leader sends blocks to the other peers via the gossip protocol.
This same cluster worked correctly on 1.4.3 and never showed this kind of issue.
I have a data uploader that constantly sends data to the blockchain.
It seems that when the number of operations is high and the load is heavy, the follower peers are unable to receive new blocks from the leader and fall out of sync.
On peer2, which is a follower, the log shows:
- *19:10:16:367 UTC [gossip.state] func1 -> WARN 543f3 Block  received from gossip wasn't added to payload buffer: Ledger height is at 765398, cannot enqueue block with sequence of 765420*
- Shortly afterwards, block 765398 is received and the peer starts catching up again. It has already collected many blocks by then, so it starts validating them, until:
- Block 765415 is committed correctly:
*2020-12-09 19:10:22.086 UTC [kvledger] commit -> INFO 54a1e [cbichannel] Committed block  with 9 transaction(s) in 25ms (state_validation=1ms block_and_pvtdata_commit=18ms state_commit=3ms) commitHash=[7f27a8efafa07825bbef57b94d17bff9d45c9675746749802c2d38d7679acfe4]*
- But immediately afterwards, the same warning appears again:
*2020-12-09 19:10:22.169 UTC [gossip.state] func1 -> WARN 54a1f Block  received from gossip wasn't added to payload buffer: Ledger height is at 765416, cannot enqueue block with sequence of 765440*
- From this point on, the ledger height stays at 765416 (the last block committed is 765415) and the peer cannot enqueue any more blocks, even though it keeps receiving them.
- For example, 20 minutes later, the leader peer is already several thousand blocks ahead:
*2020-12-09 19:39:30.803 UTC [gossip.state] func1 -> WARN 5d1f3 Block  received from gossip wasn't added to payload buffer: Ledger height is at 765416, cannot enqueue block with sequence of 770947*
- If I search the logs for block 765416, I cannot find any point where the peer receives it, while blocks 765417 and 765418 are received correctly but cannot be added because of the ledger height (765416 is missing). Clearly one block has gone missing somewhere and the peer is unable to recover it.
- In this case only 1 block is missing, but I have also seen cases where more than 1 block was missing.
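The behaviour described in the list above can be reproduced with a toy model. This is only a sketch of an in-order buffer that refuses blocks too far ahead of the ledger height; it is not Fabric's actual implementation. The class `InOrderBuffer` and the `window` parameter are hypothetical, and the value 20 is picked purely for illustration so that the rejected sequences in the logs (22 and 24 blocks ahead) fall outside it.

```python
class InOrderBuffer:
    """Toy model: blocks are committed strictly in order, and blocks that are
    too far ahead of the ledger height are refused (hypothetical rule)."""

    def __init__(self, height, window):
        self.height = height   # next block number the ledger expects
        self.window = window   # assumed maximum distance ahead of height
        self.pending = {}      # buffered blocks keyed by sequence number

    def offer(self, seq):
        """Return True if the block was buffered, False if it was rejected."""
        if seq < self.height or seq - self.height >= self.window:
            return False
        self.pending[seq] = True
        self._drain()
        return True

    def _drain(self):
        # Commit in order and stop at the first gap.
        while self.height in self.pending:
            del self.pending[self.height]
            self.height += 1


buf = InOrderBuffer(height=765416, window=20)
for seq in range(765417, 765441):    # block 765416 never arrives
    buf.offer(seq)
print(buf.height, len(buf.pending))  # 765416 19

buf.offer(765416)                    # the missing block finally shows up
print(buf.height)                    # 765436
```

In this model a single missing block is enough to freeze the height indefinitely: later blocks are either buffered without being committed or rejected outright, which matches what the follower's log shows.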
The only manual fix I found is to stop all the peers and then start the out-of-sync peer first, so that it is brought up to date by the orderers. If another peer is also out of sync, I shut down the one I just recovered and repeat the procedure with the other. Once every peer is up to date, I can bring them all up again. This is a manual workaround and is not acceptable as a production procedure.
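To know when this workaround is needed, the stall can at least be detected from the peer log. The following is a minimal sketch that relies only on the warning text quoted above; the helper names `parse_gaps` and `stalled_height` and the `min_repeats` threshold are hypothetical, and the sample lines are shortened copies of the log excerpts.

```python
import re

# Matches the gossip.state warning text quoted in the log excerpts.
PATTERN = re.compile(
    r"Ledger height is at (\d+), cannot enqueue block with sequence of (\d+)"
)


def parse_gaps(lines):
    """Yield (ledger_height, rejected_sequence) for every matching warning."""
    for line in lines:
        match = PATTERN.search(line)
        if match:
            yield int(match.group(1)), int(match.group(2))


def stalled_height(lines, min_repeats=2):
    """Return the ledger height if the last `min_repeats` warnings all report
    the same height while the rejected sequence grows, otherwise None."""
    gaps = list(parse_gaps(lines))
    if len(gaps) < min_repeats:
        return None
    recent = gaps[-min_repeats:]
    heights = {height for height, _ in recent}
    seqs = [seq for _, seq in recent]
    if len(heights) == 1 and seqs == sorted(seqs) and seqs[0] < seqs[-1]:
        return recent[0][0]
    return None


sample = [
    "[gossip.state] func1 -> WARN Ledger height is at 765398, "
    "cannot enqueue block with sequence of 765420",
    "[gossip.state] func1 -> WARN Ledger height is at 765416, "
    "cannot enqueue block with sequence of 765440",
    "[gossip.state] func1 -> WARN Ledger height is at 765416, "
    "cannot enqueue block with sequence of 770947",
]

print(stalled_height(sample))  # 765416
```

A check like this only flags the peer that needs restarting; it does not replace the manual recovery described above.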
I ran this same cluster with the same data throughput on 1.4.3 and never had this kind of issue. It looks like a performance regression in recent versions of HLF. Has anyone else noticed this bug, and is a fix planned?