Fabric / FAB-18371

Peers become misaligned when uploading data and never get realigned due to missing blocks



    • Type: Bug
    • Status: Closed
    • Priority: Highest
    • Resolution: Invalid: Environment Issue
    • Affects Version/s: v2.2.0, v2.3.0, v2.2.1, v2.2.2
    • None
    • None
    • Bring a cluster of 3 peers online and stress it with chaincode operations and data uploads; one or more follower peers will become misaligned. After that, they can no longer realign.


      My setup: a cluster of 3 machines running HLF 2.3.0 (also tested with 2.2.0), each hosting 1 orderer, 1 peer, and 1 CA. All peers belong to the same organization.

      Both the orderers and the peers have a leader, and the leader peer sends blocks to the other peers via the gossip protocol.

      This same cluster worked correctly on 1.4.3 and never showed this kind of issue.


      I have a data uploader which constantly sends data to the blockchain.

      It seems that when the number of operations is high and the load is heavy, the follower peers are unable to receive new blocks from the leader and become misaligned.
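To make the suspected failure mode concrete, here is a minimal toy model of a follower that commits blocks strictly in order. It is only a sketch under stated assumptions: the class `FollowerModel`, the helper `simulate`, and the `window` value of 20 are hypothetical and chosen for illustration; this is not Fabric's actual gossip state code.

```python
from typing import Dict, Iterable, List


class FollowerModel:
    """Toy follower peer that commits blocks strictly in sequence.

    Assumption: a block that is `window` or more ahead of the ledger
    height is rejected instead of buffered. Illustration only.
    """

    def __init__(self, height: int, window: int = 20) -> None:
        self.height = height              # sequence of the next block to commit
        self.window = window              # hypothetical buffer window
        self.buffer: Dict[int, str] = {}  # blocks waiting for their predecessors
        self.rejected: List[int] = []     # blocks dropped as too far ahead

    def receive(self, seq: int, payload: str = "") -> None:
        if seq < self.height:
            return                        # already committed, ignore
        if seq - self.height >= self.window:
            self.rejected.append(seq)     # too far ahead of the ledger height
            return
        self.buffer[seq] = payload
        # Commit every contiguous block starting at the current height.
        while self.height in self.buffer:
            del self.buffer[self.height]
            self.height += 1


def simulate(height: int, delivered: Iterable[int], window: int = 20) -> FollowerModel:
    """Feed the given block sequence numbers to a fresh follower."""
    peer = FollowerModel(height, window)
    for seq in delivered:
        peer.receive(seq)
    return peer


if __name__ == "__main__":
    # Block 100 is never delivered, so the height cannot advance.
    peer = simulate(100, range(101, 131))
    print(peer.height, len(peer.buffer), peer.rejected[:3])
```

In this model a single undelivered block is enough to freeze the height: later blocks are buffered until the window is full and rejected afterwards, and nothing is committed until the missing block arrives from some other source.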


      On peer2, which is a follower, I see the following in the log:

      • *19:10:16:367 UTC [gossip.state] func1 -> WARN 543f3 Block [765420] received from gossip wasn't added to payload buffer: Ledger height is at 765398, cannot enqueue block with sequence of 765420*
      • Shortly after, block 765398 is received and the peer starts realigning itself. It has already collected many blocks by then, so it starts validating them, until:
      • Block 765415 is correctly added:

      *2020-12-09 19:10:22.086 UTC [kvledger] commit -> INFO 54a1e [cbichannel] Committed block [765415] with 9 transaction(s) in 25ms (state_validation=1ms block_and_pvtdata_commit=18ms state_commit=3ms) commitHash=[7f27a8efafa07825bbef57b94d17bff9d45c9675746749802c2d38d7679acfe4]*

      • But immediately afterwards, I start receiving this warning again:

      *2020-12-09 19:10:22.169 UTC [gossip.state] func1 -> WARN 54a1f Block [765440] received from gossip wasn't added to payload buffer: Ledger height is at 765416, cannot enqueue block with sequence of 765440*

      • From now on, the ledger height stays at 765416 (the last block committed is 765415) and the peer is unable to enqueue more blocks, even though it keeps receiving them.
      • For example, about half an hour later, the leader peer is more than 5,500 blocks ahead:

      *2020-12-09 19:39:30.803 UTC [gossip.state] func1 -> WARN 5d1f3 Block [770947] received from gossip wasn't added to payload buffer: Ledger height is at 765416, cannot enqueue block with sequence of 770947*

      • If I search the logs for block 765416, I cannot find any point where the peer receives it, whereas blocks 765417 and 765418 are received correctly but cannot be added because of the ledger height (765416 is missing). Clearly a block has gone missing and the peer is unable to recover it.
      • In this case only 1 block is missing, but I have also seen cases where more than 1 block was missing.
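The stall described in the log excerpts above can be spotted mechanically. The following is a rough sketch that parses the `gossip.state` warnings quoted in this report and checks whether the reported ledger height has stopped moving. The names `Rejection`, `parse_rejections`, and `stalled_height` are hypothetical helpers, and the regular expression is based only on the three warning lines shown here.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Sequence

# Matches the gossip.state warning text quoted in this report.
WARN_RE = re.compile(
    r"Block \[(?P<block>\d+)\] received from gossip wasn't added to payload buffer: "
    r"Ledger height is at (?P<height>\d+), cannot enqueue block"
)


class Rejection(NamedTuple):
    block: int   # sequence number of the block that was not enqueued
    height: int  # ledger height reported by the peer at that moment

    @property
    def gap(self) -> int:
        """How far the rejected block is ahead of the ledger height."""
        return self.block - self.height


def parse_rejections(lines: Iterable[str]) -> List[Rejection]:
    """Extract one Rejection per matching warning line."""
    found = []
    for line in lines:
        m = WARN_RE.search(line)
        if m:
            found.append(Rejection(int(m["block"]), int(m["height"])))
    return found


def stalled_height(rejections: Sequence[Rejection], min_repeats: int = 2) -> Optional[int]:
    """Return the height if the last `min_repeats` warnings all report it, else None."""
    if len(rejections) < min_repeats:
        return None
    tail = rejections[-min_repeats:]
    if len({r.height for r in tail}) == 1:
        return tail[-1].height
    return None


# The three warnings quoted above, copied from the peer2 log.
SAMPLE = [
    "19:10:16:367 UTC [gossip.state] func1 -> WARN 543f3 Block [765420] received from gossip wasn't added to payload buffer: Ledger height is at 765398, cannot enqueue block with sequence of 765420",
    "2020-12-09 19:10:22.169 UTC [gossip.state] func1 -> WARN 54a1f Block [765440] received from gossip wasn't added to payload buffer: Ledger height is at 765416, cannot enqueue block with sequence of 765440",
    "2020-12-09 19:39:30.803 UTC [gossip.state] func1 -> WARN 5d1f3 Block [770947] received from gossip wasn't added to payload buffer: Ledger height is at 765416, cannot enqueue block with sequence of 770947",
]

if __name__ == "__main__":
    rejections = parse_rejections(SAMPLE)
    print([r.gap for r in rejections])  # [22, 24, 5531]
    print(stalled_height(rejections))   # 765416
```

On the sample lines, the height reported by the last two warnings is identical (765416) while the gap grows from 24 to 5531 blocks, which matches the observation that block 765416 never arrives.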

      The only manual fix I found was to stop all the peers and restart the misaligned one first. This way it gets realigned by the orderers. After that, if another peer is misaligned, I shut down the one just realigned and do the same thing with the other. Once they are all aligned, I can bring all the peers up again. Of course this is a manual fix and cannot be a production procedure.
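The ordering of that manual workaround can be written down as a small planning helper. This is only a sketch of the steps described above: `recovery_plan` is a hypothetical function that returns human-readable steps and does not start or stop anything itself.

```python
from typing import List, Sequence


def recovery_plan(all_peers: Sequence[str], misaligned: Sequence[str]) -> List[str]:
    """List the manual steps: stop everything, realign lagging peers one at a time."""
    unknown = [p for p in misaligned if p not in all_peers]
    if unknown:
        raise ValueError(f"unknown peers: {unknown}")
    if not misaligned:
        return []  # nothing to do

    steps = ["stop " + ", ".join(all_peers)]
    running = None
    for peer in misaligned:
        if running is not None:
            # Shut down the peer that was just realigned before the next one.
            steps.append(f"stop {running}")
        steps.append(f"start {peer} alone and wait until it catches up from the orderers")
        running = peer

    # Finally bring back every peer that is not already running.
    others = [p for p in all_peers if p != running]
    if others:
        steps.append("start " + ", ".join(others))
    return steps


if __name__ == "__main__":
    for step in recovery_plan(["peer1", "peer2", "peer3"], ["peer2"]):
        print(step)
```

Each misaligned peer is started in isolation so that, as described above, it has no other peer to receive blocks from and is realigned by the orderers.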


      I ran this same cluster with the same data throughput on 1.4.3 and never had this kind of issue. It looks like a performance regression in recent versions of HLF. Has anyone else noticed this bug, and is a fix planned?


              Assignee: Unassigned
              Reporter: Riccardo Basso (ricba1995)