Fabric / FAB-18371

Peers become misaligned when uploading data and never get realigned due to missing blocks


Details

    • Bug
    • Status: Closed
    • Highest
    • Resolution: Invalid: Environment Issue
    • v2.2.0, v2.3.0, v2.2.1, v2.2.2
    • None
    • None
    • Bring a cluster of 3 peers online, then stress it with chaincode operations and data uploads: one or more follower peers will become misaligned. After that, they can no longer realign.

    Description

      My setup: a cluster of 3 machines running HLF 2.3.0 (also tested with 2.2.0), with 1 orderer, 1 peer, and 1 CA on each machine. All peers belong to the same organization.

      Both the orderers and the peers have a leader, and the leader peer sends blocks to the others via the gossip protocol.

      This same cluster worked correctly on 1.4.3 and never showed this kind of issue.

       

      I have a data uploader which constantly sends data to the blockchain.

      It seems that when the number of operations is high and the load is heavy, the follower peers are not able to receive new blocks from the leader and become misaligned.

       

      On peer2, which is a follower, I see in the log:

      • *19:10:16:367 UTC [gossip.state] func1 -> WARN 543f3 Block [765420] received from gossip wasn't added to payload buffer: Ledger height is at 765398, cannot enqueue block with sequence of 765420*
      • Shortly after, block 765398 is received and the peer starts realigning itself. It has already collected a lot of blocks, so it starts validating them, until:
      • Block 765415 is correctly added:

      *2020-12-09 19:10:22.086 UTC [kvledger] commit -> INFO 54a1e [cbichannel] Committed block [765415] with 9 transaction(s) in 25ms (state_validation=1ms block_and_pvtdata_commit=18ms state_commit=3ms) commitHash=[7f27a8efafa07825bbef57b94d17bff9d45c9675746749802c2d38d7679acfe4]*

      • But shortly after, I start receiving this error again:

      *2020-12-09 19:10:22.169 UTC [gossip.state] func1 -> WARN 54a1f Block [765440] received from gossip wasn't added to payload buffer: Ledger height is at 765416, cannot enqueue block with sequence of 765440*

      • From now on, the ledger height stays at 765416 (the last block committed is 765415) and the peer is not able to enqueue more blocks, even though it is receiving them.
      • For example, about 29 minutes later, the blocks arriving from the leader peer are more than 5,500 ahead of the ledger height:

      *2020-12-09 19:39:30.803 UTC [gossip.state] func1 -> WARN 5d1f3 Block [770947] received from gossip wasn't added to payload buffer: Ledger height is at 765416, cannot enqueue block with sequence of 770947*

      • If I search the logs for block 765416, I cannot find the point where the peer receives it, while blocks 765417 and 765418 are correctly received but cannot be added because of the ledger height (765416 is missing). It is clear that a block is missing somehow and the peer is not able to recover it.
      • In this case only 1 block is missing, but I have also seen cases where more than 1 block was missing.
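
      The warnings quoted above can be summarized mechanically to confirm where a peer is stuck. The following is a minimal sketch, assuming the log lines have exactly the wording shown in this report; the function names `summarize_rejections` and `stuck_height` are hypothetical helpers, not part of Fabric.

```python
import re

# Matches the gossip.state warning exactly as quoted in this report.
REJECTED = re.compile(
    r"Block \[(\d+)\] received from gossip wasn't added to payload buffer: "
    r"Ledger height is at (\d+), cannot enqueue block with sequence of (\d+)"
)

# The three warnings from the report, shortened to the part the pattern needs.
SAMPLE = [
    "[gossip.state] func1 -> WARN 543f3 Block [765420] received from gossip wasn't added to payload buffer: Ledger height is at 765398, cannot enqueue block with sequence of 765420",
    "[gossip.state] func1 -> WARN 54a1f Block [765440] received from gossip wasn't added to payload buffer: Ledger height is at 765416, cannot enqueue block with sequence of 765440",
    "[gossip.state] func1 -> WARN 5d1f3 Block [770947] received from gossip wasn't added to payload buffer: Ledger height is at 765416, cannot enqueue block with sequence of 770947",
]


def summarize_rejections(lines):
    """Map each reported ledger height to (warning count, highest rejected block)."""
    summary = {}
    for line in lines:
        match = REJECTED.search(line)
        if match is None:
            continue  # not a rejection warning
        height = int(match.group(2))
        seq = int(match.group(3))
        count, highest = summary.get(height, (0, seq))
        summary[height] = (count + 1, max(highest, seq))
    return summary


def stuck_height(lines):
    """Return (ledger height, rejected block) from the last warning, or None."""
    last = None
    for line in lines:
        match = REJECTED.search(line)
        if match is not None:
            last = (int(match.group(2)), int(match.group(3)))
    return last


if __name__ == "__main__":
    height, rejected = stuck_height(SAMPLE)
    # The ledger height is also the number of the next block the peer needs.
    print(f"waiting for block {height}, gap to latest rejected block: {rejected - height}")
```

      A ledger height that keeps appearing in warnings while the rejected block number grows is the signature of the stuck state described here.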

      The only manual fix I found was to stop all the peers and restart the misaligned one first, so that it gets aligned by the orderers. After that, if another peer is misaligned, I shut down the one just aligned and do the same with the other. Once they are all aligned, I can bring up all the peers again. Of course this is a manual fix and is not acceptable as a production operation.
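
      The order of operations in this workaround can be written down as a small planning function. This is only a sketch of the sequence described above: the peer names, the heights, and the action labels (`stop`, `start`, `wait_for_catch_up`) are illustrative assumptions, and the function does not run any Fabric or container commands.

```python
def realignment_steps(peer_heights, target_height):
    """Return the ordered manual steps as (action, peer) tuples.

    peer_heights: dict of peer name -> current ledger height (illustrative values).
    target_height: height the peers should reach, e.g. the leader's height.
    """
    all_peers = sorted(peer_heights)
    lagging = [p for p in all_peers if peer_heights[p] < target_height]

    # 1. Stop every peer.
    steps = [("stop", p) for p in all_peers]

    # 2. Bring up each misaligned peer alone, let it catch up, then stop it again.
    for peer in lagging:
        steps.append(("start", peer))
        steps.append(("wait_for_catch_up", peer))
        steps.append(("stop", peer))

    # 3. Once every peer is aligned, start them all again.
    steps.extend(("start", p) for p in all_peers)
    return steps


if __name__ == "__main__":
    # Hypothetical heights: peer2 is stuck at 765416, the others are ahead.
    heights = {"peer1": 770948, "peer2": 765416, "peer3": 770948}
    for action, peer in realignment_steps(heights, target_height=770948):
        print(action, peer)
```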

       

      I had this same cluster and data throughput on 1.4.3 and never had this kind of issue. It looks like a performance regression in recent versions of HLF. Has anyone else noticed this bug, and is a fix planned?


            People

              Assignee: Unassigned
              Reporter: Riccardo Basso (ricba1995)
