Fabric / FAB-14500

RAFT term mismatch for 6 seconds

Details

    • Type: Bug
    • Status: Closed
    • Priority: Medium
    • Resolution: Invalid: Environment Issue
    • Component/s: fabric-orderer
    • Steps to reproduce: not provided

    Description

      Under a stress test (FAB-14350), we observed that OSN1 became leader at the new term 31, but six seconds later OSN2 was still in the old term 30.

      2019-02-26 21:26:47.341 UTC [orderer.consensus.etcdraft] run -> INFO 6305f raft.node: 1 lost leader 3 at term 30 channel=testorgschannel139 node=1
      2019-02-26 21:26:47.383 UTC [orderer.consensus.etcdraft] becomeLeader -> INFO 632f5 1 became leader at term 31 channel=testorgschannel139 node=1
      2019-02-26 21:26:47.383 UTC [orderer.consensus.etcdraft] run -> INFO 632f6 raft.node: 1 elected leader 1 at term 31 channel=testorgschannel139 node=1
      2019-02-26 21:26:53.786 UTC [orderer.consensus.etcdraft] run -> INFO b00ca raft.node: 2 lost leader 3 at term 30 channel=testorgschannel139 node=2

      Jay answered in the Slack channel: every time a new election completes, the term advances by 1. Nodes still in the old term eventually catch up when they receive a message from the new leader, so this log is expected whenever a leader failover happens. The real question, though, is why the leader failover occurred.
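      The catch-up Jay describes follows from Raft's term rule: a node that sees a message with a higher term adopts that term, and a message with a lower term is treated as stale. The sketch below is a hypothetical, heavily simplified Go model of that rule (Node, Message and Step are illustrative names, not etcd/raft's API). It shows that OSN2 stays at term 30 until a term-31 message actually reaches it, or until its own election timer fires.

```go
package main

import "fmt"

// Message is a hypothetical, simplified Raft message carrying only the
// fields needed to illustrate term handling.
type Message struct {
	From     uint64
	Term     uint64
	IsLeader bool // true for heartbeats/appends sent by the current leader
}

// Node is a hypothetical minimal Raft participant, not etcd/raft's type.
type Node struct {
	ID   uint64
	Term uint64
	Lead uint64 // 0 means "no known leader"
}

// Step applies Raft's term rule: a message with a higher term makes the
// receiver adopt that term (full Raft would also step down to follower);
// a message with a lower term is stale and ignored. It reports whether
// the message was accepted.
func (n *Node) Step(m Message) bool {
	if m.Term < n.Term {
		return false // stale: the sender is behind
	}
	if m.Term > n.Term {
		n.Term = m.Term
		n.Lead = 0 // new term; leader unknown until it is heard from
	}
	if m.IsLeader {
		n.Lead = m.From
	}
	return true
}

func main() {
	// OSN2 still believes node 3 leads term 30.
	osn2 := &Node{ID: 2, Term: 30, Lead: 3}
	// A heartbeat from the new leader, node 1, at term 31.
	osn2.Step(Message{From: 1, Term: 31, IsLeader: true})
	fmt.Println(osn2.Term, osn2.Lead) // 31 1
}
```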

      Our question is why it takes 6 seconds to catch up. This would mean that OSN1 and OSN2 were disconnected. What caused the disconnection? Is it related to FAB-14499? Also, since catching up takes 6 seconds, should another leader election have been triggered in the meantime?

      This defect was filed to follow up on Jay's answer and track the issue.

            People

              Assignee: Unassigned
              Reporter: Dongming Hwang (dongming)