  1. Indy Node
  2. INDY-1578

Promotion Workflow - Potential race condition when restarting node after promotion


Details

    • Bug
    • Status: Complete
    • Medium
    • Resolution: Done
    • None
    • 1.6.78
    • test-automation
    • None
    • Unset
    • EV 18.18 Service Pack 2, EV 18.19

    Description

      When running the Demote Replica Chaos experiment defined by INDY-1541, one or more nodes remain "stuck" even after the prescribed restart that follows node promotion.

      ashcherbakov asked that I log this issue and attach the following artifacts:

      1. Logs for demoted/promoted nodes
      2. Logs for at least 1 normal node

      I manually executed the same steps that the Demote Replica Chaos experiment runs, which removed chaostoolkit/chaosindy from the equation, and was unable to reproduce the problem. Perhaps there is a race condition that appears only when the steps are executed programmatically, faster than they can be performed manually?
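
      One way to probe the timing hypothesis would be to run the same steps programmatically with a configurable pause between them, then shrink the pause until the problem appears. The following is only a minimal sketch under stated assumptions: `run_steps` and the step names are hypothetical, the actions are placeholders, and nothing here is taken from chaostoolkit/chaosindy.

```python
import time
from typing import Callable, List, Sequence, Tuple

# A step is a (name, action) pair. The actions used below are placeholders;
# real ones would call whatever tooling promotes nodes and restarts services.
Step = Tuple[str, Callable[[], None]]


def run_steps(
    steps: Sequence[Step],
    pause_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run each step in order, pausing between consecutive steps.

    `sleep` is injectable so the pacing can be checked without waiting.
    Returns the names of the steps in the order they were executed.
    """
    executed: List[str] = []
    for index, (name, action) in enumerate(steps):
        # Pause before every step except the first one.
        if index > 0 and pause_seconds > 0:
            sleep(pause_seconds)
        action()
        executed.append(name)
    return executed


if __name__ == "__main__":
    # Hypothetical step list loosely following the description in this issue.
    log: List[str] = []
    demo_steps: List[Step] = [
        ("promote Node3 and Node4", lambda: log.append("promote")),
        ("restart indy-node on Node3 and Node4", lambda: log.append("restart")),
        ("start indy-node on Node1", lambda: log.append("start")),
    ]
    pauses: List[float] = []
    # Record the requested pauses instead of actually sleeping.
    print(run_steps(demo_steps, pause_seconds=5.0, sleep=pauses.append))
    print(pauses)
```

      If the stuck state appears with no pause but not with a pause of several seconds, that would be consistent with (though not proof of) a race between promotion and restart.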

      Manual steps and their results are outlined in the following comment on INDY-1541:

      https://jira.hyperledger.org/browse/INDY-1541?focusedCommentId=48747&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-48747

      The attached logs are for the following nodes:

      Node1 - Was the master before indy-node was stopped to force a view change. What you see in the screenshot is Node1's state following an indy-node service start on Node1 immediately AFTER promoting Node3 and Node4 (indy-node was restarted on Node3 and Node4 as prescribed by the "node promotion workflow")

      Node4 - The demoted node that has the correct/expected state following promotion and indy-node service restart

      Node3 - The demoted node that has the incorrect/unexpected state following promotion and indy-node service restart

      Node2 - The replica that became the master after indy-node on Node1 was stopped.

      Attachments

        1. Node1.log
          2.68 MB
        2. Node2.log
          2.85 MB
        3. Node3.log
          2.10 MB
        4. Node4.log
          2.27 MB

            People

              keichiri Janko Krstic
              ckochenower Corin Kochenower
              Corin Kochenower, Janko Krstic