Indy Node / INDY-1351

STN not accepting transactions with only one node down


Details

    • Type: Bug
    • Status: Complete
    • Priority: High
    • Resolution: Done
    • Environment: STN running 1.3.57
    • Sprint: EV 18.11 Stability/ViewChange

    Description

      The STN currently has 11 nodes, 7 of which are owned by Sovrin. When one of our seven nodes is brought down, the network fails to post transactions, even though the remaining nodes should be well above the consensus threshold. A further confusing fact: when we connect to the pool using the legacy CLI, it reports connecting to nodes that are not part of this pool but are now part of the live pool. All of these nodes have been demoted on this ledger.
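      As a rough check of the "well above consensus" expectation, the sketch below computes the fault tolerance and commit quorum for an 11-node pool. It assumes the standard BFT bound N >= 3f + 1 (so f = floor((N - 1) / 3)) and a commit quorum of N - f; these are illustrative assumptions, not values read from the STN, and the helper names are hypothetical.

```python
def max_faulty(n: int) -> int:
    """Largest f satisfying n >= 3f + 1 (standard BFT bound; assumption)."""
    return (n - 1) // 3


def commit_quorum(n: int) -> int:
    """Assumed commit quorum: n - f matching votes."""
    return n - max_faulty(n)


def can_commit(pool_size: int, nodes_up: int) -> bool:
    """True if the nodes that are up can still form a commit quorum."""
    return nodes_up >= commit_quorum(pool_size)


if __name__ == "__main__":
    n = 11
    print(f"N={n}: f={max_faulty(n)}, commit quorum={commit_quorum(n)}")
    for down in range(5):
        up = n - down
        print(f"{down} down -> {up} up, consensus possible: {can_commit(n, up)}")
```

      Under these assumptions an 11-node pool tolerates up to three nodes down (f = 3, quorum = 8), so losing a single node leaves 10 nodes, two more than the quorum requires.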

      Validator-info shows the correct pool nodes:

      Validator england is running
      Current time: Friday, May 18, 2018 9:57:57 PM
      Validator DID: DNuLANU7f1QvW1esN3Sv9Eap9j14QuLiPeYzf28Nub4W
      Verification Key: 5PFZeZLWxaH8LxumLkLKq9LbfDNiCNb2xXR2TrGxSbrHeyu6Pfd8Kan
      Node Port: 9701/tcp on 0.0.0.0/0
      Client Port: 9702/tcp on 0.0.0.0/0
      Metrics:
      Uptime: 1 minute, 0 seconds
      Total Config Transactions: 501
      Total Ledger Transactions: 593
      Total Pool Transactions: 35
      Read Transactions/Seconds: 0.00
      Write Transactions/Seconds: 0.00
      Reachable Hosts: 11/11
      RFCU
      VeridiumIDC
      australia
      brazil
      canada
      england
      findentity
      ibm
      korea
      singapore
      virginia
      Unreachable Hosts: 0/11
      Software Versions:
      indy-node: 1.3.57
      sovrin: 1.1.9
      

      The attached CLI log shows erroneous connections to nodes such as TNO. This CLI behavior is not the focus of this ticket; it is only a symptom. The investigation should focus on why a single node being up or down can prevent consensus.
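      One way to check which nodes a client should be connecting to is to replay the pool ledger and keep only the latest state of each node. The sketch below assumes a simplified, hypothetical record format (dicts with `alias` and optional `services` keys, in ledger order) and treats a node whose latest `services` list lacks "VALIDATOR" as demoted. The field layout is illustrative, not the real transaction schema, and the function name `active_validators` is hypothetical.

```python
from typing import Dict, Iterable, List, Mapping, Set


def active_validators(node_txns: Iterable[Mapping]) -> Set[str]:
    """Replay NODE-style records in ledger order and return the aliases
    whose most recent services include "VALIDATOR".

    Hypothetical record shape: {"alias": str, "services": list[str]}.
    A record without "services" (e.g. an address update) leaves the
    node's previous services unchanged; a node first seen without
    "services" is treated as not a validator (assumption).
    """
    services: Dict[str, List[str]] = {}
    for txn in node_txns:
        alias = txn["alias"]
        if "services" in txn:
            services[alias] = list(txn["services"])
        else:
            services.setdefault(alias, [])
    return {alias for alias, svc in services.items() if "VALIDATOR" in svc}


if __name__ == "__main__":
    # Illustrative data only, not taken from the STN ledger.
    ledger = [
        {"alias": "england", "services": ["VALIDATOR"]},
        {"alias": "TNO", "services": ["VALIDATOR"]},
        {"alias": "korea", "services": ["VALIDATOR"]},
        {"alias": "TNO", "services": []},  # demotion
        {"alias": "korea"},                # update without services
    ]
    print(sorted(active_validators(ledger)))  # ['england', 'korea']
```

      Comparing a set computed this way with the hosts listed by validator-info would show whether a client is still treating demoted nodes as pool members.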

      This problem is repeatable on the STN: if any one node is brought down, the pool does not achieve consensus. Korea was down when these logs were collected. When all seven Sovrin-owned nodes are up, the pool is in consensus and the CLI connects and behaves normally.

      Logs for the Sovrin-owned validators are attached. Logs will be requested from our external stewards and attached as they are received.

      Acceptance Criteria

      • Diagnose the issue and create a Plan of Attack, including associated stories and epics that can be scheduled.
      • If the problem proves to be a configuration issue, we can solve it immediately.

      Attachments

        1. australia.tgz (1.81 MB)
        2. brazil.tgz (2.11 MB)
        3. canada.tgz (1.88 MB)
        4. cli.tgz (136 kB)
        5. england.tgz (1.93 MB)
        6. korea.tgz (1.99 MB)
        7. singapore.tgz (1.90 MB)
        8. virginia.tgz (2.08 MB)


          People

            mgbailey Mike Bailey
            Alexander Shcherbakov, Mike Bailey, Richard Esplin, Sergey Khoroshavin