Uploaded image for project: 'Indy Node'
  1. Indy Node
  2. INDY-1565

Improve throughput calculation to reduce a chance of false positive View Changes

    Details

    • Type: Task
    • Status: Complete
    • Priority: High
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 1.6.73
    • Component/s: None
    • Labels:
      None

      Description

      During analysis of INDY-1556 and INDY-1562, we discovered that there can be cases where View Change may happen unexpectedly.

      One of the problems is that we calculate the throughput based on requests number, while the trend is more for 3PC batches count. If we have 3PC batches with a big difference in requests count, we may see false positives because of spikes in the trend.

      As of now, we use Exponential moving average for throughput calculation with constant window and constant alpha.

       

      Acceptance criteria:

      • Write tests that there is no view change detected if we got the same amount of requests on all instances, but with a different distribution of ordering time:
        • All instances ordered equally with some medium throughput; one instance didn't order for a some period, and then ordered all stashed requests in 1 huge 3PC batch
        • All instances ordered equally with some medium throughput; one instance didn't order for a some period, and then ordered all stashed requests in multiple huge 3PC batches
        • All instances didn't order for a long period of time, and then a small load started, and some backups started ordering earlier than master
        • All instances didn't order for a long period of time, and then a huge load started, and some backups started ordering earlier than master
        • All instances ordered equally with some medium throughput; and then all instances started ordering with a heavier load, but some non-master did it a bit earlier
        • All instances ordered equally with some medium throughput; and then load stopped, leading to no more requests ordered
        • Other test cases
      • Provide a theory for modification of the current algorithm.
        Some ideas:
        • Fall back to initial-window mode if the current throughput value is almost zero.
        • Do not trigger view change immediately, but wait for a sequence of trigger events (for example, backups noticed that master throughput degraded 5 times in a row).
        • Use dynamic window and/or dynamic alpha for Exponential moving average
          • consider the window doesn't contain zero intervals and extended till we have some non-zero values
        • Use different moving average algorithm (not exponential) with a proper window
          • for example, not exponential, but quadratic trend at the beginning of the window
          • explore moving average with momentum
          • consider digital filter synthesis (IIR or FIR) to obtain needed response and avoid ringing
        • Include information about 3PC batch size into algorithm
        • use moving average for 3PC batches, not requests. Consider a separate metric for requests
          • beware that using moving average just for 3PC batches will allow malicious primary to drop requests unnoticed
      • Implement the modifications

       

        Attachments

          Activity

            People

            • Assignee:
              VladimirWork Vladimir Shishkin
              Reporter:
              ashcherbakov Alexander Shcherbakov
              Watchers:
              Alexander Shcherbakov, Sergey Shilov, Vladimir Shishkin
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: