Indy Node / INDY-1595

Node can't catch up a large ledger

    Details

    • Type: Bug
    • Status: Complete
    • Priority: High
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 1.6.73
    • Component/s: None
    • Labels:
      None

      Description

      Environment:
      indy-node 1.5.551

      Steps to Reproduce:
      1. Restore the pool from a backup with a large ledger (1,340,000 txns), keeping one node clear (Node24 in the attached logs).
      2. Start the pool, excluding the clear node.
      3. Initialize the clear node and start it.
      => The domain ledger of the clear node contains 30 txns.
      4. Wait several hours.

      Actual Results:
      The clear node still has 30 txns in its domain ledger, and the following messages appear in the logs of the remaining nodes:

      2018-08-15 09:29:56,416|INFO|ledger_manager.py|Node1 received catchup request: CATCHUP_REQ{'seqNoStart': 1061257, 'ledgerId': 1, 'catchupTill': 1340508, 'seqNoEnd': 1117110} from Node24
      2018-08-15 09:29:56,416|INFO|ledger_manager.py|Node1 generating consistency proof: 1117110 from 1340508
      2018-08-15 09:30:12,778|WARNING|prepare_batch.py|Too many split steps were done 9. Batches were not created
      2018-08-15 09:30:12,779|WARNING|prepare_batch.py|Too many split steps were done 9. Batches were not created
      ...... several hundred of the same messages ......
      2018-08-15 09:30:12,812|WARNING|prepare_batch.py|Too many split steps were done 9. Batches were not created
      2018-08-15 09:30:12,812|WARNING|prepare_batch.py|Too many split steps were done 9. Batches were not created
      2018-08-15 09:30:12,812|WARNING|prepare_batch.py|Too many split steps were done 9. Batches were not created
      2018-08-15 09:30:12,812|ERROR|batched.py|Node1 cannot create batch(es) for Node24
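
      The warnings come from `prepare_batch.py` while Node1 prepares its reply to a CATCHUP_REQ covering seqNo 1061257 to 1117110 (55,854 txns). One plausible reading is that outgoing messages are split into batches by repeated halving, with a cap on the number of split steps, and that this reply was too large to fit within that cap. The report does not confirm this reading. The sketch below illustrates only the general idea. `split_into_batches`, `batch_size`, `MAX_BATCH_BYTES` and `MAX_SPLIT_STEPS` are hypothetical names and toy values, not plenum's actual code or configuration.

```python
from typing import List, Optional

# Hypothetical toy limits for illustration; NOT plenum's real configuration.
MAX_BATCH_BYTES = 1000
MAX_SPLIT_STEPS = 8


def batch_size(msgs: List[bytes]) -> int:
    # Simplification: a batch costs the sum of its serialized messages.
    return sum(len(m) for m in msgs)


def split_into_batches(msgs: List[bytes], step: int = 0) -> Optional[List[List[bytes]]]:
    """Recursively halve `msgs` until every batch fits MAX_BATCH_BYTES.

    Returns None ("batches were not created") if a piece still does not
    fit after MAX_SPLIT_STEPS halvings, or if one message alone is too big.
    """
    if batch_size(msgs) <= MAX_BATCH_BYTES:
        return [msgs]
    if len(msgs) == 1:
        # A single oversized message can never be split further.
        return None
    if step >= MAX_SPLIT_STEPS:
        # Step limit reached while this piece is still too large.
        return None
    mid = len(msgs) // 2
    left = split_into_batches(msgs[:mid], step + 1)
    right = split_into_batches(msgs[mid:], step + 1)
    if left is None or right is None:
        return None
    return left + right


if __name__ == "__main__":
    ok = split_into_batches([b"x" * 100] * 256)
    print(len(ok))  # 32
    print(split_into_batches([b"x" * 100] * 10_000))  # None
```

      With these toy limits, 256 messages of 100 bytes split into 32 batches of 8 messages each. By contrast, 10,000 such messages still exceed the limit after 8 halvings, so the call yields `None`. This is analogous to the "Batches were not created" warning above.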

      Expected Results:
      The clear node should complete catch-up.

      Additional Information:

      • The pool restore was not performed entirely correctly: Nodes 15, 18 and 25 were not started together with the rest. Nodes 15 and 18 were started later (after the steps to reproduce), and Node 25 was not started at all.
      • The same behavior occurs with 730K txns in the ledger, even when all nodes were restored correctly.

      Logs: s3://qanodelogs/indy-1595
      To get the logs, run the following command on the log processor machine:
      aws s3 cp --recursive s3://qanodelogs/indy-1595/ /home/ev/logs/indy-1595/


              People

              • Assignee:
                Olga Zheregelya (ozheregelya)
                Reporter:
                Olga Zheregelya (ozheregelya)
                Watchers:
                Andrew Nikitin, Olga Zheregelya
              • Votes:
                0
                Watchers:
                2
