Details
- Bug
- Status: Complete
- Medium
- Resolution: Done
- Sprint 18.03 Stability, DKMS, Sprint 18.04
Description
Ordering of 3PC-batches on a backup replica may stop after a catch-up. This happens because the backup replica tries to adjust last_ordered_3pc to the lowest probable prepared certificate (so that it can order the next batches) only when it receives and stashes an out-of-order PREPREPARE. Normally, however, the PREPREPARE is received before the PREPAREs that together with it form the prepared certificate, so no prepared certificate exists yet when the adjustment is attempted.
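The stall can be illustrated with a toy model. This is a minimal sketch and not the actual replica code: `ToyBackupReplica`, `QUORUM`, the method names, and the sequence numbers are all hypothetical, and view numbers are left out for brevity.

```python
from collections import defaultdict

QUORUM = 2  # hypothetical number of PREPAREs needed for a prepared certificate


class ToyBackupReplica:
    """Toy model: batches are ordered strictly in pp_seq_no order."""

    def __init__(self, last_ordered):
        self.last_ordered = last_ordered      # pp_seq_no of the last ordered batch
        self.stashed_preprepares = set()      # out-of-order PREPREPAREs
        self.prepares = defaultdict(set)      # pp_seq_no -> senders of PREPAREs

    def lowest_prepared_certificate(self):
        """Lowest stashed pp_seq_no that also has a quorum of PREPAREs."""
        certified = [seq for seq in self.stashed_preprepares
                     if len(self.prepares[seq]) >= QUORUM]
        return min(certified) if certified else None

    def on_preprepare(self, pp_seq_no):
        if pp_seq_no == self.last_ordered + 1:
            return "processed"
        # Out of order: stash it and, only at this point, try to adjust.
        self.stashed_preprepares.add(pp_seq_no)
        lowest = self.lowest_prepared_certificate()
        if lowest is not None:
            self.last_ordered = lowest - 1
            return "adjusted"
        return "stashed"

    def on_prepare(self, pp_seq_no, sender):
        # No adjustment is attempted here, which is the gap described above.
        self.prepares[pp_seq_no].add(sender)


# Assumed scenario: last_ordered is stale after the catch-up.
replica = ToyBackupReplica(last_ordered=0)
first = replica.on_preprepare(10)   # PREPREPARE arrives before any PREPARE
replica.on_prepare(10, "Node2")
replica.on_prepare(10, "Node3")
```

In this sketch `first` is `"stashed"` and `replica.last_ordered` stays at 0 even after the quorum of PREPAREs arrives, because nothing re-runs the adjustment once the certificate is complete.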
In practice, however, the backup replica usually does manage to adjust last_ordered_3pc. When, after a catch-up, the replica receives and stashes the next PREPREPARE, which turns out to be out of order, it requests the missing PREPREPAREs and PREPAREs. This step appears to be incorrect: the backup replica will discard most (if not all) of the earlier 3PC-messages as lying outside the watermarks (which were updated after the catch-up using the master's last ordered 3PC-key). However, the range of messages requested as missing includes the pp_seq_no of the out-of-order PREPREPARE, so this PREPREPARE is received again from other nodes in MESSAGE_RESPONSEs. Because the last two MESSAGE_RESPONSEs carrying this PREPREPARE usually arrive after a quorum of PREPAREs has been received in the normal way, the backup replica adjusts its last_ordered_3pc when it receives the former and commits the prepared certificate when it receives the latter. After that, the backup replica is able to process new batches in the normal way.
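The interaction between the requested range and the watermarks can be sketched as follows. The function names, the watermark rule (`low < seq_no <= high`), the window size, and the sequence numbers are assumptions made for illustration only.

```python
def requested_range_current(last_ordered, pp_seq_no):
    """Behaviour described above: the range includes pp_seq_no itself."""
    return list(range(last_ordered + 1, pp_seq_no + 1))


def within_watermarks(seq_no, low, high):
    """Hypothetical watermark check: accept only low < seq_no <= high."""
    return low < seq_no <= high


# Assumed numbers: the backup last ordered batch 3, the catch-up moved the
# low watermark to the master's last ordered pp_seq_no (9), and the next
# PREPREPARE carries pp_seq_no 10.
LOW, HIGH = 9, 309
requested = requested_range_current(last_ordered=3, pp_seq_no=10)
accepted = [s for s in requested if within_watermarks(s, LOW, HIGH)]
discarded = [s for s in requested if not within_watermarks(s, LOW, HIGH)]
```

With these numbers everything from 4 to 9 is requested only to be discarded, and the single accepted item is the PREPREPARE with pp_seq_no 10 that the replica already holds. Its re-delivery is what happens to trigger the adjustment.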
Within the scope of this ticket, this logic must be reworked. The range of messages requested as missing must never include the pp_seq_no of the out-of-order PREPREPARE. When a backup replica receives the next PREPREPARE after a catch-up, it must not request the previous messages. At the same time, a backup replica must set up its state after a catch-up so that it is able to order the next 3PC-batches.
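The three requirements above could look roughly like this in a toy model. It is only one possible reading and not necessarily the approach adopted for this ticket: `ToyBackupState`, its flag, and the choice to move last_ordered to just below the first PREPREPARE after a catch-up are all assumptions.

```python
def requested_range_reworked(last_ordered, pp_seq_no):
    """Missing range that excludes the out-of-order pp_seq_no itself."""
    return list(range(last_ordered + 1, pp_seq_no))


class ToyBackupState:
    """Hypothetical backup replica state; view numbers are omitted."""

    def __init__(self, last_ordered):
        self.last_ordered = last_ordered
        self.awaiting_first_preprepare = False

    def on_catchup_finished(self):
        self.awaiting_first_preprepare = True

    def messages_to_request(self, pp_seq_no):
        """Return the pp_seq_nos to request when a PREPREPARE arrives."""
        after_catchup = self.awaiting_first_preprepare
        self.awaiting_first_preprepare = False
        if pp_seq_no <= self.last_ordered + 1:
            return []  # in order (or already ordered): nothing is missing
        if after_catchup:
            # Assumed state set-up: treat earlier batches as covered by the
            # catch-up, so this batch can be ordered without any requests.
            self.last_ordered = pp_seq_no - 1
            return []
        return requested_range_reworked(self.last_ordered, pp_seq_no)


state = ToyBackupState(last_ordered=3)
state.on_catchup_finished()
after_catchup_request = state.messages_to_request(10)
state.last_ordered = 10                          # pretend batch 10 was ordered
later_request = state.messages_to_request(13)    # a genuine gap later on
```

Here the first PREPREPARE after the catch-up triggers no requests, while a later gap requests only the batches strictly between last_ordered and the received pp_seq_no.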
Issue Links
- is blocked by
- INDY-1301 [Design] Design catch-up procedure divided into separate sub-procedures for each replica
- New