Fabric / FAB-15613

Single bi-directional connection management in gossip/comm



    • Type: Task
    • Status: To Do
    • Priority: Medium
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Future
    • Component/s: fabric-gossip
    • Labels: None


      Gossip communication is designed to maintain a single bi-directional connection between each pair of peers.

      When both peers of a pair initiate a connection to each other at the same time, two connections are created simultaneously and a race condition may occur. Currently no test in gossip/comm reveals this, although tests in higher layers do, for example TestLeaderYield(). TestBasic() in gossip/comm can be used to reveal the race condition once the sleep between the sends is removed.

      On failure, the following sequence of events occurs (the following diagram may help). Comm1 and comm2 open connections to each other simultaneously. Comm1 calls getConnection(), where it creates a connection to comm2 (the 1st connection). Because no connection exists yet (comm1 receives the connection from comm2 only later), comm1 keeps it. When comm1 then receives the connection from comm2, it enters onConnected(), closes the 1st connection (the one it opened) and keeps the connection from comm2 (the 2nd connection).
      From comm2's perspective, it enters getConnection() and, between the first and second checks for an existing connection, receives the connection from comm1 (the 1st connection). Having passed the first check, it creates a connection to comm1 (the 2nd connection). But because it has already received the connection from comm1, it fails the second check and closes the connection it just opened (the 2nd connection).
      In conclusion, comm1 closes the connection it opened (the 1st) and comm2 closes the connection it opened (the 2nd), leaving no open connection, so any message sent will fail.

      Alternatively, comm1 may fail earlier, before it reaches onConnected(), in the same way as comm2. Namely, comm1 enters getConnection() and receives the connection from comm2 (the 2nd connection) between the two checks for an existing connection. It passes the first check, but because of the incoming connection it fails the second check and closes the connection it created (the 1st connection).
      The outcome is the same: no open connection remains.
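
      The two interleavings described above can be replayed deterministically with a toy model. This is only a sketch under stated assumptions, not the gossip/comm code: the peer and conn types, the betweenChecks hook and the scenario functions are hypothetical names introduced for illustration. Only the check / create / re-check shape of getConnection() and the "close the old connection, keep the incoming one" behavior of onConnected() are taken from the description. Concurrent steps are serialized so that the outcome is reproducible.

```go
package main

import "fmt"

// conn is a toy connection shared by both ends: closing it on one side
// makes it unusable for the other side as well.
type conn struct {
	name   string
	closed bool
}

func (c *conn) close() { c.closed = true }

// peer keeps at most one connection to the remote peer (hypothetical model).
type peer struct {
	current *conn
}

// onConnected models the handling of an incoming connection: the previously
// kept connection, if any, is closed and the incoming one is kept instead.
func (p *peer) onConnected(incoming *conn) {
	if p.current != nil {
		p.current.close()
	}
	p.current = incoming
}

// getConnection models the check / create / re-check pattern. betweenChecks
// is a hypothetical hook that injects an event into the window between the
// first and the second check; pass nil when nothing happens in that window.
func (p *peer) getConnection(dial func() *conn, betweenChecks func()) *conn {
	if p.current != nil { // first check
		return p.current
	}
	if betweenChecks != nil {
		betweenChecks()
	}
	created := dial()
	if p.current != nil { // second check: a connection arrived meanwhile
		created.close()
		return p.current
	}
	p.current = created
	return created
}

// scenarioA: comm1 finishes getConnection() undisturbed and only later
// replaces its connection in onConnected(); comm2 loses the race in
// getConnection().
func scenarioA() (first, second *conn) {
	comm1, comm2 := &peer{}, &peer{}
	first, second = &conn{name: "1st"}, &conn{name: "2nd"}
	comm1.getConnection(func() *conn { return first }, nil)
	comm2.getConnection(
		func() *conn { return second },
		func() { comm2.onConnected(first) }, // 1st arrives between the checks
	)
	comm1.onConnected(second) // comm1 closes the 1st, keeps the 2nd
	return
}

// scenarioB: both peers receive the other's connection between their checks,
// so each one closes the connection it created itself.
func scenarioB() (first, second *conn) {
	comm1, comm2 := &peer{}, &peer{}
	first, second = &conn{name: "1st"}, &conn{name: "2nd"}
	comm1.getConnection(
		func() *conn { return first },
		func() { comm1.onConnected(second) },
	)
	comm2.getConnection(
		func() *conn { return second },
		func() { comm2.onConnected(first) },
	)
	return
}

func main() {
	for label, run := range map[string]func() (*conn, *conn){
		"A": scenarioA,
		"B": scenarioB,
	} {
		first, second := run()
		fmt.Printf("scenario %s: %s closed=%v, %s closed=%v\n",
			label, first.name, first.closed, second.name, second.closed)
	}
}
```

      In both scenarios of this model each peer ends up holding a reference to a connection that the other side has already closed, which matches the conclusion above that any subsequent send fails.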






              Assignee: Unassigned
              Reporter: Ronen Schaffer (ronenschafferibm)