Chaincode pagination of query results

    Details

    • Type: Epic
    • Status: Closed (View Workflow)
    • Priority: High
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: v1.3.0
    • Component/s: fabric-peer
    • Labels:
      None
    • Epic Name:
      Chaincode pagination of query results
    • SDK Impact:
      No
    • Design Status:
      Not Required
    • Function Test Status:
      Done
    • Documentation Status:
      Done
    • Sample/Tutorial:
      Done
    • Current Status:
      Done.
    • Test Plan:
      [Chris Elder to add integration test plan, collaborate with Adnan on system test plan]

      SVT Prime for reviewing testplan: Adnan

      Description

      Currently, CouchDB queries are limited to a configurable maximum number of results (documents), set by the peer's queryLimit config option (default 10000). The shim and peer support pagination, but the CouchDB query iterators do not.

      Users often want to add pagination to CouchDB queries, since it is not viable to retrieve large numbers of documents from CouchDB in a single request. However, user-controlled pagination carries risk: users could pass large values for limit (causing large payloads from CouchDB to the peer) or large values for skip (causing CouchDB to do a large amount of work iterating through results). Either could stress the environment, and a malicious user could exploit this as part of a denial-of-service attack. Fabric 1.0 therefore does not support user-provided values of limit and skip.

      As a workaround in current releases, queryLimit can be set to the preferred page size. The client can then drive the paging by issuing subsequent queries, as follows:
      For range queries, the start key of the next query is the last key of the prior result set (disregard the first item that comes back, since it repeats that key).
      For rich JSON queries, sort on some field and use that field in the filter criteria of the subsequent query.
       
      It would be preferable, however, for the client to be able to specify the page size and for paging to be handled more implicitly. This task will add safe implicit pagination to CouchDB query iterators using the favored CouchDB range query limit/startkey pagination approach described here: http://docs.couchdb.org/en/2.0.0/couchapp/views/pagination.html#paging-alternate-method. A reasonable limit from the peer config will be used (e.g. a default of 1000). Chaincode will be able to iterate through as many results as desired, with the implicit pagination retrieving the next set of rows as needed. This approach enforces safe pagination and abstracts it away from the chaincode, so that chaincode authors and client authors do not have to think about it. It will also allow the max queryLimit to be raised to a much larger value; for example, a total limit default of 1000000 may be put in place, which implicitly allows up to 1000 queries of 1000 documents each by default.
      A user- or chaincode-provided limit will be honored if it is smaller than the peer-configured limit (default 1000). This lets applications keep their own paging logic if they prefer to manage it themselves. For example, the chaincode could implement the limit/startkey paging approach for 100 documents at a time, while Fabric still protects against large limit and large skip values.
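
The way the two limits interact can be illustrated with a short sketch. The constant and function names are hypothetical, and the values are the example defaults quoted above. Treating a missing or non-positive request as "use the peer limit" is an assumption of this sketch, not something the description specifies.

```python
PEER_PAGE_LIMIT = 1000         # per-request cap from the peer config (example default from the text)
TOTAL_QUERY_LIMIT = 1_000_000  # example total cap across all implicit pages


def effective_page_size(requested=None, peer_limit=PEER_PAGE_LIMIT):
    """Honor a caller-provided limit only when it is smaller than the peer-configured limit."""
    if requested is None or requested <= 0:  # assumption: fall back to the peer limit
        return peer_limit
    return min(requested, peer_limit)


def max_round_trips(total_limit=TOTAL_QUERY_LIMIT, page_size=PEER_PAGE_LIMIT):
    """Upper bound on database requests needed to reach the total limit (ceiling division)."""
    return -(-total_limit // page_size)
```

With these example values, a request for 100 documents per page is honored as is, a request for 50000 is clamped to 1000, and the total limit corresponds to at most 1000 requests of 1000 documents each.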

       

      The implementation approach differs for range queries (_all_docs API) versus rich/selector queries (_find API):
      1. startkey/limit approach for range queries (Approach 3 below)
      2. storing all the docids in the iterator (in memory) for rich (selector) queries (Approach 2 below)

      This is implemented only for the currently supported query types, Mango queries and range queries. CouchDB map/reduce view queries are not yet supported by Fabric.

       

      ANALYSIS OF OPTIONS

      Alternate options considered (vs cost):
      Issue: The shim fetches results in batches (of 100) and returns them to the chaincode/caller. On the peer, the handler maintains a QueryIterator for each txid (query uuid), which is used to fetch the next set of results from the ledger. For Couch, internally, all queries are limited to N results and do no paging.

      Proposal: Statedbcouch should fetch results in batches from Couch and return them to the handler. Not every call to statedb.ResultsIterator.Next() will translate into a CouchDB call. The batch size could differ from the one used by the shim (but it is best if both match).
      Revert the shim batch size to 100 (this should not be configurable).
      Retain the config parameter for the system-wide query limit on Couch (or on all DBs?). This will throttle queries that hit Couch, but it will remain an overall limit on any Couch query/view and can be a large number, around 1 million. (Ideally this limit would be enforced on the CouchDB side, but since it is not, maybe it does not matter to Couch, so should we do away with it?)

      Approach 1: Use Skip and Limit
      For Mango queries:
      Record the querystring/indexname in the QueryIterator. Manipulate skip and limit and repeat the request (_find or view) every time statecouchdb is invoked.
      Downside: This is costly, as skip is applied to the result set after the query is executed. Every time we skip S and limit to the next L results, we re-execute the full query (on up to S+L records).
      For range queries: Without storing a new startkey for the Next() call, run the same query with skip and limit. Only the skip and limit counts are stored.
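
The cost described above can be made concrete with a small counting model. This is only an illustration of the S+L argument, assuming every page request walks all skipped rows before returning its own; the function name is hypothetical and no database is involved.

```python
def rows_scanned_with_skip(total_rows, page_size):
    """Total rows walked when every page re-runs the query with skip=S and limit=L."""
    scanned = 0
    skip = 0
    while skip < total_rows:
        # Each request touches the S skipped rows plus up to L returned rows.
        scanned += min(skip + page_size, total_rows)
        skip += page_size
    return scanned
```

Under this model, paging through 10000 matching rows in pages of 1000 walks 1000 + 2000 + ... + 10000 = 55000 rows in total, compared with 10000 for a single pass, and the gap grows quadratically with the number of pages.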

      Approach 2: Cache the keys (or results) of all documents matching Query
      On a call to query
        Perform the query/view as is
        For mango queries
          Store the "fields" to be returned as part of QueryIterator
          Store the [N:] docids returned from query into QueryIterator
        For view queries
          Store the [N:] (id,key,value) returned from view into QueryIterator. Since these are mostly aggregates the value should not be large (to store in memory)

      On call to Next()
        For mango queries
          Use rich selector query with $contains operator for first N docids and select only required "fields"
          Run the query and return results
          Slice the results in QueryIterator by [N:]
        For map/reduce queries
          Return N results from QueryIterator
          Slice the results in QueryIterator by [N:]

      One advantage of this approach is that we can support RowCount() (not important, as this is not exposed).
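
A minimal in-memory sketch of Approach 2 for rich queries follows. The class and helper names are hypothetical, the "database" is a plain dict, and each batch lookup by id stands in for the follow-up selector query described above. It is meant only to show the cache-then-slice bookkeeping, not how statecouchdb is implemented.

```python
def match_ids(store, predicate):
    """Run the 'query' once and keep only the ids of matching documents, in id order."""
    return sorted(doc_id for doc_id, doc in store.items() if predicate(doc))


class CachedIdIterator:
    """Holds the matching docids in memory and fetches documents N ids at a time."""

    def __init__(self, store, matching_ids, fields, batch_size):
        self._store = store
        self._pending_ids = list(matching_ids)
        self._fields = fields          # the "fields" to project on every batch
        self._batch_size = batch_size
        self.lookups = 0               # number of simulated follow-up requests

    def remaining(self):
        """Ids not yet fetched; available up front because all ids are cached."""
        return len(self._pending_ids)

    def next_batch(self):
        batch_ids = self._pending_ids[:self._batch_size]
        self._pending_ids = self._pending_ids[self._batch_size:]  # slice by [N:]
        if not batch_ids:
            return []
        self.lookups += 1
        return [
            {field: self._store[doc_id].get(field) for field in self._fields}
            for doc_id in batch_ids
        ]
```

Because the full id list is known after the first query, a row count is available immediately, at the price of holding every matching id in memory for the lifetime of the iterator.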

      Approach 3: Use startkey, startkey_docid and limit
      For Mango queries:
      Sort keys need not be unique, and they need not be returned in "fields". For sorting, CouchDB requires that "There is an index already defined, with all the sort fields in the same order", so we cannot add "_id" to the sort keys and expect the results to also be sorted by _id in the same direction.
      Even if records with the same sort keys were sorted by docid (and in the same direction), this would require extensive massaging of the querystring during Next(), e.g. negating all results read so far and then skipping the first few records.

      This can ONLY be used for map/reduce queries and range queries.
      On a call to query
        Store the viewname in the query iterator.
        Fetch N (limit=) records, where N is the batch size of Couch.
        Store the last record's key as startkey and its docid as startkey_docid for the next invocation of Next()

      On a call to Next()
        Use the viewname from the query iterator
        Add an additional constraint
          Map/Reduce query -> startkey = <startkey>&startkey_docid = <startkey_docid>
        Fetch N+1 (limit=) records, where N is the batch size of Couch. Do not set skip. If a system-wide limit is specified, then the limit is min(N+1, system_limit-R), where R is the number of records returned by the iterator so far.
        Store the last record's key as startkey and its docid as startkey_docid for the next invocation of Next()

      The same approach can be used for range queries, where we store the last record's key as startkey for the next invocation of Next() and avoid using skip.

      In all cases, if we get fewer than limit records, we can mark the iterator as complete.
      On statedb.ResultsIterator.Close() (or when the shim sends COMPLETED and cleans up), all resources are released.
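
The startkey/limit scheme for range queries can be sketched as the following in-memory iterator. All names are hypothetical and `_fetch` stands in for one database request. One deliberate assumption: the sketch asks for the batch size plus one overlapping row on every follow-up request and counts only the rows actually returned against the total limit, which is a slight variation on the min(N+1, system_limit-R) wording above.

```python
from bisect import bisect_left


class StartKeyIterator:
    """Pages through a sorted key space using startkey + limit and never skip."""

    def __init__(self, store, start_key, end_key, batch_size, total_limit):
        self._store = store
        self._keys = sorted(store)
        self._start_key = start_key
        self._end_key = end_key
        self._batch_size = batch_size
        self._total_limit = total_limit
        self._returned = 0     # R: rows handed to the caller so far
        self._first = True
        self.done = False
        self.requests = 0      # number of simulated database requests

    def _fetch(self, start_key, limit):
        """One simulated request: start_key <= key < end_key, at most `limit` rows."""
        self.requests += 1
        rows = []
        for key in self._keys[bisect_left(self._keys, start_key):]:
            if key >= self._end_key or len(rows) == limit:
                break
            rows.append((key, self._store[key]))
        return rows

    def next_batch(self):
        if self.done:
            return []
        want = min(self._batch_size, self._total_limit - self._returned)
        if want <= 0:
            self.done = True
            return []
        overlap = 0 if self._first else 1  # follow-up pages start on the last key seen
        rows = self._fetch(self._start_key, want + overlap)
        if len(rows) < want + overlap:     # short page: nothing left after this one
            self.done = True
        rows = rows[overlap:]              # drop the repeated row
        self._first = False
        if rows:
            self._start_key = rows[-1][0]  # remember where the next page starts
            self._returned += len(rows)
        if self._returned >= self._total_limit:
            self.done = True
        return rows

    def close(self):
        """Release cached state, mirroring the Close()/COMPLETED cleanup step."""
        self.done = True
        self._keys = []
```

A caller would loop with `while not it.done: rows = it.next_batch()`. For 10 keys and a batch size of 4 this model issues three requests, each starting at the last key already seen rather than skipping over earlier rows.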

      Refer: http://docs.couchdb.org/en/latest/couchapp/views/pagination.html

              People

              Assignee:
              chris.elder Chris Elder
              Reporter:
              balaji.viswanathan Balaji Viswanathan
              Votes:
              7
              Watchers:
              21
