CouchDB rich queries require indexes to perform well. Additionally, queries with a sort require an index. Fabric supports CouchDB declarative JSON queries in chaincode as defined at http://docs.couchdb.org/en/2.0.0/api/database/find.html. This work item proposes to add support for indexes as defined on the same CouchDB doc page, as well as a management lifecycle to create indexes.
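To make the relationship between an index and a sorted query concrete, here is a minimal sketch. The index and query bodies follow the JSON format of CouchDB's _index and _find endpoints; the field names, design document name, and index name are hypothetical, and sort_is_covered is only a simplified local sanity check, not CouchDB's actual index selection logic.

```python
import json

# Index definition in the JSON format accepted by CouchDB's /{db}/_index endpoint.
# "docType", "owner", "indexOwnerDoc" and "indexOwner" are hypothetical names.
index_definition = {
    "index": {"fields": ["docType", "owner"]},
    "ddoc": "indexOwnerDoc",
    "name": "indexOwner",
    "type": "json",
}

# Declarative JSON query with a sort, as chaincode might issue it.
query = {
    "selector": {"docType": "marble", "owner": {"$gt": None}},
    "sort": [{"docType": "asc"}, {"owner": "asc"}],
    "use_index": ["_design/indexOwnerDoc", "indexOwner"],
}


def field_name(entry):
    """Sort and index entries are either "field" or {"field": "asc"}."""
    return entry if isinstance(entry, str) else next(iter(entry))


def sort_is_covered(query, index_definition):
    """Simplified check: the sort fields match the leading fields of the index."""
    indexed = [field_name(f) for f in index_definition["index"]["fields"]]
    wanted = [field_name(s) for s in query.get("sort", [])]
    return wanted == indexed[: len(wanted)]


if __name__ == "__main__":
    print(json.dumps(index_definition, indent=2))
    print(sort_is_covered(query, index_definition))
```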
CouchDB queries are defined in chaincode and executed from chaincode, therefore the index management lifecycle should follow the chaincode management lifecycle. Indexes are authored alongside queries, and therefore indexes will be defined alongside chaincode and packaged in the chaincode installation package. Indexes will be deployed on a peer's statedb when the peer processes the block that includes the chaincode instantiation transaction, as this is when the associated chaincode database resources are set up. In the future, a chaincode 'define' step may be added prior to instantiation, and in that case the indexes would be deployed to statedb upon processing the define transaction on each peer.
Indexes that are required to support chaincode queries will be defined in the chaincode installation package. Default indexes are defined in the chaincode source metadata directory (e.g. one META-INF/statedb/couchdb/indexes/<index_name>.json file per required index) and are packaged into the chaincode installation package along with the chaincode.
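As an illustration of the one-file-per-index layout, the following sketch collects index definitions from the metadata directory of a chaincode source tree. The function name and the choice to key indexes by file name are assumptions made for the example; the actual packaging code is not described in this work item.

```python
import json
from pathlib import Path

# Relative location of index files, taken from the example path in the proposal.
INDEX_DIR = Path("META-INF") / "statedb" / "couchdb" / "indexes"


def load_index_definitions(chaincode_root):
    """Return {index_name: definition} for every *.json file in the index directory.

    Hypothetical helper: the index name is taken from the file name.
    """
    index_dir = Path(chaincode_root) / INDEX_DIR
    definitions = {}
    if not index_dir.is_dir():
        return definitions  # chaincode ships no indexes
    for path in sorted(index_dir.glob("*.json")):
        with path.open(encoding="utf-8") as f:
            # json.load raises ValueError on malformed JSON, which would
            # surface a bad index file at packaging time.
            definitions[path.stem] = json.load(f)
    return definitions
```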
The automatic chaincode index deploy behavior will be driven by the core.yaml property peer.ledger.state.couchDBConfig.autoDeployChaincodeIndexes, set to true by default, with the following behavior:
- If chaincode is already installed on peer, when chaincode gets instantiated (or upgraded) for one of the peer’s channels, the channel_chaincode database gets created and indexes from the install package are automatically deployed upon processing the chaincode instantiate/upgrade transaction.
- If chaincode is not yet installed on peer, the chaincode instantiate step would create the channel_chaincode database to support the peer committer role, but would not create indexes, since the peer cannot yet execute chaincode (and since the chaincode install package including index definitions is not available). If the chaincode is installed at a later time, meaning the peer becomes a potential endorser, the install step would iterate through the peer’s CouchDB channel_chaincode databases and automatically deploy the default chaincode indexes that are included in the chaincode installation package. Some synchronization is required to ensure that any channels that get configured for the chaincode while the install is in progress also get the indexes. During the install step the indexes should get created before the chaincode gets added to the peer's file system, so that in the case of any problems the install step could be run again (index creation is idempotent in CouchDB, that is, an index can safely be created again on a retry).
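The two triggers described above can be sketched as a toy in-memory model. All class, method, and attribute names are hypothetical, databases are plain dictionaries rather than CouchDB, and the synchronization between concurrent install and instantiate is deliberately left out.

```python
class PeerIndexDeployer:
    """Toy model of the two index deployment triggers (hypothetical names)."""

    def __init__(self, auto_deploy=True):
        # Mirrors peer.ledger.state.couchDBConfig.autoDeployChaincodeIndexes.
        self.auto_deploy = auto_deploy
        self.installed = {}   # chaincode name -> {index name: definition}
        self.databases = {}   # (channel, chaincode) -> {index name: definition}

    def on_instantiate(self, channel, chaincode):
        """Process an instantiate/upgrade transaction for one channel."""
        # The database is always created so the peer can act as committer.
        db = self.databases.setdefault((channel, chaincode), {})
        # Indexes are deployed only if the install package is already present.
        if self.auto_deploy and chaincode in self.installed:
            db.update(self.installed[chaincode])

    def on_install(self, chaincode, indexes):
        """Install a chaincode package that carries index definitions."""
        if self.auto_deploy:
            # Deploy to every existing channel database for this chaincode
            # before recording the install, so a failed install can be rerun.
            for (_, cc), db in self.databases.items():
                if cc == chaincode:
                    db.update(indexes)
        self.installed[chaincode] = dict(indexes)
```

In this model, instantiating first and installing later ends in the same state as installing first, which is the outcome the two bullets above aim for.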
For subsequent chaincode versions, the indexes in the chaincode installation package will be created with the following behavior:
- If the same indexes are included in subsequent chaincode version packages, index re-deployment causes no harm and results in a noop (open question: does the index get rebuilt in CouchDB? If so, the peer should not attempt to re-create it).
- If a named index has a change to the index definition, the updated index definition will be re-deployed to CouchDB.
- If new indexes are added to chaincode package, the new indexes will be deployed to CouchDB.
- If indexes are removed from chaincode package, the indexes will not be automatically dropped, as peer-specific indexes may have been added and may still be required, but they may be dropped manually at any time (see below).
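The four cases above amount to a comparison between the indexes already deployed and the indexes in the new package. The sketch below computes that comparison on the peer side; this is only one possible approach (the proposal leaves open whether the peer needs to skip unchanged indexes or can simply re-deploy them), and the function and category names are hypothetical.

```python
def plan_index_changes(deployed, packaged):
    """Classify indexes by comparing deployed definitions with a new package.

    Both arguments map index name -> definition (plain dicts).
    """
    plan = {"create": [], "update": [], "unchanged": [], "retained": []}
    for name, definition in packaged.items():
        if name not in deployed:
            plan["create"].append(name)      # new index in this version
        elif deployed[name] != definition:
            plan["update"].append(name)      # same name, changed definition
        else:
            plan["unchanged"].append(name)   # re-deployment would be a noop
    # Indexes missing from the package are never dropped automatically.
    plan["retained"] = [name for name in deployed if name not in packaged]
    for names in plan.values():
        names.sort()
    return plan
```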
If peer.ledger.state.couchDBConfig.autoDeployChaincodeIndexes is false, indexes will not be automatically deployed. This approach preserves an option to manage indexes completely manually in CouchDB. This would likely be even more important for other types of state databases. For example, if support is added for relational databases, most peer administrators would likely want table and index DDL to be executed manually by a DBA rather than automatically. For CouchDB indexes, however, automatic deployment is a preferred and acceptable default in most scenarios.
Peer specific indexes and future work
Suppose an auditor organization performs additional queries against their peers, which are never executed on other peers. Additional indexes may be required on these auditor peers only. These additional indexes may be created and dropped directly in CouchDB (for example by a hosting service). There may also be a need to drop indexes from previously deployed chaincode versions which are no longer applicable. These will be manually dropped. Additionally, if index deployment fails, the indexes will need to be manually fixed. Hosting services will need to be able to manage CouchDB indexes as needed, in addition to general CouchDB management/monitoring.
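For reference, managing a peer-specific index directly in CouchDB comes down to two HTTP calls: POST /{db}/_index to create an index and DELETE /{db}/_index/{ddoc}/json/{name} to drop it. The sketch below only builds the requests and does not send them. The CouchDB address, database name, and index names are assumptions for illustration.

```python
import json
from urllib.parse import quote
from urllib.request import Request

COUCHDB_URL = "http://localhost:5984"  # assumed address of the peer's CouchDB


def create_index_request(db, definition):
    """Build the POST /{db}/_index request that creates an index."""
    return Request(
        f"{COUCHDB_URL}/{quote(db, safe='')}/_index",
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def drop_index_request(db, ddoc, name):
    """Build the DELETE /{db}/_index/{ddoc}/json/{name} request that drops an index."""
    path = "/".join(quote(part, safe="") for part in (db, "_index", ddoc, "json", name))
    return Request(f"{COUCHDB_URL}/{path}", method="DELETE")


if __name__ == "__main__":
    # Hypothetical auditor-only index on a hypothetical channel/chaincode database.
    audit_index = {
        "index": {"fields": ["auditStatus"]},
        "ddoc": "auditDoc",
        "name": "auditIndex",
        "type": "json",
    }
    req = create_index_request("ch1_marbles", audit_index)
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it to a running CouchDB.
```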
In the future a peer API to manage indexes may be added, with corresponding command line options. This would provide an option for peer consumers to manage their own indexes without requiring direct access to CouchDB. This is not in the scope of this work item, but such command line options could be added in the future.
CouchDB per channel/chaincode instead of per channel
Currently there is one CouchDB database per channel, matching the commit granularity for blocks (channel level). However, logically chaincode key/value data is scoped per chaincode within the channel. That is, key/value data can only be accessed or queried by the chaincode that created the data. Similarly, indexes should only apply to data from the corresponding chaincode (it would not make sense to apply an index from cc1 to documents from cc2). Since CouchDB indexes apply to all documents within a CouchDB database, the granularity of CouchDB databases will have to change from one db per channel to one db per channel/chaincode. This is acceptable, since a db in CouchDB is lightweight; think of it more like a table in other databases.
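A minimal sketch of the finer granularity: each write is routed to a database derived from its channel and chaincode, so an index created in one database can only ever see documents from one chaincode. The naming helper below simply joins the two names with an underscore; any escaping or length rules of the real naming scheme are not reproduced here, and both function names are hypothetical.

```python
def state_db_name(channel, chaincode):
    """Hypothetical naming helper: one database per channel/chaincode pair."""
    return f"{channel}_{chaincode}"


def route_writes(channel, writes):
    """Group (chaincode, key, value) writes by their target database name."""
    by_db = {}
    for chaincode, key, value in writes:
        by_db.setdefault(state_db_name(channel, chaincode), {})[key] = value
    return by_db


if __name__ == "__main__":
    writes = [
        ("cc1", "marble1", {"owner": "tom"}),
        ("cc2", "asset9", {"status": "open"}),
        ("cc1", "marble2", {"owner": "ann"}),
    ]
    for db, docs in route_writes("ch1", writes).items():
        print(db, sorted(docs))
```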
Implementation implications (Done in -
- CouchDB database name will be based on <channel_chaincode>. See subtask FAB-7130 for more details.
- VersionedDB for the CouchDB state database impl will logically remain at channel level (since commits are applied at channel level), however there will be N channel_chaincode databases, rather than a single database defined for the VersionedDB.
- Data will be committed to each chaincode-specific database upon block commit. Failure to commit to any chaincode-specific database will result in block commit failure, and will be fully recoverable using existing state database recovery logic (idempotent retries).
- Performance has been evaluated and there is no material difference between bulk writing to one CouchDB database and bulk writing to N CouchDB databases in parallel during block commit processing (in fact, parallel bulk updates are the most performant method of writing to CouchDB).
- State database recovery logic will continue to utilize a channel scoped savepoint document that is written upon commit of all chaincode data in the block. Since the CouchDB database will no longer be channel scoped, the savepoint document per channel will be saved to a new CouchDB database to store state database metadata.
- For environments upgrading from 1.0 with existing channel-scoped databases, upon peer 1.1 start the channel_chaincode scoped databases will not be found and will be created and populated from the chain data following the existing recovery process. In release notes we can recommend that CouchDB data be dropped when upgrading from 1.0 to 1.1, but even if this is not done there is no functional problem. There will simply be a set of orphaned data in CouchDB (the orphaned 1.0 channel-scoped databases, alongside the newly created 1.1 channel/chaincode-scoped databases). The orphaned 1.0 channel-scoped databases can be dropped at any time; it is just easier to drop all the CouchDB data prior to the 1.1 upgrade.
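The commit and savepoint behavior described in the list above can be sketched as follows. This is a toy model under stated assumptions: CouchDB is replaced by an in-memory store, and the metadata database name, savepoint document id, and all function names are hypothetical. It shows parallel bulk writes to N chaincode databases, a savepoint written only after all of them succeed, and a commit that fails as a whole if any one write fails.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

METADATA_DB = "statedb_metadata"  # hypothetical name for the metadata database


class InMemoryCouch:
    """Stand-in for CouchDB: database name -> {doc id: doc}."""

    def __init__(self):
        self.dbs = {}
        self._lock = threading.Lock()

    def bulk_write(self, db, docs):
        # Overwriting by id makes a repeated write of the same block idempotent.
        with self._lock:
            self.dbs.setdefault(db, {}).update(docs)


def commit_block(store, channel, block_number, writes_by_db):
    """Write each chaincode database in parallel, then record the savepoint."""
    with ThreadPoolExecutor(max_workers=max(1, len(writes_by_db))) as pool:
        futures = [
            pool.submit(store.bulk_write, db, docs)
            for db, docs in writes_by_db.items()
        ]
        for future in futures:
            future.result()  # re-raises a write failure, failing the commit
    # Reached only if every chaincode database was written successfully.
    store.bulk_write(METADATA_DB, {f"savepoint_{channel}": {"block": block_number}})
```

If commit_block raises, the savepoint still points at the previous block, so recovery can replay the block; because the writes overwrite by document id, replaying is safe.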