MongoDB integration

The MongoDB integration provides a way to mirror onchain data to a MongoDB collection of your choice. Data is automatically inserted as it's produced by the chain, and it's invalidated in case of chain reorganizations.

  • Populate a collection with data from one or more networks or smart contracts.
  • Create powerful analytics with MongoDB pipelines.
  • Change how collections are queried without re-indexing.

Installation#

apibara plugins install sink-mongo

Configuration#

  • connectionString: string: the Mongo connection URL of your database.
  • database: string: the target database name.
  • collectionName: string: the target collection name.
  • entityMode: boolean: enable entity mode. See the "Entity storage" section for more information.
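
These options go under sinkOptions in your indexer script. The following is a minimal sketch assuming a Starknet indexer; the stream URL, starting block, filter, and names are placeholders for your own settings.

// Sketch of an indexer configured for the MongoDB sink.
// All values shown are placeholders; adjust them to your own setup.
export const config = {
  streamUrl: "https://mainnet.starknet.a5a.ch",
  startingBlock: 1_000,
  network: "starknet",
  filter: {
    // Your data filter goes here.
    header: { weak: true },
  },
  sinkType: "mongo",
  sinkOptions: {
    connectionString: "mongodb://localhost:27017",
    database: "example",
    collectionName: "transfers",
  },
};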

Collection schema#

The transform step is required to return an array of objects. Data is converted to BSON and then written to the collection. The MongoDB integration adds a _cursor field to each document so that data can be invalidated in case of chain reorganizations.
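
As a sketch, a transform in the default (log) mode might look like the following. The Block type is the network-specific block type (as in the entity example later on this page), and the event layout and field names shown here are assumptions that depend on your network and filter.

// Sketch of a transform in the default (log) mode: return an array of objects,
// one document per item. The event layout and field names are illustrative.
export default function transform(block: Block) {
  return (block.events ?? []).map(({ event }) => ({
    from: event.data[0],
    to: event.data[1],
    tokenId: event.data[2],
    blockNumber: block.header?.blockNumber,
  }));
}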

Querying data#

When querying data, you should always add the following property to your MongoDB filter to ensure you get the latest value:

{
  "_cursor.to": null,
}
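
For example, a lookup with the Node.js driver might look like the following sketch; the collection name and the contract and tokenId fields are assumptions borrowed from the entity example later on this page.

// Hypothetical lookup: only documents whose `_cursor.to` is null
// hold the value at the most recent block.
const token = await db.collection("tokens").findOne({
  contract,
  tokenId: "1",
  "_cursor.to": null,
});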

The "Storage & Data Invalidation" section at the end of this document contains information on why you need to add this condition to your filter.

Entity storage#

The MongoDB integration works with two types of data:

  • Immutable logs (default): the values returned by the indexer represent something that doesn't change over time; in other words, they form a list of items, for example a list of token transfers.
  • Mutable entities: the values returned by the indexer represent the state of an entity at a given block. For example, token balances change block by block as users transfer the token.

You can index entities by setting the entityMode option to true. When you set this option, the indexer expects the transform function to return a list of update operations. An update operation is a JavaScript object with an entity property used to filter which entities should be updated, and an update property with either an update document or an aggregation pipeline.
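
For instance, an update operation whose update property is an aggregation pipeline could look like the following sketch; contract, owner, and previousOwner are assumed field names.

// Sketch of a single update operation using an aggregation pipeline
// instead of an update document. Field names are assumptions.
const operation = {
  entity: { contract, tokenId: "1" },
  update: [
    // Pipeline stage: remember the previous owner while setting the new one.
    { $set: { previousOwner: "$owner", owner: "0xB" } },
  ],
};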

Example: our indexer tracks token ownership for an ERC-721 smart contract, together with the number of transactions for each token. We enable entity storage by setting the entityMode option to true. The transform function returns the entities that need to be updated, together with the operation used to update their state.

export default function transform(block: Block) {
  // Example to show the shape of data returned by transform.
  // `contract` is the address of the tracked contract, defined elsewhere in the indexer.
  return [
    {
      entity: { contract, tokenId: "1" },
      update: { $set: { owner: "0xA" }, $inc: { txCount: 1 } },
    },
    {
      entity: { contract, tokenId: "2" },
      update: { $set: { owner: "0xB" }, $inc: { txCount: 3 } },
    },
  ];
}

The integration will iterate through the new entities and update the existing values (if any) using the following MongoDB pseudo-query:

for (const doc of returnValue) {
  // Upsert: update matching documents, or insert a new one if none match.
  db.collection.updateMany(doc.entity, doc.update, { upsert: true });
}

Note that in reality the query is more complex; refer to the next section to learn more about how the MongoDB integration stores data.

Storage & Data Invalidation#

Storing blockchain data poses an additional challenge: we must be able to roll back the database state in case of chain reorganizations. This integration adds a _cursor field to all documents to track the block range for which a piece of data is valid.

type Cursor = {
  /** Block (inclusive) at which this piece of data was created. */
  from: number;
  /** Block (exclusive) at which this piece of data became invalid. */
  to: number | null;
};

It follows that a document holds the value valid at the most recent block if its _cursor.to field is null.

Example: we're indexing an ERC-721 token with the following transfers:

  • block: 1000, transfer from 0x0 to 0xA
  • block: 1010, transfer from 0xA to 0xB
  • block: 1020, transfer from 0xB to 0xC

If we put the token ownership on a timeline, it looks like the following diagram.

1000                    1010                  1020
--+-----------------------+---------------------+---- - - - - - - -
  [ { owner: "0xA" }      )
                          [ { owner: "0xB" }    )
                                                [ { owner: "0xC" }

This translates to the following documents in the MongoDB collection.

After the first transfer:

[
  { "owner": "0xA", "_cursor": { "from": 1000, "to": null } }
]

After the second transfer:

[
  { "owner": "0xA", "_cursor": { "from": 1000, "to": 1010 } },
  { "owner": "0xB", "_cursor": { "from": 1010, "to": null } }
]

And after the third transfer:

[
  { "owner": "0xA", "_cursor": { "from": 1000, "to": 1010 } },
  { "owner": "0xB", "_cursor": { "from": 1010, "to": 1020 } },
  { "owner": "0xC", "_cursor": { "from": 1020, "to": null } }
]
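
Because every document records the block range in which it was valid, the same convention also lets you query the state at an arbitrary block. The following is an illustrative sketch using the Node.js driver, with the same assumed collection and field names as the earlier examples.

// Hypothetical query against the collection above: the document valid at
// block 1015 is the one with _cursor.from <= 1015 and
// (_cursor.to == null or _cursor.to > 1015), i.e. the owner "0xB" document.
const ownerAt1015 = await db.collection("tokens").findOne({
  contract,
  tokenId: "1",
  "_cursor.from": { $lte: 1015 },
  $or: [{ "_cursor.to": null }, { "_cursor.to": { $gt: 1015 } }],
});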
