MongoDB integration

The MongoDB integration provides a way to mirror onchain data to a MongoDB collection of your choice. Data is automatically inserted as it's produced by the chain, and it's invalidated in case of chain reorganizations.

  • Populate a collection with data from one or more networks or smart contracts.
  • Create powerful analytics with MongoDB pipelines.
  • Change how collections are queried without re-indexing.

Installation#

apibara plugins install sink-mongo

Configuration#

  • connectionString: string: the Mongo connection URL of your database.
  • database: string: the target database name.
  • collectionName: string: the target collection name.
  • entityMode: boolean: enable entity mode. See the "Entity storage" section for more information.
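
These options go under sinkOptions in your indexer script. The following is a minimal sketch assuming a Starknet indexer; the stream URL, starting block, filter, and names are placeholders for your own settings.

// Sketch of an indexer configured for the MongoDB sink.
// All values shown are placeholders; adjust them to your own setup.
export const config = {
  streamUrl: "https://mainnet.starknet.a5a.ch",
  startingBlock: 1_000,
  network: "starknet",
  filter: {
    // Your data filter goes here.
    header: { weak: true },
  },
  sinkType: "mongo",
  sinkOptions: {
    connectionString: "mongodb://localhost:27017",
    database: "example",
    collectionName: "transfers",
  },
};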

Collection schema#

The transform step is required to return an array of objects. Data is converted to BSON and then written to the collection. The MongoDB integration adds a _cursor field to each document so that data can be invalidated in case of chain reorganizations.
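
As a sketch, a transform in the default (log) mode might look like the following. The Block type is the network-specific block type (as in the entity example later on this page), and the event layout and field names shown here are assumptions that depend on your network and filter.

// Sketch of a transform in the default (log) mode: return an array of objects,
// one document per item. The event layout and field names are illustrative.
export default function transform(block: Block) {
  return (block.events ?? []).map(({ event }) => ({
    from: event.data[0],
    to: event.data[1],
    tokenId: event.data[2],
    blockNumber: block.header?.blockNumber,
  }));
}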

Querying data#

When querying data, you should always add the following property to your MongoDB filter to ensure you get the latest value:

{
  "_cursor.to": null,
}
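
For example, a lookup with the Node.js driver might look like the following sketch; the collection name and the contract and tokenId fields are assumptions borrowed from the entity example later on this page.

// Hypothetical lookup: only documents whose `_cursor.to` is null
// hold the value at the most recent block.
const token = await db.collection("tokens").findOne({
  contract,
  tokenId: "1",
  "_cursor.to": null,
});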

The "Storage & Data Invalidation" section at the end of this document contains information on why you need to add this condition to your filter.

Entity storage#

The MongoDB integration works with two types of data:

  • Immutable logs (default): the values returned by the indexer represent something that doesn't change over time; in other words, they form a list of items, for example a list of token transfers.
  • Mutable entities: the values returned by the indexer represent the state of an entity at a given block. For example, token balances change block by block as users transfer the token.

You can index entities by setting the entityMode option to true. When you set this option, the indexer expects the transform function to return a list of update operations. An update operation is a JavaScript object with an entity property used to filter which entities should be updated, and an update property with either an update document or an aggregation pipeline.
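
For instance, an update operation whose update property is an aggregation pipeline could look like the following sketch; contract, owner, and previousOwner are assumed field names.

// Sketch of a single update operation using an aggregation pipeline
// instead of an update document. Field names are assumptions.
const operation = {
  entity: { contract, tokenId: "1" },
  update: [
    // Pipeline stage: remember the previous owner while setting the new one.
    { $set: { previousOwner: "$owner", owner: "0xB" } },
  ],
};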

Example: our indexer tracks token ownership for an ERC-721 smart contract, together with the number of transactions for each token. We enable entity storage by setting the entityMode option to true. The transform function returns the entities that need to be updated, together with the operation used to update their state.

export default function transform(block: Block) {
  // Example to show the shape of data returned by transform.
  // `contract` is the address of the tracked contract, defined elsewhere in the indexer.
  return [
    {
      entity: { contract, tokenId: "1" },
      update: { $set: { owner: "0xA" }, $inc: { txCount: 1 } },
    },
    {
      entity: { contract, tokenId: "2" },
      update: { $set: { owner: "0xB" }, $inc: { txCount: 3 } },
    },
  ];
}

The integration will iterate through the new entities and update the existing values (if any) using the following MongoDB pseudo-query:

for (const doc of returnValue) {
  // Upsert: update matching documents, or insert a new one if none match.
  db.collection.updateMany(doc.entity, doc.update, { upsert: true });
}

Note that in reality the query is more complex; refer to the next section to learn more about how the MongoDB integration stores data.

Storage & Data Invalidation#

Storing blockchain data poses an additional challenge: we must be able to roll back the database state in case of chain reorganizations. This integration adds a _cursor field to all documents to track the block range for which a piece of data is valid.

type Cursor = {
  /** Block (inclusive) at which this piece of data was created. */
  from: number;
  /** Block (exclusive) at which this piece of data became invalid. */
  to: number | null;
};

It follows that a document holds the value valid at the most recent block if its _cursor.to field is null.

Example: we're indexing an ERC-721 token with the following transfers:

  • block: 1000, transfer from 0x0 to 0xA
  • block: 1010, transfer from 0xA to 0xB
  • block: 1020, transfer from 0xB to 0xC

If we put the token ownership on a timeline, it looks like the following diagram.

1000                    1010                  1020
--+-----------------------+---------------------+---- - - - - - - -
  [ { owner: "0xA" }      )
                          [ { owner: "0xB" }    )
                                                [ { owner: "0xC" }

This translates to the following documents in the MongoDB collection.

After the first transfer:

[
  { "owner": "0xA", "_cursor": { "from": 1000, "to": null } }
]

After the second transfer:

[
  { "owner": "0xA", "_cursor": { "from": 1000, "to": 1010 } },
  { "owner": "0xB", "_cursor": { "from": 1010, "to": null } }
]

And after the third transfer:

[
  { "owner": "0xA", "_cursor": { "from": 1000, "to": 1010 } },
  { "owner": "0xB", "_cursor": { "from": 1010, "to": 1020 } },
  { "owner": "0xC", "_cursor": { "from": 1020, "to": null } }
]
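
Because every document records the block range in which it was valid, the same convention also lets you query the state at an arbitrary block. The following is an illustrative sketch using the Node.js driver, with the same assumed collection and field names as the earlier examples.

// Hypothetical query against the collection above: the document valid at
// block 1015 is the one with _cursor.from <= 1015 and
// (_cursor.to == null or _cursor.to > 1015), i.e. the owner "0xB" document.
const ownerAt1015 = await db.collection("tokens").findOne({
  contract,
  tokenId: "1",
  "_cursor.from": { $lte: 1015 },
  $or: [{ "_cursor.to": null }, { "_cursor.to": { $gt: 1015 } }],
});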
