DNA architecture - V2 preview

This page describes in detail the architecture of the DNA v2 service.

At a high level, the goals for DNA v2 are:

  • serve onchain data through a protocol that's optimized for building indexers.
  • provide a scalable and cost-efficient way to access onchain data.
  • decouple compute from storage.

This is achieved by building a cloud-native service that ingests onchain data from an archive node and stores it in Object Storage (e.g. Amazon S3, Cloudflare R2). Data is served by stateless workers that read and filter data from Object Storage before sending it to the indexers. The diagram below shows the high-level components that make up a production deployment of DNA v2.

 ╔══ DNA Service ══════════════════════════════════════╗
 ║ ┌──────────────────┐                                ║▒
 ║ │                  │                                ║▒
 ║ │    Stream Svc    │◀─────┐                         ║▒
 ║ │                  │      │                         ║▒
 ║ └──────────────────┘      │                         ║▒
 ║ ┌──────────────────┐      │    ┌──────────────────┐ ║▒          ┌──────────────────┐
 ║ │                  │      │    │                  │ ║▒          │                  │▒
 ║ │    Stream Svc    │◀─────┼────│  Ingestion Svc   │◀╬──JSON-RPC─│   Archive Node   │▒
 ║ │                  │      │    │                  │ ║▒          │                  │▒
 ║ └──────────────────┘      │    └──────────────────┘ ║▒          └──────────────────┘▒
 ║ ┌──────────────────┐      │              │          ║▒           ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
 ║ │                  │      │              │          ║▒
 ║ │    Stream Svc    │◀─────┘              │          ║▒
 ║ │                  │                     │          ║▒
 ║ └──────────────────┘                     │          ║▒
 ║           ▲                              │          ║▒
 ║           │                              ▼          ║▒
 ║ ┌─────────────────────────────────────────────────┐ ║▒
 ║ │             Object Storage (S3, R2)             │ ║▒
 ║ └─────────────────────────────────────────────────┘ ║▒
 ╚═════════════════════════════════════════════════════╝▒
  ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
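
To make the flow above concrete, here is a minimal sketch of the same idea, assuming an in-memory map in place of Object Storage; the type and function names are illustrative and not the actual service code. The important property is that the ingestion service is the only writer, while any number of stateless stream services read and filter in parallel.

use std::collections::HashMap;

// Illustrative only: an in-memory map standing in for Object Storage (S3, R2).
type ObjectStorage = HashMap<String, Vec<u8>>;

// The ingestion service is the single writer: it turns blocks fetched from the
// archive node over JSON-RPC into objects keyed by block range.
fn ingest(storage: &mut ObjectStorage, block_range: &str, headers: Vec<u8>) {
    storage.insert(format!("segment/{block_range}/header.zst"), headers);
}

// Stream services are stateless readers: they fetch objects, filter them, and
// send only the matching data to each indexer. Any number can run in parallel.
fn serve<'a>(
    storage: &'a ObjectStorage,
    key: &str,
    keep: impl Fn(&[u8]) -> bool,
) -> Option<&'a [u8]> {
    match storage.get(key) {
        Some(data) if keep(data) => Some(data),
        _ => None,
    }
}

fn main() {
    let mut storage = ObjectStorage::new();
    ingest(&mut storage, "1000-1999", b"block headers".to_vec());
    let data = serve(&storage, "segment/1000-1999/header.zst", |_| true);
    assert!(data.is_some());
}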

DNA service

The DNA service is composed of two components:

  • ingestion service: listens for new blocks on the network and stores them into Object Storage.
  • stream service: receives streaming requests from clients (indexers) and serves onchain data by filtering objects stored on S3.

Ingestion service

The ingestion service is responsible for ingesting onchain data and converting it into a format that's optimized for streaming. In practice, this means:

  • splitting blocks into smaller pieces (headers, transactions, etc.).
  • grouping data from contiguous blocks together into segments.
  • building indices for quickly scanning through data.

The "Storage layout" section goes into more details on how data is stored in Object Storage.

Notice that the ingestion service only uploads finalized (immutable) data to Object Storage. This presents a challenge, since we want to serve data up to the chain's head to clients. For this reason, the ingestion service exposes an internal gRPC service that stream services can use to receive blocks that have been ingested but are not finalized yet. This service is also responsible for other bookkeeping operations, such as updates to the ingestion state and notifications about chain reorganizations.
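
The exact messages on this internal channel are not shown here, but conceptually they look something like the sketch below; the enum and its fields are hypothetical names for illustration, not the real protobuf definitions.

// Hypothetical message set for the internal ingestion -> stream channel.
// The real channel is gRPC; names and fields here are illustrative only.
enum IngestionMessage {
    // A new block was ingested but is not finalized yet.
    Accepted { block_number: u64, hash: String },
    // The finalized tip moved forward; earlier blocks are now in segments.
    Finalized { block_number: u64 },
    // A chain reorganization: everything after this block must be discarded.
    Invalidated { new_head: u64 },
}

// How a stream service might react to these notifications.
fn handle(message: IngestionMessage) {
    match message {
        IngestionMessage::Accepted { block_number, .. } => {
            // Serve this block via the ingestion service, not from storage.
            println!("buffer non-finalized block {block_number}");
        }
        IngestionMessage::Finalized { block_number } => {
            println!("blocks up to {block_number} can be read from segments");
        }
        IngestionMessage::Invalidated { new_head } => {
            println!("reorg: notify connected clients, roll back to {new_head}");
        }
    }
}

fn main() {
    handle(IngestionMessage::Invalidated { new_head: 1_000_000 });
}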

A final note about the ingestion service: a production deployment needs only a single instance of it. Because its state is stored in Object Storage and in the archive node, the service can be made stateless.

Stream service

The stream service serves onchain data to clients. The raw onchain data stored in Object Storage is filtered by the stream service before being sent over the network; this results in lower egress fees compared to solutions that filter data on the client.

In practice, when the stream service needs to process a block segment, it looks for the data in a local cache before fetching it from Object Storage. Operators can tweak the size of the cache to balance performance and cost. The service design follows a "cloud native" approach, where expensive and slow network volumes are replaced by cheap and fast temporary volumes. Temporary volumes work here because the cache doesn't need to be persisted between restarts or deployments.
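
A simplified version of this read path, assuming a stubbed-out fetch_from_object_storage function and a local directory standing in for the NVMe cache:

use std::fs;
use std::io;
use std::path::Path;

// Stand-in for the real Object Storage client (S3, R2, ...). In the real
// service this is a network call; here it is stubbed out so the sketch runs.
fn fetch_from_object_storage(key: &str) -> io::Result<Vec<u8>> {
    Ok(format!("object body for {key}").into_bytes())
}

// Read a segment, preferring the local (temporary) NVMe cache. The cache can
// be lost at any time: a miss only costs one extra read from Object Storage.
fn read_segment(cache_dir: &Path, key: &str) -> io::Result<Vec<u8>> {
    let cached = cache_dir.join(key.replace('/', "_"));
    if let Ok(data) = fs::read(&cached) {
        return Ok(data); // cache hit: local disk, microseconds
    }
    let data = fetch_from_object_storage(key)?; // cache miss: tens of milliseconds
    let _ = fs::create_dir_all(cache_dir); // best effort: the cache is disposable
    let _ = fs::write(&cached, &data);
    Ok(data)
}

fn main() -> io::Result<()> {
    let data = read_segment(Path::new("/tmp/dna-cache"), "segment/1000-1999/log.zst")?;
    println!("read {} bytes", data.len());
    Ok(())
}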

╔═════ Read latency ════════════════════════════════════════════════════════════════════════════╗
║                                                                                               ║▒
║ ┌─────────────────────┐            ┌─────────────────────┐            ┌─────────────────────┐ ║▒
║ │                     │            │                     │            │                     │ ║▒
║ │   Stream service    │───~50 µs──▶│   NVMe/SSD cache    │───~20 ms──▶│   Object Storage    │ ║▒
║ │                     │            │                     │            │                     │ ║▒
║ └─────────────────────┘            └─────────────────────┘            └─────────────────────┘ ║▒
║                                                                                               ║▒
╚═══════════════════════════════════════════════════════════════════════════════════════════════╝▒
 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒

The following table, inspired by the table in this article by Vantage, shows the difference in performance and price between an AWS EC2 instance using (temporary) NVMe disks and two instances using EBS (one with a general purpose gp3 volume, and one with a higher-performance io1 volume). All prices are as of April 2024, US East, 1 year reserved with no upfront payment.

Metric                  EBS (gp3)      EBS (io1)      NVMe
Instance Type           r6i.8xlarge    r6i.8xlarge    i4i.8xlarge
vCPU                    32             32             32
Memory (GiB)            256            256            245
Network (Gbps)          12.50          12.50          18.75
Storage (GiB)           7,500          7,500          2 x 3,750
IOPS (read/write)       16,000         40,000         800,000 / 440,000
Cost - Compute ($/mo)   973            973            1,300
Cost - Storage ($/mo)   665            3,537          0
Cost - Total ($/mo)     1,638          4,510          1,300

Notice how the NVMe instance has 30-50x the IOPS per dollar. This price difference means that Apibara users benefit from lower costs and/or higher performance.
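
For example, using the write IOPS column: the NVMe instance delivers roughly 440,000 / 1,300 ≈ 340 write IOPS per dollar per month, compared to about 16,000 / 1,638 ≈ 10 for gp3 and 40,000 / 4,510 ≈ 9 for io1.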

Storage layout

This section describes how the DNA service stores data on S3. All data is compressed with zstandard, since it strikes a good balance between speed and compression ratio.
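
For example, compressing and decompressing a piece of data with the zstd crate (the crate choice is an assumption; the documentation only specifies the zstandard format, not the bindings used):

// Cargo.toml: zstd = "0.13" (an assumption for this example).
fn main() -> std::io::Result<()> {
    // e.g. the serialized `header` piece of a segment.
    let headers: Vec<u8> = b"serialized headers for blocks 1000-1999".to_vec();

    // Compress before uploading to Object Storage...
    let compressed = zstd::encode_all(&headers[..], zstd::DEFAULT_COMPRESSION_LEVEL)?;
    // ...and decompress after downloading it again.
    let roundtrip = zstd::decode_all(&compressed[..])?;

    assert_eq!(headers, roundtrip);
    Ok(())
}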

First, blocks are decomposed into smaller pieces (header, transactions, logs, and receipts). Indexing speed is a function of the amount of data that needs to be scanned, so splitting blocks into their components improves performance for clients that only need selected pieces of data (e.g. only logs).

After enough blocks have been processed, they are grouped together into "segments". Since there are four pieces of data for each block, there are four files per segment. The segment size is configurable, and the optimal size depends on the network being ingested. All segments are stored under the segment/ prefix. Notice that segments only contain data about finalized blocks.
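
For example, with a hypothetical segment size of 1,000 blocks, the objects for a single segment could be laid out as follows (the exact {block_range} naming is an implementation detail):

segment/000001000-000001999/header.zst
segment/000001000-000001999/transaction.zst
segment/000001000-000001999/receipt.zst
segment/000001000-000001999/log.zst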

Finally, the ingestion service creates indices to quickly look up data in segments without scanning the segments themselves. Indices are created at regular block intervals (also configurable) and are stored under the group/ prefix. Once the stream service reaches a segment or block that doesn't belong to a group, its content is scanned sequentially.
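
A sketch of how a stream service might choose between the two paths; the GroupIndex type and its fields are illustrative only, not the real index format:

// Illustrative, heavily simplified view of a group index: here it only
// records the block range it covers.
struct GroupIndex {
    first_block: u64,
    last_block: u64,
}

impl GroupIndex {
    fn contains(&self, block: u64) -> bool {
        (self.first_block..=self.last_block).contains(&block)
    }
}

// Use the group index when one covers the requested block; otherwise fall
// back to scanning the segment sequentially.
fn plan_read(groups: &[GroupIndex], block: u64) -> &'static str {
    if groups.iter().any(|g| g.contains(block)) {
        "look up via group index, fetch only the matching segment data"
    } else {
        "scan the segment sequentially"
    }
}

fn main() {
    let groups = vec![GroupIndex { first_block: 0, last_block: 99_999 }];
    println!("{}", plan_read(&groups, 100_123));
}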

╔═ s3://{network_name} ══════════════════════════════╗
║                                                    ║▒
║ ┌──────────────┐                                   ║▒
║ │ snapshot.zst │                                   ║▒
║ └──────────────┘                                   ║▒
║ ┌─ segment/{block_range}/ ───────────────────────┐ ║▒
║ │                                                │ ║▒
║ │ ┌─ header.zst ────────┐┌─ transaction.zst ───┐ │ ║▒
║ │ │▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋││▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋│ │ ║▒
║ │ └─────────────────────┘└─────────────────────┘ │ ║▒
║ │ ┌─ receipt.zst ───────┐┌─ log.zst ───────────┐ │ ║▒
║ │ │▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋││▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋│ │ ║▒
║ │ └─────────────────────┘└─────────────────────┘ │ ║▒
║ └────────────────────────────────────────────────┘ ║▒
║ ┌─ group/ ───────────────────────────────────────┐ ║▒
║ │                                                │ ║▒
║ │ ┌─ {block_range}.zst ────────────┐             │ ║▒
║ │ │▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋│             │ ║▒
║ │ └────────────────────────────────┘             │ ║▒
║ └────────────────────────────────────────────────┘ ║▒
╚════════════════════════════════════════════════════╝▒
 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒

Information about the ingestion state is stored in the snapshot.zst file at the root of the bucket.
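
The exact contents of the snapshot are an implementation detail; conceptually it needs to record something like the fields below (names are hypothetical):

// Hypothetical shape of the ingestion state in snapshot.zst: the field names
// are assumptions for illustration, not the actual file format.
#[derive(Debug)]
struct Snapshot {
    // First block available in this bucket.
    first_block: u64,
    // Highest block that has been written to a segment.
    last_segmented_block: u64,
    // Highest block covered by a group index.
    last_grouped_block: u64,
    // Segment and group sizes used by this deployment.
    segment_size: u64,
    group_size: u64,
}

fn main() {
    let state = Snapshot {
        first_block: 0,
        last_segmented_block: 1_000_000,
        last_grouped_block: 900_000,
        segment_size: 1_000,
        group_size: 100_000,
    };
    println!("{state:?}");
}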
