Archive Nodes: Explanation Based on Use Cases

Olha Diachuk
April 29, 2024

Have you ever thought about blockchain as an elegant solution that copies the atomic structure of the universe? Each node is a separate atom, and they are all interconnected. Maybe we just haven’t discovered yet that each atom knows everything that has happened in the universe and stores this information as part of a system similar to a decentralized ledger. And each atom has its purpose, as do the nodes.

Today, we’ll take you on a fantastic detour through the types of nodes and their roles in the blockchain, zooming in on the archive node. Our engineers were pleased to share some unique node provider use cases from their projects that show how seemingly impossible limitations of blockchain can bend under innovative potential. In this article, we use the Ethereum archive node as our example.

Ethereum archive node: Definition and description

Contrast makes concepts easier to grasp. So let's take a quick look at the archive node's sisters: full and light nodes.

3 types of nodes explained

The Ethereum blockchain is technically a network of computers called nodes. 

Full nodes store the entire Ethereum blockchain history, including every block, transaction, and smart contract ever executed. However, they only maintain the latest state (snapshot) of the network for the most recent few blocks, typically around 128. This allows them to verify incoming transactions and ensure everything runs according to plan.

Additionally, full nodes can regenerate older states by re-executing transactions, although this can be computationally intensive. Ethereum full node size is currently around 1.1 TB, and it’s obviously going to grow.
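The retention window described above can be sketched as a small check. This is a toy model rather than client code: the function name is hypothetical, and the default of 128 blocks is taken from the "around 128" figure in the text.

```python
def full_node_has_state(head_block: int, requested_block: int, retained: int = 128) -> bool:
    """Return True if a full node would still hold the state for requested_block.

    Toy model: assumes the node keeps state for the `retained` most recent
    blocks, counting the head block itself.
    """
    if requested_block < 0 or requested_block > head_block:
        return False  # block is invalid or has not been produced yet
    return head_block - requested_block < retained
```

Anything outside that window is what a full node would have to regenerate by re-executing transactions.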

Archive nodes. Built on the full node's capabilities, archive nodes take data retention to the next level. They not only store the complete blockchain data but also maintain a record of the historical state at every single block since the chain's genesis.

This allows the Ethereum archive node to serve requests for historical data efficiently, without needing to recompute everything from scratch. However, this extensive data storage comes at a cost: an archive node requires more space than a full node, often 10 TB or more per node, depending on the client.
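To make "historical data" concrete, here is a minimal sketch of what such a request looks like over JSON-RPC. `eth_getBalance` is a standard Ethereum JSON-RPC method whose second parameter selects the block; the helper names `balance_request` and `send`, and any endpoint URL you pass in, are assumptions for illustration.

```python
import json
import urllib.request


def balance_request(address: str, block_number: int, request_id: int = 1) -> dict:
    """Build an eth_getBalance JSON-RPC payload for a specific block height."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getBalance",
        # The second parameter selects the block; the number is hex-encoded.
        "params": [address, hex(block_number)],
    }


def send(rpc_url: str, payload: dict) -> dict:
    """POST the payload to a node's HTTP JSON-RPC endpoint and return the reply."""
    request = urllib.request.Request(
        rpc_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Calling `send(your_rpc_url, balance_request(address, 4_000_000))` against an archive node should return the balance as it stood at block 4,000,000. A node that no longer holds that state may answer with an error instead.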

Light nodes. Designed for efficiency, light nodes are the minimalists of the network family. They only download Ethereum block headers, containing the bare minimum of blockchain data needed to interact with the network.

Light nodes rely on the full or archive types for all other data, working in a "need-to-use" mode. Their limited functionality prevents them from participating in network consensus (they can't be validators). However, they are still irreplaceable for simple tasks like checking balances, verifying transactions, or browsing event logs.

| Node Type | Functionality | Use Cases | Benefits |
|---|---|---|---|
| Light | Limited functionality | Check wallet balances; verify transaction confirmations; browse event logs | Lightweight and efficient; lower resource requirements; suitable for mobile wallets and basic user interactions |
| Full | Stores complete blockchain history | Verify transactions and network state; participate in network consensus; interact directly with the Ethereum network | Contributes to decentralization and security; enhanced privacy by not relying on third-party providers; full access to historical data (requires computation) |
| Archive | Stores complete blockchain history and historical state for each block | Access historical data without recomputing; run historical analysis or audits on the network | Superior historical data access; ideal for researchers, data analysts, and security auditors |

The lifecycle of the Ethereum archive node

To understand how it works, we will examine its natural environment and lifecycle, focusing on the core functionalities.

1. Initialization and sync:

  • The archive node starts by downloading the entire Ethereum blockchain history, just like its full-node sister. This includes all blocks, transactions, and smart contracts ever executed on the network.
  • During this initial sync, it also begins processing each block and recording the network state (account balances, smart contract storage) after every single block. This state essentially captures a snapshot of the network at that specific point in time.

The initial setup requires time, reliable bandwidth, and an experienced engineer on your side. You’d better trust us 🙂

2. Ongoing operations:

  • As new blocks are added to the Ethereum blockchain, the archive node continues processing them.
  • For each new block, it updates its internal state database, reflecting the changes caused by the transactions within that block (e.g., account balances changing after a transfer).
  • This ongoing process ensures the archive node maintains a constantly updated record of the entire blockchain history and the corresponding historical state for every block.

Typically, this stage takes hours, but it isn’t a strict limitation for our blockchain engineers.
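The sync and ongoing-operation steps can be pictured with a toy model. This is only a sketch of the idea: the class `ToyArchiveNode` is hypothetical, it tracks plain balances only, and it copies the whole state per block, whereas real clients store state far more compactly.

```python
class ToyArchiveNode:
    """Toy model: keeps a full copy of account balances after every block."""

    def __init__(self, genesis_balances: dict[str, int]):
        self.state = dict(genesis_balances)
        # history[n] is the state after block n; block 0 is genesis.
        self.history = [dict(genesis_balances)]

    def apply_block(self, transfers: list[tuple[str, str, int]]) -> None:
        """Apply (sender, recipient, amount) transfers, then record a snapshot."""
        for sender, recipient, amount in transfers:
            if self.state.get(sender, 0) < amount:
                continue  # skip transfers the sender cannot afford
            self.state[sender] -= amount
            self.state[recipient] = self.state.get(recipient, 0) + amount
        self.history.append(dict(self.state))

    def balance_at(self, account: str, block_number: int) -> int:
        """Read a historical balance directly from the stored snapshot."""
        return self.history[block_number].get(account, 0)
```

Each call to `apply_block` plays the role of one new block arriving, and `balance_at` is a direct lookup with no re-execution.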

3. Serving historical data requests:

  • When a user or application requests historical data, the archive node can retrieve it much faster than a full node.
  • The archive node can directly access the relevant historical state of the network for the requested timeframe, like taking an immersive trip back in time in a fast and furious DeLorean.
  • This significantly reduces processing time and resources compared to a full node for historical data retrieval.
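One rough way to tell from the outside whether an endpoint can serve old state is to ask for a balance at a very early block. The sketch below assumes a hypothetical `rpc_call(method, params)` helper that returns the parsed JSON-RPC reply. Error shapes differ between clients, so treat the result as a hint rather than proof.

```python
from typing import Callable


def serves_historical_state(rpc_call: Callable[[str, list], dict], probe_block: int = 1) -> bool:
    """Guess whether an endpoint keeps old state by probing an early block.

    `rpc_call(method, params)` is a hypothetical helper that returns the
    parsed JSON-RPC response as a dict.
    """
    zero_address = "0x" + "00" * 20
    try:
        reply = rpc_call("eth_getBalance", [zero_address, hex(probe_block)])
    except Exception:
        return False  # transport failure: treat as "cannot serve"
    # A JSON-RPC reply carries either "result" or "error".
    return "result" in reply and "error" not in reply
```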

4. Pruning (as an option):

  • Over time, the amount of historical state data the archive node stores becomes enormous.
  • Some archive node clients offer optional "pruning" settings. Pruning lets the node discard historical state data older than a certain timeframe (e.g., more than a year) while still maintaining the complete blockchain history.
  • This helps with managing storage space requirements, especially for long-running archive nodes.
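The pruning idea can be sketched as dropping state snapshots outside a retention window while leaving the blocks themselves alone. The function below is a toy illustration with hypothetical names, not any client's actual pruning logic.

```python
def prune_snapshots(snapshots: dict[int, dict], head_block: int, keep_last: int) -> dict[int, dict]:
    """Return only the state snapshots for the `keep_last` most recent blocks.

    `snapshots` maps block number -> state. Blocks and transactions themselves
    are not touched; only old state is dropped.
    """
    cutoff = head_block - keep_last + 1
    return {block: state for block, state in snapshots.items() if block >= cutoff}
```

Assuming a block roughly every 12 seconds, a one-year window corresponds to about 2.6 million blocks for `keep_last`.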

So there you have it. An archive node acts like a constantly updated historical archive of the Ethereum network. As a sentinel of memory that forgets nothing, it gives access to valuable insights about anything that happened in the network, keeping the past certain and recorded.

You might already have guessed who must invest so much space and effort in expeditions to the historical center of the blockchain. 

Archive nodes fan club: Projects and individuals who use them

Historical data is a treasure for anyone who wants to know how a blockchain will act in the future. Many individual investigators and projects build their business around the data that archive nodes make retrievable.

Personal usage might be resource-intensive but fruitful:

  • Advanced blockchain exploration—analyze historical trends, pinpoint specific on-chain events, or track wallet activity over extended periods.
  • Self-verification and security. Enhance trust by independently verifying the Ethereum blockchain history without relying on third-party data providers. This can be particularly valuable for users who prioritize self-custody of their crypto assets.
  • Personal research & development. Explore historical data for personal research projects, analyze DeFi protocols, track NFT collections, or develop new blockchain applications with a deeper understanding of network activity.
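As an example of tracking wallet activity over an extended period, the sketch below builds a JSON-RPC batch that samples one balance every `step` blocks. The function name is hypothetical. Batch requests are part of JSON-RPC 2.0, but providers may cap batch sizes, so treat the sampling interval as something to tune.

```python
def balance_history_batch(address: str, start_block: int, end_block: int, step: int) -> list[dict]:
    """Build a JSON-RPC batch asking for one balance every `step` blocks."""
    if step <= 0:
        raise ValueError("step must be positive")
    return [
        {
            "jsonrpc": "2.0",
            "id": block,  # reuse the block number as the request id
            "method": "eth_getBalance",
            "params": [address, hex(block)],
        }
        for block in range(start_block, end_block + 1, step)
    ]
```

Because each request id equals its block number, replies can be matched back to block heights even if the node returns them out of order.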

Regarding hardware, the requirements are fairly similar for both full and archive nodes, with disk space being the key differentiator. 

A node requires a fast CPU with at least four cores, 16 GB of RAM (ideally 32 GB), and an NVMe SSD. Storage space varies: a full node can function with 2 TB, while archive nodes might require much more depending on the client software used (Geth being a storage hog). Finally, a stable and fast internet connection with at least 25 Mbps of bandwidth is crucial for keeping your infrastructure online and responsive.
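These minimums can be written down as a quick pre-flight check. The sketch below uses the figures quoted in this article. The 10 TB archive disk value is an assumption based on the estimate given earlier and varies widely by client, and the names `Machine` and `shortfalls` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    cpu_cores: int
    ram_gb: int
    nvme_tb: float
    bandwidth_mbps: float


# Minimum disk per node type; the archive figure is an assumption
# and depends heavily on the client software.
MIN_DISK_TB = {"full": 2.0, "archive": 10.0}


def shortfalls(machine: Machine, node_type: str) -> list[str]:
    """List which requirements the machine fails for the given node type."""
    problems = []
    if machine.cpu_cores < 4:
        problems.append("cpu_cores")
    if machine.ram_gb < 16:
        problems.append("ram_gb")
    if machine.nvme_tb < MIN_DISK_TB[node_type]:
        problems.append("nvme_tb")
    if machine.bandwidth_mbps < 25:
        problems.append("bandwidth_mbps")
    return problems
```

An empty list means the machine clears every listed minimum for that node type.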

A growing number of third-party providers and services build on archive node functionality:

| Group | Purpose | Benefit | Example providers |
|---|---|---|---|
| Blockchain data analytics firms | Market analysis & research | Analyze historical data to provide insights on token performance, DeFi trends, and user behavior. This data can be used to create reports, develop investment strategies, and inform product development. | Chainalysis, Kaiko, Messari, Nansen |
| Security & auditing firms | Enhanced security audits & threat detection | Backtrack suspicious transactions, identify vulnerabilities in smart contracts, and analyze network activity for potential security risks. | ConsenSys Diligence, OpenZeppelin, Trail of Bits |
| Blockchain infrastructure providers | Reliable data access & infrastructure | Run their own archive nodes or partner with a node provider to offer historical data access to their clients. This enables developers and businesses to build applications with rich historical context. | Dysnix, Infura, Alchemy, QuickNode, GetBlock.io |
| Blockchain explorers & data platforms | Comprehensive on-chain data exploration | Provide users with advanced search functionalities, historical data visualizations, and on-chain analytics capabilities. | Etherscan, Blockchair, Dune Analytics |

Both platforms and individuals use archive nodes for:

  • Faster information retrieval: Archive nodes can access historical data directly, eliminating the need for expensive transaction re-executions.
  • Decentralization: Archive nodes contribute to a more decentralized network by providing an alternative to centralized historical data providers.
  • Self-verification: Users can independently verify historical data without relying on third-party sources.

Running an archive node or getting access to one from a reliable provider is the best way to access the whole blockchain.

What about clients?

Several Ethereum client software options exist for running an archive node, each with advantages and storage requirements. Clientdiversity.org is a valuable resource for exploring these options and promoting a healthy level of client diversity within the network. 

Some time ago, the dragon of OpenEthereum (formerly Parity Ethereum) was deprecated, giving room to a set of separate clients with specialized functionality. A more modular and efficient client ecosystem within the Ethereum space gives users a wider range of choices in this chocolate shop.

Here are some of the most popular archive node clients:

Erigon

  • Strong sides: It is lightweight and efficient, making it a good choice for users with limited storage space. It also has faster startup and synchronization and is actively developed with a strong community.
  • Weaknesses: Not as feature-rich as some other clients.

Besu

  • Pros: It is an enterprise-grade client with features like scalability and permissioned networks. It is actively developed with a focus on security and stability.
  • Cons: It can have higher resource requirements than Erigon. Might be overkill for basic archive node needs.

Nethermind

  • Plus: The modular client architecture allows for customization and integration with other tools. It has been actively developed with a growing community.
  • Minus: Might have a slightly steeper learning curve than Go-based clients like Erigon.

Do you need an archive node? Dysnix has it!

If you don’t want to pore over blockchain client documentation and greet the sunrise with a bit of frustration and still no running node, just ask us to help. We can give you the fastest access and customize any part of the infrastructure for you.

We have met the expectations of the biggest blockchain players in the market, so we’ll be glad to help anyone.

Meet us here, in our Web3 chatting lounge, or just drop us a line. Always glad to help.

And thank you for reading 🙂

Olha Diachuk
Writer at Dysnix
10+ years in tech writing. Trained researcher and tech enthusiast.