Crypto Data Flow Architectures
Anyone interested in blockchain should understand how and where crypto data originates. This first installment of the series dives into how client nodes initiate data retrieval, the function of request pipelines, and the characteristics of raw on-chain data.
It also examines the crucial process of decoding this raw data, converting it into human-readable formats, and aggregating it for various applications. Additionally, it explores how off-chain sources and more complex on-chain schemas are integrated to give a comprehensive perspective on data transformations.
While there is a standard data transformation process, not every pipeline conforms to it; unique tasks have unique needs. That said, here is what the overall process components look like:
Client Nodes
Fetching data from a blockchain begins with a request to a client node. A client node is the core software infrastructure that allows users to request data and submit transactions.
Every chain comes with its own client specification.
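As a sketch, a request to an Ethereum-style client node travels over the standard JSON-RPC interface. The endpoint URL below is a placeholder for any execution client exposing that API, and the sample response is invented for illustration:

```python
import json

NODE_URL = "http://localhost:8545"  # hypothetical local node endpoint

def block_number_request(request_id: int = 1) -> str:
    # Standard Ethereum JSON-RPC payload asking for the latest block number.
    payload = {
        "jsonrpc": "2.0",
        "method": "eth_blockNumber",
        "params": [],
        "id": request_id,
    }
    return json.dumps(payload)

def parse_block_number(response_body: str) -> int:
    # Nodes return quantities as 0x-prefixed hex strings.
    return int(json.loads(response_body)["result"], 16)

# A response a node might return to the request above:
sample = '{"jsonrpc":"2.0","id":1,"result":"0x10d4f"}'
print(parse_block_number(sample))  # 68943
```

Posting `block_number_request()` to the node URL over HTTP and feeding the body to `parse_block_number` is the minimal round trip every later stage builds on.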
Request Pipelines
Retrieving blocks, transactions, or events via interfaces like JSON-RPC is a common requirement for accessing on-chain data. Nodes provide historical and current blockchain state, but interacting with the node API requires a structured approach to manage and organize the information efficiently. A well-designed request pipeline is crucial for handling incoming requests and transforming raw data into organized tables.
Effective pipelines optimize data retrieval and ensure consistency by incorporating stages like request validation, data extraction, indexing, caching, and error handling. These steps minimize latency, avoid duplication, and enhance performance and reliability, ensuring data is accurate, timely, and ready for analysis.
Raw Tables (On-Chain Data)
Raw on-chain data is anything you can extract from a node through an RPC call. The most common types that follow standard schemas include blocks, transactions, accounts, raw traces, and logs.
They are used for network-level metrics and as a source for decoded data. Metrics obtained by transforming raw on-chain data include TPS (transactions per second), gas per transaction, top contracts called, distinct new accounts, daily unique transaction signers, mean block time, and transaction size.
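Two of those metrics, TPS and mean block time, fall straight out of raw block records. The sample blocks below are invented for the sketch:

```python
# Raw block records as they might land in a blocks table.
blocks = [
    {"number": 100, "timestamp": 1_700_000_000, "tx_count": 150},
    {"number": 101, "timestamp": 1_700_000_012, "tx_count": 180},
    {"number": 102, "timestamp": 1_700_000_024, "tx_count": 120},
]

span = blocks[-1]["timestamp"] - blocks[0]["timestamp"]   # 24 seconds covered
total_tx = sum(b["tx_count"] for b in blocks)             # 450 transactions
tps = total_tx / span                                     # transactions per second
block_meantime = span / (len(blocks) - 1)                 # seconds per block

print(tps, block_meantime)  # 18.75 12.0
```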
Decoding
Decoding translates raw event and trace data into a human-readable format, recovering function and parameter names. This requires access to either the contract ABI or the original Solidity source, from which the ABI can be generated.
Decoding is essential because smart contracts are stored on-chain as opcodes or low-level EVM instructions rather than in Solidity. Solidity, a high-level language, compiles into these opcodes executed by nodes. Since nodes only process opcodes, they lack knowledge of the original Solidity code, leaving them unaware of function names, parameter names, and the meaning of outputs.
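To make this concrete, here is a sketch of decoding a raw ERC-20 Transfer log by hand, without a full ABI library. The topic hash is the well-known keccak256 of the Transfer event signature; the addresses and amount in the sample log are invented:

```python
# keccak256("Transfer(address,address,uint256)") -- the standard ERC-20 topic0.
TRANSFER_TOPIC = (
    "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"
)

raw_log = {
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "00" * 12 + "aa" * 20,  # indexed `from`, left-padded to 32 bytes
        "0x" + "00" * 12 + "bb" * 20,  # indexed `to`
    ],
    "data": "0x" + hex(10**18)[2:].rjust(64, "0"),  # non-indexed `value`
}

def decode_transfer(log: dict) -> dict:
    assert log["topics"][0] == TRANSFER_TOPIC, "not a Transfer event"
    return {
        "from": "0x" + log["topics"][1][-40:],  # last 20 bytes of the topic
        "to": "0x" + log["topics"][2][-40:],
        "value": int(log["data"], 16),          # uint256 amount in base units
    }

decoded = decode_transfer(raw_log)
print(decoded["value"])  # 1000000000000000000
```

Without the ABI you would only see the opaque topics and data hex; the ABI (or here, a hard-coded knowledge of the event's shape) is what names the fields.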
Decoded Data
Decoded data includes logs, traces, transfers, and view-function results translated into human-readable form. Decoding is applied in most data flows, even for less common data types.
Decoded data is the main source of protocol-level metrics and transformed data. Metrics derived from this transformed data include daily liquidations, TVL in protocols, bridge inflows and outflows, protocol revenue, volume and open interest, and ETH2 contract deposits.
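As a small illustration of turning decoded events into a protocol-level metric, the sketch below rolls decoded swap events up into daily volume. The event shapes and amounts are invented:

```python
from collections import defaultdict
from datetime import datetime, timezone

# Decoded swap events as they might appear after ABI decoding.
decoded_swaps = [
    {"timestamp": 1_700_000_000, "amount_usd": 1_500.0},
    {"timestamp": 1_700_003_600, "amount_usd": 2_500.0},
    {"timestamp": 1_700_090_000, "amount_usd": 4_000.0},
]

daily_volume: dict[str, float] = defaultdict(float)
for swap in decoded_swaps:
    # Bucket each event by its UTC calendar day.
    day = datetime.fromtimestamp(swap["timestamp"], tz=timezone.utc).date().isoformat()
    daily_volume[day] += swap["amount_usd"]

print(dict(daily_volume))
```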
Transformed Data
Transformed data comprises business-level metrics or aggregates derived from all available sources, such as raw and decoded data. After gathering this information, it is manipulated and combined to create meaningful metrics. The result is often stored for future use to prevent recomputation: transforming data once and reusing it as needed enhances efficiency and performance.
Examples include Aave TVL, Uniswap SWAPs, and DEX.trades, which feed the analytics layer for analytics UI, dashboards, and charts. They are also a source of aggregations and other transformed data.
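The compute-once, reuse-afterwards pattern can be sketched with an in-memory dict standing in for a warehouse table; the metric name and trade amounts are illustrative:

```python
# In-memory stand-in for a stored table of transformed metrics.
transformed_store: dict[str, float] = {}

def daily_dex_volume(day: str, trades: list[float]) -> float:
    key = f"dex_volume:{day}"
    if key not in transformed_store:        # compute the transformation once...
        transformed_store[key] = sum(trades)
    return transformed_store[key]           # ...and reuse the stored result

vol = daily_dex_volume("2024-05-01", [120.0, 80.0, 300.0])
print(vol)  # 500.0
```

A second call for the same day returns the stored value without recomputation, which is the whole point of materializing transformed data.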
Aggregations
Aggregation provides data that answers questions such as “How many users have subscribed to crypto social platforms?” “What is the TVL in DeFi?” and “How much volume do DEXes generate?” Answering them involves aggregating metrics across numerous platforms in a network.
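In its simplest form, this is a roll-up of an already-transformed protocol metric across platforms. The TVL figures below are invented for the sketch:

```python
# Protocol-level TVL (already transformed data), keyed by platform.
protocol_tvl_usd = {"Aave": 5.2e9, "Uniswap": 3.1e9, "Maker": 4.7e9}

# Network-level answer to "What is the TVL in DeFi?" for these protocols.
defi_tvl = sum(protocol_tvl_usd.values())
print(f"Total DeFi TVL: ${defi_tvl:,.0f}")
```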
Off-Chain Data
Off-chain data is any information that does not come from a blockchain node; it is used to enrich blockchain metrics. External data often comes from providers like centralized exchanges (CEXes), which offer token price information in dollar terms at specific times; other data may come from centralized or semi-decentralized databases. Ingesting this off-chain data directly into your system can streamline the extract-transform-load process.
Examples include token prices from CEXes, NFT collection metadata, and Maximal Extractable Value (MEV) data.
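A typical enrichment step joins on-chain token amounts with off-chain CEX prices to express them in dollars. The symbols, amounts, and prices below are invented for illustration:

```python
# Off-chain price feed (e.g. from a CEX API), in USD per token.
offchain_prices_usd = {"ETH": 3_000.0, "UNI": 8.0}

# Decoded on-chain transfers: (token symbol, token amount).
onchain_transfers = [("ETH", 2.0), ("UNI", 150.0)]

# Join the two sources to value each transfer in dollars.
usd_values = [amount * offchain_prices_usd[sym] for sym, amount in onchain_transfers]
print(usd_values)  # [6000.0, 1200.0]
```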
Prices (Off-Chain Data)
Crypto prices have no definitive source of truth: every exchange and blockchain quotes a different price for the same trading pair at a given time. Price tables therefore aim to aggregate quotes from various CEXes and DEXes into a single representative value, typically a time-weighted average price (TWAP). This process involves keying in known prices, calculating volumes, eliminating outliers, and filtering out less representative and lagged markets.
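A minimal sketch of that process, with a crude outlier filter (quotes far from the median are dropped) followed by time-weighting; all quotes are invented for illustration:

```python
from statistics import median

# (timestamp in seconds, price) quotes from several venues.
quotes = [
    (0, 100.0), (60, 101.0), (120, 250.0), (180, 99.0), (240, 100.5),
]

# Drop quotes deviating more than 20% from the median (crude outlier filter).
mid = median(p for _, p in quotes)
kept = [(t, p) for t, p in quotes if abs(p - mid) / mid < 0.2]

# Weight each kept price by the time until the next kept quote (TWAP).
weighted, total_time = 0.0, 0.0
for (t0, p), (t1, _) in zip(kept, kept[1:]):
    weighted += p * (t1 - t0)
    total_time += t1 - t0
twap = weighted / total_time

print(round(twap, 2))  # 100.25 -- the 250.0 outlier never skews the average
```

Real price tables also weight by traded volume and down-rank stale venues; this sketch shows only the outlier-filter and time-weighting steps.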
Other On-Chain Data
These are less-standardized but use-case-specific pre-indexed data from blockchains, often requiring additional steps to extract and standardize. Examples include mempool data for block building or high-frequency trading, Beacon Chain deposits and withdrawals for staking providers, P2P communications, and blob data for L2 sequencers.
View Functions Calls
Much of the data categorized as “Other On-Chain Data” in EVM-compatible blockchains is accessed through the outputs of “view functions.” View functions are Solidity functions that read and transform existing data without modifying the blockchain state.
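View functions are read through `eth_call`. The sketch below builds such a call for the ERC-20 `balanceOf` view function; `0x70a08231` is the well-known `balanceOf(address)` selector, while the contract and holder addresses are placeholders and the response is invented:

```python
import json

TOKEN = "0x" + "11" * 20   # hypothetical token contract address
HOLDER = "0x" + "22" * 20  # hypothetical account address

def balance_of_call() -> str:
    # calldata = 4-byte selector + address argument left-padded to 32 bytes.
    calldata = "0x70a08231" + HOLDER[2:].rjust(64, "0")
    payload = {
        "jsonrpc": "2.0",
        "method": "eth_call",
        "params": [{"to": TOKEN, "data": calldata}, "latest"],
        "id": 1,
    }
    return json.dumps(payload)

def parse_balance(response_body: str) -> int:
    # The node returns the uint256 result as a 0x-prefixed hex string.
    return int(json.loads(response_body)["result"], 16)

sample = '{"jsonrpc":"2.0","id":1,"result":"0x0de0b6b3a7640000"}'
print(parse_balance(sample))  # 1000000000000000000, i.e. 1 token at 18 decimals
```

Because `eth_call` executes the function against current state without creating a transaction, it reads data without modifying the chain, which is exactly the view-function contract described above.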
Conclusion
This article has explored the key steps in the data journey, from retrieving raw on-chain data to transforming it into actionable metrics using client nodes, request pipelines, and transformation engines. We’ve highlighted the importance of decoding data, managing view function calls, and integrating off-chain data like prices, emphasizing the complexity of each process.
Efficient management of crypto data flows enables the creation of valuable insights and metrics that drive better decisions and innovation. As the blockchain landscape evolves, mastering these data flows will be crucial for staying ahead and fully leveraging blockchain’s potential. Watch out for article two of the series.