Parallel Indexing Of Blockchain Data With Substreams

Goal:

Develop substreams that allow us to mine and transform contextual blockchain data in a way that is highly modular, parameterized, and parallel.

Mapping Process:

Step 1 - Define Context and Schema:

The context and schema define the scope of data that matters to you.
Context vs. schema:
- For Messari subgraphs, the context is a single protocol.
  - Uniswap V2, Sushiswap, Curve Finance, and Balancer V2 are different contexts, but they share the same DEX (Decentralized Exchange) schema.
- The schema determines the data extracted from the context.
  - The schema standardizes data extracted across different contexts.
- The context and schema can be composable and interconnected in substreams.
  - Data within the same or different contextual layers may or may not have the same schema.
  - Lower-level contexts may build up the context(s) in the layers above.
    - Because contexts sometimes interact with one another, this is not always an additive procedure (more on this later).
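The split between context and schema can be sketched in a few lines. This is a minimal illustration, not a real schema: the `Context` and `Swap` types, their field names, and the `contexts_for_schema` helper are all hypothetical and only show how several contexts can be standardized to one shared schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Context:
    """A scope of data, e.g. a single protocol (hypothetical type)."""
    name: str
    schema: str  # name of the schema this context is standardized to


@dataclass(frozen=True)
class Swap:
    """One entity of a simplified, made-up DEX schema."""
    context: str
    pool: str
    token_in: str
    token_out: str
    amount_in: int
    amount_out: int


# Different contexts, same schema.
CONTEXTS = [
    Context("Uniswap V2", "DEX"),
    Context("Sushiswap", "DEX"),
    Context("Curve Finance", "DEX"),
    Context("Balancer V2", "DEX"),
]


def contexts_for_schema(schema: str) -> List[str]:
    """Return the names of all contexts that share the given schema."""
    return [c.name for c in CONTEXTS if c.schema == schema]
```

Because every context emits the same `Swap` shape, downstream consumers can treat data from any of these protocols uniformly.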


Step 2 - Identify Data Sources:

Identify the contracts under your context(s), along with their event logs and call data, that you need to fulfill your schema.
Data Source Types:
- Spawner Data Source:
  - These are contracts that are used to instantiate other contracts within a context.
  - Examples: Factory, Comptroller contracts.
- Bounded Under Context Data Source:
  - Data sources that exist in a bounded quantity under some context.
  - Most often deployed directly by an EOA (Externally Owned Account).
    - The addresses of these data sources need to be identified before indexing.
  - Examples: Factory, Controller, MasterChef contracts.
- Unbounded Under Context Data Source:
  - Data sources that can have any number of instantiations of the same contract under some context.
    - Usually created by other contracts.
      - These data sources can often be found by parsing event logs or call data of spawner contracts.
      - Sometimes, however, they are directly deployed by EOAs.
        - These can be difficult to track as ad hoc data sources.
        - Tracking them requires staying up to date with the development of the protocol.
  - Examples: ERC20, UniswapV2Pair, Gauge contracts.
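As a rough sketch of how unbounded data sources can be discovered from a spawner, the snippet below scans decoded event logs in order and starts tracking a pair address once the factory announces it. The `Log` type, the placeholder addresses, and the `relevant_logs` helper are assumptions made for illustration. The `PairCreated` event and its `pair` argument follow the Uniswap V2 factory, but the decoding step itself is assumed to have happened elsewhere.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class Log:
    """A decoded event log (hypothetical, simplified shape)."""
    address: str          # contract that emitted the event
    event: str            # decoded event name
    args: Dict[str, str]  # decoded event arguments


def relevant_logs(logs: Iterable[Log], bounded: Set[str]) -> List[Log]:
    """Keep logs from bounded sources and from the sources they spawn.

    `bounded` holds addresses known before indexing (e.g. the factory).
    Unbounded sources are added as soon as a spawner announces them.
    """
    tracked = set(bounded)
    kept = []
    for log in logs:
        if log.address not in tracked:
            continue  # emitted by a contract outside the context
        kept.append(log)
        if log.address in bounded and log.event == "PairCreated":
            tracked.add(log.args["pair"])  # newly spawned data source
    return kept


# Placeholder addresses, for illustration only.
FACTORY = "0xFactory"
SAMPLE_LOGS = [
    Log(FACTORY, "PairCreated", {"pair": "0xPairA"}),
    Log("0xPairA", "Swap", {}),
    Log("0xUnrelated", "Swap", {}),
]
```

Calling `relevant_logs(SAMPLE_LOGS, {FACTORY})` keeps the factory log and the `0xPairA` swap and drops the unrelated one. Data sources deployed directly by EOAs would not be found this way and would need to be added to `bounded` by hand.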


Step 3 - Identify Interactions:

What is an interaction?
- An interaction is composed of transaction data and a connected *tree* of event logs and call data, rooted at a call that enters a context from outside of that context.
  - Since there may be layered contexts, there may be layered interactions.
    - The same pieces of call data or event logs can be contained in multiple interactions.
- Examples:
  - A direct call from the user to swapTokensForExactTokens() on the UniswapV2Router contract.
  - An internal transaction call from the 1InchAggregationRouter to swapTokensForExactTokens() on the UniswapV2Router.
    - Interaction with 2 contexts.
      - +1 1Inch.
      - +1 Uniswap V2.
Why should we break down transactions into interactions?
- To understand the story of a transaction.
  - Story: a larger tree that contains all interactions under all relevant contexts.
Sources of multiple interactions:
- MultiCall.
- Aggregators.
  - Protocols that route through or utilize other protocols.
  - A user executing a swap on 1Inch using the 1InchAggregationRouter might execute a swap on Uniswap V2 that swaps on 2 UniswapV2Pair contracts (liquidity pools).
    - This would result in 1 interaction with 1Inch, and 1 or 2 interactions with Uniswap V2.
      - Note:
        - 1 Uniswap V2 interaction if the 1InchAggregationRouter calls the UniswapV2Router, which then calls the UniswapV2Pair contracts.
        - 2 Uniswap V2 interactions if the 1InchAggregationRouter calls the UniswapV2Pair contracts directly.
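The 1Inch example can be worked through with a small call-tree walk. This is a sketch under simplifying assumptions: each contract belongs to exactly one context (so layered contexts are not modeled), the `Call` type and the `CONTEXT_OF` mapping are hypothetical, and the 1Inch method name is a placeholder. An interaction is counted whenever a call enters a context from a caller outside that context.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Call:
    """One node of a transaction's call tree (hypothetical shape)."""
    to: str
    method: str
    children: List["Call"] = field(default_factory=list)


# Assumption: one context per contract; labels are illustrative.
CONTEXT_OF = {
    "1InchAggregationRouter": "1Inch",
    "UniswapV2Router": "Uniswap V2",
    "UniswapV2Pair:A": "Uniswap V2",
    "UniswapV2Pair:B": "Uniswap V2",
}


def count_interactions(root: Call) -> Dict[str, int]:
    """Count calls that cross into a context from outside of it."""
    counts: Counter = Counter()

    def visit(call: Call, caller_ctx: Optional[str]) -> None:
        ctx = CONTEXT_OF.get(call.to)
        if ctx is not None and ctx != caller_ctx:
            counts[ctx] += 1  # outside-of-context call into ctx
        for child in call.children:
            visit(child, ctx)

    visit(root, None)  # the top-level caller is an EOA, outside any context
    return dict(counts)


pair_a = Call("UniswapV2Pair:A", "swap")
pair_b = Call("UniswapV2Pair:B", "swap")

# 1Inch -> UniswapV2Router -> two pairs
VIA_ROUTER = Call("1InchAggregationRouter", "swap", [
    Call("UniswapV2Router", "swapTokensForExactTokens", [pair_a, pair_b]),
])

# 1Inch -> two pairs directly
DIRECT = Call("1InchAggregationRouter", "swap", [pair_a, pair_b])
```

With these trees, `count_interactions(VIA_ROUTER)` yields one interaction each for 1Inch and Uniswap V2, while `count_interactions(DIRECT)` yields one for 1Inch and two for Uniswap V2, matching the note above.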
Piecing together the story:
1. Using event logs.
  - Benefit:
    - Faster.
    - Less irrelevant data.
  - Detriment:
    - May be very difficult, hacky, or impossible to completely piece together the story since it lacks contextual data about how contracts were called.
2. Using call data.
  - Benefit:
    - Easy to build the story since the call history gives a full and connected contextual representation of the transaction.
    - Easy access to the call data.
  - Detriment:
    - Slower.
    - More work to understand the relationship between calls and their event logs and call data.
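The trade-off between the two approaches can be illustrated with a toy model. The sketch below assumes flat call records that carry an `index`, a `parent_index`, and the logs emitted during that call. These field names and types are illustrative assumptions, not a specific API. It shows two differently routed transactions whose flattened logs look identical in this model, while the tree rebuilt from call data tells them apart. Real event arguments may carry extra hints that this toy model omits.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CallRecord:
    """A flat call entry with its logs (hypothetical shape)."""
    index: int
    parent_index: Optional[int]  # None for the top-level call
    to: str
    logs: List[str] = field(default_factory=list)


@dataclass
class Node:
    """A call in the reconstructed story tree."""
    to: str
    logs: List[str]
    children: List["Node"] = field(default_factory=list)


def build_story(records: List[CallRecord]) -> Node:
    """Rebuild the call tree from flat records using parent indices."""
    nodes = {r.index: Node(r.to, list(r.logs)) for r in records}
    root = None
    for r in sorted(records, key=lambda rec: rec.index):
        if r.parent_index is None:
            root = nodes[r.index]
        else:
            nodes[r.parent_index].children.append(nodes[r.index])
    if root is None:
        raise ValueError("no top-level call found")
    return root


def flat_logs(records: List[CallRecord]) -> List[str]:
    """The event-log-only view: logs in call order, without call structure."""
    return [log for r in sorted(records, key=lambda rec: rec.index) for log in r.logs]


VIA_ROUTER = [
    CallRecord(1, None, "1InchAggregationRouter"),
    CallRecord(2, 1, "UniswapV2Router"),
    CallRecord(3, 2, "UniswapV2Pair:A", ["UniswapV2Pair:A.Swap"]),
    CallRecord(4, 2, "UniswapV2Pair:B", ["UniswapV2Pair:B.Swap"]),
]
DIRECT = [
    CallRecord(1, None, "1InchAggregationRouter"),
    CallRecord(2, 1, "UniswapV2Pair:A", ["UniswapV2Pair:A.Swap"]),
    CallRecord(3, 1, "UniswapV2Pair:B", ["UniswapV2Pair:B.Swap"]),
]
```

Here `flat_logs(VIA_ROUTER)` and `flat_logs(DIRECT)` are equal, whereas `build_story` places the pairs under the UniswapV2Router in one case and directly under the 1InchAggregationRouter in the other.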
