Batch Processing

Batch processing is a core performance optimization technique in Squid SDK that dramatically improves indexing speed by minimizing database operations and maximizing data throughput.

Core Principles

The batch processing model relies on these key principles:
  • Minimize database hits by grouping multiple single-row transactions into multi-row batch transactions
  • Transform data in memory using vectorized operators for maximum efficiency
  • Batch EVM state queries using the MakerDAO Multicall contract (see the sketch below)
Batch processing is more flexible than handler parallelization and delivers better performance, since blockchain indexing is inherently sequential.
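
As a hedged sketch of the third principle, suppose the squid needs current token balances for every holder seen in a batch. Assuming ABI bindings and a Multicall facade generated by the squid EVM typegen (the ./abi paths, the collectHolders helper, and the contract addresses are illustrative, and exact call shapes can vary between typegen versions), the per-holder eth_call requests can be folded into a handful of calls to the Multicall contract:
import { TypeormDatabase } from "@subsquid/typeorm-store";
import { Multicall } from "./abi/multicall"; // generated with the --multicall flag (assumed path)
import { functions } from "./abi/erc20"; // generated ERC-20 bindings (assumed path)

const MULTICALL_ADDRESS = "0x..."; // Multicall deployment on your network
const TOKEN_ADDRESS = "0x..."; // token contract being indexed

processor.run(new TypeormDatabase(), async (ctx) => {
  // Hypothetical helper that extracts all holder addresses seen in this batch
  const holders: string[] = collectHolders(ctx);

  // Query state at the last block of the batch, batching up to 100 calls per request
  const lastBlock = ctx.blocks[ctx.blocks.length - 1].header;
  const multicall = new Multicall(ctx, lastBlock, MULTICALL_ADDRESS);
  const balances = await multicall.aggregate(
    functions.balanceOf,
    TOKEN_ADDRESS,
    holders.map((h) => [h]),
    100
  );

  // ...update entities in memory and persist them with a single ctx.store.save()...
});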

How Batch Processing Works

To illustrate batch processing, consider a processor that tracks the current state of on-chain records. It listens to Create and Update events and receives a batch of events:
[
  Create({ id: 1, name: "Alice" }),
  Update({ id: 1, name: "Bob" }),
  Create({ id: 2, name: "Mikee" }),
  Update({ id: 1, name: "Carol" }),
  Update({ id: 2, name: "Mike" }),
];
Following batch processing principles, the processor updates entity states in memory and persists only the final state:
[
  { id: 1, name: "Carol" },
  { id: 2, name: "Mike" },
];
This results in a single database transaction instead of multiple individual updates.
Each data batch spans many blocks, so a single in-memory pass can coalesce updates across all of them before anything is written to the database.
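
As a minimal illustration (the event type and function names are hypothetical), the coalescing step can be written as a single in-memory pass over the decoded batch:
// Hypothetical decoded shape of the Create/Update events above
type RecordEvent = { kind: "Create" | "Update"; id: number; name: string };

// Reduce the whole batch in memory; only the final state per id survives
function coalesce(events: RecordEvent[]): Map<number, { id: number; name: string }> {
  const state = new Map<number, { id: number; name: string }>();
  for (const e of events) {
    state.set(e.id, { id: e.id, name: e.name });
  }
  return state;
}

// For the batch above this yields:
// Map { 1 => { id: 1, name: "Carol" }, 2 => { id: 2, name: "Mike" } }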

Implementation Patterns

Here’s the idiomatic pattern for implementing batch processing with processor.run():
import { In } from "typeorm";
import { TypeormDatabase } from "@subsquid/typeorm-store";
import { MyEntity } from "./model"; // generated entity class (assumed path)

processor.run(new TypeormDatabase(), async (ctx) => {
  // Step 1: Collect and normalize all data from the batch
  const logDataBatch: MyData[] = [];

  for (const block of ctx.blocks) {
    for (const log of block.logs) {
      // Transform and normalize raw log data
      logDataBatch.push(decodeAndTransformToMyData(log));
    }
  }

  // Step 2: Extract all entity IDs that need to be updated/created
  const myEntityIds = new Set<string>();
  for (const d of logDataBatch) {
    // Business logic mapping on-chain data to entity IDs
    myEntityIds.add(extractEntityId(d));
  }

  // Step 3: Batch-load existing entities using IN operator
  const myEntities: Map<string, MyEntity> = new Map(
    (await ctx.store.findBy(MyEntity, { id: In([...myEntityIds]) })).map(
      (entity) => [entity.id, entity]
    )
  );

  // Step 4: Calculate updated state for all entities
  for (const d of logDataBatch) {
    const myEntity = myEntities.get(extractEntityId(d));
    if (myEntity == null) {
      // Create new entity instance
      myEntities.set(extractEntityId(d), new MyEntity(d));
    } else {
      // Update existing entity
      updateEntity(myEntity, d);
    }
  }

  // Step 5: Batch-save all entities in a single transaction
  await ctx.store.save([...myEntities.values()]);
});
For a complete implementation of this pattern, see the EVM squid example.
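
The helpers used in the pattern (decodeAndTransformToMyData, extractEntityId, updateEntity) are placeholders for your own business logic. As a hedged sketch, for an ERC-20 Transfer log they might look like this, assuming ABI bindings generated by the squid EVM typegen under ./abi/erc20 and one entity per sender address:
import { events } from "./abi/erc20"; // generated by the squid EVM typegen (assumed path)

// Hypothetical normalized shape produced by decodeAndTransformToMyData()
interface TransferData {
  from: string;
  to: string;
  value: bigint;
}

function decodeAndTransformToMyData(log: { topics: string[]; data: string }): TransferData {
  const { from, to, value } = events.Transfer.decode(log);
  return { from, to, value };
}

// Hypothetical counterpart of extractEntityId(): one entity per sender address
function extractEntityId(d: TransferData): string {
  return d.from;
}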

Anti-patterns to Avoid

Avoid loading or persisting entities one at a time unless absolutely necessary; per-row database operations dramatically reduce indexing performance.

❌ Anti-pattern: Individual Entity Saves

Here’s what not to do, taken from the Gravatar example:
// ❌ ANTI-PATTERN: Individual saves per event
processor.run(new TypeormDatabase(), async (ctx) => {
  for (const block of ctx.blocks) {
    for (const log of block.logs) {
      if (
        log.address !== GRAVATAR_CONTRACT ||
        (log.topics[0] !== events.NewGravatar.topic &&
          log.topics[0] !== events.UpdatedGravatar.topic)
      )
        continue;

      const { id, owner, displayName, imageUrl } = extractData(log);

      // ❌ This creates a database transaction for each event!
      await ctx.store.save(
        new Gravatar({
          id: id.toHexString(),
          owner: decodeHex(owner),
          displayName,
          imageUrl,
        })
      );
    }
  }
});

✅ Correct Pattern: Batch Processing

Instead, use in-memory caching and batch operations:
// ✅ CORRECT: Batch processing with in-memory cache
processor.run(new TypeormDatabase(), async (ctx) => {
  const gravatars: Map<string, Gravatar> = new Map();

  for (const block of ctx.blocks) {
    for (const log of block.logs) {
      if (
        log.address !== GRAVATAR_CONTRACT ||
        (log.topics[0] !== events.NewGravatar.topic &&
          log.topics[0] !== events.UpdatedGravatar.topic)
      )
        continue;

      const { id, owner, displayName, imageUrl } = extractData(log);

      // Store in memory for batch processing
      gravatars.set(
        id.toHexString(),
        new Gravatar({
          id: id.toHexString(),
          owner: decodeHex(owner),
          displayName,
          imageUrl,
        })
      );
    }
  }

  // Single database transaction for all entities
  await ctx.store.save([...gravatars.values()]);
});

Migration from Handler-Based Processing

The batch handler can act as a drop-in replacement for handler-based mappings (like those used in subgraphs). While handler-based processing is significantly slower due to excessive database operations, it can be a useful intermediate step when migrating from subgraphs to Squid SDK.

Transitional Approach

You can reuse existing handlers while iterating over batch data:
processor.run(new TypeormDatabase(), async (ctx) => {
  for (const block of ctx.blocks) {
    // Process logs
    for (const log of block.logs) {
      switch (log.topics[0]) {
        case abi.events.FooEvent.topic:
          await handleFooEvent(ctx, log);
          continue;
        case abi.events.BarEvent.topic:
          await handleBarEvent(ctx, log);
          continue;
        default:
          continue;
      }
    }

    // Process transactions
    for (const txn of block.transactions) {
      const sighash = txn.input.slice(0, 10); // 0x + 4 bytes
      switch (sighash) {
        case "0xa9059cbb": // transfer(address,uint256) sighash
          await handleTransferTx(ctx, txn);
          continue;
        case abi.functions.approve.sighash:
          await handleApproveTx(ctx, txn);
          continue;
        default:
          continue;
      }
    }
  }
});
This approach allows for gradual migration while maintaining existing logic. However, for optimal performance, consider refactoring to true batch processing patterns.
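
The handlers invoked above are assumed to keep their original per-event signatures. As a hypothetical sketch (the entity and event names are illustrative), such a reused handler still performs one database write per event, which is why this style should remain transitional:
import { DataHandlerContext, Log } from "@subsquid/evm-processor";
import { Store } from "@subsquid/typeorm-store";

// Hypothetical handler carried over from a subgraph-style codebase
async function handleFooEvent(ctx: DataHandlerContext<Store>, log: Log) {
  const { foo } = abi.events.FooEvent.decode(log);

  // Still one database round trip per event - acceptable during migration only
  await ctx.store.save(new FooEntity({ id: log.id, foo }));
}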

Block Hooks

You can implement pre- and post-block hooks for additional processing logic:
processor.run(new TypeormDatabase(), async (ctx) => {
  for (const block of ctx.blocks) {
    await preBlockHook(ctx, block);

    for (const log of block.logs) {
      // Process logs
    }

    for (const txn of block.transactions) {
      // Process transactions
    }

    await postBlockHook(ctx, block);
  }
});
Block hooks are useful for implementing cross-block state management, caching, or cleanup operations that need to run before or after processing each block.
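
As a minimal sketch (the BlockStat entity is hypothetical and would need to be defined in the schema), a post-block step can accumulate per-block statistics in memory and persist them once per batch, staying within the batch-processing principles above:
processor.run(new TypeormDatabase(), async (ctx) => {
  const blockStats: BlockStat[] = [];

  for (const block of ctx.blocks) {
    // ...log and transaction processing as above...

    // Post-block step: record per-block statistics in memory
    blockStats.push(
      new BlockStat({
        id: block.header.id,
        height: block.header.height,
        logCount: block.logs.length,
      })
    );
  }

  // Persist once for the whole batch
  await ctx.store.insert(blockStats);
});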