Batch Processing

Batch processing is a core performance optimization technique in Squid SDK that dramatically improves indexing speed by minimizing database operations and maximizing data throughput.

Core Principles

The batch processing model relies on these key principles:
  • Minimize database hits by grouping multiple single-row transactions into multi-row batch transactions
  • Transform data in memory using vectorized operators for maximum efficiency
  • Batch Solana state queries using efficient account data retrieval patterns (see the sketch below)

Batch processing is more flexible than handler parallelization and better suits the inherently sequential nature of blockchain data, yielding higher indexing throughput.
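To see the third principle in action, here is a minimal sketch of batched account reads using @solana/web3.js; the RPC endpoint is a placeholder. A single getMultipleAccountsInfo() call fetches up to 100 accounts in one round trip, replacing N individual getAccountInfo() calls:
import { Connection, PublicKey } from '@solana/web3.js';

async function loadAccountsBatch(accountIds: PublicKey[]) {
  const connection = new Connection('https://api.mainnet-beta.solana.com');
  // One RPC round trip for up to 100 accounts, instead of one call per account
  return connection.getMultipleAccountsInfo(accountIds);
}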

How Batch Processing Works

To illustrate batch processing, consider a processor that tracks the current state of on-chain records. It listens to Create and Update events and receives a batch of events:
[
  Create({ id: 1, name: "Alice" }),
  Update({ id: 1, name: "Bob" }),
  Create({ id: 2, name: "Mikee" }),
  Update({ id: 1, name: "Carol" }),
  Update({ id: 2, name: "Mike" }),
];
Following batch processing principles, the processor updates entity states in memory and persists only the final state:
[
  { id: 1, name: "Carol" },
  { id: 2, name: "Mike" },
];
This results in a single database transaction instead of multiple individual updates.
The data batch contains all events from multiple blocks, allowing for efficient batch processing.
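In code, this reduction is a simple in-memory fold over the batch; the event and record shapes below are illustrative:
type RecordEvent = { kind: 'Create' | 'Update'; id: number; name: string };

function reduceToFinalState(events: RecordEvent[]): Map<number, { id: number; name: string }> {
  const finalState = new Map<number, { id: number; name: string }>();
  for (const e of events) {
    // A later Create or Update overwrites the earlier state for the same id,
    // so only the final state of each record gets persisted
    finalState.set(e.id, { id: e.id, name: e.name });
  }
  return finalState;
}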

Implementation Patterns

Here’s the idiomatic pattern for implementing batch processing with processor.run():
import { In } from 'typeorm';
import { TypeormDatabase } from '@subsquid/typeorm-store';

processor.run(new TypeormDatabase(), async (ctx) => {
  // Step 1: Collect and normalize all data from the batch
  const instructionDataBatch = [];

  for (const block of ctx.blocks) {
    for (const instruction of block.instructions) {
      // Transform and normalize raw instruction data
      instructionDataBatch.push(decodeAndTransformToMyData(instruction));
    }
  }

  // Step 2: Extract all entity IDs that need to be updated/created
  const myEntityIds = new Set<string>();
  for (const d of instructionDataBatch) {
    // Business logic mapping on-chain data to entity IDs
    myEntityIds.add(extractEntityId(d));
  }

  // Step 3: Batch-load existing entities using IN operator
  const myEntities: Map<string, MyEntity> = new Map(
    (await ctx.store.findBy(MyEntity, { id: In([...myEntityIds]) })).map(
      (entity) => [entity.id, entity]
    )
  );

  // Step 4: Calculate updated state for all entities
  for (const d of instructionDataBatch) {
    const myEntity = myEntities.get(extractEntityId(d));
    if (myEntity == null) {
      // Create new entity instance
      myEntities.set(extractEntityId(d), new MyEntity(d));
    } else {
      // Update existing entity
      updateEntity(myEntity, d);
    }
  }

  // Step 5: Batch-save all entities in a single transaction
  await ctx.store.save([...myEntities.values()]);
});
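The decodeAndTransformToMyData(), extractEntityId() and updateEntity() helpers above are placeholders for your own business logic. A hypothetical shape, for illustration only:
interface MyData {
  entityId: string;
  value: bigint;
}

// Hypothetical decoder: normalize a raw instruction into flat data
function decodeAndTransformToMyData(instruction: { accounts: string[]; data: Uint8Array }): MyData {
  // e.g. the first account identifies the entity; the value would be decoded
  // from instruction.data according to your program's layout
  return { entityId: instruction.accounts[0], value: 0n /* decode from instruction.data */ };
}

function extractEntityId(d: MyData): string {
  return d.entityId;
}

function updateEntity(entity: MyEntity, d: MyData): void {
  entity.value = d.value;
}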

Anti-patterns to Avoid

Avoid loading or persisting entities one at a time unless absolutely necessary: per-entity database operations dramatically reduce indexing performance.

❌ Anti-pattern: Individual Entity Saves

Here’s an example of what not to do, issuing one save per data item:
// ❌ ANTI-PATTERN: Individual saves per data item
processor.run(new TypeormDatabase(), async (ctx) => {
  for (const block of ctx.blocks) {
    for (const instruction of block.instructions) {
      // Process instruction
      const data = extractData(instruction);
      
      // ❌ This creates a database transaction for each instruction!
      await ctx.store.save(new MyEntity(data));
    }
  }
});

✅ Correct Pattern: Batch Processing

Instead, use in-memory caching and batch operations:
// ✅ CORRECT: Batch processing with in-memory cache
processor.run(new TypeormDatabase(), async (ctx) => {
  const entities: Map<string, MyEntity> = new Map();

  for (const block of ctx.blocks) {
    for (const instruction of block.instructions) {
      const data = extractData(instruction);
      // Store in memory for batch processing
      entities.set(data.id, new MyEntity(data));
    }
  }

  // Single database transaction for all entities
  await ctx.store.save([...entities.values()]);
});
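Note that this simplified example always constructs fresh entity instances. If an entity may already exist in the database, combine this pattern with the batch-load step (Step 3) from the full pattern above, so that in-memory updates start from the persisted state.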

Migration from Handler-Based Processing

Batch processing can serve as a drop-in replacement for handler-based mappings (like those used in subgraphs). While handler-based processing is significantly slower due to excessive database operations, it can be a useful intermediate step when migrating from subgraphs to Squid SDK.

Transitional Approach

You can reuse existing handlers while iterating over batch data:
processor.run(new TypeormDatabase(), async (ctx) => {
  for (const block of ctx.blocks) {
    // Process instructions
    for (const instruction of block.instructions) {
      if (instruction.programId === PROGRAM_ID) {
        await handleInstruction(ctx, instruction);
      }
    }
  }
});
This approach allows for gradual migration while maintaining existing logic. However, for optimal performance, consider refactoring to true batch processing patterns.
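For reference, a reused handler might look like the sketch below (types are loosened for illustration); the per-instruction store round trips inside it are exactly what makes this approach slow:
// Hypothetical handler carried over from a subgraph-style codebase
async function handleInstruction(ctx: any, instruction: any): Promise<void> {
  const data = extractData(instruction);
  // One read and one write per instruction - the overhead that batching removes
  let entity = await ctx.store.get(MyEntity, data.id);
  if (entity == null) {
    entity = new MyEntity(data);
  } else {
    updateEntity(entity, data);
  }
  await ctx.store.save(entity);
}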

Block Hooks

You can implement pre- and post-block hooks for additional processing logic:
processor.run(new TypeormDatabase(), async (ctx) => {
  for (const block of ctx.blocks) {
    await preBlockHook(ctx, block);

    for (const instruction of block.instructions) {
      // Process instructions
    }

    await postBlockHook(ctx, block);
  }
});
Block hooks are useful for implementing cross-block state management, caching, or cleanup operations that need to run before or after processing each block.
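For example, a hypothetical post-block hook could persist per-block statistics. BlockStats is an illustrative entity, and the header field names are assumptions that may differ by SDK version:
// Hypothetical post-block hook persisting simple per-block stats
async function postBlockHook(ctx: any, block: any): Promise<void> {
  await ctx.store.insert(
    new BlockStats({
      id: block.header.hash, // assumed header field
      instructionCount: block.instructions.length,
    })
  );
}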