Save indexed data to local CSV files
Objective
This tutorial describes how to use the SQD indexing framework to save processed blockchain data to local CSV files. The intent is to show how the Squid SDK can be used for data analytics prototyping. File-based data formats like CSV are convenient for data analysis, especially in the early prototyping stages. This convenience motivated the SQD team to develop extensions that allow saving processed data to file-based storage. We chose MATIC transactions on Ethereum mainnet for this example: the selection provides enough data to highlight the performance of the SQD framework and is interesting from a data analysis standpoint. An article about this demo project has been published on Medium. The project source code can be found in the local-csv-indexing repository.
Pre-requisites
- Squid CLI
- (optional) Python
Setup
Let’s start by creating a new blockchain indexer, or “squid” in SQD terminology. In a terminal, launch this command. Here, local-csv-indexing is the name of the project and can be changed to anything else. The -t evm option specifies that the evm template should be used as a starting point.
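Given the project name and template flag described above, the invocation looks like this (sqd is the Squid CLI from the prerequisites):

```shell
sqd init local-csv-indexing -t evm
cd local-csv-indexing
npm ci
```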
** The template actually has more than what we need for this project. Unnecessary packages have been removed in the tutorial repository; you can grab package.json from there to do the same. File-wise, docker-compose.yml, schema.graphql and squid.yaml were removed, and commands.json, the list of local sqd scripts, has been significantly shortened (here is the updated version).

ERC-20 token ABI
To be able to index transfers of a token, it’s necessary to know the address and the ABI (Application Binary Interface) of the token contract. The ABI defines the contract’s functions and events, including their typed inputs and outputs. Luckily, both of these can be found on block explorers like Etherscan. The indexer needs the ABI to locate the contract events or functions in the EVM execution trace and to decode their inputs. Squid SDK has a handy command to generate the boilerplate TypeScript code for this. The typegen tool uses the Etherscan API to fetch the contract ABI. Other compatible APIs are supported via the
--etherscan-api flag. For example, if the contract was deployed to Polygon and its ABI was available from Polygonscan, it could still be fetched with the same command extended with --etherscan-api https://api.polygonscan.com. Alternatively, the same command can be used with a path (local or URL) to a JSON ABI in place of the contract address. The generated TypeScript modules end up in the src/abi folder, the most interesting of which is matic.ts.
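The typegen invocation might look roughly like this; the MATIC token contract address shown is an assumption and should be verified on Etherscan:

```shell
# Generate typed ABI wrappers into src/abi
# (the contract address below is assumed; check it on Etherscan)
npx squid-evm-typegen src/abi 0x7D1AfA7B718fb893dB30A3aBc0Cfc608AaCfeBB0#matic
```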
CSV, Tables and Databases
For writing local CSVs we will need the file-store-csv package:
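The package is published under the @subsquid npm scope, so the installation command is:

```shell
npm i @subsquid/file-store-csv
```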
Packages are also available for writing to parquet files and for uploading to S3-compatible cloud services.
The output files are described in src/tables.ts. This is where it’s possible to provide filenames for the CSV files, as well as configure their data structure, in much the same way as if they were database tables (the name of the Table class is no coincidence):
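A minimal sketch of what src/tables.ts could contain, assuming the Table/Column/Types API exported by file-store-csv; the column set shown here is illustrative:

```typescript
import {Table, Column, Types} from '@subsquid/file-store-csv'

// One CSV table describing MATIC transfers; column names are illustrative
export const Transfers = new Table('transfers.csv', {
  blockNumber: Column(Types.Numeric()),
  timestamp: Column(Types.DateTime()),
  from: Column(Types.String()),
  to: Column(Types.String()),
  amount: Column(Types.Numeric()),
})
```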
Next, we create src/db.ts to configure the data abstraction layer. Here we export an instance of the Database class implementation from the file-store package (a dependency of file-store-csv). We will use this instance in much the same way as we would use a TypeormDatabase instance in a PostgreSQL-based squid.
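src/db.ts could look roughly like this, a sketch assuming the Database and LocalDest exports of file-store; the chunk size value is arbitrary:

```typescript
import {Database, LocalDest} from '@subsquid/file-store'
import {Transfers} from './tables'

export const db = new Database({
  tables: {Transfers},
  // write chunk folders under ./data
  dest: new LocalDest('./data'),
  // start a new chunk once the buffered data exceeds this size
  chunkSizeMb: 10,
})
```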
Note the chunkSizeMb option. A new chunk (that is, a new folder with a new CSV file in it) will be written when either the amount of data stored in the processor buffer exceeds chunkSizeMb, or at the end of the batch during which ctx.store.setForceFlush() was called.

Data indexing
All the indexing logic is defined in src/processor.ts, so let’s open it and edit the EvmBatchProcessor class configuration. We should request data for the right smart contract and the right EVM event log:
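A sketch of such a configuration, assuming a recent evm-processor API; the gateway URL and the MATIC contract address are assumptions to verify against the SQD docs and Etherscan:

```typescript
import {EvmBatchProcessor} from '@subsquid/evm-processor'
import * as matic from './abi/matic'

// MATIC token contract on Ethereum mainnet (assumed; verify on Etherscan)
export const MATIC_ADDRESS = '0x7d1afa7b718fb893db30a3abc0cfc608aacfebb0'

export const processor = new EvmBatchProcessor()
  // SQD Network gateway for Ethereum mainnet (assumed URL)
  .setGateway('https://v2.archive.subsquid.io/network/ethereum-mainnet')
  // request only Transfer event logs emitted by the MATIC contract
  .addLog({
    address: [MATIC_ADDRESS],
    topic0: [matic.events.Transfer.topic],
  })
```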
Then we define the batch handler that decodes the event data and writes it to the Transfers CSV file. Here is a short summary of the logic: iterate over all blocks in the batch and over the event logs within each block, decode every Transfer event, and buffer one table row per transfer.
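The handler could be sketched like this, assuming the file-store Store API (each table passed to the Database is exposed as a write buffer on ctx.store) and the typegen-generated decoder; field names are illustrative:

```typescript
processor.run(db, async (ctx) => {
  for (const block of ctx.blocks) {
    for (const log of block.logs) {
      // decode the Transfer event with the typegen-generated module
      const {from, to, value} = matic.events.Transfer.decode(log)
      // buffer one CSV row; it is flushed to disk when a chunk completes
      ctx.store.Transfers.write({
        blockNumber: block.header.height,
        timestamp: new Date(block.header.timestamp),
        from,
        to,
        amount: value,
      })
    }
  }
})
```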
The file in the GitHub repository is slightly different, as there’s some added logic to obtain the number of decimals for the token. For that, the processor interacts with the smart contract deployed on chain.
Launch the project
Build, then launch the project. After a while, new chunk folders will begin to appear in the data directory, each containing a transfers.csv file.
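To sanity-check the output, here is a small standalone Node.js helper (hypothetical, not part of the template) that counts the data rows across all chunk folders:

```typescript
import * as fs from 'node:fs'
import * as path from 'node:path'

// Count data rows (excluding headers) in every <chunk>/transfers.csv under dataDir
export function countRows(dataDir: string): number {
  let total = 0
  for (const chunk of fs.readdirSync(dataDir)) {
    const file = path.join(dataDir, chunk, 'transfers.csv')
    if (!fs.existsSync(file)) continue
    const lines = fs.readFileSync(file, 'utf-8').trim().split('\n')
    total += Math.max(lines.length - 1, 0) // subtract the header line
  }
  return total
}
```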
