Step 3: Adding external data
This is the third part of the tutorial in which we build a squid that indexes Bored Ape Yacht Club NFTs, their transfers and owners from the Ethereum blockchain, fetches the metadata from IPFS and regular HTTP URLs, stores all the data in a database and serves it over a GraphQL API. In the first two parts of the tutorial (1, 2) we created a squid that scraped Transfer events emitted by the BAYC token contract and derived some information on tokens and their owners from that data. In this part we enrich token data with information obtained from contract state calls, IPFS and regular HTTP URLs.
Prerequisites: Node.js, Squid CLI, Docker, a project folder with the code from the second part (this commit).
Exploring token metadata
Now that we have a record for each BAYC NFT, let’s explore how we can retrieve more data for each token. EIP-721 suggests that token contracts may make token metadata available as a JSON document referred to by the output of the tokenURI() contract function. Upon examining src/abi/bayc.ts, we find that the BAYC token contract implements this function. Moreover, the public ABI has no obvious methods for setting the token URI and no events emitted when it changes. In other words, it appears that the only way to retrieve this data is by querying the contract state.
This requires an RPC endpoint of an archival Ethereum node, but we do not need to add one here: the processor will reuse the endpoint we supplied in part one of the tutorial for RPC ingestion.
The next step is to prepare for retrieving and parsing the metadata proper. For this, we need to understand the protocols used in the URIs and the structure of metadata JSONs. To learn that, we retrieve and inspect some URIs ahead of the main squid sync. The most straightforward way to achieve this is by adding the following to the batch handler:
src/main.ts
Here we use the Contract class provided by the src/abi/bayc.ts module. It uses an RPC endpoint supplied by the processor via ctx to call methods of the contract deployed at CONTRACT_ADDRESS, using the state at the last block of each batch. Once we have the Contract instance, we call tokenURI() for each token mentioned in the batch and print the retrieved URI.
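A minimal, self-contained sketch of that loop is below. TokenUriContract and TokenRef are hypothetical stand-ins: in the squid their roles are played by the Contract instance from src/abi/bayc.ts (bound to the last block of the batch) and by the tokens created from the batch's Transfer events. The logger is passed in so that the sketch runs outside a processor.

```typescript
// Hypothetical stand-in for the Contract class from src/abi/bayc.ts:
// only the tokenURI() view method is modelled.
interface TokenUriContract {
  tokenURI(tokenId: bigint): Promise<string>
}

// Hypothetical minimal token record: just the fields this loop needs.
interface TokenRef {
  id: string
  tokenId: bigint
}

// Call tokenURI() once per token seen in the batch, log each URI and
// return them keyed by token id. Each call waits for the previous one.
async function logTokenUris(
  contract: TokenUriContract,
  tokens: Iterable<TokenRef>,
  log: (msg: string) => void
): Promise<Map<string, string>> {
  const uris = new Map<string, string>()
  for (const t of tokens) {
    const uri = await contract.tokenURI(t.tokenId)
    uris.set(t.id, uri)
    log(`Token ${t.id} has metadata at "${uri}"`)
  }
  return uris
}
```

Because every await waits for the previous RPC call to complete, the calls run strictly one after another, which is one likely contributor to the slowness of this approach.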
This simple approach is rather slow: the modified squid needs about three to eight hours to get a reasonably sized sample of URIs (figure out of date). A faster yet more complex alternative will be discussed in the next part of the tutorial.
Running the modified squid reveals that some metadata URIs point to HTTPS and some point to IPFS. Here is one of the metadata JSONs:
- BAYC metadata URIs can point to HTTPS or IPFS — we need to be able to retrieve both.
- Metadata JSONs have two fields: "image", a string, and "attributes", an array of pairs {"trait_type": string, "value": string}.
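These observations map directly onto a TypeScript type. The sketch below, with the hypothetical names TokenMetadata and parseTokenMetadata, checks that a parsed JSON value has the shape described above (extra fields are ignored) before the squid relies on it. It is one possible approach, not necessarily what src/metadata.ts does.

```typescript
// Shape of a metadata JSON as described above.
interface RawAttribute {
  trait_type: string
  value: string
}

interface TokenMetadata {
  image: string
  attributes: RawAttribute[]
}

// Narrow an arbitrary parsed JSON value to TokenMetadata, or return
// undefined if a field is missing or has the wrong type.
function parseTokenMetadata(json: unknown): TokenMetadata | undefined {
  if (typeof json !== 'object' || json === null) return undefined
  const obj = json as Record<string, unknown>
  const image = obj.image
  const rawAttrs = obj.attributes
  if (typeof image !== 'string' || !Array.isArray(rawAttrs)) return undefined

  const attributes: RawAttribute[] = []
  for (const a of rawAttrs) {
    if (typeof a !== 'object' || a === null) return undefined
    const traitType = (a as Record<string, unknown>).trait_type
    const value = (a as Record<string, unknown>).value
    if (typeof traitType !== 'string' || typeof value !== 'string') return undefined
    attributes.push({ trait_type: traitType, value })
  }
  return { image, attributes }
}
```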
Extending the Token entity
We will save the image and attributes metadata fields, as well as the metadata URI, to the database. To do this, we need to add some new fields to the existing Token entity:
Attribute is a non-entity type that we use to type the attributes field.
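In TypeScript terms the extension amounts to roughly the following. This is a hypothetical sketch, not the code generated from schema.graphql: the field names uri and traitType, and the choice to make image and attributes optional (on the assumption that external metadata may fail to load), are assumptions. toAttributes() shows the renaming from the snake_case keys of the metadata JSON.

```typescript
// Attribute as it appears in the metadata JSON (snake_case key).
interface RawAttribute {
  trait_type: string
  value: string
}

// Hypothetical TypeScript view of the non-entity Attribute type;
// the camelCase name traitType is an assumption.
interface Attribute {
  traitType: string
  value: string
}

// Hypothetical view of the fields added to Token. image and attributes are
// optional here on the assumption that metadata retrieval can fail.
interface TokenMetadataFields {
  uri: string
  image?: string
  attributes?: Attribute[]
}

// Convert attributes from the JSON representation to the Attribute type.
function toAttributes(raw: RawAttribute[]): Attribute[] {
  return raw.map((a) => ({ traitType: a.trait_type, value: a.value }))
}
```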
Once schema.graphql is updated, we regenerate the TypeORM data model code:
To populate the new fields, we modify createTokens():
PartialToken stores the incomplete Token information obtained purely from blockchain events and function calls, before any state queries or enrichment with external data.
The function completeTokens() is responsible for filling Token fields that are missing in PartialTokens. This involves IO operations, so both the function and its caller createTokens() have to be asynchronous. The functions also require a batch context for state queries and logging. We modify the createTokens() call in the batch handler to accommodate these changes:
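The way async propagates from completeTokens() up to the batch handler can be sketched as follows. All shapes are hypothetical simplifications: TransferRecord stands for a decoded Transfer event, BatchContext models only the logger, and completeTokens() is a placeholder that fills uri with a dummy value instead of making real state queries.

```typescript
// Hypothetical, simplified record of a decoded Transfer event.
interface TransferRecord {
  tokenId: bigint
  to: string
}

// Hypothetical, simplified token shapes; the real ones carry more fields.
interface PartialToken {
  id: string
  tokenId: bigint
  owner: string
}

interface Token extends PartialToken {
  uri: string
}

// Hypothetical stand-in for the batch context: only logging is modelled.
interface BatchContext {
  log: { info(msg: string): void }
}

// Placeholder for the enrichment step. In the squid this is where state
// queries and HTTP/IPFS requests happen, which is why it returns a Promise.
async function completeTokens(
  ctx: BatchContext,
  partialTokens: Map<string, PartialToken>
): Promise<Map<string, Token>> {
  const tokens = new Map<string, Token>()
  for (const [id, pt] of partialTokens) {
    tokens.set(id, { ...pt, uri: `placeholder://${pt.tokenId}` })
  }
  ctx.log.info(`completed ${tokens.size} tokens`)
  return tokens
}

// createTokens() awaits completeTokens(), so it must be async as well.
async function createTokens(
  ctx: BatchContext,
  transfers: TransferRecord[]
): Promise<Map<string, Token>> {
  const partialTokens = new Map<string, PartialToken>()
  for (const t of transfers) {
    const id = t.tokenId.toString()
    // A later transfer of the same token overwrites the owner.
    partialTokens.set(id, { id, tokenId: t.tokenId, owner: t.to })
  }
  return await completeTokens(ctx, partialTokens)
}
```

In this sketch's terms, the batch handler would then call it as `const tokens = await createTokens(ctx, transfers)`.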
Next, we implement completeTokens():
Here we create a Contract object and use it to call the tokenURI() method of the BAYC token contract. The retrieved URIs are then passed to the fetchTokenMetadata() function, which is responsible for HTTPS/IPFS metadata retrieval and parsing. Once we have its output, we can create and return the final Token entity instances.
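A self-contained sketch of that flow, under stated assumptions: TokenUriContract stands in for the generated Contract class, and FetchTokenMetadata is a hypothetical signature for fetchTokenMetadata() that returns one result per URI in the same order, with undefined where retrieval failed. The real function may have a different signature.

```typescript
// Hypothetical stand-ins for the generated Contract class and entity types.
interface TokenUriContract {
  tokenURI(tokenId: bigint): Promise<string>
}

interface Attribute {
  traitType: string
  value: string
}

interface TokenMetadata {
  image: string
  attributes: Attribute[]
}

interface PartialToken {
  id: string
  tokenId: bigint
}

interface Token extends PartialToken {
  uri: string
  image?: string
  attributes?: Attribute[]
}

// Hypothetical signature: one result per input URI, in the same order,
// with undefined where the metadata could not be retrieved or parsed.
type FetchTokenMetadata = (uris: string[]) => Promise<(TokenMetadata | undefined)[]>

async function completeTokens(
  contract: TokenUriContract,
  fetchTokenMetadata: FetchTokenMetadata,
  partialTokens: PartialToken[]
): Promise<Token[]> {
  // 1. Contract state: query tokenURI() for every token.
  const uris: string[] = []
  for (const pt of partialTokens) {
    uris.push(await contract.tokenURI(pt.tokenId))
  }
  // 2. External data: retrieve metadata for all URIs.
  const metadata = await fetchTokenMetadata(uris)
  // 3. Combine on-chain and external data into final Token objects.
  return partialTokens.map((pt, i) => ({
    ...pt,
    uri: uris[i],
    image: metadata[i]?.image,
    attributes: metadata[i]?.attributes,
  }))
}
```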
Retrieving external resources
In the fetchTokenMetadata() implementation we first classify the URIs by protocol. For IPFS links we replace 'ipfs://' with the address of an IPFS gateway, then retrieve the metadata from all links using a regular HTTPS client. For demonstration purposes we use the public ipfs.io gateway, which is slow and prone to dropping requests due to rate-limiting. For production squids we recommend using a dedicated gateway, e.g. from Filebase.
fetchTokenMetadata() lives in src/metadata.ts. Examine its full contents here.
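The URI handling described above can be sketched as follows. IPFS_GATEWAY points at the public ipfs.io path gateway mentioned in the text; classifyUri(), toFetchableUrl() and getWithRetries() are hypothetical helpers. The retry with exponential backoff is not prescribed by the text: it is one common way to cope with a rate-limited gateway. GetJson is passed in so the sketch runs without network access; in Node.js 18+ it could be backed by the global fetch(), rejecting on non-2xx responses.

```typescript
// Public gateway used in the tutorial for demonstration; swap in a dedicated
// gateway address for production.
const IPFS_GATEWAY = 'https://ipfs.io/ipfs/'

type UriKind = 'ipfs' | 'http' | 'unsupported'

// Classify a metadata URI by its protocol.
function classifyUri(uri: string): UriKind {
  if (uri.startsWith('ipfs://')) return 'ipfs'
  if (uri.startsWith('https://') || uri.startsWith('http://')) return 'http'
  return 'unsupported'
}

// Make a URI fetchable by a regular HTTP(S) client: replace the ipfs://
// prefix with the gateway address and pass HTTP(S) URLs through unchanged.
function toFetchableUrl(uri: string, gateway: string = IPFS_GATEWAY): string | undefined {
  switch (classifyUri(uri)) {
    case 'ipfs':
      return gateway + uri.slice('ipfs://'.length)
    case 'http':
      return uri
    default:
      return undefined
  }
}

// Hypothetical HTTP getter: resolves to the parsed JSON body or rejects.
type GetJson = (url: string) => Promise<unknown>

// Retry with exponentially growing delays (baseDelayMs, 2x, 4x, ...);
// resolves to undefined once all attempts have failed.
async function getWithRetries(
  get: GetJson,
  url: string,
  attempts = 3,
  baseDelayMs = 500
): Promise<unknown> {
  for (let i = 0; i < attempts; i++) {
    try {
      return await get(url)
    } catch {
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i))
      }
    }
  }
  return undefined
}
```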
Then all that is left is to import the relevant parts into src/main.ts:
src/main.ts
You can check the result by starting the GraphQL server with npx squid-graphql-server and visiting the local GraphiQL playground. It is now possible to retrieve image URLs and attributes for each token: