Documentation Index

Fetch the complete documentation index at: https://beta.docs.sqd.dev/llms.txt

Use this file to discover all available pages before exploring further.

Begin by making sure you know
  • which on-chain data items you need;
  • how you are going to transform these items into usable data, based on your business logic;
  • what your preferred mode of consuming the transformed data is.
Use Pipes CLI to quickly generate a starter project. The process of getting from here to a useful data pipeline follows directly from Pipe anatomy.

Adding queries and developing transforms

Here are some things to keep in mind as you’re developing the heart of your pipeline.

Writing maintainable pipes

Recall that there are two kinds of transforms in Pipes SDK:
  • Per-query transforms work on subsets of raw data. They can be bundled with their queries, making them logically self-contained and easily reusable.
  • Whole pipe transforms process the outputs of all per-query transforms at the same time. This unlocks arbitrary data combinations, but it also means that each such transform might need to be changed whenever any of the upstream transforms changes.
For maximum maintainability you’ll have to balance the following two objectives:
  1. Aim to push as much of your business logic into per-query transforms. Make the code of any whole pipe transforms as simple as possible.
  2. Use ready-made, validated query-transform combos. It’ll often be preferable to build your pipeline out of such modules, even if that happens to make whole-pipe transforms slightly more complicated.
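To make the two objectives concrete, here is a minimal, self-contained sketch. The module shape and all names in it (`QueryTransformModule`, `transfersModule` and so on) are hypothetical illustrations, not actual Pipes SDK API: the idea is that a per-query transform bundled with its query does the heavy lifting, while the whole-pipe transform only combines already-shaped outputs.

```typescript
// Hypothetical shapes for illustration only -- not the actual Pipes SDK types.
interface QueryTransformModule<Raw, Out> {
  query: { topics: string[] }        // the query the transform is bundled with
  transform: (raw: Raw[]) => Out[]   // per-query transform: business logic lives here
}

interface RawLog { address: string; data: string }
interface Transfer { token: string; amount: bigint }

// A self-contained, reusable query+transform combo (objective 2).
const transfersModule: QueryTransformModule<RawLog, Transfer> = {
  query: { topics: ['0x...'] }, // placeholder topic, for illustration only
  transform: (logs) =>
    logs.map((l) => ({ token: l.address, amount: BigInt(l.data) })),
}

// The whole-pipe transform stays trivial (objective 1): it only aggregates
// the already-decoded outputs of the per-query transforms.
function wholePipeTransform(transfers: Transfer[]): Map<string, bigint> {
  const totals = new Map<string, bigint>()
  for (const t of transfers) {
    totals.set(t.token, (totals.get(t.token) ?? 0n) + t.amount)
  }
  return totals
}
```

With this split, swapping in a different validated combo only touches the module; the whole-pipe transform is unaffected as long as the output shape stays the same.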

Stateful transforms

It’s often the case that you need access to some part of the previously processed data to do the transform. There are multiple ways to accomplish this in Pipes SDK, each with its advantages and disadvantages. Consult the Stateful transforms guide.
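As an illustration of why such state is needed, here is a self-contained sketch (plain TypeScript, not Pipes SDK API) of a transform that tracks running balances: each batch can only be processed correctly if the balances accumulated over all earlier batches are available.

```typescript
interface Transfer { from: string; to: string; amount: bigint }

// A stateful transform: results for earlier batches are needed to process
// later ones, so the transform function closes over its own state.
function makeBalanceTransform() {
  const balances = new Map<string, bigint>()
  return (batch: Transfer[]): Map<string, bigint> => {
    for (const t of batch) {
      balances.set(t.from, (balances.get(t.from) ?? 0n) - t.amount)
      balances.set(t.to, (balances.get(t.to) ?? 0n) + t.amount)
    }
    return new Map(balances) // snapshot of balances after this batch
  }
}
```

Keeping the state in memory like this is the simplest option, but it is lost on restart and must be rebuilt or rolled back on blockchain forks; the Stateful transforms guide discusses the trade-offs of the SDK-native approaches.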

Writing data

Postgres and ClickHouse

If you need your transformed data in Postgres or ClickHouse, you should already have a basic configuration generated by Pipes CLI. If you’re working with real-time data, it is very important to
  • on Postgres: when adding or removing any relevant tables, update the list of tables in the target configuration;
  • on ClickHouse: when any data dependencies or the structure of the stored data change, update the onRollback() callback.
Consult the Postgres via Drizzle and ClickHouse guides.
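To see what an onRollback() callback is responsible for, here is a self-contained sketch. The in-memory table and the callback signature are hypothetical stand-ins for illustration; consult the ClickHouse guide for the real API. The essential job is the same: on a reorg, remove everything written above the rollback block from every affected table.

```typescript
// Hypothetical in-memory stand-in for a ClickHouse table, for illustration.
type Row = { block: number; value: string }
const transfersTable: Row[] = []

// Sketch of an onRollback() callback: delete all rows written above the
// block we are rolling back to. If the set of stored tables or their data
// dependencies change, this callback must be updated to match -- otherwise
// some tables will keep rows from the abandoned fork.
function onRollback(rollbackToBlock: number): void {
  for (let i = transfersTable.length - 1; i >= 0; i--) {
    if (transfersTable[i].block > rollbackToBlock) transfersTable.splice(i, 1)
  }
}
```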

Plain iterator

A complete pipeline without a .pipeTo is a valid async iterator.
  • The pipeline will produce some logs by default. Disable them by setting logger: false when creating the data source. If you’re looking to convert an existing standalone pipe into a module in a larger program and wish to get rid of any side effects, consult the Running bare bones guide.
  • If you’re working with unfinalized data (default setting of the source), the iterator will throw ForkExceptions on blockchain reorgs. You should catch these and process them correctly. Consult the fork handling guide for details. Alternatively, configure the data source to use final data only:
    const source = evmPortalSource({
      portal: {
        url: '<portal_dataset_url>',
        finalized: true,
      },
      ...
    })
    
  • By default, the pipeline is stateless: when re-created it’ll restart from the earliest block relevant to any of the queries. If you want the pipeline to persist its sync state between restarts, you’ll have to manage the state by yourself. See Cursor management.
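Putting the bullets above together, the consumption loop looks roughly like this self-contained sketch. `ForkException` here is a local stand-in for the SDK’s exception, and the batch shape and rollback logic are simplified illustrations, not Pipes SDK API.

```typescript
// Local stand-ins for illustration; the real types come from Pipes SDK.
class ForkException extends Error {
  constructor(public rollbackToBlock: number) { super('fork') }
}
interface Batch { blocks: number[] }

// A fake pipeline: yields two batches, then signals a blockchain reorg.
async function* fakePipeline(): AsyncGenerator<Batch> {
  yield { blocks: [1, 2] }
  yield { blocks: [3, 4] }
  throw new ForkException(2)
}

async function consume(pipeline: AsyncGenerator<Batch>): Promise<number[]> {
  const seen: number[] = []
  try {
    for await (const batch of pipeline) {
      seen.push(...batch.blocks)
    }
  } catch (e) {
    if (e instanceof ForkException) {
      // Roll back our own state to the fork point; a real consumer would
      // then resume the pipeline from e.rollbackToBlock.
      while (seen.length && seen[seen.length - 1] > e.rollbackToBlock) seen.pop()
    } else {
      throw e
    }
  }
  return seen
}
```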

Developing your own target

Use the createTarget() function.
  • If you’re working with unfinalized data (default setting of the source), you must define a fork handler callback. Consult the fork handling guide for details. Alternatively, configure the data source to use final data only:
    const source = evmPortalSource({
      portal: {
        url: '<portal_dataset_url>',
        finalized: true,
      },
      ...
    })
    
  • If you want your pipeline to preserve its sync state between restarts, you’ll have to manage this state in your write callback. See Cursor management.