Data Pipeline

This section covers data ingestion, normalization, feature engineering, and data quality controls.

Sources

  • Exchange market data (candles, trades, order books)

  • DEX pool states and on-chain events (swaps, liquidity changes)

  • Reference data (asset metadata, oracles); a sketch of the canonical record these sources map into follows this list

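As a concrete illustration, the sketch below shows one way these heterogeneous sources could be normalized into a single source-tagged record. The CanonicalBar and Source types and all field names are hypothetical assumptions for illustration, not an existing schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum

class Source(Enum):
    EXCHANGE = "exchange"      # CEX candles, trades, order books
    DEX = "dex"                # on-chain pool states and events
    REFERENCE = "reference"    # asset metadata, oracle prices

@dataclass(frozen=True)
class CanonicalBar:
    """One normalized OHLCV bar, regardless of where it was ingested from."""
    asset: str                 # e.g. "ETH-USD"
    interval_start: datetime   # UTC, aligned to the canonical interval grid
    open: float
    high: float
    low: float
    close: float
    volume: float
    source: Source

bar = CanonicalBar(
    asset="ETH-USD",
    interval_start=datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
    open=2280.0, high=2295.5, low=2276.1, close=2291.3, volume=1532.7,
    source=Source.EXCHANGE,
)
```
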
Normalization

  • Time alignment to canonical intervals; clock skew handling (see the normalization sketch after this list)

  • Missing data handling (gap-filling with confidence flags)

  • Outlier detection (z-score and robust estimators)

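A minimal normalization sketch, assuming pandas and a bars DataFrame with an interval_start column and a close column (hypothetical names): it snaps rows to the canonical interval grid, forward-fills gaps while flagging them, and flags outliers with a MAD-based modified z-score rather than a plain z-score.

```python
import numpy as np
import pandas as pd

def normalize_bars(df: pd.DataFrame, interval: str = "1min") -> pd.DataFrame:
    """Align to a canonical time grid, forward-fill gaps, and flag outliers."""
    df = df.set_index("interval_start").sort_index()
    # Time alignment: snap to a fixed grid; missing intervals appear as NaN rows.
    grid = df.resample(interval).last()
    # Gap-filling with a confidence flag: filled rows are marked, not silently invented.
    grid["is_gap_filled"] = grid["close"].isna()
    grid["close"] = grid["close"].ffill()
    # Robust outlier detection: modified z-score using the median absolute deviation.
    returns = grid["close"].pct_change()
    mad = (returns - returns.median()).abs().median()
    robust_z = 0.6745 * (returns - returns.median()) / (mad + 1e-12)
    grid["is_outlier"] = robust_z.abs() > 3.5
    return grid
```
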
Feature engineering

  • Price/volume transforms (returns, volatility, microstructure metrics); see the feature sketch after this list

  • Order book features (depth imbalance, spread dynamics)

  • On-chain flow features (pool TVL changes, swap volume bursts)

  • Sentiment features (level, velocity, decay)

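The sketch below illustrates a few of these transforms under the same assumptions as above (pandas bars with close and volume columns); depth_imbalance takes raw bid/ask size arrays from an order book snapshot. Function names and window lengths are illustrative, not the production feature set.

```python
import numpy as np
import pandas as pd

def price_volume_features(bars: pd.DataFrame, window: int = 30) -> pd.DataFrame:
    """Simple return, volatility, and volume transforms from normalized bars."""
    out = pd.DataFrame(index=bars.index)
    out["log_return"] = np.log(bars["close"]).diff()
    # Rolling standard deviation of log returns as a short-horizon volatility proxy.
    out["rolling_vol"] = out["log_return"].rolling(window).std()
    # Volume z-score highlights bursts relative to the recent window.
    vol_mean = bars["volume"].rolling(window).mean()
    vol_std = bars["volume"].rolling(window).std()
    out["volume_zscore"] = (bars["volume"] - vol_mean) / vol_std
    return out

def depth_imbalance(bid_sizes: np.ndarray, ask_sizes: np.ndarray) -> float:
    """Order book depth imbalance in [-1, 1]; positive means more resting bid size."""
    bid, ask = float(bid_sizes.sum()), float(ask_sizes.sum())
    return (bid - ask) / (bid + ask) if (bid + ask) > 0 else 0.0
```
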
Quality and lineage

  • Schema versioning for backwards-compatible model inputs

  • Data quality SLAs with alerting; quarantine on breach

  • Reproducible snapshots for training and backtesting (see the sketch after this list)

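A sketch of how quality gating and snapshot lineage could look: a batch failing the SLA check would be quarantined rather than published, and every training or backtest run records a content fingerprint tied to the schema version. The threshold, SCHEMA_VERSION tag, and fingerprinting scheme are illustrative assumptions, not the actual controls.

```python
import hashlib
import json
import pandas as pd

SCHEMA_VERSION = "2024-01"   # hypothetical version tag; bumped on any breaking change

def quality_check(df: pd.DataFrame, max_gap_ratio: float = 0.05) -> bool:
    """Return True if the batch meets the SLA; a failing batch should be quarantined."""
    gap_ratio = df["is_gap_filled"].mean() if "is_gap_filled" in df else 0.0
    has_nans = df[["open", "high", "low", "close", "volume"]].isna().any().any()
    return gap_ratio <= max_gap_ratio and not has_nans

def snapshot_fingerprint(df: pd.DataFrame) -> str:
    """Content hash used to tie a training or backtest run to an exact data snapshot."""
    payload = df.to_csv(index=True).encode()
    meta = json.dumps({"schema_version": SCHEMA_VERSION}).encode()
    return hashlib.sha256(payload + meta).hexdigest()
```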