docs: add Databento integration plan and roadmap items
docs/DATABENTO_INTEGRATION_PLAN.md (new file, 780 lines)

# Databento Historical Data Integration Plan

## Overview

Integrate the Databento historical API into the backtesting and scenario comparison pages, replacing yfinance as the historical data source there. The integration supports start prices and position values configured independently of portfolio settings, with a caching layer that avoids redundant downloads.

## Architecture

### Current State

- **Backtest page** (`app/pages/backtests.py`): uses `YFinanceHistoricalPriceSource` via `BacktestPageService`
- **Event comparison** (`app/pages/event_comparison.py`): uses seeded event presets with yfinance data
- **Historical provider** (`app/services/backtesting/historical_provider.py`): protocol-based architecture with `YFinanceHistoricalPriceSource` and `SyntheticHistoricalProvider`

### Target State

- Add `DatabentoHistoricalPriceSource` implementing the `HistoricalPriceSource` protocol
- Add `DatabentoHistoricalOptionSource` implementing the `OptionSnapshotSource` protocol (future)
- Smart caching layer: only re-download when request parameters change or the cache is stale
- Pre-seeded scenario data via batch downloads

## Databento Data Sources

### Underlyings and Datasets

| Instrument | Dataset | Symbol Format | Notes |
|------------|---------|---------------|-------|
| GLD ETF | `XNAS.BASIC` or `EQUS.PLUS` | `GLD` | Nasdaq Basic or US consolidated equities |
| GC=F Futures | `GLBX.MDP3` | `GC` parent/continuous symbology (not Yahoo's `GC=F`) | CME gold futures |
| Gold Options | `OPRA.PILLAR` | `GLD` underlying | Options on the GLD ETF |

### Schemas

| Schema | Use Case | Fields |
|--------|----------|--------|
| `ohlcv-1d` | Daily backtesting | open, high, low, close, volume |
| `ohlcv-1h` | Intraday scenarios | Hourly bars |
| `trades` | Tick-level analysis | Full trade data |
| `definition` | Instrument metadata | Expiries, strike prices, tick sizes |

## Implementation Plan

### Phase 1: Historical Price Source (DATA-DB-001)

**File:** `app/services/backtesting/databento_source.py`

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass
from datetime import date, timedelta
from pathlib import Path

import pandas as pd

from app.services.backtesting.historical_provider import DailyClosePoint, HistoricalPriceSource

try:
    import databento as db

    DATABENTO_AVAILABLE = True
except ImportError:
    DATABENTO_AVAILABLE = False


@dataclass(frozen=True)
class DatabentoCacheKey:
    """Cache key for Databento data requests."""

    dataset: str
    symbol: str
    schema: str
    start_date: date
    end_date: date

    def _key_hash(self) -> str:
        key_str = f"{self.dataset}_{self.symbol}_{self.schema}_{self.start_date}_{self.end_date}"
        return hashlib.sha256(key_str.encode()).hexdigest()[:16]

    def cache_path(self, cache_dir: Path) -> Path:
        return cache_dir / f"dbn_{self._key_hash()}.parquet"

    def metadata_path(self, cache_dir: Path) -> Path:
        return cache_dir / f"dbn_{self._key_hash()}_meta.json"


@dataclass
class DatabentoSourceConfig:
    """Configuration for the Databento data source."""

    api_key: str | None = None  # Falls back to the DATABENTO_API_KEY env var
    cache_dir: Path = Path(".cache/databento")
    dataset: str = "XNAS.BASIC"
    schema: str = "ohlcv-1d"
    stype_in: str = "raw_symbol"

    # Re-download threshold
    max_cache_age_days: int = 30


class DatabentoHistoricalPriceSource(HistoricalPriceSource):
    """Databento-based historical price source for backtesting."""

    def __init__(self, config: DatabentoSourceConfig | None = None) -> None:
        if not DATABENTO_AVAILABLE:
            raise RuntimeError("databento package required: pip install databento")

        self.config = config or DatabentoSourceConfig()
        self.config.cache_dir.mkdir(parents=True, exist_ok=True)
        self._client: db.Historical | None = None

    @property
    def client(self) -> db.Historical:
        if self._client is None:
            self._client = db.Historical(key=self.config.api_key)
        return self._client

    def _load_from_cache(self, key: DatabentoCacheKey) -> list[DailyClosePoint] | None:
        """Load cached data if available and fresh."""
        cache_file = key.cache_path(self.config.cache_dir)
        meta_file = key.metadata_path(self.config.cache_dir)

        if not cache_file.exists() or not meta_file.exists():
            return None

        try:
            with open(meta_file) as f:
                meta = json.load(f)

            # Check cache age
            download_date = date.fromisoformat(meta["download_date"])
            age_days = (date.today() - download_date).days
            if age_days > self.config.max_cache_age_days:
                return None

            # Check that parameters match
            if meta["dataset"] != key.dataset or meta["symbol"] != key.symbol:
                return None

            # Load parquet and convert
            df = pd.read_parquet(cache_file)
            return self._df_to_daily_points(df)
        except Exception:
            return None

    def _save_to_cache(self, key: DatabentoCacheKey, df: pd.DataFrame) -> None:
        """Save data to the cache."""
        cache_file = key.cache_path(self.config.cache_dir)
        meta_file = key.metadata_path(self.config.cache_dir)

        df.to_parquet(cache_file, index=False)

        meta = {
            "download_date": date.today().isoformat(),
            "dataset": key.dataset,
            "symbol": key.symbol,
            "schema": key.schema,
            "start_date": key.start_date.isoformat(),
            "end_date": key.end_date.isoformat(),
            "rows": len(df),
        }
        with open(meta_file, "w") as f:
            json.dump(meta, f, indent=2)

    def _fetch_from_databento(self, key: DatabentoCacheKey) -> pd.DataFrame:
        """Fetch data from the Databento API."""
        data = self.client.timeseries.get_range(
            dataset=key.dataset,
            symbols=key.symbol,
            schema=key.schema,
            start=key.start_date.isoformat(),
            end=(key.end_date + timedelta(days=1)).isoformat(),  # Exclusive end
            stype_in=self.config.stype_in,
        )
        return data.to_df()

    def _df_to_daily_points(self, df: pd.DataFrame) -> list[DailyClosePoint]:
        """Convert a DataFrame to a DailyClosePoint list."""
        points = []
        for idx, row in df.iterrows():
            # Databento ohlcv schemas timestamp bars with ts_event
            ts = row.get("ts_event", row.get("ts_recv", idx))
            if hasattr(ts, "date"):
                row_date = ts.date()
            else:
                row_date = date.fromisoformat(str(ts)[:10])

            # to_df() already scales the raw int64 fixed-point (1e-9) prices
            # to floats by default, so the close can be used directly.
            close = float(row["close"])

            points.append(DailyClosePoint(date=row_date, close=close))

        return sorted(points, key=lambda p: p.date)

    def load_daily_closes(self, symbol: str, start_date: date, end_date: date) -> list[DailyClosePoint]:
        """Load daily closing prices from Databento (with caching)."""
        # Map symbols to datasets
        dataset = self._resolve_dataset(symbol)
        databento_symbol = self._resolve_symbol(symbol)

        key = DatabentoCacheKey(
            dataset=dataset,
            symbol=databento_symbol,
            schema=self.config.schema,
            start_date=start_date,
            end_date=end_date,
        )

        # Try the cache first
        cached = self._load_from_cache(key)
        if cached is not None:
            return cached

        # Fetch from Databento and cache the result
        df = self._fetch_from_databento(key)
        self._save_to_cache(key, df)

        return self._df_to_daily_points(df)

    def _resolve_dataset(self, symbol: str) -> str:
        """Resolve a symbol to a Databento dataset."""
        symbol_upper = symbol.upper()
        if symbol_upper in ("GLD", "GLDM", "IAU"):
            return "XNAS.BASIC"  # ETFs on Nasdaq
        elif symbol_upper in ("GC=F", "GC", "GOLD"):
            return "GLBX.MDP3"  # CME gold futures
        elif symbol_upper == "XAU":
            return "XNAS.BASIC"  # Treat as GLD proxy
        else:
            return self.config.dataset  # Use the configured default

    def _resolve_symbol(self, symbol: str) -> str:
        """Resolve a vault-dash symbol to a Databento symbol."""
        symbol_upper = symbol.upper()
        if symbol_upper == "XAU":
            return "GLD"  # Proxy XAU via GLD prices
        elif symbol_upper == "GC=F":
            return "GC"  # Use the parent symbol for continuous contracts
        return symbol_upper

    def get_cost_estimate(self, symbol: str, start_date: date, end_date: date) -> float:
        """Estimate the cost in USD of a data request."""
        dataset = self._resolve_dataset(symbol)
        databento_symbol = self._resolve_symbol(symbol)

        try:
            return self.client.metadata.get_cost(
                dataset=dataset,
                symbols=databento_symbol,
                schema=self.config.schema,
                start=start_date.isoformat(),
                end=(end_date + timedelta(days=1)).isoformat(),
            )
        except Exception:
            return 0.0  # Treat a failed estimate as zero cost


class DatabentoBacktestProvider:
    """Databento-backed historical provider for synthetic backtesting."""

    provider_id = "databento_v1"
    pricing_mode = "synthetic_bs_mid"

    def __init__(
        self,
        price_source: DatabentoHistoricalPriceSource,
        implied_volatility: float = 0.16,
        risk_free_rate: float = 0.045,
    ) -> None:
        self.price_source = price_source
        self.implied_volatility = implied_volatility
        self.risk_free_rate = risk_free_rate

    def load_history(self, symbol: str, start_date: date, end_date: date) -> list[DailyClosePoint]:
        return self.price_source.load_daily_closes(symbol, start_date, end_date)

    # ... rest delegates to SyntheticHistoricalProvider logic
```

### Phase 2: Backtest Settings Model (DATA-DB-002)

**File:** `app/models/backtest_settings.py`

```python
from dataclasses import dataclass, field
from datetime import date
from uuid import UUID

from app.models.backtest import ProviderRef


@dataclass(frozen=True)
class BacktestSettings:
    """User-configurable backtest settings (independent of the portfolio)."""

    # Scenario identification
    settings_id: UUID
    name: str

    # Data source configuration
    data_source: str = "databento"  # "databento", "yfinance", "synthetic"
    dataset: str = "XNAS.BASIC"
    schema: str = "ohlcv-1d"

    # Date range
    start_date: date = date(2024, 1, 1)
    end_date: date = date(2024, 12, 31)

    # Independent scenario configuration (not derived from the portfolio)
    underlying_symbol: str = "GLD"
    start_price: float = 0.0  # 0 = auto-derive from the first close
    underlying_units: float = 1000.0  # Independent of the portfolio
    loan_amount: float = 0.0  # Debt position for LTV analysis
    margin_call_ltv: float = 0.75

    # Templates to test
    template_slugs: tuple[str, ...] = field(default_factory=lambda: ("protective-put-atm-12m",))

    # Provider reference
    provider_ref: ProviderRef = field(default_factory=lambda: ProviderRef(
        provider_id="databento_v1",
        pricing_mode="synthetic_bs_mid",
    ))

    # Cache metadata
    cache_key: str = ""  # Populated when data is fetched
    data_cost_usd: float = 0.0  # Cost of the last data fetch
```
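
The `start_price = 0` auto-derive convention can be made concrete with a small pure helper. A minimal sketch (`resolve_start_price` is an illustrative name, not part of the planned API):

```python
def resolve_start_price(configured: float, closes: list[float]) -> float:
    """Apply the BacktestSettings convention: an explicit positive
    start_price wins; 0 means auto-derive from the first close."""
    if configured > 0:
        return configured
    if not closes:
        raise ValueError("no historical closes to derive a start price from")
    return closes[0]


print(resolve_start_price(0.0, [143.2, 144.0]))    # -> 143.2
print(resolve_start_price(150.0, [143.2, 144.0]))  # -> 150.0
```

Keeping this rule in one place means the backtest page and the event comparison page cannot drift in how they interpret a zero start price.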

### Phase 3: Cache Management (DATA-DB-003)

**File:** `app/services/backtesting/databento_cache.py`

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from datetime import date, timedelta
from pathlib import Path

from app.services.backtesting.databento_source import DatabentoCacheKey


@dataclass
class CacheEntry:
    """Metadata for a cached Databento dataset."""

    cache_key: DatabentoCacheKey
    file_path: Path
    download_date: date
    size_bytes: int
    cost_usd: float


class DatabentoCacheManager:
    """Manages the Databento data cache lifecycle."""

    def __init__(self, cache_dir: Path = Path(".cache/databento")) -> None:
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def list_entries(self) -> list[CacheEntry]:
        """List all cached entries."""
        entries = []
        for meta_file in self.cache_dir.glob("*_meta.json"):
            with open(meta_file) as f:
                meta = json.load(f)

            cache_file = meta_file.with_name(meta_file.stem.replace("_meta", "") + ".parquet")
            if cache_file.exists():
                entries.append(CacheEntry(
                    cache_key=DatabentoCacheKey(
                        dataset=meta["dataset"],
                        symbol=meta["symbol"],
                        schema=meta["schema"],
                        start_date=date.fromisoformat(meta["start_date"]),
                        end_date=date.fromisoformat(meta["end_date"]),
                    ),
                    file_path=cache_file,
                    download_date=date.fromisoformat(meta["download_date"]),
                    size_bytes=cache_file.stat().st_size,
                    cost_usd=0.0,  # Would need to be tracked separately
                ))
        return entries

    def invalidate_expired(self, max_age_days: int = 30) -> list[Path]:
        """Remove cache entries older than max_age_days."""
        removed = []
        cutoff = date.today() - timedelta(days=max_age_days)

        for entry in self.list_entries():
            if entry.download_date < cutoff:
                entry.file_path.unlink(missing_ok=True)
                meta_file = entry.file_path.with_name(entry.file_path.stem + "_meta.json")
                meta_file.unlink(missing_ok=True)
                removed.append(entry.file_path)

        return removed

    def clear_all(self) -> int:
        """Clear all cached data."""
        count = 0
        for file in self.cache_dir.glob("*"):
            if file.is_file():
                file.unlink()
                count += 1
        return count

    def get_cache_size(self) -> int:
        """Get the total cache size in bytes."""
        return sum(f.stat().st_size for f in self.cache_dir.glob("*") if f.is_file())

    def should_redownload(self, key: DatabentoCacheKey, params_changed: bool, max_age_days: int = 30) -> bool:
        """Determine whether the data should be re-downloaded."""
        cache_file = key.cache_path(self.cache_dir)
        meta_file = key.metadata_path(self.cache_dir)

        if params_changed:
            return True

        if not cache_file.exists() or not meta_file.exists():
            return True

        try:
            with open(meta_file) as f:
                meta = json.load(f)
            download_date = date.fromisoformat(meta["download_date"])
            age_days = (date.today() - download_date).days
            return age_days > max_age_days
        except Exception:
            return True
```
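
The re-download decision reduces to a pure rule over the cached metadata, which keeps it testable without touching the filesystem. A sketch (`is_stale` is an illustrative name; the real method also checks that both files exist):

```python
from datetime import date, timedelta


def is_stale(download_date: date, today: date, params_changed: bool,
             max_age_days: int = 30) -> bool:
    """Pure form of the invalidation rules: parameter changes always
    force a re-download; otherwise only the cache age matters."""
    if params_changed:
        return True
    return (today - download_date).days > max_age_days


today = date(2026, 3, 28)
assert is_stale(today - timedelta(days=45), today, params_changed=False)      # too old
assert not is_stale(today - timedelta(days=10), today, params_changed=False)  # fresh
assert is_stale(today - timedelta(days=10), today, params_changed=True)       # params changed
```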

### Phase 4: Backtest Page UI Updates (DATA-DB-004)

**Key changes to `app/pages/backtests.py`:**

1. Add a Databento configuration section
2. Add independent start price/units inputs
3. Show the estimated data cost before fetching
4. Add a cache status indicator

```python
# In backtests.py

with ui.card().classes("w-full ..."):
    ui.label("Data Source").classes("text-lg font-semibold")

    data_source = ui.select(
        {"databento": "Databento (historical market data)", "yfinance": "Yahoo Finance (free, limited)"},
        value="databento",
        label="Data source",
    ).classes("w-full")

    # Databento-specific settings
    with ui.column().classes("w-full gap-2").bind_visibility_from(data_source, "value", lambda v: v == "databento"):
        ui.label("Dataset configuration").classes("text-sm text-slate-500")

        dataset_select = ui.select(
            {"XNAS.BASIC": "Nasdaq Basic (GLD)", "GLBX.MDP3": "CME Globex (GC=F)"},
            value="XNAS.BASIC",
            label="Dataset",
        ).classes("w-full")

        schema_select = ui.select(
            {"ohlcv-1d": "Daily bars", "ohlcv-1h": "Hourly bars"},
            value="ohlcv-1d",
            label="Resolution",
        ).classes("w-full")

        # Cost estimate
        cost_label = ui.label("Estimated cost: $0.00").classes("text-sm text-slate-500")

        # Cache status
        cache_status = ui.label("").classes("text-xs text-slate-400")

# Independent scenario settings
with ui.card().classes("w-full ..."):
    ui.label("Scenario Configuration").classes("text-lg font-semibold")
    ui.label("Configure start values independent of portfolio settings").classes("text-sm text-slate-500")

    start_price_input = ui.number(
        "Start price",
        value=0.0,
        min=0.0,
        step=0.01,
    ).classes("w-full")
    ui.label("Set to 0 to auto-derive from the first historical close").classes("text-xs text-slate-400 -mt-2")

    underlying_units_input = ui.number(
        "Underlying units",
        value=1000.0,
        min=0.0001,
        step=0.0001,
    ).classes("w-full")

    loan_amount_input = ui.number(
        "Loan amount ($)",
        value=0.0,
        min=0.0,
        step=1000,
    ).classes("w-full")
```

### Phase 5: Scenario Pre-Seeding (DATA-DB-005)

**File:** `app/services/backtesting/scenario_bulk_download.py`

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta
from pathlib import Path

from app.services.backtesting.databento_source import DatabentoCacheKey

try:
    import databento as db

    DATABENTO_AVAILABLE = True
except ImportError:
    DATABENTO_AVAILABLE = False


@dataclass
class ScenarioPreset:
    """Pre-configured scenario ready for backtesting."""

    preset_id: str
    display_name: str
    symbol: str
    dataset: str
    window_start: date
    window_end: date
    default_start_price: float  # First close in the window
    default_templates: tuple[str, ...]
    event_type: str
    tags: tuple[str, ...]
    description: str


def download_historical_presets(
    client: db.Historical,
    presets: list[ScenarioPreset],
    output_dir: Path,
) -> dict[str, Path]:
    """Bulk download historical data for all presets.

    Returns a mapping of preset_id to cached file path.
    """
    results = {}

    for preset in presets:
        cache_key = DatabentoCacheKey(
            dataset=preset.dataset,
            symbol=preset.symbol,
            schema="ohlcv-1d",
            start_date=preset.window_start,
            end_date=preset.window_end,
        )

        cache_file = cache_key.cache_path(output_dir)

        # Download if not cached
        if not cache_file.exists():
            data = client.timeseries.get_range(
                dataset=preset.dataset,
                symbols=preset.symbol,
                schema="ohlcv-1d",
                start=preset.window_start.isoformat(),
                end=(preset.window_end + timedelta(days=1)).isoformat(),  # Exclusive end
            )
            data.to_df().to_parquet(cache_file, index=False)

        results[preset.preset_id] = cache_file

    return results


def create_default_presets() -> list[ScenarioPreset]:
    """Create the default scenario presets for gold hedging research."""
    return [
        ScenarioPreset(
            preset_id="gld-2020-covid-crash",
            display_name="GLD March 2020 COVID Crash",
            symbol="GLD",
            dataset="XNAS.BASIC",
            window_start=date(2020, 2, 15),
            window_end=date(2020, 4, 15),
            default_start_price=143.0,  # Approx. GLD close around 2020-02-15
            default_templates=("protective-put-atm-12m", "protective-put-95pct-12m"),
            event_type="crash",
            tags=("covid", "crash", "high-vol"),
            description="March 2020 COVID market crash - extreme volatility event",
        ),
        ScenarioPreset(
            preset_id="gld-2022-rate-hike-cycle",
            display_name="GLD 2022 Rate Hike Cycle",
            symbol="GLD",
            dataset="XNAS.BASIC",
            window_start=date(2022, 1, 1),
            window_end=date(2022, 12, 31),
            default_start_price=168.0,
            default_templates=("protective-put-atm-12m", "ladder-50-50-atm-95pct-12m"),
            event_type="rate_cycle",
            tags=("rates", "fed", "extended"),
            description="Full year 2022 - aggressive Fed rate hikes",
        ),
        ScenarioPreset(
            preset_id="gcf-2024-rally",
            display_name="GC=F 2024 Gold Rally",
            symbol="GC",
            dataset="GLBX.MDP3",
            window_start=date(2024, 1, 1),
            window_end=date(2024, 12, 31),
            default_start_price=2060.0,
            default_templates=("protective-put-atm-12m",),
            event_type="rally",
            tags=("gold", "futures", "rally"),
            description="Gold futures rally in 2024",
        ),
    ]
```

### Phase 6: Settings Persistence (DATA-DB-002)

**File:** `app/models/backtest_settings_repository.py`

```python
import json
from dataclasses import asdict
from datetime import date
from pathlib import Path
from uuid import UUID, uuid4

from app.models.backtest_settings import BacktestSettings


class BacktestSettingsRepository:
    """Persistence for backtest settings."""

    def __init__(self, base_path: Path | None = None) -> None:
        self.base_path = base_path or Path(".workspaces")

    def _settings_path(self, workspace_id: str) -> Path:
        return self.base_path / workspace_id / "backtest_settings.json"

    def load(self, workspace_id: str) -> BacktestSettings:
        """Load backtest settings, creating defaults if not found."""
        path = self._settings_path(workspace_id)

        if path.exists():
            with open(path) as f:
                data = json.load(f)
            return BacktestSettings(
                settings_id=UUID(data["settings_id"]),
                name=data.get("name", "Default Backtest"),
                data_source=data.get("data_source", "databento"),
                dataset=data.get("dataset", "XNAS.BASIC"),
                schema=data.get("schema", "ohlcv-1d"),
                start_date=date.fromisoformat(data["start_date"]),
                end_date=date.fromisoformat(data["end_date"]),
                underlying_symbol=data.get("underlying_symbol", "GLD"),
                start_price=data.get("start_price", 0.0),
                underlying_units=data.get("underlying_units", 1000.0),
                loan_amount=data.get("loan_amount", 0.0),
                margin_call_ltv=data.get("margin_call_ltv", 0.75),
                template_slugs=tuple(data.get("template_slugs", ("protective-put-atm-12m",))),
                cache_key=data.get("cache_key", ""),
                data_cost_usd=data.get("data_cost_usd", 0.0),
            )

        # Return defaults
        return BacktestSettings(
            settings_id=uuid4(),
            name="Default Backtest",
        )

    def save(self, workspace_id: str, settings: BacktestSettings) -> None:
        """Persist backtest settings."""
        path = self._settings_path(workspace_id)
        path.parent.mkdir(parents=True, exist_ok=True)

        data = asdict(settings)
        data["settings_id"] = str(data["settings_id"])
        data["start_date"] = data["start_date"].isoformat()
        data["end_date"] = data["end_date"].isoformat()
        data["template_slugs"] = list(data["template_slugs"])
        data["provider_ref"] = {
            "provider_id": settings.provider_ref.provider_id,
            "pricing_mode": settings.provider_ref.pricing_mode,
        }

        with open(path, "w") as f:
            json.dump(data, f, indent=2)
```

## Roadmap Items

### DATA-DB-001: Databento Historical Price Source
**Dependencies:** None
**Estimated effort:** 2-3 days
**Deliverables:**
- `app/services/backtesting/databento_source.py`
- `tests/test_databento_source.py` (mocked API)
- Environment variable `DATABENTO_API_KEY` support

### DATA-DB-002: Backtest Settings Model
**Dependencies:** DATA-DB-001
**Estimated effort:** 1 day
**Deliverables:**
- `app/models/backtest_settings.py`
- Repository for persistence

### DATA-DB-003: Cache Management
**Dependencies:** DATA-DB-001
**Estimated effort:** 1 day
**Deliverables:**
- `app/services/backtesting/databento_cache.py`
- Cache cleanup CLI command

### DATA-DB-004: Backtest Page UI Updates
**Dependencies:** DATA-DB-001, DATA-DB-002
**Estimated effort:** 2 days
**Deliverables:**
- Updated `app/pages/backtests.py`
- Updated `app/pages/event_comparison.py`
- Cost estimation display

### DATA-DB-005: Scenario Pre-Seeding
**Dependencies:** DATA-DB-001
**Estimated effort:** 1-2 days
**Deliverables:**
- `app/services/backtesting/scenario_bulk_download.py`
- Pre-configured presets for gold hedging research
- Bulk download script

### DATA-DB-006: Options Data Source (Future)
**Dependencies:** DATA-DB-001
**Estimated effort:** 3-5 days
**Deliverables:**
- `DatabentoOptionSnapshotSource` implementing `OptionSnapshotSource`
- `OPRA.PILLAR` integration for historical options chains

## Configuration

Add to `.env`:
```
DATABENTO_API_KEY=db-xxxxxxxxxxxxxxxxxxxxxxxx
```

Add to `requirements.txt`:
```
databento>=0.30.0
```

Add to `pyproject.toml`:
```toml
[project.optional-dependencies]
databento = ["databento>=0.30.0"]
```

## Testing Strategy

1. **Unit tests** with mocked Databento responses (`tests/test_databento_source.py`)
2. **Integration tests** with recorded VCR cassettes (`tests/cassettes/*.yaml`)
3. **E2E tests** using cached data (`tests/test_backtest_databento_playwright.py`)
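
For the unit-test tier, the key pattern is patching the network fetch and asserting that the cache absorbs repeat requests. A self-contained sketch of that pattern (`TinySource` is a simplified stand-in, not the real class; the real test would patch `_fetch_from_databento` the same way):

```python
from unittest.mock import MagicMock


class TinySource:
    """Stand-in with the same fetch-through-cache shape as the real source."""

    def __init__(self) -> None:
        self._cache: dict[str, list] = {}

    def _fetch(self, symbol: str) -> list:  # network call in the real class
        raise RuntimeError("network disabled in tests")

    def load(self, symbol: str) -> list:
        # Fetch only on a cache miss, exactly like load_daily_closes
        if symbol not in self._cache:
            self._cache[symbol] = self._fetch(symbol)
        return self._cache[symbol]


src = TinySource()
src._fetch = MagicMock(return_value=[("2024-01-02", 185.0)])

assert src.load("GLD") == [("2024-01-02", 185.0)]
assert src.load("GLD") == [("2024-01-02", 185.0)]  # second call served from cache
src._fetch.assert_called_once()  # the "network" was hit exactly once
```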

## Cost Management

- Use `metadata.get_cost()` before fetching to show the estimated cost
- Default to cached data when available
- Batch download large historical ranges (>1 year)
- Consider Databento flat-rate plans for heavy usage

## Security Considerations

- API key stored in an environment variable, never in code
- Cache files contain only market data (no PII)
- Rate limits respected (100 requests/second per IP)

@@ -1,5 +1,5 @@
 version: 1
-updated_at: 2026-03-27
+updated_at: 2026-03-28
 structure:
   backlog_dir: docs/roadmap/backlog
   in_progress_dir: docs/roadmap/in-progress
@@ -13,14 +13,20 @@ notes:
   - Pre-alpha policy: we may cut or replace old features without backward compatibility until alpha is declared.
   - Alpha migration policy: once alpha is declared, compatibility only needs to move forward; backward migrations are not required.
 priority_queue:
+  - DATA-DB-001
+  - DATA-DB-002
+  - DATA-DB-004
   - CONV-001
   - EXEC-002
+  - DATA-DB-003
+  - DATA-DB-005
   - DATA-002A
   - DATA-001A
   - OPS-001
   - BT-003
   - BT-002A
   - GCF-001
+  - DATA-DB-006
 recently_completed:
   - PORTFOLIO-003
   - PORTFOLIO-002
@@ -44,6 +50,12 @@ recently_completed:
   - CORE-002B
 states:
   backlog:
+    - DATA-DB-001
+    - DATA-DB-002
+    - DATA-DB-003
+    - DATA-DB-004
+    - DATA-DB-005
+    - DATA-DB-006
     - CONV-001
     - EXEC-002
     - DATA-002A

@@ -0,0 +1,34 @@
id: DATA-DB-001
title: Databento Historical Price Source
status: backlog
priority: high
dependencies: []
estimated_effort: 2-3 days
created: 2026-03-28
updated: 2026-03-28

description: |
  Integrate the Databento historical API as a data source for the backtesting and
  scenario comparison pages. This replaces yfinance for historical data on those
  pages and provides reliable, high-quality market data.

acceptance_criteria:
  - DatabentoHistoricalPriceSource implements the HistoricalPriceSource protocol
  - Cache layer prevents redundant downloads when parameters are unchanged
  - Environment variable DATABENTO_API_KEY used for authentication
  - Cost estimation available before data fetch
  - GLD symbol resolved to the XNAS.BASIC dataset
  - GC=F symbol resolved to the GLBX.MDP3 dataset
  - Unit tests with mocked Databento responses pass

implementation_notes: |
  Key files:
  - app/services/backtesting/databento_source.py (new)
  - tests/test_databento_source.py (new)

  Uses the ohlcv-1d schema for daily bars. The cache key includes dataset, symbol,
  schema, start_date, and end_date. Cache files are Parquet for fast loading.
  Metadata includes download_date for age validation.

dependencies_detail:
  - None - this is the foundation for the Databento integration

@@ -0,0 +1,39 @@
id: DATA-DB-002
title: Backtest Settings Model
status: backlog
priority: high
dependencies:
  - DATA-DB-001
estimated_effort: 1 day
created: 2026-03-28
updated: 2026-03-28

description: |
  Create a BacktestSettings model that captures user-configurable backtest
  parameters independent of portfolio settings. This allows running scenarios
  with custom start prices and position sizes without modifying the main
  portfolio.

acceptance_criteria:
  - BacktestSettings dataclass defined with all necessary fields
  - start_price can be 0 (auto-derive) or an explicit value
  - underlying_units independent of portfolio.gold_ounces
  - loan_amount and margin_call_ltv for LTV analysis
  - data_source field supports "databento" and "yfinance"
  - Repository persists settings per workspace
  - Default settings created for new workspaces

implementation_notes: |
  Key fields:
  - settings_id: UUID for tracking
  - data_source: "databento" | "yfinance" | "synthetic"
  - dataset: "XNAS.BASIC" | "GLBX.MDP3"
  - underlying_symbol: "GLD" | "GC" | "XAU"
  - start_date, end_date: date range
  - start_price: 0 for auto-derive, or explicit
  - underlying_units: position size for the scenario
  - loan_amount: debt level for LTV analysis

  Settings are stored in .workspaces/{workspace_id}/backtest_settings.json

dependencies_detail:
  - DATA-DB-001: Need data source configuration fields

@@ -0,0 +1,40 @@
id: DATA-DB-003
title: Databento Cache Management
status: backlog
priority: medium
dependencies:
  - DATA-DB-001
estimated_effort: 1 day
created: 2026-03-28
updated: 2026-03-28

description: |
  Implement cache lifecycle management for Databento data. Cache files should be
  invalidated after a configurable age (default 30 days) and when request parameters
  change. Provide a CLI tool for cache inspection and cleanup.

acceptance_criteria:
  - DatabentoCacheManager lists all cached entries
  - Entries invalidated after max_age_days
  - Parameter-change detection triggers re-download
  - Cache size tracking available
  - CLI command to clear all cache
  - CLI command to show cache statistics

implementation_notes: |
  Cache files stored in .cache/databento/:
  - dbn_{hash}.parquet: Data file
  - dbn_{hash}_meta.json: Metadata (download_date, params, rows)

  Cache invalidation rules:
  1. Age > 30 days: re-download
  2. Parameters changed: re-download
  3. File corruption: re-download

  CLI commands:
  - vault-dash cache list
  - vault-dash cache clear
  - vault-dash cache stats

dependencies_detail:
  - DATA-DB-001: Needs DatabentoCacheKey structure
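The three invalidation rules in DATA-DB-003 above reduce to one staleness predicate. This is a sketch under stated assumptions: the metadata keys (`download_date`, `params`) follow the notes above, but the function name and signature are hypothetical.

```python
import json
import os
from datetime import datetime, timedelta


def is_stale(meta_path: str, data_path: str, params: dict, max_age_days: int = 30) -> bool:
    """Return True when a cached Databento entry must be re-downloaded."""
    try:
        with open(meta_path) as fh:
            meta = json.load(fh)
    except (OSError, json.JSONDecodeError):
        return True  # rule 3: metadata missing or corrupt
    downloaded = datetime.fromisoformat(meta["download_date"])
    if datetime.now() - downloaded > timedelta(days=max_age_days):
        return True  # rule 1: older than max_age_days
    if meta.get("params") != params:
        return True  # rule 2: request parameters changed
    return not os.path.exists(data_path)  # rule 3: data file missing
```

A `vault-dash cache list` command could then simply iterate the `_meta.json` files and report this predicate per entry.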
@@ -0,0 +1,50 @@
id: DATA-DB-004
title: Backtest Page UI Updates
status: backlog
priority: high
dependencies:
  - DATA-DB-001
  - DATA-DB-002
estimated_effort: 2 days
created: 2026-03-28
updated: 2026-03-28

description: |
  Update the backtest and event comparison pages to support the Databento data
  source and independent scenario configuration. Show estimated data cost and
  cache status in the UI.

acceptance_criteria:
  - Data source selector shows Databento and yFinance options
  - Databento config shows dataset and resolution dropdowns
  - Dataset selection updates the cost estimate display
  - Cache status shows age of cached data
  - Independent start price input (0 = auto-derive)
  - Independent underlying units and loan amount
  - Event comparison page uses the same data source config
  - Settings persist across sessions

implementation_notes: |
  Page changes:

  Backtests page:
  - Add "Data Source" section with Databento/yFinance toggle
  - Add dataset selector (XNAS.BASIC for GLD, GLBX.MDP3 for GC=F)
  - Add resolution selector (ohlcv-1d, ohlcv-1h)
  - Show estimated cost with refresh button
  - Show cache status (age, size)
  - "Configure Scenario" section with independent start price/units

  Event comparison page:
  - Same data source configuration
  - Preset scenarios show whether data is cached
  - Cost estimate for missing data

  State management:
  - Use workspace-level BacktestSettings
  - Load on page mount, save on change
  - Invalidate cache when params change

dependencies_detail:
  - DATA-DB-001: Need DatabentoHistoricalPriceSource
  - DATA-DB-002: Need BacktestSettings model
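The "load on page mount, save on change" state management in DATA-DB-004 above amounts to a per-workspace JSON round-trip. The helper names below are illustrative; only the `.workspaces/{workspace_id}/backtest_settings.json` path comes from DATA-DB-002.

```python
import json
import os


def settings_path(workspace_id: str) -> str:
    return os.path.join(".workspaces", workspace_id, "backtest_settings.json")


def load_settings(workspace_id: str, defaults: dict) -> dict:
    """On page mount: merge stored settings over defaults; new workspaces get defaults."""
    try:
        with open(settings_path(workspace_id)) as fh:
            return {**defaults, **json.load(fh)}
    except (OSError, json.JSONDecodeError):
        return dict(defaults)


def save_settings(workspace_id: str, settings: dict) -> None:
    """On change: persist immediately so settings survive across sessions."""
    path = settings_path(workspace_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as fh:
        json.dump(settings, fh, indent=2)
```

Merging over defaults also keeps old settings files forward-compatible when new fields are added to the model.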
48
docs/roadmap/backlog/DATA-DB-005-scenario-pre-seeding.yaml
Normal file
@@ -0,0 +1,48 @@
id: DATA-DB-005
title: Scenario Pre-Seeding from Bulk Downloads
status: backlog
priority: medium
dependencies:
  - DATA-DB-001
estimated_effort: 1-2 days
created: 2026-03-28
updated: 2026-03-28

description: |
  Create pre-configured scenario presets for gold hedging research and implement
  a bulk download capability to pre-seed the event comparison pages. This allows
  quick testing against historical events without per-event data fetching.

acceptance_criteria:
  - Default presets include COVID crash, rate hike cycle, and gold rally events
  - Bulk download script fetches all preset data
  - Presets stored in a config file (JSON/YAML)
  - Event comparison page shows preset data availability
  - One-click "Download All Presets" button
  - Progress indicator during bulk download

implementation_notes: |
  Default presets:
  - GLD March 2020 COVID Crash (extreme volatility)
  - GLD 2022 Rate Hike Cycle (full year)
  - GC=F 2024 Gold Rally (futures data)

  Bulk download flow:
  1. Create a batch job for each preset
  2. Show progress per preset
  3. Store in cache directory
  4. Update preset availability status

  Preset format:
  - preset_id: unique identifier
  - display_name: human-readable name
  - symbol: GLD, GC, etc.
  - dataset: Databento dataset
  - window_start/end: date range
  - default_start_price: first close
  - default_templates: hedging strategies
  - event_type: crash, rally, rate_cycle
  - tags: for filtering

dependencies_detail:
  - DATA-DB-001: Needs cache infrastructure
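A preset entry matching the "Preset format" fields in DATA-DB-005 above might look like the following. The IDs, window dates, and template names are illustrative, not the shipped defaults.

```python
# One hypothetical preset entry; the config file would hold a list of these.
PRESETS = [
    {
        "preset_id": "gld-covid-crash-2020",
        "display_name": "GLD March 2020 COVID Crash",
        "symbol": "GLD",
        "dataset": "XNAS.BASIC",
        "window_start": "2020-02-14",
        "window_end": "2020-04-15",
        "default_start_price": 0,  # 0 = use the first close in the window
        "default_templates": ["protective_put", "collar"],
        "event_type": "crash",
        "tags": ["volatility", "gold"],
    },
]


def presets_by_tag(tag: str) -> list[dict]:
    """Tag-based filtering for the event comparison page."""
    return [p for p in PRESETS if tag in p["tags"]]
```

Keeping presets as plain dicts keeps the JSON/YAML config file and the in-memory representation identical.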
@@ -0,0 +1,46 @@
id: DATA-DB-006
title: Databento Options Data Source
status: backlog
priority: low
dependencies:
  - DATA-DB-001
estimated_effort: 3-5 days
created: 2026-03-28
updated: 2026-03-28

description: |
  Implement a historical options data source using Databento's OPRA.PILLAR dataset.
  This enables historical options chain lookups for accurate backtesting with
  real options prices, replacing synthetic Black-Scholes pricing.

acceptance_criteria:
  - DatabentoOptionSnapshotSource implements the OptionSnapshotSource protocol
  - OPRA.PILLAR dataset used for GLD/SPY options
  - Option chain lookup by snapshot_date and symbol
  - Strike and expiry filtering supported
  - Cached per date for efficiency
  - Fallback to synthetic pricing when data is unavailable

implementation_notes: |
  OPRA.PILLAR provides consolidated options data from all US options exchanges.

  Key challenges:
  1. OPRA data volume is large - efficient caching is needed
  2. Option symbology differs from regular equity symbols
  3. Strike/expiry must be resolved through symbology

  Implementation approach:
  - Use the 'definition' schema to get instrument metadata
  - Use 'trades' or 'ohlcv-1d' for price history
  - Cache per (symbol, expiration, strike, option_type, date)
  - Use continuous contracts for futures options (GC=F)

  Symbology:
  - GLD options: Use underlying symbol "GLD" with OPRA
  - GC options: Use parent symbology "GC" for continuous contracts

  This is a future enhancement - not required for initial backtesting,
  which uses synthetic Black-Scholes pricing.

dependencies_detail:
  - DATA-DB-001: Needs base cache infrastructure
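The symbology challenge flagged in DATA-DB-006 above is concrete: US listed options use 21-character OSI codes rather than plain tickers. The sketch below builds one from (root, expiry, type, strike); whether Databento's OPRA.PILLAR symbology expects exactly this form is an assumption to verify against its documentation.

```python
from datetime import date


def osi_symbol(root: str, expiry: date, option_type: str, strike: float) -> str:
    """Build an OSI-style option code: root padded to 6 chars, expiry as
    YYMMDD, C/P flag, and strike * 1000 zero-padded to 8 digits."""
    assert option_type in ("C", "P")
    return (
        f"{root:<6}"
        f"{expiry:%y%m%d}"
        f"{option_type}"
        f"{int(round(strike * 1000)):08d}"
    )
```

For example, a GLD 180 call expiring 2024-01-19 encodes as `GLD   240119C00180000`, which is the kind of identifier the strike/expiry filtering step would need to construct and parse.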