docs: add Databento integration plan and roadmap items

Bu5hm4nn
2026-03-29 09:52:06 +02:00
parent 8079ca58e7
commit c02159481d
8 changed files with 1050 additions and 1 deletions


@@ -0,0 +1,780 @@
# Databento Historical Data Integration Plan
## Overview
Integrate the Databento historical API into the backtesting and scenario comparison pages, replacing yfinance for historical data there. The integration supports configurable start prices/values independent of portfolio settings, with intelligent caching to avoid redundant downloads.
## Architecture
### Current State
- **Backtest page** (`app/pages/backtests.py`): Uses `YFinanceHistoricalPriceSource` via `BacktestPageService`
- **Event comparison** (`app/pages/event_comparison.py`): Uses seeded event presets with yfinance data
- **Historical provider** (`app/services/backtesting/historical_provider.py`): Protocol-based architecture with `YFinanceHistoricalPriceSource` and `SyntheticHistoricalProvider`
### Target State
- Add `DatabentoHistoricalPriceSource` implementing `HistoricalPriceSource` protocol
- Add `DatabentoOptionSnapshotSource` implementing `OptionSnapshotSource` protocol (future)
- Smart caching layer: only re-download when parameters change
- Pre-seeded scenario data via batch downloads
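The protocol-based target state can be sketched as follows. The `DailyClosePoint` fields and the `load_daily_closes` method name follow the code later in this plan; the actual definitions in `historical_provider.py` may differ in detail:

```python
from dataclasses import dataclass
from datetime import date
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class DailyClosePoint:
    """One daily closing price (assumed shape)."""
    date: date
    close: float


@runtime_checkable
class HistoricalPriceSource(Protocol):
    """Any source of daily closes can back the backtester."""

    def load_daily_closes(
        self, symbol: str, start_date: date, end_date: date
    ) -> list[DailyClosePoint]: ...


class FixedPriceSource:
    """Trivial source that satisfies the protocol structurally."""

    def load_daily_closes(self, symbol, start_date, end_date):
        return [DailyClosePoint(date=start_date, close=100.0)]
```

Because `HistoricalPriceSource` is a structural protocol, `DatabentoHistoricalPriceSource` only needs to provide the same method signature; no inheritance is strictly required.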
## Databento Data Sources
### Underlyings and Datasets
| Instrument | Dataset | Symbol Format | Notes |
|------------|---------|----------------|-------|
| GLD ETF | `XNAS.BASIC` or `EQUS.PLUS` | `GLD` | US equities consolidated |
| GC=F Futures | `GLBX.MDP3` | `GC` (parent/continuous symbology) | Gold futures; `GC=F` is the Yahoo alias |
| Gold Options | `OPRA.PILLAR` | `GLD` underlying | Options on GLD ETF |
### Schemas
| Schema | Use Case | Fields |
|--------|----------|--------|
| `ohlcv-1d` | Daily backtesting | open, high, low, close, volume |
| `ohlcv-1h` | Intraday scenarios | Hourly bars |
| `trades` | Tick-level analysis | Full trade data |
| `definition` | Instrument metadata | Expiries, strike prices, tick sizes |
## Implementation Plan
### Phase 1: Historical Price Source (DATA-DB-001)
**File:** `app/services/backtesting/databento_source.py`
```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass
from datetime import date, timedelta
from pathlib import Path

import pandas as pd

from app.services.backtesting.historical_provider import DailyClosePoint, HistoricalPriceSource

try:
    import databento as db
    DATABENTO_AVAILABLE = True
except ImportError:
    DATABENTO_AVAILABLE = False


@dataclass(frozen=True)
class DatabentoCacheKey:
    """Cache key for Databento data requests."""

    dataset: str
    symbol: str
    schema: str
    start_date: date
    end_date: date

    def _key_hash(self) -> str:
        key_str = f"{self.dataset}_{self.symbol}_{self.schema}_{self.start_date}_{self.end_date}"
        return hashlib.sha256(key_str.encode()).hexdigest()[:16]

    def cache_path(self, cache_dir: Path) -> Path:
        return cache_dir / f"dbn_{self._key_hash()}.parquet"

    def metadata_path(self, cache_dir: Path) -> Path:
        return cache_dir / f"dbn_{self._key_hash()}_meta.json"


@dataclass
class DatabentoSourceConfig:
    """Configuration for the Databento data source."""

    api_key: str | None = None  # Falls back to DATABENTO_API_KEY env var
    cache_dir: Path = Path(".cache/databento")
    dataset: str = "XNAS.BASIC"
    schema: str = "ohlcv-1d"
    stype_in: str = "raw_symbol"
    # Re-download threshold
    max_cache_age_days: int = 30


class DatabentoHistoricalPriceSource(HistoricalPriceSource):
    """Databento-based historical price source for backtesting."""

    def __init__(self, config: DatabentoSourceConfig | None = None) -> None:
        if not DATABENTO_AVAILABLE:
            raise RuntimeError("databento package required: pip install databento")
        self.config = config or DatabentoSourceConfig()
        self.config.cache_dir.mkdir(parents=True, exist_ok=True)
        self._client: db.Historical | None = None

    @property
    def client(self) -> db.Historical:
        if self._client is None:
            self._client = db.Historical(key=self.config.api_key)
        return self._client

    def _load_from_cache(self, key: DatabentoCacheKey) -> list[DailyClosePoint] | None:
        """Load cached data if available and fresh."""
        cache_file = key.cache_path(self.config.cache_dir)
        meta_file = key.metadata_path(self.config.cache_dir)
        if not cache_file.exists() or not meta_file.exists():
            return None
        try:
            with open(meta_file) as f:
                meta = json.load(f)
            # Check cache age
            download_date = date.fromisoformat(meta["download_date"])
            if (date.today() - download_date).days > self.config.max_cache_age_days:
                return None
            # Check parameters match
            if meta["dataset"] != key.dataset or meta["symbol"] != key.symbol:
                return None
            # Load parquet and convert
            return self._df_to_daily_points(pd.read_parquet(cache_file))
        except Exception:
            return None

    def _save_to_cache(self, key: DatabentoCacheKey, df: pd.DataFrame) -> None:
        """Save data to cache."""
        df.to_parquet(key.cache_path(self.config.cache_dir), index=False)
        meta = {
            "download_date": date.today().isoformat(),
            "dataset": key.dataset,
            "symbol": key.symbol,
            "schema": key.schema,
            "start_date": key.start_date.isoformat(),
            "end_date": key.end_date.isoformat(),
            "rows": len(df),
        }
        with open(key.metadata_path(self.config.cache_dir), "w") as f:
            json.dump(meta, f, indent=2)

    def _fetch_from_databento(self, key: DatabentoCacheKey) -> pd.DataFrame:
        """Fetch data from the Databento API."""
        data = self.client.timeseries.get_range(
            dataset=key.dataset,
            symbols=key.symbol,
            schema=key.schema,
            start=key.start_date.isoformat(),
            end=(key.end_date + timedelta(days=1)).isoformat(),  # Exclusive end
            stype_in=self.config.stype_in,
        )
        return data.to_df()

    def _df_to_daily_points(self, df: pd.DataFrame) -> list[DailyClosePoint]:
        """Convert a DataFrame to a sorted DailyClosePoint list."""
        points = []
        for idx, row in df.iterrows():
            # Databento ohlcv schemas carry the bar timestamp in ts_event
            ts = row.get("ts_event", row.get("ts_recv", idx))
            if hasattr(ts, "date"):
                row_date = ts.date()
            else:
                row_date = date.fromisoformat(str(ts)[:10])
            # DBNStore.to_df() returns float-scaled prices by default
            # (price_type="float"); only raw int64 fixed-point prices
            # (units of 1e-9) would need dividing by 1e9.
            points.append(DailyClosePoint(date=row_date, close=float(row["close"])))
        return sorted(points, key=lambda p: p.date)

    def load_daily_closes(self, symbol: str, start_date: date, end_date: date) -> list[DailyClosePoint]:
        """Load daily closing prices from Databento (with caching)."""
        key = DatabentoCacheKey(
            dataset=self._resolve_dataset(symbol),
            symbol=self._resolve_symbol(symbol),
            schema=self.config.schema,
            start_date=start_date,
            end_date=end_date,
        )
        # Try cache first
        cached = self._load_from_cache(key)
        if cached is not None:
            return cached
        # Fetch from Databento and cache the result
        df = self._fetch_from_databento(key)
        self._save_to_cache(key, df)
        return self._df_to_daily_points(df)

    def _resolve_dataset(self, symbol: str) -> str:
        """Resolve a symbol to its Databento dataset."""
        symbol_upper = symbol.upper()
        if symbol_upper in ("GLD", "GLDM", "IAU"):
            return "XNAS.BASIC"  # ETFs on Nasdaq
        if symbol_upper in ("GC=F", "GC", "GOLD"):
            return "GLBX.MDP3"  # CME gold futures
        if symbol_upper == "XAU":
            return "XNAS.BASIC"  # Treat as GLD proxy
        return self.config.dataset  # Use configured default

    def _resolve_symbol(self, symbol: str) -> str:
        """Resolve a vault-dash symbol to a Databento symbol."""
        symbol_upper = symbol.upper()
        if symbol_upper == "XAU":
            return "GLD"  # Proxy XAU via GLD prices
        if symbol_upper == "GC=F":
            return "GC"  # Use parent symbol for continuous contracts
        return symbol_upper

    def get_cost_estimate(self, symbol: str, start_date: date, end_date: date) -> float:
        """Estimate the cost in USD of a data request."""
        try:
            return self.client.metadata.get_cost(
                dataset=self._resolve_dataset(symbol),
                symbols=self._resolve_symbol(symbol),
                schema=self.config.schema,
                start=start_date.isoformat(),
                end=(end_date + timedelta(days=1)).isoformat(),
            )
        except Exception:
            return 0.0  # Treat cost-estimation failures as "unknown"


class DatabentoBacktestProvider:
    """Databento-backed historical provider for synthetic backtesting."""

    provider_id = "databento_v1"
    pricing_mode = "synthetic_bs_mid"

    def __init__(
        self,
        price_source: DatabentoHistoricalPriceSource,
        implied_volatility: float = 0.16,
        risk_free_rate: float = 0.045,
    ) -> None:
        self.price_source = price_source
        self.implied_volatility = implied_volatility
        self.risk_free_rate = risk_free_rate

    def load_history(self, symbol: str, start_date: date, end_date: date) -> list[DailyClosePoint]:
        return self.price_source.load_daily_closes(symbol, start_date, end_date)

    # ... rest delegates to SyntheticHistoricalProvider logic
```
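The cache-key scheme is deterministic: identical request parameters always hash to the same file path, which is what lets repeat runs skip the download. A standalone sketch of just that hashing logic (mirroring `DatabentoCacheKey.cache_path`, same field order):

```python
import hashlib
from datetime import date
from pathlib import Path


def cache_file_for(dataset: str, symbol: str, schema: str,
                   start: date, end: date, cache_dir: Path) -> Path:
    """Hash the request parameters into a stable parquet file path."""
    key_str = f"{dataset}_{symbol}_{schema}_{start}_{end}"
    key_hash = hashlib.sha256(key_str.encode()).hexdigest()[:16]
    return cache_dir / f"dbn_{key_hash}.parquet"


cache_dir = Path(".cache/databento")
# Same parameters -> same path; any parameter change -> different path
a = cache_file_for("XNAS.BASIC", "GLD", "ohlcv-1d", date(2024, 1, 1), date(2024, 12, 31), cache_dir)
b = cache_file_for("XNAS.BASIC", "GLD", "ohlcv-1d", date(2024, 1, 1), date(2024, 12, 31), cache_dir)
c = cache_file_for("XNAS.BASIC", "GLD", "ohlcv-1h", date(2024, 1, 1), date(2024, 12, 31), cache_dir)
```

Truncating the SHA-256 digest to 16 hex characters keeps filenames short while leaving collisions practically impossible at this cache size.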
### Phase 2: Backtest Settings Model (DATA-DB-002)
**File:** `app/models/backtest_settings.py`
```python
from dataclasses import dataclass, field
from datetime import date
from uuid import UUID

from app.models.backtest import ProviderRef


@dataclass(frozen=True)
class BacktestSettings:
    """User-configurable backtest settings (independent of portfolio)."""

    # Scenario identification
    settings_id: UUID
    name: str
    # Data source configuration
    data_source: str = "databento"  # "databento", "yfinance", "synthetic"
    dataset: str = "XNAS.BASIC"
    schema: str = "ohlcv-1d"
    # Date range
    start_date: date = date(2024, 1, 1)
    end_date: date = date(2024, 12, 31)
    # Independent scenario configuration (not derived from portfolio)
    underlying_symbol: str = "GLD"
    start_price: float = 0.0  # 0 = auto-derive from first close
    underlying_units: float = 1000.0  # Independent of portfolio
    loan_amount: float = 0.0  # Debt position for LTV analysis
    margin_call_ltv: float = 0.75
    # Templates to test
    template_slugs: tuple[str, ...] = field(default_factory=lambda: ("protective-put-atm-12m",))
    # Provider reference
    provider_ref: ProviderRef = field(default_factory=lambda: ProviderRef(
        provider_id="databento_v1",
        pricing_mode="synthetic_bs_mid",
    ))
    # Cache metadata
    cache_key: str = ""  # Populated when data is fetched
    data_cost_usd: float = 0.0  # Cost of last data fetch
```
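The `start_price = 0` convention ("auto-derive from first close") implies a small resolution step when a backtest run starts. A hedged sketch of that step — the function name is illustrative, not taken from the codebase:

```python
def resolve_start_price(configured: float, closes: list[float]) -> float:
    """Return the configured start price, or the first historical close
    when the setting is 0 (the auto-derive sentinel)."""
    if configured > 0:
        return configured
    if not closes:
        raise ValueError("cannot auto-derive start price from empty history")
    return closes[0]
```

Raising on empty history (rather than silently returning 0) keeps a misconfigured date range from producing a degenerate scenario.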
### Phase 3: Cache Management (DATA-DB-003)
**File:** `app/services/backtesting/databento_cache.py`
```python
from __future__ import annotations

import json
from dataclasses import dataclass
from datetime import date, timedelta
from pathlib import Path

from app.services.backtesting.databento_source import DatabentoCacheKey


@dataclass
class CacheEntry:
    """Metadata for a cached Databento dataset."""

    cache_key: DatabentoCacheKey
    file_path: Path
    download_date: date
    size_bytes: int
    cost_usd: float


class DatabentoCacheManager:
    """Manages the Databento data cache lifecycle."""

    def __init__(self, cache_dir: Path = Path(".cache/databento")) -> None:
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def list_entries(self) -> list[CacheEntry]:
        """List all cached entries."""
        entries = []
        for meta_file in self.cache_dir.glob("*_meta.json"):
            with open(meta_file) as f:
                meta = json.load(f)
            cache_file = meta_file.with_name(meta_file.stem.replace("_meta", "") + ".parquet")
            if cache_file.exists():
                entries.append(CacheEntry(
                    cache_key=DatabentoCacheKey(
                        dataset=meta["dataset"],
                        symbol=meta["symbol"],
                        schema=meta["schema"],
                        start_date=date.fromisoformat(meta["start_date"]),
                        end_date=date.fromisoformat(meta["end_date"]),
                    ),
                    file_path=cache_file,
                    download_date=date.fromisoformat(meta["download_date"]),
                    size_bytes=cache_file.stat().st_size,
                    cost_usd=0.0,  # Would need to be tracked separately
                ))
        return entries

    def invalidate_expired(self, max_age_days: int = 30) -> list[Path]:
        """Remove cache entries older than max_age_days."""
        removed = []
        cutoff = date.today() - timedelta(days=max_age_days)
        for entry in self.list_entries():
            if entry.download_date < cutoff:
                entry.file_path.unlink(missing_ok=True)
                meta_file = entry.file_path.with_name(entry.file_path.stem + "_meta.json")
                meta_file.unlink(missing_ok=True)
                removed.append(entry.file_path)
        return removed

    def clear_all(self) -> int:
        """Delete all cached files, returning the number removed."""
        count = 0
        for file in self.cache_dir.glob("*"):
            if file.is_file():
                file.unlink()
                count += 1
        return count

    def get_cache_size(self) -> int:
        """Total cache size in bytes."""
        return sum(f.stat().st_size for f in self.cache_dir.glob("*") if f.is_file())

    def should_redownload(self, key: DatabentoCacheKey, params_changed: bool,
                          max_age_days: int = 30) -> bool:
        """Determine whether data should be re-downloaded."""
        if params_changed:
            return True
        cache_file = key.cache_path(self.cache_dir)
        meta_file = key.metadata_path(self.cache_dir)
        if not cache_file.exists() or not meta_file.exists():
            return True
        try:
            with open(meta_file) as f:
                meta = json.load(f)
            download_date = date.fromisoformat(meta["download_date"])
            return (date.today() - download_date).days > max_age_days
        except Exception:
            return True
```
### Phase 4: Backtest Page UI Updates (DATA-DB-004)
**Key changes to `app/pages/backtests.py`:**
1. Add Databento configuration section
2. Add independent start price/units inputs
3. Show estimated data cost before fetching
4. Cache status indicator
```python
# In backtests.py
with ui.card().classes("w-full ..."):
    ui.label("Data Source").classes("text-lg font-semibold")
    data_source = ui.select(
        {"databento": "Databento (historical market data)", "yfinance": "Yahoo Finance (free, limited)"},
        value="databento",
        label="Data source",
    ).classes("w-full")
    # Databento-specific settings
    with ui.column().classes("w-full gap-2").bind_visibility_from(data_source, "value", lambda v: v == "databento"):
        ui.label("Dataset configuration").classes("text-sm text-slate-500")
        dataset_select = ui.select(
            {"XNAS.BASIC": "Nasdaq Basic (GLD)", "GLBX.MDP3": "CME Globex (GC=F)"},
            value="XNAS.BASIC",
            label="Dataset",
        ).classes("w-full")
        schema_select = ui.select(
            {"ohlcv-1d": "Daily bars", "ohlcv-1h": "Hourly bars"},
            value="ohlcv-1d",
            label="Resolution",
        ).classes("w-full")
        # Cost estimate
        cost_label = ui.label("Estimated cost: $0.00").classes("text-sm text-slate-500")
        # Cache status
        cache_status = ui.label("").classes("text-xs text-slate-400")

# Independent scenario settings
with ui.card().classes("w-full ..."):
    ui.label("Scenario Configuration").classes("text-lg font-semibold")
    ui.label("Configure start values independent of portfolio settings").classes("text-sm text-slate-500")
    start_price_input = ui.number(
        "Start price",
        value=0.0,
        min=0.0,
        step=0.01,
    ).classes("w-full")
    ui.label("Set to 0 to auto-derive from first historical close").classes("text-xs text-slate-400 -mt-2")
    underlying_units_input = ui.number(
        "Underlying units",
        value=1000.0,
        min=0.0001,
        step=0.0001,
    ).classes("w-full")
    loan_amount_input = ui.number(
        "Loan amount ($)",
        value=0.0,
        min=0.0,
        step=1000,
    ).classes("w-full")
```
### Phase 5: Scenario Pre-Seeding (DATA-DB-005)
**File:** `app/services/backtesting/scenario_bulk_download.py`
```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from pathlib import Path

from app.services.backtesting.databento_source import DatabentoCacheKey

try:
    import databento as db
    DATABENTO_AVAILABLE = True
except ImportError:
    DATABENTO_AVAILABLE = False


@dataclass
class ScenarioPreset:
    """Pre-configured scenario ready for backtesting."""

    preset_id: str
    display_name: str
    symbol: str
    dataset: str
    window_start: date
    window_end: date
    default_start_price: float  # First close in window
    default_templates: tuple[str, ...]
    event_type: str
    tags: tuple[str, ...]
    description: str


def download_historical_presets(
    client: db.Historical,
    presets: list[ScenarioPreset],
    output_dir: Path,
) -> dict[str, Path]:
    """Bulk download historical data for all presets.

    Returns a mapping of preset_id to cached file path.
    """
    results = {}
    for preset in presets:
        cache_key = DatabentoCacheKey(
            dataset=preset.dataset,
            symbol=preset.symbol,
            schema="ohlcv-1d",
            start_date=preset.window_start,
            end_date=preset.window_end,
        )
        cache_file = cache_key.cache_path(output_dir)
        # Download only if not already cached
        if not cache_file.exists():
            data = client.timeseries.get_range(
                dataset=preset.dataset,
                symbols=preset.symbol,
                schema="ohlcv-1d",
                start=preset.window_start.isoformat(),
                end=preset.window_end.isoformat(),
            )
            data.to_parquet(cache_file)
        results[preset.preset_id] = cache_file
    return results


def create_default_presets() -> list[ScenarioPreset]:
    """Create default scenario presets for gold hedging research."""
    return [
        ScenarioPreset(
            preset_id="gld-2020-covid-crash",
            display_name="GLD March 2020 COVID Crash",
            symbol="GLD",
            dataset="XNAS.BASIC",
            window_start=date(2020, 2, 15),
            window_end=date(2020, 4, 15),
            default_start_price=143.0,  # Approx GLD close on 2020-02-15
            default_templates=("protective-put-atm-12m", "protective-put-95pct-12m"),
            event_type="crash",
            tags=("covid", "crash", "high-vol"),
            description="March 2020 COVID market crash - extreme volatility event",
        ),
        ScenarioPreset(
            preset_id="gld-2022-rate-hike-cycle",
            display_name="GLD 2022 Rate Hike Cycle",
            symbol="GLD",
            dataset="XNAS.BASIC",
            window_start=date(2022, 1, 1),
            window_end=date(2022, 12, 31),
            default_start_price=168.0,
            default_templates=("protective-put-atm-12m", "ladder-50-50-atm-95pct-12m"),
            event_type="rate_cycle",
            tags=("rates", "fed", "extended"),
            description="Full year 2022 - aggressive Fed rate hikes",
        ),
        ScenarioPreset(
            preset_id="gcf-2024-rally",
            display_name="GC=F 2024 Gold Rally",
            symbol="GC",
            dataset="GLBX.MDP3",
            window_start=date(2024, 1, 1),
            window_end=date(2024, 12, 31),
            default_start_price=2060.0,
            default_templates=("protective-put-atm-12m",),
            event_type="rally",
            tags=("gold", "futures", "rally"),
            description="Gold futures rally in 2024",
        ),
    ]
```
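Since presets are plain data, persisting them to the config file mentioned in DATA-DB-005 is a straightforward JSON round-trip; only the `date` fields need explicit encoding. A sketch using a trimmed-down preset (the field subset is illustrative):

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class MiniPreset:
    """Subset of ScenarioPreset fields, enough to show the round-trip."""
    preset_id: str
    symbol: str
    window_start: date
    window_end: date


def preset_to_json(p: MiniPreset) -> str:
    d = asdict(p)
    d["window_start"] = p.window_start.isoformat()  # date -> ISO string
    d["window_end"] = p.window_end.isoformat()
    return json.dumps(d)


def preset_from_json(s: str) -> MiniPreset:
    d = json.loads(s)
    return MiniPreset(
        preset_id=d["preset_id"],
        symbol=d["symbol"],
        window_start=date.fromisoformat(d["window_start"]),
        window_end=date.fromisoformat(d["window_end"]),
    )
```

The same two conversions (ISO strings out, `date.fromisoformat` back in) apply to the full `ScenarioPreset` and to `BacktestSettings` in Phase 6.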
### Phase 6: Settings Persistence (DATA-DB-002)
**File:** `app/models/backtest_settings_repository.py`
```python
import json
from dataclasses import asdict
from datetime import date
from pathlib import Path
from uuid import UUID, uuid4

from app.models.backtest_settings import BacktestSettings


class BacktestSettingsRepository:
    """Persistence for backtest settings."""

    def __init__(self, base_path: Path | None = None) -> None:
        self.base_path = base_path or Path(".workspaces")

    def _settings_path(self, workspace_id: str) -> Path:
        return self.base_path / workspace_id / "backtest_settings.json"

    def load(self, workspace_id: str) -> BacktestSettings:
        """Load backtest settings, creating defaults if not found."""
        path = self._settings_path(workspace_id)
        if path.exists():
            with open(path) as f:
                data = json.load(f)
            return BacktestSettings(
                settings_id=UUID(data["settings_id"]),
                name=data.get("name", "Default Backtest"),
                data_source=data.get("data_source", "databento"),
                dataset=data.get("dataset", "XNAS.BASIC"),
                schema=data.get("schema", "ohlcv-1d"),
                start_date=date.fromisoformat(data["start_date"]),
                end_date=date.fromisoformat(data["end_date"]),
                underlying_symbol=data.get("underlying_symbol", "GLD"),
                start_price=data.get("start_price", 0.0),
                underlying_units=data.get("underlying_units", 1000.0),
                loan_amount=data.get("loan_amount", 0.0),
                margin_call_ltv=data.get("margin_call_ltv", 0.75),
                template_slugs=tuple(data.get("template_slugs", ("protective-put-atm-12m",))),
                cache_key=data.get("cache_key", ""),
                data_cost_usd=data.get("data_cost_usd", 0.0),
            )
        # Fall back to defaults for a new workspace
        return BacktestSettings(
            settings_id=uuid4(),
            name="Default Backtest",
        )

    def save(self, workspace_id: str, settings: BacktestSettings) -> None:
        """Persist backtest settings."""
        path = self._settings_path(workspace_id)
        path.parent.mkdir(parents=True, exist_ok=True)
        data = asdict(settings)
        # Convert non-JSON-serializable fields
        data["settings_id"] = str(data["settings_id"])
        data["start_date"] = data["start_date"].isoformat()
        data["end_date"] = data["end_date"].isoformat()
        data["template_slugs"] = list(data["template_slugs"])
        data["provider_ref"] = {
            "provider_id": settings.provider_ref.provider_id,
            "pricing_mode": settings.provider_ref.pricing_mode,
        }
        with open(path, "w") as f:
            json.dump(data, f, indent=2)
```
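The repository's main subtlety is converting non-JSON types (UUID, date, tuple) symmetrically on save and load. A minimal round-trip sketch with stand-in fields — function names and the directory layout mirror the repository above but are simplified:

```python
import json
from datetime import date
from pathlib import Path
from tempfile import TemporaryDirectory
from uuid import UUID, uuid4


def save_settings(path: Path, settings_id: UUID, start: date, slugs: tuple[str, ...]) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    data = {
        "settings_id": str(settings_id),   # UUID -> str
        "start_date": start.isoformat(),   # date -> ISO string
        "template_slugs": list(slugs),     # tuple -> list
    }
    path.write_text(json.dumps(data, indent=2))


def load_settings(path: Path) -> tuple[UUID, date, tuple[str, ...]]:
    data = json.loads(path.read_text())
    return (
        UUID(data["settings_id"]),
        date.fromisoformat(data["start_date"]),
        tuple(data["template_slugs"]),
    )
```

A round trip through a temporary workspace directory recovers the original values exactly, which is the property the repository's unit tests should assert.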
## Roadmap Items
### DATA-DB-001: Databento Historical Price Source
**Dependencies:** None
**Estimated effort:** 2-3 days
**Deliverables:**
- `app/services/backtesting/databento_source.py`
- `tests/test_databento_source.py` (mocked API)
- Environment variable `DATABENTO_API_KEY` support
### DATA-DB-002: Backtest Settings Model
**Dependencies:** DATA-DB-001 (data source configuration fields)
**Estimated effort:** 1 day
**Deliverables:**
- `app/models/backtest_settings.py`
- Repository for persistence
### DATA-DB-003: Cache Management
**Dependencies:** DATA-DB-001
**Estimated effort:** 1 day
**Deliverables:**
- `app/services/backtesting/databento_cache.py`
- Cache cleanup CLI command
### DATA-DB-004: Backtest Page UI Updates
**Dependencies:** DATA-DB-001, DATA-DB-002
**Estimated effort:** 2 days
**Deliverables:**
- Updated `app/pages/backtests.py`
- Updated `app/pages/event_comparison.py`
- Cost estimation display
### DATA-DB-005: Scenario Pre-Seeding
**Dependencies:** DATA-DB-001
**Estimated effort:** 1-2 days
**Deliverables:**
- `app/services/backtesting/scenario_bulk_download.py`
- Pre-configured presets for gold hedging research
- Bulk download script
### DATA-DB-006: Options Data Source (Future)
**Dependencies:** DATA-DB-001
**Estimated effort:** 3-5 days
**Deliverables:**
- `DatabentoOptionSnapshotSource` implementing `OptionSnapshotSource`
- OPRA.PILLAR integration for historical options chains
## Configuration
Add to `.env`:
```
DATABENTO_API_KEY=db-xxxxxxxxxxxxxxxxxxxxxxxx
```
Add to `requirements.txt`:
```
databento>=0.30.0
```
Add to `pyproject.toml`:
```toml
[project.optional-dependencies]
databento = ["databento>=0.30.0"]
```
## Testing Strategy
1. **Unit tests** with mocked Databento responses (`tests/test_databento_source.py`)
2. **Integration tests** with recorded VCR cassettes (`tests/cassettes/*.yaml`)
3. **E2E tests** using cached data (`tests/test_backtest_databento_playwright.py`)
## Cost Management
- Use `metadata.get_cost()` before fetching to show estimated cost
- Default to cached data when available
- Batch download for large historical ranges (>1 year)
- Consider Databento flat rate plans for heavy usage
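One simple way to enforce "estimate before fetching" is a gate that compares the `metadata.get_cost()` estimate against a budget the user has confirmed. The function name and default threshold here are illustrative, not part of the plan:

```python
def should_fetch(estimated_cost_usd: float, cached_available: bool,
                 budget_usd: float = 5.0) -> bool:
    """Prefer cached data; otherwise fetch only while under budget."""
    if cached_available:
        return False  # default to cached data when available
    return estimated_cost_usd <= budget_usd
```

The UI would surface the estimate and let the user raise `budget_usd` explicitly, rather than failing silently.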
## Security Considerations
- API key stored in environment variable, never in code
- Cache files contain only market data (no PII)
- Rate limiting respected (100 requests/second per IP)


@@ -1,5 +1,5 @@
version: 1
-updated_at: 2026-03-27
+updated_at: 2026-03-28
structure:
  backlog_dir: docs/roadmap/backlog
  in_progress_dir: docs/roadmap/in-progress
@@ -13,14 +13,20 @@ notes:
  - Pre-alpha policy: we may cut or replace old features without backward compatibility until alpha is declared.
  - Alpha migration policy: once alpha is declared, compatibility only needs to move forward; backward migrations are not required.
priority_queue:
+  - DATA-DB-001
+  - DATA-DB-002
+  - DATA-DB-004
  - CONV-001
  - EXEC-002
+  - DATA-DB-003
+  - DATA-DB-005
  - DATA-002A
  - DATA-001A
  - OPS-001
  - BT-003
  - BT-002A
  - GCF-001
+  - DATA-DB-006
recently_completed:
  - PORTFOLIO-003
  - PORTFOLIO-002
@@ -44,6 +50,12 @@ recently_completed:
  - CORE-002B
states:
  backlog:
+    - DATA-DB-001
+    - DATA-DB-002
+    - DATA-DB-003
+    - DATA-DB-004
+    - DATA-DB-005
+    - DATA-DB-006
    - CONV-001
    - EXEC-002
    - DATA-002A


@@ -0,0 +1,34 @@
id: DATA-DB-001
title: Databento Historical Price Source
status: backlog
priority: high
dependencies: []
estimated_effort: 2-3 days
created: 2026-03-28
updated: 2026-03-28
description: |
  Integrate Databento historical API as a data source for backtesting and scenario
  comparison pages. This replaces yfinance for historical data on backtest pages
  and provides reliable, high-quality market data.
acceptance_criteria:
  - DatabentoHistoricalPriceSource implements HistoricalPriceSource protocol
  - Cache layer prevents redundant downloads when parameters unchanged
  - Environment variable DATABENTO_API_KEY used for authentication
  - Cost estimation available before data fetch
  - GLD symbol resolved to XNAS.BASIC dataset
  - GC=F symbol resolved to GLBX.MDP3 dataset
  - Unit tests with mocked Databento responses pass
implementation_notes: |
  Key files:
  - app/services/backtesting/databento_source.py (new)
  - tests/test_databento_source.py (new)
  Uses ohlcv-1d schema for daily bars. The cache key includes dataset, symbol,
  schema, start_date, and end_date. Cache files are Parquet format for fast
  loading. Metadata includes download_date for age validation.
dependencies_detail:
  - None - this is the foundation for Databento integration


@@ -0,0 +1,39 @@
id: DATA-DB-002
title: Backtest Settings Model
status: backlog
priority: high
dependencies:
  - DATA-DB-001
estimated_effort: 1 day
created: 2026-03-28
updated: 2026-03-28
description: |
  Create BacktestSettings model that captures user-configurable backtest parameters
  independent of portfolio settings. This allows running scenarios with custom start
  prices and position sizes without modifying the main portfolio.
acceptance_criteria:
  - BacktestSettings dataclass defined with all necessary fields
  - start_price can be 0 (auto-derive) or explicit value
  - underlying_units independent of portfolio.gold_ounces
  - loan_amount and margin_call_ltv for LTV analysis
  - data_source field supports "databento" and "yfinance"
  - Repository persists settings per workspace
  - Default settings created for new workspaces
implementation_notes: |
  Key fields:
  - settings_id: UUID for tracking
  - data_source: "databento" | "yfinance" | "synthetic"
  - dataset: "XNAS.BASIC" | "GLBX.MDP3"
  - underlying_symbol: "GLD" | "GC" | "XAU"
  - start_date, end_date: date range
  - start_price: 0 for auto-derive, or explicit
  - underlying_units: position size for scenario
  - loan_amount: debt level for LTV analysis
  Settings are stored in .workspaces/{workspace_id}/backtest_settings.json
dependencies_detail:
  - DATA-DB-001: Need data source configuration fields


@@ -0,0 +1,40 @@
id: DATA-DB-003
title: Databento Cache Management
status: backlog
priority: medium
dependencies:
  - DATA-DB-001
estimated_effort: 1 day
created: 2026-03-28
updated: 2026-03-28
description: |
  Implement cache lifecycle management for Databento data. Cache files should be
  invalidated after a configurable age (default 30 days) and when request parameters
  change. Provide a CLI tool for cache inspection and cleanup.
acceptance_criteria:
  - DatabentoCacheManager lists all cached entries
  - Entries invalidated after max_age_days
  - Parameter-change detection triggers re-download
  - Cache size tracking available
  - CLI command to clear all cache
  - CLI command to show cache statistics
implementation_notes: |
  Cache files stored in .cache/databento/:
  - dbn_{hash}.parquet: Data file
  - dbn_{hash}_meta.json: Metadata (download_date, params, rows)
  Cache invalidation rules:
  1. Age > 30 days: re-download
  2. Parameters changed: re-download
  3. File corruption: re-download
  CLI commands:
  - vault-dash cache list
  - vault-dash cache clear
  - vault-dash cache stats
dependencies_detail:
  - DATA-DB-001: Needs DatabentoCacheKey structure


@@ -0,0 +1,50 @@
id: DATA-DB-004
title: Backtest Page UI Updates
status: backlog
priority: high
dependencies:
  - DATA-DB-001
  - DATA-DB-002
estimated_effort: 2 days
created: 2026-03-28
updated: 2026-03-28
description: |
  Update backtest and event comparison pages to support the Databento data source
  and independent scenario configuration. Show estimated data cost and cache
  status in the UI.
acceptance_criteria:
  - Data source selector shows Databento and yfinance options
  - Databento config shows dataset and resolution dropdowns
  - Dataset selection updates cost estimate display
  - Cache status shows age of cached data
  - Independent start price input (0 = auto-derive)
  - Independent underlying units and loan amount
  - Event comparison page uses same data source config
  - Settings persist across sessions
implementation_notes: |
  Page changes:
  Backtests page:
  - Add "Data Source" section with Databento/yfinance toggle
  - Add dataset selector (XNAS.BASIC for GLD, GLBX.MDP3 for GC=F)
  - Add resolution selector (ohlcv-1d, ohlcv-1h)
  - Show estimated cost with refresh button
  - Show cache status (age, size)
  - "Configure Scenario" section with independent start price/units
  Event comparison page:
  - Same data source configuration
  - Preset scenarios show if data cached
  - Cost estimate for missing data
  State management:
  - Use workspace-level BacktestSettings
  - Load on page mount, save on change
  - Invalidate cache when params change
dependencies_detail:
  - DATA-DB-001: Need DatabentoHistoricalPriceSource
  - DATA-DB-002: Need BacktestSettings model


@@ -0,0 +1,48 @@
id: DATA-DB-005
title: Scenario Pre-Seeding from Bulk Downloads
status: backlog
priority: medium
dependencies:
  - DATA-DB-001
estimated_effort: 1-2 days
created: 2026-03-28
updated: 2026-03-28
description: |
  Create pre-configured scenario presets for gold hedging research and implement
  bulk download capability to pre-seed event comparison pages. This allows quick
  testing against historical events without per-event data fetching.
acceptance_criteria:
  - Default presets include COVID crash, rate hike cycle, gold rally events
  - Bulk download script fetches all preset data
  - Presets stored in config file (JSON/YAML)
  - Event comparison page shows preset data availability
  - One-click "Download All Presets" button
  - Progress indicator during bulk download
implementation_notes: |
  Default presets:
  - GLD March 2020 COVID Crash (extreme volatility)
  - GLD 2022 Rate Hike Cycle (full year)
  - GC=F 2024 Gold Rally (futures data)
  Bulk download flow:
  1. Create batch job for each preset
  2. Show progress per preset
  3. Store in cache directory
  4. Update preset availability status
  Preset format:
  - preset_id: unique identifier
  - display_name: human-readable name
  - symbol: GLD, GC, etc.
  - dataset: Databento dataset
  - window_start/end: date range
  - default_start_price: first close
  - default_templates: hedging strategies
  - event_type: crash, rally, rate_cycle
  - tags: for filtering
dependencies_detail:
  - DATA-DB-001: Needs cache infrastructure


@@ -0,0 +1,46 @@
id: DATA-DB-006
title: Databento Options Data Source
status: backlog
priority: low
dependencies:
  - DATA-DB-001
estimated_effort: 3-5 days
created: 2026-03-28
updated: 2026-03-28
description: |
  Implement a historical options data source using Databento's OPRA.PILLAR dataset.
  This enables historical options chain lookups for accurate backtesting with
  real options prices, replacing synthetic Black-Scholes pricing.
acceptance_criteria:
  - DatabentoOptionSnapshotSource implements OptionSnapshotSource protocol
  - OPRA.PILLAR dataset used for GLD/SPY options
  - Option chain lookup by snapshot_date and symbol
  - Strike and expiry filtering supported
  - Cached per-date for efficiency
  - Fallback to synthetic pricing when data unavailable
implementation_notes: |
  OPRA.PILLAR provides consolidated options data from all US options exchanges.
  Key challenges:
  1. OPRA data volume is large - need efficient caching
  2. Option symbology differs from regular symbols
  3. Need strike/expiry resolution in symbology
  Implementation approach:
  - Use 'definition' schema to get instrument metadata
  - Use 'trades' or 'ohlcv-1d' for price history
  - Cache per (symbol, expiration, strike, option_type, date)
  - Use continuous contracts for futures options (GC=F)
  Symbology:
  - GLD options: Use underlying symbol "GLD" with OPRA
  - GC options: Use parent symbology "GC" for continuous contracts
  This is a future enhancement - not required for initial backtesting,
  which uses synthetic Black-Scholes pricing.
dependencies_detail:
  - DATA-DB-001: Needs base cache infrastructure