docs: add Databento integration plan and roadmap items

Bu5hm4nn
2026-03-29 09:52:06 +02:00
parent 8079ca58e7
commit c02159481d
8 changed files with 1050 additions and 1 deletion

@@ -0,0 +1,34 @@
id: DATA-DB-001
title: Databento Historical Price Source
status: backlog
priority: high
dependencies: []
estimated_effort: 2-3 days
created: 2026-03-28
updated: 2026-03-28
description: |
  Integrate the Databento historical API as a data source for the backtesting
  and scenario comparison pages. It replaces yfinance for historical data on
  the backtest pages and provides reliable, high-quality market data.
acceptance_criteria:
- DatabentoHistoricalPriceSource implements the HistoricalPriceSource protocol
- Cache layer prevents redundant downloads when request parameters are unchanged
- Authentication uses the DATABENTO_API_KEY environment variable
- Cost estimate is available before fetching data
- GLD symbol resolves to the XNAS.BASIC dataset
- GC=F symbol resolves to the GLBX.MDP3 dataset
- Unit tests with mocked Databento responses pass
implementation_notes: |
  Key files:
  - app/services/backtesting/databento_source.py (new)
  - tests/test_databento_source.py (new)
  Uses the ohlcv-1d schema for daily bars. The cache key includes dataset,
  symbol, schema, start_date, and end_date. Cache files are stored as Parquet
  for fast loading. Metadata includes download_date for age validation.
dependencies_detail:
- None (this item is the foundation for the Databento integration)
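
A minimal sketch of the cache key and the two symbol-to-dataset resolutions above. It assumes the key is hashed with SHA-256 over a canonical JSON encoding of its fields. `DatabentoCacheKey` is the name DATA-DB-003 refers to, but its exact fields, the 16-character digest, and the `resolve_dataset` helper are illustrative assumptions. The `dbn_{hash}` file names follow DATA-DB-003. The actual download (for example through the `databento` client's `timeseries.get_range`) is left out.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import date
from pathlib import Path

# Assumption: only the two resolutions listed in DATA-DB-001 are supported.
SYMBOL_TO_DATASET = {
    "GLD": "XNAS.BASIC",
    "GC=F": "GLBX.MDP3",
}


def resolve_dataset(symbol: str) -> str:
    """Return the Databento dataset for a user-facing symbol."""
    try:
        return SYMBOL_TO_DATASET[symbol]
    except KeyError:
        raise ValueError(f"no Databento dataset configured for {symbol!r}") from None


@dataclass(frozen=True)
class DatabentoCacheKey:
    """Cache key fields from DATA-DB-001; the hashing scheme is an assumption."""

    dataset: str
    symbol: str
    schema: str
    start_date: date
    end_date: date

    def digest(self) -> str:
        # Canonical JSON (sorted keys, ISO dates) so equal keys hash identically.
        payload = json.dumps(asdict(self), sort_keys=True, default=date.isoformat)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]

    def data_path(self, cache_dir: Path) -> Path:
        return cache_dir / f"dbn_{self.digest()}.parquet"

    def meta_path(self, cache_dir: Path) -> Path:
        return cache_dir / f"dbn_{self.digest()}_meta.json"
```

Because the digest covers every request parameter, changing any of them produces a new file name. A cache lookup with unchanged parameters therefore hits the existing file, while any change misses it and triggers a fresh download.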

@@ -0,0 +1,39 @@
id: DATA-DB-002
title: Backtest Settings Model
status: backlog
priority: high
dependencies:
- DATA-DB-001
estimated_effort: 1 day
created: 2026-03-28
updated: 2026-03-28
description: |
  Create a BacktestSettings model that captures user-configurable backtest
  parameters independently of the portfolio settings. This allows running
  scenarios with custom start prices and position sizes without modifying the
  main portfolio.
acceptance_criteria:
- BacktestSettings dataclass defined with the fields listed in implementation_notes
- start_price is either 0 (auto-derive) or an explicit value
- underlying_units is independent of portfolio.gold_ounces
- loan_amount and margin_call_ltv support LTV analysis
- data_source field supports "databento" and "yfinance"
- Repository persists settings per workspace
- Default settings are created for new workspaces
implementation_notes: |
  Key fields:
  - settings_id: UUID for tracking
  - data_source: "databento" | "yfinance" | "synthetic"
  - dataset: "XNAS.BASIC" | "GLBX.MDP3"
  - underlying_symbol: "GLD" | "GC" | "XAU"
  - start_date, end_date: date range
  - start_price: 0 for auto-derive, or an explicit value
  - underlying_units: position size for the scenario
  - loan_amount: debt level for LTV analysis
  - margin_call_ltv: LTV threshold for margin call analysis
  Settings are stored in .workspaces/{workspace_id}/backtest_settings.json
dependencies_detail:
- DATA-DB-001: Needs the data source configuration fields
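
The settings model and its per-workspace JSON file could be sketched as follows. Field names come from the implementation notes. The default values (including `margin_call_ltv=0.75` and the date range), the validation rules, and the helper functions `effective_start_price`, `settings_path`, `save_settings`, and `load_settings` are assumptions. So is reading "auto-derive" as "use the first close in the window"; the presets in DATA-DB-005 use the first close as their default start price.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import date
from pathlib import Path


@dataclass
class BacktestSettings:
    """Fields from DATA-DB-002; every default value here is a placeholder assumption."""

    data_source: str = "databento"   # "databento" | "yfinance" | "synthetic"
    dataset: str = "XNAS.BASIC"      # "XNAS.BASIC" | "GLBX.MDP3"
    underlying_symbol: str = "GLD"   # "GLD" | "GC" | "XAU"
    start_date: date = date(2020, 1, 1)
    end_date: date = date(2020, 12, 31)
    start_price: float = 0.0         # 0 = auto-derive
    underlying_units: float = 0.0    # independent of portfolio.gold_ounces
    loan_amount: float = 0.0
    margin_call_ltv: float = 0.75
    settings_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def __post_init__(self) -> None:
        if self.data_source not in ("databento", "yfinance", "synthetic"):
            raise ValueError(f"unknown data_source {self.data_source!r}")
        if self.start_price < 0:
            raise ValueError("start_price must be 0 (auto-derive) or positive")
        if self.end_date < self.start_date:
            raise ValueError("end_date precedes start_date")


def effective_start_price(settings: BacktestSettings, first_close: float) -> float:
    # Assumption: "auto-derive" means the first close in the backtest window.
    return settings.start_price if settings.start_price > 0 else first_close


def settings_path(root: Path, workspace_id: str) -> Path:
    return root / ".workspaces" / workspace_id / "backtest_settings.json"


def save_settings(root: Path, workspace_id: str, settings: BacktestSettings) -> None:
    path = settings_path(root, workspace_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    data = asdict(settings)
    data["start_date"] = settings.start_date.isoformat()
    data["end_date"] = settings.end_date.isoformat()
    path.write_text(json.dumps(data, indent=2))


def load_settings(root: Path, workspace_id: str) -> BacktestSettings:
    """Load a workspace's settings, creating and saving defaults on first use."""
    path = settings_path(root, workspace_id)
    if not path.exists():
        settings = BacktestSettings()
        save_settings(root, workspace_id, settings)
        return settings
    data = json.loads(path.read_text())
    data["start_date"] = date.fromisoformat(data["start_date"])
    data["end_date"] = date.fromisoformat(data["end_date"])
    return BacktestSettings(**data)
```

Keeping the scenario fields on a separate model means a backtest can use, for example, 100 units and a custom loan while `portfolio.gold_ounces` stays untouched.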

@@ -0,0 +1,40 @@
id: DATA-DB-003
title: Databento Cache Management
status: backlog
priority: medium
dependencies:
- DATA-DB-001
estimated_effort: 1 day
created: 2026-03-28
updated: 2026-03-28
description: |
  Implement cache lifecycle management for Databento data. Cache files should be
  invalidated after a configurable age (default 30 days) and when request
  parameters change. Provide a CLI tool for cache inspection and cleanup.
acceptance_criteria:
- DatabentoCacheManager lists all cached entries
- Entries are invalidated after max_age_days
- A change in request parameters triggers a re-download
- Cache size tracking is available
- CLI command to clear the entire cache
- CLI command to show cache statistics
implementation_notes: |
  Cache files are stored in .cache/databento/:
  - dbn_{hash}.parquet: data file
  - dbn_{hash}_meta.json: metadata (download_date, params, rows)
  Cache invalidation rules:
  1. Age > max_age_days (default 30): re-download
  2. Parameters changed: re-download
  3. File corruption: re-download
  CLI commands:
  - vault-dash cache list
  - vault-dash cache clear
  - vault-dash cache stats
dependencies_detail:
- DATA-DB-001: Needs DatabentoCacheKey structure
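
A sketch of the cache manager over the file layout above. It assumes the metadata file stores `download_date` as an ISO-8601 timestamp and `params` as a JSON object. `DatabentoCacheManager` is named in the plan. `CacheEntry`, the method names, and the shape of `stats()` are hypothetical. Only missing data files and unreadable metadata are detected here; checking the Parquet payload itself is left out.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import Path


@dataclass
class CacheEntry:
    digest: str
    data_file: Path
    meta: dict


class DatabentoCacheManager:
    """Sketch over .cache/databento/ using the dbn_{hash} file pair layout."""

    def __init__(self, cache_dir: Path, max_age_days: int = 30) -> None:
        self.cache_dir = cache_dir
        self.max_age = timedelta(days=max_age_days)

    def list_entries(self) -> list[CacheEntry]:
        entries = []
        for meta_file in sorted(self.cache_dir.glob("dbn_*_meta.json")):
            digest = meta_file.name[len("dbn_"):-len("_meta.json")]
            try:
                meta = json.loads(meta_file.read_text())
            except (OSError, json.JSONDecodeError):
                meta = {}  # unreadable metadata is treated as corruption
            entries.append(CacheEntry(digest, self.cache_dir / f"dbn_{digest}.parquet", meta))
        return entries

    def is_valid(self, entry: CacheEntry, params: dict, now: datetime) -> bool:
        """Apply the invalidation rules; False means re-download."""
        if not entry.data_file.exists() or "download_date" not in entry.meta:
            return False  # rule 3: missing data file or unreadable metadata
        if entry.meta.get("params") != params:
            return False  # rule 2: request parameters changed
        # Assumption: download_date is an ISO-8601 timestamp comparable with `now`.
        downloaded = datetime.fromisoformat(entry.meta["download_date"])
        return now - downloaded <= self.max_age  # rule 1: age

    def stats(self) -> dict:
        files = [f for f in self.cache_dir.glob("dbn_*") if f.is_file()]
        return {
            "entries": sum(1 for f in files if f.name.endswith("_meta.json")),
            "total_bytes": sum(f.stat().st_size for f in files),
        }

    def clear(self) -> int:
        """Delete every cached file and return how many were removed."""
        removed = 0
        for f in list(self.cache_dir.glob("dbn_*")):
            if f.is_file():
                f.unlink()
                removed += 1
        return removed
```

The `vault-dash cache list`, `clear`, and `stats` commands could then be thin wrappers around `list_entries`, `clear`, and `stats`.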

@@ -0,0 +1,50 @@
id: DATA-DB-004
title: Backtest Page UI Updates
status: backlog
priority: high
dependencies:
- DATA-DB-001
- DATA-DB-002
estimated_effort: 2 days
created: 2026-03-28
updated: 2026-03-28
description: |
  Update the backtest and event comparison pages to support the Databento data
  source and independent scenario configuration. Show the estimated data cost
  and cache status in the UI.
acceptance_criteria:
- Data source selector shows Databento and yfinance options
- Databento config shows dataset and resolution dropdowns
- Changing the dataset updates the cost estimate display
- Cache status shows the age of cached data
- Independent start price input (0 = auto-derive)
- Independent underlying units and loan amount inputs
- Event comparison page uses the same data source config
- Settings persist across sessions
implementation_notes: |
  Page changes:
    Backtests page:
    - Add a "Data Source" section with a Databento/yfinance toggle
    - Add a dataset selector (XNAS.BASIC for GLD, GLBX.MDP3 for GC=F)
    - Add a resolution selector (ohlcv-1d, ohlcv-1h)
    - Show the estimated cost with a refresh button
    - Show the cache status (age, size)
    - Add a "Configure Scenario" section with independent start price/units
    Event comparison page:
    - Same data source configuration
    - Preset scenarios show whether their data is cached
    - Cost estimate for missing data
  State management:
  - Use workspace-level BacktestSettings
  - Load on page mount, save on change
  - Invalidate the cache when request parameters change
dependencies_detail:
- DATA-DB-001: Need DatabentoHistoricalPriceSource
- DATA-DB-002: Need BacktestSettings model
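
The "invalidate the cache when parameters change" rule could be sketched as below. It assumes the page compares the previous and new settings on each save and only refetches when a field that feeds the cache key changes. `PageSettings`, `DATA_FIELDS`, `changed_fields`, and `needs_refetch` are hypothetical names, and the split between data fields and scenario fields is an assumption.

```python
from dataclasses import dataclass, fields
from datetime import date

# Assumption: only these fields change which data is requested (and so the
# cache key); the scenario fields are applied to data that is already cached.
DATA_FIELDS = frozenset(
    {"data_source", "dataset", "underlying_symbol", "schema", "start_date", "end_date"}
)


@dataclass(frozen=True)
class PageSettings:
    """Hypothetical view of the settings edited on the backtests page."""

    data_source: str
    dataset: str
    underlying_symbol: str
    schema: str  # resolution selector: "ohlcv-1d" or "ohlcv-1h"
    start_date: date
    end_date: date
    start_price: float = 0.0
    underlying_units: float = 0.0
    loan_amount: float = 0.0


def changed_fields(old: PageSettings, new: PageSettings) -> set[str]:
    """Names of the fields whose values differ between two snapshots."""
    return {f.name for f in fields(old) if getattr(old, f.name) != getattr(new, f.name)}


def needs_refetch(old: PageSettings, new: PageSettings) -> bool:
    """True when a change touches the data request, so cached data no longer applies."""
    return not changed_fields(old, new).isdisjoint(DATA_FIELDS)
```

Keeping the scenario inputs out of the refetch decision means that editing the start price, units, or loan amount re-runs the scenario on cached data and leaves the cost estimate unchanged.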

@@ -0,0 +1,48 @@
id: DATA-DB-005
title: Scenario Pre-Seeding from Bulk Downloads
status: backlog
priority: medium
dependencies:
- DATA-DB-001
estimated_effort: 1-2 days
created: 2026-03-28
updated: 2026-03-28
description: |
  Create preconfigured scenario presets for gold hedging research and implement
  a bulk download capability to pre-seed the event comparison pages. This allows
  quick testing against historical events without fetching data per event.
acceptance_criteria:
- Default presets include COVID crash, rate hike cycle, and gold rally events
- Bulk download script fetches all preset data
- Presets stored in a config file (JSON/YAML)
- Event comparison page shows preset data availability
- One-click "Download All Presets" button
- Progress indicator during bulk download
implementation_notes: |
  Default presets:
  - GLD March 2020 COVID Crash (extreme volatility)
  - GLD 2022 Rate Hike Cycle (full year)
  - GC=F 2024 Gold Rally (futures data)
  Bulk download flow:
  1. Create a batch job for each preset
  2. Show progress per preset
  3. Store results in the cache directory
  4. Update preset availability status
  Preset format:
  - preset_id: unique identifier
  - display_name: human-readable name
  - symbol: GLD, GC, etc.
  - dataset: Databento dataset
  - window_start, window_end: date range
  - default_start_price: first close
  - default_templates: hedging strategies
  - event_type: crash, rally, rate_cycle
  - tags: for filtering
dependencies_detail:
- DATA-DB-001: Needs cache infrastructure
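
Preset loading and the bulk download flow above could look roughly like this. It assumes presets live in a JSON file and that the actual download (for example a Databento batch job) is passed in as a callable. `ScenarioPreset`, `load_presets`, `bulk_download`, and the callback signatures are hypothetical.

```python
import json
from dataclasses import dataclass, field
from datetime import date
from typing import Callable


@dataclass
class ScenarioPreset:
    """Preset fields from DATA-DB-005; the types and defaults are assumptions."""

    preset_id: str
    display_name: str
    symbol: str
    dataset: str
    window_start: date
    window_end: date
    event_type: str  # "crash" | "rally" | "rate_cycle"
    default_start_price: float = 0.0  # assumption: 0 until the first close is known
    default_templates: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)


def load_presets(text: str) -> list[ScenarioPreset]:
    """Parse a JSON preset file: a list of objects with ISO-8601 window dates."""
    presets = []
    for raw in json.loads(text):
        raw = dict(raw)
        raw["window_start"] = date.fromisoformat(raw["window_start"])
        raw["window_end"] = date.fromisoformat(raw["window_end"])
        presets.append(ScenarioPreset(**raw))
    return presets


def bulk_download(
    presets: list[ScenarioPreset],
    fetch: Callable[[ScenarioPreset], None],
    on_progress: Callable[[int, int, str], None],
) -> dict[str, bool]:
    """Fetch presets one at a time and return data availability per preset_id."""
    availability: dict[str, bool] = {}
    for i, preset in enumerate(presets, start=1):
        try:
            fetch(preset)  # hypothetical: run the download job and write to the cache
            availability[preset.preset_id] = True
        except Exception:
            # One failed preset should not abort the batch; mark it unavailable.
            availability[preset.preset_id] = False
        on_progress(i, len(presets), preset.preset_id)
    return availability
```

Injecting `fetch` keeps the loop testable without network access. The `on_progress(done, total, preset_id)` callback maps directly onto the per-preset progress indicator, and the returned dictionary feeds the availability status on the event comparison page.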

@@ -0,0 +1,46 @@
id: DATA-DB-006
title: Databento Options Data Source
status: backlog
priority: low
dependencies:
- DATA-DB-001
estimated_effort: 3-5 days
created: 2026-03-28
updated: 2026-03-28
description: |
  Implement a historical options data source using Databento's OPRA.PILLAR
  dataset. This enables historical options chain lookups so that backtests can
  use real option prices instead of synthetic Black-Scholes pricing.
acceptance_criteria:
- DatabentoOptionSnapshotSource implements the OptionSnapshotSource protocol
- OPRA.PILLAR dataset used for GLD/SPY options
- Option chain lookup by snapshot_date and symbol
- Strike and expiry filtering supported
- Data cached per date for efficiency
- Fallback to synthetic pricing when data is unavailable
implementation_notes: |
  OPRA.PILLAR provides consolidated options data from all US options exchanges.
  Key challenges:
  1. OPRA data volume is large, so caching must be efficient
  2. Option symbology (OCC/OSI contract symbols) differs from equity tickers
  3. Strike and expiry must be resolved from the option symbology
  Implementation approach:
  - Use the 'definition' schema to get instrument metadata
  - Use 'trades' or 'ohlcv-1d' for price history
  - Cache per (symbol, expiration, strike, option_type, date)
  - Use continuous contracts (e.g. GC.c.0) only for the GC futures underlying
  Symbology:
  - GLD options: query OPRA.PILLAR with parent symbology ("GLD.OPT")
  - GC options: these are CME futures options and are not in OPRA; query
    GLBX.MDP3 with parent symbology instead. Continuous symbology resolves
    futures, not individual option contracts.
  This is a future enhancement; it is not required for the initial
  backtesting, which uses synthetic Black-Scholes pricing.
dependencies_detail:
- DATA-DB-001: Needs base cache infrastructure
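
Strike and expiry resolution could be handled by parsing OCC/OSI contract symbols. This assumes the raw option symbols returned for OPRA.PILLAR follow the standard 21-character OSI layout, which should be checked against the 'definition' schema before relying on it. `OptionContract`, `parse_osi`, `filter_chain`, and `cache_key` are hypothetical names. The cache tuple mirrors the (symbol, expiration, strike, option_type, date) key in the notes.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class OptionContract:
    root: str
    expiration: date
    option_type: str  # "C" or "P"
    strike: float


def parse_osi(symbol: str) -> OptionContract:
    """Parse an OCC/OSI symbol such as "GLD   240621C00200000".

    Layout: root (space padded to 6 chars), YYMMDD expiry, C/P, strike * 1000 as 8 digits.
    """
    if len(symbol) < 16:
        raise ValueError(f"not an OSI symbol: {symbol!r}")
    body, root = symbol[-15:], symbol[:-15].strip()
    if not root or not body[:6].isdigit() or body[6] not in ("C", "P") or not body[7:].isdigit():
        raise ValueError(f"not an OSI symbol: {symbol!r}")
    expiration = datetime.strptime(body[:6], "%y%m%d").date()
    return OptionContract(root, expiration, body[6], int(body[7:]) / 1000)


def filter_chain(
    contracts: Iterable[OptionContract],
    expiration: Optional[date] = None,
    option_type: Optional[str] = None,
    min_strike: Optional[float] = None,
    max_strike: Optional[float] = None,
) -> list[OptionContract]:
    """Keep contracts matching every filter that is not None."""
    return [
        c for c in contracts
        if (expiration is None or c.expiration == expiration)
        and (option_type is None or c.option_type == option_type)
        and (min_strike is None or c.strike >= min_strike)
        and (max_strike is None or c.strike <= max_strike)
    ]


def cache_key(contract: OptionContract, snapshot_date: date) -> tuple:
    """Per-contract, per-day key mirroring (symbol, expiration, strike, option_type, date)."""
    return (
        contract.root,
        contract.expiration.isoformat(),
        contract.strike,
        contract.option_type,
        snapshot_date.isoformat(),
    )
```

For example, `parse_osi("GLD   240621C00200000")` yields a GLD call expiring 2024-06-21 with a strike of 200.0. A snapshot source could then filter the day's chain by expiry and strike before falling back to synthetic pricing for any contract it cannot find.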