Development Audit Brief
Audit `earnings-event-vol` as an earnings option event-variance research codebase.
Binding Scope
The active project tests whether models improve trading decisions around option-implied earnings event variance mispricing.
Do not audit it as:
- a generic implied-volatility forecasting project;
- a generic options-pricing surface reconstruction project;
- a paper-grade NBBO backtest already.
The current implemented route is a no-NBBO trade-price proxy. It is useful for pipeline validation and signal screening, not final execution claims.
Required Checks
Project identity and workflow:
- `pyproject.toml` uses `earnings-event-vol`.
- Active package is `src/earnings_event_vol`.
- `justfile` calls `python -m earnings_event_vol.cli`.
- Public command surface stays small: `status`, `audit`, `data`, `research`, `docs`, and `check`.
- `just check` remains the handoff gate.
- The test gate enforces at least 95% coverage.
Credential safety:
- Massive secrets are file-only.
- `MASSIVE_API_KEY_FILE` and `MASSIVE_FLAT_FILE_KEY_FILE` point to local secret files.
- Source probes never print secret values.
- Direct API-key values are not accepted as the primary path in source, docs, or tests.
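
A minimal sketch of the file-only secret pattern these checks describe, useful as a reference point when reading the real loader. The function names `read_secret` and `describe_secret` are hypothetical and not taken from the codebase; only the environment variable names come from this brief.

```python
import os
from pathlib import Path


def read_secret(env_var: str) -> str:
    """Return the secret stored in the file that `env_var` points to."""
    path_value = os.environ.get(env_var)
    if not path_value:
        raise RuntimeError(f"{env_var} is not set")
    path = Path(path_value).expanduser()
    if not path.is_file():
        # Error text names the variable only, never the file contents.
        raise RuntimeError(f"{env_var} does not point to a readable file")
    secret = path.read_text(encoding="utf-8").strip()
    if not secret:
        raise RuntimeError(f"{env_var} points to an empty file")
    return secret


def describe_secret(env_var: str) -> str:
    """Probe-safe status line: confirms the secret loads without echoing it."""
    read_secret(env_var)
    return f"{env_var}: loaded"
```

An audit can compare this shape against the source: the environment variable holds a path, the value is read from that path, and any probe or log line reports status only.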
Data route and labeling:
- Current data outputs are labeled `no_nbbo_trade_proxy` and `paper_grade=false`.
- Second aggregates are described as trade-price OHLCV bars, not bid/ask or NBBO quotes.
- Market-data inputs are separated correctly: options day aggregates for universe/contract/IV proxy/exit/sequence construction, underlying day aggregates for vendor OHLC opens, C2O/C2C/O2C targets, and exit spot, and targeted option one-second aggregates for entry proxy pricing.
- Second-aggregate entry cache keeps only the pre-cutoff buffer, default 60 minutes before event cutoff; entry price selection uses the true per-leg volume-weighted option VWAP over the final 900 seconds before cutoff.
- The option open anchor is unified as trade-aggregate 5-15 minute post-open VWAP. It is the primary C2O exit proxy and the O2C diagnostic entry proxy; 0-5 minute VWAP remains only an opening microstructure stress test.
- O2C proxy PnL is diagnostic only unless a post-open residual-IV baseline is added.
- C2C exit diagnostics use same-contract option VWAP over the final 15 minutes before the exit-date close as the primary mark. Same-contract option day-aggregate close is not a strategy-exit fallback. Intrinsic fallback is flagged when the exit-preclose trade-aggregate mark is missing/unusable or when the option expires on the exit date.
- The current default proxy range is the observed entitlement range `2022-12-01` to `2025-12-31`; 2013-2025 remains the target paper range, not the current completed result.
- Universe construction filters ETF/index/non-single-name symbols before selecting the monthly top 50.
- SEC EDGAR submissions plus SEC primary filing documents are the primary earnings candidate and text-validation route.
- Massive 8-K text is auxiliary fallback only.
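
The entry-price rule above (per-leg volume-weighted VWAP over the final 900 seconds before cutoff) can be sketched as follows. This is an illustration only: the `SecondBar` type and its field names are assumptions, and it treats each one-second aggregate as carrying a per-bar VWAP and a traded volume.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class SecondBar:
    ts: datetime    # bar start, timezone-aware (assumed field name)
    vwap: float     # per-bar trade VWAP (assumed field name)
    volume: float   # contracts traded in the bar (assumed field name)


def window_vwap(bars: Iterable[SecondBar], cutoff: datetime,
                window_seconds: int = 900) -> Optional[float]:
    """Volume-weighted price over [cutoff - window, cutoff).

    Returns None when no volume traded in the window, so the caller
    can flag the leg instead of silently falling back to another mark.
    """
    start = cutoff - timedelta(seconds=window_seconds)
    notional = 0.0
    volume = 0.0
    for bar in bars:
        if start <= bar.ts < cutoff and bar.volume > 0:
            notional += bar.vwap * bar.volume
            volume += bar.volume
    return notional / volume if volume > 0 else None
```

When auditing, the points to confirm are that weights are traded volume rather than bar counts, that the window is half-open at the cutoff, and that an empty window is surfaced rather than filled from the day aggregate.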
Event variance construction:
- AMC and BMO event windows use the documented pre-announcement close.
- DMH and unknown timing are excluded from the first main sample.
- Target construction writes C2O, C2C, and O2C realized-variance columns.
- `rvar_event` remains the documented close-to-close alias for V1 proxy PnL.
- C2O is not described as executable option PnL.
- `IVAR_event` is extracted from total ATM implied variance across two expiries that cover the realized event window.
- Negative and nonmonotone IVAR extractions are flagged and reported.
- DTE losses and IVAR failures are surfaced in manifests and reports.
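
One common way to extract an event variance from two expiries is sketched below, for comparison against whatever the codebase actually does. It assumes (this brief does not specify the formula) that both expiries share a constant non-event variance rate b and a single event variance V, so total variance is w_i = iv_i^2 * T_i = b * T_i + V, which gives V = (T_far * w_near - T_near * w_far) / (T_far - T_near). The names `extract_ivar_event` and `IvarResult` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IvarResult:
    ivar_event: float
    negative: bool      # extracted event variance is below zero
    nonmonotone: bool   # far total variance is below near total variance


def extract_ivar_event(iv_near: float, t_near: float,
                       iv_far: float, t_far: float) -> IvarResult:
    """Two-expiry event-variance extraction under the stated assumption.

    iv_* are annualized ATM implied vols; t_* are times to expiry in years.
    """
    if not 0.0 < t_near < t_far:
        raise ValueError("expected 0 < t_near < t_far (in years)")
    w_near = iv_near ** 2 * t_near   # total implied variance, near expiry
    w_far = iv_far ** 2 * t_far      # total implied variance, far expiry
    ivar = (t_far * w_near - t_near * w_far) / (t_far - t_near)
    return IvarResult(ivar, negative=ivar < 0.0, nonmonotone=w_far < w_near)
```

The two flags are distinct under this formula: a negative result arises when the near implied vol sits below the far one, while a nonmonotone pair yields a positive but inflated value. Both should be reported rather than clipped.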
Leakage and timestamp gates:
- Every feature row has an as-of timestamp at or before event entry.
- Timezones are explicit and consistent; naive/aware mixtures fail closed.
- Same-event realized fields, post-event fields, and vendor forecast leakage are excluded unless explicitly whitelisted for diagnostics.
- Temporal splits are chronological or walk-forward, not random.
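
The as-of and timezone gates can be pictured with a small check like the one below. It is a sketch under assumptions: rows are represented as `(row_id, as_of, event_entry)` tuples, and the helper names are hypothetical rather than the project's own.

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def require_aware(ts: datetime, label: str) -> datetime:
    """Fail closed on timezone-naive timestamps."""
    if ts.tzinfo is None or ts.utcoffset() is None:
        raise ValueError(f"{label} is timezone-naive")
    return ts


def leaking_rows(rows: Iterable[Tuple[str, datetime, datetime]]) -> List[str]:
    """Return ids of rows whose as-of timestamp falls after event entry.

    Each row is (row_id, as_of, event_entry); both timestamps must be aware.
    """
    bad = []
    for row_id, as_of, entry in rows:
        require_aware(as_of, f"{row_id}.as_of")
        require_aware(entry, f"{row_id}.event_entry")
        if as_of > entry:
            bad.append(row_id)
    return bad
```

Raising on the first naive timestamp, instead of coercing it to a default zone, is what "fail closed" means here; a non-empty return value should block the feature build.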
Model and research package:
- Market-implied IVAR is always the primary benchmark.
- Last-four RVAR, last-four IVAR, Goyal-Saretto-style RV-IV spread, Elastic Net, LightGBM/XGBoost, FT-Transformer, and sequence diagnostics are compared only when their callable implementations and diagnostics exist.
- Model registry `implemented` flags must match callable behavior.
- Sequence-model results report coverage, drop rate, and mask-only ablation.
- High sequence-selection risk is surfaced and not hidden in headline claims.
- Forecast metrics, ranking metrics, strategy metrics, cost sensitivity, inference diagnostics, and model-fit diagnostics are written under `artifacts/modeling/`.
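
The registry check can be illustrated with a small consistency test. The `ModelEntry` shape and the `fit` attribute are assumptions made for this sketch; the real registry may store its callables differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class ModelEntry:
    implemented: bool
    fit: Optional[Callable[..., object]] = None  # hypothetical attribute


def registry_mismatches(registry: Dict[str, ModelEntry]) -> List[str]:
    """List entries whose implemented flag disagrees with callable presence."""
    problems = []
    for name, entry in sorted(registry.items()):
        has_callable = callable(entry.fit)
        if entry.implemented and not has_callable:
            problems.append(f"{name}: implemented=True but no callable")
        elif not entry.implemented and has_callable:
            problems.append(f"{name}: callable present but implemented=False")
    return problems
```

This only checks that a callable exists. Confirming callable behavior also means invoking it on a small fixture and verifying that it does not raise a not-implemented error or return a placeholder.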
Backtest and execution claims:
- Proxy strategy summaries are cost-aware screening diagnostics.
- Full bid-ask crossing or NBBO execution is a future paper-grade requirement, not a claim supported by the current proxy route.
- Long ATM straddle and short iron fly remain the v1 headline strategy designs.
- Calendar spread is labeled as second-stage relative value.
- Multi-leg fills document simultaneous-fill assumptions and legging-risk limitations.
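
A guard along the following lines would make the labeling and fill-assumption checks mechanical. The summary keys `route`, `legs`, and `fill_assumption` are hypothetical; only the `no_nbbo_trade_proxy` and `paper_grade=false` labels come from this brief.

```python
from typing import List, Mapping


def claim_violations(summary: Mapping[str, object]) -> List[str]:
    """Return labeling problems in a single strategy summary record."""
    problems = []
    if summary.get("route") == "no_nbbo_trade_proxy":
        # A missing label counts as a violation, not as an implicit false.
        if summary.get("paper_grade") is not False:
            problems.append("proxy route must be labeled paper_grade=false")
    legs = summary.get("legs", 1)
    if isinstance(legs, int) and legs > 1 and not summary.get("fill_assumption"):
        problems.append("multi-leg summary lacks a documented fill assumption")
    return problems
```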
Output
Report findings in this order:
- Blockers.
- Leakage or timestamp risks.
- Data-source, entitlement, or credential risks.
- Model and research-package completeness gaps.
- Backtest and execution-claim gaps.
- Documentation drift.