Earnings Event Vol
Reproducible research pipeline for U.S. equity-options earnings event variance forecasting and risk-defined option backtests.
Research Question
This is not a generic implied-volatility forecasting project. The paper-facing question is:
Can models improve trading decisions around option-implied earnings event variance mispricing?
The realized-variance target system is decomposed into three labels:
jump_c2o = close-to-open earnings jump variance
day_c2c = close-to-close full reaction-day variance
reaction_o2c = open-to-close post-open digestion variance
The market benchmark is the event variance implied by short-dated options:
IVAR_event
C2C ex post mispricing is:
RVAR_event_day_c2c - IVAR_event
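The three labels and the C2C mispricing can be sketched in a few lines. This is a minimal illustration, assuming each realized-variance label is a squared log return over its window; the README does not state the exact estimator, and the function names below are hypothetical.

```python
import math


def event_variance_labels(prev_close: float, open_px: float, close_px: float) -> dict[str, float]:
    """Single-event variance proxies as squared log returns (an assumption)."""
    jump = math.log(open_px / prev_close)    # close-to-open earnings jump
    day = math.log(close_px / prev_close)    # close-to-close full reaction day
    reaction = math.log(close_px / open_px)  # open-to-close digestion
    # Note: day**2 != jump**2 + reaction**2 in general because of the cross term.
    return {
        "jump_c2o": jump ** 2,
        "day_c2c": day ** 2,
        "reaction_o2c": reaction ** 2,
    }


def c2c_mispricing(rvar_event_day_c2c: float, ivar_event: float) -> float:
    """Ex post C2C mispricing: realized minus option-implied event variance."""
    return rvar_event_day_c2c - ivar_event


if __name__ == "__main__":
    labels = event_variance_labels(prev_close=100.0, open_px=105.0, close_px=102.0)
    print(labels)
    print(c2c_mispricing(labels["day_c2c"], ivar_event=0.0025))
```

A positive mispricing means the realized reaction-day variance exceeded what short-dated options implied before the event.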
The V1 strategy/PnL layer uses day_c2c only. jump_c2o is the primary
scientific forecast/ranking target, but it is not reported as executable option
PnL in the current no-NBBO proxy run. Trading decisions are evaluated in premium
space. A raw variance forecast is not enough; expected strategy value must beat
market entry cost and transaction cost estimates.
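The premium-space rule above can be illustrated with a small decision function. This is a sketch only: StraddleQuote, expected_net_value, and should_buy are hypothetical names, and the project's actual strategy layer is not shown in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StraddleQuote:
    entry_premium: float  # premium paid at entry, USD per straddle (hypothetical field)
    cost_estimate: float  # estimated round-trip transaction cost, USD (hypothetical field)


def expected_net_value(expected_exit_value: float, quote: StraddleQuote) -> float:
    """Expected exit value minus market entry cost and estimated transaction cost."""
    return expected_exit_value - quote.entry_premium - quote.cost_estimate


def should_buy(expected_exit_value: float, quote: StraddleQuote, min_edge: float = 0.0) -> bool:
    """Trade only when the expected value clears premium and costs by min_edge."""
    return expected_net_value(expected_exit_value, quote) > min_edge


if __name__ == "__main__":
    quote = StraddleQuote(entry_premium=450.0, cost_estimate=40.0)
    print(should_buy(520.0, quote))  # forecast exit value clears premium plus costs
    print(should_buy(480.0, quote))  # forecast exit value does not clear costs
```

The point of the sketch is that a variance forecast above IVAR_event is not sufficient on its own; the forecast has to be mapped to an expected option value and compared against what the position costs to enter and exit.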
Current State
Verified local state on 2026-05-12:
- just data builds the active no-NBBO proxy data pipeline.
- just research builds the canonical V5 proxy feature/model/report package from the current trade-proxy event panel. The current paper-facing snapshot uses the canonical tuned protocol.
- just mamba-install installs the local CUDA Mamba wheels and just mamba-doctor verifies the official mamba-ssm runtime.
- Current data range is 2022-12-01 through 2025-12-31, because the observed Massive options day-aggregate entitlement in this workspace starts in 2022.
- The target paper range remains 2013-2025, but that needs upgraded historical option data entitlement or another licensed options route.
- All current trade-price results are panel_grade=no_nbbo_trade_proxy and paper_grade=false.
Latest proxy data artifacts:
- Dynamic calendar: 1,054 SEC-first candidate rows; 810 BMO/AMC main-sample candidates after universe and text-validation filters.
- Trade-proxy panel: 810 events, 801 with the backward-compatible C2C rvar_event alias, 693 with trade-proxy IVAR_event.
- Proxy contracts: 12,038 candidates; 10,165 with usable pre-cutoff second-aggregate prices.
- Proxy straddle diagnostics: 779 rows; mean gross C2C primary exit-preclose VWAP proxy PnL about -100.72 USD, mean haircut proxy PnL about -250.54 USD.
Latest proxy modeling artifacts:
- Feature matrix: 810 rows.
- Models evaluated: market-implied IVAR, last-four RVAR, last-four IVAR, Goyal-Saretto-style RV-IV spread, Elastic Net, LightGBM, XGBoost, a LightGBM/XGBoost rank-average ensemble, FT-Transformer, and the V5 sequence diagnostic suite.
- Current tuned protocol: the canonical tuned-only research protocol. Hyperparameter selection uses train and locked-validation rows only, then evaluates locked test rows once. Paired original tabular and single-seed sequence rows are no longer emitted.
- Full sequence diagnostic suite: ridge-flat sequence aggregates, 5-seed BiGRU, 5-seed official bidirectional mamba-ssm, attention pooling, non-causal dilated CNN, mask-only, and deterministic time-shuffle controls.
- Sequence audit: 678 eligible events out of 810 under the default path coverage rule; flagged as high sequence-selection risk.
- FT-Transformer refers to the validation-tuned tabular transformer specification.
- The active canonical outputs use the default fe_v2_sec_xbrl schema, but the same-code FE V1 versus FE V2 ablation is negative for FE V2. In FE V2, the strongest jump_c2o AUC is the Goyal-Saretto-style spread at about 0.602, and the positive day_c2c ridge-flat sequence PnL of about 19,918 USD is diagnostic because the sequence gate does not pass.
- The stronger current sell is the fe_v1_legacy same-code ablation: LightGBM reaches jump_c2o AUC about 0.677, XGBoost has best jump_c2o OOS R2 versus IVAR at about 0.375, and LightGBM leads the day_c2c headline proxy strategy at about 53,664 USD net PnL. This is signal-screening evidence, not a paper-grade executable trading result.
- reaction_o2c is now included in the V5 proxy model artifacts as a diagnostic target. Ridge-flat sequence aggregates lead O2C AUC at about 0.799; among the tabular rows, XGBoost leads at about 0.768. O2C uses full-event IVAR_event only as a weak comparator and all O2C strategy rows remain pnl_headline_eligible=false.
- The full sequence diagnostic suite has not passed the common-row bootstrap gate in the current proxy evidence: the 5-seed official mamba-ssm row has jump_c2o AUC about 0.501 and negative day_c2c proxy PnL. Sequence rows remain diagnostic and do not upgrade the claim.
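The LightGBM/XGBoost rank-average ensemble listed among the evaluated models can be sketched as follows. This is a minimal illustration of rank averaging in general, assuming ties receive their average rank and the two models are weighted equally; the function names are hypothetical and the project's actual ensemble code may differ.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_average_ensemble(pred_a: list[float], pred_b: list[float]) -> list[float]:
    """Equal-weight average of the two models' cross-sectional ranks."""
    if len(pred_a) != len(pred_b):
        raise ValueError("predictions must cover the same events")
    ranks_a, ranks_b = average_ranks(pred_a), average_ranks(pred_b)
    return [(a + b) / 2 for a, b in zip(ranks_a, ranks_b)]


if __name__ == "__main__":
    lightgbm_scores = [0.10, 0.90, 0.50]  # illustrative values, not project output
    xgboost_scores = [12.0, 30.0, 41.0]
    print(rank_average_ensemble(lightgbm_scores, xgboost_scores))
```

Averaging ranks rather than raw scores makes the ensemble insensitive to the two models' output scales, which suits a ranking metric such as AUC.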
Command Surface
Use just as the public command surface:
just status
just check
just mamba-doctor
just mamba-install
just data args="--dry-run"
just data
just research
just research-report
just docs
just check formats, fixes lint, runs mypy, pytest, MkDocs strict build,
status, and source probes.
just data runs the active proxy-all DAG:
options-day-aggs-bulk -> universe -> dynamic-calendar -> sec-companyfacts
-> event-window-panel -> contract-reference-validation -> trade-proxy-panel
Default data parameters:
- study range: 2022-12-01 to 2025-12-31;
- universe lookback: from 2022-06-01;
- monthly top 50 liquid U.S. single-name option underlyings;
- DTE 3-21, supporting the main 5-14 sample and robustness window;
- market data route:
  - options day aggregates for universe liquidity ranking, contract discovery, local IV/IVAR proxy inputs, same-contract option exit closes, and the 20-day close-trade-implied option-surface sequence;
  - underlying stock day aggregates for underlying closes, vendor OHLC opens, C2O/C2C/O2C event returns, and exit spot;
  - targeted Massive option second aggregates from /range/1/second/<date>/<date> for the entry proxy.
- entry proxy window: keep only bars in the resolved pre-cutoff buffer, default 60 minutes before the event cutoff, then compute the true per-leg volume-weighted option_vwap over the final 900 seconds.
- The option-proxy open anchor is unified as same-contract option VWAP from 5-15 minutes after open. C2O uses it as the primary post-open exit proxy; O2C uses the same mark as the diagnostic post-open entry proxy. The 0-5 minute VWAP remains an opening-microstructure stress test.
- second aggregates are trade OHLCV bars, not quote, bid/ask, or NBBO data; the primary C2C exit proxy is same-contract option VWAP over the final 15 minutes before the exit-date close. Same-contract option day-aggregate close is retained only as fallback/diagnostic.
- SEC CompanyFacts is public XBRL financial-statement data. The active stage uses CIK-mapped CompanyFacts with conservative as-of gating: acceptanceDateTime <= feature_asof_timestamp when available, otherwise filed < feature_asof_date.
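The entry proxy window described in the list above can be sketched for one option leg. This is a minimal illustration, assuming the final 900 seconds are the last 900 seconds before the event cutoff and that each second bar carries one representative trade price; SecondBar and entry_option_vwap are hypothetical names, not the pipeline's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SecondBar:
    ts: int        # bar start, epoch seconds
    price: float   # representative trade price for the bar (assumption)
    volume: float  # traded contracts in the bar


def entry_option_vwap(
    bars: list[SecondBar],
    cutoff_ts: int,
    buffer_seconds: int = 3600,  # default 60-minute pre-cutoff buffer
    vwap_seconds: int = 900,     # VWAP over the final 900 seconds
) -> Optional[float]:
    """Volume-weighted price for one leg; None when no volume traded in the window."""
    buffer_start = cutoff_ts - buffer_seconds
    window_start = cutoff_ts - vwap_seconds
    # Step 1: keep only bars inside the pre-cutoff buffer (strictly before cutoff).
    in_buffer = [b for b in bars if buffer_start <= b.ts < cutoff_ts]
    # Step 2: restrict to the final window and drop zero-volume bars.
    in_window = [b for b in in_buffer if b.ts >= window_start and b.volume > 0]
    total_volume = sum(b.volume for b in in_window)
    if total_volume <= 0:
        return None
    return sum(b.price * b.volume for b in in_window) / total_volume


if __name__ == "__main__":
    bars = [SecondBar(0, 9.0, 100.0), SecondBar(200, 10.0, 1.0), SecondBar(900, 12.0, 3.0)]
    print(entry_option_vwap(bars, cutoff_ts=1000))
```

Because these bars are built from trades, the resulting mark is a trade-price proxy and says nothing about the bid/ask spread available at that time.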
just research does not download market data. For the paper-facing snapshot,
run the canonical tuned proxy package with the full sequence diagnostic suite
and 1,000 bootstrap iterations:
just research args="--stage all --sequence-suite all --allow-high-sequence-risk --bootstrap-iter 1000 --tuning-profile tuned_phase1 --feature-schema-version fe_v2_sec_xbrl"
In the canonical tuned protocol, Optuna objectives and ElasticNetCV read only train and
locked validation rows. The selected hyperparameters are refit on
train+validation, and locked test rows are evaluated once after selection.
Paired original rows are intentionally not emitted.
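The select, refit, and test-once order can be sketched with a toy model. This is an illustration only, assuming selection fits on train rows and scores on locked validation rows; a one-feature ridge stands in for the Optuna-tuned and ElasticNetCV models, and all function names are hypothetical.

```python
Split = tuple[list[float], list[float]]  # (features, targets) for one data split


def fit_ridge_1d(x: list[float], y: list[float], alpha: float) -> float:
    """Closed-form slope of a one-feature ridge regression without intercept."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + alpha)


def mse(x: list[float], y: list[float], beta: float) -> float:
    """Mean squared error of the prediction beta * x."""
    return sum((b - beta * a) ** 2 for a, b in zip(x, y)) / len(x)


def tune_then_test_once(
    train: Split, valid: Split, test: Split, alphas: list[float]
) -> tuple[float, float, float]:
    # 1. Selection reads only train and locked-validation rows.
    best_alpha = min(alphas, key=lambda a: mse(valid[0], valid[1], fit_ridge_1d(train[0], train[1], a)))
    # 2. Refit the selected hyperparameter on train + validation.
    beta = fit_ridge_1d(train[0] + valid[0], train[1] + valid[1], best_alpha)
    # 3. Locked test rows are scored exactly once, after selection is final.
    return best_alpha, beta, mse(test[0], test[1], beta)


if __name__ == "__main__":
    train = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
    valid = ([4.0], [8.0])
    test = ([5.0], [10.0])
    print(tune_then_test_once(train, valid, test, alphas=[0.0, 10.0]))
```

Keeping the test rows out of step 1 is what makes the single test evaluation an honest out-of-sample number.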
The default feature schema is fe_v2_sec_xbrl. It uses the resolved
artifacts/modeling/feature_schema_report.csv as the model-feature allowlist,
excludes raw IDs and outcome/exit/PnL fields, adds point-in-time rolling
same-ticker earnings history, SEC XBRL fundamentals, train-fitted rank/z-score
features, and single-name run-up/surface proxy features. fe_v1_legacy remains
available only for same-code feature-ablation reruns.
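The train-fitted z-score features mentioned above follow a fit-on-train, apply-everywhere pattern. The sketch below is a minimal illustration with hypothetical function names; the stored parameter layout in feature_transform_params.json is not documented here and is not assumed.

```python
from statistics import fmean, pstdev


def fit_zscore(train_values: list[float]) -> dict[str, float]:
    """Estimate mean and standard deviation from training rows only."""
    sigma = pstdev(train_values)
    # Guard against constant features so the transform never divides by zero.
    return {"mean": fmean(train_values), "std": sigma if sigma > 0 else 1.0}


def apply_zscore(values: list[float], params: dict[str, float]) -> list[float]:
    """Standardize any split (train, validation, test) with the train-fitted params."""
    return [(v - params["mean"]) / params["std"] for v in values]


if __name__ == "__main__":
    params = fit_zscore([1.0, 2.0, 3.0])       # training rows only
    print(apply_zscore([2.0, 5.0], params))    # later rows reuse the same params
```

Fitting the parameters on training rows only keeps validation and test information out of the features, which is the same leakage concern the as-of gating addresses for the raw inputs.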
just research consumes the current proxy panel, builds features, trains and evaluates models,
writes metrics, writes reports/modeling/proxy_research_report.md, regenerates
reports/modeling/figures/*.png, and syncs those figures into
docs/assets/images/modeling/.
just research-report regenerates only the generated report and figure assets
from existing modeling artifacts. The curated reader-facing
docs/results_snapshot.md is intentionally manual: update it when a run changes
the paper-facing tables or interpretation, then run just check.
Key Outputs
Data pipeline:
- artifacts/data_pipeline/data_pipeline_manifest.json
- artifacts/data_pipeline/universe/universe_manifest.json
- artifacts/data_pipeline/dynamic_calendar/earnings_calendar_report.json
- artifacts/data_pipeline/sec_companyfacts/sec_companyfacts_manifest.json
- artifacts/data_pipeline/sec_companyfacts/sec_companyfacts_diagnostics.csv
- artifacts/data_pipeline/trade_proxy_panel/trade_proxy_panel_report.json
- $GOLD_DATA_DIR/event_panel/trade_proxy_event_panel.parquet
Research package:
- $GOLD_DATA_DIR/modeling/feature_matrix.parquet
- artifacts/modeling/feature_schema_report.csv
- artifacts/modeling/feature_transform_params.json
- artifacts/modeling/forecast_metrics.csv
- artifacts/modeling/ranking_metrics.csv
- artifacts/modeling/strategy_metrics.csv
- artifacts/modeling/model_fit_diagnostics.csv
- artifacts/modeling/model_predictions.parquet
- artifacts/modeling/sequence_v2_quality.csv
- artifacts/modeling/common_row_pairwise_metrics.csv
- artifacts/modeling/incremental_value_diagnostics.csv
- artifacts/modeling/sequence_model_fit_diagnostics.csv
- reports/modeling/proxy_research_report.md
- reports/modeling/figures/
Claim Boundaries
Current evidence supports engineering and signal-screening discussion only. It does not support final paper claims that require bid/ask or NBBO execution.
Do not claim:
- generic IV forecasting superiority;
- paper-grade full-spread tradability;
- that second-aggregate trade bars are NBBO quotes;
- that Mamba is the contribution independent of baselines and costs;
- that lower RMSE alone implies economic value.
The defensible near-term claim is narrower:
In a no-NBBO proxy sample, state and event-history features show preliminary cross-sectional ranking signal for earnings event-variance mispricing beyond the market-implied IVAR baseline. The current same-code ablation says the parsimonious FE V1 tabular signal is stronger than the richer FE V2 default, so FE V2 is a negative diagnostic result rather than a headline improvement. Paper-grade claims require quote/NBBO data and robust cost/inference checks.
Docs
- Home: project object and current status.
- Results Snapshot: current artifacts and readiness boundaries.
- Paper Plan: research design and model/backtest protocol.
- Audit Prompts: implementation and manuscript review checklists.
- Future Work: paper blockers and deferred extensions.
SPEC.md is the implementation and research-protocol contract. It stays at the
repo root and is not a separate docs-nav page.