N225 Open Gap Tail Risk¶
Research code for a point-in-time out-of-sample study of OSE Nikkei 225 Futures opening-gap tail risk.
The paper asks whether information observed at the U.S. cash-market close helps forecast the VaR and Expected Shortfall (ES) of the next OSE day-session open. The setting is not a generic overnight-return exercise: OSE futures trade through an active night session, so U.S. information may be partially incorporated before the Japanese opening auction.
What This Repository Does¶
- Builds a point-in-time forecast panel for Nikkei 225 Futures opening-gap risk.
- Evaluates downside and upside opening-gap risk as separate tail surfaces.
- Tests nested information sets built around Japan-only history, U.S. close variables, Japan proxy ETFs, and Asia proxy ETFs.
- Compares benchmark econometric models, advanced tail-risk benchmarks, and LightGBM tail specifications.
- Reports VaR coverage, quantile loss, Fissler-Ziegel VaR-ES loss, DM inference, Murphy diagrams, and supporting risk diagnostics.
The repository does not implement a live trading system, portfolio allocation rule, or execution-cost study.
Current Empirical Snapshot¶
The current clean snapshot is based on the completed run
tailrisk_20160719_20260522_20260525T142611Z_commit_bf63b13d.
- Clean evaluation window:
2018-06-20to2026-05-22. - Forecast sample:
1722trading-day observations. - Primary risk level: 95% VaR, corresponding to a nominal 5% exception rate.
- Baseline benchmark median breach rate: about 5.8%.
- ML direct-quantile breach rates across nested information sets: about 8.9% to 12.3%.
This means lower average loss values for the ML tail models must be read together with coverage diagnostics. In the current evidence, the direct-quantile LightGBM models often produce less conservative VaR estimates than the baseline benchmarks.
The current results are research-candidate evidence. Final manuscript claims still require author review of the tables, figures, and claim boundaries.
Research Design¶
Target¶
The main target is the settlement-to-open gap:
Forecasts are evaluated in positive loss units:
left_tail: downside opening-gap risk,realized_loss = -gap_t.right_tail: upside opening-gap risk,realized_loss = gap_t.
For both sides, a VaR exception is defined as:
Left and right tails are evaluated separately. The empirical results should not be averaged across sides or interpreted as the same economic mechanism.
Information Sets¶
japan_only: target history, lagged Japanese futures variables, rolling volatility, volume/open-interest information, and Japanese calendar variables.japan_only_plus_us_close_core: adds U.S. close equity, volatility, FX, and rates predictors available before the OSE open.japan_only_plus_us_close_core_plus_japan_proxy: adds U.S.-traded Japan proxy ETFs, includingEWJandDXJ.japan_only_plus_us_close_core_plus_japan_proxy_plus_asia_proxy: adds Asia proxy ETFs, includingEWY,EWT, andEWH.
Models¶
- Baseline benchmarks: historical quantile, rolling quantile, volatility-scaled quantile, GARCH/GJR-GARCH, and GJR-GARCH-EVT.
- Advanced econometric benchmarks: CAViaR, CARE/expectile models, and GAS models where convergence and validity checks are satisfied.
- ML tail models: flexible, tree-based direct quantile estimation via gradient boosting (LightGBM), LightGBM location-scale models, and LightGBM standardized-loss POT-GPD.
ML tail models are refit monthly using expanding training windows. LightGBM hyperparameters are held fixed across information sets and refit dates to limit data-dependent tuning.
Point-in-Time Controls¶
Every predictor used for a forecast must satisfy:
The current audit reports no hard look-ahead-bias failures. FRED macro-financial predictors use conservative publication lags, but they do not use unrevised real-time ALFRED vintages. This is a data limitation and should be disclosed in empirical writing.
Data Sources¶
- J-Quants Premium: Nikkei 225 Futures daily prices, settlement, volume, and open interest.
- Massive: U.S. ETFs, Japan proxy ETFs, Asia proxy ETFs, and curated U.S.-listed ETF late-session minute predictors.
- FRED: rates, FX, and macro-financial predictors with publication-lag controls.
- CBOE: volatility-index predictors, including VIX.
Source credentials belong in .env. Do not commit .env; keep shareable defaults in
.env.example.
Quick Start¶
This repo uses uv and just. The local virtual environment is controlled by .env:
Mutable research storage is also controlled by .env. On local machines, keep
DATA_DIR as an absolute external path outside cloud-synced checkouts. A
repo-local data/ symlink is acceptable if it resolves outside this repo.
REPORTS_DIR can remain reports because run
summaries, figures, and tables are much smaller than the data lake.
Typical local checks:
just check syncs the uv environment and runs read-only validation: format check,
ruff lint, mypy, the mypy-ignore debt guard, default pytest, strict MkDocs build,
and local architecture/name guards. Use just fix when you want ruff to format
and apply automatic lint fixes.
To serve the documentation site:
Research Run¶
The full workflow is:
It runs checks, builds the point-in-time panel, evaluates benchmark and ML-tail suites,
exports LaTeX tables and figures, and writes run outputs under REPORTS_DIR/runs/.
When the end argument is omitted, the workflow uses the most recent completed
Friday as the data cutoff rather than the run date. Pass an explicit
YYYY-MM-DD end date to override that paper-freeze default.
The default is cache-first (force=false); use force only for intentional schema/cache
invalidation. By default, just full excludes Massive OPRA U.S. options features
(options=false) so the canonical full-history run is not driven by the shorter
OPRA entitlement window. To build those features only for appendix or recent-window
diagnostics, opt in explicitly:
For a completed run, regenerate the snapshot without fetching vendor data:
Snapshot export also refreshes the slide-facing model metrics audit under
docs/tables/<run_id>/model_metrics_breach_audit.md and
docs/tables/<run_id>/model_metrics_full_rows.csv.
Appendix-only configuration robustness can be generated without changing primary/promoted results:
The sensitivity run is fixed to the post-24-check paper set: GJR-GARCH-EVT,
LGBM POT-GPD plain MLE (C), and LGBM POT-GPD UniBM (C).
The visible just surface is intentionally small:
just status
just source-probe
just check
just fix
just full
just snapshot latest
just sensitivity latest
just docs
Outputs¶
docs/results_snapshot.md: generated evidence map for the latest completed run.REPORTS_DIR/runs/<run_id>/latex/tables/: paper-facing LaTeX tables.REPORTS_DIR/runs/<run_id>/latex/figures/: paper-facing figures.REPORTS_DIR/runs/<run_id>/latex/table_manifest.json: table provenance.REPORTS_DIR/runs/<run_id>/latex/figure_manifest.json: figure provenance.DATA_DIR: local mutable data lake, ignored by git and kept outside the repo.REPORTS_DIR: local run summaries and generated reporting artifacts, ignored by git and usually kept atreports.
Claim Boundaries¶
- The contribution is a forecast-evaluation design for Nikkei 225 Futures opening-gap VaR/ES, not a new machine-learning algorithm.
- The U.S.-close-mark target is deferred until licensed intraday Nikkei futures marks are available.
- EVT results are judged within the registered 95% VaR/ES design and its common out-of-sample dates.
More Detail¶
docs/paper_plan.md: research questions, model families, evaluation design, and claim boundaries.docs/results_snapshot.md: generated evidence map for the current completed run.docs/data.md: source roles, target hierarchy, point-in-time controls, and data limitations.docs/future_work.md: extensions that should remain separate from the current paper.
Documentation Map¶
- Paper Plan: research questions, empirical design, model families, evaluation metrics, and manuscript-level caveats.
- Data: source roles, target hierarchy, point-in-time controls, and data-source limitations.
- Results Snapshot: generated evidence map for the completed run, including figures, tables, inference outputs, and claim boundaries.
- FAQ: short reader-facing framing for the research question, target, data, models, metrics, and current evidence.
- Development Audit Prompt: implementation review prompt for checking code, data, timing, tests, and reporting against the research design.
- Manuscript Audit Prompt: JFM/Wiley-facing review prompt for checking draft claims, evidence locks, and submission-style build readiness.
- Future Work: deferred extensions that should not dilute the current paper.
Reading Order¶
- Start with the Paper Plan as the research contract.
- Read Data before changing ingestion, schemas, or point-in-time controls.
- Use the Results Snapshot to inspect the current evidence.
- Use the FAQ for the plain research framing.
- Use the Development Audit Prompt before implementation handoffs or code review.
- Use the Manuscript Audit Prompt before circulating a draft.
- Use Future Work to keep deferred extensions out of the current paper.