
Results And Discussion

Research-candidate full-run artifact. This page is generated from tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4. It summarizes the durable gold modeling sample and run outputs, not the older bounded access-check snapshot. It is still a research-candidate artifact: final manuscript claims require a clean committed run and author review of the tables and notes. It is organized as the paper's Results and Discussion section: sample and timing results, model setup, forecasting outcomes, diagnostics, tables, figures, and claim boundaries.

This page is the generated Results and Discussion companion to Paper Plan. It carries forward the planned manuscript sections: data/timing evidence, model/evaluation setup, benchmark results, ML nested information-set results, post-24-check comparisons, supporting diagnostics, and claim boundaries. Full data-source detail lives in Data.

Evidence Map

flowchart LR
  A["Vendor and calendar inputs"] --> B["Bronze / silver caches"]
  B --> C["Gold panel and timing map"]
  C --> D["Leakage and sample gates"]
  D --> E["Baseline benchmarks and advanced econometric benchmarks"]
  D --> F["Primary ML nested information sets"]
  E --> G["Metrics, DM, Murphy diagnostics"]
  F --> G
  G --> I["Tables and figures"]
  I --> J["Generated results snapshot"]
  • The left branch binds vendor and calendar inputs into a timestamp-audited gold panel.
  • The middle branch compares baseline benchmarks, advanced econometric benchmarks, and ML-tail forecasts on registered loss units.
  • The right branch separates primary ML nested information sets, diagnostic model-family comparisons, unconditional DM inference, and supporting figures.

2. Data, Target, And Timing Results

Run Metadata

Field Value
Run ID tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4
Artifact root reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4
Claim level research_candidate
Requested window ['2016-07-19', '2026-05-22']
Combined clean start 2018-06-20
Gold panel dates 2016-07-19 to 2026-05-22
Forecast sample dates 2018-06-20 to 2026-05-22 (1722 rows)
Git commit 7f628ff4f66258a36314f492b652cdf7ef594b7e
Git dirty False
FRED vintage safe False
  • combined_clean_start is the modeling lower bound; dates before it remain audit history rather than forecast evidence.
  • git_dirty is recorded so dirty runs can be rejected before manuscript tables are frozen.
  • fred_vintage_safe=False is an explicit limitation: FRED data are current historical values with conservative release lag, not real-time vintage observations.

Target Distribution And Tail Diagnostics

  • These diagnostics are computed from the raw clean settlement-to-open target gap_t; left loss is -gap_t, and right loss is gap_t.
  • The purpose is to show why the dependent variable is a tail-risk object before comparing VaR/ES forecasts.
  • Positive tail-shape estimates, heavy empirical tails, and upward mean-excess patterns are empirical support for using heavy-tail approximations such as POT-GPD; they are not a finite-sample proof of Frechet max-domain attraction.
  • Raw target diagnostics motivate VaR/ES and EVT modeling. They do not validate LightGBM+EVT forecasts; forecast validity must be read from out-of-sample VaR/ES backtests and loss comparisons.

Target Summary

Measure Value
Clean forecast observations 1722
Date range 2018-06-20 to 2026-05-22
Mean gap 0.000599 log (+0.06%)
Standard deviation 0.011039 log (+1.11%)
Skewness -0.066817
Excess kurtosis 11.159
1% quantile -0.031062 log (-3.06%)
5% quantile -0.015606 log (-1.55%)
Median 0.001031 log (+0.10%)
95% quantile 0.015357 log (+1.55%)
99% quantile 0.027480 log (+2.79%)
Max drawdown gap -0.087513 log (-8.38%) on 2020-03-13
Max upside gap 0.096937 log (+10.18%) on 2025-04-10
Jarque-Bera p-value 0 (numerically zero at floating-point precision)
Jarque-Bera statistic 8962.16
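
The percentages shown in parentheses are consistent with converting a log gap g to a simple return through exp(g) - 1. The page does not state the conversion, so the sketch below treats it as an assumption; `log_gap_to_pct` is a hypothetical helper name, and the two inputs are copied from the Target Summary table.

```python
import math

def log_gap_to_pct(log_gap: float) -> float:
    """Convert a log gap to a simple percentage change: 100 * (exp(g) - 1)."""
    # expm1 keeps precision for the small gaps typical of this target.
    return 100.0 * math.expm1(log_gap)

# Extreme gaps copied from the Target Summary table.
max_upside = log_gap_to_pct(0.096937)
max_downside = log_gap_to_pct(-0.087513)
print(f"{max_upside:+.2f}%  {max_downside:+.2f}%")
```

For gaps of a few basis points the log and simple values nearly coincide; the difference only becomes visible in the extreme rows.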

Raw-Tail EVT Diagnostics

Tail Threshold probability Threshold Exceedances Mean excess GPD xi GPD scale Hill xi
left_tail_loss 0.900 0.0160237 78 0.0104227 0.148364 0.00886263 0.432871
left_tail_loss 0.925 0.0195554 59 0.00979449 0.318986 0.00680111 0.342056
left_tail_loss 0.950 0.0223228 39 0.0114201 0.232349 0.0088172 0.354783
left_tail_loss 0.975 0.0293044 20 0.0127979 0.257683 0.00960064 0.31884
left_tail_loss 0.990 0.0373166 8 0.0175619 0.204438 0.0142233 0.342351
right_tail_loss 0.900 0.0150066 92 0.00903798 0.403374 0.00560621 0.381845
right_tail_loss 0.925 0.0169408 69 0.00984642 0.47548 0.00563718 0.370713
right_tail_loss 0.950 0.0189956 46 0.0122032 0.29399 0.00878548 0.41336
right_tail_loss 0.975 0.0259629 23 0.0146916 0.225126 0.0115191 0.383413
right_tail_loss 0.990 0.0369444 10 0.0171855 0.218772 0.0136836 0.352692
absolute_gap 0.900 0.0155233 170 0.00965025 0.287887 0.00694789 0.401867
absolute_gap 0.925 0.0175227 127 0.0106185 0.249472 0.00801625 0.397719
absolute_gap 0.950 0.0208133 85 0.011767 0.256151 0.00884033 0.383021
absolute_gap 0.975 0.0269986 43 0.0144273 0.164437 0.0120976 0.372398
absolute_gap 0.990 0.0371795 17 0.0183049 0.0605131 0.0172189 0.353301
  • The GPD threshold table is computed on raw left loss, raw right loss, and the absolute gap; it should not be read as a forecast-model diagnostic.
  • The Hill and GPD shape estimates are deliberately reported over multiple thresholds because tail-index estimates are sensitive in samples of this length.
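
The mean-excess and Hill columns can be illustrated with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: the Hill estimate is taken as the average log-ratio of exceedances to a fixed threshold, `tail_diagnostics` is a hypothetical name, and the toy losses are not run data. The GPD shape and scale columns need a likelihood fit, which is omitted here.

```python
import math

def tail_diagnostics(losses, threshold):
    """Return (n_exceedances, mean_excess, hill_xi) for losses above a threshold."""
    exceed = [x for x in losses if x > threshold]
    if not exceed or threshold <= 0:
        raise ValueError("need a positive threshold with at least one exceedance")
    # Mean excess: average overshoot above the threshold.
    mean_excess = sum(x - threshold for x in exceed) / len(exceed)
    # Hill estimate: average log-ratio of exceedances to the threshold.
    hill_xi = sum(math.log(x / threshold) for x in exceed) / len(exceed)
    return len(exceed), mean_excess, hill_xi

# Toy losses for illustration only (ASSUMPTION: not taken from the run).
toy_losses = [0.5, 1.0, 2.0, 4.0, 8.0]
n_exc, mean_exc, xi = tail_diagnostics(toy_losses, threshold=1.0)
print(n_exc, mean_exc, xi)
```

Repeating the call over a grid of thresholds gives the kind of multi-threshold view the table reports, which is what makes the sensitivity of the shape estimates visible.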

Target Distribution Figures

Figure Tail side Source Claim scope Docs file
target_tail_motivation left_right_target_distribution panel/modeling_panel.parquet target_distribution_motivation_not_forecast_validation figures/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/target_tail_motivation.png

Gold Panel Construction

Measure Value
Gold modeling rows 2403
Gold columns 1428
Target-audit rows 2403
Clean target rows 2206
Forecast-sample rows 1722
Rows before combined clean start 420
Target-not-clean rows 197
Mapping excluded rows 64

Target audit reason Rows
None 2206
roll_sq_excluded 195
missing_previous_jpx_session 1
missing_reference_price 1
  • The cache lower bound is 2016-07-19, but XLC/core predictor coverage pushes the actual forecast sample to the combined clean start.
  • Target exclusion is explicit: the 195 roll/SQ-window rows, the single missing previous JPX session, and the single missing reference price are carried as audit evidence, not silently dropped.
  • The forecast-sample reason column makes the sample boundary reproducible row by row.
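
The row counts above can be reconciled with simple arithmetic. The sketch below copies the counts from the two tables; the second check assumes that each excluded row is assigned exactly one forecast-sample reason, which is an inference from the totals and is not stated on this page.

```python
# Counts copied from the Gold Panel Construction tables.
gold_rows = 2403
audit_reasons = {
    "None": 2206,
    "roll_sq_excluded": 195,
    "missing_previous_jpx_session": 1,
    "missing_reference_price": 1,
}
forecast_rows = 1722
before_clean_start = 420
target_not_clean = 197
mapping_excluded = 64

# The audit reasons account for every gold row in this run.
audit_total = sum(audit_reasons.values())
not_clean = gold_rows - audit_reasons["None"]

# ASSUMPTION: the three exclusion buckets do not overlap.
excluded = before_clean_start + target_not_clean + mapping_excluded
reconciles = (gold_rows - excluded) == forecast_rows
print(audit_total, not_clean, excluded, reconciles)
```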

Calendar And Timing Map

Measure Value
Normal trading mappings 2333
U.S./Japan desync mappings 1
NYSE early-close mappings 32
EDT rows 1563
EST rows 840
  • The map covers EST/EDT, early closes, U.S./Japan holiday desynchronization, and normal trading alignments.
  • Desync rows are not treated as normal forecast rows.
  • The timing map is part of the leakage-bound gold artifact, not ad hoc evaluation logic.

Feature Coverage

Source family Block Features Mean missing Max missing
Asia proxy Asia proxy 10 0.000% 0.000%
cboe_volatility fred_core 2 0.000% 0.000%
cross_market_derived Asia proxy 1 0.000% 0.000%
cross_market_derived fred_core 2 0.000% 0.000%
cross_market_derived JP proxy 2 0.000% 0.000%
cross_market_derived US core 2 0.000% 0.000%
event_calendar calendar_controls 7 0.000% 0.000%
fred_core fred_core 9 0.000% 0.000%
FRED credit enriched FRED credit enriched 4 62.398% 62.427%
fx_core fx_core 4 0.000% 0.000%
JP history JP only 37 0.005% 0.058%
JP proxy JP proxy 8 0.000% 0.000%
J-Quants N225 options JP only 30 1.605% 14.634%
massive_daily US core 40 0.001% 0.058%
massive_minute Asia proxy 60 0.000% 0.000%
massive_minute JP proxy 24 0.348% 4.181%
massive_minute US late session 84 0.000% 0.000%
massive_optional massive_optional 2 0.000% 0.000%
  • U.S. core, proxy ETFs, minute late-session features, CBOE VIX, FRED rates, FRED H.10 FX, and any audit-gated options-risk fields are separated by source family and block.
  • Credit-spread FRED features are enriched/optional and visibly late-starting, so they do not move the core clean start.
  • Feature coverage should be read together with the leakage summary; high coverage alone is not enough without timestamp validity.

Leakage Audit

Field Value
Status pass_with_warnings
Rows audited 783378
Failures 0
Warnings 611790
Panel row count 2403
Panel signature seed 42
Panel signature 8094755ffc96b01af6fb904876e0abdd3920370fa1b07e44c2c95681cd3e5431
  • Zero failures means no audited row violated the hard timestamp invariant.
  • Warnings are retained because they identify conservative-lag or missing-feature situations that may matter for interpretation.
  • The panel signature is deterministic and binds the leakage check to the current gold panel/config.
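
A hedged sketch of what a hard timestamp check and a deterministic panel signature could look like. Everything here is illustrative: the page does not define the invariant or the hashing scheme, so the rule "feature availability must not be later than the forecast origin", the field names, the use of SHA-256 over canonical JSON, and the way the seed enters the hash are all assumptions.

```python
import hashlib
import json
from datetime import datetime

def leakage_failures(rows):
    """Return rows whose feature timestamp is later than the forecast origin."""
    return [r for r in rows if r["feature_available_at"] > r["forecast_origin"]]

def panel_signature(rows, seed=42):
    """Deterministic SHA-256 digest over a canonical JSON encoding of the rows."""
    # sort_keys gives a stable key order; default=str serializes datetimes.
    canonical = json.dumps({"seed": seed, "rows": rows}, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical audit rows (not run data).
rows = [
    {"feature": "us_close", "feature_available_at": datetime(2024, 1, 4, 21, 0),
     "forecast_origin": datetime(2024, 1, 4, 23, 0)},
    {"feature": "late_print", "feature_available_at": datetime(2024, 1, 5, 1, 0),
     "forecast_origin": datetime(2024, 1, 4, 23, 0)},
]
failures = leakage_failures(rows)
signature = panel_signature(rows)
print(len(failures), signature[:12])
```

Binding the digest to both the rows and the configuration means any change to either produces a different signature, which is the property the audit relies on.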

3. Model And Evaluation Setup

Pipeline Structure

Step Layer Purpose
1 Vendor and calendar sources Pull or read J-Quants, Massive, FRED, CBOE, and exchange-calendar inputs.
2 Bronze and silver cache Preserve typed vendor/cache rows, then normalize point-in-time research features.
3 Gold modeling panel Join targets, calendar map, feature coverage, and leakage-bound signatures.
4 Leakage and coverage gates Enforce timestamp ordering and sample eligibility before evaluation.
5 Baseline benchmarks and ML-tail registry Run target-history/econometric baseline benchmarks and LightGBM tail-model families.
6 Metrics, inference, diagnostics Build loss matrices, DM/Murphy diagnostics, stress windows, and result matrix artifacts.
7 Results snapshot Summarize run-specific evidence and claim boundaries for reader review.
  • Data-access and cache artifacts live under data/bronze and data/silver.
  • Durable modeling evidence lives under data/gold; forecast/evaluation/reporting read from gold and reports.
  • Run-specific forecasts, metrics, diagnostics, and LaTeX tables live under reports/runs/<run_id>.

Model And Evaluation Protocol

  • The registered risk level is tail_level = 0.95; the nominal VaR exception rate is 5%.
  • A VaR exception is counted when realized_loss > var_forecast; this follows the standard exception-counting logic of VaR backtesting, but the snapshot does not apply Basel green/yellow/red traffic-light capital zones.
  • Forecast evaluation is based on coverage diagnostics, Kupiec/Christoffersen tests where available, quantile loss, Fissler-Ziegel (FZ) joint VaR-ES loss, and Diebold-Mariano (DM) inference.
  • Benchmarks use target-history information only. ML-tail models add predictors through fixed nested information sets.
  • Most specifications use expanding pre-forecast training histories. The rolling-quantile benchmark is the designed exception and uses the most recent 1,000 clean observations.
  • LightGBM hyperparameters are held fixed across information sets and refit dates; the snapshot reports model-family evidence rather than tuning-search evidence.
  • DM inference is read on average across the unconditional evaluation sample.
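
The exception rule and the two loss functions named in the protocol can be sketched as follows. This is an illustration, not the run's implementation: the pinball form of the quantile loss and the FZ0 member of the Fissler-Ziegel family, written here on the loss scale with positive VaR and ES forecasts, are assumptions, and the function names are hypothetical.

```python
import math

TAIL_LEVEL = 0.95          # registered risk level from the protocol
ALPHA = 1.0 - TAIL_LEVEL   # nominal exception rate

def is_exception(realized_loss, var_forecast):
    """Exception rule stated in the protocol: realized_loss > var_forecast."""
    return realized_loss > var_forecast

def quantile_loss(realized_loss, var_forecast, tau=TAIL_LEVEL):
    """Pinball loss of a tau-quantile forecast (ASSUMED form)."""
    diff = realized_loss - var_forecast
    return diff * (tau - (1.0 if diff < 0 else 0.0))

def fz0_loss(realized_loss, var_forecast, es_forecast, alpha=ALPHA):
    """FZ0 joint VaR-ES loss on the loss scale (ASSUMED family member)."""
    if es_forecast <= 0:
        raise ValueError("ES forecast must be positive on the loss scale")
    excess = max(realized_loss - var_forecast, 0.0)
    return (excess / (alpha * es_forecast)
            + var_forecast / es_forecast
            + math.log(es_forecast) - 1.0)

# Hypothetical forecasts: VaR 2%, ES 2.5%; one quiet day and one exception day.
quiet = fz0_loss(0.01, 0.02, 0.025)
breach = fz0_loss(0.03, 0.02, 0.025)
print(is_exception(0.03, 0.02), quantile_loss(0.03, 0.02), quiet, breach)
```

Under this form the loss can be negative on non-exception days because of the log(ES) term, so a negative mean FZ loss is not by itself a sign of an error.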

4. Forecasting Results And Discussion

4.1 Benchmark results

Status: completed; forecast rows: 15173; metric rows: 14; failures: 0.

Benchmark layer Status Forecast rows Diagnostic rows Failures How to read it
baseline completed 8664 12 0 Implemented evidence for target-history and econometric baseline benchmark models.
advanced econometric completed_nonblocking 6509 2526 0 Implemented nonblocking advanced econometric benchmark forecasts; review with common-sample gates.

Model Information set Tail side Rows VaR breach rate Exceptions Mean quantile loss Mean FZ loss
ewma_vol_scaled Target history left_tail 722 5.263% 38 0.00140906 -3.64746
ewma_vol_scaled Target history right_tail 722 4.571% 33 0.00134173 -3.65662
garch_t Target history left_tail 722 6.094% 44 0.00136831 -3.70108
garch_t Target history right_tail 722 4.155% 30 0.00127794 -3.70337
gas_t_location_scale Target history left_tail 722 6.371% 46 0.00135225 -3.69351
gas_t_location_scale Target history right_tail 722 4.986% 36 0.00130686 -3.66834
gjr_garch_evt Target history left_tail 722 6.094% 44 0.00133021 -3.74623
gjr_garch_evt Target history right_tail 722 5.817% 42 0.00123302 -3.70857
gjr_garch_t Target history left_tail 722 7.064% 51 0.00133844 -3.72346
gjr_garch_t Target history right_tail 722 4.294% 31 0.00122154 -3.73971
historical_quantile Target history left_tail 722 5.540% 40 0.00147622 -3.54066
historical_quantile Target history right_tail 722 6.925% 50 0.00150215 -3.40678
rolling_quantile Target history left_tail 722 5.817% 42 0.00148114 -3.52468
rolling_quantile Target history right_tail 722 7.202% 52 0.00149558 -3.42732
  • Baseline benchmark rows set the target-history/econometric reference that ML models should be interpreted against.
  • Advanced econometric benchmark families are nonblocking; rows with valid forecasts are empirical evidence subject to the same sample and inference gates, while unavailable rows remain diagnostics.
  • The table is not a leaderboard by itself; coverage, exception counts, quantile loss, and FZ loss must be read together.
  • Common-sample rows are reported directly so readers can see the effective evidence size.

4.2 Primary ML specifications across nested information sets

Status: completed LGBM ML-tail models; implemented models: LGBM direct quantile, LGBM location-scale empirical, LGBM POT-GPD plain MLE, LGBM POT-GPD UniBM block-maxima shape, LGBM median/MAD POT-GPD plain MLE, LGBM median/MAD POT-GPD UniBM block-maxima shape, LGBM median/IQR POT-GPD plain MLE, LGBM median/IQR POT-GPD UniBM block-maxima shape; forecast rows: 43088; failures: 0.

Model Information set Tail side Rows VaR breach rate Exceptions Mean quantile loss Mean FZ loss
LGBM direct quantile JP only left_tail 527 8.159% 43 0.00141174 -3.48935
LGBM direct quantile JP + US close core left_tail 527 11.195% 59 0.00115754 -3.6684
LGBM direct quantile JP + US close core + JP proxy left_tail 527 11.765% 62 0.00111393 -3.86888
LGBM direct quantile JP + US close core + JP proxy + Asia proxy left_tail 527 11.765% 62 0.00111901 -3.8076
LGBM direct quantile JP only right_tail 527 9.677% 51 0.00130995 -3.48743
LGBM direct quantile JP + US close core right_tail 527 11.954% 63 0.00125168 -3.48624
LGBM direct quantile JP + US close core + JP proxy right_tail 527 11.575% 61 0.00121124 -3.55746
LGBM direct quantile JP + US close core + JP proxy + Asia proxy right_tail 527 12.903% 68 0.00122247 -3.55915
  • This primary ML table remains strict and reports only ML-tail rows that pass the registered common-sample and forecast-validity gates; coverage is reviewed separately.
  • Location-scale empirical and plain POT-GPD are primary candidates only after their valid OOS coverage, standardized-loss, exceedance, and ES-validity gates pass.
  • Differences across information blocks are candidate forecast evidence only after the common-sample, coverage, and inference diagnostics are reviewed.
  • Coverage review: 8/8 primary ML rows differ from the expected breach rate by more than 2.5 percentage points, so quantile/FZ loss differences alone must not be read as forecast improvement.
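
The coverage review can be complemented with a Kupiec proportion-of-failures check computed directly from the row and exception counts. The sketch below is a standard-library illustration, not the run's test code; `kupiec_pof` is a hypothetical name, and the two inputs are copied from the tables in sections 4.1 and 4.2.

```python
import math

def kupiec_pof(exceptions, n, nominal=0.05):
    """Kupiec proportion-of-failures LR statistic and chi-square(1) p-value."""
    if not 0 < exceptions < n:
        raise ValueError("sketch handles 0 < exceptions < n only")
    rate = exceptions / n

    def loglik(p):
        # Binomial log-likelihood up to a constant that cancels in the ratio.
        return (n - exceptions) * math.log(1.0 - p) + exceptions * math.log(p)

    lr = -2.0 * (loglik(nominal) - loglik(rate))
    # Chi-square(1) survival function: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value

lr_ml, p_ml = kupiec_pof(43, 527)   # LGBM direct quantile, JP only, left_tail
lr_bm, p_bm = kupiec_pof(38, 722)   # ewma_vol_scaled, left_tail
print(f"ML: LR={lr_ml:.2f}, p={p_ml:.4f}; benchmark: LR={lr_bm:.2f}, p={p_bm:.4f}")
```

The test only addresses unconditional coverage; clustering of exceptions is the domain of the Christoffersen test and is not covered by this sketch.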

4.3 Side-specific ML-tail promotion gate

Role Model Information set Tail side Rows Breach Q loss FZ loss DM q DM FZ Gate
left promoted LGBM median/IQR POT-GPD plain MLE JP + US close core + JP proxy + Asia proxy left_tail 527 5.882% 0.000917119 -4.22247 -0.000235864; p=0.028; reject10 -0.488666; p=0.049; reject10 pass
right promoted LGBM location-scale empirical JP + US close core + JP proxy right_tail 493 6.085% 0.00102336 -4.02729 -0.000238584; p=0.026; reject10 -0.529731; p=0.003; reject10 pass
  • This paper-facing bridge promotes side-specific ML-tail candidates only after the N/coverage gate and restricted common-sample inference are visible.
  • The current run's promoted rows are exactly the rows shown above; read them as side-specific paper candidates. They are not a universal model-family ranking and do not replace the strict primary nested-information-set table above.

4.4 ML-tail artifact relationship

Artifact Rows Role Claim boundary
ml_tail_metrics.parquet 8 Primary ML nested-information-set comparison Eligible for primary discussion after author review.
ml_tail_metrics_per_model.parquet 64 Per-model diagnostics on each model's own valid OOS rows Not a cross-model comparison and not a replacement primary ML table.
ml_tail_result_matrix.parquet 384 Restricted common-sample VaR-only and VaR-ES comparisons Restricted evidence; direct quantile rows here are comparison anchors.
  • ml_tail_metrics.parquet is the primary nested-information-set artifact. It contains the ML-tail rows that survived the strict common-sample gate in this run.
  • ml_tail_metrics_per_model.parquet reports each implemented ML-tail model on its own valid OOS rows; it is useful for debugging coverage but is not a cross-model comparison table.
  • ml_tail_result_matrix.parquet creates restricted common samples for VaR-only and VaR-ES comparisons across model families and within-model information-set increments.

4.5 All-model diagnostic scan

Suite Model Information set Metric rows OOS N mean+-sd Breach mean+-sd Abs cov err mean+-sd Q loss mean+-sd FZ loss mean+-sd ES severity mean+-sd
benchmark_advanced care_expectile_asymmetric_slope Target history 2 653 +/- 36.7696 7.884% +/- 0.097% 2.884% +/- 0.097% 0.00138317 +/- 8.00061e-05 -3.57653 +/- 0.0638322 0.0082744 +/- 0.000559256
benchmark_advanced care_expectile_sav Target history 2 649.5 +/- 41.7193 7.658% +/- 1.250% 2.658% +/- 1.250% 0.00141367 +/- 5.95611e-05 -3.54112 +/- 0.0938805 0.00857223 +/- 0.000528859
benchmark_advanced caviar_asymmetric_slope Target history 2 506 +/- 2.82843 6.916% +/- 0.241% 1.916% +/- 0.241% 0.00152915 +/- 4.9109e-05 -3.49931 +/- 0.0468367 0.0100652 +/- 0.000926358
benchmark_advanced caviar_sav Target history 2 501 +/- 4.24264 6.086% +/- 0.372% 1.086% +/- 0.372% 0.00156273 +/- 6.81431e-06 -3.47395 +/- 0.093514 0.0116523 +/- 3.28833e-05
benchmark_advanced gas_t_location_scale Target history 2 722 +/- 0 5.679% +/- 0.979% 0.693% +/- 0.960% 0.00132955 +/- 3.20964e-05 -3.68092 +/- 0.0177958 0.00959201 +/- 0.00130427
benchmark_advanced gas_t_pot_gpd Target history 2 223 +/- 0 6.502% +/- 2.854% 2.018% +/- 2.124% 0.00153061 +/- 0.000359334 -3.41151 +/- 0.534257 0.009774 +/- 0.00427048
benchmark_baseline ewma_vol_scaled Target history 2 722 +/- 0 4.917% +/- 0.490% 0.346% +/- 0.118% 0.00137539 +/- 4.76049e-05 -3.65204 +/- 0.00647254 0.00921114 +/- 0.000813727
benchmark_baseline garch_t Target history 2 722 +/- 0 5.125% +/- 1.371% 0.970% +/- 0.176% 0.00132312 +/- 6.39034e-05 -3.70223 +/- 0.00162002 0.00974876 +/- 0.00161122
benchmark_baseline gjr_garch_evt Target history 2 722 +/- 0 5.956% +/- 0.196% 0.956% +/- 0.196% 0.00128162 +/- 6.87209e-05 -3.7274 +/- 0.0266288 0.00830943 +/- 0.000959208
benchmark_baseline gjr_garch_t Target history 2 722 +/- 0 5.679% +/- 1.959% 1.385% +/- 0.960% 0.00127999 +/- 8.26617e-05 -3.73159 +/- 0.0114919 0.00881549 +/- 0.00205961
benchmark_baseline historical_quantile Target history 2 722 +/- 0 6.233% +/- 0.979% 1.233% +/- 0.979% 0.00148919 +/- 1.83406e-05 -3.47372 +/- 0.0946715 0.0120717 +/- 9.68084e-05
benchmark_baseline rolling_quantile Target history 2 722 +/- 0 6.510% +/- 0.979% 1.510% +/- 0.979% 0.00148836 +/- 1.02111e-05 -3.476 +/- 0.068848 0.0115817 +/- 0.000143645
ml_tail LGBM POT-GPD UniBM block-maxima shape JP only 2 484 +/- 0 4.855% +/- 1.023% 0.723% +/- 0.205% 0.00143954 +/- 0.000108633 -3.54743 +/- 0.050522 0.0101296 +/- 0.000667455
ml_tail LGBM POT-GPD UniBM block-maxima shape JP + US close core 2 483 +/- 0 5.901% +/- 0.732% 0.901% +/- 0.732% 0.00102253 +/- 3.45998e-05 -4.02862 +/- 0.0642791 0.00630662 +/- 0.000886595
ml_tail LGBM POT-GPD UniBM block-maxima shape JP + US close core + JP proxy 2 473.5 +/- 0.707107 6.019% +/- 0.457% 1.019% +/- 0.457% 0.00101481 +/- 2.40341e-05 -4.07568 +/- 0.0784556 0.0063269 +/- 0.000530558
ml_tail LGBM POT-GPD UniBM block-maxima shape JP + US close core + JP proxy + Asia proxy 2 473.5 +/- 0.707107 6.125% +/- 0.606% 1.125% +/- 0.606% 0.00104182 +/- 6.17669e-05 -3.99333 +/- 0.0899169 0.00643634 +/- 0.00132792
ml_tail LGBM POT-GPD plain MLE JP only 2 484 +/- 0 4.649% +/- 1.315% 0.930% +/- 0.497% 0.00144417 +/- 0.000106677 -3.55083 +/- 0.03612 0.0106097 +/- 0.00111591
ml_tail LGBM POT-GPD plain MLE JP + US close core 2 482 +/- 0 5.498% +/- 0.440% 0.498% +/- 0.440% 0.00102429 +/- 3.8851e-05 -4.03154 +/- 0.069105 0.00653613 +/- 0.000836497
ml_tail LGBM POT-GPD plain MLE JP + US close core + JP proxy 2 473.5 +/- 0.707107 5.808% +/- 0.158% 0.808% +/- 0.158% 0.00101213 +/- 3.0664e-05 -4.0908 +/- 0.0864067 0.00623547 +/- 0.000519173
ml_tail LGBM POT-GPD plain MLE JP + US close core + JP proxy + Asia proxy 2 473 +/- 0 6.131% +/- 0.598% 1.131% +/- 0.598% 0.00103841 +/- 7.23005e-05 -4.00776 +/- 0.117572 0.00622715 +/- 0.0016854
ml_tail LGBM direct quantile JP only 2 554 +/- 0 8.664% +/- 1.021% 3.664% +/- 1.021% 0.00133825 +/- 9.25573e-05 -3.50275 +/- 0.0392189 0.00813676 +/- 0.00113574
ml_tail LGBM direct quantile JP + US close core 2 554 +/- 0 11.101% +/- 0.383% 6.101% +/- 0.383% 0.00118022 +/- 4.32589e-05 -3.61364 +/- 0.0922633 0.0064386 +/- 0.000247838
ml_tail LGBM direct quantile JP + US close core + JP proxy 2 527 +/- 0 11.670% +/- 0.134% 6.670% +/- 0.134% 0.00116259 +/- 6.88028e-05 -3.71317 +/- 0.220204 0.0059796 +/- 0.000662772
ml_tail LGBM direct quantile JP + US close core + JP proxy + Asia proxy 2 527 +/- 0 12.334% +/- 0.805% 7.334% +/- 0.805% 0.00117074 +/- 7.31616e-05 -3.68337 +/- 0.17568 0.00571666 +/- 0.000192238
ml_tail LGBM location-scale empirical JP only 2 508 +/- 0 4.823% +/- 0.696% 0.492% +/- 0.251% 0.00141628 +/- 9.98491e-05 -3.56404 +/- 0.0139556 0.00998686 +/- 0.000171526
ml_tail LGBM location-scale empirical JP + US close core 2 505.5 +/- 0.707107 6.133% +/- 0.009% 1.133% +/- 0.009% 0.00101756 +/- 4.15541e-05 -4.01111 +/- 0.0557055 0.00604578 +/- 0.000458165
ml_tail LGBM location-scale empirical JP + US close core + JP proxy 2 492.5 +/- 0.707107 6.295% +/- 0.296% 1.295% +/- 0.296% 0.00100557 +/- 2.51561e-05 -4.08481 +/- 0.0813422 0.0059098 +/- 0.000533713
ml_tail LGBM location-scale empirical JP + US close core + JP proxy + Asia proxy 2 492 +/- 0 6.707% +/- 0.575% 1.707% +/- 0.575% 0.00103557 +/- 7.0027e-05 -3.98452 +/- 0.104636 0.00590198 +/- 0.00152973
ml_tail LGBM median/IQR POT-GPD UniBM block-maxima shape JP only 2 554 +/- 0 3.430% +/- 0.255% 1.570% +/- 0.255% 0.00129247 +/- 0.000128665 -3.72329 +/- 0.0553233 0.0103895 +/- 0.000332801
ml_tail LGBM median/IQR POT-GPD UniBM block-maxima shape JP + US close core 2 553.5 +/- 0.707107 5.420% +/- 0.249% 0.420% +/- 0.249% 0.001005 +/- 6.42702e-05 -4.03849 +/- 0.105695 0.00680349 +/- 2.92911e-05
ml_tail LGBM median/IQR POT-GPD UniBM block-maxima shape JP + US close core + JP proxy 2 527 +/- 0 4.934% +/- 0.268% 0.190% +/- 0.094% 0.000977369 +/- 9.96524e-05 -4.07223 +/- 0.267913 0.00722562 +/- 0.000739001
ml_tail LGBM median/IQR POT-GPD UniBM block-maxima shape JP + US close core + JP proxy + Asia proxy 2 527 +/- 0 5.882% +/- 0.000% 0.882% +/- 0.000% 0.000981983 +/- 9.6245e-05 -4.07821 +/- 0.191496 0.00617459 +/- 0.00100583
ml_tail LGBM median/IQR POT-GPD plain MLE JP only 2 554 +/- 0 3.339% +/- 0.383% 1.661% +/- 0.383% 0.00129732 +/- 0.000121471 -3.71766 +/- 0.0543739 0.0104723 +/- 0.000488415
ml_tail LGBM median/IQR POT-GPD plain MLE JP + US close core 2 553.5 +/- 0.707107 5.330% +/- 0.121% 0.330% +/- 0.121% 0.00100641 +/- 5.8567e-05 -4.05822 +/- 0.117346 0.00677553 +/- 9.19624e-05
ml_tail LGBM median/IQR POT-GPD plain MLE JP + US close core + JP proxy 2 526 +/- 0 4.753% +/- 0.000% 0.247% +/- 0.000% 0.000972485 +/- 9.62715e-05 -4.06369 +/- 0.190447 0.00717965 +/- 0.000903805
ml_tail LGBM median/IQR POT-GPD plain MLE JP + US close core + JP proxy + Asia proxy 2 527 +/- 0 5.977% +/- 0.134% 0.977% +/- 0.134% 0.000981802 +/- 9.14751e-05 -4.07318 +/- 0.21113 0.00595077 +/- 0.000647845
ml_tail LGBM median/MAD POT-GPD UniBM block-maxima shape JP only 2 484 +/- 0 5.579% +/- 0.584% 0.579% +/- 0.584% 0.0014127 +/- 0.000111137 -3.60619 +/- 0.0108432 0.0101188 +/- 0.00167497
ml_tail LGBM median/MAD POT-GPD UniBM block-maxima shape JP + US close core 2 484 +/- 0 6.921% +/- 0.438% 1.921% +/- 0.438% 0.00109506 +/- 3.41604e-05 -4.06831 +/- 0.110891 0.00792169 +/- 0.000308992
ml_tail LGBM median/MAD POT-GPD UniBM block-maxima shape JP + US close core + JP proxy 2 473.5 +/- 0.707107 7.603% +/- 0.310% 2.603% +/- 0.310% 0.00103226 +/- 7.39019e-05 -4.16953 +/- 0.157826 0.00660286 +/- 0.000522774
ml_tail LGBM median/MAD POT-GPD UniBM block-maxima shape JP + US close core + JP proxy + Asia proxy 2 473.5 +/- 0.707107 7.497% +/- 0.161% 2.497% +/- 0.161% 0.00106577 +/- 8.54015e-05 -4.09025 +/- 0.239148 0.00711434 +/- 0.000949755
ml_tail LGBM median/MAD POT-GPD plain MLE JP only 2 484 +/- 0 5.165% +/- 0.000% 0.165% +/- 0.000% 0.00141016 +/- 0.000113401 -3.60412 +/- 0.009619 0.0105104 +/- 0.00101589
ml_tail LGBM median/MAD POT-GPD plain MLE JP + US close core 2 484 +/- 0 6.818% +/- 0.584% 1.818% +/- 0.584% 0.00109354 +/- 3.73982e-05 -4.05872 +/- 0.0856922 0.00790369 +/- 0.000394935
ml_tail LGBM median/MAD POT-GPD plain MLE JP + US close core + JP proxy 2 473.5 +/- 0.707107 7.287% +/- 0.758% 2.287% +/- 0.758% 0.00102391 +/- 8.23481e-05 -4.19814 +/- 0.158674 0.00660111 +/- 0.000331299
ml_tail LGBM median/MAD POT-GPD plain MLE JP + US close core + JP proxy + Asia proxy 2 473.5 +/- 0.707107 7.076% +/- 0.757% 2.076% +/- 0.757% 0.00105943 +/- 9.34222e-05 -4.09633 +/- 0.245078 0.0073057 +/- 0.000570338
  • This table joins benchmark_metrics_per_model.parquet and ml_tail_metrics_per_model.parquet so all benchmark and LGBM tail-model variants are visible in one place.
  • Mean and standard deviation are computed across registered metric rows for the same suite/model/information-set configuration; for most rows this summarizes left- and right-tail metrics.
  • It is a diagnostic scan, not the formal cross-model comparison table. Cross-model claims still require common-sample result-matrix and DM evidence because valid dates and model gates can differ.
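
The mean and standard deviation columns can be reproduced for a two-row configuration from the exception counts in section 4.1. The sketch uses the sample standard deviation (n - 1 denominator); that choice is inferred from the fact that it matches the reported ewma_vol_scaled row, not from a statement on this page.

```python
from statistics import mean, stdev

NOMINAL = 0.05
# ewma_vol_scaled exception counts (left_tail, right_tail) over 722 rows each.
breach = [38 / 722, 33 / 722]
abs_cov_err = [abs(b - NOMINAL) for b in breach]

breach_mean, breach_sd = mean(breach), stdev(breach)   # sample sd, n - 1
err_mean, err_sd = mean(abs_cov_err), stdev(abs_cov_err)
print(f"{breach_mean:.3%} +/- {breach_sd:.3%}; {err_mean:.3%} +/- {err_sd:.3%}")
```

With only two metric rows per configuration the standard deviation is just the left/right gap divided by the square root of two, so it measures side asymmetry rather than sampling uncertainty.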

4.6 Restricted common-sample result matrix and DM evidence

Family Axis Loss Rows Common N Date range Joint exceptions
nested information sets information_set_increment var_coverage 64 471 to 527 2023-01-26 to 2026-05-21 34 to 80
nested information sets information_set_increment var_es_fz_loss 64 471 to 527 2023-01-26 to 2026-05-21 34 to 80
nested information sets information_set_increment var_quantile_loss 64 471 to 527 2023-01-26 to 2026-05-21 34 to 80
tail_model_family model_family var_coverage 64 472 to 484 2023-06-16 to 2026-05-21 47 to 70
tail_model_family model_family var_es_fz_loss 64 472 to 484 2023-06-16 to 2026-05-21 47 to 70
tail_model_family model_family var_quantile_loss 64 472 to 484 2023-06-16 to 2026-05-21 47 to 70
  • The result matrix is the right place to compare direct quantile, location-scale empirical, plain POT-GPD, and the robust (median/MAD and median/IQR) POT-GPD routes on their restricted common dates.
  • It separates VaR-only losses from VaR-ES joint scoring, so VaR-only claims are not confused with ES claims.
  • Restricted direct-quantile performance is only a comparison anchor for the tail-model family; it does not replace the primary direct-quantile evidence.
  • DM records are emitted only where registered row-count and exception-count gates pass; otherwise the result matrix remains descriptive.
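
A block-bootstrap comparison of two loss series on matched dates can be sketched as below. This is a simplified illustration, not the run's procedure: the circular block scheme, the block length, the number of replications, the seed, and the function name `block_bootstrap_dm` are all assumptions, and the toy loss series are synthetic.

```python
import random

def block_bootstrap_dm(loss_a, loss_b, block_len=5, n_boot=499, seed=42):
    """Two-sided circular block-bootstrap p-value for the mean loss differential."""
    d = [a - b for a, b in zip(loss_a, loss_b)]
    n = len(d)
    d_bar = sum(d) / n
    centered = [x - d_bar for x in d]      # impose the null of equal mean loss
    rng = random.Random(seed)
    n_blocks = -(-n // block_len)          # ceiling division
    hits = 0
    for _ in range(n_boot):
        sample = []
        for _ in range(n_blocks):
            start = rng.randrange(n)
            # Wrap around the end of the series (circular blocks).
            sample.extend(centered[(start + j) % n] for j in range(block_len))
        boot_mean = sum(sample[:n]) / n
        if abs(boot_mean) >= abs(d_bar):
            hits += 1
    return d_bar, (hits + 1) / (n_boot + 1)

# Synthetic losses: the anchor is uniformly worse by a fixed margin.
model = [0.0010 + 0.0001 * (i % 5) for i in range(100)]
anchor = [x + 0.0005 for x in model]
d_bar, p_value = block_bootstrap_dm(model, anchor)
print(d_bar, p_value)
```

A negative mean differential with a small p-value says the first series has lower average loss over the whole matched sample; it says nothing about specific regimes or dates.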

4.7 Stress and diagnostic windows

Suite Rows Window labels
benchmark 146 loss_top_decile
ml_tail 212 loss_top_decile, vix_top_decile
  • Stress windows identify high-loss or high-volatility subsamples for two-sided risk diagnostics.
  • These rows use reproducible full-sample classifiers in this first pass, so they should be described as diagnostics rather than a live stress classifier.
  • They are useful for finding whether model behavior changes in difficult regimes before writing manuscript discussion.
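
A full-sample top-decile flag of the kind described above can be sketched in a few lines. The cutoff rule (strictly above the 90th percentile), the quantile method, and the name `top_decile_flags` are assumptions; the toy values are not run data.

```python
from statistics import quantiles

def top_decile_flags(values):
    """Flag observations strictly above the full-sample 90th percentile."""
    cutoff = quantiles(values, n=10)[-1]   # last decile cut point
    return cutoff, [v > cutoff for v in values]

# Toy values for illustration only.
losses = list(range(1, 101))
cutoff, flags = top_decile_flags(losses)
print(cutoff, sum(flags))
```

Because the cutoff is computed from the whole sample, a row's flag depends on observations that come after it, which is why these windows are diagnostics rather than a live classifier.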

4.8 Integrated interpretation and claim boundaries

Data and timing audit

  • The gold timing map covers 2016-07-19 to 2026-05-22 and the combined clean start is 2018-06-20.
  • No forecast-sample rows before 2018-06-20 enter the modeling evidence.
  • The leakage check reports status pass_with_warnings with zero leakage failures and 611790 warnings.
  • FRED vintage safety is recorded as False; FRED values use conservative release timing but remain current historical observations rather than ALFRED real-time vintages.

Baseline benchmarks and advanced econometric benchmarks

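  • benchmark_metrics.parquet reports 14 common-sample rows across 7 benchmark model families and 2 tail sides (the six baseline families plus gas_t_location_scale, which the all-model scan lists under benchmark_advanced), while benchmark forecasts contain 15173 model-date rows.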
  • Baseline benchmark models are external target-history and econometric references; this section does not rank them.
  • Advanced econometric benchmark rows are implemented for 6 model families and contribute 6509 nonblocking forecast rows; these rows are claim-gated diagnostics unless a manuscript table explicitly promotes them through the same sample and inference review.
  • Benchmark breach rates in the 4.1 table have a median of 5.817% (0.0581717), within 2.5 percentage points of the 5% nominal level, indicating reasonable coverage calibration relative to the ML-tail models whose breach rates are reported in the nested-information-set section.
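
The reported median breach rate can be reproduced from the exception counts in the section 4.1 table, where every row has 722 forecast dates. The sketch below copies those counts in table order and applies the 2.5 percentage-point tolerance used by the coverage review.

```python
from statistics import median

ROWS = 722
# Exception counts from the 4.1 benchmark table (left_tail, right_tail per model).
exceptions = [38, 33, 44, 30, 46, 36, 44, 42, 51, 31, 40, 50, 42, 52]

median_breach = median(e / ROWS for e in exceptions)
within_tolerance = abs(median_breach - 0.05) <= 0.025
print(f"{median_breach:.7f} {within_tolerance}")
```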

Primary ML specifications across nested information sets

  • ml_tail_metrics.parquet defines the primary ML specification comparison across nested information sets for this run.
  • The primary ML artifact contains 4 information sets, 1 tail level, and 2 tail sides; the only retained primary ML model is LGBM direct quantile.
  • The implemented ML-tail registry is LGBM direct quantile, LGBM location-scale empirical, LGBM POT-GPD plain MLE, LGBM POT-GPD UniBM block-maxima shape, LGBM median/MAD POT-GPD plain MLE, LGBM median/MAD POT-GPD UniBM block-maxima shape, LGBM median/IQR POT-GPD plain MLE, LGBM median/IQR POT-GPD UniBM block-maxima shape, but the primary nested-information-set comparison should be read only from ml_tail_metrics.parquet.
  • The nested information sets report downside-risk and upside-risk surfaces separately. The registered artifacts show different left/right patterns, and the generator does not assume that the two sides share the same economic mechanism.
  • Coverage warning: all 8 primary ML rows exhibit VaR breach rates (8.159% to 12.903%) that exceed the 5% nominal level by more than 2.5 percentage points. Quantile-loss and FZ-loss differences across the nested information sets must be interpreted in this context; lower loss scores may partly reflect less conservative VaR estimates rather than better conditional tail calibration.
  • For left_tail / LGBM direct quantile / tail=0.950, the largest quantile-loss change occurs at the first information-set augmentation (adding U.S. close core); subsequent additions of Japan proxy and Asia proxy ETFs contribute diminishing incremental loss changes. This saturation pattern is descriptive and does not automatically reduce the value of the broader information set.
  • The nested information sets are used to assess candidate incremental U.S.-close information under strict common-sample rules; they do not by themselves establish forecast improvement.
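
The saturation pattern described for the left tail can be checked directly from the mean quantile losses in the section 4.2 table. The sketch computes the loss change at each information-set augmentation; the short step labels are shorthand introduced here.

```python
# Left-tail LGBM direct quantile mean quantile losses from the 4.2 table.
steps = [
    ("JP only", 0.00141174),
    ("+ US close core", 0.00115754),
    ("+ JP proxy", 0.00111393),
    ("+ Asia proxy", 0.00111901),
]

# Change in mean quantile loss when each block is added to the previous set.
increments = [
    (label, loss - prev_loss)
    for (_, prev_loss), (label, loss) in zip(steps, steps[1:])
]
largest = max(increments, key=lambda item: abs(item[1]))
for label, delta in increments:
    print(f"{label}: {delta:+.8f}")
```

These are descriptive differences on 527 common rows; given the coverage warning above, they do not by themselves indicate better tail calibration.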

Restricted model-family comparison

  • ml_tail_result_matrix.parquet contains restricted common-sample comparisons for 8 LightGBM tail-model families.
  • The restricted common-N range is 471 to 527 and the joint-exception range is 34 to 80.
  • Recorded claim scopes are restricted_model_comparison_not_primary; these rows are restricted evidence and cannot replace the primary ML nested-information-set comparison.
  • The tail-model family comparison is severely sample-limited: the largest restricted common-N is 484 rows. No model-family ranking claim is supportable from this restricted sample; extended OOS coverage is needed before tail-model family ranking becomes meaningful.
  • Result-matrix inference is recorded separately from the primary suite-level DM: restricted DM records include 208 gate-pass rows and 104 unavailable rows. These entries are restricted common-sample diagnostics, not primary model-family rankings.
  • The result matrix is a matched-date diagnostic layer. It should not be worded as one family being better than another.
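The matched-date restriction can be pictured as an inner join across model forecasts. The sketch below is illustrative only: the column names (`date`, `model`, `var`), the loss-signed VaR convention, and the reading of a "joint exception" as a date on which at least one model is breached are all assumptions, not the pipeline's registered definitions.

```python
import pandas as pd


def restricted_common_sample(forecasts: pd.DataFrame, models: list[str]) -> pd.DataFrame:
    """Keep only dates on which every listed model has a non-missing VaR forecast."""
    subset = forecasts[forecasts["model"].isin(models)]
    wide = subset.pivot(index="date", columns="model", values="var")
    return wide.dropna(how="any")


def joint_exceptions(common: pd.DataFrame, realized_loss: pd.Series) -> int:
    """Count common-sample dates on which at least one model's VaR is breached."""
    y = realized_loss.reindex(common.index)
    return int(common.lt(y, axis=0).any(axis=1).sum())


# Toy data: model "a" covers d1-d3, model "b" covers d2-d4.
forecasts = pd.DataFrame({
    "date": ["d1", "d2", "d3", "d2", "d3", "d4"],
    "model": ["a", "a", "a", "b", "b", "b"],
    "var": [2.0, 2.0, 2.0, 4.0, 2.0, 2.0],
})
realized_loss = pd.Series({"d1": 0.5, "d2": 3.0, "d3": 1.0, "d4": 0.2})

common = restricted_common_sample(forecasts, ["a", "b"])
print(len(common), joint_exceptions(common, realized_loss))
```

Every model added to the comparison can only shrink the common sample, which is one reason the restricted common-N stays small.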

Coverage and inference gates

  • Coverage review flags 8/8 primary ML rows with breach rates more than 2.5 percentage points from nominal coverage; Kupiec p-values fall below 0.05 in 8/8 reported rows and Christoffersen p-values fall below 0.05 in 0/8 reported rows.
  • Model-eviction artifacts record 8 retained rows and 56 non-retained rows under the primary ML sample policy.
  • Block-bootstrap DM artifacts are unconditional forecast-comparison diagnostics; any p-value describes average relative performance over the full evaluation sample and is not evidence about any specific condition or regime.
  • Loss differentials alone do not constitute an improvement claim; coverage, exception counts, sample gates, and inference status must be reviewed together.
  • Result-matrix tail-event power flags and suite-level inference gates report 0 restricted rows with insufficient tail-event power and 0/18 unavailable DM inference rows.
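For readers unfamiliar with the two coverage tests named above, the sketch below gives textbook forms of the Kupiec proportion-of-failures test and the Christoffersen independence test on a 0/1 exception series. It is a sketch under stated assumptions, not the pipeline's implementation: the function names are hypothetical, and each statistic is referred to a chi-square distribution with one degree of freedom.

```python
import math


def _xlogy(x: float, y: float) -> float:
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def _chi2_1_sf(stat: float) -> float:
    """Survival function of a chi-square(1) variable, via the normal tail."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def kupiec_pof(hits: list[int], p: float) -> tuple[float, float]:
    """Likelihood-ratio test that the exception rate equals the nominal rate p."""
    n, x = len(hits), sum(hits)
    pi_hat = x / n
    ll_null = _xlogy(n - x, 1 - p) + _xlogy(x, p)
    ll_alt = _xlogy(n - x, 1 - pi_hat) + _xlogy(x, pi_hat)
    lr = -2.0 * (ll_null - ll_alt)
    return lr, _chi2_1_sf(lr)


def christoffersen_independence(hits: list[int]) -> tuple[float, float]:
    """Likelihood-ratio test that exceptions do not cluster (first-order Markov alternative)."""
    n00 = n01 = n10 = n11 = 0
    for prev, cur in zip(hits[:-1], hits[1:]):
        if prev == 0 and cur == 0:
            n00 += 1
        elif prev == 0 and cur == 1:
            n01 += 1
        elif prev == 1 and cur == 0:
            n10 += 1
        else:
            n11 += 1
    pi01 = n01 / (n00 + n01) if (n00 + n01) else 0.0
    pi11 = n11 / (n10 + n11) if (n10 + n11) else 0.0
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)
    ll_null = _xlogy(n00 + n10, 1 - pi) + _xlogy(n01 + n11, pi)
    ll_alt = (_xlogy(n00, 1 - pi01) + _xlogy(n01, pi01)
              + _xlogy(n10, 1 - pi11) + _xlogy(n11, pi11))
    lr = -2.0 * (ll_null - ll_alt)
    return lr, _chi2_1_sf(lr)


# Toy series: 10% exception rate with little clustering, tested against a 5% nominal rate.
block = [0] * 100
for i in (0, 1, 10, 20, 30, 40, 50, 60, 70, 80):
    block[i] = 1
hits = block * 5

print(kupiec_pof(hits, 0.05))
print(christoffersen_independence(hits))
```

The toy series mirrors the pattern reported above: the exception rate is too high for the nominal level, so the Kupiec test rejects, while the exceptions show no clustering, so the independence test does not.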

Supporting diagnostics

  • Supporting LaTeX diagnostic table files are present for 2/2 registered diagnostic families.
  • ES severity diagnostics contain 86 finite rows with mean exceedance severity ranging from 0.00482029 to 0.0121402; this is conditional-on-exception evidence.
  • Stress-window diagnostics contain 358 rows, and 24-check LGBM Murphy diagnostics contain 3200 rows.
  • Feature-unavailability diagnostics contain 384 rows.
  • Figure manifest references:
  • Figure: market_timing_design (Source: manifest.json, config/research_config.json, panel/calendar_map.parquet; Claim scope: design_forecast_origin_not_causal_price_discovery; File: latex/figures/market_timing_design.png).
  • Figure: coverage_breach_rates_left_tail (Source: metrics/benchmark_metrics.parquet, metrics/benchmark_metrics_per_model.parquet, metrics/ml_tail_metrics.parquet, metrics/ml_tail_metrics_per_model.parquet; Claim scope: coverage_diagnostic_not_primary_claim; File: latex/figures/coverage_breach_rates_left_tail.png).
  • Figure: coverage_breach_rates_right_tail (Source: metrics/benchmark_metrics.parquet, metrics/benchmark_metrics_per_model.parquet, metrics/ml_tail_metrics.parquet, metrics/ml_tail_metrics_per_model.parquet; Claim scope: coverage_diagnostic_not_primary_claim; File: latex/figures/coverage_breach_rates_right_tail.png).
  • Figure: cumulative_lgbm_a_anchor_fz_gain (Source: metrics/benchmark_loss_matrix.parquet, metrics/ml_tail_loss_matrix.parquet, forecasts/benchmark_forecasts.parquet, forecasts/ml_tail_forecasts.parquet; Claim scope: headline_lgbm_a_anchor_gjr_evt_and_information_increment_fz_gain; File: latex/figures/cumulative_lgbm_a_anchor_fz_gain.png).
  • Figure: selected_model_performance_left_tail (Source: metrics/benchmark_metrics_per_model.parquet, metrics/ml_tail_metrics_per_model.parquet; Claim scope: selected_benchmark_vs_lgbm_main_figure_not_full_result_set; File: latex/figures/selected_model_performance_left_tail.png).
  • Figure: selected_model_performance_right_tail (Source: metrics/benchmark_metrics_per_model.parquet, metrics/ml_tail_metrics_per_model.parquet; Claim scope: selected_benchmark_vs_lgbm_main_figure_not_full_result_set; File: latex/figures/selected_model_performance_right_tail.png).
  • Figure: full_sample_var_overlay_left_tail (Source: forecasts/benchmark_forecasts.parquet, forecasts/ml_tail_forecasts.parquet; Claim scope: full_sample_var_overlay_fixed_selection_visual_diagnostic; File: latex/figures/full_sample_var_overlay_left_tail.png).
  • Figure: full_sample_var_overlay_right_tail (Source: forecasts/benchmark_forecasts.parquet, forecasts/ml_tail_forecasts.parquet; Claim scope: full_sample_var_overlay_fixed_selection_visual_diagnostic; File: latex/figures/full_sample_var_overlay_right_tail.png).
  • Figure: benchmark_murphy_left_tail (Source: metrics/benchmark_murphy.parquet; Claim scope: murphy_diagnostic_benchmark_baseline_common_grid; File: latex/figures/benchmark_murphy_left_tail.png).
  • Figure: benchmark_murphy_right_tail (Source: metrics/benchmark_murphy.parquet; Claim scope: murphy_diagnostic_benchmark_baseline_common_grid; File: latex/figures/benchmark_murphy_right_tail.png).
  • Figure: lgbm_24check_murphy_left_tail (Source: metrics/lgbm_24check_murphy.parquet, metrics/ml_tail_metrics_per_model.parquet, forecasts/ml_tail_forecasts.parquet; Claim scope: murphy_diagnostic_lgbm_24check_robust_ladder; File: latex/figures/lgbm_24check_murphy_left_tail.png).
  • Figure: lgbm_24check_murphy_right_tail (Source: metrics/lgbm_24check_murphy.parquet, metrics/ml_tail_metrics_per_model.parquet, forecasts/ml_tail_forecasts.parquet; Claim scope: murphy_diagnostic_lgbm_24check_robust_ladder; File: latex/figures/lgbm_24check_murphy_right_tail.png).
  • Figure: es_severity_left_tail (Source: metrics/benchmark_metrics.parquet, metrics/ml_tail_metrics.parquet, metrics/ml_tail_metrics_per_model.parquet; Claim scope: es_severity_diagnostic_not_model_selection_claim; File: latex/figures/es_severity_left_tail.png).
  • Figure: es_severity_right_tail (Source: metrics/benchmark_metrics.parquet, metrics/ml_tail_metrics.parquet, metrics/ml_tail_metrics_per_model.parquet; Claim scope: es_severity_diagnostic_not_model_selection_claim; File: latex/figures/es_severity_right_tail.png).
  • Figure: var_es_stress_overlay_2024_stress_episode (Source: forecasts/benchmark_forecasts.parquet, forecasts/ml_tail_forecasts.parquet; Claim scope: appendix_stress_overlay_illustration_not_validation; File: latex/figures/var_es_stress_overlay_2024_stress_episode.png).
  • Figure: var_es_stress_overlay_2025_stress_episode (Source: forecasts/benchmark_forecasts.parquet, forecasts/ml_tail_forecasts.parquet; Claim scope: appendix_stress_overlay_illustration_not_validation; File: latex/figures/var_es_stress_overlay_2025_stress_episode.png).
  • Figure: dm_heatmap_left_tail (Source: forecasts/benchmark_forecasts.parquet, forecasts/ml_tail_forecasts.parquet; Claim scope: post_24check_cross_suite_fz_dm_diagnostic; File: latex/figures/dm_heatmap_left_tail.png).
  • Figure: dm_heatmap_right_tail (Source: forecasts/benchmark_forecasts.parquet, forecasts/ml_tail_forecasts.parquet; Claim scope: post_24check_cross_suite_fz_dm_diagnostic; File: latex/figures/dm_heatmap_right_tail.png).

Not yet claimed

  • No hedge PnL, transaction-cost, or trading-alpha analysis is performed.
  • Left-tail and right-tail outputs are both economic tail-risk surfaces for futures positions; neither side should be promoted beyond the sample, coverage, and inference gates without author review.
  • The current evidence does not create an automatic model-selection statement; any manuscript claim still requires author review of sample gates, coverage, loss metrics, and inference diagnostics.

5. Figures, Tables, And Source Artifacts

This section merges the former figure/table placement page into the results snapshot. All generated figures and tables are listed with their intended interpretation. The words "supporting" and "diagnostic" describe claim scope; they do not mean the artifact is missing from this page.

5.1 Configuration robustness evidence

| Field | Value |
| --- | --- |
| Source primary run | tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4 |
| Scope | paper |
| Primary-claim allowed | False |
| Selected LGBM models | LGBM POT-GPD plain MLE, LGBM POT-GPD UniBM block-maxima shape |
| Selected benchmark models | gjr_garch_evt |
| Selected information sets | JP + US close core + JP proxy |
| Job counts | evt_boundary_rows=6, evt_threshold=12, lgbm_capacity=8 |
| Forecast rows | 13096 |
| Metric rows | 26 |
| Status | ok |

| Sensitivity family | Rows / classifications |
| --- | --- |
| LGBM capacity | 8 rows (robust=8) |
| POT threshold | 18 rows (boundary_diagnostic=6, robust=12) |

  • The primary design compares pre-specified point-in-time forecast specifications. Configuration sensitivity is post-24-check robustness evidence and is not used for model selection or the cross-suite FZ DM heatmap.
  • The run is fixed to the post-24-check paper set. LGBM rows perturb capacity only for the pass-all C-information LGBM+EVT families, and POT threshold rows perturb those rows plus GJR-GARCH-EVT.
  • Robustness classes describe conclusion stability versus the registered primary specification. They do not feed DM gates, promoted-model logic, or selected-model figures.

5.2 Generated table manifest

| Table | Source artifacts | Claim scope | Tail side | File |
| --- | --- | --- | --- | --- |
| tailrisk_predictor_block_coverage | panel/feature_coverage.parquet | main_text_predictor_block_coverage_information_transparency | None | latex/tables/tailrisk_predictor_block_coverage_table.tex |
| benchmark_metrics | metrics/benchmark_metrics.parquet | benchmark_common_sample_metric_table | None | latex/tables/benchmark_metrics_table.tex |
| benchmark_left_tail_risk | metrics/benchmark_metrics.parquet | left_tail_benchmark_risk_table | left_tail | latex/tables/benchmark_left_tail_risk_table.tex |
| benchmark_right_tail_risk | metrics/benchmark_metrics.parquet | right_tail_benchmark_risk_table | right_tail | latex/tables/benchmark_right_tail_risk_table.tex |
| ml_tail_metrics | metrics/ml_tail_metrics.parquet | ml_tail_nested_information_set_table | None | latex/tables/ml_tail_metrics_table.tex |
| ml_tail_left_tail_risk | metrics/ml_tail_metrics.parquet | left_tail_ml_tail_primary_risk_table | left_tail | latex/tables/ml_tail_left_tail_risk_table.tex |
| ml_tail_right_tail_risk | metrics/ml_tail_metrics.parquet | right_tail_ml_tail_primary_risk_table | right_tail | latex/tables/ml_tail_right_tail_risk_table.tex |
| tailrisk_model_inventory | config/research_config.json, metrics/benchmark_metrics_per_model.parquet, metrics/ml_tail_metrics_per_model.parquet | main_text_model_inventory_forecast_construction | None | latex/tables/tailrisk_model_inventory_table.tex |
| tailrisk_selected_model_performance | metrics/benchmark_metrics_per_model.parquet, metrics/ml_tail_metrics_per_model.parquet | selected_benchmark_vs_lgbm_main_figure_rows | None | latex/tables/tailrisk_selected_model_performance_table.tex |
| appendix_benchmark_all_models | metrics/benchmark_metrics_per_model.parquet | appendix_full_benchmark_results | None | latex/tables/appendix_benchmark_all_models_table.tex |
| ml_tail_promoted_tail_models | metrics/ml_tail_metrics_per_model.parquet, metrics/ml_tail_result_matrix_dm.parquet | side_specific_ml_tail_promotion_gate | None | latex/tables/ml_tail_promoted_tail_models_table.tex |
| appendix_lgbm_all_models | metrics/ml_tail_metrics_per_model.parquet | appendix_full_lgbm_results | None | latex/tables/appendix_lgbm_all_models_table.tex |
| tailrisk_es_severity | metrics/benchmark_metrics.parquet, metrics/ml_tail_metrics.parquet, metrics/ml_tail_metrics_per_model.parquet | es_severity_diagnostic_table | None | latex/tables/tailrisk_es_severity_table.tex |
| tailrisk_claim_scope | manifest.json, config/research_config.json | claim_boundary_reference_table | None | latex/tables/tailrisk_claim_scope_table.tex |
| ml_tail_result_matrix | metrics/ml_tail_result_matrix.parquet | restricted_model_comparison_table | None | latex/tables/ml_tail_result_matrix_table.tex |
| ml_tail_result_matrix_summary | metrics/ml_tail_result_matrix.parquet, metrics/ml_tail_result_matrix_dm.parquet | restricted_result_matrix_summary_table | None | latex/tables/ml_tail_result_matrix_summary_table.tex |
| tailrisk_dm_summary | metrics/ml_tail_result_matrix_dm.parquet | main_text_compact_dm_summary | None | latex/tables/tailrisk_dm_summary_table.tex |
| appendix_lgbm_configuration_sensitivity | sensitivity/metrics/lgbm_configuration_sensitivity_metrics.parquet | appendix_configuration_robustness_lgbm | None | sensitivity/latex/tables/appendix_lgbm_configuration_sensitivity_table.tex |
| appendix_evt_threshold_sensitivity | sensitivity/metrics/evt_threshold_sensitivity_metrics.parquet | appendix_configuration_robustness_evt_threshold | None | sensitivity/latex/tables/appendix_evt_threshold_sensitivity_table.tex |

  • The table manifest records the generated LaTeX table files, their source artifacts, and their claim scopes.
  • Tables are paper-facing exports; the Markdown tables above are snapshot summaries for browser review.

5.3 Table interpretation guide

| Results/Discussion role | Artifact | How to read it |
| --- | --- | --- |
| Predictor block and coverage | tailrisk_predictor_block_coverage_table.tex | Data/Methods table showing source families, feature counts, examples, missingness, and model role; coverage is not timestamp admissibility. |
| Model inventory | tailrisk_model_inventory_table.tex | Methods table explaining model families, information sets, VaR construction, ES construction, and role; performance belongs elsewhere. |
| Benchmark floor summary | benchmark_metrics_table.tex | Results table for target-history and econometric benchmark calibration and loss evidence. |
| Benchmark tail-side details | benchmark_left_tail_risk_table.tex, benchmark_right_tail_risk_table.tex | Tail-specific benchmark rows for left and right risk surfaces. |
| ML information ladder | ml_tail_metrics_table.tex | Core nested-information-set table for direct LightGBM; read loss changes with coverage gates. |
| ML tail-side details | ml_tail_left_tail_risk_table.tex, ml_tail_right_tail_risk_table.tex | Tail-specific direct LightGBM information-set rows. |
| Selected model performance | tailrisk_selected_model_performance_table.tex | Deterministic selected-row summary after sample-size, coverage, FZ-loss, and quantile-loss gates. |
| Promoted tail rows | ml_tail_promoted_tail_models_table.tex | Locked side-specific promotion-gate rows; not a universal model-family ranking. |
| Full benchmark scan | appendix_benchmark_all_models_table.tex | Complete benchmark inventory supporting benchmark breadth. |
| Full LGBM scan | appendix_lgbm_all_models_table.tex | Complete per-model LightGBM scan; do not use as a raw leaderboard. |
| Restricted result matrix | ml_tail_result_matrix_table.tex, ml_tail_result_matrix_summary_table.tex | Restricted common-sample model-family comparison and summary. |
| Compact DM summary | tailrisk_dm_summary_table.tex | Headline paired inference table; negative loss differences favor the candidate. |
| ES severity | tailrisk_es_severity_table.tex | Conditional-on-exception severity diagnostic; not standalone model selection. |
| Claim boundary | tailrisk_claim_scope_table.tex | Reference table separating headline, restricted, diagnostic, and robustness claims. |

Figure 1. Market Timing Design

  • Key readings: the diagram defines JST event timing, the matched U.S.-close cutoff, and the OSE day-open target.
  • OSE schedule note: pre-2024-11-05 hours use day close 15:15 JST and night session 16:30-05:30 JST; from 2024-11-05, JPX uses day close 15:45 JST and night session 17:00-06:00 JST, with day open still 08:45 JST.
  • The OSE night close is timing context; the forecast origin is the matched U.S. cash close plus the data-availability lag.
  • It is a session-alignment schematic, not a structural market-transmission diagram.
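The schedule note can be expressed as a date-keyed lookup. This is a minimal sketch that encodes only the hours stated in the note; the function and key names are illustrative, and holidays and other calendar exceptions are deliberately ignored.

```python
from datetime import date, time

# Date from which the later OSE hours in the schedule note apply.
SCHEDULE_CHANGE = date(2024, 11, 5)


def ose_session_hours(trade_date: date) -> dict[str, time]:
    """Return OSE session times in JST for a trade date, per the schedule note."""
    hours = {"day_open": time(8, 45)}  # unchanged across both regimes
    if trade_date < SCHEDULE_CHANGE:
        hours.update(day_close=time(15, 15), night_open=time(16, 30), night_close=time(5, 30))
    else:
        hours.update(day_close=time(15, 45), night_open=time(17, 0), night_close=time(6, 0))
    return hours  # night_close falls on the next calendar day


print(ose_session_hours(date(2024, 11, 4)))
print(ose_session_hours(date(2024, 11, 5)))
```

A lookup like this keeps the session alignment explicit when the sample spans both regimes; in this design the forecast origin itself is tied to the U.S. cash close, not to these session boundaries.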

market_timing_design

Figure: market_timing_design. Source: manifest.json, config/research_config.json, panel/calendar_map.parquet. Claim scope: design_forecast_origin_not_causal_price_discovery. Tail side: design. Run file: latex/figures/market_timing_design.png.

Figure 2. Opening-Gap Tail Motivation

  • Key readings: the composite figure combines density, log survival, mean-excess, and Hill tail-index diagnostics for the raw opening-gap target.
  • It motivates tail-risk modeling and does not validate any forecast model.
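Two of the diagnostics named above have compact textbook forms. The sketch below shows a Hill tail-index estimate and an empirical mean excess for a positive-valued tail sample; for the left tail one would apply them to the negated target. The function names, the choice of k, and the simulated Pareto sample are illustrative assumptions and are unrelated to the registered figure code.

```python
import numpy as np


def hill_tail_index(x, k: int) -> float:
    """Hill estimate of the tail index from the k largest observations."""
    xs = np.sort(np.asarray(x, dtype=float))
    if not 0 < k < xs.size:
        raise ValueError("k must satisfy 0 < k < len(x)")
    threshold = xs[-k - 1]  # the (k+1)-th largest value
    if threshold <= 0:
        raise ValueError("the Hill estimator needs a positive threshold")
    gamma = np.mean(np.log(xs[-k:]) - np.log(threshold))
    return float(1.0 / gamma)


def mean_excess(x, u: float) -> float:
    """Average amount by which observations above u exceed u."""
    arr = np.asarray(x, dtype=float)
    excess = arr[arr > u] - u
    return float(excess.mean()) if excess.size else float("nan")


# Simulated classical Pareto sample with tail index 3 (not project data).
rng = np.random.default_rng(0)
sample = rng.pareto(3.0, size=20_000) + 1.0
print(hill_tail_index(sample, k=2_000), mean_excess(sample, u=2.0))
```

A mean-excess function that rises roughly linearly with the threshold, together with a stable Hill estimate across k, is the usual visual signature of a heavy tail.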

target_tail_motivation

Figure: target_tail_motivation. Source: panel/modeling_panel.parquet. Claim scope: target_distribution_motivation_not_forecast_validation. Tail side: left_right_target_distribution. Run file: latex/figures/target_tail_motivation.png.

Figure 3. Cumulative FZ-Gain Diagnostics

  • Key readings: upward movement means the candidate has lower cumulative FZ loss under the fixed anchor-loss-minus-candidate-loss convention.
  • Each panel uses the corresponding A-only LGBM+EVT forecast as anchor.
  • The GJR-GARCH-EVT line is an own-history benchmark reference; B/C/D lines show same-family information increments.
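The sign convention can be made concrete with a short sketch. The cumulative gain follows the anchor-loss-minus-candidate-loss convention stated above. The per-observation loss shown is the FZ0 form for return-signed left-tail forecasts with negative VaR and ES; that choice is an assumption here, since this page does not state which FZ variant or sign convention is registered.

```python
import math

import numpy as np


def fz0_loss(y: float, var: float, es: float, alpha: float) -> float:
    """FZ0 joint VaR/ES loss for one observation (assumes es < 0, left tail of returns)."""
    hit = 1.0 if y <= var else 0.0
    return -hit * (var - y) / (alpha * es) + var / es + math.log(-es) - 1.0


def cumulative_gain(anchor_loss, candidate_loss) -> np.ndarray:
    """Running sum of anchor loss minus candidate loss; rising values favor the candidate."""
    a = np.asarray(anchor_loss, dtype=float)
    c = np.asarray(candidate_loss, dtype=float)
    if a.shape != c.shape:
        raise ValueError("loss series must be aligned on the same dates")
    return np.cumsum(a - c)


# Toy aligned loss series (not project data).
print(cumulative_gain([1.0, 1.0, 1.0], [0.5, 1.5, 0.5]))
```

Because the gain is cumulative, a single large exception can move the curve sharply, so the endpoint should be read alongside the path.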

cumulative_lgbm_a_anchor_fz_gain

Figure: cumulative_lgbm_a_anchor_fz_gain. Source: metrics/benchmark_loss_matrix.parquet, metrics/ml_tail_loss_matrix.parquet, forecasts/benchmark_forecasts.parquet, forecasts/ml_tail_forecasts.parquet. Claim scope: headline_lgbm_a_anchor_gjr_evt_and_information_increment_fz_gain. Tail side: left_right. Run file: latex/figures/cumulative_lgbm_a_anchor_fz_gain.png.

Figure 4. Full Coverage Breach-Rate Diagnostics

  • Key readings: bars report realized VaR exception rates against the nominal line.
  • Read this first: exception-rate deviations set the boundary for any loss-based interpretation.

coverage_breach_rates_left_tail

Figure: coverage_breach_rates_left_tail. Source: metrics/benchmark_metrics.parquet, metrics/benchmark_metrics_per_model.parquet, metrics/ml_tail_metrics.parquet, metrics/ml_tail_metrics_per_model.parquet. Claim scope: coverage_diagnostic_not_primary_claim. Tail side: left_tail. Run file: latex/figures/coverage_breach_rates_left_tail.png.

coverage_breach_rates_right_tail

Figure: coverage_breach_rates_right_tail. Source: metrics/benchmark_metrics.parquet, metrics/benchmark_metrics_per_model.parquet, metrics/ml_tail_metrics.parquet, metrics/ml_tail_metrics_per_model.parquet. Claim scope: coverage_diagnostic_not_primary_claim. Tail side: right_tail. Run file: latex/figures/coverage_breach_rates_right_tail.png.

Figure 5. Selected Benchmark-vs-LGBM Performance

  • Key readings: compact main-figure rows split models into two broad groups, Benchmark and LGBM.
  • Within each tail and group, rows are selected by sufficient sample size, a VaR breach rate near the 5% nominal level, then lower FZ loss and quantile loss.
  • Full benchmark and LGBM per-model results are exported in full-result tables, so this figure is a readable summary rather than the full result set.
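The selection order described above can be sketched as a filter followed by a lexicographic sort. The column names and the numeric thresholds (`min_n`, `max_gap`) below are hypothetical placeholders; the registered gates live in the run configuration and are not reproduced on this page.

```python
import pandas as pd


def select_rows(metrics: pd.DataFrame, min_n: int = 250,
                nominal: float = 0.05, max_gap: float = 0.025) -> pd.DataFrame:
    """Pick one row per (tail_side, group): gate on sample size and coverage, then rank by loss."""
    eligible = metrics[
        (metrics["n_obs"] >= min_n)
        & ((metrics["breach_rate"] - nominal).abs() <= max_gap)
    ]
    ranked = eligible.sort_values(["fz_loss", "quantile_loss"], kind="mergesort")
    return ranked.groupby(["tail_side", "group"], sort=False).head(1)


# Toy per-model metrics (not project data).
metrics = pd.DataFrame({
    "tail_side": ["left_tail"] * 5,
    "group": ["Benchmark", "Benchmark", "LGBM", "LGBM", "LGBM"],
    "model": ["m1", "m2", "m3", "m4", "m5"],
    "n_obs": [500, 500, 100, 500, 500],
    "breach_rate": [0.055, 0.052, 0.050, 0.090, 0.060],
    "fz_loss": [1.2, 1.1, 0.5, 0.9, 1.0],
    "quantile_loss": [0.10, 0.11, 0.05, 0.08, 0.09],
})
selected = select_rows(metrics)
print(selected[["group", "model"]])
```

In the toy data the two lowest-loss LGBM rows are excluded by the sample-size and coverage gates, which illustrates why the selected figure is not a raw loss leaderboard.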

selected_model_performance_left_tail

Figure: selected_model_performance_left_tail. Source: metrics/benchmark_metrics_per_model.parquet, metrics/ml_tail_metrics_per_model.parquet. Claim scope: selected_benchmark_vs_lgbm_main_figure_not_full_result_set. Tail side: left_tail. Run file: latex/figures/selected_model_performance_left_tail.png.

selected_model_performance_right_tail

Figure: selected_model_performance_right_tail. Source: metrics/benchmark_metrics_per_model.parquet, metrics/ml_tail_metrics_per_model.parquet. Claim scope: selected_benchmark_vs_lgbm_main_figure_not_full_result_set. Tail side: right_tail. Run file: latex/figures/selected_model_performance_right_tail.png.

Figure 6. Full-Sample VaR Overlay Diagnostics

  • Key readings: full-sample overlays show realized loss against a fixed benchmark-comparator VaR and the locked side-specific promoted ML-tail VaR.
  • The benchmark line uses GJR-GARCH-EVT with GJR-GARCH-t fallback; the ML line is not selected by inspecting this plot.
  • Treat the plot as a visual diagnostic. Formal validation remains the coverage, loss, DM, Murphy, and EVT evidence.

full_sample_var_overlay_left_tail

Figure: full_sample_var_overlay_left_tail. Source: forecasts/benchmark_forecasts.parquet, forecasts/ml_tail_forecasts.parquet. Claim scope: full_sample_var_overlay_fixed_selection_visual_diagnostic. Tail side: left_tail. Run file: latex/figures/full_sample_var_overlay_left_tail.png.

full_sample_var_overlay_right_tail

Figure: full_sample_var_overlay_right_tail. Source: forecasts/benchmark_forecasts.parquet, forecasts/ml_tail_forecasts.parquet. Claim scope: full_sample_var_overlay_fixed_selection_visual_diagnostic. Tail side: right_tail. Run file: latex/figures/full_sample_var_overlay_right_tail.png.

Figure 7. VaR/ES Stress-Window Overlays

  • Supporting diagnostic: stress-window overlays illustrate threshold behavior in broad OOS stress episodes with left/right tails sharing each episode's x-axis.
  • The LGBM overlays use information set C, the best-FZ row within the two 24-check LGBM+EVT families.
  • They do not report hedge PnL, transaction-cost evidence, or trading performance.

var_es_stress_overlay_2024_stress_episode

Figure: var_es_stress_overlay_2024_stress_episode. Source: forecasts/benchmark_forecasts.parquet, forecasts/ml_tail_forecasts.parquet. Claim scope: appendix_stress_overlay_illustration_not_validation. Tail side: left_right_tail. Run file: latex/figures/var_es_stress_overlay_2024_stress_episode.png.

var_es_stress_overlay_2025_stress_episode

Figure: var_es_stress_overlay_2025_stress_episode. Source: forecasts/benchmark_forecasts.parquet, forecasts/ml_tail_forecasts.parquet. Claim scope: appendix_stress_overlay_illustration_not_validation. Tail side: left_right_tail. Run file: latex/figures/var_es_stress_overlay_2025_stress_episode.png.

Figure 8. DM Heatmaps

  • Supporting diagnostic: heatmap cells report pairwise FZ-loss differences and one-sided DM p-values for the pass-all cross-suite model set.
  • Rows are candidates, columns are anchors, and negative candidate-minus-anchor differences favor the row model.
  • Each tail uses a strict global common sample across GJR-GARCH-EVT, LGBM plain MLE C, and LGBM UniBM C.
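To illustrate how a one-sided p-value of this kind can be obtained, the sketch below applies a moving-block bootstrap to the candidate-minus-anchor loss differential. It is a generic construction under stated assumptions (hypothetical function name, block length, and replication count), not the registered inference code.

```python
import numpy as np


def block_bootstrap_dm(candidate_loss, anchor_loss, block_len: int = 10,
                       n_boot: int = 2000, seed: int = 0) -> tuple[float, float]:
    """Mean loss differential and one-sided p-value for H1: candidate has lower expected loss."""
    d = np.asarray(candidate_loss, dtype=float) - np.asarray(anchor_loss, dtype=float)
    n = d.size
    if n < block_len:
        raise ValueError("sample is shorter than one block")
    d_bar = d.mean()
    centered = d - d_bar  # impose the null of equal expected loss
    rng = np.random.default_rng(seed)
    n_blocks = int(np.ceil(n / block_len))
    # Draw block start positions, expand each into consecutive indices, trim to length n.
    starts = rng.integers(0, n - block_len + 1, size=(n_boot, n_blocks))
    idx = (starts[:, :, None] + np.arange(block_len)[None, None, :]).reshape(n_boot, -1)[:, :n]
    boot_means = centered[idx].mean(axis=1)
    p_value = float(np.mean(boot_means <= d_bar))
    return float(d_bar), p_value


# Toy aligned loss series (not project data).
rng = np.random.default_rng(1)
anchor = 1.0 + rng.normal(scale=0.5, size=400)
candidate = anchor - 0.05 + rng.normal(scale=0.5, size=400)
print(block_bootstrap_dm(candidate, anchor))
```

Resampling blocks instead of single days preserves short-range dependence in the loss differential, which matters for daily tail losses that cluster in stress episodes.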

dm_heatmap_left_tail

Figure: dm_heatmap_left_tail. Source: forecasts/benchmark_forecasts.parquet, forecasts/ml_tail_forecasts.parquet. Claim scope: post_24check_cross_suite_fz_dm_diagnostic. Tail side: left_tail. Run file: latex/figures/dm_heatmap_left_tail.png.

dm_heatmap_right_tail

Figure: dm_heatmap_right_tail. Source: forecasts/benchmark_forecasts.parquet, forecasts/ml_tail_forecasts.parquet. Claim scope: post_24check_cross_suite_fz_dm_diagnostic. Tail side: right_tail. Run file: latex/figures/dm_heatmap_right_tail.png.

Figure 9. Benchmark Murphy Diagnostics

  • Key readings: curves report target-history benchmark elementary-score diagnostics on a common grid.
  • The plot is a scoring-family diagnostic, not a pairwise ranking statement.
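For orientation, a Murphy curve averages an elementary score over the evaluation sample at each grid value. The sketch below uses the elementary score for a quantile forecast in the standard form from the forecast-evaluation literature; whether the registered diagnostics use this quantile form or a joint VaR/ES variant is not stated here, so treat the functions as illustrative.

```python
import numpy as np


def elementary_quantile_score(forecast, realized, theta: float, alpha: float) -> np.ndarray:
    """Elementary score of an alpha-quantile forecast at grid value theta."""
    x = np.asarray(forecast, dtype=float)
    y = np.asarray(realized, dtype=float)
    return ((y < x).astype(float) - alpha) * ((theta < x).astype(float) - (theta < y).astype(float))


def murphy_curve(forecast, realized, thetas, alpha: float) -> np.ndarray:
    """Mean elementary score at each theta on a common grid."""
    return np.array([elementary_quantile_score(forecast, realized, t, alpha).mean() for t in thetas])


# Toy forecasts and outcomes (not project data).
grid = np.linspace(-3.0, 3.0, 13)
curve = murphy_curve([-1.5, -1.2, -1.8], [0.3, -2.0, 0.1], grid, alpha=0.05)
print(curve)
```

One forecast dominating another across the whole grid means it is preferred under every consistent scoring function for that quantile; crossing curves mean the ranking depends on the scoring function chosen.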

benchmark_murphy_left_tail

Figure: benchmark_murphy_left_tail. Source: metrics/benchmark_murphy.parquet. Claim scope: murphy_diagnostic_benchmark_baseline_common_grid. Tail side: left_tail. Run file: latex/figures/benchmark_murphy_left_tail.png.

benchmark_murphy_right_tail

Figure: benchmark_murphy_right_tail. Source: metrics/benchmark_murphy.parquet. Claim scope: murphy_diagnostic_benchmark_baseline_common_grid. Tail side: right_tail. Run file: latex/figures/benchmark_murphy_right_tail.png.

Figure 10. 24-Check LGBM Murphy Diagnostics

  • Key readings: curves report only the LGBM families that pass the full tail-by-information-set calibration screen.
  • Interpret curve separation as scoring-family sensitivity evidence, not as a standalone model-selection rule.

lgbm_24check_murphy_left_tail

Figure: lgbm_24check_murphy_left_tail. Source: metrics/lgbm_24check_murphy.parquet, metrics/ml_tail_metrics_per_model.parquet, forecasts/ml_tail_forecasts.parquet. Claim scope: murphy_diagnostic_lgbm_24check_robust_ladder. Tail side: left_tail. Run file: latex/figures/lgbm_24check_murphy_left_tail.png.

lgbm_24check_murphy_right_tail

Figure: lgbm_24check_murphy_right_tail. Source: metrics/lgbm_24check_murphy.parquet, metrics/ml_tail_metrics_per_model.parquet, forecasts/ml_tail_forecasts.parquet. Claim scope: murphy_diagnostic_lgbm_24check_robust_ladder. Tail side: right_tail. Run file: latex/figures/lgbm_24check_murphy_right_tail.png.

Figure 11. ES Severity Diagnostics

  • Key readings: bars report conditional-on-exception severity diagnostics.
  • Severity is reported for risk interpretation but is not a standalone model-selection claim.
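Conditional-on-exception severity can be sketched as the mean overshoot on breach dates only. The loss-signed convention (an exception occurs when the realized loss exceeds the VaR) and the function name are assumptions for illustration.

```python
import numpy as np


def mean_exceedance_severity(realized_loss, var) -> float:
    """Mean of (loss - VaR) over the dates on which the loss exceeds the VaR."""
    loss = np.asarray(realized_loss, dtype=float)
    v = np.asarray(var, dtype=float)
    hit = loss > v
    if not hit.any():
        return float("nan")  # severity is undefined without exceptions
    return float((loss[hit] - v[hit]).mean())


# Toy series with two exceptions (not project data).
print(mean_exceedance_severity([0.01, 0.03, 0.05], [0.02, 0.02, 0.02]))
```

Because the average is taken over exception dates only, a model with few exceptions has a noisy severity estimate, which is one reason severity is not used as a standalone selection rule.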

es_severity_left_tail

Figure: es_severity_left_tail. Source: metrics/benchmark_metrics.parquet, metrics/ml_tail_metrics.parquet, metrics/ml_tail_metrics_per_model.parquet. Claim scope: es_severity_diagnostic_not_model_selection_claim. Tail side: left_tail. Run file: latex/figures/es_severity_left_tail.png.

es_severity_right_tail

Figure: es_severity_right_tail. Source: metrics/benchmark_metrics.parquet, metrics/ml_tail_metrics.parquet, metrics/ml_tail_metrics_per_model.parquet. Claim scope: es_severity_diagnostic_not_model_selection_claim. Tail side: right_tail. Run file: latex/figures/es_severity_right_tail.png.

5.4 Source artifact index

| Artifact | Path | Exists |
| --- | --- | --- |
| manifest | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/manifest.json | yes |
| data_vintage | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/data_vintage.json | yes |
| modeling_panel | /Volumes/ExternalSSD/data/n225-open-gap-tail/gold/tp/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/modeling_panel.parquet | yes |
| target_audit | /Volumes/ExternalSSD/data/n225-open-gap-tail/gold/tp/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/target_audit.parquet | yes |
| calendar_map | /Volumes/ExternalSSD/data/n225-open-gap-tail/gold/tp/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/calendar_map.parquet | yes |
| feature_coverage | /Volumes/ExternalSSD/data/n225-open-gap-tail/gold/tp/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/feature_coverage.parquet | yes |
| leakage_summary | /Volumes/ExternalSSD/data/n225-open-gap-tail/gold/ls/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/summary.json | yes |
| benchmark_status | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/metrics/benchmark_status.json | yes |
| benchmark_metrics | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/metrics/benchmark_metrics.parquet | yes |
| benchmark_metrics_per_model | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/metrics/benchmark_metrics_per_model.parquet | yes |
| benchmark_forecasts | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/forecasts/benchmark_forecasts.parquet | yes |
| benchmark_dm_inference | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/metrics/benchmark_dm_inference.parquet | yes |
| ml_tail_status | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/metrics/ml_tail_status.json | yes |
| ml_tail_metrics | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/metrics/ml_tail_metrics.parquet | yes |
| ml_tail_metrics_per_model | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/metrics/ml_tail_metrics_per_model.parquet | yes |
| ml_tail_forecasts | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/forecasts/ml_tail_forecasts.parquet | yes |
| ml_tail_result_matrix | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/metrics/ml_tail_result_matrix.parquet | yes |
| ml_tail_result_matrix_dm | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/metrics/ml_tail_result_matrix_dm.parquet | yes |
| ml_tail_dm_inference | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/metrics/ml_tail_dm_inference.parquet | yes |
| ml_tail_model_eviction | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/metrics/ml_tail_model_eviction.parquet | yes |
| lgbm_24check_murphy | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/metrics/lgbm_24check_murphy.parquet | yes |
| ml_tail_feature_unavailability | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/metrics/ml_tail_feature_unavailability.parquet | yes |
| benchmark_stress_windows | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/metrics/benchmark_stress_windows.parquet | yes |
| ml_tail_stress_windows | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/metrics/ml_tail_stress_windows.parquet | yes |
| figure_manifest | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/latex/figure_manifest.json | yes |
| table_manifest | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/latex/table_manifest.json | yes |
| latex_dir | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/latex/tables | yes |
| claim_scope_table | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/latex/tables/tailrisk_claim_scope_table.tex | yes |
| es_severity_table | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/latex/tables/tailrisk_es_severity_table.tex | yes |
| result_matrix_summary_table | reports/runs/tailrisk_20160719_20260522_20260527T083659Z_commit_7f628ff4/latex/tables/ml_tail_result_matrix_summary_table.tex | yes |

  • All paths above are local ignored artifacts; they are reproducible outputs, not tracked source files.
  • Forecast/reporting rebuilds should read these artifacts and must not call vendor APIs.
  • If this page is stale, rerun `just snapshot` after a completed `just full` run, or pass an explicit run id to the CLI snapshot command.
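Before a reporting rebuild, the "Exists" column can be re-checked locally without touching any vendor API. A minimal sketch, assuming the index is available as a name-to-path mapping; the helper name and the example paths are hypothetical and are not part of the project CLI:

```python
from pathlib import Path


def missing_artifacts(index: dict[str, str]) -> list[str]:
    """Return the names of indexed artifacts whose paths do not exist on disk."""
    return [name for name, path in index.items() if not Path(path).exists()]


# Hypothetical usage with placeholder paths.
example_index = {
    "manifest": "reports/runs/<run_id>/manifest.json",
    "table_manifest": "reports/runs/<run_id>/latex/table_manifest.json",
}
print(missing_artifacts(example_index))
```

An empty result means every indexed artifact is present, so the snapshot can be rebuilt from local files alone.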