Research Design
Working title:
From Restatements to Public Review and Correction: Label Observability and the Public Reporting-Risk Cascade
- Research setting. Traditional misstatement and restatement benchmarks label firm-years after misconduct is detected, disclosed, or made publicly visible.
- Measurement problem. These labels are useful, but they mix the underlying reporting problem with discovery probability, disclosure delay, and selective public observability.
- Empirical object. The paper studies a filing-native public reporting-risk cascade built from SEC and PCAOB data.
- Primary target. The cascade measures public review-and-correction outcomes from the filing origin, not unobserved fraud occurrence.
Research Question and Contribution
- Core question. Can reporting-risk prediction be reframed from ex post detected misconduct to filing-origin public review-and-correction risk?
- Timing contamination. Static detected-misstatement labels mix occurrence, discovery, disclosure lag, and public visibility.
- Main contribution. The intended contribution is a measurement redesign, not a claim that a new classifier performs better than prior fraud-prediction models.
- Filing-origin estimand. The repo defines a filing-origin public reporting-risk estimand based only on information visible at or before `origin_date`.
- Construct claim. The public cascade is expected to be related to, but not identical with, legacy detected-misstatement labels.
- Peer comparison boundary. Peer models and metrics are used for compatibility checks; comparisons provide metric-compatible ranking evidence, not same-estimand performance rankings.
- Model-family boundary. The legacy benchmark suite and the public-label transfer suite use the same Dechow, Perols, Bao, and Bertomeu model-family vocabulary. Mapping quality determines whether Dechow/Bao adapters can use stronger names or must be reported as mapped or inspired variants. Public peer transfer runs only in `full` mode so the default workflow stays bounded.
- Evidence requirement. Credible bridge-based overlap validation is required before any integrated old-benchmark/public-cascade claim.
Design Overview
flowchart LR
subgraph LEGACY["Legacy benchmark diagnostic: external benchmark, not SEC/PCAOB public lake"]
L0["Input panel<br/>gvkey x data_year<br/>2001-2019<br/>legacy accounting, audit, governance,<br/>market, and industry predictors"]
L1["Legacy X<br/>engineered benchmark predictors<br/>exclude ids, labels, res_an* timing proxies,<br/>and post-outcome fields"]
L2["Legacy Y<br/>detected misstatement firm-year<br/>naive, proxy_drop_observed,<br/>proxy_imputed_lag 1/2/3/5y<br/>external_timing only if validated dates exist"]
L3["Legacy prediction loops<br/>annual out-of-time test years<br/>rolling_5y, rolling_7y, rolling_10y, expanding<br/>core benchmark plus legacy peer suite<br/>Dechow / Perols / Bao / Bertomeu families"]
L4["Legacy metrics<br/>PR-AUC relative to prevalence, ROC-AUC,<br/>Brier/BSS, ECE, top-50/100/200 precision,<br/>Bao top-fraction precision, sensitivity, BAC, NDCG"]
L5["Legacy interpretation<br/>label observability, timing fragility,<br/>concept drift, missingness diagnostics<br/>not unobserved fraud occurrence"]
L0 --> L1
L0 --> L2
L1 --> L3
L2 --> L3
L3 --> L4
L4 --> L5
end
subgraph PUBLIC["Public filing-origin cascade: SEC/PCAOB public information set"]
P0["Public inputs<br/>EDGAR filings, FSDS/XBRL, Notes summaries,<br/>comment letters, amendments, 8-K Item 4.02,<br/>PCAOB Form AP, PCAOB inspections, AAER support<br/>public sample 2011-2023, as-of 2026-04-23"]
P1["Parquet public lake<br/>Bronze source cache<br/>Silver normalized event and fact tables<br/>Gold filing_origin_panel and issuer_origin_panel"]
P2["Public modeling grain<br/>issuer_cik x fiscal_year<br/>origin_date is selected annual filing date<br/>features visible at or before origin_date"]
P3["Public X<br/>metadata, XBRL ratios, auditor, oversight, all<br/>rolling public history requires event_date < origin_date<br/>exclude source_available_*, public_date_*, vintage_* fields"]
P4["Public Y<br/>label_comment_thread_365: SEC comment-thread scrutiny<br/>label_amendment_365: amended filing or filing friction<br/>label_8k_402_365: Item 4.02 non-reliance<br/>label_aaer_proxy_730: sparse high-severity AAER support"]
P5["Public prediction loops<br/>annual out-of-time fiscal-year tests<br/>rolling/expanding windows from earlier years only<br/>core cascade model plus public-label peer suite<br/>same Dechow / Perols / Bao / Bertomeu families<br/>aaer_proxy status-only when sparse"]
P6["Public metrics<br/>same metric vocabulary as benchmark where defined<br/>PR-AUC vs prevalence, ROC-AUC, Brier/BSS, ECE,<br/>top-50/100/200 precision, top-decile lift,<br/>Bao-style top-fraction metrics"]
P7["Public interpretation<br/>filing-origin review-and-correction risk signal<br/>feature-family value and model-family transfer evidence<br/>not a performance ranking on the legacy fraud-prediction estimand"]
P8["Public-label opacity DML<br/>missingness_density_score to public labels<br/>cross-fitted nuisance models<br/>adjusted association, not causal effect"]
P0 --> P1
P1 --> P2
P2 --> P3
P2 --> P4
P3 --> P5
P4 --> P5
P5 --> P6
P6 --> P7
P3 --> P8
end
subgraph VALIDATION["Bridge and interpretation layer"]
V0["Bridge gate<br/>gvkey-CIK-year crosswalk<br/>farr candidate bridge now available<br/>WRDS-quality bridge preferred for final validation"]
V1["Construct-overlap checks<br/>matched 2011-2019 old/public sample<br/>legacy labels vs public labels<br/>public scores rank legacy positives<br/>legacy scores rank public labels<br/>event-time concentration and AAER high-severity support"]
V2["Final claim boundary<br/>public cascade is related to,<br/>but not identical with,<br/>legacy detected-misstatement labels<br/>candidate validation until WRDS-equivalent bridge"]
V3["If bridge validation is incomplete<br/>report public-cascade measurement result<br/>without unobserved-fraud or same-estimand performance claim"]
V0 --> V1
V1 --> V2
V0 -.-> V3
end
L5 --> V0
P7 --> V0
P8 --> V0
Prior Literature and Positioning
- Literature role. Prior work supplies model families, performance metrics, and construct anchors.
- Estimand shift. Prior fraud and restatement studies often predict detected ex post misconduct labels; this paper predicts subsequent public review-and-correction events from a filing-origin information set.
- Metric-compatible comparison. Metric-compatible comparison is evidence about ranking behavior under a shared scoring language, not evidence that the tasks share the same estimand.
| Stream | Canonical anchors | Typical models and metrics | Role in this paper |
|---|---|---|---|
| Detected misstatement and fraud prediction | Dechow, Ge, Larson, and Sloan (2011); Perols (2011); Bao, Ke, Li, Yu, and Zhang (2020); Bertomeu, Cheynel, Floyd, and Pan (2021), "Using Machine Learning to Detect Misstatements" | Logistic/F-score models, SVM, decision trees, bagging, stacking, neural nets, and tree ensembles; AUC, classification rates, lift, variable importance, and top-fraction ranking metrics | Supplies the benchmark peer suite: Dechow-style scores, a Perols-style legacy model zoo, Bao-style top-fraction balanced accuracy and NDCG, and Bertomeu-style XGBoost feature importance. |
| Partial observability and hidden misconduct | Barton, Burnett, Gunny, and Miller (2024); Dyck, Morse, and Zingales (2024) | Occurrence/detection separation, hidden misconduct estimation, likelihood and coefficient evidence | Motivates the estimand shift; these models are not PR-AUC comparators for the current design. |
| SEC comment-letter and disclosure-review research | Cassell, Cunningham, and Myers (2013); Bozanic, Dietrich, and Johnson (2018); Brown, Tian, and Tucker (2018); the SEC filing review process | Regression-style evidence on comment receipt, remediation, and disclosure response | Establishes public comment-letter scrutiny as economically meaningful; this paper embeds it as one public-cascade outcome rather than the sole endpoint. This stream supplies regression-style evidence rather than direct ranking-score comparators. |
| Public regulatory and structured-data sources | SEC Item 4.02 guidance; SEC AAER pages; PCAOB Form AP; SEC Inline eXtensible Business Reporting Language (XBRL) | Public filing events, audit-participant data, oversight data, and standardized financial facts | Supplies the filing-native public lake and reproducible feature construction. AAER is a rare high-severity enforcement indicator, not a complete enforcement universe. |
- Positioning. The paper aligns the prediction target to the observable public process.
- Benchmark role. The legacy benchmark remains a disciplined diagnostic for timing sensitivity, label observability, concept drift, and missingness.
- Overlap role. The bridge tests where the public cascade agrees or disagrees with detected-misstatement labels.
Measurement Design
Legacy Benchmark Labels
- Unit. The legacy benchmark panel uses `gvkey`, `data_year`, and `misstatement firm-year`.
- Purpose. It tests whether traditional restatement prediction is sensitive to timing, drift, and missingness.
- Label modes.
  - `naive`: the observed `misstatement firm-year` label without detection-timing adjustment.
  - `proxy_drop_observed`: a coverage stress test using sparse same-row `res_an*` timing proxies, excluding positives without usable timing evidence.
  - `proxy_imputed_lag`: a timing-assumption grid assigning unknown positives one-, two-, three-, or five-year detection lags.
  - `external_timing`: the paper-grade benchmark maturation target, available only if validated public restatement or detection dates are supplied.
- Leakage rule. `res_an0`, `res_an1`, `res_an2`, and `res_an3` are timing proxies only and never enter predictors.
- Required reporting. `timing_coverage.csv` must report same-row timing coverage, unknown positives, retained-positive share, and class-balance changes.
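The label modes above can be illustrated with a small sketch that derives a training-time label for one firm-year under each mode. This is a minimal sketch under stated assumptions, not the repo's implementation: the `FirmYear` record, the `has_timing_proxy` flag (standing in for "usable same-row `res_an*` evidence"), the `as_of_year` argument, and the choice to drop rather than relabel unproxied positives are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

IMPUTED_LAGS = (1, 2, 3, 5)  # lag grid named in the label-mode list


@dataclass(frozen=True)
class FirmYear:
    gvkey: str
    data_year: int
    misstatement: int        # final detected-misstatement indicator (0/1)
    has_timing_proxy: bool   # hypothetical: usable same-row res_an* evidence


def label_for_mode(row: FirmYear, mode: str, as_of_year: int,
                   lag_years: Optional[int] = None) -> Optional[int]:
    """Return 0/1, or None when the row is dropped under the mode."""
    if mode == "naive":
        return row.misstatement
    if mode == "proxy_drop_observed":
        # Assumption: positives without timing evidence leave the sample.
        if row.misstatement == 1 and not row.has_timing_proxy:
            return None
        return row.misstatement
    if mode == "proxy_imputed_lag":
        if lag_years not in IMPUTED_LAGS:
            raise ValueError(f"lag_years must be one of {IMPUTED_LAGS}")
        if row.misstatement == 0 or row.has_timing_proxy:
            return row.misstatement
        # Unknown-timing positive: visible only once the assumed lag has passed.
        return int(row.data_year + lag_years <= as_of_year)
    raise ValueError(f"unsupported mode: {mode}")


row = FirmYear("001234", 2010, misstatement=1, has_timing_proxy=False)
labels = {lag: label_for_mode(row, "proxy_imputed_lag", 2012, lag)
          for lag in IMPUTED_LAGS}
```

Under these assumptions, a 2010 positive with unknown timing counts as visible in 2012 for the one- and two-year lags but not for the three- and five-year lags, which is the kind of sensitivity the timing grid is meant to expose.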
Public Review-and-Correction Labels
- Outcome design. The public cascade is a multi-label outcome system, not a deterministic hierarchy.
- Public labels.
  - `label_comment_thread_365`: public comment-letter scrutiny, measured from the first public EDGAR date of the comment-thread sequence; source: SEC filing review process and public EDGAR correspondence.
  - `label_amendment_365`: broad amendment/friction signal, including administrative amendments, filing friction, and potentially material corrections; source: SEC EDGAR filing access and amended filing form metadata.
  - `label_8k_402_365`: Item 4.02 non-reliance and material-correction proxy; source: SEC Form 8-K, Item 4.02.
  - `label_aaer_proxy_730`: rare high-severity AAER indicator, fit only as robustness when positives are sufficient; source: SEC Accounting and Auditing Enforcement Releases and `farr::aaer_*` support data.
- Co-occurrence rule. A later-stage positive does not mechanically force an earlier-stage label.
- Construct meaning. These labels are not alternative names for fraud. They are public observability states: regulatory scrutiny (`comment_thread`), filing correction or friction (`amendment`), material non-reliance (`8k_402`), and rare public enforcement-tail evidence (`aaer_proxy`).
- Target distinction. The target is public review-and-correction risk rather than unobserved fraud occurrence.
Timing and Censoring Rules
- Origin date. In the current v1 panel, `filing_origin_panel.origin_date = filing_date`, and `issuer_origin_panel.origin_date` is the selected annual filing date for the issuer-year.
- No post-origin leakage. No event released after `origin_date` may enter predictors.
- Excluded coverage fields. `source_available_*`, `public_date_*`, `vintage_*`, and `as_of_date` document source availability and public vintages but are excluded from default predictors.
- Censoring. 365-day outcomes use `censored_365`; AAER 730-day robustness uses `censored_730`.
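The excluded-coverage-fields rule can be sketched as a simple pattern filter over column names. This is an illustrative sketch only: the function `default_predictors`, the `extra_excluded` argument, and the example column names are hypothetical and are not taken from the repo.

```python
from fnmatch import fnmatchcase

# Patterns taken from the excluded-coverage-fields rule above.
EXCLUDED_PATTERNS = ("source_available_*", "public_date_*", "vintage_*", "as_of_date")


def default_predictors(columns, extra_excluded=()):
    """Keep only columns that match none of the excluded patterns."""
    patterns = EXCLUDED_PATTERNS + tuple(extra_excluded)
    return [c for c in columns
            if not any(fnmatchcase(c, p) for p in patterns)]


# Hypothetical column names, used only to exercise the filter.
columns = [
    "xbrl_ratio_leverage",
    "source_available_form_ap",
    "public_date_comment_thread",
    "vintage_fsds",
    "as_of_date",
    "label_8k_402_365",
    "censored_365",
]
kept = default_predictors(columns, extra_excluded=("label_*", "censored_*"))
```

Case-sensitive matching (`fnmatchcase`) is used so the filter behaves the same on every platform.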
Claim Boundaries
The public-cascade design supports evidence about a public reporting-risk state. It does not by itself identify unobserved fraud occurrence, causal effects, or a stable enforcement-prediction result.
Comment letters are public scrutiny signals, not the full SEC review universe. AAER is a rare high-severity enforcement indicator, not a complete enforcement universe.
Bridge validation is mandatory for an integrated claim that the public cascade and the legacy benchmark measure related but non-identical constructs. Without that validation, the public-cascade result remains a public-data measurement result rather than a validated fraud/restatement overlap paper.
Data and Feature Construction
Reproduction Inputs
- Operational inputs. A reproducible run needs the legacy benchmark file, the public SEC/PCAOB lake configuration, and the optional bridge/support exports:
  - `$DATA_DIR/raw_dataset_misstatement.parquet` for the legacy `gvkey x data_year` benchmark.
  - `config/public_data.yaml` and `config/study.yaml` for public-source and study defaults.
  - `$DATA_DIR/external/gvkey_cik_year.csv` for bridge validation, generated from WRDS when available or from farr as the current candidate bridge.
  - `$DATA_DIR/external/farr_aaer_firm_year.csv`, `$DATA_DIR/external/farr_aaer_dates.csv`, and `$DATA_DIR/external/farr_state_hq.csv` for AAER and headquarters-state support.
- Public-data run. The current paper-facing public lake is built with `storage_format=parquet`, `notes_mode=summary`, DuckDB, and as-of date `2026-04-23`. The stable full command is:
just full mode=full dataset=raw
- Peer and overlap run. The peer-enabled study is a separate run so the default workflow stays bounded:
just task study raw artifacts/full_with_peer \
extra="--peer-comparison-mode full --peer-target both --parallel-jobs 4 --model-threads 2 --seed-policy task-isolated"
just snapshot
just manuscript
Legacy Benchmark Panel
- File. `$DATA_DIR/raw_dataset_misstatement.parquet`.
- Grain. `gvkey x data_year`.
- Coverage. 2001-2019.
- Required fields. `gvkey`, `data_year`, `misstatement firm-year`, `res_an0` to `res_an3`, `missing_*` flags, and accounting/audit/governance/market/industry predictors.
- Predictor surface. Legacy predictors are the engineered benchmark columns already present in the raw benchmark panel. `gvkey`, `data_year`, target columns, timing proxy columns, missingness labels, and post-outcome fields are not treated as ordinary predictors.
- Label construction. The benchmark label modes are constructed from `misstatement firm-year` and timing proxy availability:
  - `naive`: the final detected-misstatement indicator.
  - `proxy_drop_observed`: positives without usable same-row `res_an*` timing evidence are excluded from proxy-visible positives.
  - `proxy_imputed_lag`: unknown positive timing is shifted by a one-, two-, three-, or five-year assumed detection lag.
  - `external_timing`: reserved for validated restatement or detection dates when supplied.
- Evaluation grain. All benchmark prediction rows remain annual firm-year rows evaluated by out-of-time test year.
- Limitation. The panel contains no CIK, ticker, PERMNO, restatement filing date, detector identity, or complete public filing history.
Public SEC/PCAOB Lake
- Storage design. `$DATA_DIR/public_lake/` is organized as bronze, silver, and gold.
- Bronze. Downloaded public files with source URL, timestamp, SHA256 hash, parser version, schema version, and as-of date.
- Silver. Normalized issuer, filing, XBRL, Notes, comment-thread, correction, Form AP, PCAOB inspection, and AAER proxy tables; large Silver tables are Parquet-first.
- Gold. `issuer_origin_panel.parquet` and `filing_origin_panel.parquet`.
- DuckDB path. The default DuckDB path uses SQL for XBRL core-tag pivoting, label-horizon joins, and Parquet output on the annual issuer-year modeling panel.
- Filing-origin provenance. The full filing-origin panel is retained as a lightweight, year-sharded provenance panel rather than a fully labeled 20M-row modeling table.
- Required v1 sources. SEC submissions, SEC Financial Statement Data Sets (FSDS), SEC `UPLOAD` and `CORRESP`, 10-K/A and 10-Q/A amendments, 8-K Item 4.02, PCAOB Form AP, PCAOB inspection datasets, and SEC AAER pages.
- Main public sample. Domestic U.S. GAAP issuer-years from 2011-2023, with `2026-04-23` as the current reproducibility as-of date.
- Source-to-table mapping.
  - SEC submissions and filing index data form `filing_dim.parquet`, `issuer_dim.parquet`, `filing_origin_panel.parquet`, and the annual `issuer_origin_panel.parquet`.
  - FSDS/XBRL `sub` and `num` files form `filing_xbrl_dim.parquet`, `xbrl_fact_summary.parquet`, and `xbrl_core_fact/`.
  - SEC Notes are normalized in summary mode into `notes_filing_dim.parquet` and `note_summary.parquet`; raw text blobs are not part of the default paper-facing run.
  - SEC `UPLOAD` and `CORRESP` produce `comment_thread.csv.gz` with first public correspondence dates.
  - 10-K/A and 10-Q/A filings, explanatory notes, and form-level filters produce `correction_event.csv.gz` and `amendment_annotation.csv.gz`.
  - 8-K Item 4.02 parsing produces `issuer_8k_item_event.csv.gz`.
  - PCAOB Form AP and inspection sources produce auditor and oversight features in the Silver/Gold panels.
  - SEC AAER pages and farr AAER support produce high-severity event tables; these are support evidence, not a complete enforcement universe.
- Origin-date rule. `filing_origin_panel.origin_date = filing_date`. `issuer_origin_panel.origin_date` is the selected annual filing date for the issuer-year. All public prediction features must be observed at or before this date.
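The Bronze provenance fields listed above (source URL, timestamp, SHA256 hash, parser version, schema version, as-of date) could be captured in a record like the one sketched here. The `BronzeRecord` class, the `make_bronze_record` helper, the version strings, and the placeholder URL are hypothetical; the actual Bronze schema is not shown in this document.

```python
import hashlib
from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class BronzeRecord:
    source_url: str
    downloaded_at: datetime
    sha256: str
    parser_version: str
    schema_version: str
    as_of_date: date


def make_bronze_record(source_url: str, payload: bytes, parser_version: str,
                       schema_version: str, as_of_date: date) -> BronzeRecord:
    """Attach provenance metadata to one downloaded payload."""
    return BronzeRecord(
        source_url=source_url,
        downloaded_at=datetime.now(timezone.utc),   # download timestamp (UTC)
        sha256=hashlib.sha256(payload).hexdigest(), # content hash for reproducibility
        parser_version=parser_version,
        schema_version=schema_version,
        as_of_date=as_of_date,
    )


# Placeholder URL and payload, for illustration only.
record = make_bronze_record(
    "https://example.invalid/fsds/sub.txt", b"adsh\tcik\n",
    parser_version="0.1.0", schema_version="v1", as_of_date=date(2026, 4, 23),
)
```

Hashing the raw bytes before any parsing means a re-download can be compared against the cached file without depending on parser behavior.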
Public Labels and Censoring
- Public-label grain. Public labels are attached to the annual `issuer_cik x fiscal_year x origin_date` issuer-year row.
- Forward horizons.
  - `label_comment_thread_365 = 1` if a public SEC comment-letter thread is first observed after `origin_date` and within 365 days.
  - `label_amendment_365 = 1` if a qualifying amended-filing or correction/friction event appears after `origin_date` and within 365 days.
  - `label_8k_402_365 = 1` if an 8-K Item 4.02 non-reliance event appears after `origin_date` and within 365 days.
  - `label_aaer_proxy_730 = 1` if an AAER support event matches the issuer after `origin_date` and within 730 days.
- Censoring. Horizon-specific censoring flags remove issuer-years whose outcome window extends beyond the as-of date. A 365-day label uses 365-day censoring; `aaer_proxy` uses 730-day censoring.
- AAER status. AAER is retained as a high-severity enforcement descriptor or robustness signal only when positives are sufficient; sparse AAER folds are reported as blockers rather than headline model failures.
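The forward-horizon and censoring rules can be sketched with plain date arithmetic. This is a minimal sketch, assuming the horizon's end day is inclusive and that an event on `origin_date` itself does not count; the function names `forward_label` and `is_censored` are hypothetical.

```python
from datetime import date, timedelta


def forward_label(origin_date, event_dates, horizon_days):
    """1 if any event falls strictly after origin_date and within the horizon."""
    end = origin_date + timedelta(days=horizon_days)  # assumed inclusive end
    return int(any(origin_date < d <= end for d in event_dates))


def is_censored(origin_date, horizon_days, as_of_date):
    """True when the outcome window extends beyond the as-of date."""
    return origin_date + timedelta(days=horizon_days) > as_of_date


AS_OF = date(2026, 4, 23)      # reproducibility as-of date from the text
origin = date(2024, 6, 1)

label_365 = forward_label(origin, [date(2024, 6, 1), date(2025, 2, 15)], 365)
same_day_only = forward_label(origin, [date(2024, 6, 1)], 365)
censored_365 = is_censored(origin, 365, AS_OF)
censored_730 = is_censored(origin, 730, AS_OF)
```

In this example the same issuer-year is usable for the 365-day labels but censored for the 730-day AAER window, which is why the censoring masks are horizon-specific.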
Bridge and External Validation Inputs
- Bridge file. `$DATA_DIR/external/gvkey_cik_year.csv`.
- Required fields. `gvkey`, `issuer_cik`, a single year or start/end years, and provenance fields such as source, version, extraction date, match method, and match score.
- Bridge grain. Bridge validation maps legacy `gvkey x data_year` rows to public `issuer_cik x fiscal_year` rows. It must report coverage, multiplicity, high-confidence and ambiguous matches, and unmatched diagnostics before overlap evidence is interpreted.
- WRDS-preferred route.
set -a; source .env; set +a
uv run python scripts/prepare_gvkey_cik_crosswalk.py \
--input path/to/wrds_cik_gvkey_link.csv \
--out "$DATA_DIR/external/gvkey_cik_year.csv"
just task bridge raw artifacts/full_with_peer/bridge_probe
uv run python scripts/run_construct_overlap.py --study-dir artifacts/full_with_peer
- Public candidate route while WRDS access is pending.
bash scripts/prepare_farr_gvkey_cik_bridge.sh --install-missing
bash scripts/prepare_farr_support_data.sh --install-missing
- Current candidate bridge. `farr::gvkey_ciks` is the working high-coverage candidate bridge; it must be reported with coverage and multiplicity tables and should not be described as WRDS-verified.
- Validation tier. Construct-overlap outputs infer `validation_tier` from normalized crosswalk provenance: farr exports remain `candidate_farr`, WRDS/Compustat provenance is `wrds_validated`, and mixed-source bridges remain candidate evidence.
- AAER support. `farr::aaer_firm_year` and `farr::aaer_dates` are external AAER validation anchors.
- Metadata support. `farr::state_hq` is a date-bounded headquarters-state metadata control.
- Missing bridge behavior. If no usable external crosswalk exists, the bridge probe must report `raw_identifier_blocker` rather than infer links from benchmark identifiers alone.
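The coverage and multiplicity reporting required of the bridge can be sketched as a count of how many distinct CIKs each legacy `gvkey x data_year` key maps to. This is an illustrative sketch: the function name, the tuple layout of the inputs, and the output keys are hypothetical, and it checks only the gvkey-to-CIK direction.

```python
from collections import defaultdict


def bridge_multiplicity(bridge_rows, legacy_keys):
    """Summarize unique, ambiguous, and unmatched legacy keys.

    bridge_rows: iterable of (gvkey, year, issuer_cik)
    legacy_keys: iterable of (gvkey, data_year)
    """
    ciks_by_key = defaultdict(set)
    for gvkey, year, cik in bridge_rows:
        ciks_by_key[(gvkey, year)].add(cik)

    legacy = set(legacy_keys)
    n_ciks = {key: len(ciks_by_key.get(key, ())) for key in legacy}
    unique = sum(1 for n in n_ciks.values() if n == 1)
    ambiguous = sum(1 for n in n_ciks.values() if n > 1)
    unmatched = sum(1 for n in n_ciks.values() if n == 0)
    coverage = (unique + ambiguous) / len(legacy) if legacy else 0.0
    return {"n_legacy": len(legacy), "unique": unique,
            "ambiguous": ambiguous, "unmatched": unmatched,
            "coverage": coverage}


# Toy rows with made-up identifiers.
bridge = [("1001", 2012, "A"), ("1002", 2012, "B"), ("1002", 2012, "C")]
legacy = [("1001", 2012), ("1002", 2012), ("1003", 2012)]
report = bridge_multiplicity(bridge, legacy)
```

A full many-to-many check would also count distinct `gvkey` values per `issuer_cik x fiscal_year`; the same grouping applies with the key and value swapped.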
Feature Families
- Common sample rule. Feature families use the same filtered issuer-year sample for fair ablations.
- Missing-value rule. Tree models with native missing-value handling retain numeric `np.nan`; non-tree adapters use fold-internal imputation only when required.
- Excluded columns. Label, censoring, identifier, source-availability, public-date, and vintage columns are excluded by default.
- Metadata. SIC, form, SEC submissions `entityType`, filing size, XBRL flags, prior filing count, days since prior filing, and headquarters-state controls when available.
- Filing friction and public history. Current-cycle NT status and amendment friction, plus strictly pre-origin rolling counts and recency for prior NT filings, comment threads, amendments, and 8-K instability items. Rolling public-history features must use only events with `event_date < origin_date`.
- XBRL. `xbrl_ratio_*` and `xbrl_coverage_*` features from controlled core tags, including size, leverage, profitability, working capital, receivables, inventory, cash, debt, operating cash flow, and year-over-year revenue/assets changes.
- Auditor and oversight. PCAOB Form AP fields, engagement-partner exposure, and PCAOB inspection features in their public source windows.
- Note opacity. Note count, note character count, note-tag coverage, and tag entropy as a disclosure breadth measure.
- Leakage exclusions. `source_available_*`, `public_date_*`, `vintage_*`, `as_of_date`, accession identifiers, CIK/GVKEY identifiers, labels, censoring flags, and direct event-date fields document provenance and timing but are not default predictors.
- Fold-local transformations. Imputation, scaling, and any model-specific preprocessing are fit inside the training fold and then applied to the held-out fiscal year.
- Public peer mapping. Public peer transfer reuses Dechow/Perols/Bao/Bertomeu model-family language. Dechow and Bao labels are reported as fixed, mapped, or inspired variants according to mapping quality; public Bao transfer uses `public_issuer_origin` input and is therefore `bao_inspired_tree_ensemble`, not a Bao raw-accounting-number replication.
- Deferred extensions. Proxy-governance content, SEC insider-pressure features, macro-vintage controls, auditor-firm public-status fields, and broader security/attention layers are useful extensions, not required for the current v1 paper claim.
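The strictly pre-origin rule for rolling public-history features (`event_date < origin_date`) can be sketched as follows. The function name and the `lookback_days` default are hypothetical; the document does not state the lookback length used in the repo.

```python
from datetime import date, timedelta


def pre_origin_history(origin_date, event_dates, lookback_days=1095):
    """Rolling count and recency using only events strictly before origin_date."""
    start = origin_date - timedelta(days=lookback_days)
    prior = [d for d in event_dates if start <= d < origin_date]
    count = len(prior)
    # None marks "no prior event"; a pipeline could map this to a missing value.
    days_since_last = (origin_date - max(prior)).days if prior else None
    return count, days_since_last


origin = date(2020, 3, 1)
events = [
    date(2015, 1, 1),   # outside the lookback window
    date(2019, 12, 1),  # valid pre-origin event
    date(2020, 3, 1),   # same day as origin: excluded by the strict inequality
    date(2020, 5, 1),   # post-origin: would be leakage
]
history = pre_origin_history(origin, events)
```

The strict inequality matters for same-day events: an amendment filed on the origin date is treated as not yet visible, which is the conservative reading of the rule.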
Empirical Design
Evaluation Metrics
- Predictive metrics. PR-AUC, ROC-AUC, Brier score, Brier Skill Score, expected calibration error, top-50/100/200 precision, and Bao-style top-fraction metrics.
- Bao-style metrics. Top-fraction precision, sensitivity, specificity, balanced accuracy, and binary-relevance NDCG@k.
- Calibration. Calibration metrics are diagnostic under class imbalance and resampling.
- Prevalence. `Prevalence` is the positive-class rate in the evaluated sample and the natural random-ranking baseline for PR-AUC.
- PR-AUC interpretation. When positives are rare, a numerically small PR-AUC can still represent meaningful lift over the base rate.
- ROC-AUC contrast. ROC-AUC has a fixed random baseline near 0.5 and can look much larger than PR-AUC in rare-event settings.
- Split design. Prediction experiments use annual out-of-time evaluation, not random cross-validation.
- Training windows. For a given test year, training uses earlier years only, with expanding or rolling 5-, 7-, and 10-year windows.
- DML separation. Cross-fitting appears separately in Double / Debiased Machine Learning (DML) opacity diagnostics; it is not the train/test split used for headline prediction tables.
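To make the prevalence-relative reading of ranking metrics concrete, the sketch below computes prevalence, top-k precision, lift over prevalence, and a Brier Skill Score on toy data. It is a minimal sketch: the function names are hypothetical, the data are made up, and the Brier reference forecast is assumed to be the evaluated-sample prevalence, which may differ from the reference the repo uses.

```python
def prevalence(y_true):
    """Positive-class rate in the evaluated sample."""
    return sum(y_true) / len(y_true)


def top_k_precision(y_true, scores, k):
    """Share of positives among the k highest-scored rows."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(y for _, y in top) / len(top)


def lift_at_k(y_true, scores, k):
    """Top-k precision divided by prevalence (the random-ranking baseline)."""
    return top_k_precision(y_true, scores, k) / prevalence(y_true)


def brier_skill_score(y_true, probs):
    """1 - Brier / Brier_ref, with a constant prevalence forecast as reference."""
    n = len(y_true)
    base = prevalence(y_true)
    brier = sum((p - y) ** 2 for p, y in zip(probs, y_true)) / n
    brier_ref = sum((base - y) ** 2 for y in y_true) / n
    return 1.0 - brier / brier_ref


# Toy data: 2 positives in 10 rows, so prevalence is 0.2.
y = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
p = [0.9, 0.8, 0.1, 0.2, 0.7, 0.3, 0.1, 0.05, 0.2, 0.15]

prec_at_2 = top_k_precision(y, p, 2)
lift_2 = lift_at_k(y, p, 2)
bss = brier_skill_score(y, p)
```

Here a top-2 precision of 0.5 looks modest in absolute terms but is 2.5 times the 0.2 base rate, which is the comparison the prevalence bullet asks readers to make.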
Experiment 1: Label Observability and Detection Timing
- Purpose. Quantify how sensitive traditional restatement evaluation is to timing coverage and unknown-positive assumptions.
- Design. Run annual out-of-time benchmark backtests across expanding, rolling 5-year, rolling 7-year, and rolling 10-year windows; compare `naive`, `proxy_drop_observed`, and `proxy_imputed_lag` labels.
- Outputs. `rolling_metrics.csv`, `rolling_predictions.parquet`, `timing_coverage.csv`, `timing_summary.json`, `timing_claim_status`, and window summaries.
- Interpretation. This is a benchmark-validity diagnostic; a decline under `proxy_drop_observed` is timing-observability sensitivity, not proof of look-ahead bias by itself.
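The annual out-of-time window design can be sketched as a helper that selects training years for a given test year. The function name is hypothetical, and the sketch assumes a rolling window means "the most recent N available years before the test year"; with gaps in the panel this can differ from a strict calendar window.

```python
def training_years(test_year, available_years, window=None):
    """Years usable for training: strictly earlier than test_year.

    window=None gives an expanding window; a positive integer keeps only
    the most recent `window` earlier years.
    """
    earlier = sorted(y for y in set(available_years) if y < test_year)
    return earlier if window is None else earlier[-window:]


# Window names follow the design overview; sizes come from the text.
WINDOWS = {"expanding": None, "rolling_5y": 5, "rolling_7y": 7, "rolling_10y": 10}

panel_years = range(2001, 2020)  # legacy benchmark coverage, 2001-2019
plan = {name: training_years(2015, panel_years, size)
        for name, size in WINDOWS.items()}
```

For test year 2015 the rolling 5-year window trains on 2010-2014 and the expanding window on 2001-2014; no window ever includes the test year or later years.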
Experiment 2: Concept Drift and Model Shelf-Life
- Purpose. Estimate whether reporting-risk models trained in one regime remain useful in later regimes.
- Design. Compare rolling and expanding windows over test years; track feature-family importance; report pre/post diagnostics around major regulatory and data-regime breakpoints.
- Outputs. Annual metrics, window summaries, structural-break diagnostics, and feature-family importance.
- Interpretation. The experiment supports model shelf-life and retraining-window evidence; it does not establish structural causality from predictive drift alone.
Experiment 3: Opacity and Public Review/Correction Risk
- Purpose. Test whether pre-origin opacity and missingness profiles predict later public scrutiny or correction.
- Design. Construct missingness-density and missing-profile indicators; estimate Double / Debiased Machine Learning (DML) partially linear regressions on public labels.
- Primary outcomes. `label_comment_thread_365`, `label_amendment_365`, and `label_8k_402_365`.
- Treatment-like variable. `D = missingness_density_score`.
- Controls. `X` = pre-origin metadata, XBRL, filing-friction, public-history, auditor, oversight, note-opacity, and calendar controls.
- Outputs. Missing-profile clusters, public-label PLR spec results, nuisance-model metadata, and diagnostic benchmark-side DML outputs.
- Interpretation. Coefficients are adjusted associations, not causal effects; the old `misstatement firm-year` outcome remains a legacy diagnostic only.
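For reference, the textbook partially linear regression behind this design can be written in its standard partialling-out form, with $Y$ a public label, $D$ the missingness-density score, and $X$ the controls. The document does not state which score function or nuisance learners the repo uses, so the nuisance functions $\ell(X) = \mathbb{E}[Y \mid X]$ and $m(X) = \mathbb{E}[D \mid X]$ and the fold notation $I_k$ are assumptions of this sketch.

```latex
\begin{aligned}
Y &= \theta D + g(X) + \varepsilon, & \mathbb{E}[\varepsilon \mid D, X] &= 0,\\
D &= m(X) + v, & \mathbb{E}[v \mid X] &= 0,\\
\hat{\theta} &= \frac{\sum_{k=1}^{K}\sum_{i \in I_k}
  \bigl(D_i - \hat{m}_{-k}(X_i)\bigr)\bigl(Y_i - \hat{\ell}_{-k}(X_i)\bigr)}
  {\sum_{k=1}^{K}\sum_{i \in I_k} \bigl(D_i - \hat{m}_{-k}(X_i)\bigr)^{2}}
\end{aligned}
```

Here $\hat{m}_{-k}$ and $\hat{\ell}_{-k}$ are fit on all folds except $I_k$, which is the cross-fitting step. Even in this form, $\hat{\theta}$ is read as an adjusted association between opacity and the public label, consistent with the interpretation above.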
Experiment 4: Public Cascade Construction
- Purpose. Demonstrate that public data can support a defensible review-and-correction cascade.
- Design. Build the public lake from SEC/PCAOB sources; construct labels from first public dates; report source coverage, event rates, censoring, and task readiness.
- Outputs. Source coverage tables, event-rate tables, censoring summaries, public-lake metadata, and task-positive counts.
- Interpretation. This experiment validates the measurement surface; AAER remains descriptive high-severity evidence unless robust positive counts are available.
Experiment 5: Public Cascade Prediction
- Purpose. Estimate the pre-disclosure public reporting-risk state from public features.
- Design. Use `issuer_origin_panel` to predict comment-thread scrutiny, broad amendment/friction, and 8-K Item 4.02 outcomes; run feature-family ablations over metadata, XBRL, auditor, oversight, and all-feature sets.
- Skip rule. Skip task/family/window fits with one-class train or test labels.
- Outputs. `public_cascade_metrics.csv`, `public_cascade_predictions.parquet`, `public_cascade_task_status.csv`, `public_cascade_summary.md`, and `public_opacity_dml.csv`.
- Interpretation. Full public-cascade claims require non-metadata features; `metadata_baseline` is a readiness state, `xbrl_ratio_baseline` is the first non-metadata empirical baseline, and sparse AAER folds are blockers rather than failed headline models.
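The skip rule can be sketched as a small pre-fit check on the label vectors. The function name and the status strings are hypothetical; the actual values written to `public_cascade_task_status.csv` are not shown in this document.

```python
def fit_status(train_labels, test_labels):
    """Decide whether a task/family/window fit is runnable.

    A fit is skipped when either the training or the test labels contain
    only one class, since ranking metrics are undefined in that case.
    """
    if len(set(train_labels)) < 2:
        return "skipped_one_class_train"   # hypothetical status string
    if len(set(test_labels)) < 2:
        return "skipped_one_class_test"    # hypothetical status string
    return "ready"


status_ok = fit_status([0, 1, 0, 0], [0, 0, 1])
status_sparse = fit_status([0, 1, 0, 0], [0, 0, 0])  # e.g. a sparse AAER fold
```

Recording the skip reason, rather than silently dropping the fit, keeps sparse folds visible as blockers in the task-status output.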
Experiment 6: Old Benchmark and Public Cascade Overlap
- Purpose. Test whether legacy detected-misstatement labels and public review-and-correction labels measure related but non-identical constructs.
- Design. Run the bridge probe, report coverage and multiplicity, then test event-time concentration and reciprocal risk-score alignment in the mapped sample.
- Current bridge. The current implementation uses farr `gvkey_ciks` as a high-coverage candidate bridge; WRDS remains the preferred final validation source.
- Outputs. `bridge_probe_summary.json`, `coverage_report.csv`, `multiplicity_report.csv`, `unmatched_raw_characteristics.csv`, `construct_overlap/label_contingency_lift.csv`, `construct_overlap/public_score_legacy_ranking.csv`, `construct_overlap/reciprocal_alignment.csv`, `construct_overlap/event_time_concentration.csv`, and AAER high-severity support tables.
- Interpretation. This is the integrated-paper gate; the farr bridge supports candidate overlap validation, and WRDS remains preferred before final manuscript claims.
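One way to read a label-contingency lift in the mapped sample is as the public-label rate among legacy positives divided by the overall public-label rate. The sketch below uses that definition as an assumption; the document does not specify how `label_contingency_lift.csv` defines lift, and the function name and toy data are hypothetical.

```python
def label_contingency_lift(legacy, public):
    """P(public = 1 | legacy = 1) / P(public = 1) on matched rows.

    Returns None when the ratio is undefined (no legacy positives or
    no public positives).
    """
    if len(legacy) != len(public):
        raise ValueError("legacy and public must be aligned row by row")
    base_rate = sum(public) / len(public) if public else 0.0
    among_legacy_pos = [p for l, p in zip(legacy, public) if l == 1]
    if not among_legacy_pos or base_rate == 0:
        return None
    conditional_rate = sum(among_legacy_pos) / len(among_legacy_pos)
    return conditional_rate / base_rate


# Toy matched sample: 8 issuer-years, made-up labels.
legacy_labels = [1, 1, 0, 0, 0, 0, 0, 0]
public_labels = [1, 0, 1, 0, 0, 0, 0, 0]
lift = label_contingency_lift(legacy_labels, public_labels)
```

A lift above 1 with far-from-perfect overlap is the pattern a "related but non-identical" construct claim would predict; a lift near 1 would suggest the two label systems are close to independent in the mapped sample.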
Evidence Gates
| Component | Current status | Gate before paper claim |
|---|---|---|
| Benchmark timing | implemented as observability sensitivity | report timing_coverage.csv, retained positives, and imputed-lag scenarios; external timing required for paper-grade maturation |
| Concept drift | implemented as rolling-window diagnostics | validate annual PR-AUC, Brier Skill Score, feature-importance drift, and breakpoint summaries |
| Opacity | public-label DML implemented; refresh summary is separate from construct overlap | public-label PLR results must use label_comment_thread_365, label_amendment_365, and label_8k_402_365 as primary outcomes |
| Public lake | full public lake path implemented | refreshed source coverage, row counts, censoring, and reproducibility metadata |
| Public cascade | current full-run state is `xbrl_ratio_baseline` | non-degenerate comment-thread, amendment, and 8-K Item 4.02 tasks; AAER framed as high-severity enforcement evidence |
| Bridge overlap | farr candidate overlap implemented | coverage, multiplicity, reciprocal alignment, no silent many-to-many joins, and WRDS-preferred validation before final integrated claims |
- Data integrity gates.
  - No post-`origin_date` event enters predictors.
  - No `res_an*` column enters benchmark predictors.
  - `source_available_*`, `public_date_*`, `vintage_*`, and `as_of_date` stay outside default predictors.
  - Censoring masks are horizon-specific.
  - Crosswalk coverage and multiplicity are reported before overlap validation.
  - Construct-overlap outputs carry `validation_tier = candidate_farr` under farr provenance and should flip to `wrds_validated` only after a provenance-tagged WRDS or equivalent bridge is supplied.
- Empirical sufficiency gates.
  - Benchmark outputs non-empty rolling metrics, timing coverage, and missingness diagnostics.
  - Public cascade covers fiscal years 2011-2023 in the full panel.
  - Comment-thread, amendment, and 8-K Item 4.02 tasks have nonzero positives.
  - `xbrl_ratio_*` and `xbrl_coverage_*` features are present for non-metadata public-cascade evidence.
  - Prediction metrics are read relative to each task's prevalence; there is no absolute PR-AUC sufficiency threshold.
  - Overlap evidence reports top-decile lift, reciprocal alignment, bridge tiers, and bridge coverage before integrated claims are made.
  - Zero-positive or sparse AAER robustness tasks are skipped and reported as high-severity blockers.
- Paper-readiness gates.
  - Claims remain measurement and decision-useful prediction claims, not causal proof of fraud occurrence.
  - AAER is described as a rare high-severity descriptive proxy.
  - Comment letters are described as public scrutiny, not complete SEC review.
  - Bridge validation is mandatory for the integrated old-benchmark/public-cascade paper claim.
  - Candidate farr overlap can support a related-but-non-identical construct argument, but not final WRDS-quality validation.
Execution Contract
- Operational reference. The operational command surface lives in the repository home page and README so there is a single maintained entrypoint for users and coauthors.
- Quality gate.
just check
- Paper-facing core run.
just full mode=full dataset=raw
- Peer-compatible model-family transfer.
just task study raw artifacts/full_with_peer \
extra="--peer-comparison-mode full --peer-target both --parallel-jobs 4 --model-threads 2 --seed-policy task-isolated"
just snapshot
just manuscript
- Command boundary. `just check` is the local quality gate; `just full mode=full dataset=raw` is the paper-facing core run for data engineering and core experiments; `full_with_peer` adds the legacy and public-label peer model-family transfer suites; `just snapshot` refreshes the results snapshot from `artifacts/full_with_peer` and then runs `just check`; `just manuscript` builds paper-facing tables, figures, and result prose in `artifacts/manuscript_package`. Use `--peer-target public` when only the public-label peer transfer needs to be refreshed.
- Detailed operations. Component-level reruns and public-lake operational details are documented in the repository home page, which includes the root `README.md`.