Data Methodology

How numbers are sourced, parsed, and normalized

This page documents how Tokyo Intelligence ingests, parses, and stores financial data, including the decisions made when source data is ambiguous, missing, or filed under multiple accounting standards.

How filings become data

All financial data originates from EDINET, the FSA's mandatory disclosure platform. Each evening our pipeline fetches that day's filing index and processes new submissions end-to-end.

1
Download ZIP
EDINET publishes a ZIP archive per filing. We fetch and extract the contents: an iXBRL HTML document plus companion taxonomy files.
2
Parse XBRL elements
Every tagged element is scanned. Each has a namespace prefix (jpcrp_cor, jpigp_cor, jppfs_cor, …) identifying its taxonomy, and a contextRef locating it in time and consolidation scope.
3
Select context
We prefer the consolidated current-period context (ConsolidatedMember, CurrentYear). Where it is absent (common for smaller filers or per-share metrics), we fall back to the non-consolidated context.
4
Map to schema
Tagged values are mapped to our schema: revenue, operating income, net income, total assets, equity, EPS, DPS, BPS, shares outstanding, and debt. Unmapped extension elements are dropped unless a per-company fallback pattern matches.
5
Store to DB
Rows are written keyed by (ticker, fiscal_year, period_type). Re-runs are upserts: the freshest parse wins.
TDNet feeds a separate pipeline for capital allocation signals, material events, tender offers, and daily prices. TDNet data is never mixed into the EDINET financial time series; the two pipelines are kept independent to preserve source fidelity.
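Steps 3 and 4 of the EDINET pipeline can be sketched roughly as follows. This is a minimal illustration, not the production parser: the `Fact` class, the `period` labels, the field names in `TAG_TO_FIELD`, and both function names are hypothetical, and the tag map covers only J-GAAP tags named on this page.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Fact:
    """One tagged value, already reduced from its contextRef (hypothetical shape)."""
    name: str           # element local name, e.g. "NetSales"
    consolidated: bool  # True if the context is the consolidated scope
    period: str         # e.g. "CurrentYear" or "PriorYear" (assumed labels)
    value: float


# Hypothetical mapping from XBRL tag to schema field.
TAG_TO_FIELD = {
    "NetSales": "revenue",
    "OperatingIncome": "operating_income",
    "ProfitAttributableToOwnersOfParent": "net_income",
}


def select_fact(facts: List[Fact], name: str) -> Optional[Fact]:
    """Prefer the consolidated current-year fact, else the non-consolidated one."""
    current = [f for f in facts if f.name == name and f.period == "CurrentYear"]
    for want_consolidated in (True, False):
        for fact in current:
            if fact.consolidated == want_consolidated:
                return fact
    return None


def map_to_schema(facts: List[Fact]) -> Dict[str, Optional[float]]:
    """Build one schema row; fields with no matching fact stay None (blank)."""
    row: Dict[str, Optional[float]] = {}
    for tag, field_name in TAG_TO_FIELD.items():
        fact = select_fact(facts, tag)
        row[field_name] = fact.value if fact is not None else None
    return row
```

A field that has neither a consolidated nor a non-consolidated current-year fact is left as `None` rather than defaulted to zero, which keeps it distinguishable from a reported zero.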

J-GAAP · IFRS · US-GAAP detection

Japan allows listed companies to report under three standards. We auto-detect the standard per filing from XBRL namespace prefixes; no manual selection is required.

J-GAAP
Detected via jpcrp_cor: or jppfs_cor: namespaces. Revenue = NetSales; OI = OperatingIncome; NI = ProfitAttributableToOwnersOfParent.
IFRS
Detected via jpigp_cor: elements. Revenue = RevenueIFRS; OI = OperatingProfitLossIFRS. Large filers (Toyota, SoftBank) often use extension namespaces, in which case fallback pattern matching applies.
US-GAAP
Rare (~30 filers). Same detection logic, separate tag set. Displayed with the same fields as J-GAAP where tags align.
  • IFRS revenue may exclude items J-GAAP includes as net sales. Cross-standard comparisons require care.
  • BPS for IFRS filers is drawn from the non-consolidated context, because consolidated BPS is not a mandatory element under the JP-IFRS taxonomy.
  • If a company switched standards mid-history, older years retain the prior standard's values with no restatement. The break appears as a discontinuity in time-series charts.
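The prefix-based detection described above might look like the sketch below. The function name is hypothetical, and checking `jpigp_cor` before the J-GAAP prefixes is an assumption (made so that an IFRS filing that also carries `jpcrp_cor` elements is not misclassified). US-GAAP is left out because this page does not name its tag set.

```python
from typing import Iterable


def detect_standard(prefixes: Iterable[str]) -> str:
    """Classify a filing from the namespace prefixes seen on its elements."""
    seen = {p.rstrip(":") for p in prefixes}
    # Assumption: IFRS is checked first, since IFRS filings may also
    # contain elements under the prefixes used for J-GAAP detection.
    if "jpigp_cor" in seen:
        return "IFRS"
    if seen & {"jpcrp_cor", "jppfs_cor"}:
        return "J-GAAP"
    # US-GAAP and anything unrecognized fall through here in this sketch.
    return "unknown"
```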

Zero vs. blank: what each means

A displayed 0 and a displayed — have different meanings throughout the platform.

0
Reported zero
The XBRL tag was present and its value was explicitly zero. Common examples: DPS = 0 for a suspended dividend; net debt = 0 for a net-cash company with no financial debt.
—
Not available
The metric could not be extracted: the XBRL tag was absent, the filing was PDF-only, or the parser encountered an ambiguous context. It does not imply zero.
Derived ratios (P/E, EV/EBITDA, ROE, etc.) are computed only when all inputs are non-null and every divisor is non-zero. A blank ratio means at least one required input was missing or a divisor was zero; it never means the ratio happened to equal zero.
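The null-handling rule for derived ratios can be illustrated with a small helper. This is a sketch under the assumption that missing values are represented as `None`; the function name `safe_ratio` is hypothetical.

```python
from typing import Optional


def safe_ratio(numerator: Optional[float],
               denominator: Optional[float]) -> Optional[float]:
    """Return numerator / denominator, or None (blank) if it cannot be computed."""
    if numerator is None or denominator is None:
        return None  # a required input is missing
    if denominator == 0:
        return None  # division by a reported zero is undefined
    return numerator / denominator
```

Note that a zero numerator still produces a real result of 0.0, so a displayed 0 for a ratio remains a computed value rather than a placeholder.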

Flash results vs. formal annual filings

Japanese companies file two distinct documents after each fiscal year end.

Kessan Tanshin (決算短信)
Flash · TDNet · within ~45 days of FY end
Preliminary unaudited figures. First public data for the fiscal year. Includes headline P&L and next-year guidance. Also appears in Capital Allocation Signals when a dividend change is announced alongside it.
Yukashoken Hokokusho (有価証券報告書)
Annual report · EDINET · ~3 months after FY end
Audited. Full iXBRL disclosure. The authoritative source for financials, shareholders, cross-holdings, and balance sheet detail. Powers the Financials tab and Equity Screener.
Precedence rule: when both documents exist for the same fiscal year, the EDINET annual filing takes precedence. Flash values are used only during the gap between the Kessan Tanshin date and the EDINET filing date (typically 6–10 weeks). Once the EDINET filing is parsed, it permanently supersedes the flash data.
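As a rough sketch of the precedence rule, the selection for a single fiscal year could be written as below. The record shape and the source labels `"edinet_annual"` and `"tdnet_flash"` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class FiscalYearRecord:
    source: str               # "edinet_annual" or "tdnet_flash" (assumed labels)
    fiscal_year: int
    revenue: Optional[float]


# Highest-precedence source first.
SOURCE_PRECEDENCE = ("edinet_annual", "tdnet_flash")


def pick_record(records: Sequence[FiscalYearRecord]) -> Optional[FiscalYearRecord]:
    """Pick the record to display for one fiscal year: annual filing if present, else flash."""
    for source in SOURCE_PRECEDENCE:
        for record in records:
            if record.source == source:
                return record
    return None
```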

Split-factor adjustment

When a company executes a stock split (or reverse split), per-share metrics in pre-split filings are on a different basis than post-split figures. Without adjustment, EPS and DPS time series would show a step-change that does not reflect any change in the underlying business.

EPS
Divided by cumulative factor for pre-split years
DPS
Same adjustment, which makes yield history comparable
BPS
Adjusted so trend lines are smooth across split events
Shares outstanding
Multiplied by split factor for pre-split years
Absolute P&L and balance sheet totals are unaffected: they are company-wide figures with no per-share basis. Split events are sourced from Yahoo Finance's ratio-adjusted price series and stored in ticker_split_factors.
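A minimal sketch of the adjustment follows. It assumes each split is stored as an (effective date, ratio) pair where a 2-for-1 split has ratio 2.0 and a 1-for-5 reverse split has ratio 0.2; the function names and this storage shape are illustrative and may not match the actual layout of ticker_split_factors.

```python
from datetime import date
from typing import Iterable, Tuple


def cumulative_factor(period_end: date,
                      splits: Iterable[Tuple[date, float]]) -> float:
    """Product of the ratios of all splits that took effect after period_end."""
    factor = 1.0
    for effective_date, ratio in splits:
        if effective_date > period_end:
            factor *= ratio
    return factor


def adjust_per_share(value: float, factor: float) -> float:
    """EPS, DPS, and BPS from pre-split years are divided by the factor."""
    return value / factor


def adjust_shares(shares: float, factor: float) -> float:
    """Shares outstanding from pre-split years are multiplied by the factor."""
    return shares * factor
```

For example, with a 2-for-1 split followed later by a 5-for-1 split, a fiscal year ending before both has a cumulative factor of 10, so a reported EPS of 500 becomes 50 on the current share basis. Years ending after the latest split have a factor of 1 and are left unchanged.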

Known limitations

📄
PDF-only filers
Some smaller companies, particularly pre-2020 regional exchange filers, submitted PDF-only annual reports before iXBRL was mandated. No structured financials can be extracted from PDF. These appear with sparse or missing history prior to FY2020.
🏷️
Extension namespace elements
Large or complex companies sometimes create their own XBRL taxonomy extensions. We maintain per-company fallback patterns for the most common cases (NTT, Toyota, SoftBank). Unusual extension elements that don't match any known pattern are dropped rather than guessed.
📅
History depth
EDINET's public API provides filings back to approximately FY2014 for most companies. Price history (Yahoo Finance) extends to CY2021 for most tickers.
⛩️
Nagoya / regional exchange tickers
~19 tickers listed exclusively on the Nagoya Stock Exchange are unsupported by Yahoo Finance. Ratios that require a current market price (P/E, P/B, EV/EBITDA) will appear blank for these companies.
🏢
P/NCAV suppressed when NCAV per share exceeds BPS
NCAV (Current Assets − Total Liabilities) per share should always be ≤ Book Value per share, since current assets are a subset of total assets. When NCAV/share exceeds BPS in our data (typically because consolidated current assets outpace the parent-only equity the company uses for BPS, a non-controlling interest effect), P/NCAV is suppressed rather than shown as a misleadingly low or incorrect value. Affected companies show P/B but a blank P/NCAV in screener results.
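The suppression check can be sketched as follows. The function name and argument names are hypothetical; the only rules taken from this page are the NCAV definition, the blank-on-missing-input behavior, and suppression when NCAV per share exceeds BPS.

```python
from typing import Optional


def price_to_ncav(price: Optional[float],
                  current_assets: Optional[float],
                  total_liabilities: Optional[float],
                  shares: Optional[float],
                  bps: Optional[float]) -> Optional[float]:
    """Return P/NCAV, or None when inputs are missing or the value is suppressed."""
    inputs = (price, current_assets, total_liabilities, shares, bps)
    if any(v is None for v in inputs):
        return None
    if shares == 0:
        return None
    ncav_per_share = (current_assets - total_liabilities) / shares
    if ncav_per_share == 0:
        return None  # zero divisor: ratio is left blank
    if ncav_per_share > bps:
        return None  # inconsistent with BPS: suppress rather than mislead
    return price / ncav_per_share
```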
See also Data Sources for the full source-to-feature mapping. For filing-specific discrepancies, the authoritative reference is the original document on EDINET.