A reproducible, open measurement of citability on the LLM answer surface. Four modules, one composite, one empirical-state disclosure. This page is the full specification.
Published by ORION · Citability Intelligence · Version v5.0.1 · Last updated April 17, 2026
The ORION Score is a composite metric on a 0-100 scale that estimates the probability that a large language model (LLM) answer engine (ChatGPT, Claude, Perplexity, Google’s AI Overviews, and similar systems) will cite a given web page when generating an answer to a realistic user query.
ORION v5.0.1 is the fifth major release of the framework. It departs from earlier versions in three structural respects: it scores at the chunk level rather than the page level, it distinguishes between a predictive score (derived from the content of the page) and an observed score (derived from live LLM behavior), and it publishes an explicit empirical-state disclosure on every live score.
ORION Score (noun): A composite metric (0-100) measuring the probability that an LLM answer engine will cite a given web page. Computed from four predictive modules (C1, C2, Q, T) and, when available, blended with Observed Visibility measured from live LLM queries.
ORION is an acronym (Optimized Readiness for Intelligence-Observed Notability), but the name is historical. In practice the score is better understood through its current decomposition: four modules of citability, one measurement of observation, and one derived quantity, the Citability Gap.
Informational queries increasingly resolve inside an LLM-generated answer rather than on a search-engine results page. Within a typical answer, the model selects three to seven supporting citations. The unit the model selects is not a page in the SEO sense; it is a passage, a two-to-four-sentence chunk that can be lifted without further context and used as direct support.
Traditional SEO metrics (domain authority, keyword rank, backlink count) measure behavior in a link-ranking system. They do not measure the probability that a specific chunk will be selected by an LLM. ORION v5 exists to measure that probability directly and to produce the measurement in a reproducible, auditable form.
The shift from “where does this page rank” to “how citable is this chunk” is not a rebrand of SEO. It changes the unit of measurement, the object of optimization, and the feedback loop. Citability measurement treats each chunk as an independent citation candidate, evaluates it against a realistic query fan-out, and compares the predicted outcome to actual LLM behavior.
ORION v5 decomposes a page’s predictive score into four orthogonal modules. Each module targets a distinct failure mode observed in empirical citation behavior and is scored 0-100 independently before weighting.
C1 measures whether individual passages on the page are self-contained enough to survive extraction. An LLM does not quote whole pages; it lifts a chunk (typically 180-280 words with 40-word overlap) and cites it as the answer’s support. A chunk that relies on prior paragraph context, mid-paragraph pronoun chains, or unresolved references cannot be safely extracted and is passed over.
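The extraction constraint above can be sketched as a sliding word window. This is an illustrative sketch only: the function name, signature, and the 230-word window (a midpoint of the 180-280 range) are assumptions, not the ORION chunker.

```python
def chunk_text(text, window=230, overlap=40):
    """Split text into overlapping word windows.

    The window falls inside the 180-280 word range described for C1,
    with the stated 40-word overlap between consecutive chunks.
    """
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = min(start + window, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap  # 40-word overlap between windows
    return chunks
```

Each chunk produced this way is then evaluated as an independent citation candidate.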
C2 is a two-pass evaluator. The first pass extracts every numeric, comparative, or statistical claim in the chunk; the second pass checks whether the adjacent citation actually supports that claim on the topic at issue. Proximity is not support. A nearby hyperlink to an authoritative domain does not pass C2 if the linked content is off-topic relative to the claim.
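A minimal sketch of the two-pass shape, assuming a regex claim extractor and a pluggable `on_topic` similarity judge. Both are stand-ins for illustration; the scoring rule (percentage of claims supported) and the treatment of claim-free chunks are assumptions, not the published C2 judge.

```python
import re

# Sentences containing a numeric token; a stand-in for pass-one claim extraction.
NUMERIC_CLAIM = re.compile(r"[^.]*\b\d[\d,.%]*\b[^.]*\.")

def extract_claims(chunk):
    """Pass 1: pull out sentences containing numeric or statistical claims."""
    return [m.group(0).strip() for m in NUMERIC_CLAIM.finditer(chunk)]

def claim_supported(claim, citation_text, on_topic):
    """Pass 2: a citation counts only if its content is on-topic for the
    claim. Proximity alone never passes -- an empty or off-topic cite fails."""
    return bool(citation_text) and on_topic(claim, citation_text)

def c2_score(chunk, citations, on_topic):
    """Percentage of extracted claims with on-topic support.

    `citations` maps each claim to the text of its adjacent cite;
    returning 100 for claim-free chunks is an assumption of this sketch.
    """
    claims = extract_claims(chunk)
    if not claims:
        return 100.0
    supported = sum(claim_supported(c, citations.get(c, ""), on_topic) for c in claims)
    return 100.0 * supported / len(claims)
```

In production the `on_topic` judgment would be a semantic comparison against the linked content, which is what separates a nearby authoritative link from actual support.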
C2 is new in v5. It is the module most responsible for the gap between pages that look authoritative and pages that actually are authoritative under retrieve-then-verify evaluation.
Q measures fan-out robustness: across the realistic distribution of user queries that could surface this chunk, how many does it answer well, and how stable is that coverage when the wording shifts? A chunk that only answers its exact target query with a narrow phrasing scores lower than a chunk that gracefully covers several phrasings of the same intent.
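One way to aggregate fan-out coverage, offered as a hedged illustration: judge the chunk's answer quality across paraphrases of the intent, then penalize instability. The mean-minus-dispersion form is an assumption of this sketch, not the published Q formula.

```python
from statistics import mean, pstdev

def q_score(answer_quality):
    """Fan-out robustness sketch.

    `answer_quality` maps each paraphrase of the target intent to a
    0-100 answer-quality judgment (the judge itself is not shown).
    High mean coverage with low variance across phrasings scores best;
    a chunk that only handles one narrow phrasing is penalized.
    """
    scores = list(answer_quality.values())
    return max(0.0, mean(scores) - pstdev(scores))
```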
T measures the signals that LLMs use to dampen or amplify citation of a source. Empirically, pages with verifiable human authorship, visible review dates, and first-party data are cited at materially higher rates than pages without. T captures those signals as a single subscore.
Signals that feed T include: a dateLastReviewed field and a visible update timestamp; first-party data or proprietary research; Article, Person, and Organization JSON-LD; and a publisher whose identity is traceable off-site.
Observed Visibility is measured separately from the four predictive modules. It is a direct count of how often live LLM answers cite the page under evaluation across a representative query set.
The measurement protocol for Observed Visibility: a canonical query set is sampled against the page’s target topic; each query is submitted to the supported LLM endpoints at controlled temperatures; the returned citations are matched to the page by canonical URL and by chunk-text similarity (cosine ≥ 0.78 in the standard embedding space). The citation rate on that query set is the page’s Observed Visibility score (0-100).
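The matching step of that protocol can be sketched as below, assuming a pluggable `embed` function standing in for the standard embedding model; the data shapes and names are illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def observed_visibility(answers, page_url, chunk_vectors, embed, threshold=0.78):
    """Fraction (0-100) of sampled queries whose live answer cites the page.

    `answers` maps each query to the (cited_url, cited_text) pairs the
    LLM returned. A citation matches by canonical URL or by chunk-text
    cosine >= 0.78, per the protocol above.
    """
    hits = 0
    for citations in answers.values():
        for url, text in citations:
            if url == page_url or any(
                cosine(embed(text), v) >= threshold for v in chunk_vectors
            ):
                hits += 1
                break  # count each query at most once
    return 100.0 * hits / len(answers) if answers else 0.0
```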
Observed Visibility is not folded into the four modules. It is maintained as an independent measurement so that the Citability Gap, the difference between predicted and observed, can be computed cleanly and tracked over time.
The predictive score is the weighted sum of the four modules. Weights reflect calibrated coefficients against the current compliance cohort and are subject to adjustment as the dataset grows. Weight changes are logged in the public changelog and versioned with the framework release.
The 0.65/0.35 weighting reflects the principle that live behavior is the strongest available signal but is noisy on any single query set; the predictive score provides the stabilizing prior. As the Observed query set grows and variance stabilizes, this weighting will be re-calibrated and any change will be published.
Current weights: v5.0.1 (2026-04-17). Prior versions are archived in the public changelog.
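The composite can be sketched under two stated assumptions: equal module weights as placeholders for the calibrated coefficients (which live in the changelog, not here), and the 0.65 weight assigned to Observed Visibility on the reading that live behavior is the strongest signal. Swap either if the published values differ.

```python
# Placeholder weights; the calibrated per-module coefficients are
# versioned with each framework release, not hard-coded here.
MODULE_WEIGHTS = {"C1": 0.25, "C2": 0.25, "Q": 0.25, "T": 0.25}

def predictive_score(modules):
    """Weighted sum of the four 0-100 module subscores."""
    return sum(MODULE_WEIGHTS[m] * s for m, s in modules.items())

def orion_score(modules, observed=None):
    """Blend the predictive score with Observed Visibility when available.

    Assumption: the 0.65 weight falls on the observed measurement and
    0.35 on the predictive prior. With no observed measurement, the
    score falls back to predictive-only.
    """
    pred = predictive_score(modules)
    if observed is None:
        return pred
    return 0.35 * pred + 0.65 * observed
```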
The Citability Gap is a derived quantity, signed and scored on the same 0-100 scale. A positive gap indicates that a page has structural citability (good chunk structure, clean citation hygiene, broad query fit, verifiable trust signals) that is not yet being realized in live LLM behavior. A negative gap indicates a page is being cited above what the four modules predict, typically a signal of brand authority or strong external linking carrying the page past its chunk-level fundamentals.
The gap is the primary operational metric in Citability Intelligence because it is directly actionable: it identifies where the structural work is done but the observed rate has not yet responded, and where observed rate is fragile relative to what the content earns.
The page’s four-module decomposition predicts a citability of 68. Live LLM queries return an Observed Visibility of 41. Twenty-seven points of citation demand are being earned but not captured. The gap typically closes through module-level work: in this example, C2 (citation hygiene) scores 38 with three unsupported numeric claims; fixing those alone closes roughly half the gap.
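The worked example reduces to a one-line derived quantity (function name is illustrative):

```python
def citability_gap(predictive, observed):
    """Signed gap on the same 0-100 scale.

    Positive: structural citability earned but not yet realized in
    live behavior. Negative: cited above what the modules predict.
    """
    return predictive - observed
```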
Every ORION v5 score carries an explicit empirical-state disclosure. Until four validation thresholds are cleared, the framework returns scores flagged as a Research Release, a label that is rendered on the dashboard, in the API response (isResearchRelease: true), and on every public surface where a live score appears.
The Honesty Gate is enforced in code, not in marketing. The flag only flips to Validated when all four thresholds clear simultaneously on the compliance cohort.
At least 300 positive labels across the stratified compliance cohort (40% positives, 40% marginals, 20% negatives).
Brier-score stability across three-replica judge runs at temperature = 0. Replicated measurements must agree to within this bound to release a score as Validated.
A σ-adaptive CUSUM drift detector must hold steady across three consecutive evaluation windows without tripping hysteresis. This guards against silent distribution shift in LLM behavior.
Spearman’s ρ between the Predictive score and Observed Visibility on held-out pages must meet the threshold below. Lower correlation means the predictive model has not yet earned the right to replace direct measurement.
When all four gates clear simultaneously, the flag flips to Validated and the calibration evidence is published alongside the release note.
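The gate logic is a single conjunction. In this sketch, the 300-label and three-window thresholds come from the text above; the Brier bound and ρ threshold are placeholders for the published values, and all names are illustrative rather than the enforced implementation.

```python
from dataclasses import dataclass

@dataclass
class GateEvidence:
    positive_labels: int        # stratified compliance cohort
    brier_spread: float         # disagreement across three judge replicas at T=0
    cusum_windows_clean: int    # consecutive windows without a drift trip
    spearman_rho: float         # predictive vs. observed on held-out pages

def is_validated(e, brier_bound=0.05, rho_min=0.6):
    """All four gates must clear simultaneously to flip the flag.

    `brier_bound` and `rho_min` are placeholder thresholds, not the
    published ones; 300 labels and 3 windows are stated in the text.
    """
    return (
        e.positive_labels >= 300
        and e.brier_spread <= brier_bound
        and e.cusum_windows_clean >= 3
        and e.spearman_rho >= rho_min
    )
```

Until `is_validated` returns true on the compliance cohort, every score surface renders the Research Release flag.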
Every ORION Score is reported both as a 0-100 number and as a letter grade (A / B / C / D / F). The grade is a monotonic mapping of the numeric score into a band intended for glanceable interpretation; it does not replace the number.
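A monotonic mapping of this shape is sketched below. The band boundaries here are illustrative placeholders; the published boundaries are documented in Section 8.

```python
def letter_grade(score, bands=((90, "A"), (80, "B"), (70, "C"), (60, "D"))):
    """Map a 0-100 score onto a letter band, highest cutoff first.

    The cutoffs in `bands` are assumptions for illustration, not the
    documented Section 8 boundaries.
    """
    for cutoff, grade in bands:
        if score >= cutoff:
            return grade
    return "F"
```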
The two metric families measure different outcomes and should not be treated as substitutes for each other.
In practice most production sites need both: SEO to remain discoverable on navigational queries, and ORION to remain citable on the growing share of informational queries that resolve inside an LLM answer before a user reaches a link list.
The Citability Index (formerly the GEO Index) is the public benchmark dataset maintained alongside the ORION framework. It aggregates ORION Scores across every scan performed through WhatsMyGeoScore and breaks them down by industry, page type, and score distribution.
The Index is licensed under Creative Commons Attribution 4.0 and is available as a free, unauthenticated HTTP API at /api/geo-index. The legacy URL is retained for link equity and API compatibility.
The Index’s value is cumulative: every scan adds a data point and sharpens the distribution. As the corpus grows, the Honesty Gate thresholds (Section 7) are re-evaluated and the framework’s release status advances accordingly.
The Citability Gap closes by module-level work. Below is the standard order of operations; exact priorities for any given page are generated in the dashboard’s Recommendation Panel, ranked by expected lift divided by effort.
Add dateLastReviewed in schema. Add first-party data or primary-source citations where the page is derivative.
Most AI visibility scores count how often an LLM mentions a brand name across a sampled prompt set. The ORION Score measures a different quantity: the probability that an LLM will cite a specific passage as support for an answer. The distinction matters because a mention without a citation does not drive traffic and does not survive the LLM’s own trust checks.
LLMs cite passages, not pages. An answer engine typically extracts a 180-280 word chunk from a source and uses it as the basis for its citation. A page-level score obscures whether any individual chunk is citable in isolation. Scoring at the chunk level produces both a per-chunk diagnostic and a page-level composite.
Citability Gap = Predictive score − Observed Visibility. It is the difference between what the four modules predict and what live LLM queries return. A positive gap represents citation demand earned by the page’s structure but not yet captured in live behavior; a negative gap typically indicates the page is being carried by brand or link signals beyond what its chunk-level structure accounts for.
Research Release is the framework’s current empirical state. The score is operational and reproducible; its predictive calibration has not yet cleared all four Honesty Gate thresholds. Every score surface renders the Research Release disclosure so a reader knows the score’s present validation status.
Yes. Up to 30 scans per day with no credit card required. The methodology is open and the benchmark dataset is licensed under CC BY 4.0.
The letter grade (A / B / C / D / F) is a monotonic mapping of the 0-100 score into named bands for at-a-glance interpretation. It does not replace the numeric score; both are published. The boundaries are documented in Section 8.
The ORION framework is published as an open standard. Methodology is documented here; the benchmark dataset is available under Creative Commons Attribution 4.0; the scoring endpoint is accessible without authentication at standard rate limits.
Framework specification: ORION-Framework-v5.md. Judge implementations: lib/orion/judges/. Compliance corpus and release changelog are published alongside each versioned tag.
Paste a URL. ORION returns the four module subscores, the gap between Predictive and Observed, and the highest-lift changes to close it.
Run a free Citability scan · Free · No signup · Formerly known as GEO Score