ORION v5.0.1 · Research Release Methodology · 2026-04-17

The ORION Score, decomposed.

A reproducible, open measurement of citability on the LLM answer surface. Four modules, one composite, one empirical-state disclosure. This page is the full specification.

Published by ORION · Citability Intelligence · Version v5.0.1 · Last updated April 17, 2026

1. What the ORION Score Measures

The ORION Score is a composite metric on a 0-100 scale that estimates the probability that a large language model (LLM) answer engine such as ChatGPT, Claude, Perplexity, or Google’s AI Overviews will cite a given web page when generating an answer to a realistic user query.

ORION v5.0.1 is the fifth major release of the framework. It departs from earlier versions in three structural respects: it scores at the chunk level rather than the page level, it distinguishes between a predictive score (derived from the content of the page) and an observed score (derived from live LLM behavior), and it publishes an explicit empirical-state disclosure on every live score.

Definition

ORION Score (noun): A composite metric (0-100) measuring the probability that an LLM answer engine will cite a given web page. Computed from four predictive modules (C1, C2, Q, T) and, when available, blended with Observed Visibility measured from live LLM queries.

ORION is an acronym (Optimized Readiness for Intelligence-Observed Notability), but the name is historical. In practice the score is better understood through its current decomposition: four modules of citability, one measurement of observation, and one derived quantity, the Citability Gap.

2. Why Citability Measurement Matters Now

Informational queries increasingly resolve inside an LLM-generated answer rather than on a search-engine results page. Within a typical answer, the model selects three to seven supporting citations. The unit the model selects is not a page in the SEO sense; it is a passage, a two-to-four-sentence chunk that can be lifted without further context and used as direct support.

Traditional SEO metrics (domain authority, keyword rank, backlink count) measure behavior in a link-ranking system. They do not measure the probability that a specific chunk will be selected by an LLM. ORION v5 exists to measure that probability directly and to produce the measurement in a reproducible, auditable form.

The shift from “where does this page rank” to “how citable is this chunk” is not a rebrand of SEO. It changes the unit of measurement, the object of optimization, and the feedback loop. Citability measurement treats each chunk as an independent citation candidate, evaluates it against a realistic query fan-out, and compares the predicted outcome to actual LLM behavior.

3. The Four-Module Framework

ORION v5 decomposes a page’s predictive score into four orthogonal modules. Each module targets a distinct failure mode observed in empirical citation behavior and is scored 0-100 independently before weighting.

Module C1

Chunk Citability

Weight: 30% · Can a passage stand alone inside an LLM answer?

C1 measures whether individual passages on the page are self-contained enough to survive extraction. An LLM does not quote whole pages; it lifts a chunk (typically 180-280 words with a 40-word overlap) and cites it as the answer’s support. A chunk that relies on prior paragraph context, mid-paragraph pronoun chains, or unresolved references cannot be safely extracted and is passed over.

Increases C1: self-contained paragraphs; a one-sentence summary leading each logical section; explicit subjects instead of pronouns at chunk boundaries; complete sentences; bounded definitions; callouts and blockquotes with standalone claims.
Decreases C1: cross-paragraph pronoun chains; implicit antecedents; claims that require prior context to interpret; nested conditional language (“if the above applies then …”); single-sentence paragraphs fragmented across the page.
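The chunk geometry described above (roughly 180-280 words with a 40-word overlap) can be sketched as a simple word-window splitter. This is an illustrative sketch only, not the published ORION chunker: the function name, whitespace tokenization, and window stepping are assumptions.

```python
def chunk_words(text, max_words=280, overlap=40):
    """Split text into word windows of at most `max_words` words,
    repeating `overlap` words between consecutive chunks.

    Hypothetical sketch of the chunking described for C1; the
    official ORION chunker is not published in this document.
    """
    words = text.split()
    if len(words) <= max_words:
        return [" ".join(words)]
    chunks = []
    step = max_words - overlap  # advance leaves `overlap` words shared
    for start in range(0, len(words), step):
        window = words[start:start + max_words]
        chunks.append(" ".join(window))
        if start + max_words >= len(words):
            break  # final window reached the end of the text
    return chunks
```

A chunk produced this way is what C1 evaluates in isolation: if the window cannot be read without the preceding text, it fails regardless of how good the full page is.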

Module C2

Citation Hygiene

Weight: 25% · Are numeric claims backed by on-topic primary sources?

C2 is a two-pass evaluator. The first pass extracts every numeric, comparative, or statistical claim in the chunk; the second pass checks whether the adjacent citation actually supports that claim on the topic at issue. Proximity is not support. A nearby hyperlink to an authoritative domain does not pass C2 if the linked content is off-topic relative to the claim.

Increases C2: each numeric claim linked to a primary source that independently supports the same claim on the same topic; dates and sample sizes cited; clear attribution of quotes; primary data preferred over secondary summaries.
Decreases C2: unsupported numbers (−3 per claim, no ceiling); citations that are on the source domain but off-topic relative to the claim; citation of a secondary summary when the primary source is available; stale references that no longer match the current version of the cited work.

C2 is new in v5. It is the module most responsible for the gap between pages that look authoritative and pages that actually are authoritative under retrieve-then-verify evaluation.
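The per-claim penalty is simple enough to state as arithmetic. A minimal sketch, assuming a 100-point base and a floor at 0: the spec states the −3-per-claim penalty and the absence of a ceiling on the total deduction; the base value and the floor are illustrative assumptions, not published parameters.

```python
def c2_penalized(base=100, unsupported_claims=0, per_claim=3):
    """Apply the C2 per-claim penalty: each unsupported numeric
    claim deducts 3 points, with no ceiling on the total penalty.

    The 100-point base and the floor at 0 (the score scale is
    0-100) are assumptions for illustration.
    """
    return max(0, base - per_claim * unsupported_claims)
```

Because there is no ceiling, a long page with many unsupported numbers can be driven all the way to the floor even if each individual claim seems minor.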

Module Q

Query-Fit Breadth

Weight: 25% · How many realistic queries does this chunk answer?

Q measures fan-out robustness: across the realistic distribution of user queries that could surface this chunk, how many does it answer well, and how stable is that coverage when the wording shifts? A chunk that only answers its exact target query with a narrow phrasing scores lower than a chunk that gracefully covers several phrasings of the same intent.

Increases Q: FAQ sections with varied question phrasings; definition blocks using alternate terms and synonyms; explicit how-to steps; comparative framings (“versus”, “differences between”); coverage of both beginner and advanced variants of the query.
Decreases Q: jargon-only phrasing without plain-language equivalents; a single narrow phrasing of the target concept; coverage gaps on obvious adjacent questions; collapsible/interactive content that hides query-relevant text from static extraction.

Module T

Trust & Authorship

Weight: 20% · Named author, review date, first-party data, verifiable source.

T measures the signals that LLMs use to dampen or amplify citation of a source. Empirically, pages with verifiable human authorship, visible review dates, and first-party data are cited at materially higher rates than pages without. T captures those signals as a single subscore.

Increases T: a named author with a verifiable identity (LinkedIn, institutional page); a dateLastReviewed field and visible update timestamp; first-party data or proprietary research; Article, Person, and Organization JSON-LD; a publisher whose identity is traceable off-site.
Decreases T: anonymous or generic bylines (“Admin”, “Editorial Team” without traceable members); missing review date; purely derivative content with no first-party contribution; absence of publisher identity signals.

4. Observed Visibility

Observed Visibility is measured separately from the four predictive modules. It is a direct count of how often live LLM answers cite the page under evaluation across a representative query set.

The measurement protocol for Observed Visibility is as follows: a canonical query set is sampled against the page’s target topic; each query is submitted to the supported LLM endpoints at controlled temperatures; the returned citations are matched to the page by canonical URL and by chunk-text similarity (cosine ≥ 0.78 in the standard embedding space). The citation rate on that query set is the page’s Observed Visibility score (0-100).
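The matching step can be sketched as follows. This is a simplified illustration: the embedding model, URL canonicalization, and exact aggregation are not specified in this document, and every function name here is hypothetical; only the 0.78 cosine threshold and the per-query citation-rate definition come from the protocol above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def matches_page(cited_url, cited_embedding, page_url,
                 chunk_embeddings, threshold=0.78):
    """A returned citation counts toward the page if it resolves to
    the canonical URL, or if its text embedding reaches the 0.78
    cosine threshold against any chunk of the page."""
    if cited_url == page_url:
        return True
    return any(cosine(cited_embedding, e) >= threshold
               for e in chunk_embeddings)

def observed_visibility(results, page_url, chunk_embeddings):
    """Citation rate (0-100) over the query set: the share of
    queries whose returned citations include a match for the page.
    `results` is one list of (url, embedding) citations per query."""
    hits = sum(
        1 for citations in results
        if any(matches_page(u, emb, page_url, chunk_embeddings)
               for u, emb in citations)
    )
    return 100.0 * hits / len(results)
```

The key design point is that a citation can match either way: exact-URL matching catches clean links, while the embedding check catches quoted or lightly paraphrased chunks whose link was rewritten or shortened.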

Observed Visibility is not folded into the four modules. It is maintained as an independent measurement so that the Citability Gap, the difference between predicted and observed, can be computed cleanly and tracked over time.

5. The Scoring Formula

Predictive Score, canonical
Predictive = (C1 × 0.30) + (C2 × 0.25) + (Q × 0.25) + (T × 0.20)

The predictive score is the weighted sum of the four modules. Weights reflect calibrated coefficients against the current compliance cohort and are subject to adjustment as the dataset grows. Weight changes are logged in the public changelog and versioned with the framework release.

C1, Chunk Citability
30%
C2, Citation Hygiene
25%
Q, Query-Fit Breadth
25%
T, Trust & Authorship
20%
Composite ORION Score, when Observed Visibility is available
ORION = (Predictive × 0.65) + (Observed × 0.35)
When Observed Visibility is not available (no live LLM data on the query set), ORION = Predictive and the Citability Gap is undefined for that scan.

The 0.65 / 0.35 weighting reflects the principle that live behavior is the strongest available signal but is noisy on any single query set; the predictive score provides the stabilizing prior. As the Observed query set grows and variance stabilizes, this weighting will be re-calibrated and any change will be published.
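Both formulas, and the fallback when no live data exists, fit in a few lines. A sketch under the v5.0.1 weights (the function names are illustrative, not a published API):

```python
# v5.0.1 module weights from the canonical formula above.
WEIGHTS = {"C1": 0.30, "C2": 0.25, "Q": 0.25, "T": 0.20}

def predictive(modules):
    """Weighted sum of the four module subscores (each 0-100)."""
    return sum(modules[m] * w for m, w in WEIGHTS.items())

def orion(modules, observed=None):
    """Composite ORION Score and Citability Gap.

    With live data: 0.65 * Predictive + 0.35 * Observed.
    Without it: the score falls back to Predictive and the
    Citability Gap is undefined (returned as None).
    """
    p = predictive(modules)
    if observed is None:
        return p, None  # gap undefined for this scan
    return p * 0.65 + observed * 0.35, p - observed
```

Note that the gap is computed from the unblended predictive score, not from the composite; the composite is the headline number, while the gap is the diagnostic.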

Current weights: v5.0.1 (2026-04-17). Prior versions are archived in the public changelog.

6. The Citability Gap

Citability Gap
Citability Gap = Predictive − Observed

The Citability Gap is a derived quantity, signed and scored on the same 0-100 scale. A positive gap indicates that a page has structural citability (good chunk structure, clean citation hygiene, broad query fit, verifiable trust signals) that is not yet being realized in live LLM behavior. A negative gap indicates a page is being cited above what the four modules predict, typically a signal of brand authority or strong external linking carrying the page past its chunk-level fundamentals.

The gap is the primary operational metric in Citability Intelligence because it is directly actionable: it identifies where the structural work is done but the observed rate has not yet responded, and where observed rate is fragile relative to what the content earns.

Worked example

retirement-savings-strategies-for-young-professionals

Predictive 68 · Observed 41 · Citability Gap +27

The page’s four-module decomposition predicts citability of 68. Live LLM queries return a 41 citation rate. Twenty-seven points of demand are being earned but not captured. The gap is typically closed by module-level work: in this example, C2 (citation hygiene) scores 38 with three unsupported numeric claims; fixing those alone closes roughly half the gap.
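Using the numbers from the worked example, the gap computation and its sign semantics can be sketched as follows. The interpretation strings are paraphrases of the definitions in this section, not values returned by any API.

```python
def citability_gap(predictive_score, observed_score):
    """Signed gap on the 0-100 scale: Predictive minus Observed."""
    return predictive_score - observed_score

def interpret(gap):
    """Sign semantics from Section 6 (wording is paraphrased):
    positive  -> structural citability not yet realized live;
    negative  -> cited above what the four modules predict."""
    if gap > 0:
        return "structural citability not yet realized"
    if gap < 0:
        return "cited above module-level prediction"
    return "prediction matches observation"
```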

7. The Honesty Gate

Every ORION v5 score carries an explicit empirical-state disclosure. Until four validation thresholds are cleared, the framework returns scores flagged as a Research Release, a label that is rendered on the dashboard, in the API response (isResearchRelease: true), and on every public surface where a live score appears.

The Honesty Gate is enforced in code, not in marketing. The flag only flips to Validated when all four thresholds clear simultaneously on the compliance cohort.

Gate 1 · Label coverage
At least 300 positive labels across the stratified compliance cohort (40% positives, 40% marginals, 20% negatives).
Threshold: ≥ 300 labels

Gate 2 · Judge stability
Brier-score stability across three-replica judge runs at temperature = 0. Replicated measurements must agree to within this bound to release a score as Validated.
Threshold: Brier ≤ 0.04

Gate 3 · Drift stability
A σ-adaptive CUSUM drift detector must hold steady across three consecutive evaluation windows without tripping hysteresis. This guards against silent distribution shift in LLM behavior.
Threshold: 3 windows stable

Gate 4 · Predictive–Observed correlation
Spearman’s ρ between the Predictive score and Observed Visibility on held-out pages must meet the threshold. Lower correlation means the predictive model has not yet earned the right to replace direct measurement.
Threshold: ρ ≥ 0.55

Current state: Research Release (pre-gate)

When all four gates clear simultaneously, the flag flips to Validated and the calibration evidence is published alongside the release note.
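The gate logic described above reduces to a conjunction of four threshold checks. A sketch with the v5.0.1 thresholds hard-coded (function names are illustrative; the enforced production implementation is not published in this document):

```python
def honesty_gate(labels, brier, stable_windows, spearman_rho):
    """Return True (Validated) only when all four thresholds
    clear simultaneously; any single failure keeps the framework
    in Research Release. Threshold values are from the v5.0.1 spec."""
    return (labels >= 300            # Gate 1: label coverage
            and brier <= 0.04        # Gate 2: judge stability
            and stable_windows >= 3  # Gate 3: drift stability
            and spearman_rho >= 0.55)  # Gate 4: correlation

def release_flag(**metrics):
    """Map the gate outcome to the disclosure label."""
    return "Validated" if honesty_gate(**metrics) else "Research Release"
```

Because the check is a strict conjunction, improving three gates while one regresses keeps the flag at Research Release; there is no partial credit.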

8. Score Scale & Letter Grade

Every ORION Score is reported both as a 0-100 number and as a letter grade (A / B / C / D / F). The grade is a monotonic mapping of the numeric score into a band intended for glanceable interpretation; it does not replace the number.

A · 85-100 · Exemplary
Cited at or near the upper bound of the LLM citation distribution for the page’s topic.

B · 70-84 · Strong
Reliably cited; typically one or two modules below exemplary and clearly identifiable.

C · 55-69 · Competitive
Cited on some queries but fragile; the most common band for informational pages at scan time.

D · 40-54 · At risk
Rarely cited; at least two modules scoring below threshold.

F · 0-39 · Invisible
Not cited. Structural rewrites typically required across multiple modules.

9. ORION vs Traditional SEO Metrics

The two metric families measure different outcomes and should not be treated as substitutes for each other.

Dimension · Traditional SEO · ORION v5
Outcome measured · Position in a ranked link list · Probability of citation inside an LLM answer
Unit of evaluation · The page · The chunk (a citable passage)
Primary signals · Backlinks, anchor text, keyword relevance, page speed · Chunk self-containment, citation-support integrity, query-fit breadth, authorship signals
Feedback surface · SERP impressions and clicks · Observed LLM citation rate
Complementarity · Still useful for navigational and transactional queries · Increasingly primary for informational queries resolved inside an AI answer

In practice most production sites need both: SEO to remain discoverable on navigational queries, and ORION to remain citable on the growing share of informational queries that resolve inside an LLM answer before a user reaches a link list.

10. The Citability Index

The Citability Index (formerly the GEO Index) is the public benchmark dataset maintained alongside the ORION framework. It aggregates ORION Scores across every scan performed through WhatsMyGeoScore and breaks them down by industry, page type, and score distribution.

The Index is licensed under Creative Commons Attribution 4.0 and is available as a free, unauthenticated HTTP API at /api/geo-index. The legacy URL is retained for link equity and API compatibility.

The Index’s value is cumulative: every scan adds a data point and sharpens the distribution. As the corpus grows, the Honesty Gate thresholds (Section 7) are re-evaluated and the framework’s release status advances accordingly.

11. How to Close Your Citability Gap

The Citability Gap closes by module-level work. Below is the standard order of operations; exact priorities for any given page are generated in the dashboard’s Recommendation Panel, ranked by expected lift divided by effort.

  1. Scan and read your gap. Run a free ORION scan and note the three numbers: Predictive, Observed, and Citability Gap. The gap is the quantity to close.
  2. C1, Chunk Citability. Rewrite paragraphs so each can stand alone. Add a one-sentence summary at the top of every logical section. Replace mid-paragraph pronouns with explicit subjects at chunk boundaries.
  3. C2, Citation Hygiene. For every numeric claim, verify the adjacent citation is on-topic. Replace or remove off-topic citations. Add primary-source links where missing. Each unsupported claim costs the page 3 points with no ceiling.
  4. Q, Query-Fit Breadth. Add FAQ entries with plain-language phrasings, definition blocks with synonyms, and how-to steps. Target the realistic fan-out of queries your audience uses, not just the preferred wording.
  5. T, Trust & Authorship. Add a named human author with a verifiable identity. Add a visible review date and dateLastReviewed in schema. Add first-party data or primary-source citations where the page is derivative.
  6. Re-scan and track the gap over time. The gap is a moving quantity. Re-scan after each change and track the delta between Predictive and Observed. Closing the gap is the work.

12. Frequently Asked Questions

What is the difference between the ORION Score and an “AI visibility score” from other tools?

Most AI visibility scores count how often an LLM mentions a brand name across a sampled prompt set. The ORION Score measures a different quantity: the probability that an LLM will cite a specific passage as support for an answer. The distinction matters because a mention without a citation does not drive traffic and does not survive the LLM’s own trust checks.

Why does ORION v5 score at the chunk level instead of the page level?

LLMs cite passages, not pages. An answer engine typically extracts a 180-280 word chunk from a source and uses it as the basis for its citation. A page-level score obscures whether any individual chunk is citable in isolation. Scoring at the chunk level produces both a per-chunk diagnostic and a page-level composite.

What is the Citability Gap, specifically?

Citability Gap = Predictive score − Observed Visibility. It is the difference between what the four modules predict and what live LLM queries return. A positive gap represents citation demand earned by the page’s structure but not yet captured in live behavior; a negative gap typically indicates the page is being carried by brand or link signals beyond what its chunk-level structure accounts for.

What does “Research Release” mean?

Research Release is the framework’s current empirical state. The score is operational and reproducible; its predictive calibration has not yet cleared all four Honesty Gate thresholds. Every score surface renders the Research Release disclosure so a reader knows the score’s present validation status.

Is the ORION Score free?

Yes. Up to 30 scans per day with no credit card required. The methodology is open and the benchmark dataset is licensed under CC BY 4.0.

How do I interpret the letter grade alongside the numeric score?

The letter grade (A / B / C / D / F) is a monotonic mapping of the 0-100 score into named bands for at-a-glance interpretation. It does not replace the numeric score; both are published. The boundaries are documented in Section 8.

13. Open Standard & API

The ORION framework is published as an open standard. Methodology is documented here; the benchmark dataset is available under Creative Commons Attribution 4.0; the scoring endpoint is accessible without authentication at standard rate limits.

Public endpoints
GET /api/geo-index
Returns the Citability Index: aggregate stats and score distributions, broken down by industry and page type. Cached 1 hour. Rate limit 60/min.
POST /api/score
Scan a URL. Returns the full score_response_v5 payload including per-module subscores, Citability Gap, and Research Release flag.
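A consumer of the score endpoint might pull the headline numbers out of the response like this. Only the fields named in this document (the per-module subscores, the Citability Gap, and the isResearchRelease flag) are assumed to exist; the exact key names and payload shape here are illustrative guesses, not the published score_response_v5 schema.

```python
import json

def summarize(payload_json):
    """Extract headline numbers from a hypothetical score_response_v5
    payload. Key names are illustrative assumptions, not the
    published schema."""
    p = json.loads(payload_json)
    # The module with the lowest subscore is the natural first target
    # for closing the gap (Section 11).
    weakest = min(p["modules"], key=p["modules"].get)
    return {
        "gap": p["citabilityGap"],
        "research_release": p["isResearchRelease"],
        "weakest_module": weakest,
    }

# Hypothetical example payload for illustration only.
sample = ('{"modules": {"C1": 72, "C2": 38, "Q": 65, "T": 80}, '
          '"citabilityGap": 22.1, "isResearchRelease": true}')
```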

Framework specification: ORION-Framework-v5.md. Judge implementations: lib/orion/judges/. Compliance corpus and release changelog are published alongside each versioned tag.

Close the gap

See your Citability Gap.

Paste a URL. ORION returns the four module subscores, the gap between Predictive and Observed, and the highest-lift changes to close it.

Run a free Citability scan

FREE · NO SIGNUP · FORMERLY KNOWN AS GEO SCORE