Entity optimization: the complete guide

Knowledge Graph, Wikidata, schema graphs, sameAs convergence, and the entity stack that powers retrieval across both traditional SEO and AI assistants. The May 2026 playbook.

An entity is a thing in the world with a stable identity: a business, a person, a place, a product, a credential. Search engines and AI assistants think in entities, not strings. If your business is recognized as a clean, well-connected entity, you get recommended, cited, and disambiguated correctly across every retrieval surface that matters in 2026. If it is not, you compete keyword-by-keyword on a shrinking SERP. This is the complete playbook for entity optimization as it actually works in May 2026, across both traditional search and AI environments.

Key takeaways

Entities have IDs, types, and graph relationships; keywords don't. The unit of work is your entity record (claimed GBP, Wikidata Q-id where applicable, schema graph on your domain, an authoritative sameAs chain), not the page.
Google's Knowledge Graph holds about 5 billion entities and 500 billion facts. Local-business entities populate it primarily through claimed Google Business Profiles, not Wikipedia notability.
AI Mode (Google), ChatGPT Search, Claude, and Perplexity each use entity disambiguation as a retrieval-time step. Pages associated with a clear, well-connected entity get cited disproportionately, even when their raw rankings are middling.
The shift from 2024 guidance: query fan-out, MUVERA-style multi-vector retrieval, and a three-bot Anthropic crawler taxonomy mean older 'entity SEO' advice undercounts how granular the modern retrieval stack has become.
Schema.org v30.0 (March 2026) added the Credential type plus equivalence annotations to GS1, Dublin Core, and Open Graph (OnlineMarketplace and ConferenceEvent arrived earlier, in v29.4). But be realistic about schema and AI: on a live fetch most assistants read your visible text and ignore your JSON-LD. Schema earns its keep for entity disambiguation and the search indexes AI grounds on, not as a direct citation lever.
Avoid 2024-vintage tactics: stuffing sameAs with social profiles, manufacturing co-citation through paid placements, programmatic location-page sprawl, AI-written YMYL content with fake authors. Each one is now actively penalised, not just ignored.

What an entity actually is

In information retrieval, an entity is a disambiguated, structured identity for a thing in the world. Each entity has:

An identifier: a Knowledge Graph machine-readable ID, a Wikidata Q-number, a Companies House number, an SRA roll number, an ORCID. Stable; survives renames.
A type: LocalBusiness, Person, Organization, Product, MedicalClinic. Constrains which properties apply.
Properties: name, address, opening hours, credentials, area served, founded date, sameAs links to other identifiers for the same entity.
Relationships: edges to other entities. A solicitor works for a firm; a firm operates in a jurisdiction; a clinic is contained in a city; a product is offered by a business.

Google formalised this view of search in 2012 when it launched the Knowledge Graph. The corporate-blog explainer states the system holds information about roughly 5 billion entities and over 500 billion facts, with local-business records populated through Google Business Profile claims rather than Wikipedia-style notability. Wikidata, the open sibling, holds over 120 million items with a much lower notability bar, though not a zero one: an item still needs at least one independent, verifiable reference, which most established businesses with any press coverage can clear.

Identifier

A stable, machine-readable handle. Knowledge Graph ID, Wikidata Q-number, regulator roll number, Companies House number. Survives a rebrand.

Type

What kind of thing it is. LocalBusiness, Person, MedicalClinic, Hotel, Service. Determines which properties are valid.

Properties

Name, address, hours, credentials, area served, sameAs. The factual claims attached to the identifier.

Relationships

Edges to other entities. provider, employee, containedInPlace, hasCredential, areaServed. The graph structure.
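
As a rough mental model, the four parts above map onto a small data structure. The sketch below is illustrative only: the `Entity` class, its field names, and every identifier in it are hypothetical placeholders, not how any search engine actually stores its graph.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One node in an entity graph: identifier, type, properties, relationships."""
    identifier: str  # stable handle, e.g. a Wikidata Q-number or registry number
    type: str        # Schema.org-style type name
    properties: dict[str, str] = field(default_factory=dict)
    # Relationship name -> identifiers of the entities it points to.
    relationships: dict[str, list[str]] = field(default_factory=dict)

    def relate(self, predicate: str, target: "Entity") -> None:
        """Add a directed edge from this entity to another one."""
        self.relationships.setdefault(predicate, []).append(target.identifier)


# All identifiers below are placeholders, not real records.
firm = Entity("Q00000001", "LegalService", {"name": "Example Law LLP"})
solicitor = Entity("sra:000000", "Person", {"name": "A. Example"})
solicitor.relate("worksFor", firm)

# A rebrand changes a property; the identifier and the edges stay intact.
firm.properties["name"] = "Example Legal LLP"
```

The point of the sketch is the last line: the name is just a property, so renaming the firm leaves the solicitor's `worksFor` edge pointing at the same identifier.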

The framing that actually matters

Entity SEO is not better keyword SEO. The unit of work is the entity record, not the page. Once your entity is correctly resolved, disambiguated, and connected, every page on your domain inherits that identity in retrieval. Without it, every page competes alone.

How retrieval has changed: 2024 advice no longer survives

Most of the entity-SEO advice still circulating online was written before the four shifts below: three structural, one in measurement. Each one changes what you optimize for.

1
Query fan-out (AI Mode, Deep Search, Deep Research)
Largest shift
Google's AI Mode launched broadly in May 2025 and now runs in 200+ countries. A single user query gets decomposed into many parallel sub-queries. Deep Search can issue hundreds of searches before composing a cited answer. Your page can be retrieved as the source for sub-query #4 of 12 even when it would never have ranked top-10 for the original query. Optimization surface area has multiplied.
2
Multi-vector retrieval (MUVERA and successors)
Architectural
Google Research published MUVERA in June 2025: multi-vector retrieval at single-vector latency. Google has not said where it runs MUVERA in production, but the direction is clear: retrieval can hold separate vector representations for distinct sub-topics inside one document instead of averaging them into a single embedding. Pages that cleanly separate sub-topics with explicit entity scaffolding are easier to retrieve precisely; sprawling "ultimate guides" that average out semantically are not.
3
AI-assistant-native crawlers and three-bot taxonomies
Operational
Anthropic now operates ClaudeBot (training), Claude-User (user-initiated fetches), and Claude-SearchBot (web-search retrieval). OpenAI publishes GPTBot (training) and OAI-SearchBot (live retrieval) as separate tokens. Older robots.txt blocks targeting only the training crawlers no longer cover assistant search. Audit each bot name explicitly.
4
Bing Webmaster AI Performance reporting
Measurement
Microsoft launched AI Performance reporting in Bing Webmaster Tools public preview in February 2026 — the first time publishers can see Copilot citation activity per URL. Microsoft Copilot is Bing-grounded, and ChatGPT Search still leans heavily on Bing's index alongside its own crawler, so this is a real measurement channel for non-Google AI surfaces.
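
To make shift 2 concrete, here is a toy comparison of scoring a query against one averaged page vector versus scoring it against the best-matching sub-topic vector. This is a sketch of the general multi-vector idea only, not MUVERA's actual algorithm. The two-dimensional "embeddings" and both function names are invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def single_vector_score(query, passages):
    """Score the query against the mean of all passage vectors (one embedding per page)."""
    dims = len(passages[0])
    mean = [sum(p[i] for p in passages) / len(passages) for i in range(dims)]
    return cosine(query, mean)


def multi_vector_score(query, passages):
    """Score the query against its best-matching passage vector (max similarity)."""
    return max(cosine(query, p) for p in passages)


# Toy page with two unrelated sub-topics, one vector each.
page = [[1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]  # matches the first sub-topic exactly

averaged = single_vector_score(query, page)
best_match = multi_vector_score(query, page)
```

With these toy numbers the averaged score is noticeably lower than the best-match score, which is the intuition behind keeping sub-topics cleanly separated on the page.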

The entity stack: where your identity actually lives

For a UK local business, "your entity" is not one record but a graph of records that need to converge on the same canonical identity. Search and AI extractors use this convergence as a confidence signal. The more authoritative sources agree on the same entity, the higher the retrieval confidence.

Knowledge Graph node

Google's structured record. Drives the Knowledge Panel, AI Overviews, AI Mode, and Gemini answers. For local businesses, populated mostly through GBP.

Wikidata Q-number

Open structured-data record. Lower notability bar than Wikipedia. Read by ChatGPT, Claude, Perplexity, and Gemini grounders for entity disambiguation.

Google Business Profile

Canonical source for local-business attributes (NAP, hours, services, reviews). Feeds the Knowledge Graph and Maps. Highest-leverage single entity record for most local businesses.

Regulator and registry records

Companies House, SRA, GMC, GDC, FCA, Gas Safe, NICEIC. Government and professional-body records are weighted heavily by AI extractors as authoritative entity hooks.

Schema.org graph on your domain

Organization or LocalBusiness JSON-LD with sameAs linking to all of the above. The closing loop that tells crawlers and extractors to treat all references as one entity.

Authoritative third-party mentions

Recognized press, sector publications, regulator listings, professional-body directories. Each authoritative co-mention is a citation-source for AI answers and a co-occurrence signal for ranking.

When all six layers reference the same canonical entity, retrieval systems converge with high confidence. When they disagree (different names, addresses, missing nodes, broken sameAs), the system either picks one and gets it partially right, or fails to surface you at all. Entity convergence is the goal.

Entity salience: the public proxy for "how strongly does Google associate this entity with this page?"

Google's Cloud Natural Language API still publicly exposes a 0-1 salience score for every entity it extracts from a document. The same NLP family underpins much of Google's organic ranking and AI surface logic. Salience is not "the ranking signal" — but it is the most direct public proxy practitioners have for how Google's NLP weights an entity within a page.

Practical use:

Run your service pages through the NL API. The intended primary entity for the page should be the highest-salience entity returned. If it is not, your page is semantically about something else.
Look at adjacent high-salience entities. They reveal the topic neighborhood Google is reading the page in. Useful for spotting drift (a "boiler service" page where "kitchen renovation" is the top adjacent entity is in trouble).
Watch for entities that appear in the page text but are never extracted. This is almost always a content-structure or schema problem. Add internal links, headings, or a structured-data block that names the entity explicitly.
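
The checks above can be scripted once you have entity names and salience scores for a page. The sketch below assumes you have already exported them as plain (name, salience) pairs; it does not call the NL API, and the function name, report keys, and sample scores are all hypothetical.

```python
def check_primary_entity(entities, intended):
    """Compare extracted entities against the page's intended primary entity.

    entities: list of (name, salience) pairs with salience between 0 and 1.
    """
    ranked = sorted(entities, key=lambda pair: pair[1], reverse=True)
    names = [name.lower() for name, _ in ranked]
    target = intended.lower()
    return {
        "top_entity": ranked[0][0] if ranked else None,
        "primary_is_top": bool(names) and names[0] == target,
        "primary_extracted": target in names,
        # The strongest other entities show the topic neighborhood of the page.
        "adjacent": [name for name, _ in ranked if name.lower() != target][:3],
    }


# Hypothetical scores for a "boiler service" page that has drifted off-topic.
sample = [("boiler service", 0.22), ("kitchen renovation", 0.41), ("Leeds", 0.09)]
report = check_primary_entity(sample, "boiler service")
```

Here `primary_is_top` comes back false while `primary_extracted` is true: the entity is on the page, but something else outweighs it.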

Schema.org: what it does and does not do for AI

Schema.org released v30.0 in March 2026, adding the Credential type and equivalence annotations to GS1, Dublin Core, and Open Graph. (The ConferenceEvent, OnlineMarketplace, and InstantaneousEvent types people often attribute to v30 actually landed earlier, in v29.4 in December 2025.) Two facts shape how to use schema in 2026:

Schema is not the direct AI-citation lever it is often sold as. In controlled live-fetch testing, ChatGPT, Claude, and Perplexity read a page's rendered visible text and largely ignore its JSON-LD; only Gemini reliably renders and reads injected markup. Where schema does pull its weight for AI is indirect: it feeds entity disambiguation and the Google and Bing indexes that AI answers are grounded in. Put the facts in the visible copy, and use schema to label them, not to hide them.
Google has steadily retired rich-result types: Sitelinks Search Box (late 2024), SpecialAnnouncement (mid-2025), HowTo (2023), FAQ (fully gone by May 2026), and Practice Problems (2026). The schema types themselves remain valid Schema.org, and Google says there is no need to remove them. Decide per type whether to keep the markup for entity clarity and machine-readability, but do not expect the SERP rich results they once produced.

The load-bearing types for a local business

1
LocalBusiness (or a relevant subtype)
Required
The canonical entity record for your business on your domain. Google's docs require name and address, and recommend telephone and openingHoursSpecification among others; treat all four as mandatory in practice. Use the most specific subtype available: Dentist, MedicalClinic, Restaurant, Hotel, Plumber, Attorney.
2
Organization with sameAs
Required
Brand-level identity, separate from the venue. sameAs pointing to your Wikidata Q-number, GBP listing, Companies House page, regulator records, and major social profiles. This is the entity-disambiguation hook that AI extractors read most aggressively.
3
Service per offered service
Strong
One Service node per major service, each with a provider reference back to the LocalBusiness, an areaServed, and optionally offers. Maps to AI Mode sub-queries about specific services.
4
Person for named authors and key staff
Strong (critical for YMYL)
Author bylines as Person nodes with jobTitle, worksFor, hasCredential, and sameAs pointing to LinkedIn, regulator listings, academic profiles, ORCID. v30.0's new Credential type lets you describe qualifications properly.
5
Review and AggregateRating
Strong
Review counts and ratings are read by AI extractors as a trust signal and surface directly in some assistant answers. Use first-party review data; do not invent counts or pull aggregate ratings from third-party platforms you cannot verify.
6
FAQPage on FAQ blocks
Variable
Google reduced FAQ rich results in 2023 and removed them entirely by May 2026. The markup is still valid and maps cleanly to chunked Q&A, so it is worth keeping where you already have FAQ content, but it no longer earns a SERP feature.
7
Article, BlogPosting, NewsArticle on editorial content
Strong
Page-level schema for editorial. Author, datePublished, dateModified, publisher referencing your Organization. Maps to citation-extraction in AI Overviews and AI Mode.
8
BreadcrumbList for navigational context
Foundation
Tells extractors where the page sits in your site architecture. Cheap to implement and signals topical-cluster membership.
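
A minimal sketch of how the first three types can hang together in one JSON-LD graph, built in Python so it can be validated before it ships. Everything specific here is a placeholder: the domain, business name, address, phone number, Q-number, and the two helper functions are assumptions for illustration, and the property selection is one reasonable layout rather than a required one.

```python
import json

SITE = "https://www.example.com"  # hypothetical domain

graph = {
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "Organization",
            "@id": f"{SITE}/#organization",
            "name": "Example Plumbing Ltd",
            "url": SITE,
            # Placeholder Q-number; use your real identifiers here.
            "sameAs": ["https://www.wikidata.org/wiki/Q000000"],
        },
        {
            "@type": "Plumber",  # most specific LocalBusiness subtype
            "@id": f"{SITE}/#localbusiness",
            "name": "Example Plumbing Ltd",
            "parentOrganization": {"@id": f"{SITE}/#organization"},
            "telephone": "+44 113 496 0000",
            "address": {
                "@type": "PostalAddress",
                "streetAddress": "1 High Street",
                "addressLocality": "Leeds",
                "postalCode": "LS1 1AA",
                "addressCountry": "GB",
            },
            "openingHoursSpecification": [{
                "@type": "OpeningHoursSpecification",
                "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
                "opens": "08:00",
                "closes": "18:00",
            }],
        },
        {
            "@type": "Service",
            "@id": f"{SITE}/services/boiler-servicing#service",
            "name": "Boiler servicing",
            "provider": {"@id": f"{SITE}/#localbusiness"},
            "areaServed": {"@type": "City", "name": "Leeds"},
        },
    ],
}


def dangling_references(doc):
    """Return @id references that point at no node in the same @graph."""
    defined = {node["@id"] for node in doc["@graph"]}
    missing = set()
    for node in doc["@graph"]:
        for value in node.values():
            if isinstance(value, dict) and set(value) == {"@id"}:
                if value["@id"] not in defined:
                    missing.add(value["@id"])
    return missing


def to_script_tag(doc):
    """Serialise the graph for embedding in a page's HTML."""
    body = json.dumps(doc, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

Using `@id` references instead of repeating nested objects keeps one node per entity, and `dangling_references` catches the common mistake of a Service pointing at a provider that is never defined.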

The sameAs property is doing more work than you realize

sameAs declares "treat all of these URLs as references to the same entity". Modern AI extractors weight authoritative identifiers (Wikidata, Wikipedia, regulator pages, Companies House) far more heavily than another link to your own LinkedIn. The 2024 advice to "stuff every social profile in" is now a missed-opportunity pattern.

A high-quality sameAs array for a UK local business looks like:

Wikidata Q-number URL
Your Google Business Profile URL
Companies House company page
Regulator or professional-body record: SRA, GMC, GDC, FCA, Gas Safe, NICEIC
One or two top-tier press mentions
Two to three primary social profiles

Three patterns to avoid: paid sameAs link networks designed to inflate entity confidence (extractors weight by destination authority, so most contribute nothing); broken or 404-returning sameAs targets (read as active disconfirmation); inconsistent name or address strings between the targets (forces the extractor to pick a winner and may pick the wrong one).
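
A quick way to spot a social-heavy or anchorless array is to classify each target by host. The sketch below is a rough lint under stated assumptions: the two host lists are short illustrative samples, the `max_social` threshold simply mirrors the "two to three" guidance above, and it does not fetch the URLs, so broken targets still need a separate check.

```python
from urllib.parse import urlparse

# Illustrative host lists only; extend them for your own sector and regulators.
AUTHORITATIVE_HOSTS = {
    "www.wikidata.org",
    "en.wikipedia.org",
    "find-and-update.company-information.service.gov.uk",  # Companies House
    "register.fca.org.uk",
}
SOCIAL_HOSTS = {"www.linkedin.com", "www.facebook.com", "www.instagram.com", "x.com"}


def audit_same_as(urls, max_social=3):
    """Classify sameAs targets by host and flag the patterns worth fixing."""
    hosts = [urlparse(url).netloc.lower() for url in urls]
    authoritative = [u for u, h in zip(urls, hosts) if h in AUTHORITATIVE_HOSTS]
    social = [u for u, h in zip(urls, hosts) if h in SOCIAL_HOSTS]
    warnings = []
    if not authoritative:
        warnings.append("no authoritative identifier in sameAs")
    if len(social) > max_social:
        warnings.append("more social profiles than recommended")
    if len(set(urls)) != len(urls):
        warnings.append("duplicate sameAs targets")
    return {"authoritative": authoritative, "social": social, "warnings": warnings}


# Placeholder URLs for a hypothetical business.
stuffed = [
    "https://www.linkedin.com/company/example",
    "https://www.facebook.com/example",
    "https://www.instagram.com/example",
    "https://x.com/example",
]
result = audit_same_as(stuffed)
```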

Wikidata: the open Knowledge Graph

Wikidata sits below Wikipedia on the notability ladder. Per the official policy, an item must satisfy at least one of three criteria: a valid Wikimedia sitelink; a clearly identifiable conceptual or material entity with serious public references; or a structural need to support other items. A business registry listing alone is not enough — you need independent reliable-source coverage to pass the second criterion. Most established businesses with a few years of trading and any press clear that bar.

1
Search before you create
Wikidata may already have an item for your business — created by a Wikipedia editor, an industry catalog, or a previous merger record. Search by name, by domain, and by any prior trading name. If an item exists, the work is to enrich and correct it, not to create a duplicate.
2
Establish notability before you create
Independent reliable-source coverage from at least two distinct sources is the practical threshold. Local press, sector trade press, regulator publications, and academic citations all count. Press releases, your own marketing, and client testimonials do not. If you do not pass the bar, work on earning the coverage first.
3
Create or commission the item
Editing your own entry is technically allowed but discouraged. Better is to provide a Wikidata-active editor with sourced facts and let them create the entry. The Wikidata community has an active 2025-26 RfC on notability reform; expect minor policy shifts but not radical ones.
4
Populate the load-bearing properties
P31 instance-of, P17 country, P159 headquarters location, P856 official website, P1448 official name, P1454 legal form. For UK businesses, P1320 OpenCorporates ID and identifiers for Companies House, the FCA register, or relevant trade bodies close the entity-disambiguation loop.
5
Close the loop on your domain
Add the resulting Q-number URL to your Organization JSON-LD's sameAs array. This is what tells extractors that your domain and the Wikidata entry describe the same entity. Without that loop closed, the Wikidata entry is half-useful at most.
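
Step 5 is small enough to script. The sketch below assumes your Organization JSON-LD is available as a Python dict and that you link to the item's wiki page URL; both helper names are hypothetical, and the Q-number in the example is a placeholder.

```python
import re

QID_PATTERN = re.compile(r"Q[1-9]\d*")


def wikidata_url(qid):
    """Build the item page URL for a Q-number, rejecting malformed IDs."""
    if not QID_PATTERN.fullmatch(qid):
        raise ValueError(f"not a Wikidata Q-number: {qid!r}")
    return f"https://www.wikidata.org/wiki/{qid}"


def add_wikidata_same_as(organization, qid):
    """Return a copy of the Organization node with the Wikidata URL in sameAs."""
    url = wikidata_url(qid)
    same_as = organization.get("sameAs", [])
    if isinstance(same_as, str):  # JSON-LD allows a single string here
        same_as = [same_as]
    same_as = list(same_as)
    if url not in same_as:
        same_as.insert(0, url)  # lead with the authoritative identifier
    return {**organization, "sameAs": same_as}


org = {"@type": "Organization", "name": "Example Ltd",
       "sameAs": "https://www.linkedin.com/company/example"}
updated = add_wikidata_same_as(org, "Q123456")  # placeholder Q-number
```

The function is idempotent, so it is safe to run on every build without duplicating the entry.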

Authoritative co-occurrence: not dead, but reframed

The classic "co-citation" idea was that if your business name appears alongside a competitor's name on enough authoritative pages, search engines learn to associate you with the same category. That is still true, but the bar for what counts as "authoritative" has risen, and enforcement now treats far more placement tactics as "manipulative".

Two things have changed materially:

Site reputation abuse. Google introduced this as a spam policy in March 2024 (manual actions began that May) and moved to algorithmic enforcement in the August 2025 spam update. Hosting third-party commercial content under a strong site's authority — paid placements, subfolder rentals, "loan content" arrangements — is now actively dangerous, not just risky. The European Commission opened a DMA investigation into the policy's application to news publishers in November 2025.
Trust filtering at retrieval. AI assistants weight destination authority during retrieval, not just at training. Manufactured co-occurrence on low-quality directories contributes nothing to AI citation rates and correlates with manual-action triggers.

What still works: earning genuine mentions in recognized press, sector publications, regulator and professional-body listings, and curated "best of" round-ups. Each authoritative co-mention is both a citation-source for AI answers and a co-occurrence signal for ranking. See the AI search visibility guide for the earned-media playbook in detail.

Author and Person entities: where YMYL gets hardest

Your Money or Your Life queries — medical, legal, financial, safety, and now (per the September 2025 Search Quality Rater Guidelines update) government and civic information — are handled more conservatively by every major retrieval surface. Models default to authoritative sources, surface caveats more readily, and decline to recommend specific providers more often than in lower-risk sectors.

Two 2025-26 shifts shape the response:

In early 2026, Google added dedicated authorship guidance to its Search Central documentation. Be precise about what that means: clear authorship helps Google understand and represent who is behind a page (part of how it reads E-E-A-T), but Google has repeatedly said that author or Person markup is not itself a ranking signal.
The December 2025 and March 2026 core updates both weighed on thin and unreviewed YMYL content, with legal and health verticals visibly affected. Some commentary framed December as a Medic-scale YMYL update; that characterisation is commentary, not a Google statement. The durable takeaway is the direction: first-hand specifics and verifiable author credentials held up, generic or unreviewed content did not.

What a YMYL author entity should expose

•Named author with full real-name byline on every editorial page
•Person schema with hasCredential pointing to a regulator record (SRA roll number, GMC number, FCA reference)
•Visible bio, qualifications, dated content, last-updated metadata
•sameAs to LinkedIn, ORCID, regulator listing, and academic profile where applicable
•Reviewer trail when the author is junior — a named senior reviewer with their own credentials
•Link from the page byline to a full author page that resolves as its own entity
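
As a sketch of what that checklist looks like in markup, the helper below builds a Person node with a credential that points at a regulator record. It is an assumption-heavy example: the function name, every URL, and the sample author are placeholders, and it uses the long-standing EducationalOccupationalCredential type as the hasCredential value rather than any newer type.

```python
import json


def author_person(name, job_title, employer_id, credential_name, register_url, profiles):
    """Build a Person JSON-LD node for a named, credentialed author."""
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "jobTitle": job_title,
        "worksFor": {"@id": employer_id},
        "hasCredential": {
            "@type": "EducationalOccupationalCredential",
            "name": credential_name,
            "url": register_url,  # the regulator's public record for this person
        },
        # Regulator record first, then professional profiles.
        "sameAs": [register_url, *profiles],
    }


# Placeholder values; swap in the author's real register entry and profiles.
person = author_person(
    name="A. Example",
    job_title="Solicitor",
    employer_id="https://www.example.com/#organization",
    credential_name="Solicitor of England and Wales",
    register_url="https://www.example.org/register/000000",
    profiles=["https://www.linkedin.com/in/a-example"],
)
markup = json.dumps(person, indent=2)
```

Markup like this only labels what the visible bio already states; the byline, qualifications, and dates still need to appear in the page copy.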

What gets actively demoted

•Generic 'admin' or 'editorial team' bylines on YMYL content
•Stock-photo author personas with no verifiable credential trail
•Undated content claiming current pricing or current legal positions
•AI-generated YMYL content with no expert review
•'Reviewed by' placeholders that do not link to a real reviewer
•Content that contradicts the regulator-published guidance on the same topic

How each major surface treats entities, in May 2026

Different retrieval stacks are wired differently. Knowing which stack each one uses tells you which signals you need to invest in for each surface.

Google (Search, AI Overviews, AI Mode, Gemini)

•Retrieval: Google index + Knowledge Graph + Maps API + (in AI Mode) query fan-out + Deep Search
•Entity disambiguation: Knowledge Graph node primary; Wikidata used for grounding; schema.org sameAs as a hint
•Crawler control: Googlebot for ranking (and for AI Overviews/AI Mode, which use the live Search index); Google-Extended is an opt-out token for Gemini training and Vertex grounding only, and does not change AI Overview or AI Mode inclusion
•What dominates: GBP completeness, Knowledge Graph entry, sameAs convergence

ChatGPT Search and Microsoft Copilot

•Retrieval: Copilot is Bing-grounded; ChatGPT Search leans on Bing's index too but now layers on its own OAI-SearchBot index and re-ranking, plus OpenAI's licensing partners (News Corp, Axel Springer, Time, Le Monde)
•Entity disambiguation: Bing's index plus schema.org plus Wikidata; sameAs read aggressively
•Crawler control: GPTBot (training), OAI-SearchBot (live retrieval), ChatGPT-User (user-initiated); Bingbot for Bing index
•What dominates: Bing visibility, schema graph, licensed-publisher coverage, directory presence

Perplexity

•Retrieval: PerplexityBot crawl plus multiple search APIs; weights recency and citation-worthy long-form heavily
•Entity disambiguation: schema.org plus Wikidata plus inline citation extraction
•Crawler control: PerplexityBot (indexing) and Perplexity-User (user-initiated)
•What dominates: long-form pages with original data, dated content, clear section structure, schema with Person markup

Claude (Anthropic)

•Retrieval: Brave Search plus pre-trained corpus plus user-initiated fetches
•Entity disambiguation: schema.org, Wikidata, regulator records read for YMYL
•Crawler control: ClaudeBot (training), Claude-User (user-initiated), Claude-SearchBot (web-search)
•What dominates: third-party authoritative mentions, Person schema with credentials, Brave visibility

Apple Intelligence (Siri, Spotlight, Safari)

•Retrieval: Apple's ecosystem (Apple Maps, Apple Business Connect, Spotlight) plus optional ChatGPT routing when the user opts in
•World Knowledge Answers: Apple's AI summarisation system announced for 2026
•Crawler control: Applebot (search/Siri/Spotlight), Applebot-Extended (AI training opt-out)
•What dominates: Apple Business Connect completeness, Apple Maps presence, schema; iOS-heavy audiences justify the same effort here as GBP

Brave Search

•Independent index of around 40 billion pages, with roughly 100 million pages added or refreshed daily
•Brave LLM Context launched February 2026: public API for grounding third-party LLMs against the Brave index
•Powers Claude live retrieval and several smaller assistants
•What dominates: same SEO foundations as Bing — index visibility, schema, content quality

Crawler access: the audit nobody runs

AI crawlers honor robots.txt. Many UK sites accidentally block them via wildcard rules, agency-installed defaults, or templates inherited from years ago when AI crawlers did not exist. The list is longer than most operators expect:

OpenAI / ChatGPT

•GPTBot: training crawler
•OAI-SearchBot: live retrieval crawler for ChatGPT Search
•ChatGPT-User: user-initiated browse fetches

Google

•Googlebot: standard search crawler
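•Google-Extended: separate opt-out token for Gemini training and Vertex grounding; it does not control AI Overview or AI Mode inclusion, which draw on the live Search index via Googlebot
•Allowing Googlebot but blocking Google-Extended keeps you in Search, AI Overviews, and AI Mode while opting out of Gemini training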

Anthropic / Claude

•ClaudeBot: training crawler
•Claude-User: user-initiated fetches
•Claude-SearchBot: web-search retrieval (added 2025; older blocks targeting only ClaudeBot do not cover this)

Apple, Perplexity, Bytedance, Meta

•Applebot + Applebot-Extended: search and AI training
•PerplexityBot + Perplexity-User: indexer and user-initiated
•CCBot (Common Crawl), Bytespider (ByteDance), Meta-ExternalAgent (Meta)
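
The audit itself can be a few lines with Python's standard-library robots.txt parser. The sketch below checks a robots.txt body against the bot tokens named in this guide; the `audit_robots` name and the sample file are illustrative, and the standard-library parser applies the first matching rule, which may differ from how an individual crawler resolves conflicting rules.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this guide; extend as new crawlers appear.
AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User",
    "Google-Extended", "Applebot-Extended", "CCBot",
]


def audit_robots(robots_txt, path="/"):
    """Return {bot: allowed} for each AI crawler against a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in AI_CRAWLERS}


# An older-style file that blocks only the training crawlers.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

for bot, allowed in audit_robots(sample).items():
    print(f"{bot:20} {'allowed' if allowed else 'BLOCKED'}")
```

Run against the sample, GPTBot and ClaudeBot come back blocked while OAI-SearchBot and Claude-SearchBot are still allowed, which is exactly the gap older blocks leave open. To audit a live site, fetch its /robots.txt and pass the text in.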

llms.txt: a candid status report

llms.txt was proposed by Jeremy Howard in September 2024 as a Markdown file at /llms.txt describing a site's content for LLM consumers. As of May 2026, no major LLM operator (Google, OpenAI, Anthropic, Microsoft, Apple) has officially endorsed it for ranking or retrieval. Adoption hovers around 10% on large-domain samples, and independent statistical analyses to date have found no measurable effect on LLM citation frequency. Implement it if your stack makes it trivial (a clean structured summary of your content costs little), but do not make it a focal investment.

Knowledge Graph eligibility for local businesses

Most local-business operators conflate the Knowledge Graph (the underlying data store of around 5 billion entities) with the Knowledge Panel (one of many surfaces that render data from it). A panel is not proof the graph trusts you; absence of a panel is not proof of exclusion. The reliable path:

1
Claim and complete a Google Business Profile
Required
GBP is the dominant entry path into the Knowledge Graph for local-business entities. Without a verified, complete profile, you are competing for ambient retrieval signals against businesses that have closed this loop. See the GBP optimization guide for the field-by-field playbook.
2
Maintain consistent NAP across authoritative directories
Strong
Citation consistency is an entity-disambiguation signal. The graph reads inconsistent NAP as evidence of two entities, not one. The NAP & citations guide covers the audit and remediation pattern.
3
Build the schema graph on your domain
Strong
Organization or LocalBusiness JSON-LD with sameAs to GBP, Wikidata, regulator pages, and major social profiles. Without this loop closed, the graph has nothing on your domain to anchor your identity to.
4
Earn authoritative third-party mentions
Strong
Recognized press, sector publications, regulator listings, professional-body directories. Each authoritative co-mention reinforces the entity record and feeds AI citation surfaces.
5
Create a Wikidata entry where notability allows
Variable
Lower bar than Wikipedia. For most established UK independents with a few years of trading history and any press coverage, a clean Wikidata entry materially accelerates entity convergence across surfaces.
6
Wikipedia entry where genuine notability exists
Rare
Most local businesses do not pass Wikipedia's notability bar, and trying to game an entry is worse than not having one. Where genuine independent coverage exists (a venue with significant press, a heritage business, a brand with sector profile), a real Wikipedia entry is a substantial signal.
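
Step 2 is the easiest of these to automate. The sketch below normalises name, address, and phone strings from each directory and groups sources that agree. It is a rough heuristic under stated assumptions: the normalisation rules (UK phone prefixes, Ltd versus Limited) are deliberately minimal, the function names are hypothetical, and the sample records are invented.

```python
import re


def normalise_nap(name, address, phone):
    """Reduce a NAP triple to a comparable form (minimal UK-oriented rules)."""
    def clean(text):
        text = re.sub(r"[^\w\s]", " ", text.lower())
        text = re.sub(r"\blimited\b", "ltd", text)
        return " ".join(text.split())

    digits = re.sub(r"\D", "", phone)
    if digits.startswith("44"):  # treat +44 113... and 0113... as the same number
        digits = "0" + digits[2:]
    return (clean(name), clean(address), digits)


def nap_conflicts(records):
    """records: source -> (name, address, phone). Returns groups of agreeing sources.

    An empty list means every source agrees after normalisation.
    """
    groups = {}
    for source, nap in records.items():
        groups.setdefault(normalise_nap(*nap), []).append(source)
    return list(groups.values()) if len(groups) > 1 else []


# Invented listings for a hypothetical business.
records = {
    "GBP": ("Example Plumbing Ltd", "1 High Street, Leeds LS1 1AA", "+44 113 496 0000"),
    "Bing Places": ("Example Plumbing Limited", "1 High Street, Leeds, LS1 1AA", "0113 496 0000"),
    "Apple": ("Example Plumbing", "1 High St, Leeds LS1 1AA", "0113 496 0000"),
}
conflicts = nap_conflicts(records)
```

Here the GBP and Bing Places listings collapse to the same record, while the Apple listing stands apart on both name and street abbreviation and needs correcting at source.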

Sector heuristics for entity work

Patterns observed across UK sectors for what entity signals dominate. Useful for prioritisation, not as a substitute for sector-specific research.

Legal (solicitors, barristers, conveyancers)

•AI Overviews appear on roughly a quarter of legal queries (Ahrefs ~24%) and lean conservative, deferring to authoritative sources
•SRA registration is a first-class entity attribute; expose roll number via Person.identifier and hasCredential
•December 2025 core update hit legal verticals hardest; remediation needs real practitioner authorship
•Companies House sameAs link is a high-trust entity hook; SRA Find a Solicitor profile is the canonical sameAs target

Medical / dental (clinics, dentists, opticians)

•Health is among the highest-triggering verticals: around 44% of medical queries surface AI Overviews (Ahrefs), though some high-risk queries had them suppressed in early 2026
•GMC, GDC, GOC numbers belong in Person.hasCredential
•CQC ratings and registration numbers via sameAs to the CQC profile
•Use MedicalBusiness, MedicalClinic, Dentist subtypes; avoid scaled AI-written health content

Hospitality (hotels, restaurants, pubs, venues)

•Use Hotel, Restaurant, BarOrPub subtypes plus LodgingBusiness for accommodation
•Express proximity to landmarks, transport hubs, and neighborhoods via containedInPlace and geo
•openingHoursSpecification, acceptsReservations, servesCuisine, priceRange are all read by AI extractors
•ChatGPT leans heavily on TripAdvisor/Yelp; Gemini and Perplexity favour first-party Review schema

Trades (plumbers, electricians, builders)

•Trade-association memberships (Gas Safe, NICEIC, FMB, TrustMark) are the entity authority backbone
•Express each via sameAs plus hasCredential pointing to the official register page
•Companies House sameAs is one of the highest-quality grounding signals available
•Service-area businesses use areaServed and serviceType; do not pretend to multiple physical locations using doorway pages

Financial (advisors, accountants, brokers)

•FCA Register listing is the canonical entity hook; FRN via Person.identifier
•ICAEW, ACCA, CIMA, CIOT for accountancy; ASIC equivalent in Australia
•Disclosure-heavy schema: AggregateRating only with first-party data; Service nodes for each regulated activity
•September 2025 SQRG update extended YMYL to civic information; financial planning content needs reviewer trails

What to avoid: 2026 penalty patterns and outdated tactics

The 90-day entity-optimization program

A practical phased plan for a UK local business. Each phase covers both the entity-record work and the schema/page work that anchors it on your domain.

1
Days 1 to 30: foundations and audit
Audit robots.txt explicitly for each AI crawler (GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, Claude-SearchBot, PerplexityBot, Perplexity-User, Google-Extended, Applebot-Extended, CCBot). Validate every JSON-LD block on the site through the Rich Results Test and Schema.org's validator. Confirm GBP, Bing Places, and Apple Business Connect all match. Inventory existing sameAs targets and replace any social-profile-heavy chain with one anchored in regulator records, Companies House, and authoritative identifiers.
2
Days 31 to 60: entity convergence
Create or claim a Wikidata Q-number where notability allows. Add the Q-number URL to your Organization JSON-LD's sameAs. Implement the load-bearing schema types: LocalBusiness with the most specific subtype, Organization with sameAs, Service per offered service, Person for any named author or expert, Review and AggregateRating from first-party data. For YMYL businesses, add hasCredential references to regulator records. Aim for one or two earned mentions in authoritative third-party sources by end of phase.
3
Days 61 to 90: measure and iterate
Run a structured query test set across Search, AI Overviews, AI Mode, ChatGPT Search, Perplexity, Claude, Microsoft Copilot, and (if relevant) Apple Intelligence. Measure entity inclusion rate, page citation rate, earned-media citation rate, and competitive set per surface. Use the Bing Webmaster AI Performance preview for Copilot citation visibility. Identify gaps (where competitors appear and you do not, on which surface, for which query type) and target the underlying signal. Re-test monthly.
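
For the measurement phase, a simple tally per surface is usually enough to start. The sketch below assumes you log one row per query per surface by hand or from your own tooling; the row keys, function name, and sample rows are all hypothetical, and it covers only two of the four metrics named above.

```python
from collections import defaultdict


def surface_rates(results):
    """Aggregate per-surface rates from rows of query test results.

    Each row is a dict with 'surface', 'entity_mentioned', and 'page_cited' keys.
    """
    totals = defaultdict(lambda: {"queries": 0, "mentioned": 0, "cited": 0})
    for row in results:
        tally = totals[row["surface"]]
        tally["queries"] += 1
        tally["mentioned"] += bool(row["entity_mentioned"])
        tally["cited"] += bool(row["page_cited"])
    return {
        surface: {
            "entity_inclusion_rate": tally["mentioned"] / tally["queries"],
            "page_citation_rate": tally["cited"] / tally["queries"],
        }
        for surface, tally in totals.items()
    }


# Invented test log: two queries on each of two surfaces.
log = [
    {"surface": "AI Mode", "entity_mentioned": True, "page_cited": True},
    {"surface": "AI Mode", "entity_mentioned": True, "page_cited": False},
    {"surface": "Perplexity", "entity_mentioned": False, "page_cited": False},
    {"surface": "Perplexity", "entity_mentioned": True, "page_cited": False},
]
rates = surface_rates(log)
```

Keeping the query set fixed between monthly re-tests is what makes the rates comparable over time.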

Reference numbers

5B+

Entities in Google's Knowledge Graph

With over 500 billion facts attached. Local businesses populate it primarily through claimed Google Business Profiles.

v30.0

Schema.org current version

Released March 2026. Added Credential plus equivalence annotations to GS1, Dublin Core, and Open Graph. ConferenceEvent and OnlineMarketplace arrived earlier, in v29.4.

200+

Countries with AI Mode

Google's AI Mode launched broadly in May 2025; expanded to 200+ countries and 35+ languages by October 2025.

~44%

Health AI Overview trigger rate

Health and science are among the highest-triggering verticals (Ahrefs, 146M SERPs); legal sits lower at ~24%. Reflects how aggressively AI Overviews enter informational YMYL territory.

11+

AI crawlers to audit

GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, Claude-SearchBot, PerplexityBot, Perplexity-User, Google-Extended, Applebot-Extended, CCBot.

30-90d

Realistic entity-signal lag

Between deploying a schema or sameAs change and it surfacing consistently in AI answers and Knowledge Graph rendering. Wikidata propagates faster.

Where this is going

Entity optimization is becoming the connective tissue of every modern retrieval surface. Search engines, AI assistants, and the graph-RAG architectures that increasingly power them all converge on the same question: which entity is this, and how confident are we? Businesses that close the loop between GBP, Wikidata, regulator records, schema on their domain, and authoritative third-party coverage become first-class citizens of that retrieval substrate. Businesses that do not get partial-credit answers when the system guesses, or no answer at all when it cannot.

The good news: the work is concrete and finite. A clean entity record is not infinite content investment. It is a one-time foundational build with quarterly maintenance. The teams that ship it now compound the advantage every time a new AI surface launches and reads from the same structured layer.

Where to go next

Keep reading

AI Search & Answer Engine Visibility: The companion guide. Covers how each AI surface retrieves, the page-citation layer, and earned-media patterns that complement entity work.
Google Business Profile optimization: GBP is the dominant entry path into the Knowledge Graph for local-business entities. The field-by-field playbook.
NAP consistency & local citations: Citation consistency is an entity-disambiguation signal. The audit and remediation pattern for keeping the entity record clean.
The complete guide to local SEO: Entity optimization sits on top of local-SEO foundations. Start here if anything below is shaky.