Entity optimization: the complete guide
Knowledge Graph, Wikidata, schema graphs, sameAs convergence, and the entity stack that powers retrieval across both traditional SEO and AI assistants. The May 2026 playbook.
An entity is a thing in the world with a stable identity: a business, a person, a place, a product, a credential. Search engines and AI assistants think in entities, not strings. If your business is recognized as a clean, well-connected entity, you get recommended, cited, and disambiguated correctly across every retrieval surface that matters in 2026. If it is not, you compete keyword-by-keyword on a shrinking SERP. This is the complete playbook for entity optimization as it actually works in May 2026, across both traditional search and AI environments.
What an entity actually is
In information retrieval, an entity is a disambiguated, structured identity for a thing in the world. Each entity has:
- An identifier: a Knowledge Graph machine-readable ID, a Wikidata Q-number, a Companies House number, an SRA roll number, an ORCID. Stable; survives renames.
- A type: LocalBusiness, Person, Organization, Product, MedicalClinic. Constrains which properties apply.
- Properties: name, address, opening hours, credentials, area served, founded date, sameAs links to other identifiers for the same entity.
- Relationships: edges to other entities. A solicitor works for a firm; a firm operates in a jurisdiction; a clinic is contained in a city; a product is offered by a business.
Google formalised this view of search in 2012 when it launched the Knowledge Graph. The corporate-blog explainer states the system holds information about roughly 5 billion entities and over 500 billion facts, with local-business records populated through Google Business Profile claims rather than Wikipedia-style notability. Wikidata, the open sibling, holds well over 100 million items with a much lower notability bar — most established businesses with independent press coverage qualify, even small independents.
Identifier
A stable, machine-readable handle. Knowledge Graph ID, Wikidata Q-number, regulator roll number, Companies House number. Survives a rebrand.
Type
What kind of thing it is. LocalBusiness, Person, MedicalClinic, Hotel, Service. Determines which properties are valid.
Properties
Name, address, hours, credentials, area served, sameAs. The factual claims attached to the identifier.
Relationships
Edges to other entities. provider, employee, containedInPlace, hasCredential, areaServed. The graph structure.
Entity SEO is not better keyword SEO. The unit of work is the entity record, not the page. Once your entity is correctly resolved, disambiguated, and connected, every page on your domain inherits that identity in retrieval. Without it, every page competes alone.
How retrieval has changed: 2024 advice no longer survives
Most of the entity-SEO advice still circulating online was written before four structural shifts. Each one changes what you optimize for.
1. Query fan-out (AI Mode, Deep Search, Deep Research)
Largest shift: Google's AI Mode launched broadly in May 2025 and now runs in 200+ countries. A single user query gets decomposed into many parallel sub-queries. Deep Search can issue hundreds of searches before composing a cited answer. Your page can be retrieved as the source for sub-query #4 of 12 even when it would never have ranked top-10 for the original query. Optimization surface area has multiplied.
2. Multi-vector retrieval (MUVERA and successors)
Architectural: Google Research published MUVERA in June 2025: multi-vector retrieval at single-vector latency. In production rollout through 2026, this means Google can hold separate vector representations for distinct sub-topics inside one document instead of averaging them into one embedding. Pages that cleanly separate sub-topics with explicit entity scaffolding win; sprawling "ultimate guides" that average out semantically lose.
3. AI-assistant-native crawlers and three-bot taxonomies
Operational: Anthropic now operates ClaudeBot (training), Claude-User (user-initiated fetches), and Claude-SearchBot (web-search retrieval). OpenAI publishes GPTBot (training) and OAI-SearchBot (live retrieval) as separate tokens. Older robots.txt blocks targeting only the training crawlers no longer cover assistant search. Audit each bot name explicitly.
4. Bing Webmaster AI Performance reporting
Measurement: Microsoft launched AI Performance reporting in Bing Webmaster Tools public preview in February 2026, the first time publishers can see Copilot citation activity per URL. ChatGPT Search and Microsoft Copilot are both Bing-grounded, so this is a real measurement channel for non-Google AI surfaces.
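The multi-vector point in shift 2 is easier to see with toy numbers. The sketch below is not MUVERA; it only contrasts scoring a query against one averaged document vector with scoring it against the best-matching sub-topic vector (a simplified stand-in for the max-similarity scoring used in multi-vector retrieval). The two-dimensional "embeddings" and topic labels are invented for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Average several vectors into one, as a single-embedding index would."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def single_vector_score(query, section_vectors):
    """Score the query against one averaged vector for the whole page."""
    return cosine(query, mean_vector(section_vectors))


def multi_vector_score(query, section_vectors):
    """Score the query against the best-matching section vector."""
    return max(cosine(query, vector) for vector in section_vectors)


# Hypothetical embeddings: axis 0 = "boiler servicing", axis 1 = "kitchen renovation".
query = [1.0, 0.0]
page_sections = [[1.0, 0.0], [0.0, 1.0]]

averaged = single_vector_score(query, page_sections)
best_section = multi_vector_score(query, page_sections)
print(f"averaged page vector: {averaged:.3f}, best section vector: {best_section:.3f}")
```

The section that is cleanly about the query topic keeps its full score under the multi-vector view, while the averaged vector dilutes it. That is the intuition behind separating sub-topics rather than blending them.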
The entity stack: where your identity actually lives
For a UK local business, "your entity" is not one record but a graph of records that need to converge on the same canonical identity. Search and AI extractors use this convergence as a confidence signal. The more authoritative sources agree on the same entity, the higher the retrieval confidence.
Knowledge Graph node
Google's structured record. Drives the Knowledge Panel, AI Overviews, AI Mode, and Gemini answers. For local businesses, populated mostly through GBP.
Wikidata Q-number
Open structured-data record. Lower notability bar than Wikipedia. Read by ChatGPT, Claude, Perplexity, and Gemini grounders for entity disambiguation.
Google Business Profile
Canonical source for local-business attributes (NAP, hours, services, reviews). Feeds the Knowledge Graph and Maps. Highest-leverage single entity record for most local businesses.
Regulator and registry records
Companies House, SRA, GMC, GDC, FCA, Gas Safe, NICEIC. Government and professional-body records are weighted heavily by AI extractors as authoritative entity hooks.
Schema.org graph on your domain
Organization or LocalBusiness JSON-LD with sameAs linking to all of the above. The closing loop that tells crawlers and extractors to treat all references as one entity.
Authoritative third-party mentions
Recognized press, sector publications, regulator listings, professional-body directories. Each authoritative co-mention is a citation-source for AI answers and a co-occurrence signal for ranking.
When all six layers reference the same canonical entity, retrieval systems converge with high confidence. When they disagree (different names, addresses, missing nodes, broken sameAs), the system either picks one and gets it partially right, or fails to surface you at all. Entity convergence is the goal.
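A quick way to spot disagreement between layers is to normalise the name and address strings from each record and compare them. This is a minimal sketch under assumptions: the business, the field names, and the three source labels are hypothetical, and the normalisation is deliberately crude (it will not treat "St" and "Street" as equal, which is often what you want to catch).

```python
import re


def normalise(value):
    """Lower-case, drop punctuation, and collapse whitespace."""
    value = re.sub(r"[^\w\s]", "", value.lower())
    return re.sub(r"\s+", " ", value).strip()


def find_disagreements(records, fields=("name", "address", "telephone")):
    """Return {field: {normalised_value: [sources]}} where sources disagree."""
    disagreements = {}
    for field in fields:
        seen = {}
        for source, record in records.items():
            if field in record:
                seen.setdefault(normalise(record[field]), []).append(source)
        if len(seen) > 1:
            disagreements[field] = seen
    return disagreements


# Hypothetical records copied by hand from each layer of the entity stack.
records = {
    "gbp": {
        "name": "Example Heating Ltd.",
        "address": "1 High Street, Leeds, LS1 1AA",
        "telephone": "0113 496 0000",
    },
    "companies_house": {
        "name": "Example Heating Ltd",
        "address": "1 High St, Leeds LS1 1AA",
    },
    "schema_on_domain": {
        "name": "Example Heating Ltd",
        "address": "1 High Street, Leeds, LS1 1AA",
        "telephone": "0113 496 0000",
    },
}

disagreements = find_disagreements(records)
for field, variants in disagreements.items():
    print(field, variants)
```

Here the trailing full stop in the name is ignored, but the abbreviated street in the registry record is flagged for review.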
Entity salience: the public proxy for "how strongly does Google associate this entity with this page?"
Google's Cloud Natural Language API still publicly exposes a 0-1 salience score for every entity it extracts from a document. The same NLP family underpins much of Google's organic ranking and AI surface logic. Salience is not "the ranking signal" — but it is the most direct public proxy practitioners have for how Google's NLP weights an entity within a page.
Practical use:
- Run your service pages through the NL API. The intended primary entity for the page should be the highest-salience entity returned. If it is not, your page is semantically about something else.
- Look at adjacent high-salience entities. They reveal the topic neighborhood Google is reading the page in. Useful for spotting drift (a "boiler service" page where "kitchen renovation" is the top adjacent entity is in trouble).
- Watch for entities that appear in the page text but never extract. Almost always a content-structure or schema problem. Add internal links, headings, or a structured-data block that names the entity explicitly.
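The checks above can be scripted once you have an entity-analysis response saved as JSON. The sketch below assumes the response is a dict with an "entities" list whose items carry "name" and "salience", which is the general shape the NL API's entity analysis returns; the sample response, page topic, and scores are invented, and the helper name is hypothetical.

```python
def check_primary_entity(response, intended, top_n=3):
    """Report whether the intended entity is the highest-salience one on the page."""
    entities = sorted(
        response.get("entities", []),
        key=lambda entity: entity.get("salience", 0.0),
        reverse=True,
    )
    names = [entity["name"].lower() for entity in entities]
    target = intended.lower()
    return {
        "is_primary": bool(names) and names[0] == target,
        # 1-based rank by salience, or None if the entity never extracted.
        "rank": names.index(target) + 1 if target in names else None,
        "neighbours": [e["name"] for e in entities if e["name"].lower() != target][:top_n],
    }


# Hypothetical response for a "boiler service" page that has drifted off-topic.
sample_response = {
    "entities": [
        {"name": "boiler service", "type": "OTHER", "salience": 0.32},
        {"name": "kitchen renovation", "type": "OTHER", "salience": 0.41},
        {"name": "Leeds", "type": "LOCATION", "salience": 0.10},
    ],
    "language": "en",
}

report = check_primary_entity(sample_response, "boiler service")
print(report)
```

A rank of None corresponds to the third case in the list: the entity is in the page text but never extracts.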
Schema.org as the structured handoff to LLM extractors
Schema.org released v30.0 in March 2026, adding new types (Credential, ConferenceEvent, OnlineMarketplace, InstantaneousEvent) and equivalence annotations to GS1, Dublin Core, and Open Graph. Two facts shape how to use it in 2026:
- Schema is now the primary structured handoff to LLM extractors, not just a rich-results lever. ChatGPT, Claude, Gemini, and Perplexity read your JSON-LD when they ground answers, even when no SERP rich result is on offer.
- Google deprecated rich-result eligibility for several types from January 2026 (Q&A, Practice Problem, Sitelinks Search Box, SpecialAnnouncement). The schema types themselves are not deprecated — and AI extractors continue to read them. Decide on a per-type basis whether the maintenance cost is justified by AI-extraction value.
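Since extractors read JSON-LD directly, it is worth confirming that every block on a page at least parses. The sketch below uses only the Python standard library to pull application/ld+json script blocks out of an HTML string and report which ones fail to parse. It is a local pre-check, not a replacement for the official validators, and the sample HTML is hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect parsed JSON-LD blocks and any JSON errors from an HTML document."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError as exc:
                self.errors.append(str(exc))


def extract_jsonld(html):
    """Return (parsed_blocks, error_messages) for one HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks, parser.errors


# Hypothetical page: one valid block, one with a trailing comma.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "LocalBusiness", "name": "Example Heating Ltd"}
</script>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "FAQPage",}
</script>
</head><body></body></html>
"""

blocks, errors = extract_jsonld(SAMPLE_HTML)
print(len(blocks), "valid block(s),", len(errors), "broken block(s)")
```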
The load-bearing types for a local business
1. LocalBusiness (or a relevant subtype)
Required: The canonical entity record for your business on your domain. Google's docs require name and address and recommend telephone and openingHoursSpecification; treat all four as mandatory. Use the most specific subtype available: Dentist, MedicalClinic, Restaurant, Hotel, Plumber, Attorney.
2. Organization with sameAs
Required: Brand-level identity, separate from the venue. sameAs pointing to your Wikidata Q-number, GBP listing, Companies House page, regulator records, and major social profiles. This is the entity-disambiguation hook that AI extractors read most aggressively.
3. Service per offered service
Strong: One Service node per major service, each with a provider reference back to the LocalBusiness, an areaServed, and optionally offers. Maps to AI Mode sub-queries about specific services.
4. Person for named authors and key staff
Strong (critical for YMYL): Author bylines as Person nodes with jobTitle, worksFor, hasCredential, and sameAs pointing to LinkedIn, regulator listings, academic profiles, ORCID. v30.0's new Credential type lets you describe qualifications properly.
5. Review and AggregateRating
Strong: Review counts and ratings are read by AI extractors as a trust signal and surface directly in some assistant answers. Use first-party review data; do not invent counts or pull aggregate ratings from third-party platforms you cannot verify.
6. FAQPage on FAQ blocks
Variable: Google restricted FAQ rich results to well-known government and health sites in 2023, but AI extractors still parse FAQPage as chunked Q&A. Worth keeping for AI-extraction value alone where you already have FAQ content.
7. Article, BlogPosting, NewsArticle on editorial content
Strong: Page-level schema for editorial. Author, datePublished, dateModified, publisher referencing your Organization. Maps to citation-extraction in AI Overviews and AI Mode.
8. BreadcrumbList for navigational context
Foundation: Tells extractors where the page sits in your site architecture. Cheap to implement and signals topical-cluster membership.
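The first and third types in the list connect through @id references. The sketch below builds a small JSON-LD graph with one LocalBusiness subtype and one Service node whose provider points back to it. Every value (domain, business name, address, phone number, @id pattern) is a placeholder, and the property selection is one reasonable arrangement rather than a required template.

```python
import json

BASE = "https://www.example.co.uk"  # hypothetical domain


def build_graph():
    """Return a JSON-LD graph linking a Service node to its LocalBusiness provider."""
    business_id = f"{BASE}/#business"
    business = {
        "@type": "Plumber",  # most specific LocalBusiness subtype that fits
        "@id": business_id,
        "name": "Example Heating Ltd",
        "url": f"{BASE}/",
        "telephone": "+44 113 496 0000",
        "address": {
            "@type": "PostalAddress",
            "streetAddress": "1 High Street",
            "addressLocality": "Leeds",
            "postalCode": "LS1 1AA",
            "addressCountry": "GB",
        },
        "openingHoursSpecification": [
            {
                "@type": "OpeningHoursSpecification",
                "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
                "opens": "08:00",
                "closes": "18:00",
            }
        ],
    }
    service = {
        "@type": "Service",
        "@id": f"{BASE}/boiler-service/#service",
        "name": "Boiler service",
        "serviceType": "Boiler servicing",
        "provider": {"@id": business_id},  # edge back to the business node
        "areaServed": {"@type": "City", "name": "Leeds"},
    }
    return {"@context": "https://schema.org", "@graph": [business, service]}


graph = build_graph()
print(json.dumps(graph, indent=2))
```

Reusing the same @id for the business on every page is what lets a crawler treat the Service nodes on different URLs as belonging to one entity.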
The sameAs property is doing more work than you realize
sameAs declares "treat all of these URLs as references to the same entity". Modern AI extractors weight authoritative identifiers (Wikidata, Wikipedia, regulator pages, Companies House) far more heavily than another link to your own LinkedIn. The 2024 advice to "stuff every social profile in" is now a missed-opportunity pattern.
A high-quality sameAs array for a UK local business looks like:
- Wikidata Q-number URL
- Your Google Business Profile URL
- Companies House page
- Regulator or professional-body record: SRA, GMC, GDC, FCA, Gas Safe, NICEIC
- One or two top-tier press mentions
- Two to three primary social profiles
Three patterns to avoid: paid sameAs link networks designed to inflate entity confidence (extractors weight by destination authority, so most contribute nothing); broken or 404-returning sameAs targets (read as active disconfirmation); inconsistent name or address strings between the targets (forces the extractor to pick a winner and may pick the wrong one).
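One rough way to audit an existing sameAs array is to bucket each target by host and see whether social profiles outnumber authoritative identifiers. In the sketch below the host lists are illustrative assumptions, not a canonical registry, and the URLs (including the Q-number and company number) are placeholders.

```python
from urllib.parse import urlparse

# Illustrative host lists; extend them for your own sector and regulators.
AUTHORITATIVE_HOSTS = {
    "www.wikidata.org",
    "find-and-update.company-information.service.gov.uk",  # Companies House
    "register.fca.org.uk",
}
SOCIAL_HOSTS = {"www.linkedin.com", "www.facebook.com", "www.instagram.com", "x.com"}


def audit_same_as(urls):
    """Group sameAs targets into authoritative, social, and other buckets."""
    buckets = {"authoritative": [], "social": [], "other": []}
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host in AUTHORITATIVE_HOSTS:
            buckets["authoritative"].append(url)
        elif host in SOCIAL_HOSTS:
            buckets["social"].append(url)
        else:
            buckets["other"].append(url)
    return buckets


def is_social_heavy(buckets):
    """True when social profiles outnumber authoritative identifiers."""
    return len(buckets["social"]) > len(buckets["authoritative"])


# Placeholder sameAs array for a hypothetical business.
same_as = [
    "https://www.wikidata.org/wiki/Q00000000",
    "https://find-and-update.company-information.service.gov.uk/company/00000000",
    "https://www.linkedin.com/company/example-heating",
    "https://www.facebook.com/exampleheating",
    "https://www.instagram.com/exampleheating",
]

buckets = audit_same_as(same_as)
print({name: len(urls) for name, urls in buckets.items()}, "social-heavy:", is_social_heavy(buckets))
```

This does not check for 404s or name mismatches on the targets; those need a fetch step and a manual read.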
Wikidata: the open Knowledge Graph
Wikidata sits below Wikipedia on the notability ladder. Per the official policy, an item must satisfy at least one of three criteria: a valid Wikimedia sitelink; a clearly identifiable conceptual or material entity with serious public references; or a structural need to support other items. A business registry listing alone is not enough — you need independent reliable-source coverage to pass the second criterion. Most established businesses with a few years of trading and any press clear that bar.
1. Search before you create
Wikidata may already have an item for your business, created by a Wikipedia editor, an industry catalog, or a previous merger record. Search by name, by domain, and by any prior trading name. If an item exists, the work is to enrich and correct it, not to create a duplicate.
2. Establish notability before you create
Independent reliable-source coverage from at least two distinct sources is the practical threshold. Local press, sector trade press, regulator publications, and academic citations all count. Press releases, your own marketing, and client testimonials do not. If you do not pass the bar, work on earning the coverage first.
3. Create or commission the item
Editing your own entry is technically allowed but discouraged. Better is to provide a Wikidata-active editor with sourced facts and let them create the entry. The Wikidata community has an active 2025-26 RfC on notability reform; expect minor policy shifts but not radical ones.
4. Populate the load-bearing properties
P31 (instance of), P17 (country), P159 (headquarters location), P856 (official website), P1448 (official name), P1454 (legal form). For UK businesses, P1320 (OpenCorporates ID) and identifiers for Companies House, the FCA register, or relevant trade bodies close the entity-disambiguation loop.
5. Close the loop on your domain
Add the resulting Q-number URL to your Organization JSON-LD's sameAs array. This is what tells extractors that your domain and the Wikidata entry describe the same entity. Without that loop closed, the Wikidata entry is half-useful at most.
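For the "search by domain" part of step 1, one option is a SPARQL lookup on P856 (official website) against the Wikidata Query Service. The sketch below only builds the query and the request URL; it does not send anything. The domain is a placeholder, the helper names are hypothetical, and the URL-variant list assumes you pass a bare domain without "www.".

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def website_variants(domain):
    """Common spellings of a site URL as it might be stored on P856."""
    return [
        f"{scheme}://{host}{slash}"
        for scheme in ("https", "http")
        for host in (domain, f"www.{domain}")
        for slash in ("", "/")
    ]


def build_lookup_query(domain):
    """SPARQL that finds items whose official website matches any variant."""
    values = " ".join(f"<{url}>" for url in website_variants(domain))
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  VALUES ?site {{ {values} }}\n"
        "  ?item wdt:P856 ?site .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


def build_lookup_url(domain):
    """GET URL for the query service, asking for JSON results."""
    params = {"query": build_lookup_query(domain), "format": "json"}
    return f"{WDQS_ENDPOINT}?{urlencode(params)}"


query = build_lookup_query("example.co.uk")  # placeholder domain
lookup_url = build_lookup_url("example.co.uk")
print(query)
```

An empty result only means no item records that website; still search by name and prior trading names before creating anything.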
Authoritative co-occurrence: not dead, but reframed
The classic "co-citation" idea was that if your business name appears alongside a competitor's name on enough authoritative pages, search engines learn to associate you with the same category. That is still true, but the bar for "authoritative" has risen and the bar for what counts as "manipulative" has dropped, so enforcement now catches more tactics.
Two things have changed materially:
- Site reputation abuse. Google introduced this as a spam category in March 2024 (manual actions only) and moved to algorithmic enforcement in the August 2025 spam update. Hosting third-party commercial content under a strong site's authority — paid placements, subfolder rentals, "loan content" arrangements — is now actively dangerous, not just risky. The European Commission opened a DMA investigation into the policy's application to news publishers in November 2025.
- Trust filtering at retrieval. AI assistants weight destination authority during retrieval, not just at training. Manufactured co-occurrence on low-quality directories contributes nothing to AI citation rates and correlates with manual-action triggers.
What still works: earning genuine mentions in recognized press, sector publications, regulator and professional-body listings, and curated "best of" round-ups. Each authoritative co-mention is both a citation-source for AI answers and a co-occurrence signal for ranking. See the AI search visibility guide for the earned-media playbook in detail.
Author and Person entities: where YMYL gets hardest
Your Money or Your Life queries — medical, legal, financial, safety, and now (per the September 2025 Search Quality Rater Guidelines update) government and civic information — are handled more conservatively by every major retrieval surface. Models default to authoritative sources, surface caveats more readily, and decline to recommend specific providers more often than in lower-risk sectors.
Two 2025-26 shifts shape the response:
- On 1 February 2026, Google added a dedicated "Authors" section to Search Central documentation, codifying author-entity transparency as an explicit quality signal.
- The December 2025 core update was the most significant YMYL-focused update since the 2018 Medic update; legal verticals were among the hardest hit. The March 2026 core update further amplified the "Experience" leg of E-E-A-T, rewarding first-hand specifics and verifiable author credentials.
What a YMYL author entity should expose
- Named author with full real-name byline on every editorial page
- Person schema with hasCredential pointing to a regulator record (SRA roll number, GMC number, FCA reference)
- Visible bio, qualifications, dated content, last-updated metadata
- sameAs to LinkedIn, ORCID, regulator listing, and academic profile where applicable
- Reviewer trail when the author is junior: a named senior reviewer with their own credentials
- Link from the page byline to a full author page that resolves as its own entity
What gets actively demoted
- Generic 'admin' or 'editorial team' bylines on YMYL content
- Stock-photo author personas with no verifiable credential trail
- Undated content claiming current pricing or current legal positions
- AI-generated YMYL content with no expert review
- 'Reviewed by' placeholders that do not link to a real reviewer
- Content that contradicts the regulator-published guidance on the same topic
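The "should expose" list maps onto a Person node fairly directly. The sketch below is one possible shape, under assumptions: it uses schema.org's long-standing EducationalOccupationalCredential type rather than the newer Credential type mentioned earlier, the author, firm, registration number, and URLs are placeholders, and the function name is hypothetical.

```python
import json


def author_node(name, job_title, employer_id, regulator_name, regulator_url,
                registration_id, profiles):
    """Build a Person JSON-LD node with a regulator-backed credential."""
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "jobTitle": job_title,
        "worksFor": {"@id": employer_id},
        # Registration number exposed as a typed identifier.
        "identifier": {
            "@type": "PropertyValue",
            "propertyID": f"{regulator_name} registration number",
            "value": registration_id,
        },
        "hasCredential": {
            "@type": "EducationalOccupationalCredential",
            "credentialCategory": "Professional registration",
            "recognizedBy": {
                "@type": "Organization",
                "name": regulator_name,
                "url": regulator_url,
            },
        },
        "sameAs": profiles,
    }


# Placeholder author; none of these values refer to a real person or record.
author = author_node(
    name="Jane Example",
    job_title="Solicitor",
    employer_id="https://www.example.co.uk/#organization",
    regulator_name="Solicitors Regulation Authority",
    regulator_url="https://www.sra.org.uk/",
    registration_id="000000",
    profiles=["https://www.linkedin.com/in/jane-example"],
)
print(json.dumps(author, indent=2))
```

Put this node on the full author page and reference it by @id from each article's author property, so the byline resolves to one entity.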
How each major surface treats entities, in May 2026
Different retrieval stacks are wired differently. Knowing which stack each one uses tells you which signals you need to invest in for each surface.
Google (Search, AI Overviews, AI Mode, Gemini)
- Retrieval: Google index + Knowledge Graph + Maps API + (in AI Mode) query fan-out + Deep Search
- Entity disambiguation: Knowledge Graph node primary; Wikidata used for grounding; schema.org sameAs as a hint
- Crawler control: Googlebot for Search, including AI Overviews and AI Mode; Google-Extended is a separate robots.txt token governing use of content for Gemini training and grounding; both are honored via robots.txt
- What dominates: GBP completeness, Knowledge Graph entry, sameAs convergence
ChatGPT Search and Microsoft Copilot
- Retrieval: Bing-grounded plus OpenAI's licensing partners (News Corp, Axel Springer, Time, Le Monde)
- Entity disambiguation: Bing's index plus schema.org plus Wikidata; sameAs read aggressively
- Crawler control: GPTBot (training), OAI-SearchBot (live retrieval), ChatGPT-User (user-initiated); Bingbot for Bing index
- What dominates: Bing visibility, schema graph, licensed-publisher coverage, directory presence
Perplexity
- Retrieval: PerplexityBot crawl plus multiple search APIs; weights recency and citation-worthy long-form heavily
- Entity disambiguation: schema.org plus Wikidata plus inline citation extraction
- Crawler control: PerplexityBot (indexing) and Perplexity-User (user-initiated)
- What dominates: long-form pages with original data, dated content, clear section structure, schema with Person markup
Claude (Anthropic)
- Retrieval: Brave Search plus pre-trained corpus plus user-initiated fetches
- Entity disambiguation: schema.org, Wikidata, regulator records read for YMYL
- Crawler control: ClaudeBot (training), Claude-User (user-initiated), Claude-SearchBot (web-search)
- What dominates: third-party authoritative mentions, Person schema with credentials, Brave visibility
Apple Intelligence (Siri, Spotlight, Safari)
- Retrieval: Apple's ecosystem (Apple Maps, Apple Business Connect, Spotlight) plus optional ChatGPT routing when the user opts in
- World Knowledge Answers: Apple's AI summarisation system announced for 2026
- Crawler control: Applebot (search/Siri/Spotlight), Applebot-Extended (AI training opt-out)
- What dominates: Apple Business Connect completeness, Apple Maps presence, schema; iOS-heavy audiences justify the same effort here as GBP
Brave Search
- Independent index of around 40 billion pages, with ~100 million pages added or refreshed daily
- Brave LLM Context launched February 2026: public API for grounding third-party LLMs against the Brave index
- Powers Claude live retrieval and several smaller assistants
- What dominates: same SEO foundations as Bing: index visibility, schema, content quality
Crawler access: the audit nobody runs
AI crawlers honor robots.txt. Many UK sites accidentally block them via wildcard rules, agency-installed defaults, or templates inherited from years ago when AI crawlers did not exist. The list is longer than most operators expect:
OpenAI / ChatGPT
- GPTBot: training crawler
- OAI-SearchBot: live retrieval crawler for ChatGPT Search
- ChatGPT-User: user-initiated browse fetches
Google
- Googlebot: standard search crawler
- Google-Extended: separate token controlling use of your content for Gemini training and grounding; Google documents it as having no effect on inclusion or ranking in Search
- Allowing Googlebot but blocking Google-Extended therefore limits Gemini usage without affecting Search; AI Overviews and AI Mode are Search features and follow Googlebot and snippet controls
Anthropic / Claude
- ClaudeBot: training crawler
- Claude-User: user-initiated fetches
- Claude-SearchBot: web-search retrieval (added 2025; older blocks targeting only ClaudeBot do not cover this)
Apple, Perplexity, ByteDance, Meta
- Applebot + Applebot-Extended: search and AI training
- PerplexityBot + Perplexity-User: indexer and user-initiated
- CCBot (Common Crawl), Bytespider (ByteDance), Meta-ExternalAgent (Meta)
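A per-bot audit is easy to script with Python's standard-library robots.txt parser. The sketch below checks a robots.txt string against each token named above. The sample file and URL are hypothetical, and Python's parser only approximates how each vendor's crawler resolves rules (it matches user-agent names by substring and applies rules in file order), so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User",
    "Google-Extended", "Applebot-Extended", "CCBot",
]


def audit_robots(robots_txt, url="https://www.example.co.uk/"):
    """Return {user_agent: allowed} for each AI crawler token."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_CRAWLERS}


# Hypothetical legacy file that blocks only the training crawlers.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

access = audit_robots(SAMPLE_ROBOTS)
for agent, allowed in access.items():
    print(f"{agent:20} {'allowed' if allowed else 'blocked'}")
```

In this sample the training crawlers are blocked while OAI-SearchBot and Claude-SearchBot fall through to the wildcard rule, which is the gap described above. Whether that outcome is desirable depends on your policy; the point is to make it a deliberate choice.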
llms.txt: a candid status report
llms.txt was proposed by Jeremy Howard in September 2024 as a Markdown file at /llms.txt describing a site's content for LLM consumers. As of May 2026: no major LLM operator (Google, OpenAI, Anthropic, Microsoft, Apple) has officially endorsed it for ranking or retrieval. Adoption hovers around 10% on large-domain samples; independent statistical analyses to date have found no measurable effect on LLM citation frequency. Implement it if your stack makes it trivial (a clean structured summary of your content costs little), but do not make it a focal investment.
Knowledge Graph eligibility for local businesses
Most local-business operators conflate the Knowledge Graph (the underlying data store of around 5 billion entities) with the Knowledge Panel (one of many surfaces that render data from it). A panel is not proof the graph trusts you; absence of a panel is not proof of exclusion. The reliable path:
1. Claim and complete a Google Business Profile
Required: GBP is the dominant entry path into the Knowledge Graph for local-business entities. Without a verified, complete profile, you are competing for ambient retrieval signals against businesses that have closed this loop. See the GBP optimization guide for the field-by-field playbook.
2. Maintain consistent NAP across authoritative directories
Strong: Citation consistency is an entity-disambiguation signal. The graph reads inconsistent NAP as evidence of two entities, not one. The NAP & citations guide covers the audit and remediation pattern.
3. Build the schema graph on your domain
Strong: Organization or LocalBusiness JSON-LD with sameAs to GBP, Wikidata, regulator pages, and major social profiles. Without this loop closed, the graph has nothing on your domain to anchor your identity to.
4. Earn authoritative third-party mentions
Strong: Recognized press, sector publications, regulator listings, professional-body directories. Each authoritative co-mention reinforces the entity record and feeds AI citation surfaces.
5. Create a Wikidata entry where notability allows
Variable: Lower bar than Wikipedia. For most established UK independents with a few years of trading history and any press coverage, a clean Wikidata entry materially accelerates entity convergence across surfaces.
6. Wikipedia entry where genuine notability exists
Rare: Most local businesses do not pass Wikipedia's notability bar, and trying to game an entry is worse than not having one. Where genuine independent coverage exists (a venue with significant press, a heritage business, a brand with sector profile), a real Wikipedia entry is a substantial signal.
Sector heuristics for entity work
Patterns observed across UK sectors for what entity signals dominate. Useful for prioritisation, not as a substitute for sector-specific research.
Legal (solicitors, barristers, conveyancers)
- Highest AI Overview trigger rate of any vertical (~78%)
- SRA registration is a first-class entity attribute; expose roll number via Person.identifier and hasCredential
- December 2025 core update hit legal verticals hardest; remediation needs real practitioner authorship
- Companies House sameAs link is a high-trust entity hook; SRA Find a Solicitor profile is the canonical sameAs target
Medical / dental (clinics, dentists, opticians)
- Around 44% of medical YMYL queries trigger AI Overviews; some high-risk queries had AI Overviews suppressed in early 2026
- GMC, GDC, GOC numbers belong in Person.hasCredential
- CQC ratings and registration numbers via sameAs to the CQC profile
- Use MedicalBusiness, MedicalClinic, Dentist subtypes; avoid scaled AI-written health content
Hospitality (hotels, restaurants, pubs, venues)
- Use Hotel, Restaurant, BarOrPub subtypes plus LodgingBusiness for accommodation
- Express proximity to landmarks, transport hubs, and neighborhoods via containedInPlace and geo
- openingHoursSpecification, acceptsReservations, servesCuisine, priceRange are all read by AI extractors
- ChatGPT leans heavily on TripAdvisor/Yelp; Gemini and Perplexity favour first-party Review schema
Trades (plumbers, electricians, builders)
- Trade-association memberships (Gas Safe, NICEIC, FMB, TrustMark) are the entity authority backbone
- Express each via sameAs plus hasCredential pointing to the official register page
- Companies House sameAs is one of the highest-quality grounding signals available
- Service-area businesses use areaServed and serviceType; do not pretend to multiple physical locations using doorway pages
Financial (advisors, accountants, brokers)
- FCA Register listing is the canonical entity hook; FRN via Person.identifier
- ICAEW, ACCA, CIMA, CIOT for accountancy; ASIC equivalent in Australia
- Disclosure-heavy schema: AggregateRating only with first-party data; Service nodes for each regulated activity
- September 2025 SQRG update extended YMYL to civic information; financial planning content needs reviewer trails
What to avoid: 2026 penalty patterns and outdated tactics
The 90-day entity-optimization program
A practical phased plan for a UK local business. Each phase covers both the entity-record work and the schema/page work that anchors it on your domain.
1. Days 1 to 30: foundations and audit
Audit robots.txt explicitly for each AI crawler (GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, Claude-SearchBot, PerplexityBot, Perplexity-User, Google-Extended, Applebot-Extended, CCBot). Validate every JSON-LD block on the site through the Rich Results Test and Schema.org's validator. Confirm GBP, Bing Places, and Apple Business Connect all match. Inventory existing sameAs targets and replace any social-profile-heavy chain with one anchored in regulator records, Companies House, and authoritative identifiers.
2. Days 31 to 60: entity convergence
Create or claim a Wikidata Q-number where notability allows. Add the Q-number URL to your Organization JSON-LD's sameAs. Implement the load-bearing schema types: LocalBusiness with the most specific subtype, Organization with sameAs, Service per offered service, Person for any named author or expert, Review and AggregateRating from first-party data. For YMYL businesses, add hasCredential references to regulator records. Aim for one or two earned mentions in authoritative third-party sources by end of phase.
3. Days 61 to 90: measure and iterate
Run a structured query test set across Search, AI Overviews, AI Mode, ChatGPT Search, Perplexity, Claude, Microsoft Copilot, and (if relevant) Apple Intelligence. Measure entity inclusion rate, page citation rate, earned-media citation rate, and competitive set per surface. Use the Bing Webmaster AI Performance preview for Copilot citation visibility. Identify gaps (where competitors appear and you do not, on which surface, for which query type) and target the underlying signal. Re-test monthly.
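The phase-three measurements reduce to simple ratios once each test query is logged. The sketch below assumes you record one row per query and surface, with two yes/no observations. The field names, surfaces, and sample rows are hypothetical, and only two of the four metrics are shown.

```python
from collections import defaultdict


def rates_by_surface(results):
    """Compute entity inclusion and page citation rates for each surface."""
    totals = defaultdict(lambda: {"queries": 0, "entity_included": 0, "page_cited": 0})
    for row in results:
        bucket = totals[row["surface"]]
        bucket["queries"] += 1
        bucket["entity_included"] += int(row["entity_included"])
        bucket["page_cited"] += int(row["page_cited"])
    return {
        surface: {
            "entity_inclusion_rate": bucket["entity_included"] / bucket["queries"],
            "page_citation_rate": bucket["page_cited"] / bucket["queries"],
        }
        for surface, bucket in totals.items()
    }


# Hypothetical log from one monthly test run.
test_log = [
    {"surface": "Perplexity", "query": "boiler service leeds", "entity_included": True, "page_cited": True},
    {"surface": "Perplexity", "query": "emergency plumber leeds", "entity_included": False, "page_cited": False},
    {"surface": "Copilot", "query": "boiler service leeds", "entity_included": True, "page_cited": False},
    {"surface": "Copilot", "query": "emergency plumber leeds", "entity_included": True, "page_cited": True},
]

rates = rates_by_surface(test_log)
for surface, metrics in rates.items():
    print(surface, metrics)
```

Keeping the query set fixed between monthly runs is what makes the rates comparable over time.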
Reference numbers
5B+
Entities in Google's Knowledge Graph
With over 500 billion facts attached. Local businesses populate it primarily through claimed Google Business Profiles.
v30.0
Schema.org current version
Released March 2026. Added Credential, ConferenceEvent, OnlineMarketplace, plus equivalence annotations to GS1, Dublin Core, and Open Graph.
200+
Countries with AI Mode
Google's AI Mode launched broadly in May 2025; expanded to 200+ countries and 35+ languages by October 2025.
78%
Legal-vertical AI Overview trigger rate
The highest of any vertical. Medical follows at around 44%. Reflects how aggressively AI Overviews enter YMYL territory.
11+
AI crawlers to audit
GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, Claude-SearchBot, PerplexityBot, Perplexity-User, Google-Extended, Applebot-Extended, CCBot.
30-90d
Realistic entity-signal lag
Between deploying a schema or sameAs change and it surfacing consistently in AI answers and Knowledge Graph rendering. Wikidata propagates faster.
Where this is going
Entity optimization is becoming the connective tissue of every modern retrieval surface. Search engines, AI assistants, and the graph-RAG architectures that increasingly power them all converge on the same question: which entity is this, and how confident are we? Businesses that close the loop between GBP, Wikidata, regulator records, schema on their domain, and authoritative third-party coverage become first-class citizens of that retrieval substrate. Businesses that do not get partial-credit answers when the system guesses, or no answer at all when it cannot.
The good news: the work is concrete and finite. A clean entity record is not infinite content investment. It is a one-time foundational build with quarterly maintenance. The teams that ship it now compound the advantage every time a new AI surface launches and reads from the same structured layer.