SEO & AEO · 13 min

GEO in 2026: Getting Cited by ChatGPT, Perplexity and Google AI Overviews

TL;DR

GEO (Generative Engine Optimization) optimizes for being cited by AI engines, not just indexed by Google. Princeton 2024 study: sourced citations (+40.6%), expert quotations (+41%) and numerical statistics (+37.2%) increase extraction odds. 2026 stack: extractable structure (autonomous chunks), rich schemas (FAQPage, HowTo, Speakable), llms.txt, robots.txt allowing OAI-SearchBot/PerplexityBot, plus Wikidata + Knowledge Graph entity.

By Julien Daniel
Founder & CTO, OptionWeb
[Image: ChatGPT, Perplexity and Gemini logos with citation extraction arrows]

According to SparkToro, 65% of Google searches in 2026 end without a click. Users get their answers directly in AI Overviews, ChatGPT Search, Perplexity, Gemini or Copilot. If your site is not cited as a source in those answers, you are invisible, regardless of your classic SEO ranking. Generative Engine Optimization (GEO) is the discipline that addresses this shift. Here are the concrete techniques OptionWeb applies on client sites in 2026.

1. GEO vs SEO: the fundamental difference

SEO optimizes to appear in a list of results. GEO optimizes to be the answer extracted by an LLM. Two different games with techniques that partially overlap.

| Aspect | Classic SEO | GEO |
| --- | --- | --- |
| Target | Top 10 Google SERP | Citation in LLM synthesis |
| Winning format | Full optimized page | Autonomous extractable passages |
| Signals | Backlinks, authority, relevance | Sourced citations, statistics, clear structure |
| Measurement | Position, clicks, impressions | Share of Model, citation rate, sentiment |
| Entity role | Important | Critical (Knowledge Graph, Wikidata) |
| Content format | Long-form, complete | Modular, independent chunks |

2. Mapping AI engines in 2026

Knowing each engine's retrieval source allows you to target effort.

| Engine | Retrieval source | Synthesis model | AI market share |
| --- | --- | --- | --- |
| Google AI Overviews | Google index | Gemini | ~55-60% |
| ChatGPT Search | Bing + OAI-SearchBot crawl + partnerships | GPT-4o/o-series | ~20-25% |
| Perplexity | Bing + Google + PerplexityBot crawl | Claude/GPT-4o (user-selectable) | ~8-10% |
| Microsoft Copilot | Bing | GPT-4o | ~5-8% |
| Gemini app | Google index | Gemini | ~5% |
| Claude (web search) | Brave Search | Claude | ~1-2% |

Strategic conclusion: by the shares above, optimizing for Google and Bing covers roughly 95% of AI retrieval. Bing SEO, often neglected, becomes critical for ChatGPT Search and Copilot.

3. Understanding RAG to optimize

RAG (Retrieval-Augmented Generation) typically follows a 5-step pipeline:

  1. Query rewriting: the LLM reformulates the user query into 3-10 sub-queries
  2. Retrieval: a search engine (Google, Bing, Brave) returns 10-20 documents
  3. Chunk extraction: documents are split into 100-500 token passages
  4. Similarity scoring: vector embedding to rank the most relevant chunks
  5. Synthesis: the LLM generates a response from the top-K chunks and chooses which to cite

Concrete implications: each paragraph must be autonomous (understandable in isolation), named entities must be explicit (no ambiguous pronouns), and statistics and citations reinforce the reliability the LLM perceives.
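To make steps 3 and 4 concrete, here is a minimal sketch of chunk extraction and similarity scoring. It is an illustration under stated assumptions, not how any particular engine works: the bag-of-words cosine similarity stands in for a real vector embedding, the word count stands in for a token count, and "Acme Cache" is a hypothetical product.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def split_into_chunks(text: str, max_tokens: int = 120) -> list[str]:
    """Step 3: group paragraphs into passages of at most max_tokens words."""
    chunks, current, size = [], [], 0
    for paragraph in filter(None, (p.strip() for p in text.split("\n\n"))):
        n = len(paragraph.split())
        if current and size + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(paragraph)
        size += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def bag_of_words(text: str) -> Counter:
    # Stand-in for an embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k_chunks(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Step 4: rank chunks by similarity to the query and keep the best k."""
    q = bag_of_words(query)
    scored = [(cosine(q, bag_of_words(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


page = """Acme Cache is a hypothetical caching service for Next.js sites.

It also reduces costs when used correctly.

Our office is open from Monday to Friday."""

chunks = split_into_chunks(page, max_tokens=12)
ranked = top_k_chunks("What is Acme Cache?", chunks, k=3)
for score, chunk in ranked:
    print(f"{score:.2f}  {chunk}")
```

In this toy run the self-contained first paragraph ranks first, while the paragraph that opens with "It" shares no terms with the query and scores zero once it is separated from its antecedent. That is the practical reason for writing autonomous paragraphs.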

4. The 9 techniques from the Princeton study

Aggarwal et al. (KDD 2024) tested 9 optimizations in a retrieval-augmented setup built on GPT-3.5-turbo. Results on the Subjective Impression metric (weight in the generated response):

| Technique | Extraction lift | Implementation |
| --- | --- | --- |
| Expert quotations | +41.0% | Citations with name, title, organization |
| Sourced citations | +40.6% | Inline academic or authoritative refs |
| Numerical statistics | +37.2% | 3+ stats per article, sources linked |
| Stylistic authority | +13.8% | Expert tone, precise technical vocabulary |
| Technical terms | +9.1% | Domain language |
| Ease of reading | +7.9% | Clear structure, short paragraphs |
| Unique words | -1.2% | Diverse vocabulary; little effect |
| Keyword stuffing | -10.3% | Penalizing; avoid |
| Fluency optimization | -1.8% | Marketing-speak; penalizing |

5. Extractable content structure

Chunk-friendly atomic pattern:

  • H2 or H3 as a question/factual statement: Match user queries. E.g.: 'What is Consent Mode v2?' rather than 'Consent Mode v2'.
  • First sentence = autonomous answer: Self-contained, no pronouns, with named entity. Understandable outside the document context.
  • 2-4 evidence sentences: Statistics, citations, concrete examples. Reinforces reliability.
  • Optional details after: The LLM extracts the first sentences. Details are for the human reader.

Anti-patterns to avoid: run-on sentences over 40 words, pronouns without a nearby antecedent ("it", "this", "these"), key information buried in paragraph 5, content locked in images without descriptive alt text, and a broken heading hierarchy (H2 → H4 without H3).
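Some of these anti-patterns can be caught mechanically before publishing. The sketch below is one possible check, assuming the article source is Markdown with `#` headings and blank lines between blocks. The function name `lint_chunks`, the pronoun list and the sample text are all hypothetical.

```python
from __future__ import annotations

import re

# Openers that usually need an antecedent from an earlier paragraph.
AMBIGUOUS_OPENERS = {"it", "this", "these", "they", "that"}


def lint_chunks(markdown: str, max_words: int = 40) -> list[str]:
    """Return warnings for three anti-patterns: skipped heading levels,
    paragraphs opening with a pronoun, and overlong sentences."""
    warnings = []
    previous_level = None
    for block in filter(None, (b.strip() for b in markdown.split("\n\n"))):
        heading = re.match(r"^(#{1,6})\s+(.*)", block)
        if heading:
            level = len(heading.group(1))
            if previous_level is not None and level > previous_level + 1:
                warnings.append(f"heading skips a level: {heading.group(2)!r}")
            previous_level = level
            continue
        sentences = re.split(r"(?<=[.!?])\s+", block)
        first_word = re.findall(r"[A-Za-z']+", sentences[0])[:1]
        if first_word and first_word[0].lower() in AMBIGUOUS_OPENERS:
            warnings.append(f"paragraph opens with a pronoun: {sentences[0][:40]!r}")
        for sentence in sentences:
            if len(sentence.split()) > max_words:
                warnings.append(
                    f"sentence longer than {max_words} words: {sentence[:40]!r}"
                )
    return warnings


sample = """## What is Acme Cache?

It speeds up page loads.

#### Configuration details

Acme Cache stores rendered pages at the edge."""

lint_warnings = lint_chunks(sample)
for warning in lint_warnings:
    print(warning)
```

A check like this only flags candidates. Whether a pronoun is really ambiguous still needs a human read.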

6. llms.txt and robots.txt for AI

llms.txt is an emerging standard proposed by Jeremy Howard (fast.ai) in September 2024. It is a Markdown file placed at the root of public/ that gives LLMs a curated summary of the site's structure.

public/llms.txt (markdown)
# OptionWeb

> Belgian web agency since 2014. Specialized in Next.js, EU cloud hosting and technical SEO for European SMBs.

## Services
- [Website creation](https://optionweb.dev/fr/creation-sites-web): Next.js sites scoring 100/100 on SEO
- [Cloud hosting](https://optionweb.dev/fr/hebergement-cloud): managed EU infrastructure
- [SEO Marketing](https://optionweb.dev/fr/seo-marketing): technical SEO + AEO + GEO

## Blog
- [Hosting in Belgium 2026](https://optionweb.dev/fr/blog/hebergement-belgique-2026/)
- [Next.js vs WordPress](https://optionweb.dev/fr/blog/nextjs-vs-wordpress/)
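If the list of pages already lives in a CMS or a sitemap, the file can be generated rather than maintained by hand. The following is a minimal sketch of that idea. The function `render_llms_txt`, its arguments and the example.com URLs are hypothetical, and the layout simply mirrors the sample above (title, blockquote summary, then sections of links).

```python
from __future__ import annotations


def render_llms_txt(
    name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render an llms.txt body from (label, url, note) link tuples."""
    lines = [f"# {name}", "", f"> {summary}"]
    for title, links in sections.items():
        lines += ["", f"## {title}"]
        for label, url, note in links:
            suffix = f": {note}" if note else ""
            lines.append(f"- [{label}]({url}){suffix}")
    return "\n".join(lines) + "\n"


llms_txt = render_llms_txt(
    "Example Co",
    "Hypothetical agency used for illustration.",
    {
        "Services": [
            ("Hosting", "https://example.com/hosting", "managed EU infrastructure"),
        ],
        "Blog": [
            ("First post", "https://example.com/blog/first/", ""),
        ],
    },
)
print(llms_txt)
```

Writing the returned string to public/llms.txt at build time keeps the file in sync with the published pages.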

For robots.txt, here is a balanced 2026 configuration for an SMB that wants to maximize AI visibility:

public/robots.txt (txt)
# AI search crawlers and user-triggered fetchers
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Perplexity-User
Allow: /

# Model-training crawlers and control tokens
User-agent: Google-Extended
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: GPTBot
Allow: /

# Default
User-agent: *
Allow: /

Sitemap: https://optionweb.dev/sitemap.xml
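A robots.txt like the one above is easy to break with a stray Disallow, so it can be worth checking it programmatically. The sketch below uses Python's standard `urllib.robotparser`. The helper `allowed_bots`, the example.com URL and the sample file are hypothetical, and the sample deliberately blocks GPTBot (unlike the configuration above) to show what a failing check looks like.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# Hypothetical variant of the file: GPTBot is blocked on purpose.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed_bots(
    robots_txt: str,
    bots: list[str],
    url: str = "https://example.com/blog/",
) -> dict[str, bool]:
    """Report, per user agent, whether robots_txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


report = allowed_bots(
    ROBOTS_TXT, ["OAI-SearchBot", "PerplexityBot", "GPTBot", "ClaudeBot"]
)
for bot, allowed in report.items():
    print(f"{bot}: {'allowed' if allowed else 'BLOCKED'}")
```

ClaudeBot has no group of its own in this sample, so it falls back to the `User-agent: *` rules. This check only reflects the standard-library parser's reading of the file; individual crawlers may interpret edge cases differently.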

7. Entity SEO: Wikidata and Knowledge Graph

LLMs reason about entities, not keywords. An entity that is well defined in both Wikidata and the Google Knowledge Graph is a strong reliability signal for LLMs.

Action plan:

  1. Create a Wikidata item (Q-number) for your organization. Wikidata is more accessible than Wikipedia because its notability criteria are less strict.
  2. Add authoritative external identifiers: P856 (official site), P3608 (EU VAT), P3376 (Belgian BCE), P4264 (LinkedIn), P2671 (Google Knowledge Graph ID if you have it).
  3. Emit Schema.org Organization on the site with sameAs to Wikidata, LinkedIn, Crunchbase, official social profiles.
  4. For authors: Schema Person + sameAs to verifiable profiles (LinkedIn, ORCID, Google Scholar, etc.).
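Step 3 of the plan can be sketched as follows: build a Schema.org Organization object with `sameAs` links and serialize it as JSON-LD. This is an illustrative sketch only. The organization name, the URLs and the Wikidata Q-number are placeholders, and `organization_jsonld` is a hypothetical helper.

```python
from __future__ import annotations

import json


def organization_jsonld(name: str, url: str, same_as: list[str]) -> dict:
    """Build a Schema.org Organization object with sameAs profile links."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": list(same_as),
    }


org = organization_jsonld(
    "Example Co",
    "https://example.com",
    [
        "https://www.wikidata.org/wiki/Q0000000",  # placeholder Q-number
        "https://www.linkedin.com/company/example-co",  # placeholder profile
    ],
)

# The JSON-LD is embedded in the page head inside a script tag.
script_tag = (
    '<script type="application/ld+json">'
    + json.dumps(org, indent=2)
    + "</script>"
)
print(script_tag)
```

The same pattern applies to step 4 by switching `@type` to `Person` and pointing `sameAs` at the author's verifiable profiles.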

How to measure your AI visibility

Three approaches depending on budget:

| Method | Cost | Accuracy |
| --- | --- | --- |
| Manual tests (50 monthly prompts) | 0 € | Good but subjective |
| SaaS tools (Profound, Athena, Otterly) | 100-500 €/month | Excellent, automated |
| GA4 custom 'AI Search' channel | 0 € | Real traffic but lagging indicator |
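The GA4 'AI Search' channel in the table above boils down to grouping sessions by referrer hostname. The sketch below shows that grouping logic outside GA4, for example on server logs. The hostname list is an assumption to adapt to what actually appears in your own referrer data, and `classify_referrer` is a hypothetical helper.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import urlparse

# Assumed referrer hostnames; verify against your own analytics data.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Copilot",
    "claude.ai": "Claude",
}


def classify_referrer(referrer: str) -> Optional[str]:
    """Return the AI engine name for a referrer URL, or None if it is not one."""
    host = (urlparse(referrer).hostname or "").lower()
    for domain, engine in AI_REFERRER_HOSTS.items():
        # Match the domain itself or any subdomain such as www.
        if host == domain or host.endswith("." + domain):
            return engine
    return None


for ref in ["https://www.perplexity.ai/search?q=x", "https://www.google.com/"]:
    print(ref, "->", classify_referrer(ref))
```

Matching on the exact domain or a dot-prefixed suffix avoids false positives from lookalike hostnames.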

Core KPIs to track:

  • Share of Model (SoM): % of target prompts where your brand is mentioned in the response
  • Citation Rate: % of prompts where you are linked as a clickable source
  • Sentiment: Tone (neutral/positive/negative) in responses that mention you
  • Position in synthesis: 1st cited source = more clicks than the following ones
  • GA4 AI Search referral: Real traffic measurable via UTMs or channel attribution
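The first two KPIs are simple ratios over a fixed prompt set, so a manual test log is enough to compute them. A minimal sketch, assuming each tested prompt is recorded with two booleans; the `PromptResult` structure and the sample data are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PromptResult:
    prompt: str
    mentioned: bool  # brand named in the generated answer
    cited: bool      # brand linked as a clickable source


def share_of_model(results: list[PromptResult]) -> float:
    """Percentage of prompts whose answer mentions the brand."""
    if not results:
        return 0.0
    return 100 * sum(r.mentioned for r in results) / len(results)


def citation_rate(results: list[PromptResult]) -> float:
    """Percentage of prompts whose answer links the brand as a source."""
    if not results:
        return 0.0
    return 100 * sum(r.cited for r in results) / len(results)


log = [
    PromptResult("best EU cloud hosting for SMBs", mentioned=True, cited=True),
    PromptResult("Next.js agency in Belgium", mentioned=True, cited=False),
    PromptResult("technical SEO audit checklist", mentioned=True, cited=False),
    PromptResult("what is GEO", mentioned=False, cited=False),
]

som = share_of_model(log)
rate = citation_rate(log)
print(f"Share of Model: {som:.1f}%  Citation Rate: {rate:.1f}%")
```

Keeping the prompt set fixed from month to month is what makes the two percentages comparable over time.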