The AEO audit: What it actually measures (and how to run one)

Jun 11, 2026

by Ema Fulga

Ema is an AI Search Content Strategist and GEO (Generative Engine Optimisation) expert. She's also the founder of decipher., an AEO (Answer Engine Optimisation) agency that helps brands appear where people are now searching: AI-powered platforms like Perplexity, ChatGPT, Gemini, and others. With a background in copywriting and creative strategy, she's on a mission to turn messy messaging into clear and structured content that helps brands get mentioned and cited in AI searches.

Connect with Ema.

Last updated 16.06.2026

Most brands are still asking the wrong question. "Do we rank on Google?" is 2019 thinking. The question that matters now is simpler and more uncomfortable: when someone asks ChatGPT, Perplexity, or Gemini about your category, does your brand come up?

An AI search audit answers that question with actual data. Not a gut feeling. Not a one-off screenshot of a ChatGPT conversation your marketing intern ran on a Tuesday. A structured, repeatable measurement of where you stand across the AI platforms that are increasingly where decisions begin.

Here's the uncomfortable truth: most brands have no idea what AI says about them. And some of what it says is wrong.

This is the methodology we use at decipher. when we run an AI search audit for a client. Use it to run one yourself, or use it to pressure-test what an agency hands you.

TL;DR: An AI search audit measures how often your brand appears in AI-generated answers, how it is described, and how it compares with competitors. A proper audit covers six steps: scoping, building a prompt bank, running across multiple models, scoring results, benchmarking competitors, and producing a prioritised action plan. Forty to sixty prompts, run across ChatGPT, Claude, Perplexity, and Gemini, is a credible starting point; below thirty, the data is too noisy to trust.

What an AI search audit actually measures

A serious audit is not a vibe check. It produces metrics you can track over time and compare against competitors. Here's what we measure:

Metric	What it tells you
Mention rate	How often your brand is named across a set of relevant prompts
Citation rate	How often you're named with a link (stronger authority signal, drives actual traffic)
Share of voice	Your mentions as a percentage of all brand mentions in your category
Sentiment	Whether AI describes you positively, neutrally, or... less helpfully
Position in lists	Where you land when AI returns a ranked list. First and fifth are not the same thing
Hallucination rate	How often AI states factual errors about your brand: invented products, wrong pricing, made-up claims

That last one tends to surprise people. AI doesn't always get it right. A model confidently describing your service incorrectly is a reputational problem, not just a visibility one. It's worth tracking explicitly.

A high mention rate with poor sentiment is not a win. The metrics only make sense together.
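To make the definitions above concrete, here is a minimal Python sketch of how the rate metrics could be computed from a set of scored answers. The `Answer` fields, the brand names, and the choice of denominators (all answers for mention and citation rate, answers that mention the brand for hallucination rate) are assumptions for illustration, not a fixed standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Answer:
    """One scored AI answer. Field names are hypothetical."""
    brands_named: list[str] = field(default_factory=list)   # every brand named in the answer
    brands_linked: list[str] = field(default_factory=list)  # brands named with a link
    has_error: bool = False  # the answer states something false about our brand


def audit_metrics(answers: list[Answer], brand: str) -> dict[str, float]:
    total = len(answers)
    mentions = sum(brand in a.brands_named for a in answers)
    citations = sum(brand in a.brands_linked for a in answers)
    # All brand mentions in the category: each brand counts once per answer.
    all_mentions = sum(len(set(a.brands_named)) for a in answers)
    errors = sum(a.has_error for a in answers if brand in a.brands_named)
    return {
        "mention_rate": mentions / total if total else 0.0,
        "citation_rate": citations / total if total else 0.0,
        "share_of_voice": mentions / all_mentions if all_mentions else 0.0,
        "hallucination_rate": errors / mentions if mentions else 0.0,
    }


# Made-up sample: four answers, three fictional brands.
sample = [
    Answer(brands_named=["Acme", "Globex"], brands_linked=["Acme"]),
    Answer(brands_named=["Globex"]),
    Answer(brands_named=["Acme", "Initech"], has_error=True),
    Answer(),
]
metrics = audit_metrics(sample, "Acme")
```

Sentiment and list position are left out here because they need a scoring scale of their own; the point is only that each metric should have an explicit numerator and denominator before you start tracking it.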

The methodology: how we run an AI search audit

Step 1: Scope it properly

Before a single prompt gets written, you need three things locked in: your target audience (who they are, where they are, what language they search in), the buying-journey questions your brand needs to win, and your competitor set. Four to six direct rivals are enough.

Without this frame, prompts drift toward generic queries and the results become meaningless. "Best [category] brand" is not a useful prompt. "Which [category] brand is best for [specific use case] in [geography]?" is.

Step 2: Build a prompt bank

Forty to sixty prompts is a credible starting point for a single market. Below thirty, statistical noise overwhelms the signal. Above two hundred, you're generating resolution you can't act on.

The key is stratification: spread prompts across the buying journey.

Discovery: "What should I look for in a [category] provider?"
Comparison: "What are the differences between [Brand A] and [Brand B]?"
Decision: "Which [category] brand is recommended for [specific need]?"
Branded: "What does [your brand] specialise in?" (reveals what AI thinks it knows about you)

Around 20-30% of prompts should be branded. The rest stay non-branded to measure organic visibility.
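A quick sanity check on a prompt bank could look like the sketch below. It counts prompts per buying-journey stage and tests the bank against the thresholds mentioned above (under thirty is too noisy, over two hundred is too much, 20-30% branded). The function name, the `(stage, prompt)` tuple layout, and the placeholder prompts are hypothetical.

```python
from __future__ import annotations

from collections import Counter

STAGES = ("discovery", "comparison", "decision", "branded")


def check_prompt_bank(bank: list[tuple[str, str]]) -> dict:
    """Summarise a bank of (stage, prompt) pairs against the article's thresholds."""
    counts = Counter(stage for stage, _ in bank)
    unknown = set(counts) - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stage(s): {sorted(unknown)}")
    size = len(bank)
    branded_share = counts["branded"] / size if size else 0.0
    return {
        "size": size,
        "per_stage": {stage: counts[stage] for stage in STAGES},
        "branded_share": branded_share,
        "too_small": size < 30,    # noise overwhelms the signal
        "too_large": size > 200,   # more resolution than you can act on
        "branded_ok": 0.20 <= branded_share <= 0.30,
    }


# Placeholder bank: 12 prompts per stage, 48 in total.
bank = [(stage, f"{stage} prompt {i}") for stage in STAGES for i in range(12)]
report = check_prompt_bank(bank)
```

Running the check before every audit cycle makes it harder for the bank to drift toward one stage without anyone noticing.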

Step 3: Run across multiple models

Testing one model is an anecdote. A credible audit in 2026 covers ChatGPT, Claude, Perplexity, and Gemini as a minimum. Each has different training data, different source-selection logic, and different citation behaviour.

Run each prompt in both native mode (model answers from training data) and web mode (model browses live). They measure different things. A brand that performs well in web mode but poorly in native mode has a recency problem: its authority signals are too new to have made it into training data yet.
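The run plan described above is a simple cross product: every prompt, on every model, in both modes. The sketch below shows that structure only. The `ask` callable is a hypothetical stand-in for however you actually query each platform (by hand, or through whatever access each vendor offers); no real API is assumed here.

```python
from __future__ import annotations

from itertools import product
from typing import Callable

MODELS = ("ChatGPT", "Claude", "Perplexity", "Gemini")
MODES = ("native", "web")


def run_matrix(prompts: list[str], ask: Callable[[str, str, str], str]) -> list[dict]:
    """Collect one answer per (prompt, model, mode) combination."""
    runs = []
    for prompt, model, mode in product(prompts, MODELS, MODES):
        runs.append({
            "prompt": prompt,
            "model": model,
            "mode": mode,
            "answer": ask(model, mode, prompt),
        })
    return runs


def fake_ask(model: str, mode: str, prompt: str) -> str:
    # Placeholder so the sketch runs without network access.
    return f"[{model}/{mode}] placeholder answer"


runs = run_matrix(["prompt one", "prompt two"], fake_ask)
```

Two prompts produce 16 runs here (2 prompts x 4 models x 2 modes), which is a useful reminder of how quickly the answer count grows with the size of the prompt bank.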

Step 4: Score and extract

For each answer, record:

Is the brand mentioned? (Y/N)
Form of mention: cited with link, named without link, listed in a comparison
Sentiment: positive, neutral, negative
Position when listed
Competitors named in the same answer
URLs cited as sources
Any factual errors

Manual scoring works for a sample of around fifty answers. Beyond that, you need automation: fifty prompts run on four models in both modes already produce 400 answers. Doing this monthly across four models and six competitors by hand is not a strategy; it's a punishment.
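Whether you score by hand or with tooling, it helps to fix the record format first. The following is one possible shape for a scored answer, mirroring the checklist above, with a small CSV export. The class name, field names, and allowed values are assumptions for illustration.

```python
from __future__ import annotations

import csv
import io
from dataclasses import asdict, dataclass, field, fields
from typing import Optional

MENTION_FORMS = {"cited_with_link", "named_without_link", "listed_in_comparison"}
SENTIMENTS = {"positive", "neutral", "negative"}
LIST_FIELDS = ("competitors", "source_urls", "factual_errors")


@dataclass
class ScoredAnswer:
    prompt: str
    model: str
    mentioned: bool
    mention_form: Optional[str] = None   # required when mentioned is True
    sentiment: Optional[str] = None
    position: Optional[int] = None       # rank when the answer is a list
    competitors: list[str] = field(default_factory=list)
    source_urls: list[str] = field(default_factory=list)
    factual_errors: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject records that would silently corrupt the metrics later.
        if self.mentioned and self.mention_form not in MENTION_FORMS:
            raise ValueError("a mentioned brand needs a valid mention_form")
        if self.sentiment is not None and self.sentiment not in SENTIMENTS:
            raise ValueError(f"unknown sentiment: {self.sentiment}")


def to_csv(rows: list[ScoredAnswer]) -> str:
    """Serialise scored answers to CSV text, joining list fields with '; '."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ScoredAnswer)])
    writer.writeheader()
    for row in rows:
        record = asdict(row)
        for key in LIST_FIELDS:
            record[key] = "; ".join(record[key])
        writer.writerow(record)
    return buffer.getvalue()


example = ScoredAnswer(
    prompt="Which [category] brand is recommended for [specific need]?",
    model="Perplexity",
    mentioned=True,
    mention_form="cited_with_link",
    sentiment="positive",
    position=2,
    competitors=["Globex", "Initech"],  # fictional brands
)
csv_text = to_csv([example])
```

Validating at the point of entry keeps a typo such as "postive" from quietly dropping answers out of the sentiment count.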

Step 5: Benchmark against competitors

This is where the audit earns its keep. The numbers only matter in context. A 25% mention rate is strong if your nearest competitor sits at 12%. It's a problem if they're at 45%.

Useful cuts to run:

Share of voice by model (are you invisible on one specific engine?)
Share of voice by funnel stage (visible at discovery, absent at decision?)
Sentiment gap (does a competitor consistently get warmer framing than you?)
Native vs web delta (strong on web, weak on native = your content is too new)
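Two of these cuts can be sketched in a few lines of Python. The row layout (`model`, `mode`, `brands`) and the brand names are hypothetical, and the delta is defined here as web mention rate minus native mention rate, which is one reasonable convention rather than the only one.

```python
from __future__ import annotations

from collections import defaultdict


def share_of_voice_by(rows: list[dict], key: str) -> dict[str, dict[str, float]]:
    """Share of voice per brand, grouped by `key` (for example "model")."""
    counts: dict = defaultdict(lambda: defaultdict(int))
    for row in rows:
        for brand in set(row["brands"]):  # count each brand once per answer
            counts[row[key]][brand] += 1
    result = {}
    for group, brand_counts in counts.items():
        total = sum(brand_counts.values())
        result[group] = {brand: n / total for brand, n in brand_counts.items()}
    return result


def native_web_delta(rows: list[dict], brand: str) -> float:
    """Mention rate in web mode minus mention rate in native mode."""
    rates = {}
    for mode in ("native", "web"):
        subset = [r for r in rows if r["mode"] == mode]
        hits = sum(brand in r["brands"] for r in subset)
        rates[mode] = hits / len(subset) if subset else 0.0
    return rates["web"] - rates["native"]


# Made-up rows with fictional brands.
rows = [
    {"model": "ChatGPT", "mode": "native", "brands": ["Globex"]},
    {"model": "ChatGPT", "mode": "web", "brands": ["Acme", "Globex"]},
    {"model": "Perplexity", "mode": "native", "brands": ["Globex"]},
    {"model": "Perplexity", "mode": "web", "brands": ["Acme"]},
]
sov = share_of_voice_by(rows, "model")
delta = native_web_delta(rows, "Acme")
```

In this toy data Acme appears only in web mode, so the delta is at its maximum: the pattern described above as content that is too new to have reached training data. Adding a `stage` field to each row would let the same grouping function produce the funnel-stage cut.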

Step 6: Prioritised action plan

The audit ends with a written action plan. Not dashboards. Not a PDF of screenshots. A list of no more than ten prioritised actions, each with an estimated effort, the metric it moves, and a clear owner.

More than ten priorities means nothing gets done. We've seen it.

The most common mistakes we see

Too few prompts. Under thirty, a single hostile answer can move your sentiment score by several points. Add prompts by covering more of the buying journey, not by padding the bank with near-duplicates.
No competitive baseline. A 20% mention rate is meaningless without competitor numbers next to it.
One model only. A ChatGPT-only audit misses Perplexity's citation patterns entirely, and Google AI Overviews is a different surface again.
Ignoring hallucinations. A model that invents a product feature or quotes the wrong price hurts you even when it's mentioning you. Track it.
One-shot audits. AI answers shift constantly. A single audit ages within weeks. The real value is in the trend line, not the snapshot.
Confusing mentions with citations. Being named is good. Being named with a link back to your site is better. They are not the same metric.

For a broader look at what AI search optimisation involves beyond the audit itself, our complete guide to AI search optimisation covers the strategic layer.

What happens after the audit

The audit is the baseline. It tells you where you stand.

What it doesn't do is fix anything. Once you have the data, the work shifts to closing the gaps: content structured for AI citation, authority signals, schema markup, third-party mentions. The audit also becomes your recurring measurement instrument. Every quarter, you re-run the same prompt bank (frozen, so the trend line is comparing like-for-like) and track whether the work is moving the numbers.
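One lightweight way to keep the prompt bank frozen is to store a fingerprint of it with each audit and refuse to compare runs whose fingerprints differ. The sketch below uses a SHA-256 hash from Python's standard library; the function name and the decision to ignore prompt order are assumptions.

```python
from __future__ import annotations

import hashlib
import json


def bank_fingerprint(prompts: list[str]) -> str:
    """Stable hash of a prompt bank; any change in wording changes the result."""
    # Sort so that reordering the same prompts does not count as a change.
    canonical = json.dumps(sorted(prompts), ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


baseline = bank_fingerprint([
    "What should I look for in a [category] provider?",
    "What does [your brand] specialise in?",
])
reordered = bank_fingerprint([
    "What does [your brand] specialise in?",
    "What should I look for in a [category] provider?",
])
edited = bank_fingerprint([
    "What should I look for in a [category] vendor?",
    "What does [your brand] specialise in?",
])
```

If a prompt genuinely has to change, treat that as the start of a new baseline rather than a continuation of the old trend line.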

That's the difference between an AI search strategy and a one-off experiment.

If you'd rather not run this yourself, we'll do it for you. Our AI search audit gives you the structured baseline, the competitive benchmark, and the prioritised action plan described above. No dashboards without context. No screenshots without interpretation. Just a clear picture of where you stand and what to do about it.

FAQ: common questions about AI search audits

What is an AI search audit?

An AI search audit measures how often your brand appears in AI-generated answers, how it is described, whether it is cited with a link, and how it compares with competitors. It gives you a baseline for visibility inside tools like ChatGPT, Perplexity, Gemini, and AI Overviews.

How long does an AI search audit take?

A well-run audit covering 50-60 prompts across four models takes a day of structured analysis to produce a written report. If you're doing it manually, add time for the prompt runs themselves. Automated tooling compresses the data collection significantly.

How often should you re-run an AI search audit?

Quarterly is the minimum for most categories. Monthly makes sense in competitive markets where AI answers shift frequently. The prompt bank must stay frozen between runs; otherwise you're comparing different questions and the trend line is meaningless.

Which AI models should be included?

At minimum, test ChatGPT, Claude, Perplexity, and Gemini. They do not use the same source-selection logic, and they do not cite in the same way, so a single-model audit will miss important gaps.

How many prompts do you need for a useful audit?

Forty to sixty prompts is a solid starting point for one market. Fewer than thirty, and the data gets noisy fast. The real trick is covering discovery, comparison, decision, and branded queries so you can see the full picture.

Does an AI search audit replace an SEO audit?

No. They measure different surfaces. SEO and AI search optimisation work together: strong search authority still influences what AI models cite, especially in web-search mode. Think of the AI audit as the layer on top, not a replacement for the foundation.

Should small brands bother with an AI search audit?

Yes. A smaller brand with clear positioning and three direct competitors only needs around 40 prompts. The complexity scales with the number of geographies and languages, not with the brand's size. And smaller brands often find the audit more actionable precisely because the gaps are clearer.

What's the difference between a mention and a citation?

A mention is when AI names your brand in an answer. A citation is when it names you and links to your site. Citations drive referral traffic and carry a stronger authority signal. Both matter, but they're not interchangeable metrics.
