Why AI search visibility needs its own benchmark
Traditional rank trackers measure where a page appears in a list of search results. AI search engines often return a single answer or a short list of sources, so there is no ranked position to track.
To understand visibility, you need to know whether the brand is mentioned, whether the answer is accurate, and whether the site is cited as a source.
A benchmark turns that subjective experience into a repeatable, comparable process.
The platforms we test
- ChatGPT: conversational answers with browsing and reasoning modes.
- Gemini: Google's AI assistant with search grounding.
- Perplexity: answer engine that explicitly cites sources.
- Claude: long-form reasoning assistant used for complex queries.
The methodology
- Prompt categories: branded queries, unbranded service queries, comparison queries and local/Europe-facing queries.
- Branded vs unbranded: we track both direct mentions of "MaxDesign" and citations for broader topics such as "SEO operating system" or "GEO services Serbia".
- Geography and language: prompts are run in English, with Europe-focused context where possible.
- Recording format: each result is logged with date, platform, prompt, answer summary, mention type and source citations when available.
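As an illustration of the recording format, here is a minimal Python sketch of one log entry and a CSV export. The fields follow the list above. The class name `BenchmarkRecord`, the `MENTION_TYPES` labels and the sample values are assumptions made for the example, not the vocabulary or data used in the actual benchmark.

```python
import csv
import io
from dataclasses import dataclass, field
from datetime import date

# Hypothetical labels: the methodology names a "mention type" field
# but does not prescribe its allowed values.
MENTION_TYPES = ("none", "mentioned", "cited")

FIELDS = ["run_date", "platform", "prompt", "answer_summary",
          "mention_type", "citations"]


@dataclass
class BenchmarkRecord:
    run_date: date
    platform: str
    prompt: str
    answer_summary: str
    mention_type: str
    citations: list = field(default_factory=list)  # source URLs, if any

    def __post_init__(self):
        # Reject labels outside the agreed vocabulary so logs stay comparable.
        if self.mention_type not in MENTION_TYPES:
            raise ValueError(f"unknown mention_type: {self.mention_type!r}")

    def to_row(self):
        # Flatten to plain strings so the record fits one CSV line.
        return {
            "run_date": self.run_date.isoformat(),
            "platform": self.platform,
            "prompt": self.prompt,
            "answer_summary": self.answer_summary,
            "mention_type": self.mention_type,
            "citations": " | ".join(self.citations),
        }


def to_csv(records):
    """Serialise records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerow(record.to_row())
    return buffer.getvalue()


# Placeholder entry for illustration only, not a real result.
example = BenchmarkRecord(
    run_date=date(2025, 1, 15),
    platform="Perplexity",
    prompt="What is MaxDesign SEO OS?",
    answer_summary="placeholder summary",
    mention_type="cited",
    citations=["https://example.com/source"],
)
print(to_csv([example]))
```

Keeping the mention label to a small fixed set makes results comparable across platforms and months, whichever labels you settle on.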
The 10 baseline prompts
- Who is Miroslav Radosavljević?
- What is MaxDesign SEO OS?
- Best SEO operating system for European companies.
- GEO services Serbia.
- AEO services Europe.
- AI SEO expert in Serbia.
- Entity SEO services Belgrade.
- MaxDesign vs traditional SEO agency.
- How to test AI search visibility?
- ChatGPT visibility for small brands.
Latest results snapshot
Public validation is in progress. The first manual baseline is being compiled and will be published here once the dataset is ready for public use.
We do not cherry-pick favourable results. The benchmark will show where MaxDesign is cited, where it is missing and what we change as a result.
What we do with the results
- Every missing mention becomes a content, schema or authority task inside the SEO OS.
- Inaccurate answers are corrected by improving entity clarity and source signals.
- Prioritisation follows patterns across platforms, not guesswork.
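One way to surface those cross-platform patterns is to group missing mentions by prompt and rank the prompts by how many platforms miss the brand. The sketch below assumes each result is a dict with `prompt`, `platform` and `mentioned` keys. That shape, the function name `gaps_by_prompt` and the sample rows are hypothetical, and deciding whether a gap becomes a content, schema or authority task remains a human judgement.

```python
from collections import defaultdict


def gaps_by_prompt(results):
    """Return (prompt, platforms_missing) pairs, widest gaps first."""
    gaps = defaultdict(list)
    for result in results:
        if not result["mentioned"]:
            gaps[result["prompt"]].append(result["platform"])
    # Sort by number of platforms missing (descending), then by prompt text.
    return sorted(
        ((prompt, sorted(platforms)) for prompt, platforms in gaps.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )


# Made-up rows for illustration only, not benchmark results.
sample = [
    {"prompt": "GEO services Serbia", "platform": "ChatGPT", "mentioned": False},
    {"prompt": "GEO services Serbia", "platform": "Gemini", "mentioned": False},
    {"prompt": "What is MaxDesign SEO OS?", "platform": "Claude", "mentioned": False},
    {"prompt": "What is MaxDesign SEO OS?", "platform": "Perplexity", "mentioned": True},
]

for prompt, platforms in gaps_by_prompt(sample):
    print(f"{prompt}: missing on {', '.join(platforms)}")
```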
Limitations and transparency
AI answers are non-deterministic. The same prompt can produce different results on different days.
We run prompts manually to keep the process transparent and easy to repeat, but this means we run them less often than automated scraping would allow.
No benchmark can guarantee future visibility. It can only expose gaps and guide action.
How to run your own benchmark
- 1) Choose 5–10 prompts that cover branded, unbranded and comparison intent.
- 2) Run them on the same day across ChatGPT, Gemini, Perplexity and Claude.
- 3) Record answer summary, brand mention, source citations and any factual errors.
- 4) Repeat monthly and compare trends, not single results.
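To compare trends rather than single results, the monthly logs can be reduced to a mention rate per platform and the change from one month to the next. This is a minimal sketch under assumed inputs: each row is a `(month, platform, mentioned)` tuple with the month written as `YYYY-MM`. The function names and sample rows are hypothetical.

```python
from collections import defaultdict


def mention_rates(rows):
    """Share of prompts with a brand mention, keyed by (platform, month)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for month, platform, mentioned in rows:
        key = (platform, month)
        totals[key] += 1
        if mentioned:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


def month_over_month(rates, platform):
    """Change in mention rate between consecutive logged months."""
    # "YYYY-MM" strings sort chronologically, so a plain sort is enough.
    months = sorted(month for (name, month) in rates if name == platform)
    return [
        (current, rates[(platform, current)] - rates[(platform, previous)])
        for previous, current in zip(months, months[1:])
    ]


# Made-up rows for illustration only, not benchmark results.
rows = [
    ("2025-01", "Perplexity", True),
    ("2025-01", "Perplexity", False),
    ("2025-02", "Perplexity", True),
    ("2025-02", "Perplexity", True),
]

rates = mention_rates(rows)
print(month_over_month(rates, "Perplexity"))
```

Because the same prompt can produce different answers on different days, a rate taken over all prompts in a month is a steadier signal than any single answer.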