AI crawlers from ChatGPT, Perplexity, Claude, and others need to access your site to cite your brand. Here is exactly how to configure robots.txt for maximum AI visibility.
Your robots.txt file tells web crawlers what they can and cannot access. Traditionally, that meant search engine bots. Now it also covers AI crawlers from ChatGPT, Perplexity, Claude, and others.
The decision to allow or block AI crawlers has direct consequences for your AEO (answer engine optimization). Here is what you need to know.
| Crawler | AI Engine | Company |
|---|---|---|
| GPTBot | ChatGPT | OpenAI |
| ChatGPT-User | ChatGPT (browsing) | OpenAI |
| ClaudeBot | Claude | Anthropic |
| PerplexityBot | Perplexity | Perplexity AI |
| Google-Extended | Gemini (training) | Google |
| Googlebot | Google AI Overviews | Google |
| FacebookBot | Meta AI | Meta |
| Applebot-Extended | Apple Intelligence | Apple |
| cohere-ai | Cohere | Cohere |
If you want AI engines to cite your brand, those engines need to be able to read your content. Blocking AI crawlers prevents:
The math is simple: if AI bots cannot crawl your site, AI engines cannot cite you. And if AI engines cannot cite you, you have zero presence in the AI research sessions that influence your buyers.
There are legitimate reasons to block specific AI crawlers:
If these apply to specific sections of your site, use path-specific robots.txt rules rather than blanket blocks.
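One way to sanity-check path-specific rules before deploying them is to parse the file locally. The sketch below uses Python's standard-library `urllib.robotparser`. The rule set and the helper name `can_crawl` are hypothetical, chosen only for illustration. Note one assumption about the parser: `urllib.robotparser` applies rules in file order (first match wins), whereas RFC 9309 crawlers use the longest matching path, so the sketch lists the `Disallow` lines before `Allow: /` to get the same answer under both interpretations.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: GPTBot may crawl everything except two private sections.
# Disallow lines come first because urllib.robotparser uses first-match order.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/
Disallow: /customer-data/
Allow: /
"""


def can_crawl(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, f"https://yourdomain.com{path}")


for path in ("/blog/post", "/private/report", "/customer-data/export"):
    print(path, can_crawl(ROBOTS_TXT, "GPTBot", path))
```

With these rules, the blog path is reported as crawlable and the two private paths are not. A crawler with no matching `User-agent` group, such as ClaudeBot here, is unaffected by GPTBot's rules.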
To allow the major AI crawlers across your whole site:

    User-agent: GPTBot
    Allow: /

    User-agent: ClaudeBot
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    User-agent: Google-Extended
    Allow: /

    User-agent: FacebookBot
    Allow: /

To allow crawling while keeping specific paths off-limits:

    User-agent: GPTBot
    Allow: /
    Disallow: /private/
    Disallow: /customer-data/

    User-agent: Google-Extended
    Allow: /
    Disallow: /internal/

To block specific crawlers entirely:

    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

TrueCite calculates an AI crawler permission score based on how many of the 9 major AI crawlers can access your site. The score affects your overall AEO health.
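The idea of counting how many of the 9 crawlers have access can be approximated locally. TrueCite's actual scoring formula is not described here, so the sketch below is only an assumption: it reports the fraction of the listed crawler tokens that may fetch the site root. The function names `allowed_crawlers` and `permission_score` are hypothetical, and the parsing relies on Python's standard-library `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# The nine crawler tokens listed in the table earlier in this article.
AI_CRAWLERS = (
    "GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot",
    "Google-Extended", "Googlebot", "FacebookBot",
    "Applebot-Extended", "cohere-ai",
)


def allowed_crawlers(robots_txt: str, url: str = "https://yourdomain.com/") -> list[str]:
    """Return the crawler tokens that may fetch `url` under this robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if parser.can_fetch(agent, url)]


def permission_score(robots_txt: str) -> float:
    """Fraction of the listed crawlers allowed at the site root (assumed formula)."""
    return len(allowed_crawlers(robots_txt)) / len(AI_CRAWLERS)


# Example: a robots.txt that blocks GPTBot and ClaudeBot entirely.
BLOCKING_RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

print(allowed_crawlers(BLOCKING_RULES))
print(round(permission_score(BLOCKING_RULES), 2))
```

This only checks the site root. A fuller check would test the specific URLs you want cited, since a crawler can be allowed at `/` and still be blocked from the pages that matter.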
The `<meta name="robots" content="noai, noimageai">` tag signals to some AI systems not to use page content for training. Use it selectively: applied site-wide, it may reduce your AI citation potential.
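To find out which pages already carry this tag, you can scan their HTML. The following is a minimal sketch using Python's standard-library `html.parser`. The class name `RobotsMetaParser` and the helper `has_noai` are hypothetical, and the sketch only inspects `<meta name="robots">` tags, not HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noai, noimageai".
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def has_noai(html: str) -> bool:
    """Return True if the page's robots meta tag includes the noai directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noai" in parser.directives


page = '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
print(has_noai(page))
```

Running a check like this over a list of URLs makes it easier to confirm the tag appears only on the pages where you intended it.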
If you have a high-traffic site and want to control AI crawler load:

    User-agent: GPTBot
    Allow: /
    Crawl-delay: 10

`Crawl-delay` is a non-standard directive, so not every crawler honors it. Googlebot, for example, ignores it.

Always include your sitemap in robots.txt. This helps all crawlers, including AI ones, discover your content:

    Sitemap: https://yourdomain.com/sitemap.xml

Use TrueCite's Crawler Check to see:
[Check your AI crawler status with TrueCite →](/dashboard/crawlers)