robots.txt
A file that tells web crawlers and AI bots which parts of your site they can access and crawl.
robots.txt is a plain-text file placed at the root of a website (e.g. example.com/robots.txt) that instructs web crawlers which pages they may and may not access. In the context of AI search, robots.txt has taken on new importance as AI companies deploy their own crawlers, each identified by its own user-agent string.
AI-specific crawlers
Major AI companies use dedicated crawlers:
- GPTBot (OpenAI): Crawls content that may be used to train OpenAI's models, including those behind ChatGPT
- ClaudeBot (Anthropic): Crawls content for Claude
- Google-Extended: A control token rather than a separate crawler; it tells Google whether content fetched by Googlebot may be used for AI training, without affecting Search indexing
- PerplexityBot (Perplexity): Crawls the web to build Perplexity's AI search index
- CCBot (Common Crawl): Builds an open web dataset that many AI models are trained on
Configuring robots.txt for AI
You can selectively allow or block AI crawlers:
```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /
```
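Blocking works the same way, with Disallow rules in a crawler's user-agent group. A sketch that keeps GPTBot out of one section while allowing the rest (the /private/ path is illustrative):

```
User-agent: GPTBot
Disallow: /private/
Allow: /
```

A crawler follows the most specific user-agent group that matches it, so a GPTBot-specific group overrides any `User-agent: *` rules for that bot.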
The AI crawl dilemma
Website owners face a strategic decision:
- Allow AI crawlers: Your content can be used for training and cited in AI responses, increasing visibility
- Block AI crawlers: Keep future crawls of your content out of AI training data (content already crawled may persist in existing models), but potentially reduce AI visibility
Best practice for GEO
For brands seeking AI visibility, the recommended approach is to allow AI crawlers access to your public content while monitoring how that content is used in AI responses.
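Whichever policy you choose, it is worth verifying that your rules behave as intended before deploying them. A minimal sketch using Python's standard urllib.robotparser; the rules and URLs here are illustrative:

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: GPTBot may crawl everything except /private/
rules = """
User-agent: GPTBot
Disallow: /private/
Allow: /
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Check what GPTBot is permitted to fetch under these rules
print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))     # True
print(parser.can_fetch("GPTBot", "https://example.com/private/data"))  # False
```

In production you would point RobotFileParser at your live file with set_url() and read() instead of parsing a string, which also catches deployment mistakes such as the file not being served at the site root.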
