robots.txt
A file that tells web crawlers and AI bots which parts of your site they can access and crawl.
robots.txt is a plain-text file placed at the root of a website (e.g. example.com/robots.txt) that instructs web crawlers which pages they may and may not access. In the context of AI search, robots.txt has taken on new importance as AI companies deploy their own crawlers, each identified by its own user-agent string.
AI-specific crawlers
Major AI companies use dedicated crawlers:
- GPTBot (OpenAI): Crawls content that may be used to train OpenAI's models, including those behind ChatGPT
- ClaudeBot (Anthropic): Crawls content for Claude
- Google-Extended: A control token rather than a separate crawler; it tells Google whether content fetched by Googlebot may be used for AI training, without affecting Search indexing
- PerplexityBot (Perplexity): Crawls the web to build Perplexity's AI search index
- CCBot (Common Crawl): Builds an open web dataset that many AI models are trained on
Configuring robots.txt for AI
You can selectively allow or block AI crawlers:
```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /
```
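Blocking works the same way, with Disallow rules in a crawler's user-agent group. A sketch that keeps GPTBot out of one section while allowing the rest (the /private/ path is illustrative):

```
User-agent: GPTBot
Disallow: /private/
Allow: /
```

A crawler follows the most specific user-agent group that matches it, so a GPTBot-specific group overrides any `User-agent: *` rules for that bot.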
The AI crawl dilemma
Website owners face a strategic decision:
- Allow AI crawlers: Your content can be used for training and cited in AI responses, increasing visibility
- Block AI crawlers: Keep future crawls of your content out of AI training data (content already crawled may persist in existing models), but potentially reduce AI visibility
Best practice for GEO
For brands seeking AI visibility, the recommended approach is to allow AI crawlers access to your public content while monitoring how that content is used in AI responses.
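Whichever policy you choose, it is worth verifying that your rules behave as intended before deploying them. A minimal sketch using Python's standard urllib.robotparser; the rules and URLs here are illustrative:

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: GPTBot may crawl everything except /private/
rules = """
User-agent: GPTBot
Disallow: /private/
Allow: /
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Check what GPTBot is permitted to fetch under these rules
print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))     # True
print(parser.can_fetch("GPTBot", "https://example.com/private/data"))  # False
```

In production you would point RobotFileParser at your live file with set_url() and read() instead of parsing a string, which also catches deployment mistakes such as the file not being served at the site root.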
