Blog / Apr 25, 2026

Should you let ChatGPT, Perplexity, and Claude crawl your site?

by Connor

Sometime in 2024, every small-business website quietly acquired a new SEO question nobody asked for: should we let the AI crawlers in? GPTBot, PerplexityBot, ClaudeBot, anthropic-ai, Google-Extended. Most site owners do not know these exist, and the default robots.txt does not mention them. That silence is itself an answer — just not a deliberate one.

The short answer for almost every small business

Allow them. The cost is near zero (the major AI crawlers respect robots.txt and crawl politely), and the upside is real and growing fast: when somebody asks ChatGPT "best HVAC contractor in Rochester" or "how do I price a kitchen remodel in Greece NY," your business should be in the answer. The way you get there is by letting the underlying models, and the live-search crawlers that feed them, read your site.
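If your robots.txt says nothing about a crawler, that crawler is allowed by default, but being explicit removes the ambiguity. A minimal sketch for a single agent (the stanza format is standard robots.txt; the comment is ours):

```
# OpenAI's training crawler
User-agent: GPTBot
Allow: /
```

The full list of agents worth naming comes below.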

When you might want to block instead

There is exactly one common case where blocking makes sense: your content is the product, not the marketing. Paywalled news sites, original research databases, large reference works. For those, a model trained on the content is a direct competitor, and the case for blocking is real. For a contractor, a consultancy, or a hospitality business, your content is marketing: it exists to bring customers to your door. The AI crawlers carry that marketing into a new distribution channel for free.
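For a site in that camp, the opt-out is the mirror image of the allow list. A sketch blocking the main AI crawlers (this stops future crawls only; it does not remove anything a model has already ingested):

```
# Content-is-the-product sites: opt out of AI crawling and training
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Google-Extended is a control token rather than a crawler: blocking it opts your content out of Gemini training without touching your normal Google search presence.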

How to be explicit

Default robots.txt files do not mention AI crawlers because they were written before AI crawlers existed. Being explicit just means listing each agent with its own allow rule, as in the sketch below: GPTBot, ChatGPT-User, OAI-SearchBot, PerplexityBot, ClaudeBot, Claude-Web, anthropic-ai, Google-Extended, Bingbot, Googlebot. Each one gets a short stanza ending in "Allow: /". On every site we ship, fourteen AI-aligned agents are explicitly allowed; the ten above are the core of that list.
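Written out, that policy is a series of short stanzas. A sketch of the allow section (agent tokens as their vendors publish them; one stanza per agent):

```
# AI assistants and AI search
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: Google-Extended
Allow: /

# Classic search
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /
```

Because robots.txt applies the most specific matching User-agent group, naming an agent explicitly also insulates it from any broad "User-agent: *" rules you add later.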

And block the bad ones, while you are at it

The same robots.txt is the right place to block the abusive crawlers: SemrushBot in non-business mode, AhrefsBot when not licensed, MJ12bot, DotBot, BLEXBot, Bytespider when behaving badly. These are the crawlers that hammer your server to feed SEO data farms or scrape your site to resell. Blocking them at the robots.txt layer is the polite first step (most of them respect it); the middleware layer, sketched below, is for the ones that do not.
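The deny side of the same file, as a sketch (tokens as these crawlers commonly publish them; check your server logs for the exact strings you actually see):

```
# SEO data farms and bulk scrapers
User-agent: SemrushBot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: BLEXBot
Disallow: /

User-agent: Bytespider
Disallow: /
```

For the ones that ignore robots.txt, the check moves into the request path. A minimal sketch as Next.js middleware (the framework and the exact blocklist are assumptions, not a prescription; any server that can read the User-Agent header can do the same thing):

```typescript
// middleware.ts: a hypothetical Next.js edge middleware sketch
import { NextRequest, NextResponse } from "next/server";

// Substrings matched case-insensitively against the User-Agent header.
// Illustrative only; tune this list against your own traffic.
const BLOCKED_AGENTS = ["mj12bot", "dotbot", "blexbot", "bytespider"];

export function middleware(request: NextRequest) {
  const ua = (request.headers.get("user-agent") ?? "").toLowerCase();

  // 403 anything on the blocklist; pass everything else through untouched.
  if (BLOCKED_AGENTS.some((agent) => ua.includes(agent))) {
    return new NextResponse("Blocked", { status: 403 });
  }
  return NextResponse.next();
}
```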

Why this matters more in 2026

Search is splitting in two. The classic Google ranking layer is still where most discovery happens. The AI answer layer (ChatGPT with web search, Perplexity, Google AI Overviews) is where a fast-growing share of decision research happens. Being well indexed in one and silent in the other is a self-imposed handicap. Allowing the AI crawlers takes about ninety seconds of robots.txt editing. The compound payoff is being one of the small number of authoritative sources the AI layer actually cites.

Still here? Let's talk about your site.

Book a 20-minute discovery call. We will tell you honestly whether a rebuild is worth it — and if it is, what tier fits.