The web crawling landscape has fundamentally shifted. While Googlebot dominated for decades, AI crawlers like GPTBot and ClaudeBot now represent a growing force that demands entirely different optimisation strategies. For Australian businesses targeting both traditional search and AI-powered discovery, understanding these technical differences isn't just beneficial, it's becoming essential.

The Great Divide: JavaScript Execution

The most critical difference between traditional and AI crawlers lies in their JavaScript handling capabilities. GPTBot and ClaudeBot cannot execute JavaScript, while Googlebot has full rendering capabilities through its browser-based infrastructure, according to Cloudflare's 2025 analysis.

This creates a fundamental challenge for modern websites built with React, Vue, or Angular. Consider this common scenario:

```javascript
// This content is invisible to AI crawlers
useEffect(() => {
  setContent("Critical business information loaded dynamically")
}, [])
```

Vercel's analysis shows that JavaScript files account for 11.50% of the ChatGPT crawler's requests and 23.84% of Claude's, but both crawlers fetch these files only as text for training purposes and never execute them. The implication is significant: any content that depends on client-side rendering is completely invisible to AI systems.

CRITICAL UPDATE: OpenAI's December 2025 Policy Change

On 9 December 2025, OpenAI made a significant policy change that fundamentally alters how ChatGPT-User interacts with websites. ChatGPT-User no longer respects robots.txt directives for user-initiated browsing actions, as reported by PPC Land and Stan Ventures.

This creates a three-tiered system for OpenAI's crawler family:

GPTBot (Training Crawler): Still respects robots.txt. Used for model training data collection. User agent: GPTBot/1.0

OAI-SearchBot (Search Indexing): Still respects robots.txt. Used for search feature indexing. User agent: OAI-SearchBot/1.0

ChatGPT-User (User-Initiated Browsing): NO LONGER respects robots.txt. Used for real-time web browsing when users ask ChatGPT to visit sites. User agent: ChatGPT-User/1.0
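
Because the three user agents above behave differently, it can help to classify them server-side before deciding how to respond. The sketch below is illustrative only: the function name and tier labels are ours, not an official OpenAI scheme.

```javascript
// Minimal sketch: classify a request by OpenAI crawler tier from its user agent.
// The tier labels and function name are illustrative, not an official API.
function classifyOpenAICrawler(userAgent = '') {
  if (/OAI-SearchBot/i.test(userAgent)) return 'search-indexing' // respects robots.txt
  if (/ChatGPT-User/i.test(userAgent)) return 'user-browsing'    // ignores robots.txt
  if (/GPTBot/i.test(userAgent)) return 'training'               // respects robots.txt
  return null // not an OpenAI crawler
}

// Example usage inside an Express handler:
// const tier = classifyOpenAICrawler(req.headers['user-agent'])
```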

Why This Matters for Australian Businesses

The policy change means traditional robots.txt blocking is no longer sufficient for controlling AI crawler access. If you've blocked GPTBot thinking you've prevented ChatGPT from accessing your content, you're only blocking training data collection. User-initiated ChatGPT browsing sessions will bypass those restrictions entirely.

Alternative Blocking Strategies

Since robots.txt is ineffective against ChatGPT-User, consider these approaches:

```javascript
// Server-side user agent detection and blocking (Express middleware)
const express = require('express')
const app = express()

app.use((req, res, next) => {
  const userAgent = req.headers['user-agent'] || ''
  if (/ChatGPT-User/i.test(userAgent)) {
    return res.status(403).json({
      error: 'Access denied',
      message: 'ChatGPT-User access not permitted'
    })
  }
  next()
})
```

Or implement IP-based blocking if OpenAI publishes their IP ranges (though this is more fragile and maintenance-intensive).
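
If you do go down the IP-based route, here is a minimal sketch that continues the Express setup above. The addresses are placeholders from a documentation range, not real OpenAI IPs; a production version would match requests against whatever CIDR ranges OpenAI publishes, using a proper CIDR-matching library, and refresh the list regularly.

```javascript
// Minimal sketch of IP-based blocking. The addresses below are placeholders
// (TEST-NET-3 documentation range), NOT real OpenAI ranges; replace them with
// published ranges and use a CIDR matcher in production.
const BLOCKED_CRAWLER_IPS = new Set([
  '203.0.113.10', // placeholder
  '203.0.113.11'  // placeholder
])

app.use((req, res, next) => {
  // req.ip may be an IPv4-mapped IPv6 address (e.g. "::ffff:203.0.113.10"), so normalise it
  const ip = (req.ip || '').replace(/^::ffff:/, '')
  if (BLOCKED_CRAWLER_IPS.has(ip)) {
    return res.status(403).send('Access denied')
  }
  next()
})
```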

In practice, many businesses will allow ChatGPT-User access anyway: blocking it means users can't ask ChatGPT to analyse or summarise your content, which may reduce your visibility in AI-powered discovery.

Performance Impact: Speed Matters More Than Ever

Traditional crawlers like Googlebot operate with a 3-minute timeout for most operations, but AI crawlers are far less patient: they expect fast responses and will abandon slow-loading pages, making performance optimisation critical for AI visibility.

The scale of AI crawler traffic is substantial. GPTBot alone generated 569 million requests across Vercel's network in a single month, representing a 305% increase in raw requests. This surge has pushed GPTBot's market share among AI crawlers from 5% to 30% between May 2024 and May 2025.

Server-Side Rendering: No Longer Optional

For AI crawler compatibility, server-side rendering shifts from "best practice" to practical necessity. Here's how to implement SSR effectively, using Next.js as the example framework:

Next.js Implementation

```javascript
// pages/_document.js - Critical content must be server-rendered
import Document, { Html, Head, Main, NextScript } from 'next/document'

// AI crawler detection patterns (split for readability)
const AI_CRAWLERS = /GPTBot|ClaudeBot|ChatGPT-User|CCBot|anthropic-ai/i
const AI_CRAWLERS_EXT = /Claude-Web|Google-Extended|FacebookBot/i
const AI_CRAWLERS_OTHER = /meta-externalagent|OAI-SearchBot|PerplexityBot/i

class MyDocument extends Document {
  static async getInitialProps(ctx) {
    const userAgent = ctx.req.headers['user-agent'] || ''
    const isAICrawler = (
      AI_CRAWLERS.test(userAgent) ||
      AI_CRAWLERS_EXT.test(userAgent) ||
      AI_CRAWLERS_OTHER.test(userAgent)
    )
    const initialProps = await Document.getInitialProps(ctx)
    return { ...initialProps, isAICrawler }
  }

  render() {
    return (
      <Html>
        <Head>
          <script
            type="application/ld+json"
            dangerouslySetInnerHTML={{
              __html: JSON.stringify({
                "@context": "https://schema.org",
                "@type": "Organization", // schema.org requires the US spelling
                "name": "Your Organisation",
                "description": "Complete description for AI"
              })
            }}
          />
        </Head>
        <body>
          <Main />
          <NextScript />
        </body>
      </Html>
    )
  }
}

export default MyDocument
```

Selective Rendering Strategy

Implement intelligent content delivery based on crawler type:

```javascript
// Detect AI crawlers and serve optimised content
const AI_CRAWLER_PATTERNS = [
  /GPTBot/i, /ClaudeBot/i, /ChatGPT-User/i, /CCBot/i,
  /Google-Extended/i, /FacebookBot/i, /meta-externalagent/i,
  /anthropic-ai/i, /Claude-Web/i, /OAI-SearchBot/i, /PerplexityBot/i
]

function optimiseForAICrawler(userAgent, content) {
  const isAICrawler = AI_CRAWLER_PATTERNS.some(
    pattern => pattern.test(userAgent)
  )
  if (isAICrawler) {
    return {
      html: generateCompleteHTML(content),
      structuredData: includeAllSchema(content)
    }
  }
  return content
}
```

Structured Data: The Server-Side Imperative

AI crawlers' JavaScript limitations create a critical constraint for structured data implementation. Schema markup added through Google Tag Manager or client-side JavaScript remains completely invisible to AI systems, as Search Engine Journal notes.

Implementation Requirements

```html
<!-- GOOD: Server-rendered structured data -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Webcoda Pty Ltd",
  "description": "20 years delivering digital solutions",
  "address": {
    "@type": "PostalAddress",
    "addressCountry": "AU",
    "addressLocality": "Sydney"
  }
}
</script>
```
```javascript
// BAD: JavaScript-injected schema (invisible to AI crawlers)
useEffect(() => {
  const script = document.createElement('script')
  script.type = 'application/ld+json'
  script.textContent = JSON.stringify(structuredData)
  document.head.appendChild(script)
}, [])
```

Note the spelling in the structured data: schema.org type names must use the American spelling ("Organization") or validators and AI parsers won't recognise the type, while human-readable values such as your organisation's name and description can keep Australian English for consistency with the rest of your site.

User Agent Detection and Traffic Management

Current AI crawler user agents (December 2025):

OpenAI Crawler Family

```text
GPTBot:        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)"
OAI-SearchBot: "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)"
ChatGPT-User:  "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ChatGPT-User/1.0; +https://openai.com/bot)"
```

Anthropic Crawler Family

```text
ClaudeBot:    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)"
anthropic-ai: "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; anthropic-ai/1.0; +https://www.anthropic.com/bot)"
```

Other Major AI Crawlers

```text
PerplexityBot:   "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://docs.perplexity.ai/docs/perplexity-bot)"
Google-Extended: "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Google-Extended/1.0)"
FacebookBot:     "Mozilla/5.0 (compatible; FacebookBot/1.0; +https://www.facebook.com/externalhit_uatext.php)"
```

Meta AI Crawlers: The Hidden Giant

According to Fastly's Q2 2025 Threat Insights Report, Meta's crawler family (FacebookBot and meta-externalagent) represents 52% of all AI crawler traffic, making it the largest AI crawler operation by volume. Despite this massive footprint, Meta's crawlers receive far less attention than OpenAI's GPTBot. Australian businesses ignoring Meta's crawlers are missing optimisation opportunities for more than half of AI crawler traffic.

Traffic Control Implementation

With AI crawlers potentially hitting websites with over 39,000 requests per minute, traffic management becomes crucial. However, the December 2025 OpenAI policy change complicates the traditional robots.txt approach:

```text
# robots.txt configuration (December 2025)

# Training crawler - RESPECTS robots.txt
User-agent: GPTBot
Crawl-delay: 10
Disallow: /admin/
Disallow: /private/

# Search indexing - RESPECTS robots.txt
User-agent: OAI-SearchBot
Crawl-delay: 10
Disallow: /admin/

# User browsing - IGNORES robots.txt (9 December 2025)
User-agent: ChatGPT-User
Disallow: /

# Other AI crawlers
User-agent: ClaudeBot
Crawl-delay: 10
Disallow: /private/

User-agent: Google-Extended
Crawl-delay: 10

User-agent: FacebookBot
Crawl-delay: 10

User-agent: anthropic-ai
Crawl-delay: 10
```
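
Because Crawl-delay is only advisory and ChatGPT-User ignores robots.txt entirely, server-side rate limiting gives you a harder cap on crawler load. The sketch below assumes the express-rate-limit package, the Express app from earlier, and the AI_CRAWLER_PATTERNS list shown in the selective rendering example; treat it as a starting point rather than a drop-in solution.

```javascript
// Minimal sketch: throttle AI crawler traffic at the server.
// Assumes the express-rate-limit package and the AI_CRAWLER_PATTERNS list defined earlier.
const rateLimit = require('express-rate-limit')

const isAICrawler = (userAgent = '') =>
  AI_CRAWLER_PATTERNS.some(pattern => pattern.test(userAgent))

const aiCrawlerLimiter = rateLimit({
  windowMs: 60 * 1000,    // 1-minute window
  max: 60,                // at most 60 requests per window per IP (tune to your infrastructure)
  standardHeaders: true,  // send RateLimit-* response headers
  legacyHeaders: false,
  // Only throttle requests that identify as AI crawlers; normal visitors are unaffected
  skip: (req) => !isAICrawler(req.headers['user-agent'])
})

app.use(aiCrawlerLimiter)
```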

The New Reality of AI Crawler Compliance

As of December 2025, crawler compliance falls into three categories:

Full robots.txt compliance: GPTBot, OAI-SearchBot, ClaudeBot, anthropic-ai, Google-Extended, FacebookBot, PerplexityBot

Ignores robots.txt: ChatGPT-User (user-initiated browsing)

Unknown/Variable: Smaller AI crawlers (test before relying on compliance)

This means robots.txt alone is no longer sufficient for complete AI crawler control. You'll need server-side blocking (shown earlier) if you truly want to prevent ChatGPT-User access.

Performance Optimisation Strategies

Critical Rendering Path

Optimise for AI crawlers' impatience with fast initial response times:

```javascript
function optimiseCriticalPath(isAICrawler) {
  if (isAICrawler) {
    return {
      inlineCSS: getMinimalCriticalCSS(),
      preloadResources: [],
      content: getCompleteTextContent()
    }
  }
  return getEnhancedUserExperience()
}
```

Caching Strategy

Implement intelligent caching for different crawler types:

```javascript
function setCacheHeaders(req, res) {
  if (isAICrawler(req.headers['user-agent'])) {
    res.setHeader('Cache-Control', 'public, max-age=3600')
    res.setHeader('X-Crawler-Type', 'AI')
  }
}
```

Monitoring and Measurement

Track AI crawler impact on your infrastructure:

```javascript
class AICrawlerAnalytics {
  static trackVisit(req, res) {
    if (req.isAICrawler) {
      const metrics = {
        crawler: identifyCrawler(req.headers['user-agent']),
        responseTime: Date.now() - req.startTime,
        statusCode: res.statusCode,
        timestamp: new Date().toISOString()
      }
      logMetrics(metrics)
    }
  }
}
```

Common Implementation Pitfalls

Pitfall 1: Client-Side Content Loading

```javascript
// AVOID: Dynamic content loading on the client
fetch('/api/content')
  .then(res => res.json())
  .then(data => {
    const el = document.getElementById('content')
    el.innerHTML = data.html
  })

// PREFER: Server-side content (Next.js)
export async function getServerSideProps() {
  const content = await fetchContent()
  return { props: { content } }
}
```

Pitfall 2: JavaScript-Dependent Navigation

```html
<!-- AVOID: Client-side only routing -->
<Link to="/about">About</Link>

<!-- PREFER: Progressive enhancement -->
<a href="/about" onClick={handleClientSideNav}>About</a>
```

Testing AI Crawler Compatibility

Validate your implementation with this testing framework:

```javascript
async function testAICrawlerAccess(url) {
  const crawlers = [
    'GPTBot/1.0 (+https://openai.com/gptbot)',
    'ClaudeBot/1.0 (+https://anthropic.com/claudebot)',
    'ChatGPT-User/1.0 (+https://openai.com/bot)',
    'Google-Extended/1.0',
    'FacebookBot/1.0'
  ]

  const results = await Promise.all(
    crawlers.map(async userAgent => {
      const response = await fetch(url, {
        headers: { 'User-Agent': userAgent }
      })
      const html = await response.text()
      return {
        crawler: userAgent.split('/')[0],
        hasSchema: html.includes('application/ld+json'),
        contentLength: html.length,
        blocked: response.status === 403
      }
    })
  )
  return results
}
```

The Business Case for AI Crawler Optimisation

The data speaks clearly: GPTBot's market share among AI crawlers grew from 5% to 30% in just one year, while overall AI crawler traffic increased 18% year-over-year. With 82% of AI crawling focused on training purposes versus only 15% for search, these systems are building the foundation for how AI will understand and represent your business.

For Australian businesses, this represents both an opportunity and a risk. Companies optimising for AI crawlers today position themselves advantageously as AI-powered search and recommendation systems become mainstream. Those ignoring this shift risk becoming invisible to an increasingly important discovery mechanism.

Implementation Priority Framework

Immediate Actions (Week 1): Implement server-side rendering for critical pages. Add structured data to HTML (not JavaScript). Test with AI crawler user agents.

Short-term Optimisations (Month 1): Deploy selective rendering for AI crawlers. Implement performance monitoring. Optimise critical rendering path.

Long-term Strategy (Quarter 1): Comprehensive structured data implementation. Advanced caching strategies. Regular AI crawler compatibility testing.

The transition from Googlebot to GPTBot isn't just about search, it's about ensuring your business remains discoverable in an AI-first world. By implementing these technical strategies, Australian businesses can maintain visibility across both traditional search and emerging AI platforms, securing their digital presence for the next decade of web evolution.

---

For technical implementation assistance with AI crawler optimisation, contact our development team.