If you look at the robots.txt files of the world's largest publishers today—The New York Times, Reddit, or major media houses—you will see a wall of "Disallow." The prevailing logic is simple: My content is my intellectual property. If you want to train your AI on it, you have to pay.
At AkuparaAI, we are taking the exact opposite approach.
We recently updated our robots.txt to be deliberately, radically open. We explicitly welcome GPTBot, Google-Extended, ClaudeBot, Applebot, and even the foundational CCBot (Common Crawl).
Why? Because in the age of Generative AI, invisibility is a death sentence.
The New SEO: From "Search" to "Answer"
For two decades, the goal of digital marketing was to rank on a list of blue links. You optimized for keywords so Google would show your URL.
Today, users aren't just searching; they are asking. When a user asks ChatGPT, "What is the best platform for tracking AI brand visibility?", the AI doesn't give a list of links—it gives an answer.
If your site is blocked from the crawler, the AI cannot read your latest features, your pricing, or your value proposition. You aren't just ranking lower; you are effectively being erased from the conversation.
We believe that being the answer is more valuable than protecting the data.
The Difference Between "Training" and "Live Retrieval"
A common misconception is that allowing AI bots means only one thing: giving your data away for free model training. This overlooks the two distinct types of AI interaction:
- Training (Long-term Memory): Bots like GPTBot or Google-Extended ingest content to teach the model how to "think" about a topic. If we block these, the model doesn't "know" AkuparaAI exists at a fundamental level.
- Live Retrieval (RAG/Browsing): Bots like OAI-SearchBot (SearchGPT) or Claude-Web fetch real-time data to answer current queries. If we block these, the AI cannot verify if our site is up, read our latest blog post, or check our current stats.
By blocking "AI," you are often breaking the bridge that allows these tools to cite you as a source right now.
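For teams that want the citations without contributing to training corpora, the two categories can be split directly in robots.txt. Here is a sketch of that middle-ground policy; the bot names reflect vendor documentation at the time of writing, so verify them before deploying:

```
# Allow live-retrieval bots that can cite you today
User-agent: OAI-SearchBot
User-agent: PerplexityBot
Allow: /

# Block training-only crawlers
User-agent: GPTBot
User-agent: Google-Extended
Disallow: /
```

Because robots.txt groups are matched per user agent, a bot that finds its own name listed ignores the `*` rules entirely, so each group needs its own complete policy.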
Why We Included the "Long Tail" of Bots
Our new robots.txt doesn't just stop at the Big Three (OpenAI, Google, Anthropic). We included:
- CCBot (Common Crawl): Its crawl produces the dataset that powers thousands of open-source models and academic projects.
- Perplexity & SearchGPT: These are "answer engines" that drive high-intent traffic.
- Applebot-Extended: As Apple Intelligence rolls out across iOS, opting in here lets Apple use our content in its foundation models, keeping us visible to Siri and on-device features.
We want AkuparaAI to be accessible whether a user is on ChatGPT, a developer is using Llama 3, or a researcher is exploring Common Crawl data.
Our "Open Gates" Robots.txt
We are making our configuration public. If you are building a brand that wants to be found in the AI era, feel free to use this as a template.
# robots.txt for AkuparaAI
# The "Open Gates" Strategy
User-agent: *
Allow: /
# AI crawlers, explicitly allowed:
#   GPTBot            - OpenAI training
#   OAI-SearchBot     - SearchGPT / live retrieval
#   Google-Extended   - Gemini training
#   ClaudeBot         - Anthropic training
#   PerplexityBot     - Perplexity AI
#   CCBot             - Common Crawl (open-source models)
#   Applebot-Extended - Apple Intelligence
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: Google-Extended
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: CCBot
User-agent: Applebot-Extended
Allow: /
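If you adapt this template, it is worth sanity-checking it before shipping. Python's standard library includes a robots.txt parser, so a quick check looks like this (the URL and second bot name are illustrative):

```python
# Verify which paths a given crawler may fetch, using Python's
# built-in robots.txt parser (urllib.robotparser).
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /
"""

parser = RobotFileParser()
# parse() accepts the file as an iterable of lines.
parser.parse(ROBOTS_TXT.splitlines())

# The explicit GPTBot group and the fallback * group both allow everything.
print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))      # True
print(parser.can_fetch("SomeOtherBot", "https://example.com/pricing"))  # True
```

Running the same check against a stricter draft is the fastest way to catch a group that accidentally has no `Allow`/`Disallow` rule at all.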
When You Should Be Careful: The Risks of Total Openness
While we believe an open strategy is right for AkuparaAI, it isn't a one-size-fits-all solution. There are legitimate reasons why a company might choose to lock their doors, and we considered these risks carefully before flipping the switch.
Here is when you should be cautious about "allowing all":
If Your Data Is Your Product
If your business model relies on selling access to proprietary data (e.g., financial terminals, detailed weather forecasts, or premium investigative journalism), allowing AI bots to scrape it for free destroys your leverage. Once the AI "knows" your data, it can serve it to users without them ever visiting your site or paying for a subscription.
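If that describes your business, the inverse of our template applies: keep classic search indexing while walling AI crawlers off from the paid content. A sketch, assuming your premium material lives under a `/premium/` path:

```
# Keep traditional search visibility
User-agent: Googlebot
Allow: /

# Keep AI crawlers out of paid content, but let them see marketing pages
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /premium/
```

Note that robots.txt is advisory, not access control; truly sensitive data still belongs behind authentication.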
The "DDoS by Crawler" Effect
Not all bots are polite. While major crawlers like Googlebot and GPTBot generally pace their requests, aggressive scrapers can hammer your servers with thousands of requests per second. For smaller websites, that load translates directly into bandwidth and hosting costs. We monitor our server logs closely to ensure this doesn't happen.
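The log monitoring we describe can be as simple as tallying requests per AI user agent. A minimal sketch, using invented combined-format log lines (real deployments would read the actual access log and alert on a threshold):

```python
# Tally requests per AI crawler from web-server access log lines,
# matching on the user-agent substring at the end of each entry.
from collections import Counter

AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "CCBot", "Applebot")

sample_log = [
    '203.0.113.5 - - [10/May/2025:12:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [10/May/2025:12:00:02 +0000] "GET /pricing HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.5 - - [10/May/2025:12:00:03 +0000] "GET /blog HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
]

hits = Counter()
for line in sample_log:
    for bot in AI_BOTS:
        if bot in line:  # substring match on the UA string: crude but practical
            hits[bot] += 1
            break

for bot, count in hits.most_common():
    print(f"{bot}: {count}")
```

If one bot's count spikes far above the others, that is the signal to add a targeted `Disallow` or rate-limit at the edge rather than closing the gates entirely.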
Loss of Attribution
The biggest gamble with AI visibility is the "Zero-Click" future. An AI might read your amazing blog post, synthesize the answer perfectly for a user, and never provide a link back to your site. You gain "mindshare" (the AI knows you), but you might lose "traffic" (the user doesn't click). For us, brand awareness is worth that trade-off; for an ad-supported blog, it might not be.
Competitive Intelligence
An open robots.txt makes it easier for competitors—not just AI companies—to analyze your site structure, pricing changes, and keyword strategies programmatically. Radical transparency means your competitors can see everything you are doing, the moment you do it.
The Bottom Line
You cannot control what an AI says about you if you refuse to speak to it.
By allowing these crawlers, we are ensuring that AkuparaAI is part of the corpus of human knowledge that these models rely on. We are betting that in the future, the most successful brands won't be the ones with the highest walls, but the ones with the most open doors.
Want to Optimize Your AI Visibility?
Curious about how your brand appears in AI-generated answers? Learn how AkuparaAI can help you measure and improve your presence across ChatGPT, Gemini, Claude, and more.
Schedule a Conversation