Your Content, Your Rules: llmtag.txt


The web already solved search with robots.txt. But AI agents don’t just index; they train, ground, summarize, and repackage your work. That’s why the industry is converging on a tiny, zero-friction convention at your domain root: /llmtag.txt. If you create or host content, publishing this one file sets clear, machine-readable rules for AI—no meetings, no NDAs, no vendor lock-in. See the initiative and starter guidance at llmtag.org. (LLMTAG Protocol)

Wait—doesn’t robots.txt already do this?

Not really. robots.txt governs search crawling and relies on voluntary compliance. It was never designed to express purpose-level permissions (e.g., “no training, summaries ok”) or AI-specific rates and attribution needs. Even the formal spec (RFC 9309) states the rules “are not a form of access authorization.” In other words, it’s guidance for crawlers—not a policy contract for AI usage. (rfc-editor.org)

Why now (and why this will stick)

  • The traffic has changed. AI-focused scraping can be bursty and opaque; some crawlers ignore robots.txt altogether. Major infrastructure is responding: the largest CDNs now block known AI crawlers by default and are piloting pay-per-crawl models. That’s leverage, but you still need a canonical, machine-readable statement of your intent. (WIRED)
  • Good actors want clarity. Leading vendors document how to respect site preferences (e.g., Google’s AI access controls and OpenAI’s GPTBot). They still need a single file to read first and interpret consistently. llmtag.txt is built to be that file. (Google for Developers)

What llmtag.txt is (in one breath)

A small plaintext file at https://yourdomain.com/llmtag.txt describing AI-specific permissions: whether training is allowed, which inference modes are permitted (summary/QA/grounding), how fast agents may fetch, what attribution you expect, and per-agent overrides—plus optional reporting and verification hooks. It complements robots.txt (keep search crawlers open) and pairs with your CDN/WAF for enforcement.
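
To make that concrete, here is a minimal sketch of how a cooperative agent might fetch the file and check the global defaults before summarizing a page. It assumes the simple "Key: value" layout used in the copy-paste example later in this post; the parsing rules here are informal assumptions, not a published spec.

# Minimal sketch: fetch llmtag.txt and read the global defaults.
# Assumes the "Key: value" layout shown in the example below; stops at the
# first "Agent:" line because global defaults come before per-agent overrides.
import urllib.request

def fetch_global_policy(domain: str) -> dict:
    url = f"https://{domain}/llmtag.txt"
    with urllib.request.urlopen(url, timeout=10) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    policy = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Agent:"):
            break  # per-agent overrides handled separately
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        policy[key.strip()] = value.strip()
    return policy

policy = fetch_global_policy("example.com")
print("summary" in policy.get("Use-Inference", "").split(","))   # may we summarize?
print(policy.get("Use-Training", "no").lower() == "yes")         # may we train?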

The “adoption flywheel”

  1. Publishers ship llmtag.txt.
  2. CMS & plugins make it a checkbox.
  3. AI vendors read & respect it, optionally reporting adherence.
  4. Analytics & licensing emerge on top (from “no” → “maybe, under terms”).
  5. Spec vocabulary stabilizes via real-world use.

You don’t need step 5 to benefit from steps 1–4.


Copy-paste: a sensible llmtag.txt you can ship today

(Tweak the paths and contact, then drop at your domain root.)

# LLMTAG policy v0.2
Site: https://example.com
Policy-URL: https://example.com/ai-usage-policy
Contact: legal@example.com
Policy-Revision: 2025-10-18

# Global defaults
Use-Training: no
Use-Inference: summary,qa
Attribution: required
Attribution-Format: "Source: {url} — © Example Inc."
Cache: no
Crawl-Delay-LLM: 30
Sitemap: /sitemap.xml

# Rate guidance (enforce via CDN/App)
Rate: 60/min/ip on /api/summary, /api/search

# Per-agent overrides
Agent: Google-Extended
  Use-Training: no
  Use-Inference: grounding
  Allow: /docs/public/, /faq/
  Disallow: /members-only/

Agent: GPTBot|OAI-SearchBot|ChatGPT-User
  Use-Training: no
  Use-Inference: summary
  Disallow: /private/, /raw-datasets/

Agent: ClaudeBot|Claude-User
  Use-Training: no
  Crawl-Delay-LLM: 45

Agent: PerplexityBot
  Use-Training: no
  Allow: /news/
  Disallow: /exports/

# Optional governance
Verify: DNS-TXT llmtag=pubkey:ed25519:BASE64KEY
Report-Endpoint: https://example.com/.well-known/llmtag/report
Report-Sample: 0.1
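
The per-agent blocks deserve one note on semantics. The reading assumed here, and in the sketch below, is that top-level lines set global defaults, indented lines belong to the most recent Agent block, and the first Agent block whose |-separated names include the caller overrides the globals. That interpretation is mine, since the vocabulary is still stabilizing (step 5 of the flywheel).

# Sketch: resolve the effective policy for one agent from the file above.
# Assumed semantics: top-level "Key: value" lines are global defaults,
# indented lines belong to the open "Agent:" block, and the first block
# whose "|"-separated names include the caller overrides the globals.

def resolve_policy(text: str, agent_name: str) -> dict:
    global_rules, agent_blocks, current = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("Agent:"):
            current = {"_names": [n.strip() for n in line.split(":", 1)[1].split("|")]}
            agent_blocks.append(current)
            continue
        if ":" not in line:
            continue
        key, value = [part.strip() for part in line.split(":", 1)]
        if raw[:1].isspace() and current is not None:
            current[key] = value       # indented: belongs to the open Agent block
        else:
            global_rules[key] = value  # top-level: global default
    effective = dict(global_rules)
    for block in agent_blocks:
        if agent_name in block["_names"]:
            effective.update({k: v for k, v in block.items() if k != "_names"})
            break
    return effective

with open("llmtag.txt", encoding="utf-8") as f:
    print(resolve_policy(f.read(), "GPTBot"))  # globals overridden by the GPTBot block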

Why these defaults?

  • SEO is preserved. Keep Googlebot/Bingbot governed by robots.txt for search. Google-Extended controls Gemini/Vertex AI usage—not indexing—so you can opt out of AI training while staying visible in Search. (Search Engine Journal)
  • OpenAI & others: naming the AI agents clarifies your expectations and reduces ambiguity for cooperative crawlers (see OpenAI’s crawler docs). (OpenAI Platform)
  • Telemetry & verification are optional—but valuable if vendors start self-reporting compliance.
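
On verification: one plausible check, assumed rather than specified, is that the key in the Verify line also appears as a DNS TXT record on the same domain, which shows the policy's publisher controls the DNS zone. A hedged sketch using the third-party dnspython package:

# Sketch: cross-check the Verify line against a DNS TXT record on the domain.
# Requires dnspython (pip install dnspython). The record placement and exact
# matching rule are assumptions; the protocol does not mandate them yet.
import dns.resolver

def dns_verify(domain: str, verify_line: str) -> bool:
    # verify_line looks like: "DNS-TXT llmtag=pubkey:ed25519:BASE64KEY"
    expected = verify_line.split(" ", 1)[1].strip()
    try:
        answers = dns.resolver.resolve(domain, "TXT")
    except Exception:
        return False  # no TXT records, NXDOMAIN, timeout, etc.
    for rdata in answers:
        value = b"".join(rdata.strings).decode("utf-8", errors="replace")
        if value == expected:
            return True
    return False

print(dns_verify("example.com", "DNS-TXT llmtag=pubkey:ed25519:BASE64KEY"))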

Make it real: enforcement that matches the policy

Policy without teeth is a suggestion. Pair llmtag.txt with lightweight enforcement:

  • CDN/WAF layer: Turn on managed controls for AI crawlers; default-block if that fits your strategy, and permit only what your policy allows. This protects you even when a bot ignores robots/policy. (WIRED)
  • App layer: Add a JS challenge, honeypot, and path-based rate limits for /api/*, exports, or costly endpoints. Log decisions (“challenge”, “rate_limit”, “honeypot”) for audits.
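
To illustrate the app-layer piece, here is a framework-agnostic sketch: a honeypot path, per-IP sliding-window rate limits on expensive path prefixes, and a log line for each decision. The paths, thresholds, and in-memory counters are placeholders; a real deployment would wire this into middleware, share state across workers, and leave the JS challenge to the edge.

# Sketch of app-layer bot gating: honeypot path, path-based rate limits,
# and an audit log of decisions. In-memory only; thresholds are placeholders.
import time
import logging
from collections import defaultdict, deque

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llmtag-enforcement")

WINDOW_SECONDS = 60
LIMITS = {"/api/summary": 60, "/api/search": 60, "/exports/": 10}  # per IP per minute
HONEYPOT_PREFIX = "/trap/"   # linked nowhere; only scrapers find it
_hits = defaultdict(deque)   # (ip, prefix) -> request timestamps

def decide(ip: str, path: str, user_agent: str) -> str:
    """Return 'allow', 'honeypot', or 'rate_limit' and log the decision."""
    now = time.time()
    if path.startswith(HONEYPOT_PREFIX):
        log.info("honeypot ip=%s ua=%s path=%s", ip, user_agent, path)
        return "honeypot"
    for prefix, limit in LIMITS.items():
        if path.startswith(prefix):
            window = _hits[(ip, prefix)]
            window.append(now)
            while window and window[0] < now - WINDOW_SECONDS:
                window.popleft()
            if len(window) > limit:
                log.info("rate_limit ip=%s ua=%s path=%s", ip, user_agent, path)
                return "rate_limit"
    return "allow"

# Example: the 61st /api/summary request from one IP within a minute is limited.
for _ in range(61):
    verdict = decide("203.0.113.9", "/api/summary/article-1", "GPTBot")
print(verdict)  # -> "rate_limit"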

WordPress: 10-minute rollout

  • Publish llmtag.txt from a small admin UI (fields: training/inference, attribution, per-agent overrides).
  • Keep robots.txt for search; add explicit blocks or allowances for AI agents there only if needed.
  • Enable app-layer protections (JS challenge, honeypot, rate limit) via a security/bot plugin or a simple custom plugin.
  • Verify: hit https://yourdomain.com/llmtag.txt, test with known user-agents, then watch your logs.
    Tip: If you use Cloudflare, enable the AI crawler controls to align enforcement with your policy from day one. (The Cloudflare Blog)
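
For the verification step, a small script like the one below can hit your own llmtag.txt and a protected path with spoofed crawler names in the User-Agent header and print what your edge returns. The URLs are placeholders, and these are bare agent names rather than the vendors' full User-Agent strings; it is a smoke test of your own rules, not a test of vendor behavior.

# Smoke test: fetch your own llmtag.txt and a protected path with spoofed
# crawler User-Agent strings and print the HTTP status your edge returns.
# Replace yourdomain.com and the paths with your own.
import urllib.request
import urllib.error

AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "Mozilla/5.0"]
URLS = ["https://yourdomain.com/llmtag.txt", "https://yourdomain.com/private/"]

for url in URLS:
    for agent in AGENTS:
        req = urllib.request.Request(url, headers={"User-Agent": agent})
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                status = resp.status
        except urllib.error.HTTPError as err:
            status = err.code  # e.g. 403 from a CDN/WAF block
        except urllib.error.URLError as err:
            status = f"error: {err.reason}"
        print(f"{status}\t{agent}\t{url}")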

FAQs (send this to the team)

Will this hurt my SEO?
No. llmtag.txt targets AI usage, not search indexing. Keep search crawlers governed via robots.txt; use llmtag.txt to declare AI permissions and rates. Google’s Google-Extended token is not a Search ranking signal and does not affect indexing. (Search Engine Journal)

What if a bot ignores my policy?
Block or throttle it at your CDN/WAF and app layer. This is increasingly the default posture on major infrastructure, precisely because some AI scrapers ignore site signals. (WIRED)

Why not wait for a formal standard?
De-facto conventions precede specs. llmtag.txt is deliberately simple so vendors can adopt it immediately. Read the initiative at llmtag.org and ship your file now. (LLMTAG Protocol)


The ask

If you publish or host content, add llmtag.txt this month. Keep Search healthy with robots.txt; set AI expectations with llmtag.txt; and back it up with basic enforcement. The web runs on small, open conventions. This is the smallest one that restores consent, clarity, and control in the AI era.
