llms.txt Generator
Enter any domain to crawl its sitemap, classify every page, and generate a spec-compliant llms.txt plus a full-content llms-full.txt — ready to drop at the root of your site so AI agents can index it. Powered by the Sitemap API, Markdown API, & Brand API.
What is llms.txt?
llms.txt is a proposed web standard, published by llmstxt.org, that gives large language model agents a clean, Markdown-formatted index of your site. It lives at the root of your domain — e.g. https://yoursite.com/llms.txt — and lists the pages an AI should read to understand your product, docs, pricing, and company.
Think of it as robots.txt for AI, but optimized for reading instead of crawling. Agents in ChatGPT, Claude, Perplexity, and Cursor fetch it to route themselves through your site without parsing bloated HTML.
What's the difference between llms.txt and llms-full.txt?
- llms.txt — a short index. An H1 with your site name, a one-sentence blockquote summary, then sectioned links (Documentation, Pricing, Blog, etc.) with brief descriptions. Target: under 3K tokens.
- llms-full.txt — the full content. Every important page's Markdown concatenated into one file, separated by URL headers. Use this when an agent needs the actual text, not just the map. Target: under 300K tokens.
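A minimal llms.txt following that shape (the site name, URLs, and descriptions below are illustrative, not from a real site):

```text
# Acme Analytics

> Acme Analytics is a self-serve product-analytics platform for SaaS teams.

## Documentation

- [Quickstart](https://acme.example/docs/quickstart): Install the SDK and send your first event
- [API Reference](https://acme.example/docs/api): REST endpoints, auth, and rate limits

## Pricing

- [Plans](https://acme.example/pricing): Free, Pro, and Enterprise tiers
```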
This tool generates both at once so you can host them side by side.
How the generator works
- Crawl the sitemap — finds every discoverable URL via the Sitemap API, following sitemap index files recursively.
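The recursive step works because a sitemap document is one of two things: a `<sitemapindex>` pointing at more sitemaps, or a `<urlset>` listing pages. A minimal sketch of that branching, using only the standard sitemap XML namespace (the Sitemap API does this server-side; this is just the idea):

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by the sitemaps.org protocol
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def parse_sitemap(xml_text: str):
    """Return (child_sitemaps, page_urls) for one sitemap document.

    A <sitemapindex> yields more sitemap URLs to fetch; a <urlset>
    yields actual page URLs. A crawler keeps fetching child sitemaps
    until only page URLs remain.
    """
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.iter(NS + "loc")]
    if root.tag == NS + "sitemapindex":
        return locs, []   # more sitemaps to follow
    return [], locs       # leaf: real page URLs
```

A full crawler would wrap this in a fetch loop with a visited-set to avoid cycles.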
- Classify pages — buckets URLs into sections (Documentation, API Reference, Pricing, Blog, Changelog, Company, Legal, etc.) based on path heuristics, skipping assets and deep archive pages.
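A sketch of what path-heuristic classification can look like; the section names, path prefixes, and fallback bucket here are assumptions for illustration, not the generator's actual rules:

```python
from urllib.parse import urlparse

# (section name, path prefixes that map to it) -- illustrative rules
SECTION_RULES = [
    ("Documentation", ("/docs", "/guides")),
    ("API Reference", ("/api", "/reference")),
    ("Pricing", ("/pricing",)),
    ("Blog", ("/blog",)),
    ("Changelog", ("/changelog", "/releases")),
    ("Legal", ("/legal", "/terms", "/privacy")),
]
SKIP_SUFFIXES = (".png", ".jpg", ".css", ".js", ".pdf")

def classify(url: str):
    """Bucket a URL into a section, or None to skip it entirely."""
    path = urlparse(url).path.lower()
    if path.endswith(SKIP_SUFFIXES):
        return None  # asset, not a readable page
    for section, prefixes in SECTION_RULES:
        if any(path.startswith(p) for p in prefixes):
            return section
    return "Company"  # fallback bucket for everything else
```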
- Scrape markdown — pulls clean Markdown for the top pages per section via the Markdown API, extracting titles and one-sentence descriptions.
- Assemble — produces a spec-compliant llms.txt (index) and llms-full.txt (concatenated content), ready to copy or download.
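The assembly step above can be sketched in a few lines. The exact separator and header format for llms-full.txt is an assumption here (the spec only calls for URL headers between pages):

```python
def build_llms_txt(site: str, summary: str, sections: dict) -> str:
    """Build the index. sections maps a section name to a list of
    (url, title, description) tuples."""
    lines = [f"# {site}", "", f"> {summary}", ""]
    for name, pages in sections.items():
        if not pages:
            continue
        lines.append(f"## {name}")
        for url, title, desc in pages:
            lines.append(f"- [{title}]({url}): {desc}")
        lines.append("")
    return "\n".join(lines)

def build_llms_full_txt(pages: list) -> str:
    """Concatenate (url, markdown) pairs under URL headers,
    separated by horizontal rules (separator format is a choice)."""
    parts = [f"# {url}\n\n{markdown}" for url, markdown in pages]
    return "\n\n---\n\n".join(parts)
```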
How to host llms.txt on your site
- Upload both files to your site's root directory so they're served at /llms.txt and /llms-full.txt.
- Serve them as text/plain; charset=utf-8 with caching enabled (e.g. max-age=3600 for an hour-long cache).
- Make sure robots.txt does not block them, and that AI user-agents (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) are allowed to fetch them.
- Optional: add <link rel="alternate" type="text/plain" href="/llms.txt" title="LLM site index" /> to the <head> of your root layout so agents can discover it from any page.
- Re-run the generator every quarter (or after any major site change) so the index stays fresh.
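Before deploying, it's worth sanity-checking the file's structure. A small checker, assuming the shape described above (H1, blockquote summary, `##` sections) and a rough 4-characters-per-token estimate -- these are heuristics, not the spec:

```python
def check_llms_txt(text: str) -> list:
    """Return a list of structural problems (empty list = looks fine)."""
    lines = text.splitlines()
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("first line should be an H1 with the site name")
    if not any(l.startswith("> ") for l in lines[:5]):
        problems.append("missing blockquote summary near the top")
    if not any(l.startswith("## ") for l in lines):
        problems.append("no section headings found")
    if len(text) // 4 > 3000:  # crude estimate: ~4 chars per token
        problems.append("likely over the ~3K-token budget for the index")
    return problems
```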
Why llms.txt matters for AI search
AI agents are becoming the primary way users discover products, docs, and pricing. ChatGPT Search, Perplexity, Google AI Overviews, and Copilot all fetch live web content to answer questions. When they crawl your site, they'd rather read a clean, purpose-built index than parse 300KB of JavaScript and tracking pixels.
Publishing llms.txt is one of the highest-leverage moves in generative engine optimization (GEO): almost no one has one yet, the standard is gaining momentum, and it directly shapes how AI agents describe your product. Pair it with structured data, fresh content, and a visible last-updated date for maximum citation lift.
Built on Context.dev APIs
This generator is a thin wrapper around three Context.dev endpoints: the Sitemap API for URL discovery, the Markdown API for clean page content, and the Brand API for the company description. If you need programmatic access, batch processing, or a self-hosted build step that regenerates llms.txt on every deploy, those same APIs are available with a free account.
Ship an agent that actually knows things.
Free tier, 10-minute integration, and the same API powering agents at Mintlify, daily.dev, and Propane. No credit card to start.