Context.dev

How to Generate LLMs.txt Automatically with Context.dev

TL;DR

  • What it is. An llms.txt file is a curated, Markdown entry point at your site root that gives LLMs a clean overview of your content at inference time, without the noise of full HTML.
  • Generate it fast. Context.dev turns any URL into a valid llms.txt with a single API call, no crawler to build and no infrastructure to run.
  • Pick the right tool. Use Context.dev for automated, real-time regeneration at scale; Firecrawl or Apify if you already work in their ecosystems; Mintlify only for Mintlify-hosted docs; and llmstxtgen.com for a one-off small site.
  • The honest word on SEO. No controlled study shows llms.txt alone lifts AI citations today. Its real, documented value is feeding AI coding assistants and agent pipelines.

What Is LLMs.txt and Why AI Agents Need It

LLMs read website content at inference time, but their context windows can't hold most sites in full. A single page carries navigation, ads, and JavaScript that bury the actual content, and converting that HTML into clean text is, in the spec's own words, "both difficult and imprecise." An llms.txt file solves this by giving a model one curated, plain-text entry point instead of a sprawling, noisy DOM.

Jeremy Howard proposed the standard on September 3, 2024, and the file lives at /llms.txt in your site root. It is written in Markdown so a model can parse it directly, and it lists the pages and resources worth reading rather than every page you publish.

The spec draws a clean line between three files that developers often confuse. The robots.txt file controls access, telling crawlers what they may and may not fetch. A sitemap.xml indexes every human-readable page so search engines can find them. An llms.txt file does neither. It hands an LLM a short, curated overview built for the moment a user asks a question and the model needs reliable context.

That inference-time focus is the point. A sitemap lists too many pages to fit in a context window, skips clean Markdown versions of those pages, and ignores external URLs that might be relevant. The spec targets inference rather than training, though Howard notes that wide adoption could help future training runs too.

LLMs.txt File Format: What the Spec Requires

The llms.txt spec defines a strict, ordered Markdown structure, and only the first element is required. An H1 carries the project or site name. A blockquote follows with a one-line summary. Below that, you can add free-form Markdown sections without headings for extra context, then H2 headings that group lists of links written as [name](url): optional note. A final H2 named "Optional" marks links an LLM can skip when it needs a shorter context.

A minimal valid file looks like this:

# Acme Docs
 
> API reference and guides for the Acme platform.
 
## Docs
 
- [Quickstart](https://acme.dev/quickstart.md): Set up in five minutes
- [API Reference](https://acme.dev/api.md): Full endpoint list
 
## Optional
 
- [Changelog](https://acme.dev/changelog.md)
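
The structure described above is regular enough to check mechanically. The following is a minimal sketch, not an official validator: the function name `parse_llms_txt` and the link regex are illustrative assumptions, and it only enforces the one hard rule stated above (an H1 comes first) while collecting the optional blockquote and the H2 link lists.

```python
import re

# Matches "- [name](url)" with an optional ": note" suffix.
LINK_RE = re.compile(
    r"^- \[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


def parse_llms_txt(text: str) -> dict:
    """Parse the title, summary, and H2 link sections of an llms.txt file."""
    non_blank = [line.rstrip() for line in text.splitlines() if line.strip()]
    # The H1 is the only required element and must come first.
    if not non_blank or not non_blank[0].startswith("# "):
        raise ValueError("llms.txt must start with an H1 title")

    parsed = {"title": non_blank[0][2:].strip(), "summary": None, "sections": {}}
    current = None  # name of the H2 section being read, if any
    for line in non_blank[1:]:
        if line.startswith("## "):
            current = line[3:].strip()
            parsed["sections"][current] = []
        elif line.startswith("> ") and current is None and parsed["summary"] is None:
            # Only the first blockquote before any H2 counts as the summary.
            parsed["summary"] = line[2:].strip()
        elif current is not None:
            match = LINK_RE.match(line.strip())
            if match:
                parsed["sections"][current].append(match.groupdict())
    return parsed
```

Run against the Acme Docs file above, this returns the title, the summary, and two sections, with the Changelog entry carrying a note of `None`. Free-form text between the summary and the first H2 is ignored rather than rejected, since the format allows it.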

The spec also asks sites to publish a clean Markdown version of each page at the same URL with .md appended, so page.html becomes page.html.md. FastHTML, the reference implementation, goes further and publishes two expanded variants: llms-ctx.txt, which inlines every linked document except the Optional ones, and llms-ctx-full.txt, which inlines all of them. Both are generated from the base file by the llms_txt2ctx CLI, so you maintain one source and derive the rest.
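
The .md convention can be expressed as a small URL helper. This is a sketch under two assumptions: the helper name `markdown_url` is hypothetical, and the handling of URLs without a file name (appending index.html.md) follows my reading of the llms.txt proposal, so check it against the spec before relying on it. Dropping the query string and fragment is a simplification of this sketch, not something the convention specifies.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(page_url: str) -> str:
    """Return the URL where a clean Markdown copy of page_url is expected."""
    parts = urlsplit(page_url)
    path = parts.path or "/"
    if path.endswith("/"):
        # No file name in the path: assumed to map to index.html.md.
        path += "index.html.md"
    else:
        # Otherwise append .md to the existing path, e.g. page.html.md.
        path += ".md"
    # Query string and fragment are dropped (a choice made for this sketch).
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

For example, `markdown_url("https://acme.dev/page.html")` gives `https://acme.dev/page.html.md`.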

How to Generate LLMs.txt Automatically with Context.dev

Context.dev turns any website URL into a valid llms.txt file through a single API call, with no crawler to host and no parsing pipeline to maintain. You authenticate once, send a URL, and receive structured Markdown that follows the spec. The whole loop takes under five minutes, and the output drops straight into an LLM pipeline or onto your server as a static /llms.txt.

Start by authenticating with your API key. Every request carries the key in an Authorization header, so you set it once and reuse it across calls.

curl -X POST https://context.dev/api/llms-txt \
  -H "Authorization: Bearer $CONTEXT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

The response comes back as ready-to-serve Markdown. Context.dev crawls the site, strips navigation and scripts, and assembles the H1, the blockquote summary, and the H2 file lists in the order the spec requires. You write that body to a file at your web root, and AI agents reading /llms.txt get a clean entry point instead of raw HTML.

The Python flow is just as short. You call the same endpoint, capture the Markdown, and either write it to disk or hand it to your pipeline.

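import os
import requests

# Same endpoint and payload as the curl example above.
resp = requests.post(
    "https://context.dev/api/llms-txt",
    headers={"Authorization": f"Bearer {os.environ['CONTEXT_API_KEY']}"},
    json={"url": "https://example.com"},
    timeout=120,  # without a timeout, requests can wait indefinitely
)
resp.raise_for_status()  # stop on 4xx/5xx instead of saving an error body

# Save the returned Markdown so it can be served as /llms.txt.
with open("llms.txt", "w", encoding="utf-8") as f:
    f.write(resp.text)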

That single call replaces the internal crawler most teams would otherwise build. A hand-rolled scraper means handling JavaScript rendering, ad removal, and Markdown formatting yourself. Context.dev absorbs that work behind one endpoint, so you spend your time deciding where the file lives rather than how it gets built.

Because the endpoint takes a fresh URL on every call, regeneration is trivial. You schedule the same request on a cron job or trigger it from a deploy hook, and your llms.txt tracks your site as content changes. The output is clean enough to feed an MCP-connected agent directly, which means you can pull a structured site overview into an LLM pipeline at inference time without an intermediate parsing step. For a site that updates often, automated regeneration is the difference between a file that stays accurate and one that drifts the moment you ship new pages.
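
When regeneration runs unattended, the write step deserves a little care so that a failed or empty response never replaces a good file. The helper below is a minimal sketch of that step only; the name `write_if_changed` and the 0o644 file mode are assumptions for illustration, and the body passed in would be the `resp.text` from the request shown earlier.

```python
import os
import tempfile
from pathlib import Path


def write_if_changed(path: Path, new_body: str) -> bool:
    """Atomically replace path with new_body; return False when nothing changed."""
    if not new_body.strip():
        # Refuse to overwrite a good file with an empty response.
        raise ValueError("refusing to write an empty llms.txt")
    if path.exists() and path.read_text(encoding="utf-8") == new_body:
        return False
    # Write to a temp file in the same directory, then swap it into place so
    # a reader never sees a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(new_body)
    os.chmod(tmp_name, 0o644)  # mkstemp creates the file as owner-only
    os.replace(tmp_name, path)
    return True
```

A cron job or deploy hook can call this after each fetch and use the boolean result to decide whether to purge a CDN cache or log a change.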

How to Generate LLMs.txt Manually

A valid llms.txt needs only an H1 with your site name, but a useful one adds a blockquote summary and at least one H2 section linking to your most important pages. Start with the name, write one sentence describing what the site does, then group your best URLs under a heading. Here is a minimal working file.

# Acme API
 
> Real-time payment infrastructure for developers.
 
## Docs
 
- [Quickstart](https://acme.dev/quickstart): Authenticate and send your first request
- [API Reference](https://acme.dev/reference): Full endpoint and parameter listing
- [Webhooks](https://acme.dev/webhooks): Event types and retry behavior

The descriptions after each link carry most of the value for an LLM. A human skims a link by its anchor text, but a model reading at inference time uses the note to decide whether the page answers the query in front of it. Write each note to state what the reader will find on that page, not to sell the page. "Authenticate and send your first request" tells a model exactly when to follow the link. "Get started fast" tells it nothing.
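
Since the notes carry so much of the value, it can help to flag entries that have none before publishing. This is a small sketch with a hypothetical function name, `links_missing_notes`; it only detects a missing note and makes no attempt to judge whether an existing note is specific enough.

```python
import re

# Matches "- [name](url)" with an optional ": note" suffix.
LINK_LINE = re.compile(r"^- \[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def links_missing_notes(llms_txt: str) -> list[str]:
    """Return the names of link entries that have no descriptive note."""
    missing = []
    for line in llms_txt.splitlines():
        match = LINK_LINE.match(line.strip())
        # group(3) is None when there is no colon, or empty when nothing follows it.
        if match and not (match.group(3) or "").strip():
            missing.append(match.group(1))
    return missing
```

Running it in CI and failing the build on a non-empty result is one way to keep undescribed links out of a hand-maintained file.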

Hand-writing the file makes sense only for a small, stable site with a handful of pages that rarely change. The moment you add docs, rename routes, or ship new features, a manual file drifts out of sync with reality, and a stale llms.txt misleads every agent that trusts it. For anything that updates regularly, generate the file from your live site instead so it tracks your actual content.

LLMs.txt Generator Tools Compared

This comparison covers six tools that generate llms.txt files, and they split into API-first services, platform-locked generators, and free one-shot crawlers. The table below compares each on approach, page limits, API access, and the use case it actually fits.

| Tool | Approach | Page/Depth Limits | API Available | Best For | Free Tier |
| --- | --- | --- | --- | --- | --- |
| Context.dev | Single API call, structured output | Configurable, full crawl | Yes | Production pipelines and automation | Yes |
| Firecrawl | LLM-ready crawl and map endpoints | Plan-based credits | Yes | Teams already in the Firecrawl ecosystem | 500 pages |
| Apify | Open-source Actor, configurable depth | 4 GB memory cap | Yes | Customizable platform orchestration | $5/month credits |
| Mintlify | Auto-generated for hosted docs | Mintlify sites only | No | Existing Mintlify docs users | Plan-dependent |
| llmstxtgen.com | One-shot crawl, no account | 20 pages | No | Quick publish for small sites | Yes |
| llms-txt.io | Generator plus managed regeneration | Up to 250 sites | Paid | Agencies needing continuous updates | Yes |

Context.dev

Context.dev turns a URL into a production-ready llms.txt with one API call and no crawler to build or maintain. You send a target URL, and the API crawls the site, extracts titles and descriptions, and returns clean structured output formatted to the spec. That output drops straight into an LLM pipeline through MCP or serves statically at your site root.

Context.dev wins when you need scale, automation, or real-time regeneration. llmstxtgen.com caps out at 20 pages and runs once. Mintlify only covers its own hosted docs. Context.dev handles arbitrary sites, regenerates on demand when your content changes, and replaces an internal crawler you would otherwise staff and run yourself.

Firecrawl

Firecrawl converts entire websites into LLM-ready Markdown and JSON through its crawl and map endpoints, with a community of 350,000+ registered developers behind it. The crawl endpoint recursively scans a site, and the map endpoint returns each URL with its title and description, which map cleanly onto the llms.txt link-list format.

The Firecrawl documentation reviewed for this comparison does not confirm a dedicated llms.txt endpoint, so you assemble the file from crawl or map output yourself. The fit is teams already running Firecrawl for scraping who want to reuse that pipeline rather than add a second service.
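
Assembling the file yourself is mostly string formatting once you have page records in hand. The sketch below assumes each record is a dict with `url`, `title`, and an optional `description` key; that shape is an assumption for illustration, not Firecrawl's documented response format, and `render_llms_txt` is a hypothetical name. Grouping pages into sections is left to the caller.

```python
def render_llms_txt(title: str, summary: str, sections: dict[str, list[dict]]) -> str:
    """Render an llms.txt body from a title, summary, and grouped page records."""
    out = [f"# {title}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        out.append(f"## {heading}")
        out.append("")
        for page in pages:
            line = f"- [{page['title']}]({page['url']})"
            # The note is optional; skip the colon when there is nothing to say.
            note = (page.get("description") or "").strip()
            if note:
                line += f": {note}"
            out.append(line)
        out.append("")
    # Normalize to exactly one trailing newline.
    return "\n".join(out).rstrip("\n") + "\n"
```

Sections render in dict insertion order, so placing an "Optional" key last keeps the skippable links at the end of the file.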

Apify

Apify ships an open-source Actor under Apache-2.0 that takes a start URL and a maxCrawlDepth setting, delegates the crawl to its Website Content Crawler, and saves a spec-formatted llms.txt to the Key-Value Store for download.

The crawler runs at a 4 GB memory cap to stay compatible with the free tier, which the README warns can slow large crawls. The output is link-list only, extracting titles, descriptions, and URLs without summarizing page bodies. Apify suits teams already comfortable orchestrating Actors who want a customizable, self-hosted option.

Mintlify

Mintlify generates llms.txt automatically for documentation hosted on its platform, served at the standard /llms.txt path. It only works for Mintlify-hosted sites, so it is irrelevant if your docs live anywhere else.

When partial authentication is enabled, a confirmed bug redirects both llms.txt and robots.txt to the login page, and there is no configuration option to exempt them. Reach for Mintlify only when you already publish docs there.

llmstxtgen.com

llmstxtgen.com generates a valid llms.txt from any URL with no account and no payment, producing both the concise file and a fuller llms-full.txt with sectioned links.

The crawl caps at 20 pages, offers no API, and runs once with no scheduled regeneration. It fits a small, stable site where you want something valid published today without signing up anywhere.

LLMs.txt and SEO: What the Evidence Actually Shows

No controlled study shows that publishing an llms.txt file increases AI citations, and the tools building these generators say so themselves. The team behind llmstxtgen.com states plainly there is "no public, controlled evidence that publishing llms.txt measurably increases AI citations," and describes engine support as "inconsistent and undocumented" (llmstxtgen.com). Treat any vendor promising a citation uplift with skepticism.

The documented value sits elsewhere. AI coding assistants like Cursor, Claude Code, and Copilot read these files to pull current documentation into their context, and that use case is real and working today (llms-txt.io). LLM agent pipelines benefit the same way, getting a clean entry point instead of parsing noisy HTML at inference time.

The spec was built for inference, not training. Jeremy Howard's proposal targets the moment a user asks a question and an LLM needs your content right then, not a future training run (llmstxt.org). Adoption is climbing across documentation platforms and developer tooling, but engine support remains uneven, so a valid file helps the assistants that read it without guaranteeing broader reach. Publish one for the agents that use it now, not for a ranking promise nobody can back up.

Conclusion

Generate your file through the Context.dev API and serve it static. One authenticated call turns any URL into a valid, well-structured llms.txt, and you skip the crawler maintenance that hand-built files demand. Regenerate on a schedule and the file stays current as your site changes.

Publishing a valid file now costs you almost nothing and positions your site for the tooling that reads it. AI coding assistants and agent pipelines already pull from llms.txt at inference time, and adoption is climbing across documentation platforms and developer tooling. Ship the file today so it is in place when the next tool starts reading it.

FAQ

Does llms.txt replace robots.txt or sitemap.xml? No, llms.txt serves a different purpose than either file. The llms.txt spec defines robots.txt as access control and sitemap.xml as a crawl index, while llms.txt is a curated entry point an LLM reads at inference time. Keep all three on your site, since each one answers a different request from a different consumer.

How often should I regenerate my llms.txt file? Regenerate whenever your documentation or site structure changes, since a stale file points LLMs at outdated pages. Context.dev's API lets you re-run generation on a schedule or in CI, so the file stays current without manual editing. Small, stable sites can get away with a one-time generation and occasional manual updates.

Which AI assistants and agents actually read llms.txt today? AI coding assistants like Cursor, Claude Code, and Copilot read llms.txt to load documentation context, which is the standard's clearest proven use case. Broader agent and search support remains inconsistent and undocumented across engines. The FastHTML project is the named reference implementation that publishes and consumes the format directly.

Can I generate llms.txt for a site I don't own? Yes, generators that crawl a public URL work on any reachable site, including ones you don't control. Context.dev, Firecrawl, Apify, and llmstxtgen.com all accept an arbitrary start URL and return a generated file. You can only publish the file at the root of a site you actually own, so generating it elsewhere is mainly useful for feeding external docs into your own pipeline.

What's the difference between llms.txt and llms-full.txt? The llms.txt file is a concise link map with a site name, summary, and sectioned links, while llms-full.txt inlines fuller page content and descriptions. Use the concise version when context window space is tight, and the full version when an agent needs the actual text without following links. FastHTML's reference implementation also ships llms-ctx.txt and llms-ctx-full.txt, which the llms_txt2ctx CLI builds from the base llms.txt file.

Is llms.txt a formal standard or a community proposal? It is a community proposal, not a W3C or IETF standard. Jeremy Howard published it on September 3, 2024, and the spec remains open for input through a GitHub repository and Discord. Treat it as a useful convention with growing adoption rather than a ratified requirement.
