Context.dev

Best llms.txt Generator Tools in 2026

TL;DR

  • Context.dev is the top pick for API-first, programmatic llms.txt generation at scale, with a single crawl and markdown API that feeds LLM pipelines directly.
  • Firecrawl's hosted llms.txt tool was deprecated in June 2025, so reach for its create-llmstxt-py script instead.
  • Mintlify suits teams already running its hosted docs platform, generating llms.txt and llms-full.txt with zero config.
  • llmstxtgenerate.com and llmstxtgen.com cover one-off needs and small sites through free web forms.

What Is llms.txt (and Why It Matters for LLM Pipelines)

llms.txt is a markdown file placed at the root of a website (/llms.txt) that gives large language models a curated, structured overview of a site's content at inference time. Jeremy Howard proposed the standard on September 3, 2024. The file starts with an H1 site name, a blockquote summary, and H2 sections that list links to the pages an LLM should read.
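To make that layout concrete, here is a minimal sketch that renders an llms.txt body from plain Python data. The function name `render_llms_txt`, its arguments, and the example.com URLs are illustrative assumptions, not part of the proposal or of any tool covered here.

```python
def render_llms_txt(site_name, summary, sections):
    """Render an llms.txt body: H1 title, blockquote summary, H2 link sections.

    `sections` maps a heading to a list of (title, url, note) tuples.
    All names here are hypothetical, chosen for illustration only.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            # One markdown link per list item, optionally followed by ": note".
            item = f"- [{title}]({url})"
            if note:
                item += f": {note}"
            lines.append(item)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


example = render_llms_txt(
    "Example Docs",
    "Reference material for the Example API.",
    {"Docs": [("Quickstart", "https://example.com/quickstart.md", "Install and first call")]},
)
print(example)
```

Keeping the renderer a pure function of its inputs means the same data can later feed a longer llms-full.txt without a second crawl.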

Raw HTML is a poor fit when a model has to answer at inference time. Most sites are too large to fit inside a context window, and converting pages full of navigation, ads, and JavaScript into clean text is imprecise. An llms.txt file addresses both problems by pointing the model at the content that matters, in plain markdown that parsers and even simple regular expressions can read directly.

The standard borrows its file-path convention from robots.txt and sitemap.xml but serves a different moment. robots.txt controls what crawlers may access, and sitemap.xml lists every indexable page for search engines. Both work at crawl time. llms.txt works at inference time, when a user actively asks an assistant about your content.

Two companion patterns round out the surface area. A site can publish llms-full.txt with expanded detail for models that want more context, and individual pages can offer clean markdown versions at the same URL with .md appended. Together they let an LLM fetch a quick map first, then drill into full page content on demand.
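The URL conventions above can be sketched as a small helper. The function name `companion_urls` is hypothetical. The trailing-slash rule (appending index.html.md when the URL has no file name) reflects our reading of the llmstxt.org proposal and is an assumption to verify against the spec before relying on it.

```python
from urllib.parse import urlsplit, urlunsplit


def companion_urls(page_url):
    """Derive the llms.txt, llms-full.txt, and per-page markdown URLs for a page."""
    parts = urlsplit(page_url)
    root = urlunsplit((parts.scheme, parts.netloc, "", "", ""))
    path = parts.path or "/"
    if path.endswith("/"):
        # Assumption: URLs without a file name get index.html.md appended.
        md_path = path + "index.html.md"
    else:
        md_path = path + ".md"
    return {
        "index": root + "/llms.txt",
        "full": root + "/llms-full.txt",
        "page_markdown": urlunsplit((parts.scheme, parts.netloc, md_path, "", "")),
    }


print(companion_urls("https://example.com/docs/intro"))
```

A client would typically fetch the `index` URL first and request `page_markdown` only for the links it decides to read.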

Comparison Table: llms.txt Generator Tools at a Glance

| Tool | Best For | API Access | Free Tier | Output Files | Notable Limitation |
| --- | --- | --- | --- | --- | --- |
| Context.dev | API-first generation at scale | Yes | Yes | llms.txt, llms-full.txt, markdown | Built for developers, not a no-code UI |
| Firecrawl | Quick one-off generation | Yes (deprecated) | Yes | llms.txt, llms-full.txt | Deprecated June 2025, no maintenance |
| Apify llmstxt-generator | Apify ecosystem users | Yes (Actor) | Yes | llms.txt | 4 GB memory cap can slow crawls |
| Mintlify | Hosted docs, zero config | Via platform | Yes | llms.txt, llms-full.txt | No analytics or content-level controls |
| llmstxtgenerate.com | Non-technical users needing a validator | No | Yes | llms.txt, llms-full.txt | $1 for full content, single URL |
| llmstxtgen.com | Free generation for small sites | No | Yes | llms.txt, llms-full.txt | 20-page cap |
| LLMrefs | Marketers tracking AI citations | No | Yes | llms.txt | No page-count transparency |

How We Chose These Tools

We ranked these tools against four criteria that decide whether a generator actually fits an LLM pipeline.

First, API and programmatic access. A web form works for one site, but teams generating files at scale need a callable endpoint, not a button.

Second, compliance with the llmstxt.org specification. The output must include the H1 title, which is the only element the proposal strictly requires, plus the summary blockquote and H2 link sections that parsers and LLMs rely on to read it.
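A structural check of this kind can be sketched in a few lines. This is a simplified illustration rather than a full validator for the proposal: the function name `check_structure`, the regular expression, and the problem messages are all assumptions, and the check treats the blockquote and link sections as expected even where the proposal makes them optional.

```python
import re

# A list item that starts with a markdown link, e.g. "- [Title](https://...)".
LINK_ITEM = re.compile(r"^- \[[^\]]+\]\([^)\s]+\)")


def check_structure(text):
    """Return a list of structural problems found in an llms.txt body."""
    non_blank = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    problems = []
    if not non_blank or not non_blank[0].startswith("# "):
        problems.append("missing H1 title on the first non-blank line")
    if not any(ln.startswith("> ") for ln in non_blank):
        problems.append("missing blockquote summary")
    # Count link items under each H2 section.
    current, counts = None, {}
    for ln in non_blank:
        if ln.startswith("## "):
            current = ln[3:]
            counts[current] = 0
        elif current is not None and LINK_ITEM.match(ln):
            counts[current] += 1
    if not counts:
        problems.append("no H2 sections")
    for heading, n in counts.items():
        if n == 0:
            problems.append(f"section '{heading}' has no link items")
    return problems


good = "# Site\n\n> Summary.\n\n## Docs\n\n- [Intro](https://example.com/intro.md): Start here\n"
print(check_structure(good))
```

Running the same check against each generator's output gives a quick, repeatable way to compare tools on this criterion.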

Third, crawl depth and scale. We checked how many pages each tool covers and whether you can configure depth.

Fourth, transparency. Tools that name their page caps, security tradeoffs, and limitations earned higher placement than those that hide them.

The Best llms.txt Generator Tools in 2026

Seven tools generate llms.txt files in 2026, and they split into two camps. Most are web forms or hosted documentation features that produce a single file from a single domain. A few expose an API you can call from a pipeline. The right choice depends on whether you generate one file once or generate thousands on a schedule.

Context.dev — Best for API-First, Programmatic llms.txt Generation at Scale

If you need llms.txt files generated programmatically across many sites, Context.dev is the strongest pick because it ships as an API instead of a web form. You call one endpoint to crawl a target site, get clean markdown back, and assemble llms.txt and llms-full.txt output in your own pipeline. There is no Actor to configure, no dashboard to babysit, and no crawler infrastructure for you to run and maintain.

Best for: Developers and AI teams that need llms.txt generation at scale, integrated directly into an LLM pipeline rather than triggered by hand.

The core of the product is a crawl-and-markdown API that does the two jobs llms.txt actually requires. It crawls a site to discover pages with the Sitemap API, and it returns each page as clean markdown with navigation, ads, and scripts stripped out through the Markdown API. The llmstxt.org spec wants exactly this kind of LLM-readable content, so the markdown output maps onto both the index file and the companion .md pages without a separate conversion step. You decide how to structure the H1, the summary blockquote, and the section lists, because you own the assembly code.
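The assembly step you own might look like the sketch below. It deliberately makes no API calls: the `Page` dataclass stands in for whatever your crawl and markdown requests return, and its field names, the `build_index` and `build_full` helpers, and the "Source:" line in the full file are all hypothetical choices, not Context.dev's response schema or a format the proposal mandates.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical shape for one crawled page; adapt to your API's real response.
    url: str
    title: str
    description: str
    markdown: str


def build_index(site_name, summary, pages, section="Docs"):
    """Build the condensed llms.txt: title, summary, and one link per page."""
    items = [f"- [{p.title}]({p.url}): {p.description}" for p in pages]
    head = [f"# {site_name}", "", f"> {summary}", "", f"## {section}", ""]
    return "\n".join(head + items) + "\n"


def build_full(site_name, summary, pages):
    """Build llms-full.txt by inlining each page's markdown under its own heading."""
    parts = [f"# {site_name}", "", f"> {summary}", ""]
    for p in pages:
        parts += [f"## {p.title}", "", f"Source: {p.url}", "", p.markdown.strip(), ""]
    return "\n".join(parts).rstrip() + "\n"


pages = [Page("https://example.com/a", "Page A", "First page", "Body of A.\n")]
print(build_index("Example", "Demo site.", pages))
print(build_full("Example", "Demo site.", pages))
```

Both files come from the same page list, so one crawl can serve the index and the full-content use case.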

The single-API design is the practical advantage over running your own scraper. One contract covers scraping, crawling, and structured data delivery, so you are not stitching together a headless browser, a proxy layer, a content extractor, and a markdown converter. For teams that already maintain an internal crawler, Context.dev is a direct replacement that removes the operational burden of keeping that crawler alive. The MCP integration matters here, because it lets a coding assistant or agent call the same crawl-and-markdown capability inside an LLM workflow instead of scraping pages itself.

Honest limitations matter. The API is built for developers, not marketers, so if you want a single file for one site without writing code, Context.dev's own free llms.txt generator or a hosted form like llmstxtgenerate.com is the faster route. The API itself has no built-in spec validator or AI Readiness Score, and there is no bundled citation tracking the way LLMrefs offers. You write the code that turns crawl results into the final file format, which is the cost of the flexibility. If you only need one llms.txt file once and never again, the full API is more machinery than the job warrants.

Pricing follows usage, so you pay for the crawling and extraction you actually run rather than a flat documentation-platform subscription. That model fits teams generating files across many domains, where a per-seat or per-site plan would price the work badly. See context.dev for current rates and endpoint details.

Firecrawl — Best for Quick One-Off Generation (Deprecated)

Firecrawl deprecated its standalone llms.txt generator on June 30, 2025, and the team now points developers to its main scraping endpoints instead. The repository still carries a deprecation notice, and the API stays live but unmaintained. If you want the same functionality today, Firecrawl directs you to the create-llmstxt-py example repo built on its core API.

Before deprecation, the tool earned its 531 GitHub stars by doing one job cleanly. It used Firecrawl's open-source scraper to fetch a full site, including JavaScript-heavy pages, then parsed the markdown and extracted each page's title and description with GPT-4o-mini. The result came in two files, a condensed llms.txt and a complete llms-full.txt, which covered both the index and full-content use cases.

Access was the real draw. You could hit https://llmstxt.firecrawl.dev/[YOUR_URL_HERE] with no API key for basic usage and get a file back, though crawling plus the language-model pass often took several minutes. A Hacker News commenter flagged a security problem during the November 2024 launch, noting the tool passed API keys as URL query parameters over plain HTTP. Another called the Firecrawl-plus-GPT-4o stack heavy next to lighter alternatives like crawl4ai.

Best for: quick, one-off generation when you accept that the tool is no longer maintained.

Key features: no-key web access, GPT-4o-mini summaries, two-file output, full JavaScript rendering.

Limitations: deprecated with no active maintenance, dependent on external Firecrawl and OpenAI services, multi-minute latency, and no batch URL input. For anything programmatic or recurring, the unmaintained status alone should send you to an actively supported tool.

Pricing: no documented tiers. Basic usage required no key.

Apify llmstxt-generator — Best for Apify Ecosystem Users

The Apify llmstxt-generator makes the most sense when you already run workflows on Apify and want to drop llms.txt generation into an existing pipeline. It ships as an open-source Actor under the Apache-2.0 license, so you can fork it, read the code, and run it inside the same platform you use for other scraping jobs. For teams outside that ecosystem, the setup overhead rarely justifies the choice.

Under the hood, the Actor calls Apify's Website Content Crawler, which runs on the Crawlee library and handles deep, multi-level crawls with markdown output. You control how far it goes with a single maxCrawlDepth integer, so a value of 1 stays shallow while higher numbers walk deeper into the site. The Actor writes a standard /llms.txt file with a domain title, an ## Index section, page titles, URLs, and descriptions, then saves it to Apify's Key-Value Store where you download it from the run's storage.
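The effect of a depth limit is easiest to see on a toy link graph. The sketch below is a generic breadth-first crawl over an in-memory dictionary, not the Actor's code: the `crawl` function, the `SITE` graph, and the convention that the start page sits at depth 0 are assumptions, and the Actor may count depth differently.

```python
from collections import deque


def crawl(start, links, max_depth):
    """Breadth-first walk of `links`, following at most `max_depth` hops from `start`."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # Do not follow links from pages already at the depth limit.
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, depth + 1))
    return order


# Hypothetical site: each key links to the pages in its list.
SITE = {"/": ["/docs", "/blog"], "/docs": ["/docs/api"], "/docs/api": ["/docs/api/auth"]}

print(crawl("/", SITE, 1))
print(crawl("/", SITE, 2))
```

Each extra level can multiply the number of pages fetched, which is why a single integer has such a large effect on run time and memory.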

Best for: Teams already running Apify workflows who want configurable crawl depth.

Key features: Open-source Actor, deep multi-level crawling, maxCrawlDepth control, standard-format output, and downloadable Key-Value Store results. Apify also supports LangChain through the langchain-apify package and a Dify marketplace plugin, so the output feeds into agent pipelines without extra glue code.

Limitations: The crawler Actor caps memory at 4 GB to stay inside the free tier's 8 GB limit, which can slow large crawls. The repo documents no page-count caps, rate limits, or output-quality benchmarks against rivals.

Pricing: Apify runs on pay-per-use with a free tier, though no specific per-run cost for this Actor is published.

Mintlify — Best for Documentation Sites That Want Zero-Config llms.txt

Mintlify generates both llms.txt and llms-full.txt automatically at the root of every hosted documentation project, with no configuration to write. The files go live the moment you deploy, and they stay synced as your docs change. If your team already runs documentation on Mintlify, you get AI-readable files for free without touching your build.

The strongest signal of adoption came in November 2024, when Mintlify enabled automatic llms.txt generation across every site it hosts. That single change handed llms.txt files to thousands of documentation sites overnight, including Anthropic, Cursor, Pinecone, and Windsurf (getpublii.com). Anthropic specifically requested the files for its own docs, which is concrete demand from a leading AI lab rather than vendor speculation.

Mintlify is a docs hosting platform, not a standalone generator. You cannot point it at an arbitrary website and pull an llms.txt file out. The generation only works for content you already host there, so it solves a narrower problem than crawl-based tools like Firecrawl or Context.dev.

Two gaps matter for teams taking AI seriously. Mintlify offers no analytics on AI or LLM bot traffic, so you cannot see how coding assistants discover or consume your docs (buildwithfern.com). It also lacks content-level controls, which means you cannot include AI-only context or strip marketing copy from the LLM endpoints.

Best for: Small teams already hosting docs on Mintlify who want zero-effort, always-synced llms.txt files.

Pricing: A free Hobby tier covers basic use, Pro runs $250 per month, and Enterprise is custom-priced (readme.com).

llmstxtgenerate.com — Best for Non-Technical Users Who Need a Validator Too

llmstxtgenerate.com bundles three tools into one web suite, and the validator is the reason to choose it over simpler generators. The generator accepts a sitemap URL, a website URL that auto-follows internal links, or a manually pasted list of specific URLs, so non-technical users can pick whichever input they have on hand. The free tier returns page titles, links, and descriptions. A one-time $1 payment through Stripe unlocks llms-full.txt with either summaries near 300 words per page or full content near 2,000 words per page, capped at 25 to 250 pages.

The validator separates this tool from the rest. It checks an existing llms.txt for broken links, missing descriptions, and structural errors, then returns an AI Readiness Score out of 100. Files you upload are read locally in your browser and never stored on a server, which matters if your file references private or unpublished URLs. The site also earns credit for its honesty: it states plainly that llms.txt is a proposed convention rather than a W3C or IETF standard, and it warns against expecting overnight AI-traffic gains.

Best for: Non-technical users who want to generate and then validate a file in one place.

Key features: Three input methods, validator with AI Readiness Score, local-only file processing, platform-specific deployment guides for WordPress, Shopify, and Next.js.

Limitations: No API, no bulk generation, one URL at a time.

llmstxtgen.com — Best Free Tool for Small Sites (Up to 20 Pages)

llmstxtgen.com is the fastest way to generate a clean llms.txt for a small site, and it asks nothing of you. The tool runs free with no signup, and it crawls up to 20 pages per URL. For a marketing site, a portfolio, or compact documentation, that ceiling covers the whole site in one pass.

The output sets it apart from flat link dumps. llmstxtgen.com groups discovered pages into logical sections like Guides, Docs, and API rather than listing them in one undifferentiated block, and it produces both llms.txt and llms-full.txt following the llmstxt.org proposal. The tool crawls titles and descriptions only, not full page body text.
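One simple way to get that kind of grouping is to bucket URLs by their first path segment. The sketch below shows the general idea only; llmstxtgen.com does not document its method, and the `SECTION_NAMES` mapping, the `group_pages` function, and the "Other" fallback are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from a URL's first path segment to a section heading.
SECTION_NAMES = {"guides": "Guides", "docs": "Docs", "api": "API"}


def group_pages(urls, fallback="Other"):
    """Group URLs into named sections based on their first path segment."""
    groups = {}
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        key = segments[0].lower() if segments else ""
        groups.setdefault(SECTION_NAMES.get(key, fallback), []).append(url)
    return groups


print(group_pages([
    "https://example.com/docs/intro",
    "https://example.com/api/auth",
    "https://example.com/pricing",
]))
```

Each resulting group maps naturally onto one H2 section in the generated file.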

Best for: Small sites that need a compliant llms.txt in under a minute.

Limitations: The 20-page cap rules out anything larger than a brochure site, and there is no API or bulk generation. The team is candid about impact, calling llms.txt "low-risk hygiene, not a magic lever" and noting that engine support stays inconsistent and undocumented.

LLMrefs — Best for Marketers Tracking AI Brand Citations

LLMrefs makes the most sense if you already track AI brand citations and want llms.txt generation as part of the same toolkit. The free generator deep-crawls your site, reads page metadata, discovers internal links, and writes an AI-generated summary for each page. You can edit the output before downloading, so the per-page descriptions become a starting draft rather than a fixed result.

The generator is one piece of a wider platform that monitors brand mentions across ChatGPT, Google AI, Perplexity, Gemini, and Claude. LLMrefs reports 10,000+ marketers on the platform, which tells you who the tool is built for.

Best for: Marketing teams who want citation monitoring and llms.txt generation in one place.

Key features: AI-generated per-page summaries, editable output, no credit card required, and integration with the broader citation-tracking suite.

Limitations: LLMrefs does not state a page-count limit, so you cannot tell how deep the crawl runs before you start. Its claim that GPTBot visits its own file daily is self-reported, with no independent verification cited.

Why Context.dev Is the Right Pick for Developer Teams

Web-form generators break the moment you need llms.txt for hundreds of domains or want fresh files on a schedule. You paste one URL, wait, and download a file. Context.dev replaces that manual loop with a single API call your pipeline can run on demand, so you generate and regenerate llms.txt the same way you fetch any other data.
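A scheduled regeneration loop might look like the sketch below. The `generate` callable is a hypothetical stand-in for whatever crawl API you wrap, and the `regenerate_all` function and the dictionary used as a store are illustrative assumptions. The sketch only shows the pattern of regenerating many domains and recording which files actually changed.

```python
import hashlib


def regenerate_all(domains, generate, store):
    """Regenerate llms.txt for each domain and return the domains whose file changed.

    `generate(domain) -> str` is a hypothetical callable supplied by the caller.
    `store` maps a domain to its last saved body and content hash.
    """
    changed = []
    for domain in domains:
        body = generate(domain)
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if store.get(domain, {}).get("sha256") != digest:
            store[domain] = {"sha256": digest, "body": body}
            changed.append(domain)
    return changed


def fake_generate(domain):
    # Stand-in generator so the sketch runs without network access.
    return f"# {domain}\n\n> Placeholder summary.\n"


store = {}
print(regenerate_all(["a.example", "b.example"], fake_generate, store))
print(regenerate_all(["a.example", "b.example"], fake_generate, store))
```

Comparing content hashes keeps a scheduled job from republishing files that have not changed between runs.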

The mechanism matters for developer teams. You hit one endpoint for scraping, crawling, and clean markdown output, then pipe the result straight into your LLM workflow through MCP. There is no crawler to build, no queue to babysit, and no infrastructure to keep alive when the spec shifts.

If you already run an internal crawler to keep AI-readable content current, Context.dev is the drop-in replacement. Programmatic generation at scale is the one thing a web form cannot do, and it is the thing your pipeline actually needs.

FAQs

What is an llms.txt file and do I need one? An llms.txt file is a markdown file placed at your site's root that gives LLMs a curated, structured overview of your content at inference time. You need one if AI assistants and coding tools regularly answer questions about your site, since it gives them a clean entry point instead of forcing them to parse noisy HTML. Documentation sites, product pages, and developer tools benefit most.

Is llms.txt an official standard? No. Jeremy Howard and Answer.AI proposed llms.txt in September 2024, and it remains a community convention rather than a W3C- or IETF-ratified standard. The specification is open-source on GitHub and open to input, so treat it as a low-risk convention worth following, not a guaranteed requirement.

Does publishing llms.txt actually increase AI citations? No public, controlled evidence shows that publishing llms.txt measurably increases AI citations, and llmstxtgen.com states this plainly. Engine support stays inconsistent and undocumented across ChatGPT, Perplexity, and others. Publish it as cheap hygiene, and do not expect overnight traffic gains.

What's the difference between llms.txt and llms-full.txt? The llms.txt file is a condensed link map with your site name, a summary, and sectioned links to key pages. The llms-full.txt file inlines the actual page content, giving LLMs the full text in one fetch. Most generators covered here produce both, and Context.dev's markdown output lets you assemble both, so you can serve a short context when the window is tight and a full one when depth matters.
