Website Crawler API

Crawl any website and extract Markdown from every page.

Point the crawler at any URL and get back clean Markdown for every page it discovers. Control depth, page limits, URL filters, and subdomain following — all with a single POST request.

1 credit per successfully crawled page.

No credit card required
Daydream logo
Kovai logo
Passionfroot logo
Orange logo
SendX logo
Klarna logo
Super.com logo

What You Get

Crawl entire sites and get structured Markdown for every page.

Multi-page Markdown extraction

Crawl discovered pages and get clean Markdown for every page returned

Depth & page limits

Control crawl depth from the start URL and cap the total number of pages

URL regex filtering

Only follow and scrape URLs matching your custom regex pattern

Subdomain following

Optionally crawl across subdomains of the starting domain

How It Works

We handle link discovery, page fetching, and Markdown conversion for you.

— step 01

POST a starting URL

Send a fully qualified URL with optional depth, page limit, and regex filters

— step 02

Links are discovered

The crawler follows same-domain links (and subdomains, if enabled) within your depth and regex constraints

— step 03

Pages converted to Markdown

Each page is fetched and its content is extracted as clean Markdown

— step 04

Results returned

Get all pages with Markdown content, metadata, crawl depth, and status codes
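
As a rough illustration of step 01, the request could be assembled like this in Python. This is a minimal sketch under assumptions, not official client code: the base URL `https://api.example.com` is a placeholder (the page only shows the path `/v1/web/crawl`), and the `url` field name for the start URL is a guess. The `maxDepth`, `maxPages`, `urlRegex`, and `followSubdomains` names come from the FAQ on this page, and Bearer-token auth is taken from the sample response text.

```python
import json
import urllib.request

# Placeholder base URL (assumption): only the path /v1/web/crawl is documented here.
BASE_URL = "https://api.example.com"


def build_crawl_request(api_key, start_url, max_depth=2, max_pages=50,
                        url_regex=None, follow_subdomains=False):
    """Build (but do not send) a POST request for /v1/web/crawl."""
    payload = {
        "url": start_url,  # assumed field name for the start URL
        "maxDepth": max_depth,
        "maxPages": max_pages,
        "followSubdomains": follow_subdomains,
    }
    if url_regex is not None:
        payload["urlRegex"] = url_regex
    return urllib.request.Request(
        BASE_URL + "/v1/web/crawl",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_crawl_request("YOUR_API_KEY", "https://docs.context.dev",
                          url_regex="/blog/.*")

# To actually send it (requires a real base URL and API key):
# with urllib.request.urlopen(req) as resp:
#     crawl = json.load(resp)
```

Building the request separately from sending it makes it easy to inspect the payload before spending credits.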

API Response

Crawl results for docs.context.dev

POST /v1/web/crawl
{
  "results": [
    {
      "markdown": "# Context.dev Documentation\n\nWelcome to the Context.dev API docs.\nLearn how to extract brand data, scrape websites,\nand query any domain with a single API call.\n\n## Getting Started\n\n- [Authentication](/authentication)\n- [Quick Start](/quickstart)\n- [API Reference](/api-reference)\n...",
      "metadata": {
        "url": "https://docs.context.dev",
        "title": "Context.dev Documentation",
        "crawlDepth": 0,
        "statusCode": 200,
        "success": true
      }
    },
    {
      "markdown": "# Authentication\n\nAll API requests require a Bearer token...\n",
      "metadata": {
        "url": "https://docs.context.dev/authentication",
        "title": "Authentication - Context.dev",
        "crawlDepth": 1,
        "statusCode": 200,
        "success": true
      }
    }
  ],
  "metadata": {
    "numUrls": 12,
    "maxCrawlDepth": 2,
    "numSucceeded": 11,
    "numFailed": 1
  }
}
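
One way to consume a response shaped like the sample above is to split it into successful pages and failed URLs. The sketch below assumes that failed pages, when they appear in `results`, carry `"success": false`. The sample only shows successful entries, so the failed entry in the test data is hypothetical.

```python
def summarize_crawl(crawl):
    """Return ({url: markdown} for successful pages, [urls that failed])."""
    pages = {}
    failed = []
    for item in crawl.get("results", []):
        meta = item.get("metadata", {})
        if meta.get("success"):
            pages[meta["url"]] = item.get("markdown", "")
        else:
            failed.append(meta.get("url"))
    return pages, failed


# Trimmed copy of the sample response, plus one hypothetical failed entry.
crawl = {
    "results": [
        {"markdown": "# Context.dev Documentation\n...",
         "metadata": {"url": "https://docs.context.dev",
                      "title": "Context.dev Documentation",
                      "crawlDepth": 0, "statusCode": 200, "success": True}},
        {"markdown": "# Authentication\n...",
         "metadata": {"url": "https://docs.context.dev/authentication",
                      "title": "Authentication - Context.dev",
                      "crawlDepth": 1, "statusCode": 200, "success": True}},
        # Hypothetical: not shown in the sample response above.
        {"markdown": "",
         "metadata": {"url": "https://docs.context.dev/missing",
                      "title": "", "crawlDepth": 1,
                      "statusCode": 404, "success": False}},
    ],
    "metadata": {"numUrls": 3, "maxCrawlDepth": 1,
                 "numSucceeded": 2, "numFailed": 1},
}

pages, failed = summarize_crawl(crawl)
print(len(pages), "pages ok;", len(failed), "failed")
```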

Frequently asked questions

Common questions about the Context.dev Website Crawler API.

Am I billed for failed requests?
No. You are not billed for failed requests, including the rare cases where the target site blocks the crawler. Credits are only consumed on successful responses.
How does the Website Crawler API work?
POST a starting URL to /v1/web/crawl. The crawler discovers same-domain links, follows them up to your maxDepth, filters them by your urlRegex pattern, fetches each page, and converts the HTML to clean Markdown. The response contains a results array with one entry per page (Markdown plus per-page metadata: URL, title, status code, crawl depth, and a success flag) and a top-level metadata object with crawl totals.
What's the difference between the Crawler API and the Sitemap API?
Sitemap Extractor reads the URLs the site declares in sitemap.xml. It is fast, does no rendering, and returns only a list of URLs. The Crawler walks links from the start URL to discover pages (whether or not they're in the sitemap) and also extracts each page's content as Markdown. Use Sitemap when you only need URLs; use Crawler when you need URLs plus body text.
How much does it cost to crawl a website?
1 credit per successfully crawled page. Failed pages don't consume credits. Cap costs by setting maxPages and maxDepth. For example, maxDepth=2 and maxPages=50 crawls at most the first 50 pages within 2 hops of the start URL.
Can I filter which pages get crawled?
Yes — three knobs: (1) urlRegex matches URLs you want to follow (e.g. "/blog/.*" to scope to a blog), (2) maxDepth caps how many hops from the start URL, (3) followSubdomains controls whether subdomains are in scope.
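
Before spending credits, it can help to try a urlRegex pattern locally against a few URLs you expect the crawl to reach. The sketch below uses Python's `re.search` on the full URL. That is an assumption: this page does not say whether the API matches anywhere in the URL or requires a full match, nor which regex flavor it uses.

```python
import re


def matching_urls(urls, url_regex):
    """Keep the URLs where the pattern matches anywhere in the string."""
    pattern = re.compile(url_regex)
    return [u for u in urls if pattern.search(u)]


candidates = [
    "https://example.com/blog/launch",
    "https://example.com/pricing",
    "https://example.com/blog/2024/recap",
]
print(matching_urls(candidates, "/blog/.*"))
# ['https://example.com/blog/launch', 'https://example.com/blog/2024/recap']
```
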
Does it follow subdomains?
By default it stays on the apex domain (and treats www as equivalent). Pass followSubdomains: true to also crawl subdomains like docs.example.com or blog.example.com from a start URL on example.com.
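
The scoping rule described above can be approximated as follows. This is an illustrative sketch only, not the crawler's actual logic: it assumes the start URL is on the apex domain (as in the example.com case) and treats `www.` as equivalent by simply stripping that prefix.

```python
from urllib.parse import urlsplit


def _strip_www(host):
    """Treat www.example.com and example.com as the same host."""
    return host[4:] if host.startswith("www.") else host


def in_scope(url, start_url, follow_subdomains=False):
    """Approximate whether `url` is in scope for a crawl from `start_url`."""
    host = _strip_www((urlsplit(url).hostname or "").lower())
    start = _strip_www((urlsplit(start_url).hostname or "").lower())
    if host == start:
        return True
    # Subdomains only count when followSubdomains is enabled.
    return follow_subdomains and host.endswith("." + start)


start = "https://example.com"
print(in_scope("https://www.example.com/about", start))        # True
print(in_scope("https://docs.example.com/intro", start))       # False
print(in_scope("https://docs.example.com/intro", start, True)) # True
```
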
Can I use this for RAG or LLM ingestion?
Yes, that's the most common use case. Crawl a docs site or knowledge base, store the returned Markdown in a vector DB, and your LLM can retrieve from the full corpus. The output is already LLM-ready: clean text, no HTML noise, no boilerplate.
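
A common step before loading pages into a vector DB is to split each page's Markdown into smaller chunks. The sketch below splits on Markdown headings and keeps the source URL with every chunk. It is one simple approach, not something the API prescribes: `chunk_markdown` is a hypothetical helper, and it would also split on `#` lines inside fenced code blocks.

```python
import re


def chunk_markdown(url, markdown):
    """Split one page's Markdown into heading-delimited chunks."""
    chunks = []
    # Split just before every line that starts with 1-6 '#' and a space.
    for part in re.split(r"(?m)^(?=#{1,6} )", markdown):
        text = part.strip()
        if text:
            chunks.append({"url": url, "text": text})
    return chunks


page = "# Authentication\n\nAll API requests require a Bearer token.\n\n## Getting Started\n\n- Quick Start"
for chunk in chunk_markdown("https://docs.context.dev/authentication", page):
    print(chunk["url"], "->", chunk["text"].splitlines()[0])
```

Each chunk can then be embedded and stored with its URL so answers can link back to the source page.
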
Is there a free tier for the website crawler?
Yes — the free tier covers thousands of monthly page crawls. A single API key also unlocks single-page web scraping (HTML, Markdown, images), sitemaps, brand data, and the rest of the Context.dev stack.

Ship an agent that actually knows things.

Free tier, 10-minute integration, and the same API powering agents at Mintlify, daily.dev, and Propane. No credit card to start.