How many URLs does the sitemap extractor return?

The sitemap extractor returns up to 500 deduplicated page URLs per domain. Non-page resources like images, PDFs, and video files are automatically filtered out so you only get navigable page URLs.

Does it support sitemap index files?

Yes. The API handles sitemap index files recursively. It discovers child sitemaps and fetches them in parallel with concurrency control. The response metadata tells you how many sitemaps were discovered, fetched, skipped, and errored.
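
The recursive handling described above can be pictured with a small sketch. This is not the API's implementation: the function names, the `max_sitemaps` cap, the worker count, and the exact meaning given to "skipped" (sitemaps left unfetched once the cap is hit) are all assumptions made for illustration. The `fetch` argument is any callable that returns the XML text for a sitemap URL.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_sitemap(xml_text: str) -> tuple[str, list[str]]:
    """Return the root element name and the <loc> of each direct entry."""
    root = ET.fromstring(xml_text)
    locs = [
        field.text.strip()
        for entry in root   # <sitemap> or <url> entries
        for field in entry  # their direct children only
        if _local(field.tag) == "loc" and field.text
    ]
    return _local(root.tag), locs


def crawl(start: list[str], fetch: Callable[[str], str],
          max_sitemaps: int = 50, workers: int = 4) -> dict:
    """Follow sitemap indexes level by level, fetching each level in parallel."""
    seen = set(start)
    frontier = list(start)
    urls: list[str] = []
    # Key names mirror the sample response; their semantics here are assumed.
    meta = {"sitemapsDiscovered": len(seen), "sitemapsFetched": 0,
            "sitemapsSkipped": 0, "errors": 0}
    attempted = 0

    def load(url: str):
        try:
            return parse_sitemap(fetch(url))
        except Exception:
            return None  # counted as an error by the caller

    # max_workers bounds how many sitemaps are fetched at the same time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            batch = frontier[: max_sitemaps - attempted]
            meta["sitemapsSkipped"] += len(frontier) - len(batch)
            attempted += len(batch)
            frontier = []
            for result in pool.map(load, batch):
                if result is None:
                    meta["errors"] += 1
                    continue
                meta["sitemapsFetched"] += 1
                kind, locs = result
                if kind == "sitemapindex":
                    for loc in locs:
                        if loc not in seen:  # avoid refetching and cycles
                            seen.add(loc)
                            meta["sitemapsDiscovered"] += 1
                            frontier.append(loc)
                else:
                    urls.extend(locs)

    # dict.fromkeys removes duplicates while keeping first-seen order.
    return {"urls": list(dict.fromkeys(urls)), "meta": meta}
```

Passing the fetcher in as a parameter keeps the sketch independent of any HTTP library and makes it easy to try against in-memory XML.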

What is the domain parameter format?

Pass the domain without the protocol, e.g. 'example.com' or 'blog.example.com'. The API automatically normalizes and validates the domain before crawling.
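
Although the API normalizes the domain itself, callers often start from a full URL. The helper below is a minimal client-side sketch for reducing such input to a bare hostname before sending it; `to_domain` is a hypothetical name, and the rules it applies are an assumption, not a description of the API's own validation.

```python
from urllib.parse import urlsplit


def to_domain(value: str) -> str:
    """Reduce input such as 'https://Example.com/path' to 'example.com'."""
    value = value.strip()
    # urlsplit only recognizes the host when a '//' authority marker is
    # present, so add one for bare inputs like 'example.com/path'.
    if "://" not in value:
        value = "//" + value
    host = urlsplit(value).hostname  # lowercased, without port or credentials
    if not host:
        raise ValueError(f"no hostname found in {value!r}")
    return host.rstrip(".")
```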

Sitemap Extractor API

Extract page URLs from any website's sitemaps with a single API call.

Pass a domain name and get back up to 500 deduplicated page URLs. Sitemap index files are crawled recursively. Non-page resources are filtered out automatically.

Perfect for building content indexes, seeding crawlers, or auditing a competitor's full site structure in seconds.

What You Get

Each request crawls a domain's sitemaps and returns up to 500 discoverable page URLs.

Up to 500 page URLs

Deduplicated, page-only results with non-page resources like images and PDFs automatically filtered out.

Sitemap index support

Recursively crawls nested sitemap index files with parallel fetching and concurrency control.

Crawl metadata

See how many sitemaps were discovered, fetched, skipped, and errored in a single response.

Normalized domain input

Pass just the domain name; no protocol is needed. The API validates and normalizes domains automatically.

How It Works

We discover every relevant sitemap, follow sitemap indexes, and return a clean, deduplicated list of page URLs.

— step 01 —

Send a domain

Pass the domain name (e.g., "example.com" or "blog.example.com"). No protocol required.

— step 02 —

Sitemaps discovered

The API checks robots.txt and common sitemap paths, then recursively follows sitemap index files.

— step 03 —

URLs extracted and deduplicated

All page URLs are collected from every sitemap, deduplicated, and filtered to exclude non-page resources.

— step 04 —

Clean URL list returned

You get up to 500 normalized page URLs plus crawl metadata about the sitemap discovery process.
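
Steps 02 through 04 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the service's code: the function names are hypothetical, and the extension list is a guess at what "non-page resources" covers, since the actual filter is not documented here.

```python
from urllib.parse import urlsplit

# Assumed set of non-page extensions, chosen only for this example.
NON_PAGE_EXTENSIONS = (
    ".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp",
    ".pdf", ".mp4", ".webm", ".mov",
)


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Collect the targets of 'Sitemap:' lines (directive name is case-insensitive)."""
    found = []
    for line in robots_txt.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def is_page(url: str) -> bool:
    """Treat a URL as a page unless its path ends in a known resource extension."""
    return not urlsplit(url).path.lower().endswith(NON_PAGE_EXTENSIONS)


def clean(urls: list[str], limit: int = 500) -> list[str]:
    """Deduplicate in first-seen order, drop non-page URLs, and cap the list."""
    unique = dict.fromkeys(u.strip() for u in urls)
    return [u for u in unique if u and is_page(u)][:limit]
```

Checking the extension on the parsed path rather than on the raw string means a query string such as `?download=1` does not hide a `.pdf` from the filter.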

API Response

Discovered URLs for context.dev

GET /v1/web/scrape/sitemap?domain=context.dev

{
  "success": true,
  "domain": "context.dev",
  "urls": [
    "https://context.dev/",
    "https://context.dev/pricing",
    "https://context.dev/blog",
    "https://context.dev/data/logo-api",
    "https://context.dev/use-cases/logo-link",
    "... up to 500 URLs"
  ],
  "meta": {
    "sitemapsDiscovered": 3,
    "sitemapsFetched": 3,
    "sitemapsSkipped": 0,
    "errors": 0
  }
}
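
A minimal Python sketch of calling this endpoint and reading the response follows. The path and the `domain` parameter come from the example above; the base URL is a placeholder and the Bearer-token header is an assumption, so check the documentation for the real host and authentication scheme. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder host: the real base URL is not shown on this page.
BASE_URL = "https://api.example.com"


def build_request(domain: str, api_key: str) -> Request:
    """Build the GET request; the Authorization header format is assumed."""
    query = urlencode({"domain": domain})
    return Request(
        f"{BASE_URL}/v1/web/scrape/sitemap?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def read_response(body: str) -> tuple[list[str], dict]:
    """Parse a response body shaped like the sample above."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise RuntimeError(f"sitemap extraction failed: {payload}")
    return payload.get("urls", []), payload.get("meta", {})


def extract_sitemap(domain: str, api_key: str) -> tuple[list[str], dict]:
    """Send the request and return (urls, meta)."""
    with urlopen(build_request(domain, api_key), timeout=30) as resp:
        return read_response(resp.read().decode("utf-8"))
```

Comparing `meta["sitemapsFetched"]` with `meta["sitemapsDiscovered"]` is a quick way to tell whether the returned list reflects every sitemap the crawl found.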

Ship an agent that actually knows things.

Free tier, 10-minute integration, and the same API powering agents at Mintlify, daily.dev, and Propane. No credit card to start.