Sitemap Extractor API
Extract every page URL from any website sitemap with a single API call.
Pass a domain name and get back up to 500 deduplicated page URLs. Sitemap index files are crawled recursively. Non-page resources are filtered out automatically.
Perfect for building content indexes, seeding crawlers, or auditing a competitor's full site structure in seconds.
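As a rough sketch of what that single call could look like from Python: the endpoint path and the `domain` query parameter are taken from the API Response example on this page, while the host (`api.example.com`), the Bearer-token header, and all function names are assumptions made for illustration, not documented values.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: this page shows only the path, so the host is a placeholder.
BASE_URL = "https://api.example.com"
ENDPOINT = "/v1/web/scrape/sitemap"  # path as shown in the API Response example


def build_sitemap_url(domain: str, base_url: str = BASE_URL) -> str:
    """Build the GET URL for a bare domain such as 'example.com'."""
    query = urlencode({"domain": domain})
    return f"{base_url.rstrip('/')}{ENDPOINT}?{query}"


def parse_sitemap_response(body: str) -> tuple[list[str], dict]:
    """Return (urls, meta) from a JSON body; raise if 'success' is not true."""
    data = json.loads(body)
    if not data.get("success"):
        raise RuntimeError(f"sitemap extraction failed for {data.get('domain')!r}")
    return data.get("urls", []), data.get("meta", {})


def fetch_sitemap_urls(domain: str, api_key: str, base_url: str = BASE_URL):
    """Perform the request. The Authorization header is an assumed auth scheme."""
    request = Request(
        build_sitemap_url(domain, base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urlopen(request, timeout=30) as response:
        return parse_sitemap_response(response.read().decode("utf-8"))
```

Keeping URL construction and response parsing separate from the network call makes both easy to check without credentials; swap in the real host and authentication scheme from the official API reference before use.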
What You Get
Each request crawls a domain's sitemaps and returns the page URLs it discovers, up to a limit of 500.
Up to 500 page URLs
Deduplicated, page-only results with non-page resources like images and PDFs automatically filtered out.
Sitemap index support
Recursively crawls nested sitemap index files with parallel fetching and concurrency control.
Crawl metadata
See how many sitemaps were discovered, fetched, skipped, and errored in a single response.
Normalized domain input
Pass just the domain name, no protocol needed. The API validates and normalizes domains automatically.
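The API's own normalization rules are not spelled out here, so the following is only a sketch of what client-side normalization might look like if you want to clean input before sending it. The function name `normalize_domain` and the specific rules (lowercasing, dropping protocol, path, port, and trailing dot, requiring at least two labels) are assumptions.

```python
import re
from urllib.parse import urlsplit

# A DNS label: 1-63 letters, digits, or hyphens, with no leading or trailing hyphen.
_LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def normalize_domain(raw: str) -> str:
    """Reduce user input to a bare lowercase hostname, or raise ValueError."""
    text = raw.strip().lower()
    # urlsplit only recognizes a host after "//", so add it when no scheme is given.
    if "://" not in text:
        text = "//" + text
    host = (urlsplit(text).hostname or "").rstrip(".")
    labels = host.split(".")
    if len(labels) < 2 or not all(_LABEL.fullmatch(label) for label in labels):
        raise ValueError(f"not a valid domain: {raw!r}")
    return host
```

With these assumed rules, inputs such as `"https://Example.com/path"` and `"example.com"` both reduce to `"example.com"`, while a single-label name such as `"localhost"` is rejected.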
How It Works
We discover every relevant sitemap, follow sitemap indexes, and return a clean, deduplicated list of page URLs.
Send a domain
Pass the domain name (e.g., "example.com" or "blog.example.com"). No protocol required.
Sitemaps discovered
The API checks robots.txt and common sitemap paths, then recursively follows sitemap index files.
URLs extracted and deduplicated
All page URLs are collected from every sitemap, deduplicated, and filtered to exclude non-page resources.
Clean URL list returned
You get up to 500 normalized page URLs plus crawl metadata about the sitemap discovery process.
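The four steps above can be sketched in a few dozen lines. This is not the service's implementation: it fetches sequentially rather than in parallel, checks only robots.txt with a fallback to /sitemap.xml, and uses an illustrative extension list for "non-page" resources. The `fetch` callable, the function names, and the exact meaning given to each metadata counter are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

MAX_URLS = 500  # cap stated on this page
# Assumption: the real filter list is not documented; these are illustrative.
NON_PAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp", ".pdf", ".zip", ".mp4")

Fetcher = Callable[[str], Optional[str]]  # returns the body text, or None on failure


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Collect the targets of 'Sitemap:' lines (the field name is case-insensitive)."""
    found = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def _local(tag: str) -> str:
    """Strip the XML namespace: '{ns}loc' -> 'loc'."""
    return tag.rsplit("}", 1)[-1]


def is_page_url(url: str) -> bool:
    """Reject URLs whose path ends in a known non-page file extension."""
    path = url.split("?", 1)[0].split("#", 1)[0].lower()
    return not path.endswith(NON_PAGE_EXTENSIONS)


def extract_page_urls(domain: str, fetch: Fetcher, limit: int = MAX_URLS):
    robots = fetch(f"https://{domain}/robots.txt") or ""
    queue = sitemaps_from_robots(robots) or [f"https://{domain}/sitemap.xml"]
    seen_sitemaps, seen_urls, urls = set(), set(), []
    meta = {"sitemapsDiscovered": 0, "sitemapsFetched": 0, "sitemapsSkipped": 0, "errors": 0}
    while queue:
        sitemap = queue.pop(0)
        if sitemap in seen_sitemaps:
            continue
        seen_sitemaps.add(sitemap)
        meta["sitemapsDiscovered"] += 1
        if len(urls) >= limit:  # cap reached: count the sitemap but do not fetch it
            meta["sitemapsSkipped"] += 1
            continue
        body = fetch(sitemap)
        try:
            root = ET.fromstring(body) if body else None
        except ET.ParseError:
            root = None
        if root is None:
            meta["errors"] += 1
            continue
        meta["sitemapsFetched"] += 1
        # Read only <loc> elements directly under each <sitemap>/<url> entry,
        # which ignores nested extension tags such as image locations.
        locs = [child.text.strip() for entry in root for child in entry
                if _local(child.tag) == "loc" and child.text]
        if _local(root.tag) == "sitemapindex":
            queue.extend(locs)  # follow nested sitemaps
            continue
        for url in locs:
            if len(urls) >= limit:
                break
            if url not in seen_urls and is_page_url(url):
                seen_urls.add(url)
                urls.append(url)
    return urls, meta
```

Passing the fetcher in as a parameter keeps the crawl logic independent of any HTTP library, so it can be exercised with a plain dictionary of canned responses (for example `extract_page_urls("example.com", pages.get)`).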
API Response
Discovered URLs for context.dev
GET /v1/web/scrape/sitemap?domain=context.dev
{
  "success": true,
  "domain": "context.dev",
  "urls": [
    "https://context.dev/",
    "https://context.dev/pricing",
    "https://context.dev/blog",
    "https://context.dev/data/logo-api",
    "https://context.dev/use-cases/logo-link",
    "... up to 500 URLs"
  ],
  "meta": {
    "sitemapsDiscovered": 3,
    "sitemapsFetched": 3,
    "sitemapsSkipped": 0,
    "errors": 0
  }
}
Context at scale
Join 5,000+ businesses using Context.dev to enrich their products with structured web data.













