What is a sitemap?

An XML file that lists every important URL on a site so search engines and crawlers can discover them efficiently.

A sitemap is a structured index of a site's public URLs, served from a path like /sitemap.xml. Each <url> entry carries a required location (<loc>) and may also include a lastmod timestamp, a change frequency, and a priority. The sitemap protocol caps a single file at 50,000 URLs, so large sites split their URL inventory across multiple child sitemaps and publish a sitemap index that points to them.
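To make the entry structure concrete, here is a minimal sketch that parses a small sitemap with Python's standard library. The sample document, the example.com URLs, and the function name parse_urlset are illustrative assumptions, not taken from any real site.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample: the first entry uses every field, the second only <loc>.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>"""


def parse_urlset(xml_text):
    """Return one dict per <url> entry; optional fields are None when absent."""
    # Encode to bytes so the XML declaration's encoding is honored by the parser.
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", NS):
        entries.append({
            "loc": url.findtext("sm:loc", default="", namespaces=NS).strip(),
            "lastmod": url.findtext("sm:lastmod", namespaces=NS),
            "changefreq": url.findtext("sm:changefreq", namespaces=NS),
            "priority": url.findtext("sm:priority", namespaces=NS),
        })
    return entries


if __name__ == "__main__":
    for entry in parse_urlset(SAMPLE):
        print(entry)
```

Every lookup has to be namespace-qualified: a bare "url" or "loc" search finds nothing, because the elements live in the sitemaps.org namespace.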

For SEO, sitemaps speed up the discovery of new and updated pages. Google does not require them, but it uses them when they exist. For programmatic use, a sitemap is the cleanest available enumeration of a domain: it is what a polite crawler should request first, before falling back to link-following.

Brand.dev's Sitemap Extractor API takes any domain, finds the sitemap (or builds one by parsing the index), and returns the deduplicated URL list. It handles nested sitemap indexes, gzipped feeds, and robots.txt Sitemap directives.
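For readers who want to see what handling nested indexes, gzipped feeds, and deduplication involves, here is a rough generic sketch. It is not Brand.dev's implementation or API. The names collect_urls and decode_body, the max_depth limit, and the injected fetch callable (any function that takes a URL and returns the raw response bytes) are all assumptions made for illustration.

```python
import gzip
import xml.etree.ElementTree as ET

# Clark-notation prefix for the sitemaps.org namespace.
SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def decode_body(body):
    """Decompress the payload if it starts with the gzip magic bytes."""
    return gzip.decompress(body) if body[:2] == b"\x1f\x8b" else body


def collect_urls(sitemap_url, fetch, max_depth=3, _seen=None):
    """Walk a sitemap or sitemap index; return page URLs, deduplicated in first-seen order.

    fetch is a caller-supplied function: URL string in, response bytes out.
    """
    seen = set() if _seen is None else _seen
    # Skip sitemaps already visited (guards against cycles) and cap the nesting depth.
    if sitemap_url in seen or max_depth < 0:
        return []
    seen.add(sitemap_url)

    root = ET.fromstring(decode_body(fetch(sitemap_url)))
    locs = [el.text.strip() for el in root.iter(SM + "loc") if el.text]

    if root.tag == SM + "sitemapindex":
        # An index lists child sitemaps, so recurse into each one.
        urls = []
        for child in locs:
            urls.extend(collect_urls(child, fetch, max_depth - 1, seen))
    else:
        # A plain <urlset> lists page URLs directly.
        urls = locs

    # dict.fromkeys drops duplicates while preserving order.
    return list(dict.fromkeys(urls))
```

Passing fetch in as a parameter keeps the traversal logic separate from networking. In practice it could wrap something like urllib.request.urlopen(url).read(), ideally with timeouts, a descriptive User-Agent, and rate limiting.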

In the wild

  • /sitemap.xml, the canonical location
  • /sitemap_index.xml, an index pointing to per-section sitemaps (often used by WordPress)
  • Image and video sitemaps for richer indexing of media-heavy sites
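As an illustration of the media case, image sitemaps attach image entries to each page using Google's image extension namespace. The sketch below pulls image URLs per page from a hypothetical sample; the example.com URLs and the function name images_by_page are assumptions for demonstration only.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    # Google's image sitemap extension namespace.
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}

# Hypothetical image sitemap with one page that references two images.
SAMPLE = b"""<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/gallery</loc>
    <image:image><image:loc>https://example.com/img/a.jpg</image:loc></image:image>
    <image:image><image:loc>https://example.com/img/b.jpg</image:loc></image:image>
  </url>
</urlset>"""


def images_by_page(xml_bytes):
    """Map each page URL to the list of image URLs declared for it."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for url in root.findall("sm:url", NS):
        page = url.findtext("sm:loc", default="", namespaces=NS).strip()
        result[page] = [
            el.text.strip()
            for el in url.findall("image:image/image:loc", NS)
            if el.text
        ]
    return result


if __name__ == "__main__":
    print(images_by_page(SAMPLE))
```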

FAQ

Do I need a sitemap?

If your site is small and well-linked, probably not: Google will generally find every page by following links. If you have orphaned pages, a large catalog, or content that updates frequently, a sitemap meaningfully accelerates indexing.

How do I find a site's sitemap?

Try /sitemap.xml first, then /sitemap_index.xml, then check robots.txt for a Sitemap: directive. Brand.dev's sitemap API does this lookup for you.
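The lookup order above can be sketched as a small helper that builds the list of candidate sitemap URLs. This is an illustrative sketch, not Brand.dev's code: the function name sitemap_candidates is hypothetical, and it assumes the caller has already downloaded the site's robots.txt as text.

```python
from urllib.parse import urljoin


def sitemap_candidates(base_url, robots_txt=""):
    """Return sitemap URLs to try, in order: the two common paths, then robots.txt entries."""
    candidates = [
        urljoin(base_url, "/sitemap.xml"),
        urljoin(base_url, "/sitemap_index.xml"),
    ]
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own "https:" stays intact.
        key, sep, value = line.partition(":")
        # The directive name is matched case-insensitively.
        if sep and key.strip().lower() == "sitemap" and value.strip():
            candidates.append(value.strip())
    # Drop duplicates while keeping the lookup order.
    return list(dict.fromkeys(candidates))


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /admin\nSitemap: https://example.com/news-sitemap.xml\n"
    for url in sitemap_candidates("https://example.com/blog/", robots):
        print(url)
```

A crawler would request each candidate in turn and stop at the first one that returns valid sitemap XML.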

What's the difference between an XML and HTML sitemap?

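XML sitemaps are for crawlers; HTML sitemaps are for humans navigating the site. Because they serve different audiences, they need not list the same URLs: an HTML sitemap typically links to the main navigational pages, while an XML sitemap can enumerate every indexable URL.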
