Web Scraping & Crawling
What is a user agent?
The HTTP header a client sends to identify itself to a server, typically containing the browser name, version, and operating system.
Also known as: UA, User-Agent string
Every HTTP request carries a User-Agent header. A real Chrome browser sends something like Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36. Servers use that string to log analytics, vary content (mobile vs desktop), gate features, and (more often than they admit) decide whether to serve you at all.
For crawlers, the User-Agent is both a courtesy and a target. A polite bot identifies itself honestly: Googlebot/2.1 (+http://www.google.com/bot.html). An anti-bot stack, conversely, looks at the UA as one of dozens of signals; sending an obviously generic Python UA is a fast way to get blocked. Scrapers commonly rotate through a pool of recent browser UAs to blend in.
On the modern web, the UA string is being phased out for client identification in favor of User-Agent Client Hints (Sec-CH-UA headers), which let sites request only the bits they need. Most servers still parse the legacy UA, so it is not going anywhere fast.
In the wild
- →A sitemap generator setting
User-Agent: MyCrawler/1.0 (admin@example.com)so site owners can contact it - →A scraper rotating between 50 real Chrome and Firefox UAs to avoid pattern-matching blocks
- →A server returning a stripped-down HTML page when the UA matches a known bot to save bandwidth
How Brand.dev uses user agent
Endpoints in the Brand.dev API where this concept comes up directly.
FAQ
What user agent should my crawler send?
A descriptive one: name, version, and a contact URL or email. Hiding behind a fake browser UA is acceptable when targets block all non-browser UAs by default; it stops being acceptable when used to evade an explicit block.
Can servers trust the User-Agent header?
No. It is client-controlled and trivially spoofable. Use it for analytics and content negotiation, never for security decisions.
What are User-Agent Client Hints?
A Chrome-led replacement that splits the UA into multiple opt-in headers (Sec-CH-UA, Sec-CH-UA-Platform, Sec-CH-UA-Mobile). Servers request specific bits via the Accept-CH header rather than parsing one giant string.
Related terms
The application protocol the web is built on, a simple request/response format for asking a server for a resource.
A program that systematically follows links between web pages to discover and index content at scale.
Programmatically extracting structured data from websites that were designed to be read by humans.
A plain-text file at the root of a domain that tells crawlers which paths they are allowed (or not allowed) to fetch.
A server-side policy that caps how many requests a client can make in a given window, returning 429 Too Many Requests when the cap is exceeded.