Web Scraping & Crawling
What is a CAPTCHA?
A challenge-response test designed to distinguish humans from bots, usually presented as image, audio, or behavioral puzzles.
Also known as: reCAPTCHA, human verification
CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart." The term was coined in the early 2000s, when it meant distorted-text puzzles. Today the field is dominated by Google's reCAPTCHA v3 and Cloudflare Turnstile, which usually show no puzzle at all: they run silently in the background and score user behavior instead.
CAPTCHAs exist to stop automated abuse: scraping, credential stuffing, fake account creation, comment spam. They work by exploiting tasks that are still hard for bots (image classification, mouse-movement patterns, browser-fingerprint consistency) even after a decade of rapid ML progress.
For scrapers, CAPTCHAs are an obstacle. The honest workaround is to use the site's public API or buy data through a partnership. The technical workarounds (captcha-solving services, residential proxies, fingerprint randomization) work but get more expensive every year as detection tightens.
In the wild
- hCaptcha image-grid challenges ("select all squares with a bus")
- reCAPTCHA v3 returning a 0–1 risk score with no user interaction
- Cloudflare Turnstile running invisibly until risk exceeds a threshold
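To make the score-based model concrete, here is a minimal sketch of the server side of reCAPTCHA v3: the backend forwards the client-supplied token to Google's documented `siteverify` endpoint and reads the 0–1 score out of the JSON verdict. The `looks_human` helper and the 0.5 threshold are illustrative assumptions; the endpoint URL and the `success`/`score` response fields are Google's documented API.

```python
# Sketch of server-side reCAPTCHA v3 verification (stdlib only).
# The 0.5 threshold and helper names are placeholder assumptions.
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

def verify_token(secret: str, token: str) -> dict:
    """POST the client-supplied token to Google and return the JSON verdict."""
    data = urllib.parse.urlencode({"secret": secret, "response": token}).encode()
    with urllib.request.urlopen(VERIFY_URL, data=data) as resp:
        return json.load(resp)

def looks_human(verdict: dict, threshold: float = 0.5) -> bool:
    """reCAPTCHA v3 returns a 0-1 score; higher means more human-like."""
    return bool(verdict.get("success")) and verdict.get("score", 0.0) >= threshold
```

In practice sites tune the threshold per action: a login form might demand 0.7 while a newsletter signup accepts 0.3.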
How Brand.dev uses CAPTCHA
Endpoints in the Brand.dev API where this concept comes up directly.
FAQ
Why am I getting CAPTCHAs on a site I use normally?
Usually because something on your network (a VPN, a shared IP, an outdated browser) bumped your risk score. Switching networks or browsers typically clears it.
Are CAPTCHA solvers legal?
Selling CAPTCHA-solving services is generally legal. Using them to access a site in violation of its terms of service is a contract issue, and may be a CFAA issue depending on jurisdiction.
How do I avoid CAPTCHAs while scraping?
Slow down, rotate residential IPs, and use a stable real-browser fingerprint. The cheapest fix is usually to scrape during off-peak hours from clean IPs.
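The "slow down" advice above can be sketched as a delay-with-jitter loop: a fixed base wait plus a random component, so request timing doesn't look clockwork-regular. `BASE_DELAY` and the `fetch_all` helper are illustrative assumptions, not a specific site's requirements.

```python
# Minimal polite-scraping sketch: base delay plus random jitter between requests.
# BASE_DELAY is a placeholder; tune it per site and per robots.txt guidance.
import random
import time
import urllib.request

BASE_DELAY = 2.0  # seconds between requests

def polite_delay(base: float = BASE_DELAY) -> float:
    """Sleep for base seconds plus up to base seconds of jitter; return the wait."""
    wait = base + random.uniform(0.0, base)
    time.sleep(wait)
    return wait

def fetch_all(urls):
    """Fetch each URL sequentially, pausing politely between requests."""
    pages = []
    for url in urls:
        with urllib.request.urlopen(url) as resp:
            pages.append(resp.read())
        polite_delay()
    return pages
```

Jitter matters more than the absolute delay: perfectly even 2.0-second intervals are themselves a bot signal.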
Related terms
- Web scraping: Programmatically extracting structured data from websites that were designed to be read by humans.
- Proxy: A server that forwards your network requests, presenting its own IP address to the destination instead of yours.
- Rate limiting: A server-side policy that caps how many requests a client can make in a given window, returning 429 Too Many Requests when the cap is exceeded.
- Residential proxy: A proxy that routes your traffic through an IP address assigned by a consumer ISP, making your requests look like ordinary home users.