Web Scraping & Crawling
What is a CAPTCHA?
A challenge-response test designed to distinguish humans from bots, usually presented as image, audio, or behavioral puzzles.
Also known as: reCAPTCHA, human verification
CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart." The term was coined in the early 2000s, when it meant distorted-text puzzles. Today the field is dominated by Google's reCAPTCHA v3 and Cloudflare Turnstile, which usually show no puzzle at all: they run silently in the background and score user behavior instead.
CAPTCHAs exist to stop automated abuse: scraping, credential stuffing, fake account creation, comment spam. They work by exploiting tasks that are still hard for bots (image classification, mouse-movement patterns, browser-fingerprint consistency) even after a decade of rapid ML progress.
For scrapers, CAPTCHAs are an obstacle. The honest workaround is to use the site's public API or buy data through a partnership. The technical workarounds (captcha-solving services, residential proxies, fingerprint randomization) work but get more expensive every year as detection tightens.
In the wild
- hCaptcha image-grid challenges ("select all squares with a bus")
- reCAPTCHA v3 returning a 0–1 risk score with no user interaction
- Cloudflare Turnstile running invisibly until risk exceeds a threshold
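To make the score-based model concrete, here is a minimal sketch of the server side of reCAPTCHA v3: the backend forwards the client-supplied token to Google's documented `siteverify` endpoint and reads the 0–1 score out of the JSON verdict. The `looks_human` helper and the 0.5 threshold are illustrative assumptions; the endpoint URL and the `success`/`score` response fields are Google's documented API.

```python
# Sketch of server-side reCAPTCHA v3 verification (stdlib only).
# The 0.5 threshold and helper names are placeholder assumptions.
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

def verify_token(secret: str, token: str) -> dict:
    """POST the client-supplied token to Google and return the JSON verdict."""
    data = urllib.parse.urlencode({"secret": secret, "response": token}).encode()
    with urllib.request.urlopen(VERIFY_URL, data=data) as resp:
        return json.load(resp)

def looks_human(verdict: dict, threshold: float = 0.5) -> bool:
    """reCAPTCHA v3 returns a 0-1 score; higher means more human-like."""
    return bool(verdict.get("success")) and verdict.get("score", 0.0) >= threshold
```

In practice sites tune the threshold per action: a login form might demand 0.7 while a newsletter signup accepts 0.3.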
How Brand.dev uses CAPTCHA
Endpoints in the Brand.dev API where this concept comes up directly.
FAQ
Why am I getting CAPTCHAs on a site I use normally?
Usually because something on your network (a VPN, a shared IP, an outdated browser) bumped your risk score. Switching networks or browsers typically clears it.
Are CAPTCHA solvers legal?
Selling CAPTCHA-solving services is generally legal. Using them to access a site in violation of its terms of service is a contract issue, and may be a CFAA issue depending on jurisdiction.
How do I avoid CAPTCHAs while scraping?
Slow down, rotate residential IPs, and use a stable real-browser fingerprint. The cheapest fix is usually to scrape during off-peak hours from clean IPs.
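The "slow down" advice above can be sketched as a delay-with-jitter loop: a fixed base wait plus a random component, so request timing doesn't look clockwork-regular. `BASE_DELAY` and the `fetch_all` helper are illustrative assumptions, not a specific site's requirements.

```python
# Minimal polite-scraping sketch: base delay plus random jitter between requests.
# BASE_DELAY is a placeholder; tune it per site and per robots.txt guidance.
import random
import time
import urllib.request

BASE_DELAY = 2.0  # seconds between requests

def polite_delay(base: float = BASE_DELAY) -> float:
    """Sleep for base seconds plus up to base seconds of jitter; return the wait."""
    wait = base + random.uniform(0.0, base)
    time.sleep(wait)
    return wait

def fetch_all(urls):
    """Fetch each URL sequentially, pausing politely between requests."""
    pages = []
    for url in urls:
        with urllib.request.urlopen(url) as resp:
            pages.append(resp.read())
        polite_delay()
    return pages
```

Jitter matters more than the absolute delay: perfectly even 2.0-second intervals are themselves a bot signal.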
Related terms
- Web scraping: Programmatically extracting structured data from websites that were designed to be read by humans.
- Proxy: A server that forwards your network requests, presenting its own IP address to the destination instead of yours.
- Rate limiting: A server-side policy that caps how many requests a client can make in a given window, returning 429 Too Many Requests when the cap is exceeded.
- Residential proxy: A proxy that routes your traffic through an IP address assigned by a consumer ISP, making your requests look like ordinary home users.