Web Scraping & Crawling

What is Selenium?

An open-source framework that drives real web browsers programmatically, originally built for end-to-end testing and now widely used for scraping JavaScript-heavy sites.

Also known as: Selenium WebDriver

Selenium exposes a single API, built on the W3C WebDriver protocol, that controls Chrome, Firefox, Edge, and Safari. Your script tells the browser to open a URL, click an element, fill in a form, or read the rendered DOM. Selenium sends each call to a browser-specific driver (chromedriver, geckodriver, and so on), which translates it into the browser's native automation channel. Because it drives an actual browser, every page behaves the same way it would for a human visitor: JavaScript executes, XHR and fetch requests fire, and cookies persist.
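Here is a minimal sketch of those four actions with the Python bindings. It assumes a recent Selenium 4 release, which locates a matching chromedriver through Selenium Manager, and a local Chrome install. The function name and the CSS selectors are hypothetical placeholders for whatever page you target. The explicit wait after the click stops the read from racing the page update.

```python
def search_and_read(url: str, query: str, timeout: float = 10.0) -> str:
    """Open url, submit query through a search form, and return the results text.

    The CSS selectors used below are hypothetical placeholders for the target page.
    """
    # Imported inside the function so this file loads without Selenium installed;
    # in a normal script these imports would sit at the top of the module.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)  # open a URL
        box = driver.find_element(By.CSS_SELECTOR, "input[name='q']")  # hypothetical
        box.send_keys(query)  # fill in a form field
        driver.find_element(By.CSS_SELECTOR, "button[type='submit']").click()  # click
        # Block (up to `timeout` seconds) until the results container is visible.
        results = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, "#results"))  # hypothetical
        )
        return results.text  # read text from the rendered DOM
    finally:
        driver.quit()  # shut the browser down even if a step above failed
```

The try/finally matters in scrapers. Each Chrome instance holds real memory, and a script that crashes without calling quit() leaves browser processes behind.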

For scraping, that fidelity is the main draw. A static HTTP fetch of a React or Angular page returns a near-empty shell; the data only arrives after the JavaScript bundle runs. Selenium lets that bundle execute, and once you have waited for the content you need (typically with an explicit wait on a specific element), it hands you the fully hydrated DOM. The cost is speed: a Selenium session is one to two orders of magnitude slower than a raw HTTP request, and you pay for the memory of a full browser per worker.
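To spot that near-empty shell, inspect the raw HTML a plain request returns. The sketch below is a rough, hypothetical heuristic, not a Selenium feature, and its 200-character threshold is an arbitrary assumption. It uses Python's standard-library html.parser to count text outside script and style tags. Pages that ship scripts but almost no visible text are flagged as likely needing a real browser.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Counts characters of text that sit outside <script> and <style> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script>/<style>
        self.visible_chars = 0
        self.script_tags = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.visible_chars += len(data.strip())


def looks_client_rendered(html: str, min_text: int = 200) -> bool:
    """Hypothetical heuristic: scripts present but little visible text suggests an SPA shell."""
    counter = VisibleTextCounter()
    counter.feed(html)
    counter.close()
    return counter.script_tags > 0 and counter.visible_chars < min_text
```

Suppose looks_client_rendered returns True for the HTML a plain request gets back. That hints the data arrives via JavaScript, so a browser-driven tool is worth its cost. A False result suggests a plain HTTP client may be enough.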

Selenium is older than Puppeteer or Playwright and supports more languages (Python, Java, Ruby, C#, JavaScript), which is why it remains popular in QA shops and with any team on a non-Node stack. For greenfield scraping work many engineers now reach for Playwright first, but Selenium's ecosystem of bindings and plugins keeps it relevant.

In the wild

→Driving Chrome to log in to a partner portal and download a daily report
→Running an end-to-end test suite in CI against a staging environment
→Scraping a single-page app whose data only appears after a few hundred milliseconds of XHR

How Brand.dev uses Selenium

Endpoints in the Brand.dev API where this concept comes up directly.

→Web Scrape HTML API
→Markdown Scrape API
→Website Crawler API

FAQ

Selenium or Playwright?

Playwright if you are starting fresh and live in JavaScript or Python: it generally runs faster, auto-waits for elements before acting on them, and ships better debugging tools, such as its trace viewer. Selenium if your stack is Java, C#, or Ruby, or you have an existing test suite to extend.

Is Selenium good for scraping?

It works, but it is heavy. Reach for it when the target site genuinely requires JavaScript execution. For static or server-rendered pages, a plain HTTP client plus an HTML parser is faster and cheaper.
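For comparison, here is a minimal sketch of that lighter approach using only Python's standard library: urllib.request as the HTTP client and html.parser as the parser. Extracting links is just an illustrative target. The User-Agent string is a made-up placeholder.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collects absolute href values from <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without an href
                self.links.append(urljoin(self.base_url, href))


def extract_links(html: str, base_url: str) -> list[str]:
    """Parse already-fetched HTML and return its links, resolved against base_url."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch_links(url: str, timeout: float = 10.0) -> list[str]:
    """One HTTP GET, no browser: only works when links are in the server-rendered HTML."""
    req = Request(url, headers={"User-Agent": "example-scraper/0.1"})  # placeholder UA
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return extract_links(html, url)
```

Fetching and parsing are kept in separate functions so the parsing step can be tested against a fixed HTML string without touching the network.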

How do I avoid getting blocked using Selenium?

Use undetected-chromedriver or stealth plugins to mask the automation signals that Cloudflare and DataDome look for (such as the navigator.webdriver flag), rotate residential IPs, and throttle aggressively. Even then, expect breakage: anti-bot vendors update their detection faster than open-source patches can keep up.
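The throttling part is the easiest to get right in your own code. Here is a minimal sketch for a single worker. Throttle is a hypothetical helper that enforces a minimum gap plus random jitter between requests; call wait() before each driver.get(). The interval values are placeholders, not recommended settings for any site. The helper does nothing about IP rotation or browser fingerprinting.

```python
import random
import time
from typing import Callable, Optional


class Throttle:
    """Enforces a minimum, randomly jittered gap between consecutive requests."""

    def __init__(
        self,
        min_interval: float,
        jitter: float = 0.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval  # seconds that must pass between requests
        self.jitter = jitter  # extra random delay in [0, jitter] seconds
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Block until the next request may go out; return how long we slept."""
        gap = self.min_interval + random.uniform(0, self.jitter)
        slept = 0.0
        if self._last is not None:
            remaining = self._last + gap - self._clock()
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept
```

Injecting clock and sleep keeps the helper testable. In a scraper you would construct something like Throttle(3.0, jitter=2.0) once and call wait() before every page load, so request timing is neither fast nor perfectly regular.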

Related terms

Playwright

A Microsoft-maintained library for driving Chromium, Firefox, and WebKit, headless or headed, through a unified API.

Headless Browser

A real browser engine running without a visible UI, controlled programmatically through an automation API.

Web Scraping

Programmatically extracting structured data from websites that were designed to be read by humans.

DOM

The Document Object Model, a tree of objects that represents an HTML document in memory and lets JavaScript manipulate it.

Web Crawler

A program that systematically follows links between web pages to discover and index content at scale.
