Web Scraping & Crawling

What is a CSS selector?

A pattern that identifies elements in an HTML document by tag, class, id, attribute, or position, used by stylesheets and (heavily) by web scrapers.

CSS selectors started as the targeting mechanism for stylesheets: .price styles every element with class "price". The same syntax doubles as the de facto query language for the DOM, exposed through document.querySelectorAll in browsers and through every major scraping library (BeautifulSoup, Cheerio, Scrapy, Playwright, Puppeteer).

Selectors compose well. article.featured > h2 a[href^="/blog"] reads as: an a tag whose href starts with /blog, anywhere inside an h2 that is a direct child of an article with class featured. Combinators (a space for descendant, > child, + adjacent sibling, ~ general sibling) and attribute selectors ([type="email"], [data-id*="user"]) cover most extraction patterns.
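Browsers generally match a selector right to left: they find candidate a elements first, then check each one's ancestors. The toy matcher below walks through that reading of article.featured > h2 a[href^="/blog"] using only Python's standard-library html.parser. It is an illustrative sketch, not how BeautifulSoup, lxml, or any browser actually implements selectors. The selector is hand-compiled into predicates instead of parsed, only the descendant and child combinators are handled, and the markup is assumed to be well nested. Node, TreeBuilder, SELECTOR, matches, and select are hypothetical names.

```python
from html.parser import HTMLParser


class Node:
    """A bare-bones element: tag name, attributes, parent and children."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a Node tree. Assumes well-nested markup; no error recovery."""

    VOID = {"area", "br", "hr", "img", "input", "link", "meta"}

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:  # void elements never get children
            self.current = node

    def handle_endtag(self, tag):
        if self.current.tag == tag and self.current.parent is not None:
            self.current = self.current.parent


# article.featured > h2 a[href^="/blog"], hand-compiled right to left.
# Each entry: (test for this compound, combinator linking it to the next
# compound on its left; None for the left-most one).
SELECTOR = [
    (lambda n: n.tag == "a" and (n.attrs.get("href") or "").startswith("/blog"), " "),
    (lambda n: n.tag == "h2", ">"),
    (lambda n: n.tag == "article" and "featured" in (n.attrs.get("class") or "").split(), None),
]


def matches(node, parts, i=0):
    """True if `node` satisfies parts[i] and every compound to its left."""
    test, combinator = parts[i]
    if not test(node):
        return False
    if combinator is None:
        return True
    if combinator == ">":  # child: only the direct parent may match
        return node.parent is not None and matches(node.parent, parts, i + 1)
    ancestor = node.parent  # descendant: any ancestor may match
    while ancestor is not None:
        if matches(ancestor, parts, i + 1):
            return True
        ancestor = ancestor.parent
    return False


def select(root, parts):
    """Return every matching node in document order (pre-order walk)."""
    found, stack = [], list(reversed(root.children))
    while stack:
        node = stack.pop()
        if matches(node, parts):
            found.append(node)
        stack.extend(reversed(node.children))
    return found


HTML = """
<article class="featured">
  <h2><a href="/blog/one">One</a></h2>
  <div><h2><a href="/blog/two">Two</a></h2></div>
</article>
<article><h2><a href="/blog/three">Three</a></h2></article>
<article class="featured"><h2><a href="/about">About</a></h2></article>
"""

builder = TreeBuilder()
builder.feed(HTML)
print([n.attrs["href"] for n in select(builder.root, SELECTOR)])  # ['/blog/one']
```

Only /blog/one survives: /blog/two's h2 sits inside a div rather than directly under the article, /blog/three's article lacks the featured class, and /about fails the href prefix test. In this sketch the descendant combinator has to loop over every ancestor, while > checks just the parent.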

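Where CSS falls short is positional logic and text matching. There is no "element whose text contains X" in CSS, and :nth-child counts an element's position among all of its siblings, not among the elements that matched the rest of the selector: li.sale:nth-child(2) means "the second child, if it also has class sale", not "the second .sale item". Scrapers that need those reach for XPath or post-filter the matches in code.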
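Post-filtering works in two steps: a selector narrows the page to candidate elements, then ordinary code applies the text test. In BeautifulSoup the first step would be soup.select("button"). To stay self-contained, this sketch instead uses Python's standard-library html.parser to collect each button's text; TextCollector is a hypothetical helper. For comparison, the XPath 1.0 form of the same query is //button[contains(., "Add to cart")], which is case-sensitive, whereas the Python filter below lowercases first.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the normalized text of every element with a given tag.

    Stands in for a CSS query such as `button`; assumes the target tag
    is never nested inside itself.
    """

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.texts = []   # one entry per matched element, in document order
        self._buf = None  # text chunks while inside a matched element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._buf is not None:
            # Join text from nested tags (e.g. <b>) and collapse whitespace.
            self.texts.append(" ".join("".join(self._buf).split()))
            self._buf = None


HTML = """
<button>Save for later</button>
<button>Add to <b>cart</b></button>
<button>Checkout</button>
"""

collector = TextCollector("button")
collector.feed(HTML)

# Step 1 (the selector's job): every <button>. Step 2 (code's job): text contains X.
hits = [t for t in collector.texts if "add to cart" in t.lower()]
print(hits)  # ['Add to cart']
```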

In the wild

→h1.product-title to grab the headline of a product page
→meta[property="og:image"] to pull the Open Graph image URL out of <head>
→a.btn-primary[href*="checkout"] to find a checkout CTA reliably across page variants
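For the og:image bullet, BeautifulSoup users would typically write soup.select_one('meta[property="og:image"]')["content"]. The sketch below gets the same value with only Python's standard library so it runs without dependencies. OgImageFinder is a hypothetical name and the example.com URL is placeholder data.

```python
from html.parser import HTMLParser


class OgImageFinder(HTMLParser):
    """Stdlib stand-in for meta[property="og:image"]; keeps the first match's content."""

    def __init__(self):
        super().__init__()
        self.url = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Same three checks the selector encodes: tag name, attribute value, first hit.
        if self.url is None and tag == "meta" and attrs.get("property") == "og:image":
            self.url = attrs.get("content")


HEAD = """
<head>
  <meta property="og:title" content="Widget">
  <meta property="og:image" content="https://example.com/widget.png" />
</head>
"""

finder = OgImageFinder()
finder.feed(HEAD)
print(finder.url)  # https://example.com/widget.png
```

The hand-written parser has to spell out the tag, the attribute test, and which attribute to read; a selector engine folds the first two into one short string.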

How Brand.dev uses CSS selectors

Endpoints in the Brand.dev API where this concept comes up directly.

→Web Scrape HTML API
→Markdown Scrape API
→Image Extractor API

FAQ

Are CSS selectors faster than XPath?

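In browsers, generally yes: querySelectorAll is highly optimized. Server-side it depends on the library. lxml and Scrapy (through parsel) translate CSS selectors into XPath with the cssselect package before running them, so both end up on the same engine, and BeautifulSoup has no XPath support at all, so there is nothing to compare.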

How specific should my selectors be?

Specific enough to disambiguate, loose enough to survive minor markup changes. Anchor on stable attributes (data-test, role, aria-label) rather than auto-generated class names from CSS-in-JS.

Can I use pseudo-classes when scraping?

Some, like :nth-of-type or :first-child, are supported by most parsers. Browser-only pseudo-classes (:hover, :focus) are meaningless server-side.

Related terms

XPath

A query language for selecting nodes in an XML or HTML document using path expressions, widely used by scrapers when CSS selectors are not expressive enough.

DOM

The Document Object Model, a tree of objects that represents an HTML document in memory and lets JavaScript manipulate it.

CSS

Cascading Style Sheets, the language browsers use to style HTML: colors, typography, layout, animation, and responsive behavior.

Web Scraping

Programmatically extracting structured data from websites that were designed to be read by humans.

BeautifulSoup

A Python library for parsing HTML and XML and extracting data from it using a friendly, forgiving API.
