What is a CSS selector?

A pattern that identifies elements in an HTML document by tag, class, id, attribute, or position, used by stylesheets and (heavily) by web scrapers.

CSS selectors started as the targeting mechanism for stylesheets: .price styles every element with class "price". The same syntax doubles as the de facto query language for the DOM, exposed through document.querySelectorAll in browsers and through every major scraping library (BeautifulSoup, Cheerio, Scrapy, Playwright, Puppeteer).

Selectors compose well. article.featured > h2 a[href^="/blog"] reads, right to left, as: an a element whose href starts with /blog, nested anywhere inside an h2 that is itself a direct child of an article with class featured. Combinators (a space for descendant, > child, + adjacent sibling, ~ general sibling) and attribute selectors ([type="email"], [data-id*="user"]) cover most extraction patterns.
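To make the reading above concrete, here is a minimal sketch using BeautifulSoup (assumed to be installed as bs4; the HTML snippet and variable names are made up for illustration). It runs the selector against markup with one matching link and three near misses.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the first link satisfies every part of the selector.
HTML = """
<article class="featured">
  <h2><span><a href="/blog/selectors">Selectors</a></span></h2>
  <div><h2><a href="/blog/nested">Nested</a></h2></div>
  <h2><a href="/docs/intro">Docs</a></h2>
</article>
<article>
  <h2><a href="/blog/plain">Plain</a></h2>
</article>
"""

soup = BeautifulSoup(HTML, "html.parser")

# ">" requires the h2 to be a direct child of article.featured;
# the space before "a" allows the link to sit at any depth inside the h2.
links = soup.select('article.featured > h2 a[href^="/blog"]')
hrefs = [a["href"] for a in links]

print(hrefs)  # ['/blog/selectors']
```

The /blog/nested link is skipped because its h2 is wrapped in a div, the /docs/intro link fails the href prefix test, and the /blog/plain link sits in an article without the featured class.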

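Where CSS falls short is text matching and positional logic. Standard CSS has no "element whose text contains X" selector (some libraries add non-standard extensions for it), and :nth-child counts an element's position among all of its siblings rather than among the elements the rest of the selector matched. Scrapers that need either reach for XPath or post-filter the matches in code.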
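The post-filtering approach can be sketched as follows, again assuming BeautifulSoup (bs4) and an invented HTML fragment. The idea is to select broadly with CSS, then apply the text and position logic in ordinary Python.

```python
from bs4 import BeautifulSoup

# Hypothetical list: four .item rows with an unrelated .note row in the middle.
HTML = """
<ul>
  <li class="item">Apple</li>
  <li class="item">Banana</li>
  <li class="note">Prices include tax</li>
  <li class="item">Cherry (on sale)</li>
  <li class="item">Date (on sale)</li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")
items = soup.select("li.item")

# Text matching: CSS picks the candidates, Python checks their text.
on_sale = [li.get_text(strip=True) for li in items if "on sale" in li.get_text()]

# Positional logic: the third child of <ul> is the .note row,
# so li.item:nth-child(3) matches nothing.
third_child = soup.select("li.item:nth-child(3)")

# Indexing the already-filtered list gives the third .item instead.
third_item = items[2].get_text(strip=True)

print(on_sale)      # ['Cherry (on sale)', 'Date (on sale)']
print(third_child)  # []
print(third_item)   # Cherry (on sale)
```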

In the wild

  • h1.product-title to grab the headline of a product page
  • meta[property="og:image"] to pull the Open Graph image URL out of <head>
  • a.btn-primary[href*="checkout"] to find a checkout CTA reliably across page variants
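As a rough illustration of the three selectors above, the sketch below runs them against a made-up product page with BeautifulSoup (assumed installed as bs4). The page content, URLs, and variable names are hypothetical; a real page would be fetched first and may not contain every element, hence the None checks.

```python
from bs4 import BeautifulSoup

# Hypothetical product page used only for illustration.
HTML = """
<html>
  <head>
    <meta property="og:image" content="https://example.com/img/widget.png">
  </head>
  <body>
    <h1 class="product-title">Example Widget</h1>
    <a class="btn-secondary" href="/cart">View cart</a>
    <a class="btn-primary" href="/checkout?step=1">Buy now</a>
  </body>
</html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# select_one returns the first match or None, so guard before reading from it.
title_el = soup.select_one("h1.product-title")
title = title_el.get_text(strip=True) if title_el else None

og_el = soup.select_one('meta[property="og:image"]')
og_image = og_el.get("content") if og_el else None

cta_el = soup.select_one('a.btn-primary[href*="checkout"]')
checkout_url = cta_el.get("href") if cta_el else None

print(title)         # Example Widget
print(og_image)      # https://example.com/img/widget.png
print(checkout_url)  # /checkout?step=1
```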

FAQ

Are CSS selectors faster than XPath?

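In browsers, usually: querySelectorAll is highly optimized. In server-side parsers the gap is narrower and often disappears. lxml and Scrapy, for example, translate CSS selectors to XPath before running them, so the two perform about the same there, and the choice comes down to readability and which features you need.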

How specific should my selectors be?

Specific enough to disambiguate, loose enough to survive minor markup changes. Anchor on stable attributes (data-test, role, aria-label) rather than auto-generated class names from CSS-in-JS.
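A small sketch of why this matters, assuming BeautifulSoup (bs4). The two HTML strings stand in for the same button before and after a redeploy that regenerates its CSS-in-JS class names; the class names, the data-test value, and the count_matches helper are all invented for the example.

```python
from bs4 import BeautifulSoup

# Same component, two builds: generated class names differ, data-test does not.
BEFORE = '<div class="css-1x2y3z"><button class="css-9k8j7h" data-test="add-to-cart">Add</button></div>'
AFTER = '<div class="css-4a5b6c"><button class="css-0q1w2e" data-test="add-to-cart">Add</button></div>'

BRITTLE = "div.css-1x2y3z > button.css-9k8j7h"  # tied to generated classes
STABLE = 'button[data-test="add-to-cart"]'      # tied to a stable attribute


def count_matches(html: str, selector: str) -> int:
    """Return how many elements in html match the CSS selector."""
    return len(BeautifulSoup(html, "html.parser").select(selector))


brittle_counts = (count_matches(BEFORE, BRITTLE), count_matches(AFTER, BRITTLE))
stable_counts = (count_matches(BEFORE, STABLE), count_matches(AFTER, STABLE))

print(brittle_counts)  # (1, 0)
print(stable_counts)   # (1, 1)
```

The brittle selector silently stops matching after the rebuild, while the attribute-based one keeps working.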

Can I use pseudo-classes when scraping?

Some. Structural pseudo-classes such as :nth-of-type or :first-child are supported by most parsers. State pseudo-classes (:hover, :focus) depend on live user interaction, so they are meaningless against static HTML parsed server-side.
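For instance, the two structural pseudo-classes named above can be used to pull columns out of a table. This is a minimal sketch assuming BeautifulSoup (bs4) and a made-up pricing table; other parsers may differ in which pseudo-classes they accept.

```python
from bs4 import BeautifulSoup

# Hypothetical pricing table.
HTML = """
<table>
  <tr><th>Plan</th><th>Price</th></tr>
  <tr><td>Free</td><td>$0</td></tr>
  <tr><td>Pro</td><td>$49</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# :first-child keeps each td that is the first element in its row.
plans = [td.get_text() for td in soup.select("td:first-child")]

# :nth-of-type(2) keeps the second td within each row.
prices = [td.get_text() for td in soup.select("td:nth-of-type(2)")]

print(plans)   # ['Free', 'Pro']
print(prices)  # ['$0', '$49']
```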
