
What is a regular expression?

A pattern language for matching, searching, and extracting substrings from text, used everywhere from code editors to log parsing to data validation.

Also known as: regex, regexp

A regular expression describes a set of strings using a compact syntax: \d+ matches one or more digits, [A-Za-z]+ matches a run of ASCII letters, and ^https?:// matches an http:// or https:// prefix at the start of a string. Most languages ship with a regex engine in the standard library or a standard package. The core syntax is mostly portable, but the major flavors (Perl-compatible PCRE, ECMAScript, RE2) differ in the details and in which advanced features they support.
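As a minimal sketch, the three patterns above can be tried with Python's built-in re module. The helper names and sample strings are made up for illustration.

```python
import re

def find_numbers(text: str) -> list[str]:
    # \d+ : one or more digits in a row
    return re.findall(r"\d+", text)

def find_words(text: str) -> list[str]:
    # [A-Za-z]+ : a run of ASCII letters
    return re.findall(r"[A-Za-z]+", text)

def starts_with_http(text: str) -> bool:
    # ^https?:// : "http://" or "https://" anchored at the start of the string
    return re.match(r"^https?://", text) is not None

if __name__ == "__main__":
    print(find_numbers("order 66 shipped in 3 boxes"))    # ['66', '3']
    print(find_words("v2 beta-release"))                  # ['v', 'beta', 'release']
    print(starts_with_http("https://example.com"))        # True
    print(starts_with_http("see http://example.com"))     # False
```

The last call returns False because ^ anchors the match to the start of the string; without the anchor the same pattern would find the URL anywhere in the text.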

Regex shines for text processing where the structure is locally regular: parsing log lines, checking that a string is email-shaped, extracting URLs from prose, find-and-replace in code. It is famously poor at parsing genuinely nested grammars (HTML, JSON, programming languages). Classical regular expressions cannot track arbitrary nesting depth, so a real parser is usually shorter to write and stays correct on inputs that trip up a regex.

For data extraction work, regex usually plays a supporting role: clean up boilerplate, normalize whitespace, isolate a phone number from a longer string. The mistake teams make is reaching for regex when a proper parser exists; the bigger mistake is reaching for a parser when one well-written regex would do.
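A rough sketch of that supporting role in Python: one helper collapses whitespace, another isolates a phone-like substring. The PHONE pattern is a hypothetical, deliberately loose shape, not a validator; real phone formats vary by country.

```python
import re
from typing import Optional

def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace to a single space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()

# Assumed shape: optional "+", a digit, then at least six digits or common
# separators (space, parentheses, dot, dash), ending on a digit.
PHONE = re.compile(r"\+?\d[\d\s().-]{6,}\d")

def first_phone(text: str) -> Optional[str]:
    """Return the first phone-like substring, or None if there is none."""
    match = PHONE.search(text)
    return match.group(0) if match else None

if __name__ == "__main__":
    print(normalize_whitespace("  Call   us\n\ttoday "))
    print(first_phone("Call us at +1 (555) 010-9999 before 5pm."))
```

Anything that needs real guarantees (a dialable number, a specific country format) belongs in a dedicated library; the regex only narrows a long string down to a candidate.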

In the wild

  • /[\w.-]+@[\w.-]+\.\w+/g to find email-shaped strings in a page
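  • /^[A-Z]{2}\d{2}[A-Z]{4}\d{14}$/ to check the shape of a UK-format IBAN (other countries use different lengths and layouts, and the check digits still need separate validation)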
  • A log-parsing pipeline using regex to pull out timestamp, level, and request-id fields
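The three uses above could look roughly like this in Python. The email and IBAN patterns are taken from the list; the log format (timestamp, level, request_id=..., message) is an assumed layout for illustration, as are the function names.

```python
import re
from typing import Optional

EMAIL = re.compile(r"[\w.-]+@[\w.-]+\.\w+")

# Shape check only: this layout fits UK-style IBANs and does not verify
# the check digits.
IBAN_SHAPE = re.compile(r"^[A-Z]{2}\d{2}[A-Z]{4}\d{14}$")

# Hypothetical log line: "<timestamp> <LEVEL> request_id=<id> <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+"
    r"request_id=(?P<request_id>\S+)\s+(?P<message>.*)$"
)

def find_emails(text: str) -> list[str]:
    return EMAIL.findall(text)

def looks_like_iban(value: str) -> bool:
    return IBAN_SHAPE.fullmatch(value) is not None

def parse_log_line(line: str) -> Optional[dict]:
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None

if __name__ == "__main__":
    print(find_emails("Contact ana@example.com or bob.smith@mail.example.org."))
    print(looks_like_iban("GB82WEST12345698765432"))
    print(parse_log_line(
        "2024-05-01T12:00:00Z ERROR request_id=abc123 upstream timed out"
    ))
```

Named groups keep the log parser readable: each field is pulled out by name rather than by position, so adding a field later does not shift the others.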

FAQ

When should I NOT use regex?

For nested or recursive structures (HTML, JSON, programming languages), use a real parser. The classic Stack Overflow rant about parsing HTML with regex is right about this.
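As an illustration of the parser route, here is a small sketch that collects link targets with Python's standard html.parser module instead of a regex. The LinkCollector class and extract_links helper are names invented for this example.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag the parser sees."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.links.append(value)

def extract_links(html: str) -> list[str]:
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links

if __name__ == "__main__":
    page = (
        '<!-- <a href="/old">old</a> -->'
        "<div><p>See <a class='x' href='/docs'><b>docs</b></a></p></div>"
    )
    print(extract_links(page))
```

The parser skips the commented-out link and copes with either quote style and nested tags; a naive href regex would need a special case for each of those.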

What is catastrophic backtracking?

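A pathology where a poorly written regex (often with nested quantifiers like (a+)+) takes exponential time on certain inputs, typically ones that almost match. RE2 and Go's regexp package, which follows the same design, avoid this by guaranteeing matching time linear in the input size, at the cost of features such as backreferences and lookaround. PCRE-style backtracking engines offer no such guarantee.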
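A small experiment, assuming Python's backtracking re engine, makes the effect visible. RISKY uses the nested quantifier mentioned above; SAFE is a rewrite that accepts the same strings without nesting. Input sizes are kept small on purpose, and the exact timings will vary by machine.

```python
import re
import time

RISKY = re.compile(r"^(a+)+$")  # nested quantifiers: many ways to split a run of a's
SAFE = re.compile(r"^a+$")      # accepts the same strings, only one way to match

def time_match(pattern: re.Pattern, text: str) -> tuple[bool, float]:
    """Return (matched, seconds taken) for one match attempt."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start

if __name__ == "__main__":
    for n in (8, 12, 16, 20):
        # A run of a's followed by one b: an almost-match that forces the
        # backtracking engine to try every split before giving up.
        text = "a" * n + "b"
        _, risky_seconds = time_match(RISKY, text)
        _, safe_seconds = time_match(SAFE, text)
        print(f"n={n:2d}  risky={risky_seconds:.5f}s  safe={safe_seconds:.5f}s")
```

Each extra character roughly doubles the work for the risky pattern, which is why untrusted input and nested quantifiers are a bad combination on backtracking engines.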

PCRE vs ECMAScript regex?

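PCRE supports lookbehind, recursion, named captures, and a richer syntax overall. ECMAScript regex (in browsers and Node.js) has been catching up: lookbehind and named captures arrived in ES2018, but it still lacks PCRE features such as recursion, atomic groups, and possessive quantifiers. Test on the engine your code actually runs on.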
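One cheap way to act on that advice is to ask the target engine directly whether it accepts a pattern. The sketch below does this for Python's re module, a third flavor used here only for illustration; the CANDIDATES table and the compiles helper are made up for the example, and results may differ between engine versions.

```python
import re

def compiles(pattern: str) -> bool:
    """Report whether this engine accepts the pattern at all."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# The same ideas, spelled the way different flavors write them.
CANDIDATES = {
    "named group, (?P<name>...) spelling": r"(?P<year>\d{4})",
    "named group, (?<name>...) spelling": r"(?<year>\d{4})",
    "lookbehind": r"(?<=\$)\d+",
    "recursion, PCRE (?R)": r"\((?:[^()]|(?R))*\)",
}

if __name__ == "__main__":
    for label, pattern in CANDIDATES.items():
        status = "accepted" if compiles(pattern) else "rejected"
        print(f"{label:40s} {status}")
```

A pattern that compiles is not yet proof that it behaves the same way everywhere, so a few sample inputs run through the real engine are still worth keeping as tests.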
