What is HTML?

HyperText Markup Language, the markup standard that defines the structure and semantics of every web page.

Also known as: HyperText Markup Language

HTML uses a tree of nested elements, written as tags (<html>, <head>, <body>, <h1>, <p>, <a>), to describe what content means: this is a heading, this is a link, this is a navigation block, this is the main article. Browsers parse the markup into that tree (the DOM) and render it as a page, using CSS for visual styling and JavaScript for behavior.
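To make the tree idea concrete, here is a minimal sketch that prints the nesting of elements as an indented outline, using only Python's standard-library html.parser. The OutlineBuilder class and outline function are hypothetical names for illustration. The sketch assumes reasonably well-formed markup; a real browser applies much more elaborate error recovery when it builds the DOM.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not increase the depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class OutlineBuilder(HTMLParser):
    """Records one indented line per start tag (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        if tag not in VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID and self.depth > 0:
            self.depth -= 1


def outline(html: str) -> list[str]:
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    return builder.lines


if __name__ == "__main__":
    page = ("<html><head><title>Acme</title></head>"
            "<body><h1>Hi</h1><p>See <a href='/'>home</a></p></body></html>")
    print("\n".join(outline(page)))
```

Each level of indentation in the output corresponds to one level of nesting in the document tree.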

The current standard is the HTML Living Standard, maintained by WHATWG; the W3C retired its separate HTML5 line in favor of it. Modern HTML adds semantic elements (<article>, <section>, <nav>, <aside>), form input types (email, tel, date), and embedded media (<video>, <audio>), and it works alongside accessibility attributes (role, aria-*).

For crawlers and brand-data tools, HTML is the primary input format. Logos hide in <link rel="icon"> and <meta property="og:image">. Colors leak through inline <style> blocks and class names that map to CSS variables. Company descriptions, addresses, and contact details often live in JSON-LD blocks or microdata. Almost every brand signal you can extract starts with parsing HTML.
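As a rough illustration of that kind of extraction, the sketch below pulls icon links, Open Graph images, and JSON-LD blocks out of a page with Python's standard-library html.parser. BrandSignalParser and extract_brand_signals are hypothetical names, not part of any Brand.dev API, and the sample markup is invented for the example.

```python
import json
from html.parser import HTMLParser


class BrandSignalParser(HTMLParser):
    """Collects icon hrefs, og:image URLs, and parsed JSON-LD (illustrative)."""

    def __init__(self):
        super().__init__()
        self.icons = []
        self.og_images = []
        self.json_ld = []
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "icon" in rel_tokens and attrs.get("href"):
            self.icons.append(attrs["href"])
        elif tag == "meta" and attrs.get("property") == "og:image":
            if attrs.get("content"):
                self.og_images.append(attrs["content"])
        elif tag == "script":
            if (attrs.get("type") or "").lower() == "application/ld+json":
                self._in_json_ld = True
                self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Malformed JSON-LD is skipped rather than raised.


def extract_brand_signals(html: str) -> dict:
    parser = BrandSignalParser()
    parser.feed(html)
    parser.close()
    return {"icons": parser.icons,
            "og_images": parser.og_images,
            "json_ld": parser.json_ld}


if __name__ == "__main__":
    sample = """<html><head>
    <link rel="shortcut icon" href="/favicon.ico">
    <meta property="og:image" content="https://example.com/share.png">
    <script type="application/ld+json">{"@type": "Organization", "name": "Acme"}</script>
    </head><body></body></html>"""
    print(extract_brand_signals(sample))
```

The URLs returned may be relative (as with /favicon.ico here), so a real pipeline would still need to resolve them against the page URL.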

In the wild

  • A scraper reading <meta property="og:image"> to grab the share preview image
  • A semantic HTML page using <article> and <time datetime="..."> so screen readers and search engines parse it correctly
  • An LLM ingesting cleaned HTML (with <script> and <style> removed) as context for question answering
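
The last case, cleaning HTML before handing it to an LLM, can be sketched in a few lines. This is a minimal version that assumes only <script> and <style> contents need to be dropped and that whitespace-joined text is good enough; TextExtractor and clean_html are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps visible text and skips anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def clean_html(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = ("<html><head><style>p{color:red}</style>"
            "<script>var x = 1;</script></head>"
            "<body><h1>Acme</h1><p>We build rockets</p></body></html>")
    print(clean_html(page))
```

This discards all structure; a pipeline that wants headings or links preserved would need to emit markers for those tags instead of plain text.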

FAQ

HTML vs HTML5?

"HTML5" was the marketing name for the major 2014 revision; the spec is now a "living standard" that updates continuously. Practically, when people say HTML5 they mean modern HTML with semantic elements, video/audio, and the Canvas API.

Is HTML a programming language?

No. It is a markup language: it describes structure, not behavior. JavaScript is the programming language that runs alongside it.

How clean is real-world HTML?

Far less clean than the spec requires. Browsers are extremely forgiving (mismatched tags, missing closing tags, inconsistent attribute quoting), so many real pages render correctly only because of that forgiveness. Server-side parsers should therefore use a tolerant HTML library, not a strict XML parser, which rejects the document at the first malformed construct.
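
The difference is easy to see with Python's standard library. In this sketch, the same messy snippet (an unclosed <p>, a bare <br>, an unquoted attribute) is given to the strict xml.etree.ElementTree parser and to the tolerant html.parser tokenizer. The helper names are hypothetical, and html.parser is only one tolerant option: it emits events rather than building a tree, so a third-party library such as html5lib or lxml may fit better when a full tree is needed.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Typical real-world markup: unclosed <p>, bare <br>, unquoted href value.
MESSY = "<p>Unclosed paragraph<br><a href=/about>About us</a>"


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def strict_parse_ok(markup: str) -> bool:
    """Returns True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def tolerant_links(markup: str) -> list[str]:
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.hrefs


if __name__ == "__main__":
    print("strict XML parse succeeded:", strict_parse_ok(MESSY))
    print("links found by tolerant parser:", tolerant_links(MESSY))
```

The strict parser gives up on the whole document, while the tolerant one still recovers the link.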
