What is HTML?
HyperText Markup Language, the markup standard that defines the structure and semantics of every web page.
Also known as: HyperText Markup Language
HTML uses a tree of nested tags (<html>, <head>, <body>, <h1>, <p>, <a>) to describe what content means: this is a heading, this is a link, this is a navigation block, this is the main article. Browsers parse that markup into an in-memory tree (the DOM), then render it as a page using CSS for visual styling and JavaScript for behavior.
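To make the "tree of tags" idea concrete, here is a minimal sketch that prints the nesting of a small document using Python's standard-library html.parser. The names OutlineParser, outline, and SAMPLE are illustrative, and the depth tracking is a simplification: a browser's HTML parser applies many more recovery rules than this.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not increase depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class OutlineParser(HTMLParser):
    """Records each tag and text node, indented by nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        if tag not in VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.lines.append("  " * self.depth + repr(text))


def outline(html):
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.lines


SAMPLE = (
    "<html><head><title>Hi</title></head>"
    '<body><h1>Title</h1><p>See <a href="/x">link</a></p></body></html>'
)

if __name__ == "__main__":
    print("\n".join(outline(SAMPLE)))
```

Each level of indentation in the printed outline corresponds to one level of nesting in the document, which is the same parent/child structure the DOM exposes to CSS selectors and JavaScript.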
The current standard is the HTML Living Standard (maintained by WHATWG; the W3C retired its separate HTML5 line in favor of it in 2019). Modern HTML adds semantic elements (<article>, <section>, <nav>, <aside>), form input types (email, tel, date), embedded media (<video>, <audio>), and support for accessibility attributes (aria-*, role) defined by the companion WAI-ARIA specification.
For crawlers and brand-data tools, HTML is the primary input format. Logos hide in <link rel="icon"> and <meta property="og:image">. Colors leak through inline <style> tags and class names that map to CSS variables. Company descriptions, addresses, and contact details live in JSON-LD blocks or microdata. Almost every brand signal you can extract starts with parsing HTML.
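As a rough illustration of where those signals sit, the sketch below pulls icon links, the og:image value, and JSON-LD blocks out of a page with the standard-library html.parser. BrandSignalParser, extract_brand_signals, and SAMPLE are hypothetical names made up for this example, not part of any Brand.dev API, and a production extractor would also resolve relative URLs and handle many more tag variants.

```python
import json
from html.parser import HTMLParser


class BrandSignalParser(HTMLParser):
    """Collects icon hrefs, the first og:image, and parsed JSON-LD blocks."""

    def __init__(self):
        super().__init__()
        self.icons = []
        self.og_image = None
        self.json_ld = []
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link":
            # rel is a space-separated token list, e.g. "shortcut icon".
            rel_tokens = (a.get("rel") or "").lower().split()
            if "icon" in rel_tokens and a.get("href"):
                self.icons.append(a["href"])
        elif tag == "meta" and a.get("property") == "og:image":
            self.og_image = self.og_image or a.get("content")
        elif tag == "script" and (a.get("type") or "").lower() == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed JSON-LD instead of failing the whole page.


def extract_brand_signals(html):
    parser = BrandSignalParser()
    parser.feed(html)
    parser.close()
    return {"icons": parser.icons, "og_image": parser.og_image,
            "json_ld": parser.json_ld}


SAMPLE = """
<html><head>
<link rel="icon" href="/favicon.ico">
<meta property="og:image" content="https://example.com/share.png">
<script type="application/ld+json">{"@type": "Organization", "name": "Example Co"}</script>
</head><body></body></html>
"""

if __name__ == "__main__":
    print(extract_brand_signals(SAMPLE))
```

The parser is event-based rather than tree-based, which keeps the example dependency-free; a library such as Beautiful Soup would express the same lookups as selector queries.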
In the wild
- A scraper reading <meta property="og:image"> to grab the share preview image
- A semantic HTML page using <article> and <time datetime="..."> so screen readers and search engines parse it correctly
- An LLM ingesting cleaned HTML (with <script> and <style> removed) as context for question answering
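The last case, cleaning a page before handing it to an LLM, can be sketched as follows. This version goes one step further than removing <script> and <style>: it drops all remaining markup and keeps only the visible text, which is a common but not universal choice. TextExtractor and visible_text are illustrative names, and only the standard-library html.parser is assumed.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps text nodes, ignoring anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space, then collapse runs of whitespace.
        # Note: this can split a word that spans inline tags (<b>H</b>ello).
        return " ".join(" ".join(self._chunks).split())


def visible_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = (
        "<html><head><style>p{color:red}</style>"
        '<script>console.log("tracking");</script></head>'
        "<body><h1>Hello</h1><p>World &amp; more</p></body></html>"
    )
    print(visible_text(page))
```

Stripping scripts and styles first matters because they often outweigh the readable content by a wide margin and would otherwise consume context without adding meaning.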
How Brand.dev uses HTML
Endpoints in the Brand.dev API where this concept comes up directly.
FAQ
HTML vs HTML5?
"HTML5" was the marketing name for the major 2014 revision; the spec is now a "living standard" that updates continuously. Practically, when people say HTML5 they mean modern HTML with semantic elements, video/audio, and the Canvas API.
Is HTML a programming language?
No. It is a markup language: it describes structure, not behavior. JavaScript is the programming language that runs alongside it.
How clean is real-world HTML?
Far less clean than the spec would suggest. Browsers are extremely forgiving of mismatched tags, missing closing tags, and inconsistent attribute quoting, so many real pages render correctly only because of that forgiveness. Server-side parsers should use a tolerant HTML library, not a strict XML parser.
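The difference is easy to see with Python's standard library. In this sketch, the same messy fragment (an unclosed <p>, a bare <br>, an unquoted attribute) is rejected by the strict XML parser in xml.etree.ElementTree but handled by the tolerant html.parser. MESSY, strict_parse_ok, LinkCollector, and tolerant_links are names invented for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Typical real-world markup: unclosed <p>, bare <br>, unquoted attribute value.
MESSY = "<p>Unclosed paragraph<br><a href=/pricing>Pricing</a>"


def strict_parse_ok(markup):
    """Returns True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags without requiring valid markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def tolerant_links(markup):
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.hrefs


if __name__ == "__main__":
    print("strict XML parse succeeded:", strict_parse_ok(MESSY))
    print("links found by tolerant parser:", tolerant_links(MESSY))
```

A strict parser fails on the whole document at the first error, so one sloppy tag costs you every signal on the page; a tolerant parser degrades gracefully instead.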
Related terms
Cascading Style Sheets, the language browsers use to style HTML: colors, typography, layout, animation, and responsive behavior.
The Document Object Model, a tree of objects that represents an HTML document in memory and lets JavaScript manipulate it.
Programmatically extracting structured data from websites that were designed to be read by humans.
A Python library for parsing HTML and XML and extracting data from it using a friendly, forgiving API.
An HTML element in the `<head>` that supplies metadata about the page: title, description, viewport, social previews, and robots directives.
A meta-tag protocol Facebook introduced in 2010 that tells social platforms how to render a link preview: title, description, image, and type.