Glossary

The web, defined.

96 plain-English definitions for the crawling, HTTP, API, and brand-identity primitives you run into when shipping production data pipelines. No jargon, no fluff.

Web Scraping & Crawling

How sites get fetched, indexed, and protected.

18

Web Crawler

A program that systematically follows links between web pages to discover and index content at scale.

Web Scraping

Programmatically extracting structured data from websites that were designed to be read by humans.

robots.txt

A plain-text file at the root of a domain that tells crawlers which paths they are allowed (or not allowed) to fetch.
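
As a rough illustration of how a crawler might honor these rules, the sketch below feeds a made-up robots.txt to Python's standard `urllib.robotparser`. The file contents, the `example-bot` user-agent name, and the URLs are all assumptions for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would download it from
# the root of the target domain instead of hard-coding it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "example-bot" is an illustrative user-agent name, not a real crawler.
allowed_public = parser.can_fetch("example-bot", "https://example.com/blog/post")
allowed_private = parser.can_fetch("example-bot", "https://example.com/private/report")

print(allowed_public, allowed_private)
```

The parser only reports what the file says; obeying the answer is up to the crawler.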

Sitemap

An XML file that lists every important URL on a site so search engines and crawlers can discover them efficiently.
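
A minimal sketch of reading URLs out of a sitemap with the standard library, assuming the common `urlset` layout from the sitemaps.org schema. The sample document and the `sitemap_urls` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap body; real sitemaps are fetched over HTTP.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>"""

# Sitemap elements live in this XML namespace, so lookups must include it.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> value found under <url> entries."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


print(sitemap_urls(SITEMAP_XML))
```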

Headless Browser

A real browser engine running without a visible UI, controlled programmatically through an automation API.

Playwright

A Microsoft-maintained library for driving Chromium, Firefox, and WebKit through a unified API, in headless or headed mode.

CAPTCHA

A challenge-response test designed to distinguish humans from bots, usually presented as image, audio, or behavioral puzzles.

Proxy

A server that forwards your network requests, presenting its own IP address to the destination instead of yours.

Residential Proxy

A proxy that routes your traffic through an IP address assigned by a consumer ISP, making your requests look like they come from ordinary home users.

Rate Limiting

A server-side policy that caps how many requests a client can make in a given time window, typically returning 429 Too Many Requests when the cap is exceeded.
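
One simple way to implement such a cap is a fixed-window counter. The sketch below is only an illustration of that one strategy (real servers often use sliding windows or token buckets instead), and the `FixedWindowLimiter` class and its parameters are hypothetical names.

```python
import time


class FixedWindowLimiter:
    """Toy in-memory limiter: at most `limit` requests per client per window."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the example can be tested without sleeping
        self.counts = {}  # client_id -> (window_start, request_count)

    def check(self, client_id: str) -> int:
        """Return the HTTP status this request would get: 200 or 429."""
        now = self.clock()
        start, count = self.counts.get(client_id, (now, 0))
        if now - start >= self.window:
            # The previous window has expired, so start a fresh one.
            start, count = now, 0
        if count >= self.limit:
            self.counts[client_id] = (start, count)
            return 429
        self.counts[client_id] = (start, count + 1)
        return 200


limiter = FixedWindowLimiter(limit=2, window_seconds=60)
print([limiter.check("client-a") for _ in range(3)])
```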

Selenium

An open-source framework that drives real web browsers programmatically, originally built for end-to-end testing and now widely used for scraping JavaScript-heavy sites.

Puppeteer

A Node.js library from the Chrome team that drives Chromium over the DevTools Protocol, used for scraping, screenshots, PDF generation, and headless testing.

XPath

A query language for selecting nodes in an XML or HTML document using path expressions, widely used by scrapers when CSS selectors are not expressive enough.
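
For a feel of what path expressions look like, here is a small sketch using the limited XPath subset built into Python's `xml.etree.ElementTree`. The markup is invented and deliberately well-formed; messy real-world HTML and full XPath 1.0 generally call for a third-party parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for a fetched page.
DOC = """<html><body>
  <ul>
    <li class="item"><a href="/a">First</a></li>
    <li class="item featured"><a href="/b">Second</a></li>
    <li class="ad"><a href="/c">Sponsored</a></li>
  </ul>
</body></html>"""

root = ET.fromstring(DOC)

# Attribute predicate: matches only when class is exactly "item".
item_links = [a.get("href") for a in root.findall(".//li[@class='item']/a")]

# Positional predicate: the second <li> under its parent (1-based).
second_text = root.find(".//li[2]/a").text

print(item_links, second_text)
```

Note that `@class='item'` is an exact string comparison, so the element with `class="item featured"` is not selected.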

CSS Selector

A pattern that identifies elements in an HTML document by tag, class, id, attribute, or position, used by stylesheets and (heavily) by web scrapers.

Scrapy

A Python framework for building large-scale web crawlers, with batteries included for scheduling, retries, deduplication, and data pipelines.

BeautifulSoup

A Python library for parsing HTML and XML and extracting data from it using a friendly, forgiving API.

User Agent

The HTTP header a client sends to identify itself to a server, typically containing the browser name, version, and operating system.

Data Extraction

The process of pulling structured data out of unstructured or semi-structured sources like web pages, PDFs, or emails.

HTTP & Networking

The wire-level protocols that move every page.

19

HTTP

The application protocol the web is built on, a simple request/response format for asking a server for a resource.

HTTPS

HTTP encrypted with TLS: the same protocol, but every byte on the wire is authenticated and protected from eavesdroppers.

SSL

A deprecated cryptographic protocol that secured network traffic before TLS replaced it. The name persists colloquially.

TLS

The cryptographic protocol that encrypts and authenticates network traffic, the security layer under HTTPS, SMTPS, and most modern protocols.

DNS

The Domain Name System, the distributed database that translates human-readable domain names into the IP addresses computers actually route to.

CDN

A Content Delivery Network, a globally distributed network of edge servers that caches your assets and serves them from the location closest to each user.

Domain

The human-readable name that identifies a site on the internet, the part that maps to an IP address through DNS.

Subdomain

A prefix added to a parent domain to identify a separate section, app, or service, like `blog.example.com` or `api.example.com`.

TLD

The top-level domain, the rightmost piece of a domain name, like `.com`, `.org`, `.io`, or `.dev`.

CNAME

A DNS record that aliases one hostname to another, so the resolver follows the chain to whatever IP the target eventually points to.

IP Address

A numeric label assigned to every device on a network that uses the Internet Protocol, used to route packets between hosts.

IPv6

The successor to IPv4, using 128-bit addresses to provide a vastly larger address space and a few protocol simplifications.

TCP

Transmission Control Protocol, a reliable, ordered, connection-oriented transport that sits below HTTP and most other application protocols.

HTTP Status Code

A three-digit number returned by an HTTP server that summarizes the result of a request, grouped into 1xx informational, 2xx success, 3xx redirect, 4xx client error, and 5xx server error.
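
The grouping by first digit can be sketched in a few lines with Python's `http.HTTPStatus`. The `describe` helper and the `CLASSES` table are names made up for this example.

```python
from http import HTTPStatus

# The first digit of the code determines its class.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirect",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return a label such as '404 Not Found (client error)'."""
    category = CLASSES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes that are not in Python's registry still belong to a class.
        phrase = "Unregistered"
    return f"{code} {phrase} ({category})"


for code in (200, 301, 404, 429, 503):
    print(describe(code))
```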

DNS Lookup

The process of translating a hostname (like example.com) into the IP address a client uses to actually connect.

SSL Certificate

A digital document that binds a public key to a domain name, used by browsers to verify that a site is who it claims to be when establishing an HTTPS connection.

WHOIS

A protocol and public lookup service for retrieving the registration record of a domain, IP block, or autonomous system.

FQDN

A Fully Qualified Domain Name, a hostname that specifies its exact location in the DNS hierarchy, including every label up to the root.

CORS

Cross-Origin Resource Sharing, the browser security model that decides whether JavaScript on one origin can read responses from another.

APIs & Authentication

Patterns for calling, and securing, services.

17

API

An Application Programming Interface, a contract that lets one program request actions or data from another in a stable, documented way.

REST API

An API that follows REST conventions, using HTTP methods on resource URLs to model create/read/update/delete operations.

GraphQL

A query language for APIs that lets the client specify exactly the fields it wants from a typed graph of data, returned in one round trip.

Webhook

A user-defined HTTP callback: another system sends a POST request to your URL whenever an event happens on its side, so you don't have to poll for changes.

WebSocket

A protocol for full-duplex, persistent communication between a browser (or other client) and a server over a single long-lived TCP connection.

API Key

A secret string that identifies and authenticates a client when calling an API, usually passed in a header on each request.

OAuth

A protocol that lets users grant a third-party app limited access to their data on another service, without sharing their password.

JWT

A JSON Web Token, a compact, signed piece of JSON used to convey claims (who the user is, what they can do) between systems.
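
To make "compact" and "signed" concrete, the sketch below builds and checks an HS256 token using only the standard library. It is a teaching sketch under simplifying assumptions: it ignores expiry and other registered claims, does not inspect the header's `alg` field, and the function names are invented. Production code would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Return header.payload.signature, signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def verify_hs256(token: str, secret: bytes) -> dict:
    """Check the signature and return the claims; raise ValueError on mismatch."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature)):
        raise ValueError("signature mismatch")
    _header_b64, payload_b64 = signing_input.split(".")
    return json.loads(b64url_decode(payload_b64))


token = sign_hs256({"sub": "user-1", "role": "admin"}, b"demo-secret")
print(verify_hs256(token, b"demo-secret"))
```

The payload is only encoded, not encrypted, so anyone holding the token can read the claims. The signature only proves they were not altered.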

OpenAPI

A specification format for describing REST APIs in a machine-readable schema, used to generate documentation, client SDKs, and server stubs.

Swagger

A toolset (and historical predecessor of OpenAPI) for designing, documenting, and consuming REST APIs through a machine-readable schema.

XML

Extensible Markup Language, a self-describing text format for structured data, predating JSON and still ubiquitous in enterprise systems, sitemaps, and RSS feeds.

YAML

A human-readable data serialization format that uses indentation rather than braces, popular for configuration files (CI pipelines, Kubernetes manifests, OpenAPI specs).

CSV

Comma-Separated Values, a plain-text tabular format where each line is a row and columns are separated by a delimiter (usually a comma).

SAML

Security Assertion Markup Language, an XML-based standard for exchanging authentication and authorization data between an identity provider and a service provider.

SSO

Single Sign-On, a session and user-authentication scheme that lets one login grant access to multiple independent applications.

Idempotent

A property of an operation where running it multiple times produces the same result as running it once, critical for safe retries in distributed systems.
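
One common way to make a non-idempotent action safe to retry is an idempotency key: the caller sends the same key with every retry, and the server replays the stored result instead of acting twice. The toy `PaymentService` below is a hypothetical in-memory sketch of that idea, not any particular API.

```python
class PaymentService:
    """Toy in-memory service; class and method names are illustrative."""

    def __init__(self):
        self.charges = []  # every charge actually performed
        self._seen = {}  # idempotency_key -> receipt already returned

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A repeated key means a retry: return the original receipt unchanged.
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]
        receipt = {"id": len(self.charges) + 1, "amount_cents": amount_cents}
        self.charges.append(receipt)
        self._seen[idempotency_key] = receipt
        return receipt


service = PaymentService()
first = service.charge("order-42", 1999)
retry = service.charge("order-42", 1999)  # e.g. the client timed out and retried
print(first == retry, len(service.charges))
```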

MCP (Model Context Protocol)

An open protocol from Anthropic for connecting LLMs to external tools, data sources, and prompts through a standardized client/server interface.

Web Content & Formats

The structures that data lives in once you fetch it.

13

DOM

The Document Object Model, a tree of objects that represents an HTML document in memory and lets JavaScript manipulate it.

JSON

JavaScript Object Notation, a lightweight text format for representing structured data, supported by virtually every modern language through built-in or standard-library parsers.

Markdown

A lightweight plain-text formatting syntax that converts to HTML, created in 2004 to be readable both as source and as rendered output.

Slug

The human-readable, URL-safe portion of a URL that identifies a specific page, usually a lowercased, hyphenated version of the page title.
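
A minimal sketch of turning a page title into a slug, assuming ASCII-only output is acceptable. The `slugify` helper is a made-up name, and real sites may apply different rules (for example, keeping non-Latin characters).

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Lowercase, strip accents, and join alphanumeric runs with hyphens."""
    # Decompose accented characters, then drop anything outside ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Replace every run of non-alphanumerics with one hyphen and trim the ends.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


print(slugify("Café au Lait: 10 Tips"))
```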

Cookie

A small piece of data a server sends a browser, echoed back on subsequent requests to the same site, the standard mechanism for sessions and tracking.

SVG

Scalable Vector Graphics, an XML-based image format that describes shapes mathematically, so the image is sharp at any resolution.

Favicon

The small icon a browser shows in the tab bar, bookmarks, and history for a given site, an often-overlooked piece of brand identity on the web.

HTML

HyperText Markup Language, the markup standard that defines the structure and semantics of every web page.

CSS

Cascading Style Sheets, the language browsers use to style HTML: colors, typography, layout, animation, and responsive behavior.

Regex

A pattern language for matching, searching, and extracting substrings from text, used everywhere from code editors to log parsing to data validation.

Base64

A binary-to-text encoding that represents arbitrary bytes as ASCII characters using a 64-character alphabet, used to embed binary data in text formats.

UUID

A Universally Unique Identifier, a 128-bit value formatted as 32 hex digits with hyphens, used as a globally unique key without coordination between systems.
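
The shape described above can be checked with Python's standard `uuid` module. The sketch shows a random (version 4) identifier and a name-based (version 5) one; `example.com` is just a placeholder name.

```python
import uuid

# Version 4: generated from random bits, no coordination needed.
random_id = uuid.uuid4()
text = str(random_id)  # 32 hex digits in five hyphen-separated groups

# Version 5: derived from a namespace and a name, so the same input
# always yields the same identifier.
name_id = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")

print(text, len(random_id.bytes) * 8, name_id)
```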

WebP

A modern image format from Google that offers smaller file sizes than JPEG and PNG with comparable quality, supporting both lossy and lossless compression and animation.

SEO & Metadata

Signals search engines and LLMs read off the page.

13

Open Graph

A meta-tag protocol Facebook introduced in 2010 that tells social platforms how to render a link preview: title, description, image, and type.

Meta Tag

An HTML element in the `<head>` that supplies metadata about the page, such as its description, viewport, social previews, and robots directives.

Alt Text

A short text description of an image, set via the `alt` attribute, used by screen readers and search engines to understand what the image shows.

Canonical URL

The "official" URL for a piece of content when multiple URLs could return the same content, declared via `<link rel="canonical" href="…">`.

Schema Markup

Structured data added to a page using the schema.org vocabulary so search engines can understand its meaning and show rich results.

Structured Data

Information on a page formatted so that machines can parse its meaning, not just its text. It is the foundation for rich snippets and AI-powered search.

JSON-LD

JSON for Linking Data, a way to embed schema.org structured data into a page using plain JSON in a `<script type="application/ld+json">` block.
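
As a sketch of what such a block might contain, the snippet below serializes a small schema.org `Organization` object and wraps it in the script tag. The organization details and the `json_ld_script` helper are invented for illustration.

```python
import json


def json_ld_script(data: dict) -> str:
    """Wrap a dict as a JSON-LD script block for embedding in a page."""
    # Escape "</" so a string value can never close the script element early;
    # "\/" is a valid JSON escape for "/".
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical organization record using schema.org vocabulary.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://example.com",
}

snippet = json_ld_script(organization)
print(snippet)
```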

SEO

Search Engine Optimization, the practice of structuring a site so that it ranks well in unpaid (organic) search results.

SERP

Search Engine Results Page, the page a search engine returns for a given query, including organic results, ads, and rich features like featured snippets and AI overviews.

Backlink

A hyperlink from one website to another, treated by search engines as a signal of authority and relevance.

Meta Description

A short summary of a page provided in HTML metadata, often used by search engines and link previews as the descriptive text below the title.

Core Web Vitals

A set of three Google-defined page-experience metrics (LCP, INP, CLS) that measure load speed, interactivity, and visual stability, used as ranking signals.

Title Tag

The HTML element (`<title>`) that defines the page's title, used as the clickable headline on a search results page and in browser tabs.

Brand & Design

Identity primitives Brand.dev pulls back from a domain.

16

Logo

The primary visual mark a company uses to identify itself, typically a wordmark, symbol, or combination, optimized to be recognizable at any size.

Typography

The craft of arranging type (choosing typefaces, weights, sizes, and spacing) to give written language its visual personality.

NAICS Code

A six-digit industry classification used by US, Canadian, and Mexican government agencies to categorize business establishments across North America.

SIC Code

The Standard Industrial Classification, a four-digit industry taxonomy used by the SEC and a long tail of legacy systems, predecessor to NAICS.

Brand Identity

The complete set of visual and verbal elements (logo, colors, typography, voice, imagery) that a company uses to express itself consistently across every touchpoint.

Color Palette

The defined set of colors a brand or design system uses, typically organized into primary, secondary, neutral, and semantic groups.

Hex Color

A six-digit (or three-digit) hexadecimal representation of a color's red, green, and blue components, the most common way to write colors in CSS and design tools.
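
The mapping between the hex notation and its red, green, and blue components can be sketched as follows. The helper names are made up, and the sketch covers only the three- and six-digit forms mentioned above.

```python
def hex_to_rgb(value: str) -> tuple[int, int, int]:
    """Convert '#rrggbb' or '#rgb' to an (r, g, b) tuple of 0-255 integers."""
    digits = value.lstrip("#")
    if len(digits) == 3:
        # Shorthand: each digit is doubled, so "0af" means "00aaff".
        digits = "".join(ch * 2 for ch in digits)
    if len(digits) != 6:
        raise ValueError(f"expected 3 or 6 hex digits, got {value!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(r: int, g: int, b: int) -> str:
    """Convert 0-255 components back to lowercase '#rrggbb'."""
    return f"#{r:02x}{g:02x}{b:02x}"


print(hex_to_rgb("#1e90ff"), rgb_to_hex(30, 144, 255))
```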

RGB

A color model that represents colors as additive combinations of red, green, and blue light, used by every digital screen.

CMYK

A subtractive color model used in print that mixes cyan, magenta, yellow, and black inks to produce colors on paper.

Pantone

A standardized spot-color system used in print, where each color is a pre-mixed ink with a globally consistent reference number (e.g., Pantone 286 C).

Typeface

A complete set of letterforms designed to share a coherent visual style; what most people informally call a "font."

Google Fonts

A free open-source web font library hosted by Google, used by millions of sites to serve typefaces with one or two lines of HTML or CSS.

EIN

Employer Identification Number, a nine-digit federal tax ID assigned by the IRS to businesses, nonprofits, and other entities operating in the United States.

DUNS Number

A nine-digit business identifier issued by Dun & Bradstreet, used globally for credit reporting, government contracting, and supply chain verification.

Ticker Symbol

A short code, usually a few letters, that identifies a publicly traded security on a stock exchange (e.g., AAPL for Apple, MSFT for Microsoft).

Wordmark

A type of logo that consists of the brand's name set in a custom or distinctive typeface, with no separate symbol (e.g., Google, Coca-Cola, FedEx).
