Baseline Design System — Web View & LLM-Readability Research
Research compiled March 2026. Covers design system documentation best practices, established system comparisons, and the emerging field of machine-readable design systems for AI consumption.
Part 1: Design System Documentation — State of Play
Two archetypes
Design system documentation sites split into two recognizable patterns:
The platform archetype (Carbon, Material Design 3, Polaris) serves large ecosystems with multiple consuming teams across frameworks and platforms. Hundreds of pages, deep API documentation, interactive playgrounds, separate entry paths for designers and developers. Information architecture is broad and deep: persistent sidebar navigation with nested categories, typically organized as Foundations → Components → Patterns → Resources.
The product-team archetype (Nordhealth's Nord, Duet by LocalTapiola, Edison for healthcare) serves a smaller number of internal teams building specific products. Leaner, more opinionated, stronger editorial voice. Flatter IA — fewer nesting levels, more curated pages, guidelines embedded alongside components rather than in a separate "principles" section.
Baseline fits the product-team archetype: a small design team, a specific set of products, and a design system that needs to function as AI context as much as human documentation.
The component page anatomy — settled consensus
Across established systems, a component documentation page follows a consistent structure:
- Description — one to two sentences on purpose
- Live example — rendered component (not static image), front and center
- Variants — all available configurations with code
- Usage guidance — when to use, when not to use, do/don't pairs
- Accessibility — keyboard nav, screen reader, contrast
- API/Props — technical reference
Nathan Curtis (EightShapes) advocates this specific priority: introduction first, then examples front and center since that's what people come for, then design reference with do's and don'ts, and finally code reference. The debate between threading design and code on one scrolling page vs. splitting into tabs is ongoing. Carbon uses tabs. Polaris and Nord use single pages.
The "do and don't" pattern is near-universal and arguably the most effective convention, because it addresses the judgment gap: knowing what a component is matters less than knowing when to use it and when not to.
What the best sites get right
Live rendering over static images. The best systems use live components in their documentation — which doubles as dogfooding since building a documentation site with the system is the ultimate test. Nord does this exceptionally well.
Audience-aware navigation. Carbon's homepage routes the right people to the right content quickly, with "Get Started" guides for designers and developers as primary entry points. The sidebar is comprehensive but secondary — the homepage does the routing.
Component grouping by function, not alphabet. Datadog's DRUIDS organizes by area of use (form components, dialog components). Polaris groups by role: actions, layout, feedback, selection, navigation. Both approaches map to how people think when building interfaces.
Progressive disclosure in the documentation itself. Component page leads with rendered example and one-line description. Variants next. Detailed API further down. Accessibility at the end.
The Nordhealth case — closest analogue
Nord started with user research: interviewed ~60 people across teams before building anything. Their documentation architecture mirrors their modular package structure: first-level packages (Webfonts, Design Tokens, Nordicons) are atoms with no dependencies, while second-level packages (CSS Framework, Web Components, Themes) depend on the first level but hide those dependencies from the consuming developer.
This maps to Baseline's token architecture: primitive tokens → semantic tokens → surface mode tokens → component tokens.
Nord ships with two distinct brands (therapy and veterinary) and lets users switch the active brand from the documentation site. This is directly relevant if Baseline ever needs to show how tokens express differently across communicative and instrumental surface temperatures.
Nord produced 400+ pages of documentation and live code examples, scoring 100% on Lighthouse, with a team of four. This is evidence that a small team can produce world-class documentation if the structure is right.
Recommended IA for Baseline web view
- Get started: what Baseline is, who it's for, how to install/use tokens
- Foundations: design principles (progressive enhancement, surface temperature), color, typography, spacing, grid
- Tokens: full reference organized by tier (primitives → semantics → surface mode → component)
- Components: grouped by function (layout, input, feedback, navigation, data display)
- Patterns: composite guidance spanning multiple components (emotional modes, privacy patterns, patient identification, proportional confirmation)
- Content: voice, tone, error message structure, copy rules
Surface temperature should be a foundation-level concept with its own page showing how tokens express differently in communicative vs. instrumental mode — Baseline's intellectual signature.
Emotional mode framework (anxiety-driven, aspiration-driven, latent anxiety) belongs in Patterns, not Foundations — it governs how components are composed, not what they are.
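The four token tiers in the recommended IA can be pictured as a chain of aliases, where each tier points at the one below it until a literal value is reached. The sketch below is a minimal illustration under stated assumptions: every token name and the curly-brace alias syntax are hypothetical, and only the petrol hex value comes from this document.

```python
# Hypothetical token names and alias syntax; only the petrol hex value
# appears elsewhere in this document.
TOKENS = {
    "primitive.petrol": "#004851",
    "semantic.text-primary": "{primitive.petrol}",
    "surface.communicative.text": "{semantic.text-primary}",
    "component.banner.label-color": "{surface.communicative.text}",
}


def resolve(name: str, tokens: dict) -> str:
    """Follow alias references tier by tier until a literal value is reached."""
    seen = []
    value = tokens[name]
    while value.startswith("{") and value.endswith("}"):
        if name in seen:
            # A token that eventually points back at itself can never resolve.
            raise ValueError(f"alias cycle: {' -> '.join(seen + [name])}")
        seen.append(name)
        name = value[1:-1]  # strip the braces to get the referenced token name
        value = tokens[name]
    return value


print(resolve("component.banner.label-color", TOKENS))
```

A Tokens reference page organized by tier could show this chain for each token, so a reader sees which primitive a component token ultimately depends on.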
Systems to avoid emulating
Carbon and Material Design 3 are magnificent but designed for a scale of organization and contribution that doesn't match Aleris. Building that documentation infrastructure would consume more time than building the design system itself.
Reference design systems
| System | URL | Why it's relevant |
|---|---|---|
| Nord (Nordhealth) | nordhealth.design | Healthcare context, small team, public docs, multi-brand switching |
| Polaris (Shopify) | polaris.shopify.com | Content design guidelines, component grouping by function |
| Carbon (IBM) | carbondesignsystem.com | Navigation patterns, sidebar + TOC structure |
| Material Design 3 | m3.material.io | Token architecture (reference → system → component), foundations organization |
| DRUIDS (Datadog) | — | Grouping components by use-context rather than alphabetical |
| Edison | — | Healthcare-specific design system (used by healthcare workers daily) |
| Sainsbury's Luna | — | Multi-brand design system with theme switching per brand |
Part 2: Making Design Systems LLM-Readable
The problem space
LLMs consuming design systems face three failure modes:
Fabrication. LLMs don't look up design system tokens; they generate plausible-looking ones. If the system uses --space-200 for 16px, the LLM might write padding: 12px because 12px is a reasonable number. The value isn't wrong in isolation; it just isn't yours.
Amnesia. Every new session starts from zero context. The LLM doesn't know it used a specific value yesterday. By session five, you have three different values in the same prototype, all of them "fine."
Invisible intent. Point an LLM at a component library and it sees APIs. What it can't extract from source code: when to pick one component over another, what spacing to use between them, how to compose them into layouts that follow conventions. That knowledge lives in designers' heads.
The emerging approach — three layers
The three layers below draw on work at Indeed (Diana Wolosin's metadata architecture), the Hardik Pandya/Atlaskit approach, Figma's MCP server, and the KickstartDS component builder.
Layer 1: Closed token set with explicit enumeration (addresses fabrication)
Highest-value, lowest-cost intervention. A JSON file enumerating every token with its value, semantic purpose, and constraints:
{
  "tokens": {
    "color": {
      "petrol": {
        "value": "#004851",
        "usage": "Primary text, headings, dark UI surfaces",
        "never": "Background color on patient-facing surfaces"
      },
      "orange": {
        "value": "#F58C61",
        "usage": "Primary CTAs, accent, active indicators",
        "constraint": "One primary CTA per screen maximum"
      }
    },
    "spacing": {
      "sm": { "value": "16px", "usage": "Mobile margins, tight gaps" },
      "md": { "value": "24px", "usage": "Desktop margins, grid gutters, standard gaps" }
    }
  }
}
The never and constraint fields turn the token set from a palette into a rule system. Instead of padding: 16px scattered across files, the code references var(--space-200), and the LLM picks from a closed set of named variables instead of inventing plausible values.
The analogy from Hardik Pandya: "Think of it like Infrastructure as Code — before IaC, every server was configured by hand and no two were quite the same. This does the same for design decisions: makes them machine-readable so LLMs stop guessing."
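To make the closed-set idea concrete, here is a minimal sketch of a lookup that fails loudly on any token outside the manifest instead of returning a guess. It assumes the JSON structure shown above (trimmed here); the function names lookup_token and rules_for and the UnknownTokenError class are hypothetical helpers, not part of any existing Baseline tooling.

```python
import json

# Trimmed copy of the manifest structure shown above.
MANIFEST = json.loads("""
{
  "tokens": {
    "color": {
      "petrol": {"value": "#004851", "never": "Background color on patient-facing surfaces"},
      "orange": {"value": "#F58C61", "constraint": "One primary CTA per screen maximum"}
    },
    "spacing": {
      "sm": {"value": "16px"},
      "md": {"value": "24px"}
    }
  }
}
""")


class UnknownTokenError(KeyError):
    """Raised when a requested token is not part of the closed set."""


def lookup_token(manifest: dict, group: str, name: str) -> dict:
    """Return the token entry, or fail loudly instead of guessing a value."""
    try:
        return manifest["tokens"][group][name]
    except KeyError:
        known = sorted(manifest["tokens"].get(group, {}))
        raise UnknownTokenError(
            f"{group}.{name} is not a Baseline token; known {group} tokens: {known}"
        ) from None


def rules_for(token: dict) -> list:
    """Collect the rule-like fields ("never", "constraint") attached to a token."""
    return [f"{key}: {token[key]}" for key in ("never", "constraint") if key in token]


print(lookup_token(MANIFEST, "spacing", "sm")["value"])
print(rules_for(lookup_token(MANIFEST, "color", "petrol")))
```

The error message lists the valid names for the group, which gives an LLM (or a developer) the closed set to choose from at the moment it tries to step outside it.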
Layer 2: Component spec files with rigid template (addresses invisible intent)
Each component gets a structured markdown file with YAML frontmatter that follows an invariant template. The template matters more than the content — the LLM learns the structure once and can reliably extract from any component file.
Diana Wolosin at Indeed is building what she calls a "semantic intelligence layer" — structured metadata with explicit, machine-readable rules that AI agents can interpret without guesswork. Instead of just "Button," the metadata provides variant: primary, state: hover, platform: iOS.
Proposed template for Baseline:
---
name: PatientIdentificationBanner
category: organism
surface: [instrumental, communicative]
emotional_mode: [anxiety-driven, latent-anxiety]
status: stable
tokens_used: [petrol, petrol-60, sand, spacing-md, rounded-xl]
related: [StatusBanner, PhaseIndicator]
---
## When to use
[One line: the decision point]
## When NOT to use
[Explicit anti-patterns]
## Anatomy
[Named parts: label, secondary-identifier, expandable-section]
## Surface behavior
- Communicative: [how it renders on warm surfaces]
- Instrumental: [how it renders on dense/clinical surfaces]
## Privacy behavior
- Patient view: [mask by default, tap-to-reveal national ID]
- Staff view: [name prominent, secondary identifier visible, full ID expandable]
## States
[default, loading, error, expanded, collapsed]
## Code example
[Minimal, copy-pasteable]
The frontmatter is the critical addition — parseable YAML that an LLM, MCP server, or search index can filter on without reading the prose. "Give me all components that work in anxiety-driven mode" becomes a metadata query, not full-text search.
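The metadata query described above can be sketched in a few lines. This is an illustration under assumptions: the parser handles only the flat key: value and key: [a, b] subset used in the proposed template (a real build would likely use a full YAML library), and the StatusBanner frontmatter values are invented for the example.

```python
SPECS = {
    "patient-identification-banner.md": """---
name: PatientIdentificationBanner
category: organism
surface: [instrumental, communicative]
emotional_mode: [anxiety-driven, latent-anxiety]
status: stable
---
## When to use
...
""",
    # Hypothetical second spec; its field values are invented for illustration.
    "status-banner.md": """---
name: StatusBanner
category: molecule
surface: [communicative]
emotional_mode: [aspiration-driven]
status: draft
---
## When to use
...
""",
}


def parse_frontmatter(text: str) -> dict:
    """Parse the template's flat frontmatter; not a general YAML parser."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter: stop before the prose
            break
        if not line.strip():
            continue
        key, _, raw = line.partition(":")
        raw = raw.strip()
        if raw.startswith("[") and raw.endswith("]"):
            meta[key.strip()] = [item.strip() for item in raw[1:-1].split(",") if item.strip()]
        else:
            meta[key.strip()] = raw
    return meta


def components_for_mode(specs: dict, mode: str) -> list:
    """Return names of components whose frontmatter lists the given emotional mode."""
    return sorted(
        meta["name"]
        for meta in (parse_frontmatter(text) for text in specs.values())
        if mode in meta.get("emotional_mode", [])
    )


print(components_for_mode(SPECS, "anxiety-driven"))
```

Because the filter never reads past the closing delimiter, the prose sections can change freely without affecting query results, which is the practical payoff of keeping the template invariant.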
Layer 3: Composition rules and decision trees (addresses "how to combine")
The hardest to formalize and most Baseline-specific. Covers questions LLMs can't answer from component specs alone: "I'm building a pre-visit checklist for an anxiety-driven patient on a communicative surface — what components do I compose, in what order, with what spacing?"
Proposed format — pattern recipes:
---
pattern: pre-visit-checklist
emotional_mode: anxiety-driven
surface: communicative
components_used: [PhaseIndicator, ChecklistItem, StatusBanner, ChatAccess]
---
## Composition order
1. PhaseIndicator (shows where patient is in journey — reduces uncertainty)
2. StatusBanner (if pending staff action — gives patient certainty)
3. ChecklistItem[] (patient-actionable tasks — ordered by urgency)
4. ChatAccess (always visible — safety net)
## Spacing
- Between PhaseIndicator and first content: spacing-lg
- Between checklist items: spacing-sm
- ChatAccess: sticky bottom, spacing-md padding
## Decision rules
- If all checklist items complete → show completion summary, not empty list
- If staff action pending → StatusBanner before checklist, not after
- Never show more than 5 uncompleted items without progressive disclosure
This encodes the design judgment that currently lives in one person's head.
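As a rough check that the recipe is precise enough to execute, the composition order and decision rules above can be written as a function. This is a sketch, not a proposed implementation: CompletionSummary and ShowMore are hypothetical names for the completion summary and the progressive-disclosure control, and the state fields are assumptions.

```python
from dataclasses import dataclass, field

# From the recipe: more than 5 uncompleted items requires progressive disclosure.
MAX_VISIBLE_ITEMS = 5


@dataclass
class ChecklistState:
    uncompleted_items: list = field(default_factory=list)
    staff_action_pending: bool = False


def compose_pre_visit_checklist(state: ChecklistState) -> list:
    """Return the ordered component list for the pre-visit-checklist recipe."""
    layout = ["PhaseIndicator"]
    if state.staff_action_pending:
        layout.append("StatusBanner")  # placed before the checklist, not after
    if not state.uncompleted_items:
        layout.append("CompletionSummary")  # hypothetical name; replaces an empty list
    else:
        visible = state.uncompleted_items[:MAX_VISIBLE_ITEMS]
        layout.extend(f"ChecklistItem:{item}" for item in visible)
        hidden = len(state.uncompleted_items) - len(visible)
        if hidden:
            layout.append(f"ShowMore:{hidden}")  # hypothetical disclosure control
    layout.append("ChatAccess")  # always present as the last element
    return layout


print(compose_pre_visit_checklist(
    ChecklistState(uncompleted_items=["Fasting", "Bring ID"], staff_action_pending=True)
))
```

If a rule cannot be expressed this way without inventing details, that is a signal the pattern file is still relying on judgment that has not been written down.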
The audit layer
Hardik Pandya's Atlaskit experiment produced a token audit script that scans CSS files, finds hardcoded values, and suggests the correct token. It returns exit code 1 on errors for CI integration. The reported results: 418 raw values across 28 files replaced with 230+ named tokens, with zero hardcoded values remaining.
The audit script is the enforcement mechanism that keeps the structured data honest. Without enforcement, a rule such as the never field on a token is powerful on paper but becomes a liability when rules change and nobody updates the JSON.
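A minimal sketch of such an audit, written from the description above rather than from Pandya's actual script: it flags raw pixel lengths and hex colors outside token definitions and suggests a token when the value matches one. The token table and the CSS variable names are hypothetical, and the regular expression is deliberately simple (for instance, an ID selector made of hex letters would be reported as a color).

```python
import re

# Hypothetical token table; in practice this would be loaded from the token JSON.
TOKENS_BY_VALUE = {
    "16px": "--spacing-sm",
    "24px": "--spacing-md",
    "#004851": "--color-petrol",
    "#f58c61": "--color-orange",
}

# Matches hex colors and raw pixel lengths.
RAW_VALUE = re.compile(r"#[0-9a-fA-F]{3,8}\b|\b\d+(?:\.\d+)?px\b")


def audit_css(css: str) -> list:
    """Return (line_number, raw_value, suggested_token_or_None) per hardcoded value."""
    findings = []
    for number, line in enumerate(css.splitlines(), start=1):
        if line.lstrip().startswith("--"):
            continue  # token definitions are the one place raw values belong
        for match in RAW_VALUE.finditer(line):
            raw = match.group(0)
            findings.append((number, raw, TOKENS_BY_VALUE.get(raw.lower())))
    return findings


def exit_code(findings: list) -> int:
    """Return 1 when any hardcoded value was found, so CI can fail the build."""
    return 1 if findings else 0


SAMPLE = """:root {
  --spacing-sm: 16px;
}
.card {
  padding: 16px;
  color: #004851;
  margin: 13px;
  gap: var(--spacing-md);
}
"""

for number, raw, token in audit_css(SAMPLE):
    hint = f"use var({token})" if token else "no matching token"
    print(f"line {number}: hardcoded {raw} ({hint})")
```

A CI entry point would read each CSS file, collect the findings, and pass exit_code(findings) to sys.exit. Values with no matching token (13px here) are the interesting ones: they are either mistakes or a sign that the token set is missing a step.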
MCP — the logical endpoint
Figma's MCP server (beta, 2025) brings design context directly to the IDE: style and variable usage, variable code syntax, and Code Connect mappings. The more designs use the design system, the more useful the MCP server becomes. Figma also added automated design system rule generation: the MCP server can scan a codebase and output a structured rules file.
A Baseline MCP server would expose the structured data layers as queryable tools: "look up component by emotional mode," "get all tokens for communicative surface," "what's the composition pattern for this flow." But it requires the underlying data layers to exist first.
Implementation sequence: JSON token manifest → component spec files → pattern files → MCP server. Each layer feeds the next.
The practical data architecture
For a system serving humans on a web view, Claude Code via context files, and potentially a future MCP server:
Source of truth: Structured markdown files with YAML frontmatter, living in the Baseline repo alongside the code. These are what Claude Code reads and what the web view renders.
Derived artifact 1: baseline-tokens.json — auto-generated from CSS custom properties. The closed token enumeration. Claude Code checks this before using any value.
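One way the generation step could work is a small script that pulls every custom property declaration out of the stylesheet. The sketch below is an assumption about the approach, not an existing Baseline script. It captures names and values only; the usage, never, and constraint fields cannot be read from plain CSS declarations and would need to come from another source.

```python
import json
import re

# Matches `--name: value;` declarations. A use such as var(--name) has no
# colon after the name, so it is not picked up.
CUSTOM_PROPERTY = re.compile(r"(--[A-Za-z0-9_-]+)\s*:\s*([^;]+);")


def extract_tokens(css: str) -> dict:
    """Map each CSS custom property name to its declared value."""
    # Strip comments first so commented-out declarations are ignored.
    without_comments = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)
    # If a property is declared more than once, the last declaration wins.
    return {name: value.strip() for name, value in CUSTOM_PROPERTY.findall(without_comments)}


def to_manifest_json(css: str) -> str:
    """Serialize the extracted tokens as stable, diff-friendly JSON."""
    return json.dumps({"tokens": extract_tokens(css)}, indent=2, sort_keys=True)


SAMPLE_CSS = """:root {
  /* --old-spacing: 12px; */
  --color-petrol: #004851;
  --spacing-sm: 16px;
}
.card { padding: var(--spacing-sm); }
"""

print(to_manifest_json(SAMPLE_CSS))
```

Sorting keys keeps the generated file stable between runs, so a change in the JSON diff always corresponds to a change in the CSS.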
Derived artifact 2: baseline-manifest.json — indexes all component specs by frontmatter fields (category, surface, emotional_mode, status, tokens_used, related). What a search function, web view filter, or future MCP server queries against.
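The manifest could take the form of an inverted index: for each frontmatter field, a map from value to the components that carry it. The following is a sketch under assumptions; it takes already-parsed frontmatter as dictionaries, the StatusBanner values are invented, and the build_manifest and query names are hypothetical.

```python
from collections import defaultdict

INDEXED_FIELDS = ("category", "surface", "emotional_mode", "status", "tokens_used", "related")

# Parsed frontmatter; the StatusBanner entry is invented for illustration.
SPECS = [
    {
        "name": "PatientIdentificationBanner",
        "category": "organism",
        "surface": ["instrumental", "communicative"],
        "emotional_mode": ["anxiety-driven", "latent-anxiety"],
        "status": "stable",
    },
    {
        "name": "StatusBanner",
        "category": "molecule",
        "surface": ["communicative"],
        "emotional_mode": ["anxiety-driven"],
        "status": "draft",
    },
]


def build_manifest(specs: list) -> dict:
    """Build an inverted index: field -> value -> sorted component names."""
    index = {field: defaultdict(set) for field in INDEXED_FIELDS}
    for spec in specs:
        for field in INDEXED_FIELDS:
            values = spec.get(field, [])
            if isinstance(values, str):
                values = [values]  # scalar fields such as category or status
            for value in values:
                index[field][value].add(spec["name"])
    return {
        field: {value: sorted(names) for value, names in sorted(by_value.items())}
        for field, by_value in index.items()
    }


def query(manifest: dict, **filters) -> list:
    """Return components matching every filter, e.g. surface="communicative"."""
    matches = None
    for field, value in filters.items():
        names = set(manifest.get(field, {}).get(value, []))
        matches = names if matches is None else matches & names
    return sorted(matches or [])


MANIFEST = build_manifest(SPECS)
print(query(MANIFEST, surface="instrumental", emotional_mode="anxiety-driven"))
```

The output of build_manifest contains only dictionaries, lists, and strings, so it can be written straight to baseline-manifest.json and queried by a web view filter or wrapped as an MCP tool later.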
The web view becomes a rendering layer over the markdown source, not a separate content silo. Humans see the rich rendered version. Claude Code reads the markdown directly. Both consume the same underlying data.
Key people and projects in this space
| Person/Team | Organization | Work |
|---|---|---|
| Diana Wolosin | Indeed | Machine-readable design system metadata, MCP server architecture, evaluation frameworks |
| Hardik Pandya | Independent | "Expose Your Design System to LLMs" — spec files + token layer + audit approach |
| Nathan Curtis | EightShapes | Component documentation architecture, "Components as Data" |
| Pierre Bremell | Volvo Group | AI-ready design system architecture with MCP integration |
| Figma MCP team | Figma | MCP server bringing design context to IDE, automated rule generation |
| KickstartDS | Open source | Design System Component Builder MCP Server (JSON Schema-first development) |
| Oleksandra Huba | Independent | Figma-to-code workflow with MCP, structured layer naming |
Part 3: Gaps and Open Questions
What doesn't exist yet
No published working example surfaced in this research of a design system that explicitly encodes emotional context or patient mode as machine-readable metadata. Indeed's component metadata operates at variant/state/platform level, not emotional-mode/surface-temperature level. Baseline's distinctive features, the ones that make it more than a token file, are exactly the ones hardest to formalize for machine consumption.
Unresolved tensions
Specificity vs. maintenance. The more judgment-laden the metadata, the more useful it is for Claude Code, but the harder it is to keep accurate. A never field on a token is powerful but becomes a liability if the rule changes and nobody updates the JSON.
Progressive disclosure vs. machine parsing. Humans benefit from progressive disclosure, rich visual examples, and interactive playgrounds. LLMs benefit from structured, hierarchical text with clear naming and explicit rules. The resolution is probably the dual-layer approach: rich rendered view for humans, well-structured semantic source for AI, both from the same markdown files.
Component-level vs. composition-level documentation. Most design systems document components individually. The composition question — "how do I combine these for a specific scenario?" — is rarely formalized. This is where Baseline could be genuinely novel.
Audience weighting. Who is the web view primarily for?
- If for the incoming front-end developer → prioritize clear usage guidance, live examples, copy-paste code
- If for external credibility (recruiting, vendor conversations, licensing) → prioritize editorial quality, brand expression, completeness
- If for Claude Code → prioritize structured machine-readable content with explicit rules
These aren't mutually exclusive, but the weighting changes the build priority.
Connection to other Aleris workstreams
Patinfo CMS: The "text must never pass through an LLM — only metadata is AI-generated" pipeline rule has a parallel here: design judgment is human-authored; the structured representation of that judgment is what AI consumes.
Front-end developer test: "Can the design documents transfer capability to a developer?" has a parallel question: "Can the structured metadata transfer capability to Claude Code with the same fidelity?" If component spec files are good enough for an LLM to produce correct output, they're probably good enough for a human developer too. The two audiences validate each other.
MyAleris onboarding: The incoming developer's ability to produce correct output from Baseline's documents is the first real test of whether the documentation works. Their questions and failure points directly inform what the web view needs to prioritize.
Sources
Design system documentation practices
- Nathan Curtis, "Documenting Components" (EightShapes, 2018) — component page architecture
- Backlight.dev, "Design System Documentation Best Practices" — information architecture for docs
- Knapsack, "Getting Started with Design System Documentation" — IA iteration and proof of concept
- Design System Central, "Navigation Patterns to Steal from IBM's Carbon" — sidebar, TOC, homepage routing
- Storybook, "4 Ways to Document Your Design System" — Docs addon, live rendering, MDX
- redesigningdesign.systems/component-process/documentation — component page template with all sections
Established design systems
- nordhealth.design — Nord Design System (healthcare, small team, multi-brand, web components)
- carbondesignsystem.com — IBM Carbon (navigation patterns, comprehensive documentation)
- m3.material.io — Material Design 3 (token architecture, foundations organization)
- polaris.shopify.com — Shopify Polaris (content design, component grouping)
LLM-readable design systems
- Diana Wolosin, "AI Metadata: Powering a Design System MCP" (Medium/Design Systems Collective, August 2025) — metadata architecture for MCP servers
- Diana Wolosin, "Machine-Readable Design Systems for MCP and LLMs" (Into Design Systems conference, March 2026) — benchmarking MCP configurations
- Hardik Pandya, "Expose Your Design System to LLMs" (hvpandya.com, March 2026) — spec files + token layer + audit approach, Atlaskit case study
- Oleksandra Huba, "Dear LLM, here's how my design system works" (UX Collective, November 2025) — Figma MCP workflow, Code Connect, structured layer naming
- Pierre Bremell, "How to build an AI design system with MCP" (Medium/Bootcamp, August 2025) — Volvo Group, RAG preparation, metadata-rich content
- Figma Blog, "Design Systems And AI: Why MCP Servers Are The Unlock" (October 2025) — Figma MCP server, automated rule generation, Code Connect
MCP protocol
- modelcontextprotocol.io — official MCP specification and architecture
- KickstartDS Design System Component Builder MCP Server — JSON Schema-first component generation