Laptop showing the Generative UI playground with Open Brain artifact
Design·23 May 2026

I built a generative UI for my knowledge base. Then I found Google had named the category.

I spent six weeks building a generative UI for my personal knowledge base: typed components, agent composition, write-back actions, dynamic layouts.

Then I discovered Google had already formalised the category as A2UI: an open protocol for agents to send declarative UI descriptions that clients render natively. I hadn't built the protocol. I'd independently stumbled into the same pattern space.

Six weeks of dogfooding taught me why the category matters.

What I was actually trying to build

I have a personal knowledge base: thousands of markdown notes in Obsidian, indexed into Supabase with embeddings, exposed via MCP, and queryable from Claude, ChatGPT, and Cursor. It works.

What didn't work was the interface. When I asked "what am I working on this week?", I got a wall of text. When I asked "describe Open Brain to me", I got a wall of text. When I asked "what's in my inbox?", I got a wall of text.

Every query rendered the same way regardless of the data shape underneath. That felt wrong, and the reason runs deeper than aesthetics.

Traditional software assumes the interface is known ahead of time. A dashboard is designed for a known query, a known shape, a known set of columns. But agentic systems produce outputs whose structure changes query to query. A standup has time, status, blockers. An entity has relations, a timeline, recent docs. An inbox triage has items with destinations. Fixed dashboards become too rigid. Chat collapses every response into the same linear format. Generative UI sits between those extremes.

So I built a playground. Small Next.js app. Eight typed components: EntityDossier, RelationGraph, ContextualStandup, ContradictionPair, Timeline, SignalQueue, InboxTriage, DecisionCard. Two agents. A Planner that reads the query, picks MCP tools, fetches data. A Composer that reads the data shape and picks the right component. TypeScript, Vercel, streaming over SSE.
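One way to picture "typed components" is as a discriminated union that the Composer must emit and the renderer must handle. This is a minimal sketch, not the playground's actual code: the component names come from the post, while every prop shape and the describeView function are assumptions made for illustration. Only three of the eight components are spelled out.

```typescript
// Component names are from the post; all prop shapes are hypothetical.
type ViewSpec =
  | { component: "ContextualStandup"; props: { thinking: string[]; doing: string[]; next: string[] } }
  | { component: "EntityDossier"; props: { entity: string; summary: string; recentDocs: string[] } }
  | { component: "InboxTriage"; props: { items: { title: string; destination: string }[] } };
// The remaining five components would follow the same pattern.

// Plain-text stand-in for a real renderer, so the sketch runs without React.
function describeView(spec: ViewSpec): string {
  switch (spec.component) {
    case "ContextualStandup":
      return `Standup: ${spec.props.doing.length} in progress, ${spec.props.next.length} next`;
    case "EntityDossier":
      return `Dossier for ${spec.props.entity} (${spec.props.recentDocs.length} recent docs)`;
    case "InboxTriage":
      return `Inbox: ${spec.props.items.length} items to route`;
    default: {
      // Exhaustiveness check: adding a component without a case fails to compile.
      const unreachable: never = spec;
      return unreachable;
    }
  }
}
```

The point of the union is that an agent can only ask for components the client already knows how to draw, and the compiler flags any component the renderer forgot.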

I didn't realise I was building an instance of a real category. I just knew chat wasn't enough and dashboards were too rigid.

Session 1: nothing worked

I ran the first dogfood query, "what am I working on this week?", and the system rendered nothing. The rate limit hit before the Composer could fire. The context normalisation layer was making prompts 92% larger instead of smaller. Latency was over a minute. Three CSS bugs in the render.

Sessions 1, 2, and 3 were infrastructure work. By session 4 both agents ran live, context size had dropped by 50%, and the system produced recognisable views.

That's when the real problem appeared.

Session 5: the deeper problem

I ran the standup query for the fifth time. The system produced what it had produced every other time: a ContextualStandup component with three columns. Thinking, Doing, Next. Signals in Thinking, issues in Doing, more issues in Next.

It was clean. It was correct. It was completely wrong.

What I wrote in my dogfood notes that night:

This is mainly just pulling through my Linear issues. It's not really pulling out the connections within the brain, the semantic layer of knowledge gaps and understanding. I'm also not getting the just-in-time, non-deterministic UI approach that I was hoping for.

The Composer's decision was always the same because the Planner's classification was always the same. intent_type: standup → emit ContextualStandup. Every time. The brain's semantic layer (the entity graph, the contradiction layer, the relationship network) was never being touched.
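Reduced to code, the failure mode looks roughly like a lookup table. This is a hypothetical reconstruction rather than the original prompt logic: only the standup → ContextualStandup mapping is stated in the post, and the other two intents and the composeRigid name are assumed for illustration.

```typescript
// Only "standup" is named in the post; "entity" and "inbox" are assumed intents.
type IntentType = "standup" | "entity" | "inbox";

const COMPONENT_FOR_INTENT: Record<IntentType, string> = {
  standup: "ContextualStandup",
  entity: "EntityDossier",
  inbox: "InboxTriage",
};

// The fetched data is accepted but never consulted, which is the whole problem:
// the same intent always yields the same component.
function composeRigid(intentType: IntentType, _data: unknown): string {
  return COMPONENT_FOR_INTENT[intentType];
}
```

Whatever the brain returns, two standup queries produce the same view, which is a dashboard with extra steps.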

I'd built a sophisticated dashboard that thought it was generative.

The lesson worth pushing hardest on: encode the rules tightly enough to prevent agent drift, and you've eliminated the reason to use an agent. Loosen them, and the output becomes inconsistent. The fix isn't to find the right balance, it's to make the rule a prior and the override a first-class concept.

What I changed

Three changes: two prompt rewrites and one schema addition.

Two-pass fetch. The Planner now runs a surface-layer fetch (signals, recent docs, in-progress issues) and then a semantic-layer fetch (entity relations for the dominant entity, contradictions for unresolved decisions, search for thematic threads). Between the two passes it has to inspect the surface-layer results and decide whether the graph explains them better.
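The two-pass idea can be sketched as an async function that fetches the surface layer, inspects it, and only then decides what to ask the graph. Everything named here is an assumption: the BrainTools interface stands in for whatever MCP tools the real Planner calls, and dominantEntity is a crude mention counter standing in for the model's judgement.

```typescript
// Hypothetical tool interface; the real MCP tool names are not given in the post.
interface Surface { signals: string[]; recentDocs: string[]; issues: string[] }
interface BrainTools {
  fetchSurface(): Promise<Surface>;
  fetchRelations(entity: string): Promise<string[]>;
  fetchContradictions(): Promise<string[]>;
}
interface TwoPassResult {
  surface: Surface;
  semantic: { entity: string | null; relations: string[]; contradictions: string[] };
}

// Stand-in for the Planner's inspection step: the known entity mentioned in the most items.
function dominantEntity(texts: string[], knownEntities: string[]): string | null {
  let best: string | null = null;
  let bestCount = 0;
  for (const entity of knownEntities) {
    const needle = entity.toLowerCase();
    const count = texts.filter((t) => t.toLowerCase().includes(needle)).length;
    if (count > bestCount) {
      best = entity;
      bestCount = count;
    }
  }
  return best;
}

async function twoPassFetch(tools: BrainTools, knownEntities: string[]): Promise<TwoPassResult> {
  // Pass 1: surface layer.
  const surface = await tools.fetchSurface();
  // Inspect the surface results to choose a target for the semantic pass.
  const entity = dominantEntity(
    [...surface.signals, ...surface.recentDocs, ...surface.issues],
    knownEntities,
  );
  // Pass 2: semantic layer, with both calls issued in parallel.
  const [relations, contradictions] = await Promise.all([
    entity ? tools.fetchRelations(entity) : Promise.resolve<string[]>([]),
    tools.fetchContradictions(),
  ]);
  return { surface, semantic: { entity, relations, contradictions } };
}
```

The second pass depends on the output of the first, so the two cannot be collapsed into one parallel batch. That ordering adds latency.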

dominant_pattern + relationships[]. The Planner writes two new fields. dominant_pattern is an editorial read: not "standup query" but "the xyz project connects all active threads" or "two contradictory unresolved decisions, new evidence this week." relationships[] is two to four high-signal connections worth knowing about.
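As a rough illustration of the schema addition, the Planner's output could be typed and checked like this. The field names intent_type, dominant_pattern, and relationships come from the post. The shape of a single relationship and the validation rules are assumptions, chosen to enforce the "two to four" constraint and to catch a pattern that merely restates the intent.

```typescript
// The Relationship shape is hypothetical; the post only says "high-signal connections".
interface Relationship { from: string; to: string; why: string }

interface PlannerOutput {
  intent_type: string;
  dominant_pattern: string;
  relationships: Relationship[];
}

// Returns a list of problems; an empty list means the output is usable.
function validatePlannerOutput(out: PlannerOutput): string[] {
  const errors: string[] = [];
  const pattern = out.dominant_pattern.trim().toLowerCase();
  if (pattern.length === 0) {
    errors.push("dominant_pattern is empty");
  } else if (pattern === `${out.intent_type} query`.toLowerCase()) {
    errors.push("dominant_pattern only restates the intent type");
  }
  const n = out.relationships.length;
  if (n < 2 || n > 4) {
    errors.push(`expected 2 to 4 relationships, got ${n}`);
  }
  return errors;
}
```

A check like this would sit between the two agents, so a lazy Planner read gets bounced before the Composer ever sees it.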

Three-step composition. The Composer's job changed from "look up the component for this intent_type" to "read the dominant_pattern, read the relationships, then choose." intent_type became a prior, not a rule. A standup where one entity dominates leads with EntityDossier. A standup with active contradictions leads with ContradictionPair. Same query type, different structures.
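"A prior, not a rule" can be sketched as a default that stronger evidence is allowed to displace. In the real system the Composer is a prompted model reading free text; this deterministic stand-in replaces that judgement with two hypothetical structured hints (dominantEntity and activeContradictions). The choice to let contradictions outrank a dominant entity is also an assumption, not something the post specifies.

```typescript
type Intent = "standup" | "entity" | "inbox";
type ComponentName = "ContextualStandup" | "EntityDossier" | "InboxTriage" | "ContradictionPair";

interface PlannerRead {
  intent_type: Intent;
  dominantEntity: string | null; // hypothetical structured hint
  activeContradictions: number;  // hypothetical structured hint
}

interface Composition {
  components: ComponentName[]; // ordered, lead component first
  overrodePrior: boolean;
  reason: string;
}

// The old rule survives, but only as the default choice.
const PRIOR: Record<Intent, ComponentName> = {
  standup: "ContextualStandup",
  entity: "EntityDossier",
  inbox: "InboxTriage",
};

function compose(read: PlannerRead): Composition {
  const prior = PRIOR[read.intent_type];
  let lead: ComponentName = prior;
  let reason = "no stronger pattern found, using the prior";
  if (read.activeContradictions > 0) {
    lead = "ContradictionPair";
    reason = `${read.activeContradictions} active contradiction(s)`;
  } else if (read.dominantEntity !== null) {
    lead = "EntityDossier";
    reason = `${read.dominantEntity} dominates the active threads`;
  }
  // The override is recorded, not hidden, so it can be reviewed later.
  const components: ComponentName[] = lead === prior ? [prior] : [lead, prior];
  return { components, overrodePrior: lead !== prior, reason };
}
```

Recording overrodePrior and reason alongside the result is what makes the override a first-class concept: every departure from the default is visible and auditable.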

Session 7: the artefact

I ran "Describe Open Brain to me", an entity query and the first under the new prompts.

The system produced three components in a split layout. EntityDossier on the left with summary, recent docs, relations. RelationGraph on the upper right with the network. Timeline on the lower right plotting the project's two-week evolution. Below the docs, two action buttons, CAPTURE THOUGHT and PROMOTE TO DECISION, write back into the brain itself.

No template specified that layout. No developer told the system entity queries get three components. The Composer read the entity shape, found docs plus relations plus a temporal arc, chose split, chose three components, ordered them editorially.
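For a sense of what the Composer's output for that session might look like as data, here is a hypothetical view spec. The layout, component names, and button labels are taken from the description above. The slot names, the tool identifiers behind each button, and the checkViewSpec helper are invented for illustration.

```typescript
type Slot = "left" | "upperRight" | "lowerRight";

interface PlacedComponent { slot: Slot; component: string }
interface WriteBackAction { label: string; tool: string } // tool names are hypothetical

interface LayoutSpec {
  layout: "single" | "split";
  components: PlacedComponent[];
  actions: WriteBackAction[];
}

const session7View: LayoutSpec = {
  layout: "split",
  components: [
    { slot: "left", component: "EntityDossier" },
    { slot: "upperRight", component: "RelationGraph" },
    { slot: "lowerRight", component: "Timeline" },
  ],
  actions: [
    { label: "CAPTURE THOUGHT", tool: "capture_thought" },
    { label: "PROMOTE TO DECISION", tool: "promote_to_decision" },
  ],
};

// A client can reject a spec it cannot place before drawing anything.
function checkViewSpec(spec: LayoutSpec): string[] {
  const problems: string[] = [];
  const seen = new Set<Slot>();
  for (const placed of spec.components) {
    if (seen.has(placed.slot)) problems.push(`slot ${placed.slot} used twice`);
    seen.add(placed.slot);
  }
  if (spec.layout === "single" && spec.components.length !== 1) {
    problems.push("single layout expects exactly one component");
  }
  return problems;
}
```

The agent chooses the arrangement, but the client still owns the vocabulary of slots and can refuse anything outside it.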

That was the session where the system finally did what I'd designed it to do.

What this is and isn't

What I built was not an A2UI implementation in the protocol sense. It was a small, opinionated generative UI system that converged on many of the same ideas independently.

A few weeks after session 7 I went looking to see if anyone else had named the pattern. That's when I found Google's A2UI (a2ui.org). It's a protocol with declarative component descriptions, no executable code, framework-agnostic renderers, secure by design. It defines surfaces, components, data binding, an adjacency list layout model, custom catalogs. In other words: agents describe what interface should appear, and the client decides how to render it safely. The Landscape Architect demo on their site shows the same pattern I was reaching toward.
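To make "adjacency list layout model" concrete: instead of nesting components inside each other, the agent sends a flat list in which each component names its children by id, and the client assembles the tree. The sketch below illustrates that general technique only. It is not the A2UI wire format, and the field names (id, type, children) and the buildTree function are assumptions.

```typescript
// NOT the A2UI schema: field names are invented to illustrate the idea.
interface FlatNode { id: string; type: string; children?: string[] }
interface TreeNode { id: string; type: string; children: TreeNode[] }

// Resolve a flat, id-referenced component list into a renderable tree.
function buildTree(nodes: FlatNode[], rootId: string): TreeNode {
  const byId = new Map(nodes.map((n) => [n.id, n] as const));
  const visiting = new Set<string>();

  function resolve(id: string): TreeNode {
    const node = byId.get(id);
    if (!node) throw new Error(`unknown component id: ${id}`);
    // Guard against an agent emitting a component that contains itself.
    if (visiting.has(id)) throw new Error(`cycle at component id: ${id}`);
    visiting.add(id);
    const children = (node.children ?? []).map((childId) => resolve(childId));
    visiting.delete(id);
    return { id: node.id, type: node.type, children };
  }

  return resolve(rootId);
}

const flat: FlatNode[] = [
  { id: "root", type: "Split", children: ["dossier", "graph"] },
  { id: "dossier", type: "EntityDossier" },
  { id: "graph", type: "RelationGraph" },
];
const tree = buildTree(flat, "root");
```

A flat list is easier for a model to stream and patch incrementally than a deeply nested structure, since a single component can be added or replaced by id without re-sending its ancestors.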

A2UI is also distinct from AG-UI (Agent-User Interaction), an open protocol from CopilotKit that provides a bi-directional event stream over HTTP for agent-to-frontend communication. The two are complementary; AG-UI even supports the A2UI spec.

If you're starting from scratch today, build on A2UI. Don't roll your own. What's still worth sharing from my version is the practitioner experience: six weeks of dogfooding the pattern in production-adjacent conditions, on real personal data, every session documented.

What the six weeks taught me

The semantic layer is the product. Most of session 5's frustration was that I'd built infrastructure to query the brain's surface, where recent docs, active issues, active signals live. But the Planner had no reason to reach for the graph. The graph is what makes the brain a brain. If the agent doesn't touch it, you have a feed reader.

A read-only interface ends the loop.

A read-write interface continues it.

The first six sessions produced beautiful read-only renders. Session 7 was the first with write-back. The difference isn't a feature, it's a category shift. Daily-habit interfaces only happen on the second kind.

Latency is the silent killer. My system takes ~60 seconds end-to-end. That's not a daily-habit interface. Background processing helps: you submit, switch apps, and the result is waiting. But the target has to be under ten seconds. That's an infrastructure problem, not a product problem, and it's the only thing standing between my version being a demo and being a tool.

The harder question

If agent-to-user interface is the future of digital interfaces, and Google formalising the protocol strongly suggests it is, then what changes for the people who design interfaces?

The answer is specific: designers stop designing screens and start designing component palettes and editorial rules.

You stop drawing the dashboard. You define the eight or twelve typed components the agent is allowed to choose from. You write the rules for when each one is appropriate. You shape the editorial voice. You decide what the agent is allowed to override. You become the design system author of a system that designs its own views.

That's the real shift A2UI points toward. Not AI replacing interface design, but interface design moving up a level of abstraction.

Personal knowledge bases happen to be a useful pressure test because the information shape changes constantly. But the same pattern applies anywhere agents interact with complex, shifting context.

Happy to compare notes with anyone working in this space, particularly anyone implementing A2UI for real.

Richard Simms — Principal Product Designer working on agentic UX. More on the craft: rsimms.com · The system itself: sentiuma.com

FAQ

What is A2UI?

A2UI is an open protocol from Google for generative user interfaces. Agents send declarative component descriptions, and the client decides how to render them safely.

How is A2UI different from AG-UI?

A2UI describes what the interface should be. AG-UI describes how the agent and frontend communicate. A2UI is a generative UI protocol; AG-UI is a bi-directional event stream for agent-to-frontend traffic. They're complementary, and AG-UI even supports the A2UI spec.

What components did you build for the generative UI playground?

Eight typed components: EntityDossier, RelationGraph, ContextualStandup, ContradictionPair, Timeline, SignalQueue, InboxTriage, and DecisionCard. The Composer agent picks from these based on the shape of the data, not the shape of the query.

Why can't you just use chat for personal knowledge queries?

Chat collapses every response into linear text, regardless of what the underlying data looks like. A standup has columns. An entity has a network. An inbox has actions. Forcing all three into prose loses the structure that makes the answer useful.

What's the biggest lesson from six weeks of dogfooding this?

Stability and variance are in direct tension, and the override path matters more than the rules. Encode the rules tightly and the agent stops being generative. Loosen them and the output drifts. The fix isn't to find the right balance, it's to make the rule a prior and the override a first-class concept.


© 2026 Richard Simms. All rights reserved.