Anthropic developed a multi-agent system that lets Claude build complete full-stack applications over multi-hour sessions without human intervention. The architecture separates three roles: a planner converts simple prompts into detailed specifications; a generator implements features in sprints using React, Vite, FastAPI, and SQLite; and an evaluator tests the running application with Playwright and gives feedback against defined criteria. The critical insight came from frontend design work: agents that evaluate their own output consistently overrate mediocre work, whereas a separate evaluator agent tuned for skepticism delivers actionable feedback. The system iterates in a loop in which the generator either refines its work or pivots entirely based on the evaluator's feedback, sometimes producing unexpected creative solutions, such as reimagining a museum website as a navigable 3D space in CSS.
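
The planner, generator, and evaluator loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Anthropic's implementation: the names `run_pipeline`, `Feedback`, `RunResult`, and the three stub functions are hypothetical, and plain functions stand in for the Claude agents and the Playwright-driven tests. The stub generator only refines its previous build; a full pivot is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Feedback:
    """Evaluator verdict: pass/fail plus notes the generator can act on."""
    passed: bool
    notes: List[str] = field(default_factory=list)


@dataclass
class RunResult:
    spec: str
    build: str
    rounds: int
    passed: bool
    history: List[Feedback] = field(default_factory=list)


def run_pipeline(
    prompt: str,
    planner: Callable[[str], str],
    generator: Callable[[str, str, List[str]], str],
    evaluator: Callable[[str, str], Feedback],
    max_rounds: int = 5,
) -> RunResult:
    """Plan once, then alternate generate/evaluate until the evaluator passes."""
    spec = planner(prompt)          # planner: short prompt -> detailed spec
    build = ""                      # nothing built yet
    notes: List[str] = []           # evaluator feedback carried into next round
    history: List[Feedback] = []
    for round_no in range(1, max_rounds + 1):
        build = generator(spec, build, notes)
        feedback = evaluator(spec, build)   # a separate role judges the output
        history.append(feedback)
        if feedback.passed:
            return RunResult(spec, build, round_no, True, history)
        notes = feedback.notes
    return RunResult(spec, build, max_rounds, False, history)


# --- Hypothetical stubs standing in for the real agents -------------------

REQUIRED = ("dashboard", "login")   # assumed acceptance criteria


def stub_planner(prompt: str) -> str:
    return f"SPEC for '{prompt}': must include " + ", ".join(REQUIRED)


def stub_generator(spec: str, previous_build: str, notes: List[str]) -> str:
    # First sprint ships only "login"; later sprints add whatever was flagged.
    built = set(previous_build.split(",")) if previous_build else {"login"}
    built.update(notes)
    return ",".join(sorted(built))


def stub_evaluator(spec: str, build: str) -> Feedback:
    # Skeptical by construction: fails the build if any criterion is missing.
    built = set(build.split(",")) if build else set()
    missing = [f for f in REQUIRED if f not in built]
    return Feedback(passed=not missing, notes=missing)


if __name__ == "__main__":
    result = run_pipeline("todo app", stub_planner, stub_generator, stub_evaluator)
    print(result.rounds, result.passed, result.build)
```

Keeping the evaluator as a separate callable, rather than letting the generator score itself, mirrors the separation of roles the text identifies as the key design choice. The `max_rounds` cap is an assumption added so the sketch always terminates.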