Story of a Lifetime

Behind the scenes — how it works and how it was built

Contents

  1. How this was built
  2. Agent pipeline
  3. Key design decisions
  4. The five agents
  5. World types
  6. Google Cloud deployment
  7. Git workflow
  8. Testing
  9. API endpoints
  10. Cost model
  11. Technology stack

How this was built

This application was built through an extended collaboration with Claude — first via Claude.ai on desktop, then through Claude Code CLI as the tooling matured. The project began with a question about whether the story generation idea was even a good fit for an agent demo, and grew into a live multi-agent system deployed on Google Cloud Run.

The honest split of roles:

Rajiv — product thinking, architecture decisions, UX design, testing, and judgment

Every meaningful decision in this project came from the human side. Whether this was a good agent use case. The three-agent separation of concerns. Using a dedicated API key per project. Keeping prompts outside code. The human-in-the-loop checkpoint. The world type concept and its four definitions. The original vs revised story framing. Pushing back when something felt wrong. Asking why. Knowing when to stop adding features. These are not small things — they are the project. The code is just the expression of those decisions.

The user experience was shaped entirely from the human side too — the three-column layout, hiding the facts textarea until the agent populates it, moving world type from an informational panel into an active input, removing the creativity slider when it became redundant with world type, the progressive reveal of agent controls after approval, collapsing setup cards once facts are generated to keep the interface clean, the dark About card as an elevator pitch, the punchy tagline. Every one of these was observed, questioned, and decided during the conversation — not generated.

Testing was also entirely human-driven. Every agent interaction was manually exercised across world types, contradiction levels, and story lengths. Edge cases — the Critic rejecting all three attempts, empty fact lists, very long generations — were caught and resolved through hands-on use. A unit test suite (40 tests, 7 classes) was added as the codebase matured, covering configuration, request models, result classes, prompt templates, and the zero-fact short-circuit. The decision to add tests, and the decision to refactor the flat package structure into layered sub-packages, both came from the human side — as did the direction to extract shared agent infrastructure into a base class once the duplication became visible.

Claude — code generation, implementation, and technical boilerplate

Claude wrote all the Java, HTML, CSS, and JavaScript. It set up the Spring Boot structure, wired the agents together, handled the OkHttp calls, built the Mermaid diagram, generated the Dockerfile and deployment scripts, and kept the growing codebase consistent across dozens of iterations. It also pushed back where it had a view — on prompt design, agent temperature choices, and what "genuinely agentic" means versus a dressed-up prompt chain.

Is this "vibe coding"?

Partially — but not entirely. Vibe coding in its purest form is "describe what you want, accept what you get, don't worry about understanding it." That's not what happened here. The architectural decisions were deliberate and reasoned. The human understood what each agent was doing and why. When something broke, the cause was diagnosed — not just retried. The better description is AI-assisted development with genuine human judgment driving the architecture. The speed at which a working multi-agent system went from idea to production is very much the vibe coding energy. The depth of the decisions behind it is not.

What this demonstrates about building with AI

The parts Claude could not contribute: knowing what problem to solve, deciding what made a demo compelling, choosing when the Critic loop was the right thing to show an audience, recognising that "append-only facts" needed rethinking when the Fact Generator arrived. The parts a human could not contribute at this speed: the Spring Boot wiring, the OkHttp timeout configuration, the Mermaid diagram syntax, the CSS grid layout, all of it produced in seconds. The collaboration worked because both sides stayed in their lane.

Agent pipeline

Five agents working in sequence, with a human checkpoint after fact generation and a Critic feedback loop before writing:

graph TD
  A([Beginning + Ending]) --> B[Fact Generator]
  B --> C([You review & approve facts])
  C --> D[Planner]
  D --> E[Critic]
  E -->|rejected| D
  E -->|approved| F[Writer]
  F --> G([Original story])
  G -.->|add more facts| D2[Planner]
  D2 --> E2[Critic]
  E2 -->|approved| F2[Writer]
  F2 --> H([Revised story])
  H -.-> I[Explainer]
  style A fill:#fff,stroke:#ddd,color:#1a1a1a
  style B fill:#FAEEDA,stroke:#854F0B,color:#633806
  style C fill:#eaf3de,stroke:#3b6d11,color:#27500A
  style D fill:#EEEDFE,stroke:#534AB7,color:#3C3489
  style E fill:#FAECE7,stroke:#993C1D,color:#712B13
  style F fill:#E1F5EE,stroke:#0F6E56,color:#085041
  style G fill:#f1efe8,stroke:#888780,color:#444441
  style D2 fill:#EEEDFE,stroke:#534AB7,color:#3C3489
  style E2 fill:#FAECE7,stroke:#993C1D,color:#712B13
  style F2 fill:#E1F5EE,stroke:#0F6E56,color:#085041
  style H fill:#EEEDFE,stroke:#534AB7,color:#3C3489
  style I fill:#E6F1FB,stroke:#185FA5,color:#0C447C

Human-in-the-loop: The pipeline pauses after the Fact Generator. The human can edit, reorder, add, or regenerate facts freely. Only after explicit approval does Planner → Critic → Writer proceed. This is a deliberate architectural choice, not a UX afterthought.

Key design decisions

Human-in-the-loop is a first-class agent concept

The Fact Generator produces facts, but the human approves them before the pipeline proceeds. This is not just a UX choice — it demonstrates one of the most important agent patterns: knowing when to pause for human judgment rather than running fully autonomously.

World type flows through both Fact Generator and Writer

A single control shapes both what facts are invented and how the story is told. This keeps the two agents coherent — an outlandish fact set paired with plain literary prose would feel wrong, and vice versa.

Facts are append-only after approval

Once approved, facts become the baseline and cannot be removed — only added to. New facts must be reconciled with all previous ones. This mirrors how real environments work and is what makes the original vs revised comparison meaningful.
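As a sketch, the append-only rule amounts to a flag flipped at approval time. Class and method names here are illustrative, not the app's real API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch of the append-only rule (not the app's real classes):
// before approval the human can edit freely; after approval, removal is blocked.
class FactLedger {
    private final List<String> facts = new ArrayList<>();
    private boolean approved = false;

    void add(String fact) { facts.add(fact); }   // always allowed

    void remove(int index) {
        if (approved)
            throw new IllegalStateException("facts are append-only after approval");
        facts.remove(index);
    }

    void approve() { approved = true; }          // baseline frozen here

    List<String> all() { return Collections.unmodifiableList(facts); }
}
```

Before approve(), the human review step can add, remove, and reorder; after it, only add() succeeds, which is what makes the original vs revised comparison well-defined.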

The Critic loop is what makes it agentic

A pipeline executes in sequence. An agent evaluates and decides its next step. The Critic's reject-and-replan cycle — with the rejection reason fed back into the Planner — is what crosses that line.

Agents have single responsibilities

Fact Generator never plans. Planner never writes prose. Writer never judges feasibility. Critic never generates content. Clean separation makes each agent inspectable, testable, and independently tunable.

Config and prompts live outside code

Model, temperature, and all prompt wording are in files — not hardcoded. Experiment freely without recompiling.
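A minimal sketch of what file-based prompts enable. The {PLACEHOLDER} syntax is an assumption for illustration; only the .txt filenames appear on this page:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Sketch of file-based prompt templating. The {PLACEHOLDER} syntax and
// placeholder names are assumptions, not the project's actual templates.
class PromptTemplates {
    static String render(String template, Map<String, String> values) {
        String out = template;
        for (var e : values.entrySet()) {
            out = out.replace("{" + e.getKey() + "}", e.getValue());
        }
        return out;
    }

    static String load(Path promptFile, Map<String, String> values) throws IOException {
        // edit the file, rerun the request: no recompile needed
        return render(Files.readString(promptFile), values);
    }
}
```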

Creativity is derived, not user-controlled

Rather than exposing a separate creativity slider, creativity is derived automatically from the world type the user already picked. Fewer controls, more coherent output — grounded → low creativity, outlandish → high creativity.
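The mapping itself is tiny. Only the endpoints are pinned down on this page (grounded to low, outlandish to high); the middle two mappings are assumptions:

```java
// Derived, not user-controlled. Only GROUNDED -> LOW and OUTLANDISH -> HIGH
// are stated on this page; the middle two mappings are assumptions.
enum WorldType { GROUNDED, REALISTIC, FANTASTICAL, OUTLANDISH }
enum Creativity { LOW, MEDIUM, HIGH }

class CreativityRule {
    static Creativity forWorldType(WorldType w) {
        return switch (w) {
            case GROUNDED -> Creativity.LOW;
            case REALISTIC, FANTASTICAL -> Creativity.MEDIUM;  // assumed
            case OUTLANDISH -> Creativity.HIGH;
        };
    }
}
```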

Layered packages over a flat structure

The codebase started with all classes in a single flat package — acceptable early on, but increasingly hard to navigate as the agent count grew. Rajiv directed a structural refactoring into five sub-packages: config, model, agent, result, and web. At the same time, duplicated infrastructure across all five agents (OkHttp client setup, prompt loading, Claude API call, cost calculation) was extracted into a shared BaseAgent class, and the four common fields on every result class were pulled into a shared AgentResult base. Zero behaviour change — purely structural, with all 40 unit tests passing before and after.
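A rough shape of that extraction: a template-method base owning the shared call path, with each agent supplying only its prompt. Method names and the stubbed model call are illustrative:

```java
// Rough shape of the BaseAgent extraction. Names and the stubbed callModel
// are illustrative; the real class wraps an OkHttp call to Claude.
abstract class BaseAgent {
    protected final String model;
    protected final double temperature;

    protected BaseAgent(String model, double temperature) {
        this.model = model;
        this.temperature = temperature;
    }

    // shared infrastructure: every agent runs through the same call path
    public final String run(String input) {
        return callModel(buildPrompt(input));
    }

    // each agent supplies only its own prompt
    protected abstract String buildPrompt(String input);

    protected String callModel(String prompt) {
        return "[" + model + " @ " + temperature + "] " + prompt; // stub
    }
}

class PlannerAgent extends BaseAgent {
    PlannerAgent() { super("claude-sonnet-4-6", 0.3); }
    @Override protected String buildPrompt(String input) {
        return "Outline this life: " + input;
    }
}
```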

The five agents

Fact Generator

Invents life facts shaped by contradiction level, world type, and a requested fact count. Creativity is derived automatically from world type. Output is reviewed and optionally edited by a human before the pipeline proceeds.

claude-sonnet-4-6 · temp 0.9 · prompt: factgenerator_prompt.txt

Planner

Reads the world model (start, end, facts) and produces a structured 5–7 milestone outline. On retry attempts, the Critic's rejection reason is injected so the Planner knows exactly what to fix.

claude-sonnet-4-6 · temp 0.3 · prompt: planner_prompt.txt

Critic

Evaluates the Planner's outline for timeline plausibility, fact consistency, and logical gaps. Returns APPROVED or REJECTED with a specific reason. Max 3 attempts before falling back to the last outline.

claude-sonnet-4-6 · temp 0.1 · prompt: critic_prompt.txt

Writer

Takes the approved outline and generates full prose. Writing style is shaped by world type — plain literary fiction for grounded, epic narrative for outlandish. Used for both the original and revised story.

claude-opus-4-6 · temp 0.8 · prompt: writer_prompt.txt

Explainer

Runs automatically after the revised story is generated. Compares both stories side by side, identifies what changed, and explains exactly which new facts caused each divergence. Cost and time tracked independently from the main pipeline.

claude-sonnet-4-6 · temp 0.3 · prompt: explainer_prompt.txt
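Collected in one place, the per-agent settings above could live in a properties file along these lines. The key names are hypothetical; only the values come from this page:

```properties
# Hypothetical key names; models, temperatures, and prompt files from this page.
agent.factgenerator.model=claude-sonnet-4-6
agent.factgenerator.temperature=0.9
agent.factgenerator.prompt=factgenerator_prompt.txt
agent.planner.model=claude-sonnet-4-6
agent.planner.temperature=0.3
agent.planner.prompt=planner_prompt.txt
agent.critic.model=claude-sonnet-4-6
agent.critic.temperature=0.1
agent.critic.prompt=critic_prompt.txt
agent.writer.model=claude-opus-4-6
agent.writer.temperature=0.8
agent.writer.prompt=writer_prompt.txt
agent.explainer.model=claude-sonnet-4-6
agent.explainer.temperature=0.3
agent.explainer.prompt=explainer_prompt.txt
```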

World types

A single control that flows through both the Fact Generator and the Writer, keeping the invented facts and the prose style coherent with each other:

Grounded

Real world, real rules. No heightened elements.

Schindler's List, The Pursuit of Happyness

Realistic

Heightened but believable. Extraordinary within reality.

Forrest Gump, Slumdog Millionaire

Fantastical

One impossible element in an otherwise real world.

Harry Potter, Inception

Outlandish

A completely different universe with its own rules.

Game of Thrones, Star Trek, Dune

Creativity level (low / medium / high) is derived automatically from the world type — it is passed to the Fact Generator but is not a user-facing control.

Google Cloud deployment

The application runs on Google Cloud Run — containerised, serverless, and auto-scaling to zero when idle. A single deploy script handles everything from build to live.

Component | Details
Platform | Google Cloud Run — us-central1 region
Container | Docker, eclipse-temurin:17-jre-focal base image
Registry | Google Container Registry (gcr.io)
API key | Google Secret Manager — secret named ANTHROPIC_API_KEY, injected as STORY_OF_LIFETIME_ANTHROPIC_API_KEY
Deploy | ./deploy.sh — one-command build and deploy
Read timeout | 120 seconds — accommodates Claude's longest generation times

Cloud Run scales to zero when idle and spins up within seconds on first request. The 120-second OkHttp read timeout is deliberate — Claude can take 60–90 seconds for long story generations and Cloud Run must not recycle the container mid-request.
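The app configures these timeouts on OkHttp; as a stdlib sketch of the same idea using java.net.http, the connect timeout sits on the client and the long read deadline on each request:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

// Stdlib sketch of the timeout setup (the app itself uses OkHttp 4.12):
// 30s to establish a connection, 120s to wait out a long generation.
class ClientConfig {
    static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(30))
            .build();

    static HttpRequest request(URI uri, String jsonBody) {
        return HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(120))   // per-request deadline
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```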

One-time IAM setup grants the default compute service account access to the Secret Manager secret. After that, every ./deploy.sh run rebuilds the image, pushes it, and deploys the new revision in a single command.

Git workflow

Every change follows a feature branch → PR → review → merge cycle. No commits go directly to main.

Step | Who | Action
1. Branch | Claude | Creates a feature/<name> branch for the task
2. Implement | Claude | Makes all changes on the branch, commits with co-authorship line
3. PR | Claude | Opens a pull request via gh pr create with summary and test plan
4. Review | Rajiv | Checks out the branch, runs the app or tests, and reviews the diff
5. Merge | Claude | Merges once approved via gh pr merge
6. Cleanup | Claude | Checks out main, pulls, deletes the local branch

GitHub is configured to auto-delete the remote branch on merge. Commits include a Co-Authored-By: Claude Sonnet 4.6 trailer to attribute AI contribution in the git log. The gh CLI is authenticated and used for all GitHub operations — PR creation, merge, and status checks.

Testing

The project has two complementary layers of testing — automated unit tests covering the Java layer, and manual end-to-end testing through the live UI.

Unit tests — 40 tests across 7 classes

JUnit 5 via spring-boot-starter-test. All tests run without an API key — they test only local logic, not live Claude calls. The factCount=0 short-circuit in FactGeneratorAgent makes it possible to exercise the agent class itself without any network call.

Coverage: AppConfig defaults and story length parsing · WorldModel constructor and getters · CriticResult APPROVED/REJECTED decision logic · FactGenerateRequest field defaults and setters · all five agent result classes · every prompt template file (exists + all required placeholders present) · FactGeneratorAgent zero-count short-circuit.

Manual end-to-end testing

Every agent interaction is exercised against the live Claude API across world types, contradiction levels, and story lengths. Edge cases — the Critic rejecting all three attempts, empty fact lists, very long generations, the 0-fact path — are caught through hands-on use. The feedback loop between running the app and directing the next change is where most of the product quality came from.

Run unit tests locally with mvn test. No API key required.

API endpoints

Method | Path | What it does
POST | /api/generate-facts | Runs FactGeneratorAgent. Returns fact list for human review.
POST | /api/generate | Runs Planner → Critic loop → Writer. Returns outline, story, critic decisions, per-agent cost.
POST | /api/explain | Runs Explainer on original vs revised. Returns diff analysis and cost.
GET | / | Single-page application (index.html)
GET | /architecture.html | This document

Cost model

Token usage and cost are tracked independently per agent on every request. Pricing as of 2025:

Model | Input per 1M tokens | Output per 1M tokens
claude-opus-4-6 | $15.00 | $75.00
claude-sonnet-4-6 | $3.00 | $15.00
claude-haiku-4-5-20251001 | $0.25 | $1.25

A typical medium-length generation (FactGen + Planner + Critic + Writer) costs roughly $0.05–$0.10 with default settings. Running the revised story + Explainer adds approximately $0.01–$0.03.
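The per-agent arithmetic is straightforward. With the pricing from the table above and hypothetical token counts:

```java
// Per-agent cost arithmetic using the pricing table above.
// Token counts in the usage example are hypothetical.
class TokenCost {
    static double usd(long inputTokens, long outputTokens,
                      double inputPer1M, double outputPer1M) {
        return inputTokens / 1_000_000.0 * inputPer1M
             + outputTokens / 1_000_000.0 * outputPer1M;
    }
}
```

For example, a Sonnet call with 2,000 input and 1,000 output tokens costs 2,000/1M × $3.00 + 1,000/1M × $15.00 = $0.021.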

Technology stack

Layer | Technology
Backend | Java 17, Spring Boot 3.2, Maven
Frontend | Single-page HTML/CSS/JS — no framework, no build step
LLM API | Anthropic Claude — configurable model per agent
HTTP client | OkHttp 4.12 (30s connect/write, 120s read timeout)
JSON | Jackson Databind
Hosting | Google Cloud Run — containerised, serverless, auto-scaling
Secrets | Google Secret Manager — API key injected as env variable
Container | Docker, eclipse-temurin:17-jre-focal base image