# Architecture Decision Records

Key architectural decisions and the reasoning behind them.
## ADR Index
| # | Decision | Status |
|---|---|---|
| 1 | Microservices with household-scoped isolation | Accepted |
| 2 | Vertex AI + LangChain for AI agent | Accepted |
| 3 | Hyperbeam for cloud browser sessions | Accepted |
| 4 | Dual persistence backend (Firestore + in-memory) | Accepted |
| 5 | Voice architecture: Model B (speech-to-text → orchestrator → TTS) | Accepted |
| 6 | API Gateway as single ingress | Accepted |
## ADR-1: Microservices with Household-Scoped Isolation

**Context:** AnyaSelf manages family data (wardrobe, finances, personal style). We needed to decide between a monolith and microservices.

**Decision:** Decompose into 8 services, each scoped to `householdId` as the primary data partition key.

**Rationale:**
- Each service has distinct scaling characteristics (VTO is GPU-bound, CartPrep needs headless browsers, Commerce is read-heavy)
- Household scoping provides a natural isolation boundary — no data leaks between families
- Independent deployment: a bug in VTO inference doesn't bring down the purchase flow
- Each service can be developed and tested independently
**Consequences:**
- More operational complexity (8 Docker images, service discovery)
- Inter-service communication adds latency vs. in-process calls
- Distributed transactions not supported — eventual consistency via audit trail
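To illustrate the household-scoped partitioning, here is a minimal sketch; the helper name and key layout are hypothetical, not the project's actual schema:

```typescript
// Hypothetical sketch: namespace every document key under householdId so a
// lookup can never cross household boundaries by construction. The key layout
// mirrors a Firestore-style path but is purely illustrative.
type HouseholdId = string;

function scopedKey(householdId: HouseholdId, collection: string, docId: string): string {
  if (!householdId) {
    throw new Error("householdId is required: no unscoped access paths exist");
  }
  return `households/${householdId}/${collection}/${docId}`;
}

// Example: a wardrobe item belonging to household "h-123"
const key = scopedKey("h-123", "wardrobe", "item-42");
// key === "households/h-123/wardrobe/item-42"
```

Because every read and write goes through a scoped key, "no data leaks between families" becomes a structural property rather than a per-query discipline.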
## ADR-2: Vertex AI + LangChain for AI Agent

**Context:** The Orchestrator needs an LLM with tool-calling capabilities to coordinate tasks across services.

**Decision:** Use Google Cloud Vertex AI with the `LangchainAgent` reasoning engine from the Vertex AI SDK.

**Rationale:**
- Vertex AI provides managed, scalable LLM hosting on Google Cloud (our primary cloud)
- LangChain's `Tool` abstraction maps cleanly to our service API pattern
- The reasoning engine handles turn-by-turn tool selection, execution, and response synthesis
- Model-agnostic: can swap between Gemini Pro, Flash, and other models via config
**Consequences:**
- Tight coupling to the GCP ecosystem (acceptable since we're already on GCP)
- `ORCHESTRATOR_REQUIRE_VERTEX_AGENT=false` allows full-stack local dev without GCP credentials (stub mode)
- Agent tool implementations live in the Orchestrator → each tool is an HTTP call to a downstream service
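The tool-per-service pattern can be sketched as follows; the `AgentTool` shape, `makeServiceTool` helper, and URLs are illustrative assumptions, not the actual Orchestrator code or LangChain's API:

```typescript
// Hypothetical sketch: each agent tool is a thin wrapper around an HTTP call
// to one downstream service. The reasoning engine selects a tool by name and
// description; the Orchestrator executes the call and returns the result.
interface AgentTool {
  name: string;
  description: string;
  call(args: Record<string, unknown>): Promise<unknown>;
}

function makeServiceTool(
  name: string,
  description: string,
  endpoint: string, // e.g. "http://commerce:8080/search" (illustrative URL)
): AgentTool {
  return {
    name,
    description,
    async call(args) {
      const res = await fetch(endpoint, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify(args),
      });
      if (!res.ok) {
        throw new Error(`tool ${name} failed with HTTP ${res.status}`);
      }
      return res.json();
    },
  };
}

// Hypothetical registration of one tool against the Commerce service
const searchProducts = makeServiceTool(
  "search_products",
  "Search the commerce catalog for products matching a query",
  "http://commerce:8080/search",
);
```

In stub mode the same tool registry could be backed by canned responses instead of live HTTP calls, which is what makes credential-free local development possible.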
## ADR-3: Hyperbeam for Cloud Browser Sessions

**Context:** The agent needs to browse external websites (brand stores) to find products, add items to cart, and visually verify pages. Running Chromium inside the backend services creates security and scaling issues.

**Decision:** Use Hyperbeam's cloud browser API for ephemeral, embeddable Chromium instances.

**Rationale:**
- No local Chromium = no sandbox escape risk on backend servers
- Browser sessions are embeddable in the frontend via iframe
- Both the agent and user can interact with the same browser (takeover pattern)
- Session recordings are built-in for audit purposes
- Per-session billing vs. always-on headless browser pools
**Consequences:**
- External dependency on Hyperbeam availability
- Chrome Extension required for DOM indexing (see the `hyperbeam-bridge` docs)
- Adds latency vs. local headless browser (offset by not needing GPU-capable browser servers)
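A sketch of provisioning an ephemeral session; the endpoint path, header, and payload below are assumptions about Hyperbeam's REST API, so verify them against the official docs before relying on them:

```typescript
// Hypothetical sketch: build the request used to provision a cloud browser
// session. The URL and body shape are assumptions, not verified API docs.
interface SessionRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildSessionRequest(apiKey: string): SessionRequest {
  return {
    url: "https://engine.hyperbeam.com/v0/vm", // assumed endpoint
    init: {
      method: "POST",
      headers: {
        authorization: `Bearer ${apiKey}`,
        "content-type": "application/json",
      },
      body: JSON.stringify({}), // session options would go here
    },
  };
}

// The caller would pass this to fetch(), then hand the returned embed URL to
// the frontend iframe so user and agent share the same browser session.
const req = buildSessionRequest("hb_test_key");
```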
## ADR-4: Dual Persistence Backend

**Context:** Development velocity requires running the full stack locally without cloud credentials. Production requires durable storage.

**Decision:** Every service implements the `Repository` interface with two backends: `firestore` (production) and `inmemory` (development).

**Rationale:**
- Developers can `docker compose up` and have a working system in minutes
- No Firestore emulator setup required
- The Repository pattern keeps business logic free of persistence concerns
- Backend selection via a single env var: `PERSISTENCE_BACKEND=inmemory|firestore`
**Consequences:**
- Feature parity must be maintained across both backends
- In-memory backend loses data on restart (by design for dev)
- Repository interface may not expose all Firestore-specific features (e.g., transactions)
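A minimal sketch of the dual-backend pattern, assuming an illustrative `Repository` shape (the real interface and the Firestore implementation are not shown here):

```typescript
// Hypothetical sketch: one Repository interface, two backends, selected by
// PERSISTENCE_BACKEND. Only the in-memory backend is implemented below.
interface Repository<T> {
  get(householdId: string, id: string): Promise<T | undefined>;
  put(householdId: string, id: string, value: T): Promise<void>;
}

class InMemoryRepository<T> implements Repository<T> {
  private store = new Map<string, T>();

  async get(householdId: string, id: string): Promise<T | undefined> {
    return this.store.get(`${householdId}/${id}`);
  }

  async put(householdId: string, id: string, value: T): Promise<void> {
    this.store.set(`${householdId}/${id}`, value);
  }
}

function makeRepository<T>(
  backend: string = process.env.PERSISTENCE_BACKEND ?? "inmemory",
): Repository<T> {
  switch (backend) {
    case "inmemory":
      return new InMemoryRepository<T>();
    case "firestore":
      throw new Error("firestore backend omitted from this sketch");
    default:
      throw new Error(`unknown PERSISTENCE_BACKEND: ${backend}`);
  }
}
```

Because both backends sit behind the same interface, the same service tests can run against either one, which is how feature parity gets checked in practice.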
## ADR-5: Voice Architecture (Model B)

**Context:** AnyaSelf has a voice AI assistant (Aura). We needed to choose how speech interacts with the backend.

**Decision:** Model B: client speech → Gemini Live STT → text to the Orchestrator `/chat` endpoint → response text → Gemini Live TTS.

**Rationale:**
- Reuses the existing Orchestrator mission loop (all tools, context, and policies apply to voice)
- Voice and text share the same mission state — no parallel conversation tracks
- Barge-in (user interrupts agent mid-speech) cleanly maps to "cancel TTS, send new chat turn"
- Modality-agnostic: the Orchestrator doesn't know whether input came from voice or keyboard
**Consequences:**
- Added round-trip latency (STT → HTTP → TTS) vs. real-time end-to-end voice
- Client must manage TTS playback state and barge-in timing
- The voice tools registry in `voice-tools.ts` maps tool calls to visual actions in the frontend
## ADR-6: API Gateway as Single Ingress

**Context:** Client applications need to talk to 8 different services. Direct client-to-service communication is impractical for auth, CORS, and API versioning.

**Decision:** All requests flow through the API Gateway. The gateway handles JWT verification, request routing, and cross-cutting concerns.

**Rationale:**
- A single CORS origin for all client traffic
- Centralized auth: JWT verification happens once, principal is forwarded to downstream services
- Purchase flow hardening (intent tokens, confirmation TTL) lives in one place
- Voice WebSocket proxy centralizes GCP OAuth2 credential management
**Consequences:**
- Gateway is a single point of failure (mitigated by Cloud Run auto-scaling and health checks)
- The gateway routes file is the largest file in the codebase (867 lines)
- Adds one network hop to every request
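The centralized-auth pattern can be sketched as follows; the header name and `Principal` shape are hypothetical, and a real gateway would verify the JWT signature rather than merely decode it:

```typescript
// Hypothetical sketch: the gateway verifies the JWT once, then forwards the
// extracted principal to downstream services in a trusted internal header.
interface Principal {
  userId: string;
  householdId: string;
}

// Downstream services never see the raw JWT; they trust this header because
// only the gateway (the single ingress) can reach them and set it.
function forwardHeaders(
  principal: Principal,
  upstream: Record<string, string>,
): Record<string, string> {
  const headers = { ...upstream };
  delete headers["authorization"]; // strip the client token at the boundary
  headers["x-principal"] = JSON.stringify(principal); // hypothetical header name
  return headers;
}
```

This is also where per-request purchase-flow checks (intent tokens, confirmation TTL) can hook in, since every request already passes through this one code path.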