
Architecture Decision Records

Key architectural decisions and the reasoning behind them.

ADR Index

  #  Decision                                                           Status
  1  Microservices with household-scoped isolation                      Accepted
  2  Vertex AI + LangChain for AI agent                                 Accepted
  3  Hyperbeam for cloud browser sessions                               Accepted
  4  Dual persistence backend (Firestore + in-memory)                   Accepted
  5  Voice architecture: Model B (speech-to-text → orchestrator → TTS)  Accepted
  6  API Gateway as single ingress                                      Accepted

ADR-1: Microservices with Household-Scoped Isolation

Context: AnyaSelf manages family data (wardrobe, finances, personal style). We needed to decide between a monolith and microservices.

Decision: Decompose the system into eight services, each using householdId as the primary data partition key.

Rationale:

  • Each service has distinct scaling characteristics (VTO is GPU-bound, CartPrep needs headless browsers, Commerce is read-heavy)
  • Household scoping provides a natural isolation boundary — no data leaks between families
  • Independent deployment: a bug in VTO inference doesn't bring down the purchase flow
  • Each service can be developed and tested independently

Consequences:

  • More operational complexity (8 Docker images, service discovery)
  • Inter-service communication adds latency vs. in-process calls
  • Distributed transactions not supported — eventual consistency via audit trail
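The household-scoping boundary can be sketched as a repository whose read API makes householdId mandatory. The names below (HouseholdScopedRepo, WardrobeItem) are illustrative, not the real service code:

```typescript
// Sketch: every read path requires a householdId, so cross-family data
// leaks are impossible by construction.
interface WardrobeItem {
  id: string;
  householdId: string;
  name: string;
}

class HouseholdScopedRepo {
  private items: WardrobeItem[] = [];

  add(item: WardrobeItem): void {
    this.items.push(item);
  }

  // Deliberately no unscoped "list all" method: householdId is mandatory.
  listForHousehold(householdId: string): WardrobeItem[] {
    return this.items.filter((item) => item.householdId === householdId);
  }
}
```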

ADR-2: Vertex AI + LangChain for AI Agent

Context: The Orchestrator needs an LLM with tool-calling capabilities to coordinate tasks across services.

Decision: Use Google Cloud Vertex AI with the LangchainAgent reasoning engine from the Vertex AI SDK.

Rationale:

  • Vertex AI provides managed, scalable LLM hosting on Google Cloud (our primary cloud)
  • LangChain's Tool abstraction maps cleanly to our service API pattern
  • The reasoning engine handles turn-by-turn tool selection, execution, and response synthesis
  • Model-agnostic: can swap between Gemini Pro, Flash, and other models via config

Consequences:

  • Tight coupling to GCP ecosystem (acceptable since we're already on GCP)
  • ORCHESTRATOR_REQUIRE_VERTEX_AGENT=false allows full-stack local dev without GCP credentials (stub mode)
  • Agent tool implementation lives in the Orchestrator → each tool is an HTTP call to a downstream service
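The "each tool is an HTTP call" pattern can be sketched as below. makeHttpTool, the injected transport, and the tool shape are assumptions for illustration, not the real Orchestrator registry or LangChain's actual Tool class:

```typescript
// Sketch: an agent tool whose execute() is just an HTTP call to a downstream
// service. The transport is injected so the shape can be shown (and tested)
// without a live service; in production it would be a real fetch().
type Transport = (url: string, body: unknown) => Promise<string>;

interface AgentTool {
  name: string;        // the LLM selects tools by name...
  description: string; // ...and description
  execute: (args: Record<string, string>) => Promise<string>;
}

function makeHttpTool(
  name: string,
  description: string,
  url: string,
  transport: Transport
): AgentTool {
  return {
    name,
    description,
    execute: (args) => transport(url, args),
  };
}
```

The reasoning engine picks a tool per turn, runs execute(), and feeds the response back into the loop until it can synthesize a final answer.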

ADR-3: Hyperbeam for Cloud Browser Sessions

Context: The agent needs to browse external websites (brand stores) to find products, add to cart, and visually verify pages. Running Chromium inside the backend services creates security and scaling issues.

Decision: Use Hyperbeam's cloud browser API for ephemeral, embeddable Chromium instances.

Rationale:

  • No local Chromium = no sandbox escape risk on backend servers
  • Browser sessions are embeddable in the frontend via iframe
  • Both the agent and user can interact with the same browser (takeover pattern)
  • Session recordings are built-in for audit purposes
  • Per-session billing vs. always-on headless browser pools

Consequences:

  • External dependency on Hyperbeam availability
  • Chrome Extension required for DOM indexing (see hyperbeam-bridge docs)
  • Adds latency vs. local headless browser (offset by not needing GPU-capable browser servers)
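The takeover pattern mentioned above can be sketched as a small state machine: agent and user share one session, but control is explicit and exclusive. BrowserSession and Controller are illustrative names, not Hyperbeam's actual API:

```typescript
// Sketch of the takeover pattern: control of the shared cloud browser is
// held by exactly one party at a time.
type Controller = "agent" | "user";

class BrowserSession {
  // The agent drives by default; the user can take over at any time
  // (e.g. to enter payment details), then hand control back.
  private controller: Controller = "agent";

  takeover(who: Controller): void {
    this.controller = who;
  }

  canDrive(who: Controller): boolean {
    return this.controller === who;
  }
}
```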

ADR-4: Dual Persistence Backend

Context: Development velocity requires running the full stack locally without cloud credentials. Production requires durable storage.

Decision: Every service implements the Repository interface with two backends: firestore (production) and inmemory (development).

Rationale:

  • Developers can docker compose up and have a working system in minutes
  • No Firestore emulator setup required
  • Repository pattern keeps business logic clean of persistence concerns
  • Backend selection via single env var: PERSISTENCE_BACKEND=inmemory|firestore

Consequences:

  • Feature parity must be maintained across both backends
  • In-memory backend loses data on restart (by design for dev)
  • Repository interface may not expose all Firestore-specific features (e.g., transactions)
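A minimal sketch of the dual-backend pattern, assuming a two-method Repository interface (the real one presumably exposes more). The Firestore variant is omitted since it needs credentials:

```typescript
// Sketch: one Repository interface, backend chosen by PERSISTENCE_BACKEND.
interface Repository<T extends { id: string }> {
  save(entity: T): Promise<void>;
  get(id: string): Promise<T | undefined>;
}

class InMemoryRepository<T extends { id: string }> implements Repository<T> {
  // Data lives in a Map and is lost on restart: by design for dev.
  private store = new Map<string, T>();

  async save(entity: T): Promise<void> {
    this.store.set(entity.id, entity);
  }

  async get(id: string): Promise<T | undefined> {
    return this.store.get(id);
  }
}

function makeRepository<T extends { id: string }>(
  backend: string = process.env.PERSISTENCE_BACKEND ?? "inmemory"
): Repository<T> {
  if (backend === "inmemory") return new InMemoryRepository<T>();
  // The Firestore-backed implementation is omitted in this sketch.
  throw new Error(`backend "${backend}" not available without GCP credentials`);
}
```

Business logic depends only on the interface, so swapping backends is a pure configuration change.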

ADR-5: Voice Architecture — Model B

Context: AnyaSelf has a voice AI assistant (Aura). We needed to choose how speech interacts with the backend.

Decision: Model B — Client speech → Gemini Live STT → Text to Orchestrator /chat → Response text → Gemini Live TTS.

Rationale:

  • Reuses the existing Orchestrator mission loop (all tools, context, and policies apply to voice)
  • Voice and text share the same mission state — no parallel conversation tracks
  • Barge-in (user interrupts agent mid-speech) cleanly maps to "cancel TTS, send new chat turn"
  • Modality-agnostic: the Orchestrator doesn't know whether input came from voice or keyboard

Consequences:

  • Added round-trip latency (STT → HTTP → TTS) vs. real-time end-to-end voice
  • Client must manage TTS playback state and barge-in timing
  • Voice tools registry in voice-tools.ts maps tool calls to visual actions in the frontend
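The Model B flow can be sketched as three injected stages; the stage signatures are assumptions, with Gemini Live and the Orchestrator's /chat endpoint as the real counterparts:

```typescript
// Sketch of the Model B pipeline. Each stage is injected so the flow can be
// shown without real Gemini Live or Orchestrator connections.
interface VoiceStages {
  stt: (audio: Uint8Array) => Promise<string>; // Gemini Live speech-to-text
  chat: (text: string) => Promise<string>;     // POST /chat on the Orchestrator
  tts: (text: string) => Promise<Uint8Array>;  // Gemini Live text-to-speech
}

async function handleVoiceTurn(
  audio: Uint8Array,
  stages: VoiceStages
): Promise<Uint8Array> {
  const transcript = await stages.stt(audio);
  // The Orchestrator sees only text: voice and keyboard turns look identical.
  const reply = await stages.chat(transcript);
  // Barge-in is the client's job: cancel this TTS playback, send a new turn.
  return stages.tts(reply);
}
```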

ADR-6: API Gateway as Single Ingress

Context: Client applications need to talk to 8 different services. Direct client-to-service communication is impractical for auth, CORS, and API versioning.

Decision: All requests flow through the API Gateway. The gateway handles JWT verification, request routing, and cross-cutting concerns.

Rationale:

  • Single CORS origin
  • Centralized auth: JWT verification happens once, principal is forwarded to downstream services
  • Purchase flow hardening (intent tokens, confirmation TTL) lives in one place
  • Voice WebSocket proxy centralizes GCP OAuth2 credential management

Consequences:

  • Gateway is a single point of failure (mitigated by Cloud Run auto-scaling and health checks)
  • Gateway routes file is the largest in the codebase (867 lines)
  • Adds one network hop to every request
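The centralized-auth step can be sketched as a pure function over the request headers; the header names and verifyJwt stand-in below are illustrative, not the gateway's real implementation:

```typescript
// Sketch: the gateway verifies the JWT once, then forwards the principal as
// headers that downstream services trust (they are only reachable via the gateway).
interface GatewayRequest {
  headers: Record<string, string>;
}

interface Principal {
  sub: string;
  householdId: string;
}

function forwardWithPrincipal(
  req: GatewayRequest,
  verifyJwt: (token: string) => Principal // stand-in for real JWT validation
): GatewayRequest {
  const auth = req.headers["authorization"] ?? "";
  const token = auth.replace(/^Bearer\s+/, "");
  const principal = verifyJwt(token); // throws on an invalid token
  return {
    headers: {
      ...req.headers,
      // Illustrative header names: downstream services read the principal here.
      "x-principal-sub": principal.sub,
      "x-household-id": principal.householdId,
    },
  };
}
```

Because verification happens exactly once, downstream services stay free of JWT logic and simply trust the forwarded principal.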
