
Architecture Decision Records

Key architectural decisions and the reasoning behind them.

ADR Index

  #  Decision                                                           Status
  1  Microservices with household-scoped isolation                      Accepted
  2  Vertex AI + LangChain for AI agent                                 Accepted
  3  Hyperbeam for cloud browser sessions                               Accepted
  4  Dual persistence backend (Firestore + in-memory)                   Accepted
  5  Voice architecture: Model B (speech-to-text → orchestrator → TTS)  Accepted
  6  API Gateway as single ingress                                      Accepted

ADR-1: Microservices with Household-Scoped Isolation

Context: AnyaSelf manages family data (wardrobe, finances, personal style). We needed to decide between a monolith and microservices.

Decision: Decompose the system into eight services, each using householdId as the primary data partition key.

Rationale:

  • Each service has distinct scaling characteristics (VTO is GPU-bound, CartPrep needs headless browsers, Commerce is read-heavy)
  • Household scoping provides a natural isolation boundary — no data leaks between families
  • Independent deployment: a bug in VTO inference doesn't bring down the purchase flow
  • Each service can be developed and tested independently

Consequences:

  • More operational complexity (8 Docker images, service discovery)
  • Inter-service communication adds latency vs. in-process calls
  • Distributed transactions not supported — eventual consistency via audit trail
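The household-scoping boundary can be sketched as a repository whose read API makes householdId mandatory. The names below (HouseholdScopedRepo, WardrobeItem) are illustrative, not the real service code:

```typescript
// Sketch: every read path requires a householdId, so cross-family data
// leaks are impossible by construction.
interface WardrobeItem {
  id: string;
  householdId: string;
  name: string;
}

class HouseholdScopedRepo {
  private items: WardrobeItem[] = [];

  add(item: WardrobeItem): void {
    this.items.push(item);
  }

  // Deliberately no unscoped "list all" method: householdId is mandatory.
  listForHousehold(householdId: string): WardrobeItem[] {
    return this.items.filter((item) => item.householdId === householdId);
  }
}
```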

ADR-2: Vertex AI + LangChain for AI Agent

Context: The Orchestrator needs an LLM with tool-calling capabilities to coordinate tasks across services.

Decision: Use Google Cloud Vertex AI with the LangchainAgent reasoning engine from the Vertex AI SDK.

Rationale:

  • Vertex AI provides managed, scalable LLM hosting on Google Cloud (our primary cloud)
  • LangChain's Tool abstraction maps cleanly to our service API pattern
  • The reasoning engine handles turn-by-turn tool selection, execution, and response synthesis
  • Model-agnostic: can swap between Gemini Pro, Flash, and other models via config

Consequences:

  • Tight coupling to GCP ecosystem (acceptable since we're already on GCP)
  • ORCHESTRATOR_REQUIRE_VERTEX_AGENT=false allows full-stack local dev without GCP credentials (stub mode)
  • Agent tool implementation lives in the Orchestrator → each tool is an HTTP call to a downstream service
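The "each tool is an HTTP call" pattern can be sketched as below. makeHttpTool, the injected transport, and the tool shape are assumptions for illustration, not the real Orchestrator registry or LangChain's actual Tool class:

```typescript
// Sketch: an agent tool whose execute() is just an HTTP call to a downstream
// service. The transport is injected so the shape can be shown (and tested)
// without a live service; in production it would be a real fetch().
type Transport = (url: string, body: unknown) => Promise<string>;

interface AgentTool {
  name: string;        // the LLM selects tools by name...
  description: string; // ...and description
  execute: (args: Record<string, string>) => Promise<string>;
}

function makeHttpTool(
  name: string,
  description: string,
  url: string,
  transport: Transport
): AgentTool {
  return {
    name,
    description,
    execute: (args) => transport(url, args),
  };
}
```

The reasoning engine picks a tool per turn, runs execute(), and feeds the response back into the loop until it can synthesize a final answer.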

ADR-3: Hyperbeam for Cloud Browser Sessions

Context: The agent needs to browse external websites (brand stores) to find products, add to cart, and visually verify pages. Running Chromium inside the backend services creates security and scaling issues.

Decision: Use Hyperbeam's cloud browser API for ephemeral, embeddable Chromium instances.

Rationale:

  • No local Chromium = no sandbox escape risk on backend servers
  • Browser sessions are embeddable in the frontend via iframe
  • Both the agent and user can interact with the same browser (takeover pattern)
  • Session recordings are built-in for audit purposes
  • Per-session billing vs. always-on headless browser pools

Consequences:

  • External dependency on Hyperbeam availability
  • Chrome Extension required for DOM indexing (see hyperbeam-bridge docs)
  • Adds latency vs. local headless browser (offset by not needing GPU-capable browser servers)
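The takeover pattern mentioned above can be sketched as a small state machine: agent and user share one session, but control is explicit and exclusive. BrowserSession and Controller are illustrative names, not Hyperbeam's actual API:

```typescript
// Sketch of the takeover pattern: control of the shared cloud browser is
// held by exactly one party at a time.
type Controller = "agent" | "user";

class BrowserSession {
  // The agent drives by default; the user can take over at any time
  // (e.g. to enter payment details), then hand control back.
  private controller: Controller = "agent";

  takeover(who: Controller): void {
    this.controller = who;
  }

  canDrive(who: Controller): boolean {
    return this.controller === who;
  }
}
```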

ADR-4: Dual Persistence Backend

Context: Development velocity requires running the full stack locally without cloud credentials. Production requires durable storage.

Decision: Every service implements the Repository interface with two backends: firestore (production) and inmemory (development).

Rationale:

  • Developers can docker compose up and have a working system in minutes
  • No Firestore emulator setup required
  • Repository pattern keeps business logic clean of persistence concerns
  • Backend selection via single env var: PERSISTENCE_BACKEND=inmemory|firestore

Consequences:

  • Feature parity must be maintained across both backends
  • In-memory backend loses data on restart (by design for dev)
  • Repository interface may not expose all Firestore-specific features (e.g., transactions)
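A minimal sketch of the dual-backend pattern, assuming a two-method Repository interface (the real one presumably exposes more). The Firestore variant is omitted since it needs credentials:

```typescript
// Sketch: one Repository interface, backend chosen by PERSISTENCE_BACKEND.
interface Repository<T extends { id: string }> {
  save(entity: T): Promise<void>;
  get(id: string): Promise<T | undefined>;
}

class InMemoryRepository<T extends { id: string }> implements Repository<T> {
  // Data lives in a Map and is lost on restart: by design for dev.
  private store = new Map<string, T>();

  async save(entity: T): Promise<void> {
    this.store.set(entity.id, entity);
  }

  async get(id: string): Promise<T | undefined> {
    return this.store.get(id);
  }
}

function makeRepository<T extends { id: string }>(
  backend: string = process.env.PERSISTENCE_BACKEND ?? "inmemory"
): Repository<T> {
  if (backend === "inmemory") return new InMemoryRepository<T>();
  // The Firestore-backed implementation is omitted in this sketch.
  throw new Error(`backend "${backend}" not available without GCP credentials`);
}
```

Business logic depends only on the interface, so swapping backends is a pure configuration change.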

ADR-5: Voice Architecture — Model B

Context: AnyaSelf has a voice AI assistant (Aura). We needed to choose how speech interacts with the backend.

Decision: Model B — Client speech → Gemini Live STT → Text to Orchestrator /chat → Response text → Gemini Live TTS.

Rationale:

  • Reuses the existing Orchestrator mission loop (all tools, context, and policies apply to voice)
  • Voice and text share the same mission state — no parallel conversation tracks
  • Barge-in (user interrupts agent mid-speech) cleanly maps to "cancel TTS, send new chat turn"
  • Modality-agnostic: the Orchestrator doesn't know whether input came from voice or keyboard

Consequences:

  • Added round-trip latency (STT → HTTP → TTS) vs. real-time end-to-end voice
  • Client must manage TTS playback state and barge-in timing
  • Voice tools registry in voice-tools.ts maps tool calls to visual actions in the frontend
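The Model B flow can be sketched as three injected stages; the stage signatures are assumptions, with Gemini Live and the Orchestrator's /chat endpoint as the real counterparts:

```typescript
// Sketch of the Model B pipeline. Each stage is injected so the flow can be
// shown without real Gemini Live or Orchestrator connections.
interface VoiceStages {
  stt: (audio: Uint8Array) => Promise<string>; // Gemini Live speech-to-text
  chat: (text: string) => Promise<string>;     // POST /chat on the Orchestrator
  tts: (text: string) => Promise<Uint8Array>;  // Gemini Live text-to-speech
}

async function handleVoiceTurn(
  audio: Uint8Array,
  stages: VoiceStages
): Promise<Uint8Array> {
  const transcript = await stages.stt(audio);
  // The Orchestrator sees only text: voice and keyboard turns look identical.
  const reply = await stages.chat(transcript);
  // Barge-in is the client's job: cancel this TTS playback, send a new turn.
  return stages.tts(reply);
}
```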

ADR-6: API Gateway as Single Ingress

Context: Client applications need to talk to 8 different services. Direct client-to-service communication is impractical for auth, CORS, and API versioning.

Decision: All requests flow through the API Gateway. The gateway handles JWT verification, request routing, and cross-cutting concerns.

Rationale:

  • Single CORS origin
  • Centralized auth: JWT verification happens once, principal is forwarded to downstream services
  • Purchase flow hardening (intent tokens, confirmation TTL) lives in one place
  • Voice WebSocket proxy centralizes GCP OAuth2 credential management

Consequences:

  • Gateway is a single point of failure (mitigated by Cloud Run auto-scaling and health checks)
  • Gateway routes file is the largest in the codebase (867 lines)
  • Adds one network hop to every request
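The centralized-auth step can be sketched as a pure function over the request headers; the header names and verifyJwt stand-in below are illustrative, not the gateway's real implementation:

```typescript
// Sketch: the gateway verifies the JWT once, then forwards the principal as
// headers that downstream services trust (they are only reachable via the gateway).
interface GatewayRequest {
  headers: Record<string, string>;
}

interface Principal {
  sub: string;
  householdId: string;
}

function forwardWithPrincipal(
  req: GatewayRequest,
  verifyJwt: (token: string) => Principal // stand-in for real JWT validation
): GatewayRequest {
  const auth = req.headers["authorization"] ?? "";
  const token = auth.replace(/^Bearer\s+/, "");
  const principal = verifyJwt(token); // throws on an invalid token
  return {
    headers: {
      ...req.headers,
      // Illustrative header names: downstream services read the principal here.
      "x-principal-sub": principal.sub,
      "x-household-id": principal.householdId,
    },
  };
}
```

Because verification happens exactly once, downstream services stay free of JWT logic and simply trust the forwarded principal.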
