Deployment & Production
How to harden and deploy AnyaSelf for production.
Running AnyaSelf locally via docker compose is straightforward, but deploying the stack to production requires replacing every simulated and stub layer with hardened, real infrastructure.
Pre-Deployment Checklist
Before deploying, ensure every item is addressed:
| Item | What to Do | Severity |
|---|---|---|
| `AUTH_JWT_SECRET` | Set a strong, unique secret (min 32 chars). The default `dev-secret-change-me` is rejected at startup in production. | 🔴 Critical |
| `ORCHESTRATOR_REQUIRE_VERTEX_AGENT` | Set to `true`. Otherwise the agent silently returns stub responses. | 🔴 Critical |
| `REQUIRE_INTERNAL_EVENT_TOKEN` | Set to `true`. Protects internal Orchestrator endpoints. | 🔴 Critical |
| `REQUIRE_BUYFLOW_INTERNAL_TOKEN` | Set to `true`. Protects cart-ready and failed buy-flow endpoints. | 🔴 Critical |
| `PERSISTENCE_BACKEND` | Set to `firestore` for all services. | 🔴 Critical |
| `ORCHESTRATOR_INTERNAL_TOKEN` | Set a unique, strong token. Must match across orchestrator, hyperbeam-bridge, and artifacts-audit. | 🔴 Critical |
| `BUYFLOW_INTERNAL_TOKEN` | Set a unique, strong token for the API Gateway. | 🔴 Critical |
| `FIRESTORE_PROJECT_ID` | Set to your GCP project ID. | 🟡 Required |
| `WARDROBE_STORAGE_BACKEND` | Set to `gcs` with a valid `GCS_BUCKET`. | 🟡 Required |
| `HYPERBEAM_ENFORCE_EVENT_SIGNATURES` | Set to `true` with a real `HYPERBEAM_EVENT_SIGNING_SECRET`. | 🟡 Required |
| `AUTH_EXTERNAL_LOGIN_ENABLED` | Set to `true` with valid JWKS URL, issuer, and audience. | 🟡 Required |
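The critical items in the checklist can be enforced with a start-up pre-flight check. A minimal sketch (the helper name and exact rules here are illustrative, not part of the codebase):

```python
# Illustrative pre-flight check for the critical checklist items above.
DEV_JWT_SECRET = "dev-secret-change-me"  # default dev secret, rejected in prod

def preflight_errors(env: dict) -> list[str]:
    """Return a list of misconfigurations; an empty list means safe to boot."""
    errors = []
    secret = env.get("AUTH_JWT_SECRET", "")
    if len(secret) < 32 or secret == DEV_JWT_SECRET:
        errors.append("AUTH_JWT_SECRET must be a unique secret of at least 32 chars")
    for flag in (
        "ORCHESTRATOR_REQUIRE_VERTEX_AGENT",
        "REQUIRE_INTERNAL_EVENT_TOKEN",
        "REQUIRE_BUYFLOW_INTERNAL_TOKEN",
    ):
        if env.get(flag, "").lower() != "true":
            errors.append(f"{flag} must be true in production")
    if env.get("PERSISTENCE_BACKEND") != "firestore":
        errors.append("PERSISTENCE_BACKEND must be firestore")
    return errors
```

Running such a check against `os.environ` before binding any port turns a silently degraded deployment into a fast, loud startup failure.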
1. Google Cloud Configuration
AnyaSelf is built for Google Cloud. Production requires:
- GCP Project with Vertex AI API enabled
- Service Account with:
  - `aiplatform.user` — for Orchestrator Vertex AI Agent
  - `storage.objectAdmin` — for GCS image uploads (Wardrobe, VTO, Artifacts)
  - `firestore.user` — for data persistence
- Workload Identity binding for Kubernetes/Cloud Run pods
```bash
# Verify Vertex AI is enabled
gcloud services list --enabled --filter="aiplatform.googleapis.com"

# Bind workload identity (Cloud Run example)
gcloud run services update orchestrator \
  --service-account=anyaself-backend@PROJECT_ID.iam.gserviceaccount.com
```

2. Shared Secrets Management
AnyaSelf uses shared secrets for internal trust:
| Secret | Shared Across | Purpose |
|---|---|---|
| `AUTH_JWT_SECRET` | All services | JWT signing/verification |
| `ORCHESTRATOR_INTERNAL_TOKEN` | orchestrator, hyperbeam-bridge, artifacts-audit | M2M bridge events |
| `BUYFLOW_INTERNAL_TOKEN` | api-gateway | Purchase flow transitions |
| `HYPERBEAM_EVENT_SIGNING_SECRET` | hyperbeam-bridge | HMAC event verification |
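HMAC event verification with `HYPERBEAM_EVENT_SIGNING_SECRET` amounts to signing the raw event body and comparing digests in constant time. A minimal sketch (the exact payload format and header carrying the signature are assumptions; check the bridge's actual contract):

```python
import hashlib
import hmac

def sign_event(secret: str, body: bytes) -> str:
    """HMAC-SHA256 signature over the raw event body, hex-encoded."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

def verify_event(secret: str, body: bytes, signature: str) -> bool:
    """Constant-time comparison to avoid timing side channels."""
    return hmac.compare_digest(sign_event(secret, body), signature)
```

The key points are signing the raw bytes (before any JSON re-serialization) and using `hmac.compare_digest` rather than `==`.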
> [!WARNING]
> Use a secrets manager (GCP Secret Manager, HashiCorp Vault) — never commit secrets to source control. Cloud Run supports mounting secrets as environment variables from Secret Manager.
```bash
# Example: Create secret in GCP Secret Manager
gcloud secrets create auth-jwt-secret --replication-policy=automatic
echo -n "your-strong-secret-here" | gcloud secrets versions add auth-jwt-secret --data-file=-

# Reference in Cloud Run deployment
gcloud run services update api-gateway \
  --set-secrets="AUTH_JWT_SECRET=auth-jwt-secret:latest"
```

3. Database Setup
All services use Firestore in production (`PERSISTENCE_BACKEND=firestore`).
Required:
- Firestore database in Native mode (not Datastore mode)
- Set `FIRESTORE_PROJECT_ID` or `GOOGLE_CLOUD_PROJECT`
- Service account with Firestore read/write permissions
Firestore Collections (auto-created by services):
- `households`, `household_members`, `purchase_requests` (API Gateway)
- `missions` (Orchestrator)
- `wardrobe_items`, `outfits`, `feed_collections` (Wardrobe)
- `offers` (Commerce)
- `vto_jobs` (VTO)
- `cartprep_jobs` (CartPrep)
- `hyperbeam_sessions` (Hyperbeam Bridge)
- `artifacts`, `audit_events` (Artifacts & Audit)
> [!NOTE]
> The `FIRESTORE_DEV_FALLBACK_TO_INMEMORY=true` default means services will silently fall back to in-memory storage if Firestore is unreachable in dev mode. In production, set `APP_ENV=prod` to disable this behavior.
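The fallback guard described in that note amounts to logic like the following sketch (the function name and exact precedence are illustrative, not lifted from the codebase):

```python
def resolve_backend(env: dict) -> str:
    """Pick the persistence backend, refusing silent in-memory fallback in prod."""
    backend = env.get("PERSISTENCE_BACKEND", "inmemory")
    is_prod = env.get("APP_ENV") == "prod"
    allow_fallback = env.get("FIRESTORE_DEV_FALLBACK_TO_INMEMORY", "true") == "true"
    no_project = not env.get("FIRESTORE_PROJECT_ID") and not env.get("GOOGLE_CLOUD_PROJECT")
    if backend == "firestore" and no_project:
        if is_prod or not allow_fallback:
            # Fail loudly rather than serve from a volatile in-memory store.
            raise RuntimeError("Firestore selected but no project ID configured")
        return "inmemory"  # dev-only convenience fallback
    return backend
```

The point of `APP_ENV=prod` is precisely to convert the dev-time silent fallback into a hard startup error.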
4. Scaling the GPU Workers
The `vto` service runs diffusion model inference and requires GPU nodes when using `VTO_INFERENCE_BACKEND=inline` or `remote`.
Cloud Run GPU Configuration
```bash
gcloud run services update vto \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --memory=16Gi \
  --cpu=4 \
  --max-instances=5
```

Scaling Recommendations
| Service | CPU | Memory | GPU | Min Instances | Max Instances |
|---|---|---|---|---|---|
| api-gateway | 1 | 512Mi | — | 1 | 10 |
| orchestrator | 2 | 1Gi | — | 1 | 5 |
| wardrobe | 1 | 512Mi | — | 1 | 5 |
| commerce | 1 | 512Mi | — | 1 | 3 |
| vto | 4 | 16Gi | L4/T4 | 0 | 5 |
| headless-cartprep | 2 | 1Gi | — | 0 | 3 |
| hyperbeam-bridge | 1 | 512Mi | — | 1 | 5 |
| artifacts-audit | 1 | 512Mi | — | 1 | 3 |
> [!NOTE]
> Set VTO `min-instances=0` with `VTO_INFERENCE_BACKEND=remote` to avoid paying for idle GPU time. Use `min-instances=1` only if you need a warm instance to avoid cold-start latency.
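The trade-off behind that note is easy to quantify: a warm instance bills around the clock whether or not jobs arrive. A back-of-the-envelope sketch, using a purely hypothetical hourly rate (check current Cloud Run GPU pricing for your region):

```python
# ASSUMPTION: the rate below is illustrative only, not real pricing.
ASSUMED_L4_RATE_PER_HOUR = 0.70  # USD/hour, hypothetical

def monthly_idle_cost(min_instances: int, rate_per_hour: float, hours: float = 730.0) -> float:
    """Cost of keeping warm GPU instances around even when no jobs run."""
    return min_instances * rate_per_hour * hours
```

With `min-instances=0` the idle cost is zero by construction; with `min-instances=1` you pay the full month's rate regardless of traffic, which is only worth it when cold-start latency is unacceptable.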
5. Cloud Run Deployment
Use the included deployment scripts:
```bash
# Deploy all backend services
bash scripts/deploy_backend_cloudrun.sh

# Deploy the frontend
bash scripts/deploy_frontend_cloudrun.sh
```

Manual single-service deployment:
```bash
# Build and push Docker image
docker build -t gcr.io/PROJECT_ID/anyaself-orchestrator:latest \
  -f services/orchestrator/Dockerfile services/orchestrator
docker push gcr.io/PROJECT_ID/anyaself-orchestrator:latest

# Deploy to Cloud Run
gcloud run deploy orchestrator \
  --image=gcr.io/PROJECT_ID/anyaself-orchestrator:latest \
  --region=us-central1 \
  --port=8003 \
  --allow-unauthenticated
```

6. Observability
Structured Logging
Every service emits structured JSON logs via Python's logging module. Each log entry includes:
```json
{
  "event": "http_request",
  "service": "anyaself-api-gateway",
  "requestId": "req_abc123",
  "method": "POST",
  "path": "/api/v1/households/h1/orchestrator/missions/chat",
  "statusCode": 200,
  "latencyMs": 245.5
}
```

`X-Request-Id` headers are propagated across all inter-service calls for distributed tracing.
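Emitting such a log line with the stdlib `logging` module might look like the following sketch (the field names match the example above; the helper itself is illustrative, not the services' actual code):

```python
import json
import logging
import sys

logger = logging.getLogger("anyaself")
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.INFO)

def log_request(service: str, request_id: str, method: str, path: str,
                status_code: int, latency_ms: float) -> str:
    """Emit one structured JSON log line and return it (returned for testing)."""
    entry = json.dumps({
        "event": "http_request",
        "service": service,
        "requestId": request_id,
        "method": method,
        "path": path,
        "statusCode": status_code,
        "latencyMs": latency_ms,
    })
    logger.info(entry)
    return entry
```

Keeping the whole record as one JSON object per line is what lets Cloud Logging parse the fields automatically.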
Health Checks
Every service exposes `GET /health`:

```json
{ "status": "ok", "service": "anyaself-orchestrator" }
```

Configure Cloud Run health checks to hit this endpoint.
Recommended Alerts
| Alert | Condition | Severity |
|---|---|---|
| Service down | Health check fails 3 consecutive times | Critical |
| High latency | p95 latency > 5s on /chat endpoint | Warning |
| Error rate | 5xx rate > 5% in 5-minute window | Critical |
| VTO queue depth | Queued jobs > 50 | Warning |
| Purchase failure | FAILED purchase requests > 3/hour | Critical |
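The "service down" condition in the table above (three consecutive health-check failures) reduces to a small probe loop. A sketch with an injected check function so it can be tested without a network (names are illustrative):

```python
from typing import Callable

def is_down(probe: Callable[[], bool], attempts: int = 3) -> bool:
    """Alert only after `attempts` consecutive health-check failures.

    `probe` returns True when GET /health answers with status "ok".
    """
    failures = 0
    for _ in range(attempts):
        if probe():
            return False  # any success resets the consecutive-failure count
        failures += 1
    return failures >= attempts
```

Requiring consecutive failures, rather than alerting on the first one, filters out one-off probe timeouts during deploys.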
7. Rollback Strategy
Cloud Run maintains previous revisions. To roll back:
```bash
# List revisions
gcloud run revisions list --service=orchestrator

# Route 100% traffic to a previous revision
gcloud run services update-traffic orchestrator \
  --to-revisions=orchestrator-00005-abc=100
```

8. Production Validation
After deployment, run the production readiness checker:

```bash
python ops/validate_production_readiness.py
```

This validates environment configuration, service connectivity, and security settings.
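Structurally, such a checker is a list of named checks that each return pass/fail, run to completion so one failure does not hide the others. A simplified sketch (the real script's internals may differ):

```python
import os
from typing import Callable

def run_checks(checks: dict[str, Callable[[], bool]]) -> dict[str, bool]:
    """Run every readiness check, never aborting early, and report results."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a crashing check counts as a failure
    return results

# Illustrative checks mirroring the pre-deployment checklist:
example_checks = {
    "jwt_secret_set": lambda: len(os.environ.get("AUTH_JWT_SECRET", "")) >= 32,
    "firestore_backend": lambda: os.environ.get("PERSISTENCE_BACKEND") == "firestore",
}
```

Exiting nonzero when any result is `False` lets the checker gate a CI/CD pipeline.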