Deployment & Production

How to harden and deploy AnyaSelf for production.

Running AnyaSelf locally via Docker Compose is trivial, but deploying the stack to production requires replacing every simulated and stub layer with real, hardened infrastructure.

Pre-Deployment Checklist

Before deploying, ensure every item is addressed:

| Item | What to Do | Severity |
| --- | --- | --- |
| `AUTH_JWT_SECRET` | Set a strong, unique secret (min 32 chars). The default `dev-secret-change-me` is rejected at startup in production. | 🔴 Critical |
| `ORCHESTRATOR_REQUIRE_VERTEX_AGENT` | Set to `true`. Otherwise the agent silently returns stub responses. | 🔴 Critical |
| `REQUIRE_INTERNAL_EVENT_TOKEN` | Set to `true`. Protects internal Orchestrator endpoints. | 🔴 Critical |
| `REQUIRE_BUYFLOW_INTERNAL_TOKEN` | Set to `true`. Protects cart-ready and failed buy-flow endpoints. | 🔴 Critical |
| `PERSISTENCE_BACKEND` | Set to `firestore` for all services. | 🔴 Critical |
| `ORCHESTRATOR_INTERNAL_TOKEN` | Set a unique, strong token. Must match across orchestrator, hyperbeam-bridge, and artifacts-audit. | 🔴 Critical |
| `BUYFLOW_INTERNAL_TOKEN` | Set a unique, strong token for the API Gateway. | 🔴 Critical |
| `FIRESTORE_PROJECT_ID` | Set to your GCP project ID. | 🟡 Required |
| `WARDROBE_STORAGE_BACKEND` | Set to `gcs` with a valid `GCS_BUCKET`. | 🟡 Required |
| `HYPERBEAM_ENFORCE_EVENT_SIGNATURES` | Set to `true` with a real `HYPERBEAM_EVENT_SIGNING_SECRET`. | 🟡 Required |
| `AUTH_EXTERNAL_LOGIN_ENABLED` | Set to `true` with valid JWKS URL, issuer, and audience. | 🟡 Required |
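
Taken together, the checklist translates into an environment sketch like the following. The placeholder values are illustrative only; real secret values must come from your secrets manager, never from a committed file:

```shell
AUTH_JWT_SECRET=<32+ char secret from Secret Manager>
ORCHESTRATOR_REQUIRE_VERTEX_AGENT=true
REQUIRE_INTERNAL_EVENT_TOKEN=true
REQUIRE_BUYFLOW_INTERNAL_TOKEN=true
PERSISTENCE_BACKEND=firestore
ORCHESTRATOR_INTERNAL_TOKEN=<shared M2M token>
BUYFLOW_INTERNAL_TOKEN=<gateway token>
FIRESTORE_PROJECT_ID=my-gcp-project
WARDROBE_STORAGE_BACKEND=gcs
GCS_BUCKET=my-wardrobe-bucket
HYPERBEAM_ENFORCE_EVENT_SIGNATURES=true
HYPERBEAM_EVENT_SIGNING_SECRET=<signing secret>
AUTH_EXTERNAL_LOGIN_ENABLED=true
```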

1. Google Cloud Configuration

AnyaSelf is built for Google Cloud. Production requires:

  • GCP Project with Vertex AI API enabled
  • Service Account with:
    • aiplatform.user — for Orchestrator Vertex AI Agent
    • storage.objectAdmin — for GCS image uploads (Wardrobe, VTO, Artifacts)
    • firestore.user — for data persistence
  • Workload Identity binding for Kubernetes/Cloud Run pods

```shell
# Verify Vertex AI is enabled
gcloud services list --enabled --filter="aiplatform.googleapis.com"

# Bind workload identity (Cloud Run example)
gcloud run services update orchestrator \
  --service-account=anyaself-backend@PROJECT_ID.iam.gserviceaccount.com
```

2. Shared Secrets Management

AnyaSelf uses shared secrets for internal trust:

| Secret | Shared Across | Purpose |
| --- | --- | --- |
| `AUTH_JWT_SECRET` | All services | JWT signing/verification |
| `ORCHESTRATOR_INTERNAL_TOKEN` | orchestrator, hyperbeam-bridge, artifacts-audit | M2M bridge events |
| `BUYFLOW_INTERNAL_TOKEN` | api-gateway | Purchase flow transitions |
| `HYPERBEAM_EVENT_SIGNING_SECRET` | hyperbeam-bridge | HMAC event verification |

[!WARNING] Use a secrets manager (GCP Secret Manager, HashiCorp Vault) — never commit secrets to source control. Cloud Run supports mounting secrets as environment variables from Secret Manager.

```shell
# Example: Create secret in GCP Secret Manager
gcloud secrets create auth-jwt-secret --replication-policy=automatic
echo -n "your-strong-secret-here" | gcloud secrets versions add auth-jwt-secret --data-file=-

# Reference in Cloud Run deployment
gcloud run services update api-gateway \
  --set-secrets="AUTH_JWT_SECRET=auth-jwt-secret:latest"
```
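
One way to mint secrets that clear the 32-character minimum is `openssl rand` (any CSPRNG works; this particular command is a suggestion, not a project requirement):

```shell
# Generate a 64-character base64 secret, suitable for AUTH_JWT_SECRET
# or the internal tokens (well above the 32-char minimum)
openssl rand -base64 48 | tr -d '\n'
```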

3. Database Setup

All services use Firestore in production (`PERSISTENCE_BACKEND=firestore`).

Required:

  • Firestore database in Native mode (not Datastore mode)
  • Set FIRESTORE_PROJECT_ID or GOOGLE_CLOUD_PROJECT
  • Service account with Firestore read/write permissions

Firestore Collections (auto-created by services):

  • households, household_members, purchase_requests (API Gateway)
  • missions (Orchestrator)
  • wardrobe_items, outfits, feed_collections (Wardrobe)
  • offers (Commerce)
  • vto_jobs (VTO)
  • cartprep_jobs (CartPrep)
  • hyperbeam_sessions (Hyperbeam Bridge)
  • artifacts, audit_events (Artifacts & Audit)

[!NOTE] The FIRESTORE_DEV_FALLBACK_TO_INMEMORY=true default means services will silently fall back to in-memory if Firestore is unreachable in dev mode. In production, set APP_ENV=prod to disable this behavior.
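
The fallback behavior described in the note can be sketched as follows. This is an illustrative stand-in, not the services' actual code; the function name and signature are hypothetical:

```python
import os

def resolve_backend(firestore_reachable: bool) -> str:
    """Illustrative sketch of the fallback logic described above:
    dev mode silently falls back to in-memory, prod fails fast."""
    env = os.environ.get("APP_ENV", "dev")
    fallback = os.environ.get("FIRESTORE_DEV_FALLBACK_TO_INMEMORY", "true").lower() == "true"
    if firestore_reachable:
        return "firestore"
    if env != "prod" and fallback:
        return "inmemory"  # silent dev-mode fallback
    raise RuntimeError("Firestore unreachable and in-memory fallback is disabled")
```

With `APP_ENV=prod`, an unreachable Firestore becomes a hard startup error instead of a silent downgrade.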

4. Scaling the GPU Workers

The vto service runs diffusion model inference and requires GPU nodes when using VTO_INFERENCE_BACKEND=inline or remote.

Cloud Run GPU Configuration

```shell
gcloud run services update vto \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --memory=16Gi \
  --cpu=4 \
  --max-instances=5
```

Scaling Recommendations

| Service | CPU | Memory | GPU | Min Instances | Max Instances |
| --- | --- | --- | --- | --- | --- |
| api-gateway | 1 | 512Mi | - | 1 | 10 |
| orchestrator | 2 | 1Gi | - | 1 | 5 |
| wardrobe | 1 | 512Mi | - | 1 | 5 |
| commerce | 1 | 512Mi | - | 1 | 3 |
| vto | 4 | 16Gi | L4/T4 | 0 | 5 |
| headless-cartprep | 2 | 1Gi | - | 0 | 3 |
| hyperbeam-bridge | 1 | 512Mi | - | 1 | 5 |
| artifacts-audit | 1 | 512Mi | - | 1 | 3 |

[!NOTE] Set VTO min-instances=0 with VTO_INFERENCE_BACKEND=remote to avoid paying for idle GPU time. Use min-instances=1 only if you need warm start latency.

5. Cloud Run Deployment

Use the included deployment scripts:

```shell
# Deploy all backend services
bash scripts/deploy_backend_cloudrun.sh

# Deploy the frontend
bash scripts/deploy_frontend_cloudrun.sh
```

Manual single-service deployment:

```shell
# Build and push Docker image
docker build -t gcr.io/PROJECT_ID/anyaself-orchestrator:latest \
  -f services/orchestrator/Dockerfile services/orchestrator

docker push gcr.io/PROJECT_ID/anyaself-orchestrator:latest

# Deploy to Cloud Run
gcloud run deploy orchestrator \
  --image=gcr.io/PROJECT_ID/anyaself-orchestrator:latest \
  --region=us-central1 \
  --port=8003 \
  --allow-unauthenticated
```

6. Observability

Structured Logging

Every service emits structured JSON logs via Python's logging module. Each log entry includes:

```json
{
  "event": "http_request",
  "service": "anyaself-api-gateway",
  "requestId": "req_abc123",
  "method": "POST",
  "path": "/api/v1/households/h1/orchestrator/missions/chat",
  "statusCode": 200,
  "latencyMs": 245.5
}
```

X-Request-Id headers are propagated across all inter-service calls for distributed tracing.
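
A log entry of this shape can be produced with a small stdlib `logging.Formatter`. This is a sketch of the pattern, not the formatter the services actually use; the `fields` extra-attribute name is an assumption:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Sketch: emit log records as single-line JSON matching the
    structure shown above (assumed field names)."""
    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "event": record.getMessage(),
            "service": "anyaself-api-gateway",  # assumed service name
        }
        # Merge per-request fields passed via logger.info(..., extra=...)
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)

logger = logging.getLogger("anyaself")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info(
    "http_request",
    extra={"fields": {"requestId": "req_abc123", "method": "POST", "statusCode": 200}},
)
```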

Health Checks

Every service exposes GET /health:

```json
{ "status": "ok", "service": "anyaself-orchestrator" }
```

Configure Cloud Run health checks to hit this endpoint.
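
For reference, the `/health` contract is small enough to express as a stdlib-only sketch; the real services are assumed to serve it from their own web framework:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    """Sketch of the GET /health contract shown above."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps(
                {"status": "ok", "service": "anyaself-orchestrator"}
            ).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, fmt, *args):
        pass  # silence default stderr access logging
```

Running `HTTPServer(("", 8003), HealthHandler).serve_forever()` serves the endpoint on the orchestrator's port.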

Recommended alerts:

| Alert | Condition | Severity |
| --- | --- | --- |
| Service down | Health check fails 3 consecutive times | Critical |
| High latency | p95 latency > 5s on /chat endpoint | Warning |
| Error rate | 5xx rate > 5% in 5-minute window | Critical |
| VTO queue depth | Queued jobs > 50 | Warning |
| Purchase failure | FAILED purchase requests > 3/hour | Critical |

7. Rollback Strategy

Cloud Run retains previous revisions. To roll back:

```shell
# List revisions
gcloud run revisions list --service=orchestrator

# Route 100% traffic to a previous revision
gcloud run services update-traffic orchestrator \
  --to-revisions=orchestrator-00005-abc=100
```

8. Production Validation

After deployment, run the production readiness checker:

```shell
python ops/validate_production_readiness.py
```

This validates environment configuration, service connectivity, and security settings.
