Deployment & Production
How to harden and deploy AnyaSelf for production.
Running AnyaSelf locally via docker compose is straightforward, but deploying the stack to production requires replacing every simulated and stub layer with hardened, real infrastructure.
Pre-Deployment Checklist
Before deploying, ensure every item is addressed:
| Item | What to Do | Severity |
|---|---|---|
| `AUTH_JWT_SECRET` | Set a strong, unique secret (min 32 chars). The default `dev-secret-change-me` is rejected at startup in production. | 🔴 Critical |
| `ORCHESTRATOR_REQUIRE_VERTEX_AGENT` | Set to `true`. Otherwise the agent silently returns stub responses. | 🔴 Critical |
| `REQUIRE_INTERNAL_EVENT_TOKEN` | Set to `true`. Protects internal Orchestrator endpoints. | 🔴 Critical |
| `REQUIRE_BUYFLOW_INTERNAL_TOKEN` | Set to `true`. Protects cart-ready and failed buy-flow endpoints. | 🔴 Critical |
| `PERSISTENCE_BACKEND` | Set to `firestore` for all services. | 🔴 Critical |
| `ORCHESTRATOR_INTERNAL_TOKEN` | Set a unique, strong token. Must match across orchestrator, hyperbeam-bridge, and artifacts-audit. | 🔴 Critical |
| `BUYFLOW_INTERNAL_TOKEN` | Set a unique, strong token for the API Gateway. | 🔴 Critical |
| `FIRESTORE_PROJECT_ID` | Set to your GCP project ID. | 🟡 Required |
| `WARDROBE_STORAGE_BACKEND` | Set to `gcs` with a valid `GCS_BUCKET`. | 🟡 Required |
| `HYPERBEAM_ENFORCE_EVENT_SIGNATURES` | Set to `true` with a real `HYPERBEAM_EVENT_SIGNING_SECRET`. | 🟡 Required |
| `AUTH_EXTERNAL_LOGIN_ENABLED` | Set to `true` with valid JWKS URL, issuer, and audience. | 🟡 Required |
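The critical items in the checklist can be enforced with a start-up pre-flight check. A minimal sketch (the helper name and exact rules here are illustrative, not part of the codebase):

```python
# Illustrative pre-flight check for the critical checklist items above.
DEV_JWT_SECRET = "dev-secret-change-me"  # default dev secret, rejected in prod

def preflight_errors(env: dict) -> list[str]:
    """Return a list of misconfigurations; an empty list means safe to boot."""
    errors = []
    secret = env.get("AUTH_JWT_SECRET", "")
    if len(secret) < 32 or secret == DEV_JWT_SECRET:
        errors.append("AUTH_JWT_SECRET must be a unique secret of at least 32 chars")
    for flag in (
        "ORCHESTRATOR_REQUIRE_VERTEX_AGENT",
        "REQUIRE_INTERNAL_EVENT_TOKEN",
        "REQUIRE_BUYFLOW_INTERNAL_TOKEN",
    ):
        if env.get(flag, "").lower() != "true":
            errors.append(f"{flag} must be true in production")
    if env.get("PERSISTENCE_BACKEND") != "firestore":
        errors.append("PERSISTENCE_BACKEND must be firestore")
    return errors
```

Running such a check against `os.environ` before binding any port turns a silently degraded deployment into a fast, loud startup failure.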
1. Google Cloud Configuration
AnyaSelf is built for Google Cloud. Production requires:
- GCP Project with Vertex AI API enabled
- Service Account with:
  - `aiplatform.user` — for Orchestrator Vertex AI Agent
  - `storage.objectAdmin` — for GCS image uploads (Wardrobe, VTO, Artifacts)
  - `firestore.user` — for data persistence
- Workload Identity binding for Kubernetes/Cloud Run pods
```bash
# Verify Vertex AI is enabled
gcloud services list --enabled --filter="aiplatform.googleapis.com"

# Bind workload identity (Cloud Run example)
gcloud run services update orchestrator \
  --service-account=anyaself-backend@PROJECT_ID.iam.gserviceaccount.com
```

2. Shared Secrets Management
AnyaSelf uses shared secrets for internal trust:
| Secret | Shared Across | Purpose |
|---|---|---|
| `AUTH_JWT_SECRET` | All services | JWT signing/verification |
| `ORCHESTRATOR_INTERNAL_TOKEN` | orchestrator, hyperbeam-bridge, artifacts-audit | M2M bridge events |
| `BUYFLOW_INTERNAL_TOKEN` | api-gateway | Purchase flow transitions |
| `HYPERBEAM_EVENT_SIGNING_SECRET` | hyperbeam-bridge | HMAC event verification |
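HMAC event verification with `HYPERBEAM_EVENT_SIGNING_SECRET` amounts to signing the raw event body and comparing digests in constant time. A minimal sketch (the exact payload format and header carrying the signature are assumptions; check the bridge's actual contract):

```python
import hashlib
import hmac

def sign_event(secret: str, body: bytes) -> str:
    """HMAC-SHA256 signature over the raw event body, hex-encoded."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

def verify_event(secret: str, body: bytes, signature: str) -> bool:
    """Constant-time comparison to avoid timing side channels."""
    return hmac.compare_digest(sign_event(secret, body), signature)
```

The key points are signing the raw bytes (before any JSON re-serialization) and using `hmac.compare_digest` rather than `==`.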
> [!WARNING]
> Use a secrets manager (GCP Secret Manager, HashiCorp Vault) — never commit secrets to source control. Cloud Run supports mounting secrets as environment variables from Secret Manager.
```bash
# Example: Create secret in GCP Secret Manager
gcloud secrets create auth-jwt-secret --replication-policy=automatic
echo -n "your-strong-secret-here" | gcloud secrets versions add auth-jwt-secret --data-file=-

# Reference in Cloud Run deployment
gcloud run services update api-gateway \
  --set-secrets="AUTH_JWT_SECRET=auth-jwt-secret:latest"
```

3. Database Setup
All services use Firestore in production (`PERSISTENCE_BACKEND=firestore`).
Required:
- Firestore database in Native mode (not Datastore mode)
- Set `FIRESTORE_PROJECT_ID` or `GOOGLE_CLOUD_PROJECT`
- Service account with Firestore read/write permissions
Firestore Collections (auto-created by services):
- `households`, `household_members`, `purchase_requests` (API Gateway)
- `missions` (Orchestrator)
- `wardrobe_items`, `outfits`, `feed_collections` (Wardrobe)
- `offers` (Commerce)
- `vto_jobs` (VTO)
- `cartprep_jobs` (CartPrep)
- `hyperbeam_sessions` (Hyperbeam Bridge)
- `artifacts`, `audit_events` (Artifacts & Audit)
> [!NOTE]
> The `FIRESTORE_DEV_FALLBACK_TO_INMEMORY=true` default means services will silently fall back to in-memory storage if Firestore is unreachable in dev mode. In production, set `APP_ENV=prod` to disable this behavior.
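The fallback guard described in that note amounts to logic like the following sketch (the function name and exact precedence are illustrative, not lifted from the codebase):

```python
def resolve_backend(env: dict) -> str:
    """Pick the persistence backend, refusing silent in-memory fallback in prod."""
    backend = env.get("PERSISTENCE_BACKEND", "inmemory")
    is_prod = env.get("APP_ENV") == "prod"
    allow_fallback = env.get("FIRESTORE_DEV_FALLBACK_TO_INMEMORY", "true") == "true"
    no_project = not env.get("FIRESTORE_PROJECT_ID") and not env.get("GOOGLE_CLOUD_PROJECT")
    if backend == "firestore" and no_project:
        if is_prod or not allow_fallback:
            # Fail loudly rather than serve from a volatile in-memory store.
            raise RuntimeError("Firestore selected but no project ID configured")
        return "inmemory"  # dev-only convenience fallback
    return backend
```

The point of `APP_ENV=prod` is precisely to convert the dev-time silent fallback into a hard startup error.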
4. Scaling the GPU Workers
The `vto` service runs diffusion model inference and requires GPU nodes when using `VTO_INFERENCE_BACKEND=inline` or `remote`.
Cloud Run GPU Configuration
```bash
gcloud run services update vto \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --memory=16Gi \
  --cpu=4 \
  --max-instances=5
```

Scaling Recommendations
| Service | CPU | Memory | GPU | Min Instances | Max Instances |
|---|---|---|---|---|---|
| api-gateway | 1 | 512Mi | — | 1 | 10 |
| orchestrator | 2 | 1Gi | — | 1 | 5 |
| wardrobe | 1 | 512Mi | — | 1 | 5 |
| commerce | 1 | 512Mi | — | 1 | 3 |
| vto | 4 | 16Gi | L4/T4 | 0 | 5 |
| headless-cartprep | 2 | 1Gi | — | 0 | 3 |
| hyperbeam-bridge | 1 | 512Mi | — | 1 | 5 |
| artifacts-audit | 1 | 512Mi | — | 1 | 3 |
> [!NOTE]
> Set VTO `min-instances=0` with `VTO_INFERENCE_BACKEND=remote` to avoid paying for idle GPU time. Use `min-instances=1` only if you need a warm instance to avoid cold-start latency.
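The trade-off behind that note is easy to quantify: a warm instance bills around the clock whether or not jobs arrive. A back-of-the-envelope sketch, using a purely hypothetical hourly rate (check current Cloud Run GPU pricing for your region):

```python
# ASSUMPTION: the rate below is illustrative only, not real pricing.
ASSUMED_L4_RATE_PER_HOUR = 0.70  # USD/hour, hypothetical

def monthly_idle_cost(min_instances: int, rate_per_hour: float, hours: float = 730.0) -> float:
    """Cost of keeping warm GPU instances around even when no jobs run."""
    return min_instances * rate_per_hour * hours
```

With `min-instances=0` the idle cost is zero by construction; with `min-instances=1` you pay the full month's rate regardless of traffic, which is only worth it when cold-start latency is unacceptable.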
5. Cloud Run Deployment
Use the included deployment scripts:
```bash
# Deploy all backend services
bash scripts/deploy_backend_cloudrun.sh

# Deploy the frontend
bash scripts/deploy_frontend_cloudrun.sh
```

Manual single-service deployment:
```bash
# Build and push Docker image
docker build -t gcr.io/PROJECT_ID/anyaself-orchestrator:latest \
  -f services/orchestrator/Dockerfile services/orchestrator
docker push gcr.io/PROJECT_ID/anyaself-orchestrator:latest

# Deploy to Cloud Run
gcloud run deploy orchestrator \
  --image=gcr.io/PROJECT_ID/anyaself-orchestrator:latest \
  --region=us-central1 \
  --port=8003 \
  --allow-unauthenticated
```

6. Observability
Structured Logging
Every service emits structured JSON logs via Python's logging module. Each log entry includes:
```json
{
  "event": "http_request",
  "service": "anyaself-api-gateway",
  "requestId": "req_abc123",
  "method": "POST",
  "path": "/api/v1/households/h1/orchestrator/missions/chat",
  "statusCode": 200,
  "latencyMs": 245.5
}
```

`X-Request-Id` headers are propagated across all inter-service calls for distributed tracing.
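Emitting such a log line with the stdlib `logging` module might look like the following sketch (the field names match the example above; the helper itself is illustrative, not the services' actual code):

```python
import json
import logging
import sys

logger = logging.getLogger("anyaself")
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.INFO)

def log_request(service: str, request_id: str, method: str, path: str,
                status_code: int, latency_ms: float) -> str:
    """Emit one structured JSON log line and return it (returned for testing)."""
    entry = json.dumps({
        "event": "http_request",
        "service": service,
        "requestId": request_id,
        "method": method,
        "path": path,
        "statusCode": status_code,
        "latencyMs": latency_ms,
    })
    logger.info(entry)
    return entry
```

Keeping the whole record as one JSON object per line is what lets Cloud Logging parse the fields automatically.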
Health Checks
Every service exposes `GET /health`:

```json
{ "status": "ok", "service": "anyaself-orchestrator" }
```

Configure Cloud Run health checks to hit this endpoint.
Recommended Alerts
| Alert | Condition | Severity |
|---|---|---|
| Service down | Health check fails 3 consecutive times | Critical |
| High latency | p95 latency > 5s on /chat endpoint | Warning |
| Error rate | 5xx rate > 5% in 5-minute window | Critical |
| VTO queue depth | Queued jobs > 50 | Warning |
| Purchase failure | FAILED purchase requests > 3/hour | Critical |
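The "service down" condition in the table above (three consecutive health-check failures) reduces to a small probe loop. A sketch with an injected check function so it can be tested without a network (names are illustrative):

```python
from typing import Callable

def is_down(probe: Callable[[], bool], attempts: int = 3) -> bool:
    """Alert only after `attempts` consecutive health-check failures.

    `probe` returns True when GET /health answers with status "ok".
    """
    failures = 0
    for _ in range(attempts):
        if probe():
            return False  # any success resets the consecutive-failure count
        failures += 1
    return failures >= attempts
```

Requiring consecutive failures, rather than alerting on the first one, filters out one-off probe timeouts during deploys.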
7. Rollback Strategy
Cloud Run maintains previous revisions. To roll back:
```bash
# List revisions
gcloud run revisions list --service=orchestrator

# Route 100% traffic to a previous revision
gcloud run services update-traffic orchestrator \
  --to-revisions=orchestrator-00005-abc=100
```

8. Production Validation
After deployment, run the production readiness checker:

```bash
python ops/validate_production_readiness.py
```

This validates environment configuration, service connectivity, and security settings.
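Structurally, such a checker is a list of named checks that each return pass/fail, run to completion so one failure does not hide the others. A simplified sketch (the real script's internals may differ):

```python
import os
from typing import Callable

def run_checks(checks: dict[str, Callable[[], bool]]) -> dict[str, bool]:
    """Run every readiness check, never aborting early, and report results."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a crashing check counts as a failure
    return results

# Illustrative checks mirroring the pre-deployment checklist:
example_checks = {
    "jwt_secret_set": lambda: len(os.environ.get("AUTH_JWT_SECRET", "")) >= 32,
    "firestore_backend": lambda: os.environ.get("PERSISTENCE_BACKEND") == "firestore",
}
```

Exiting nonzero when any result is `False` lets the checker gate a CI/CD pipeline.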