Azure Container Apps for MCP Servers: A Production Architecture Guide
By CorpusIQ LLC
The hosting question for MCP servers usually comes up the week after the prototype works.
Kubernetes is overkill for a stateless HTTP service with 50 connectors. Lambda-style FaaS has cold start problems that hurt MCP response times. VMs waste money. Container Apps hits a sweet spot for the workload profile MCP servers actually have: bursty traffic, stateless handlers, connection pooling to external APIs, and a strong preference for predictable latency.
This is the architecture CorpusIQ runs in production on Azure Container Apps. Not the platonic ideal. The one that actually shipped and works.
Why Container Apps, specifically
Three workload characteristics pushed us toward Container Apps over the alternatives.
Bursty but not cold-start sensitive. An MCP server handles periods of high request volume followed by quiet periods. Lambda-style FaaS handles the burstiness but adds cold start penalties. Container Apps keeps minimum replicas warm while scaling up under load.
Stateless request handlers with external state. All state lives in the underlying connector APIs or in your own database. The MCP server itself is pure compute. This maps perfectly to the Container Apps model.
Strong Azure AD integration. JWT validation against Microsoft Entra works without custom code. OAuth token storage in Key Vault works without custom code. Log ingestion into Log Analytics works without custom code.
The alternatives all have merit. AWS ECS Fargate is architecturally similar. Google Cloud Run is the closest analogue on GCP. Kubernetes is right when you need custom networking, stateful workloads, or operator-level control. Pick your platform based on the rest of your stack.
The replica configuration that works
Three settings on the Container App dominate everything else.
Minimum replicas: 1. Not zero. Zero-replica configurations save money but introduce cold starts on the first request after idle. For a user-facing AI product, the first request is the most important, and cold starts on it feel like broken software.
Maximum replicas: start at 10. Most SMB-scale MCP servers never exceed this. The limit is a safety net against runaway scaling.
Scale rule: HTTP concurrency, target 50. Each replica handles up to 50 concurrent requests before Container Apps spins up another. The right number depends on your handler latency. Fifty is a defensible starting point for mixed workloads.
Avoid autoscaling on CPU or memory. The binding constraint is usually external API latency and concurrent connection count, not compute.
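The relationship between the three settings is simple enough to write down. This is back-of-envelope math only, not the platform's exact scaler behavior (Container Apps uses a KEDA-based scaler with its own sampling), but it shows how the concurrency target and the replica bounds interact:

```python
import math

def replicas_needed(concurrent_requests, target_concurrency=50,
                    min_replicas=1, max_replicas=10):
    """Approximate replica count under an HTTP concurrency scale rule.

    A rough model, not Container Apps' actual algorithm: divide load by
    the per-replica concurrency target, then clamp to the configured
    replica bounds.
    """
    wanted = math.ceil(concurrent_requests / target_concurrency)
    return max(min_replicas, min(max_replicas, wanted))
```

At zero load the floor of 1 keeps a warm replica; at 120 concurrent requests the target of 50 implies 3 replicas; past 500 concurrent requests the maximum of 10 caps runaway scaling.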
The revision strategy
Container Apps supports single-revision and multiple-revision modes. For production MCP servers, run multiple-revision mode with blue-green deployment via traffic splitting; single-revision mode cannot split traffic between revisions.
The pattern: deploy a new revision at 0% traffic. Run smoke tests against the new revision's direct URL. Shift 10% of traffic to the new revision. Watch error rates for 30 minutes. Shift to 100%. Deactivate the old revision.
This works because Container Apps can hold multiple revisions live simultaneously and split traffic between them at the ingress layer. Rollbacks are instant.
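The promote-or-rollback decision during the 10% window reduces to a simple gate. A minimal sketch of that logic, assuming per-minute error-rate samples from your monitoring system and reusing the 2% threshold this guide adopts for alerting (the function name and inputs are illustrative, not a real API):

```python
def canary_verdict(error_samples, threshold=0.02):
    """Decide what to do with a canary revision serving 10% of traffic.

    `error_samples` is a list of per-minute error rates observed during
    the watch window. Any breach of the threshold means roll back; a
    clean window means promote to 100%.
    """
    if not error_samples:
        return "hold"          # no data yet: keep waiting
    if max(error_samples) > threshold:
        return "rollback"      # shift traffic back to the old revision
    return "promote"           # shift the new revision to 100%
```

Because both revisions stay live behind the ingress layer, "rollback" here is just another traffic-weight change, which is why it is instant.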
Secret management without leaks
MCP servers carry a lot of secrets. OAuth client IDs and client secrets for every connector. JWT signing keys. Database connection strings. API keys for embedded LLM inference.
Azure Key Vault is the right home for all of them. Container Apps has native Key Vault integration through managed identity.
Create a user-assigned managed identity for the Container App. Grant that identity Key Vault Secrets User role on your Key Vault. In the Container App configuration, reference secrets by Key Vault URI, not by value. Container Apps retrieves the secrets at runtime using the managed identity.
Three gotchas to avoid. First, never put secrets in environment variables as plaintext; they show up in portal logs, deployment configs, and any debugging output. Second, rotate the Key Vault secrets, not the Container App configuration; when a secret rotates, only Key Vault changes. Third, use separate Key Vaults for dev and prod; a single Key Vault shared across environments means a dev misconfiguration can leak prod secrets.
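The "reference secrets by Key Vault URI" step hinges on the URI shape. A small helper makes it concrete (the vault and secret names are hypothetical; the URI format itself is Key Vault's standard one):

```python
def key_vault_secret_uri(vault_name, secret_name, version=None):
    """Build the Key Vault URI a Container App secret reference points at.

    Omitting the version makes the reference track the latest secret
    version, which is what makes rotation work: rotate in Key Vault and
    the app picks up the new value the next time the secret is resolved
    (for example, on revision restart), with no config change.
    """
    uri = f"https://{vault_name}.vault.azure.net/secrets/{secret_name}"
    return f"{uri}/{version}" if version else uri
```

Pinning a specific version is occasionally useful for reproducing an incident, but day to day you want the unversioned form so rotation stays a Key-Vault-only operation.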
Observability that actually works
Application Insights for performance telemetry. Every MCP tool call has latency, error rate, and outcome. Application Insights captures this automatically when you enable it on the Container App.
Log Analytics for audit events. Structured AUDIT events written to the container's console flow into the ContainerAppConsoleLogs_CL table automatically. Parse them with KQL.
Alerts on three specific conditions. Replica count at maximum (scale ceiling hit). Error rate above 2% sustained for 5 minutes (real outage). Authentication failure rate spike (possible credential compromise or provider outage).
Skip noise-heavy alerts in early production. Request volume alerts produce false positives. Start with the three above.
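Azure Monitor evaluates the "error rate above 2% sustained for 5 minutes" condition for you; this sketch only exists to pin down the semantics, since "sustained" is the part teams most often get wrong (a single bad minute should not page anyone):

```python
def should_alert(per_minute_error_rates, threshold=0.02, window=5):
    """Fire only when the error rate stays above the threshold for
    `window` consecutive minutes. A single spike resets nothing on its
    own; the streak must survive the whole window."""
    streak = 0
    for rate in per_minute_error_rates:
        streak = streak + 1 if rate > threshold else 0
        if streak >= window:
            return True
    return False
```

Five straight minutes above 2% fires; four bad minutes followed by a clean one does not. That reset-on-recovery behavior is what keeps this alert out of the noise-heavy category.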
The retention problem
Log Analytics default retention is 30 days. For compliance, 30 days is not enough. SOC 2 Type 2 expects at least one year. FINRA expects seven years. HIPAA expects six years.
Two approaches work. The first is to extend Log Analytics retention, which supports up to 730 days; cost scales linearly with ingestion volume and retention period. The second is to export to cold storage nightly: configure Log Analytics to export audit logs to Azure Storage via Diagnostic Settings, keep 90 days in hot Log Analytics, and keep seven years in cold Storage, at roughly 5% of the hot-retention cost. Note that 730 days still falls short of the FINRA and HIPAA horizons above, so regulated deployments end up on the export approach anyway.
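The cost gap between the two approaches is easy to sanity-check with steady-state arithmetic. The unit prices below are illustrative placeholders, not quoted Azure rates (check the current pricing pages), and the ingestion volume is an assumed example:

```python
# Placeholder unit prices -- illustrative assumptions, not quoted rates.
HOT_RETENTION_PER_GB_MONTH = 0.10   # Log Analytics extended retention
COLD_STORAGE_PER_GB_MONTH = 0.005   # Blob cool/archive tier

def monthly_retention_cost(gb_ingested_per_month, months_retained, price):
    # Steady state: data under retention ~= monthly ingestion * period.
    return gb_ingested_per_month * months_retained * price

INGEST_GB = 50  # assumed monthly audit-log ingestion

hot_only = monthly_retention_cost(INGEST_GB, 84, HOT_RETENTION_PER_GB_MONTH)
hybrid = (monthly_retention_cost(INGEST_GB, 3, HOT_RETENTION_PER_GB_MONTH)
          + monthly_retention_cost(INGEST_GB, 84, COLD_STORAGE_PER_GB_MONTH))
```

Under these assumptions the hybrid (90 days hot plus seven years cold) lands around a tenth of the all-hot cost; the exact ratio depends entirely on your ingestion volume and the tier pricing in effect, which is why the "roughly 5%" figure is an order-of-magnitude claim, not a quote.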
Networking decisions
Two defensible patterns.
Public ingress, consumption-only plan. The MCP server is publicly reachable. JWT authentication gates every request. Simple, cheap, fast to deploy. Works for SMB-tier customers.
Internal ingress, dedicated plan, peered with customer VNet. For enterprise customers who require private connectivity. Significantly more complex but required for some regulated deployments.
Do not build the private networking architecture before you have a customer paying for it.
The deployment pipeline
GitHub Actions with Azure's container-apps-deploy-action works for most teams. The pipeline we run: on PR, test and build; on merge to main, deploy to staging, run end-to-end tests, then deploy to production at 0% traffic, shift 10%, wait 30 minutes, shift to 100%. The whole pipeline takes 15 to 20 minutes from merge to production.
The cost envelope
For a production MCP server serving low-hundreds of active SMB users: Container Apps consumption runs roughly $200 to $400 per month. Log Analytics ingestion roughly $150 to $300 per month. Key Vault operations under $10 per month. Container Registry under $20 per month. Application Insights roughly $100 to $200 per month.
Call it $600 to $1,000 per month for the hosting and observability stack. Scales roughly linearly with user count up to the mid-thousands.
The choice that matters
If you are picking infrastructure for an MCP server today, pick the platform your team already knows. Azure, AWS, and GCP all have good container platforms. The architectural patterns transfer.
The actual production concerns (observability, secret management, retention, scaling, deployment discipline) are shared across every platform. Invest there.