Introduction
Last month, I published “Auto-Scaling ComfyUI-API and ComfyUI: Orchestrating GPU Workloads with Azure Kubernetes Service and KEDA”. That post focused on cluster-level control, event-based autoscaling, and production-grade GPU orchestration.
After reviewing the official GPU workload guidance for Azure Container Apps, I evaluated whether the same ComfyUI and ComfyUI-API containers could run effectively in a fully managed, serverless container model—without owning cluster operations.
I implemented and validated this approach over several days. The result is a clean deployment model for running ComfyUI and a companion ComfyUI-API backend on Azure Container Apps with GPU-enabled workload profiles.
The architecture removes VM lifecycle management, node pool configuration, and CUDA driver maintenance from your responsibility. You package ComfyUI and the API as Docker images, push them to a container registry, and deploy them to Azure Container Apps configured with GPU profiles. Azure provisions NVIDIA GPUs on demand and scales replicas based on traffic. When idle, the environment can scale to zero, eliminating unnecessary GPU cost.
This produces a production-ready, HTTP-accessible ComfyUI platform that:
- Exposes REST endpoints for automated image generation through ComfyUI-API
- Utilises NVIDIA T4-class GPUs for inference acceleration
- Scales dynamically based on workload
- Scales to zero when inactive to control cost
This guide focuses on the concrete deployment steps and the API-level submission required to run ComfyUI-API/ComfyUI on Azure Container Apps with GPU support for image-generation workloads.
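Before the portal walkthrough, here is what that API-level submission looks like from code. The sketch below is a minimal Python example and assumes the comfyui-api convention of a POST /prompt endpoint that takes a ComfyUI workflow graph plus an optional webhook URL; the exact request schema is documented on the /docs page of your own deployment, so treat the field names here as placeholders.

```python
import requests

# Base URL of the deployed Container App (placeholder; use your own FQDN).
BASE_URL = "https://<your-aca-name>.westus2.azurecontainerapps.io"

# A ComfyUI workflow exported in API format ("Save (API Format)" in the
# ComfyUI editor). Left empty here as a placeholder.
workflow: dict = {}

# Assumed body shape: workflow under "prompt", results delivered to "webhook".
payload = {
    "prompt": workflow,
    "webhook": "https://webhook.site/<your-webhook-id>",
}

# Generous timeout: with scale-to-zero, the first request after idle waits
# for a GPU replica to cold-start.
resp = requests.post(f"{BASE_URL}/prompt", json=payload, timeout=600)
resp.raise_for_status()
print(resp.json())
```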
Create an Azure Container App (ACA) in the Azure Portal
Go to https://portal.azure.com and sign in with your Azure account.
Step 1:
(1) Select your Azure resource group
(2) Enter your ACA name
(3) Choose the region where you want to host your ACA
Step 2:
(1) Enter the registry server (here, ghcr.io) and the image and tag (here, ghcr.io/thangchung/agent-engineering-experiment/comfyui-api:qwenvl-1)
(2) Choose GPU as the workload profile, with GPU type Consumption - GPU NC8as-T4
Step 3:
(1) Enable ingress
(2) Accept traffic from anywhere (fine here, since this is a dev/test setup)
(3) Set the target port to 3000, because the container listens on port 3000
Step 4:
Finally, click the Create button and wait a bit for the ACA to provision.
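If you would rather script the four steps above than click through the portal, the same deployment can be driven from the Azure CLI. The sketch below wraps it in Python; the resource names are placeholders, and it assumes the containerapp CLI extension and that the serverless Consumption-GPU-NC8as-T4 workload profile is available in your region.

```python
import subprocess

RG = "my-resource-group"   # placeholder resource group
ENV = "my-aca-env"         # placeholder Container Apps environment
APP = "comfyui-api"        # placeholder app name
IMAGE = "ghcr.io/thangchung/agent-engineering-experiment/comfyui-api:qwenvl-1"

def az(*args: str) -> None:
    """Run an Azure CLI command and fail fast on errors."""
    subprocess.run(["az", *args], check=True)

# Attach a serverless GPU workload profile to the environment
# (assumed profile type matching the portal's Consumption - GPU NC8as-T4).
az("containerapp", "env", "workload-profile", "add",
   "--name", ENV, "--resource-group", RG,
   "--workload-profile-name", "gpu-t4",
   "--workload-profile-type", "Consumption-GPU-NC8as-T4")

# Create the app: external ingress on port 3000, scale-to-zero enabled.
az("containerapp", "create",
   "--name", APP, "--resource-group", RG, "--environment", ENV,
   "--image", IMAGE,
   "--ingress", "external", "--target-port", "3000",
   "--workload-profile-name", "gpu-t4",
   "--min-replicas", "0", "--max-replicas", "1")
```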
Test it out
Go to https://<your aca name>.westus2.azurecontainerapps.io/docs, and you should see the interactive API documentation:
Enter a prompt (for example: “Happy New Year of the Fire Horse! The image depicts a white horse leaping into the sky.”), as shown in the picture below.
Then click the Execute button and wait a moment. Go to https://webhook.site/1ec2e965-3d92-4e2d-9714-ce7368b80995 to see the incoming webhook requests, pick the base64 payload, and convert it to a picture. You will see:
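If you would rather not paste the base64 string into an online converter, a few lines of Python will decode it to a file. The field name below is an assumption; inspect the actual webhook body on webhook.site to see where the base64 image lives.

```python
import base64
import json

# Paste the raw webhook request body from webhook.site into this file.
with open("webhook_body.json") as f:
    body = json.load(f)

# Field name is an assumption; check the actual payload structure.
image_b64 = body["images"][0]

# Some payloads prefix a data URI; strip it if present.
if image_b64.startswith("data:") and "," in image_b64:
    image_b64 = image_b64.split(",", 1)[1]

with open("output.png", "wb") as f:
    f.write(base64.b64decode(image_b64))

print("Saved output.png")
```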
Bonus: .NET Aspire on ACA
(1) On the Overview page, you can enable the Aspire Dashboard.
You can now see the Resources, Console logs, Traces, and Metrics of the ACA resources you run.
How cool is that? <3
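As a side note, the dashboard can also be enabled without the portal. A sketch, assuming the dotnet-component command group of the containerapp CLI extension:

```python
import subprocess

RG = "my-resource-group"   # placeholder
ENV = "my-aca-env"         # placeholder

# Assumed CLI surface: creating an AspireDashboard .NET component on the
# Container Apps environment enables the dashboard shown above.
subprocess.run(
    ["az", "containerapp", "env", "dotnet-component", "create",
     "--name", "aspire-dashboard",
     "--environment", ENV,
     "--resource-group", RG,
     "--type", "AspireDashboard"],
    check=True,
)
```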
Conclusion
This experiment confirms that ComfyUI and ComfyUI-API run reliably on Azure Container Apps with GPU-enabled workload profiles. The containers deploy cleanly, GPU allocation is handled transparently, ingress is straightforward, and scale behaviour aligns with real image generation demand. There is no cluster management, no node pool tuning, and no driver maintenance. The operational surface area is significantly smaller than a Kubernetes-based approach while still delivering GPU acceleration, HTTP exposure, and scale-to-zero cost control.
For teams that need GPU-backed inference without owning orchestration complexity, this model is viable and production-ready. The trade-off is reduced low-level control compared to AKS, but the gain in simplicity and operational efficiency is substantial.
Happy Lunar New Year 2026. May the Year of the Horse bring speed, endurance, and decisive execution.