Autoscaling
Scale replicas up and down based on live CPU and memory. Pay for what you use, not what you provisioned.
The Scaling agent watches your replicas' live CPU and memory usage, scales them up under load, and scales them back down when traffic drops. Turn it on for any web or worker deployment and the agent owns replica count from then on.
Enable it from the deployment Settings tab at app.ownkube.io/dashboard/deployment/:deploymentId.
How it works
The Scaling agent compares your containers' average CPU and memory usage against a target threshold. When average usage rises above the threshold, the agent adds replicas; when it falls back below the threshold, the agent removes them. Example output: "Traffic up 2.4x in 5 min. Scaled api-gateway to 3 replicas. ETA: 12s."
CPU-based
Scale up when average CPU exceeds your target (e.g. 70%).
Memory-based
Scale up when average memory exceeds your target (e.g. 80%).
You can use one signal, both, or neither. When both are set, the Scaling agent acts on whichever triggers first.
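The two signals can be sketched as a small decision function. This is an illustration only, not the Scaling agent's actual algorithm, which this page doesn't specify. It assumes a proportional rule similar to the one the Kubernetes Horizontal Pod Autoscaler uses, and it models "whichever triggers first" by taking the larger of the two results. All function and parameter names are hypothetical.

```python
import math
from typing import Optional


def desired_for_signal(current_replicas: int, utilization: float, target: float) -> int:
    """Replicas needed to bring average utilization back to the target.

    Assumption: proportional scaling, ceil(current * utilization / target).
    """
    return math.ceil(current_replicas * utilization / target)


def desired_replicas(
    current_replicas: int,
    cpu_utilization: float,
    memory_utilization: float,
    cpu_target: Optional[float] = 70.0,
    memory_target: Optional[float] = 80.0,
) -> int:
    """Combine both signals; a target of None disables that signal."""
    candidates = []
    if cpu_target is not None:
        candidates.append(desired_for_signal(current_replicas, cpu_utilization, cpu_target))
    if memory_target is not None:
        candidates.append(desired_for_signal(current_replicas, memory_utilization, memory_target))
    # With no signal configured, nothing triggers and the count stays put.
    if not candidates:
        return current_replicas
    # The signal asking for the most replicas wins.
    return max(candidates)
```

Under these assumptions, two replicas averaging 105% CPU and 40% memory give `desired_replicas(2, 105.0, 40.0) == 3`: CPU is over its 70% target and drives the scale-up, while memory alone would not.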
What you configure
- Min replicas: the floor. The Scaling agent won't go below this number. Must be at least 1 while autoscaling is on.
- Max replicas: the ceiling. Capped at 100 per deployment.
- Target CPU utilization: percentage of the CPU request (e.g. 70 means scale up once containers average 70% of their CPU request).
- Target memory utilization: percentage of the memory request (e.g. 80 means scale up once containers average 80% of their memory request).
Tip
Resource requests matter when autoscaling is on. The target utilization is measured against the request, so an honest request gives you honest scaling. Set your request to your app's steady-state floor, not a wishful peak.
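A short worked example of why the request matters. The usage figures below are made up for illustration, and `utilization_percent` is a hypothetical helper, not part of any Ownkube API.

```python
def utilization_percent(usage: float, request: float) -> float:
    """Usage as a percentage of the resource request (same unit for both)."""
    if request <= 0:
        raise ValueError("a positive resource request is required")
    return 100.0 * usage / request


# The same container using 200 millicores, measured against two requests.
honest = utilization_percent(200, 250)     # 80.0: above a 70% target
inflated = utilization_percent(200, 1000)  # 20.0: far below a 70% target
```

With a 250m request the container reads as 80% utilized and crosses a 70% target. With an inflated 1000m request the same load reads as 20%, so the deployment would not scale up until usage more than tripled.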
A typical setup
For a web service that sees variable traffic through the day:
| Setting | Value |
|---|---|
| Min replicas | 2 |
| Max replicas | 10 |
| Target CPU utilization | 70% |
| Target memory utilization | 80% |
| CPU request | 250m |
| Memory request | 256Mi |
This keeps at least two replicas always warm, bursts up to ten under load, and scales back down once traffic drops.
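The effect of the floor and ceiling in this setup can be sketched as a simple clamp. The settings object below mirrors the table; its field names and the `clamp_replicas` helper are hypothetical and do not reflect Ownkube's actual config schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoscalingSettings:
    """Values from the table above; field names are illustrative."""
    min_replicas: int = 2
    max_replicas: int = 10
    target_cpu_percent: int = 70
    target_memory_percent: int = 80
    cpu_request: str = "250m"
    memory_request: str = "256Mi"


def clamp_replicas(wanted: int, settings: AutoscalingSettings) -> int:
    """Keep the replica count the load calls for within [min, max]."""
    return max(settings.min_replicas, min(wanted, settings.max_replicas))


settings = AutoscalingSettings()
quiet_night = clamp_replicas(1, settings)    # held at the floor of 2
normal_day = clamp_replicas(6, settings)     # within bounds, stays 6
traffic_spike = clamp_replicas(14, settings)  # capped at the ceiling of 10
```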
When the Scaling agent takes over
Once autoscaling is enabled, the fixed replica count field is ignored. The Scaling agent owns replica count from that point forward. Turn autoscaling off to return to a manual replica count.
Autoscaling is supported on web and worker deployments. Jobs run to completion and don't scale; databases have their own scaling model described on the Databases page.
Cluster-shape differences
On clusters with node autoscaling, replica autoscaling pairs with it. If your deployment scales up and the cluster is out of capacity, new nodes are added automatically. Scaling back down releases nodes you no longer need.
On single-instance clusters, replica count scales within the capacity of your single EC2 instance. If you hit the instance ceiling, resize to a larger instance type from the cluster detail page. See Clusters for the instance size table.
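For the single-instance case, a rough estimate of the instance ceiling can be sketched as below. The capacity figures are placeholders rather than values from the Clusters instance size table, and the estimate ignores system overhead and any other deployments sharing the instance.

```python
def replicas_that_fit(
    cpu_capacity_m: int,
    memory_capacity_mi: int,
    cpu_request_m: int,
    memory_request_mi: int,
) -> int:
    """How many replicas fit by requests alone; the scarcer resource decides."""
    if cpu_request_m <= 0 or memory_request_mi <= 0:
        raise ValueError("requests must be positive")
    by_cpu = cpu_capacity_m // cpu_request_m
    by_memory = memory_capacity_mi // memory_request_mi
    return min(by_cpu, by_memory)


# Hypothetical instance with 2000m CPU and 4096Mi memory available,
# running replicas that request 250m CPU and 256Mi memory each.
ceiling = replicas_that_fit(2000, 4096, 250, 256)  # 8, limited by CPU
```

In this hypothetical, a Max replicas setting of 10 could not be reached: the instance runs out of CPU at 8 replicas, which is the point where resizing becomes necessary.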
On the Scaling agent's roadmap
- Scale-to-zero for non-production environments during idle periods
- Schedule-based scaling (e.g. "scale up at 9am on weekdays, down at 7pm")
- Custom metrics: scale on queue depth, request rate, or app-specific signals
- Predictive pre-scaling ahead of historical traffic patterns, so you stop hitting cold-start penalties on the morning rush
Limits and constraints
- Max replicas is capped at 100 per deployment
- Scale-to-zero isn't supported yet. Minimum is 1 when autoscaling is on; you can pause a deployment to 0 manually.
- Autoscaling uses resource requests as its baseline. Deployments without requests can't autoscale reliably.
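These constraints can be expressed as a quick pre-flight check. This is a sketch under the limits listed above, with hypothetical names. Ownkube's own validation may differ in both behavior and messages.

```python
from typing import List

MAX_REPLICAS_CAP = 100  # per-deployment cap stated above
MIN_REPLICAS_FLOOR = 1  # scale-to-zero isn't supported yet


def check_autoscaling_settings(
    min_replicas: int, max_replicas: int, has_requests: bool
) -> List[str]:
    """Return a list of problems; an empty list means the settings look fine."""
    problems: List[str] = []
    if min_replicas < MIN_REPLICAS_FLOOR:
        problems.append("min replicas must be at least 1 when autoscaling is on")
    if max_replicas > MAX_REPLICAS_CAP:
        problems.append("max replicas is capped at 100 per deployment")
    if min_replicas > max_replicas:
        problems.append("min replicas cannot exceed max replicas")
    if not has_requests:
        problems.append("set resource requests so utilization has a baseline")
    return problems
```

For example, `check_autoscaling_settings(2, 10, True)` returns an empty list, while `check_autoscaling_settings(0, 150, False)` reports three problems.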
Deployments
Resource types and the full config field reference.
Cost optimization
How autoscaling fits into Ownkube's overall cost model.
Don't see a feature you need? Email support@ownkube.io. Ownkube is shaped by the teams using it and we ship what our users ask for.