Autoscaling

Scale replicas up and down based on live CPU and memory. Pay for what you use, not what you provisioned.

The Scaling agent watches your replicas' live CPU and memory usage and scales them up under load, back down when traffic drops. Turn it on for any web or worker deployment and the agent owns replica count from then on.

Enable it from the deployment Settings tab at app.ownkube.io/dashboard/deployment/:deploymentId.

How it works

The Scaling agent watches your containers' CPU and memory usage. When average usage crosses a target threshold, the agent adds replicas; when usage drops below the threshold, it removes them. Example output: "Traffic up 2.4x in 5 min. Scaled api-gateway to 3 replicas. ETA: 12s."

CPU-based

Scale up when average CPU exceeds your target (e.g. 70%).

Memory-based

Scale up when average memory exceeds your target (e.g. 80%).

You can use one signal, both, or neither. The Scaling agent takes whichever triggers first.

What you configure

Min replicas: the floor. Autoscaler won't go below this number.
Max replicas: the ceiling. Capped at 100 per deployment.
Target CPU utilization: percentage of the CPU request (e.g. 70 means scale up once containers average 70% CPU).
Target memory utilization: percentage of the memory request.

Tip

Resource requests matter when autoscaling is on. The target utilization is measured against the request, so an honest request gives you honest scaling. Set your request to your app's steady-state floor, not a wishful peak.

A typical setup

For a web service that sees variable traffic through the day:

Setting	Value
Min replicas	2
Max replicas	10
Target CPU utilization	70%
Target memory utilization	80%
CPU request	250m
Memory request	256Mi

This keeps at least two replicas always warm, bursts up to ten under load, and scales back down once traffic drops.

Scale-to-zero for non-production environments during idle periods
Schedule-based scaling (e.g. "scale up at 9am on weekdays, down at 7pm")
Custom metrics: scale on queue depth, request rate, or app-specific signals
Predictive pre-scaling ahead of historical traffic patterns, so you stop hitting cold-start penalties on the morning rush

Limits and constraints

Max replicas is capped at 100 per deployment
Scale-to-zero isn't supported yet. Minimum is 1 when autoscaling is on; you can pause a deployment to 0 manually.
Autoscaling uses resource requests as its baseline. Deployments without requests can't autoscale reliably.

Deployments

Resource types and the full config field reference.

Cost optimization

How autoscaling fits into Ownkube's overall cost model.

Don't see a feature you need? Email support@ownkube.io. Ownkube is shaped by the teams using it and we ship what our users ask for.

How it works

CPU-based

Memory-based

What you configure

A typical setup

When the Scaling agent takes over

Cluster-shape differences

On the Scaling agent's roadmap

Limits and constraints

Deployments

Cost optimization

On this page