Autoscaling

Scale replicas up and down based on live CPU and memory. Pay for what you use, not what you provisioned.

The Scaling agent watches your replicas' live CPU and memory usage and scales them up under load, back down when traffic drops. Turn it on for any web or worker deployment and the agent owns replica count from then on.

Enable it from the deployment Settings tab at app.ownkube.io/dashboard/deployment/:deploymentId.

How it works

The Scaling agent watches your containers' CPU and memory usage. When average usage rises above a target threshold, the agent adds replicas; when usage drops back below the threshold, it removes them. Example output: "Traffic up 2.4x in 5 min. Scaled api-gateway to 3 replicas. ETA: 12s."

CPU-based

Scale up when average CPU exceeds your target (e.g. 70%).

Memory-based

Scale up when average memory exceeds your target (e.g. 80%).

You can set one signal, both, or neither. When both are set, the Scaling agent acts on whichever crosses its target first.
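The two-signal rule can be sketched roughly as follows. This is a simplified illustration, not the Scaling agent's actual logic: the function name `scaling_decision` is hypothetical, and the scale-down condition (every configured signal below its target) is an assumption, since the page only states when scale-up triggers.

```python
from typing import Optional


def scaling_decision(
    cpu_pct: float,
    mem_pct: float,
    target_cpu: Optional[float] = None,
    target_mem: Optional[float] = None,
) -> str:
    """Return 'up', 'down', or 'hold' for the given average utilization.

    A target of None means that signal is not configured.
    """
    signals = []
    if target_cpu is not None:
        signals.append((cpu_pct, target_cpu))
    if target_mem is not None:
        signals.append((mem_pct, target_mem))

    if not signals:
        return "hold"  # neither signal configured: nothing can trigger
    if any(usage > target for usage, target in signals):
        return "up"  # whichever signal exceeds its target triggers scale-up
    if all(usage < target for usage, target in signals):
        return "down"  # assumption: scale down only when every signal is low
    return "hold"


print(scaling_decision(85, 40, target_cpu=70, target_mem=80))  # CPU over target
print(scaling_decision(30, 90, target_cpu=70, target_mem=80))  # memory over target
print(scaling_decision(30, 40, target_cpu=70, target_mem=80))  # both under target
```

A production scaler would normally add a tolerance band and a cooldown so replica count doesn't flap around the threshold; those are left out here for brevity.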

What you configure

  • Min replicas: the floor. The Scaling agent won't go below this number.
  • Max replicas: the ceiling. Capped at 100 per deployment.
  • Target CPU utilization: percentage of the CPU request (e.g. 70 means scale up once containers average 70% of their CPU request).
  • Target memory utilization: percentage of the memory request.

Tip

Resource requests matter when autoscaling is on. The target utilization is measured against the request, so an honest request gives you honest scaling. Set your request to your app's steady-state floor, not a wishful peak.
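To see why the request matters, here is a minimal sketch of the utilization arithmetic. The helper names are hypothetical, and the unit handling assumes the common convention that "250m" means 250 millicores and a bare number means whole cores.

```python
def parse_cpu_millicores(value: str) -> int:
    """Convert a CPU quantity such as '250m' or '0.5' to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return round(float(value) * 1000)


def utilization_pct(usage_m: float, request_m: float) -> float:
    """Usage as a percentage of the request, both in millicores."""
    if request_m <= 0:
        raise ValueError("a positive resource request is required")
    return 100.0 * usage_m / request_m


usage = 175  # millicores actually consumed, averaged across replicas

honest = utilization_pct(usage, parse_cpu_millicores("250m"))
inflated = utilization_pct(usage, parse_cpu_millicores("1"))

print(honest)    # 70.0 -> right at a 70% target
print(inflated)  # 17.5 -> same load looks idle against a 1-core request
```

The same 175m of real load reads as 70% against a 250m request but only 17.5% against a full-core request, so an oversized request delays scale-up.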

A typical setup

For a web service that sees variable traffic through the day:

Setting                      Value
Min replicas                 2
Max replicas                 10
Target CPU utilization       70%
Target memory utilization    80%
CPU request                  250m
Memory request               256Mi

This keeps at least two replicas always warm, bursts up to ten under load, and scales back down once traffic drops.
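As a rough worked example, the total resources this setup requests at its floor and ceiling can be computed directly from the values above. The function name `requested_footprint` is hypothetical.

```python
def requested_footprint(replicas: int, cpu_request_m: int, mem_request_mi: int):
    """Total requested CPU (millicores) and memory (Mi) across all replicas."""
    return replicas * cpu_request_m, replicas * mem_request_mi


CPU_REQUEST_M = 250   # 250m per replica
MEM_REQUEST_MI = 256  # 256Mi per replica

floor_cpu, floor_mem = requested_footprint(2, CPU_REQUEST_M, MEM_REQUEST_MI)
peak_cpu, peak_mem = requested_footprint(10, CPU_REQUEST_M, MEM_REQUEST_MI)

print(floor_cpu, floor_mem)  # 500 512   -> 0.5 core, 512Mi at min replicas
print(peak_cpu, peak_mem)    # 2500 2560 -> 2.5 cores, 2560Mi at max replicas
```

Your cluster needs room for the peak figures if the deployment is ever going to reach its max replica count.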

When the Scaling agent takes over

Once autoscaling is enabled, the fixed replica count field is ignored. The Scaling agent owns replica count from that point forward. Turn autoscaling off to return to a manual replica count.

Autoscaling is supported on web and worker deployments. Jobs run to completion and don't scale; databases have their own scaling model described on the Databases page.

Cluster-shape differences

On multi-node clusters, autoscaling pairs with node autoscaling. If your deployment scales up and the cluster is out of capacity, new nodes are added automatically. Scaling back down releases nodes you no longer need.

On single-instance clusters, replica count scales within the capacity of your single EC2 instance. If you hit the instance ceiling, resize to a larger instance type from the cluster detail page. See Clusters for the instance size table.
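For the single-instance case, a back-of-the-envelope estimate of the instance ceiling might look like the sketch below. The capacity figures (2000m CPU, 4096Mi memory) are hypothetical placeholders, not values from the Clusters instance size table, and the function name is illustrative.

```python
def replicas_that_fit(
    alloc_cpu_m: int, alloc_mem_mi: int, cpu_request_m: int, mem_request_mi: int
) -> int:
    """How many replicas fit, limited by whichever resource runs out first."""
    if cpu_request_m <= 0 or mem_request_mi <= 0:
        raise ValueError("resource requests must be positive")
    return min(alloc_cpu_m // cpu_request_m, alloc_mem_mi // mem_request_mi)


# Hypothetical capacity available to your workloads on one instance.
fit = replicas_that_fit(2000, 4096, cpu_request_m=250, mem_request_mi=256)

configured_max = 10
effective_max = min(configured_max, fit)

print(fit)            # 8 -> CPU is the limiting resource here
print(effective_max)  # 8 -> a max of 10 can't be reached on this instance
```

If the effective ceiling is lower than your configured max replicas, that is the point at which resizing to a larger instance type becomes relevant.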

On the Scaling agent's roadmap

  • Scale-to-zero for non-production environments during idle periods
  • Schedule-based scaling (e.g. "scale up at 9am on weekdays, down at 7pm")
  • Custom metrics: scale on queue depth, request rate, or app-specific signals
  • Predictive pre-scaling ahead of historical traffic patterns, so you stop hitting cold-start penalties on the morning rush

Limits and constraints

  • Max replicas is capped at 100 per deployment
  • Scale-to-zero isn't supported yet. Minimum is 1 when autoscaling is on; you can pause a deployment to 0 manually.
  • Autoscaling uses resource requests as its baseline. Deployments without requests can't autoscale reliably.
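The constraints above can be expressed as a small pre-flight check. This is a sketch only: `AutoscalingSettings` and `check_settings` are hypothetical names, not part of any Ownkube API, and the min-versus-max check is an added common-sense assumption rather than a documented rule.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_REPLICAS_CAP = 100  # per-deployment cap stated on this page
MIN_REPLICAS_FLOOR = 1  # scale-to-zero isn't supported yet


@dataclass
class AutoscalingSettings:  # hypothetical shape, for illustration only
    min_replicas: int
    max_replicas: int
    cpu_request_m: Optional[int] = None
    memory_request_mi: Optional[int] = None


def check_settings(s: AutoscalingSettings) -> List[str]:
    """Return a list of human-readable problems; empty means it looks fine."""
    problems = []
    if s.min_replicas < MIN_REPLICAS_FLOOR:
        problems.append("min replicas must be at least 1")
    if s.max_replicas > MAX_REPLICAS_CAP:
        problems.append("max replicas is capped at 100")
    if s.min_replicas > s.max_replicas:  # assumption, not documented
        problems.append("min replicas exceeds max replicas")
    if s.cpu_request_m is None or s.memory_request_mi is None:
        problems.append("missing resource requests; autoscaling won't be reliable")
    return problems


print(check_settings(AutoscalingSettings(2, 10, 250, 256)))  # []
print(check_settings(AutoscalingSettings(0, 150)))           # three problems
```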

Don't see a feature you need? Email support@ownkube.io. Ownkube is shaped by the teams using it and we ship what our users ask for.
