TemperStack
Intermediate · 12 min read · Updated Mar 18, 2026

How to set up horizontal pod scaling on DigitalOcean

Quick Answer

Horizontal Pod Autoscaler (HPA) on DigitalOcean Kubernetes automatically scales your application pods based on CPU, memory, or custom metrics. Enable the metrics server, create an HPA resource with target metrics, and Kubernetes will automatically add or remove pods to meet demand.

Prerequisites

  1. Active DigitalOcean Kubernetes cluster
  2. kubectl configured for your DOKS cluster
  3. Basic understanding of Kubernetes pods and deployments
  4. Application already deployed to the cluster
Step 1: Verify the metrics server is enabled

Connect to your DigitalOcean Kubernetes cluster and check if the metrics server is running:

kubectl get deployment metrics-server -n kube-system

If it is not found, enable it from the DigitalOcean control panel under Kubernetes > Your Cluster > Settings > Add-ons and turn on Metrics Server.
Tip
The metrics server is required for HPA to collect CPU and memory metrics from your pods.
Step 2: Configure resource requests for your deployment

Edit your deployment to include CPU and memory resource requests, which are required for HPA:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:latest
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi

Apply the changes: kubectl apply -f deployment.yaml
Tip
Resource requests are mandatory for utilization-based scaling: the HPA computes CPU and memory utilization as a percentage of the values you request, so a pod without requests reports no utilization at all.
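Because utilization targets are measured against requests (not limits), it is worth checking what a 70% target means for the 100m CPU request above. A minimal sketch of that arithmetic, with illustrative numbers (the real calculation is performed by the HPA controller, not by you):

```python
# HPA "Utilization" targets are percentages of the container's *request*,
# not its limit. With requests.cpu = 100m, a 70% target means the HPA
# starts adding pods once average usage across pods exceeds about 70m.

def utilization_percent(usage_millicores: float, request_millicores: float) -> float:
    """Average utilization as the HPA reports it: usage / request * 100."""
    return usage_millicores / request_millicores * 100

# Pods averaging 85m of CPU against a 100m request:
print(utilization_percent(85, 100))  # 85.0 -> above a 70% target, so scale up
```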
Step 3: Create the Horizontal Pod Autoscaler

Create an HPA resource file hpa.yaml:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Apply the HPA: kubectl apply -f hpa.yaml
Tip
Start with conservative scaling limits and adjust based on your application's performance and cost requirements.
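To reason about what this HPA will actually do, it helps to know the documented scaling rule: desired replicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within a tolerance band (0.1 by default), then clamped to minReplicas/maxReplicas. A sketch of that rule with the tutorial's values of 2-10 replicas and a 70% target; the function name and example numbers are illustrative:

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int = 2,   # matches the example HPA above
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Kubernetes HPA scaling rule:
    desired = ceil(current * currentMetric / targetMetric),
    skipped when the ratio is within the tolerance band,
    then clamped to [minReplicas, maxReplicas]."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:          # close enough: no change
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(2, 140, 70))  # 140% usage vs 70% target -> 4 replicas
print(desired_replicas(4, 72, 70))   # within the 10% tolerance -> stays at 4
```

Note how ceil() rounds up: the HPA prefers slightly too many pods over too few.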
Step 4: Verify HPA status and configuration

Check that your HPA is active and collecting metrics:

kubectl get hpa my-app-hpa

View detailed HPA information:

kubectl describe hpa my-app-hpa

You should see current CPU utilization, target utilization, and current replica count. It may take a few minutes for metrics to appear.
Step 5: Configure memory-based scaling (optional)

Add memory metrics to your HPA for more comprehensive scaling:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

Apply the updated configuration: kubectl apply -f hpa.yaml
Tip
Memory-based scaling helps prevent out-of-memory errors during traffic spikes.
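When an HPA has multiple metrics, it evaluates each metric independently and scales to the largest resulting replica count, so whichever resource is under the most pressure wins. A small sketch of that selection logic with illustrative numbers:

```python
import math

def desired_for_metric(current_replicas: int, current: float, target: float) -> int:
    """Per-metric proposal: ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current / target)

def desired_overall(current_replicas: int, metrics, min_r: int = 2, max_r: int = 10) -> int:
    """With multiple metrics, the HPA takes the LARGEST per-metric
    proposal, then clamps it to [minReplicas, maxReplicas]."""
    proposals = [desired_for_metric(current_replicas, cur, tgt)
                 for cur, tgt in metrics]
    return max(min_r, min(max_r, max(proposals)))

# CPU at 50% (target 70) suggests no growth; memory at 160% (target 80)
# suggests doubling -- the memory proposal wins.
print(desired_overall(3, [(50, 70), (160, 80)]))  # -> 6
```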
Step 6: Test the autoscaling behavior

Generate load on your application to test scaling. Use a load testing tool or create a temporary pod:

kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh

Inside the pod, generate requests:

while true; do wget -q -O- http://my-app-service/; done

Monitor the HPA response: kubectl get hpa my-app-hpa --watch
Tip
Scale-up decisions typically take effect within 15-30 seconds, while scale-down waits out a default 5-minute stabilization window to prevent flapping.
Step 7: Monitor and adjust scaling parameters

Use DigitalOcean's monitoring dashboard to track your cluster performance under Kubernetes > Your Cluster > Insights. Adjust HPA parameters based on observed behavior:

  • Lower averageUtilization for more aggressive scaling
  • Increase maxReplicas if you hit the limit during peak traffic
  • Adjust minReplicas based on baseline traffic requirements

Update your HPA configuration and reapply: kubectl apply -f hpa.yaml
Tip
Review scaling events regularly using kubectl describe hpa to optimize your thresholds.

Troubleshooting

HPA shows 'unknown' for current CPU utilization
Ensure the metrics server is running (kubectl get pods -n kube-system | grep metrics-server) and your deployment has CPU resource requests defined. Wait 2-3 minutes after deployment for metrics to populate.
Pods are not scaling up despite high CPU usage
Check if you've reached the maxReplicas limit and verify your DigitalOcean node pool has sufficient capacity. Use kubectl describe nodes to check resource availability and consider adding more nodes if needed.
Scaling happens too frequently (flapping)
Add stabilization windows under spec.behavior in your HPA manifest:
behavior:
  scaleUp:
    stabilizationWindowSeconds: 60
  scaleDown:
    stabilizationWindowSeconds: 300

This prevents rapid scaling changes.
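The scale-down stabilization window works by holding the pod count at the highest replica recommendation seen inside the window, so a brief dip in load cannot remove pods immediately. A minimal sketch of that documented behavior (the function name and numbers are illustrative):

```python
def stabilized_replicas(window_recommendations):
    """During scale-down, the HPA applies the HIGHEST replica
    recommendation computed inside the stabilization window,
    so pods are only removed once load has stayed low for the
    whole window."""
    return max(window_recommendations)

# Recommendations over a 300s window: load dipped briefly, but the
# window keeps the deployment at its recent high-water mark.
print(stabilized_replicas([6, 4, 3, 5]))  # -> 6
```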
HPA cannot scale below minimum replicas during low traffic
This is expected behavior. If you want fewer replicas during off-peak hours, adjust minReplicas on a schedule (for example, with a CronJob that patches the HPA). Note that Vertical Pod Autoscaler (VPA) resizes resource requests rather than replica counts, so on its own it will not reduce the number of pods.
