Reynaldi Wong • Jan 30, 2026

Kubernetes Autoscaling: HPA and VPA

A Quick Look at the Features

What happens if the workload turns out to be too much?

Kubernetes has two features for this:

Vertical Pod Autoscaling (VPA), which scales the resources (CPU and memory) of each pod based on the workload.
Horizontal Pod Autoscaling (HPA), which scales the number of pods (instances of our application) up or down based on the workload.

graph TB
    %% Styles
    classDef base fill:transparent,stroke:#fff,stroke-width:1px,color:#fff
    classDef stress fill:transparent,stroke:#ff6b6b,stroke-width:2px,color:#ff6b6b
    classDef success fill:transparent,stroke:#4ade80,stroke-width:2px,color:#4ade80
    classDef sub fill:transparent,stroke:#444,stroke-width:1px,color:#fff

    %% Horizontal Pod Autoscaling
    subgraph HPA [Horizontal Pod Autoscaling]
        direction TB
        subgraph HPA_Before [" "]
            P1("🔥 Pod (Overloaded)"):::stress
        end
        
        P1 ===>|Scale Out| P2 & P3 & P4

        subgraph HPA_After [" "]
            direction LR
            P2("✅ Pod 1"):::success
            P3("✅ Pod 2"):::success
            P4("✅ Pod 3"):::success
        end
    end

    %% Vertical Pod Autoscaling
    subgraph VPA [Vertical Pod Autoscaling]
        direction TB
        subgraph VPA_Before [" "]
            P5("⚠️ Small Pod"):::stress
        end
        
        P5 ===>|Scale Up| P6

        subgraph VPA_After [" "]
            P6("💪 LARGE POD"):::success
        end
    end
    
    %% Apply Subgraph Styles
    class HPA,HPA_Before,HPA_After,VPA,VPA_Before,VPA_After sub
    
    %% Link Styling
    linkStyle default stroke:#666,stroke-width:2px,color:#888

Deep Dive: Horizontal Pod Autoscaling (HPA)

Horizontal Pod Autoscaling (HPA) is like calling for backup. When your application is getting overwhelmed by traffic, HPA automatically deploys more identical copies (pods) of your application to share the load. When the traffic dies down, it dismisses the extra pods to save resources.

How it works

HPA monitors specific metrics (like CPU usage or custom metrics like request rate). It periodically checks these metrics against the target value you’ve set.

Monitor: The Metrics Server collects CPU and memory usage data for each pod. Custom metrics such as request rate come from a separate metrics adapter.

Calculate: HPA calculates the desired number of replicas using a formula:

desiredReplicas = ⌈ currentReplicas × (currentMetricValue / desiredMetricValue) ⌉

Scale: If the calculation shows a need for more or fewer pods, HPA updates the deployment to scale the replica count.
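
To make the formula concrete, here is a minimal Python sketch of the calculation. The function name and the min/max defaults are made up for illustration. The tolerance parameter mirrors the HPA's default behavior of skipping a scale when the ratio is within 0.1 of 1.0, and the clamping mirrors the minReplicas/maxReplicas fields of an HPA spec. Treat it as a way to play with the numbers, not as the controller's actual code.

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Apply the HPA formula, then clamp to the allowed replica range.

    min_replicas and max_replicas defaults are illustrative assumptions.
    """
    ratio = current_metric / target_metric
    # If usage is already close to the target, leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(3, 90, 60))  # 3 x 1.5 = 4.5, rounded up to 5
print(desired_replicas(5, 20, 60))  # 5 x 0.33 = 1.67, rounded up to 2
print(desired_replicas(4, 63, 60))  # ratio 1.05 is within tolerance, stays 4
```

Note that the result is always rounded up, so HPA errs on the side of having one pod too many rather than one too few.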

When to use it

Stateless Applications: Web servers, API gateways, and microservices that don’t store data locally are perfect candidates.
Variable Traffic: Applications that see spikes during the day and lulls at night.

Deep Dive: Vertical Pod Autoscaling (VPA)

Vertical Pod Autoscaling (VPA) is like giving your existing worker steroids (or a diet plan). Instead of adding more workers, VPA makes the existing workers stronger by giving them more CPU and Memory.

How it works

VPA consists of three main components:

Recommender: It watches the history of resource usage and recommends the ideal CPU and memory requests.
Updater: It checks which running pods have resource requests that are out of line with the recommendation and evicts them so they can be recreated with the new values.
Admission Controller: When a pod is created (or recreated), it intercepts the creation request and overrides the resource requests with the recommended values.
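
The division of labor between the three components can be sketched as a toy model in Python. Everything here is hypothetical: the function names, the 90th-percentile rule, the 15% safety margin, and the 10% eviction threshold are assumptions chosen to keep the example short, and the real VPA components are more elaborate. The point is only to show who does what.

```python
import math

def recommend_request(usage_samples, percentile=90, margin_percent=15):
    """Toy Recommender: a high percentile of past usage plus some headroom."""
    ordered = sorted(usage_samples)
    # Nearest-rank percentile: smallest sample that covers `percentile`% of the data.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return math.ceil(ordered[rank - 1] * (100 + margin_percent) / 100)

def needs_eviction(current_request, recommended, threshold=0.1):
    """Toy Updater: evict only when the request is far from the recommendation."""
    return abs(current_request - recommended) / recommended > threshold

def admit(pod_spec, recommended):
    """Toy Admission Controller: return a copy of the pod spec with the request overridden."""
    patched = dict(pod_spec)
    patched["cpu_request_millicores"] = recommended
    return patched

# CPU usage history in millicores (made-up numbers).
history = [120, 150, 180, 200, 220, 250, 260, 300, 310, 400]
pod = {"name": "api", "cpu_request_millicores": 200}

target = recommend_request(history)
print(target)                                                 # 357
print(needs_eviction(pod["cpu_request_millicores"], target))  # True
print(admit(pod, target))  # {'name': 'api', 'cpu_request_millicores': 357}
```

In this model the pod asks for 200m but the history suggests 357m, so the Updater flags it for eviction and the Admission Controller patches the request when the replacement pod is created.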

Key Considerations

Restart Required: To change the CPU/Memory of a running pod, Kubernetes usually needs to restart it. This can cause brief interruptions if not managed carefully.
Stateful Workloads: VPA is often better for databases or legacy applications that are hard to scale horizontally (difficult to run multiple copies of).
Don’t mix with HPA on CPU/Memory: Using HPA and VPA together on the same metric (e.g., CPU) can cause the two to fight each other. For example, HPA adds pods because CPU is high, while VPA tries to make each pod larger.
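
A small worked example shows why the two interfere. HPA's CPU utilization target is expressed as a percentage of each pod's CPU request, so when VPA changes the request, the utilization HPA sees changes too, even though real usage did not. The numbers and helper names below are made up for illustration.

```python
import math

def utilization_percent(usage_m, request_m):
    """CPU utilization as HPA sees it: usage relative to the pod's request."""
    return 100 * usage_m / request_m

def hpa_desired(current_replicas, utilization, target):
    """Plain HPA formula, without tolerance or min/max clamping."""
    return math.ceil(current_replicas * utilization / target)

usage_m, replicas, target = 450, 3, 60  # each pod uses 450m; HPA targets 60%

before = utilization_percent(usage_m, 500)  # request is 500m
after = utilization_percent(usage_m, 750)   # VPA raised the request to 750m

print(before, hpa_desired(replicas, before, target))  # 90.0 5
print(after, hpa_desired(replicas, after, target))    # 60.0 3
```

With the same real usage, HPA wants 5 pods before VPA acts and 3 pods after. Which answer wins depends on which controller reacts first, which is why the two should not both be driven by CPU or memory.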

Summary: Which one should I use?

Feature	Best For...	Pros	Cons
HPA	Handling traffic spikes in stateless apps	Responsive scaling; High Availability	Potential cold starts; Complex for stateful apps
VPA	Optimizing resource usage for heavy tasks	Efficient resource utilization; Good for legacy apps	Requires pod restart; Slower reaction to spikes