What Is Kubernetes Autoscaling? A Guide to HPA, VPA, KEDA, and Cluster Autoscaler (2026)

Kubernetes Autoscaling is the ability to adjust the number of Pods or Nodes automatically as the workload changes. When traffic rises, Kubernetes adds Pods/Nodes; when traffic falls, it removes them. Applications can therefore follow changing traffic levels without provisioning more resources than they need, which improves both reliability and cost efficiency.

Kubernetes offers autoscaling at several levels: HPA (Horizontal Pod Autoscaler), VPA (Vertical Pod Autoscaler), KEDA (Kubernetes Event-Driven Autoscaling), Cluster Autoscaler, and Karpenter. Each one serves a different purpose. This article covers all of them, with working YAML configs.

Why Does Autoscaling Matter?

Without autoscaling, you have to provision resources for peak traffic at all times, so you pay for cloud resources that sit unused during off-peak hours (which can be 60-70% of the time). If you provision too little instead, the application goes down when traffic spikes.

Autoscaling solves this by adding Pods/Nodes automatically when traffic is high; removing Pods/Nodes when traffic is low (saving cloud costs); keeping performance (latency, throughput) steady; sparing on-call engineers from scaling by hand; and absorbing unpredictable traffic (viral content, flash sales).

HPA — Horizontal Pod Autoscaler

HPA is the most commonly used autoscaler in Kubernetes. It increases or decreases the number of Pod replicas based on the metrics you specify (such as CPU, memory, or custom metrics).

HPA with CPU/Memory

# hpa-cpu.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2      # at least 2 Pods (HA)
  maxReplicas: 20     # at most 20 Pods
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # target average CPU at 70% of requests
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80  # target average memory at 80% of requests
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60   # use the lowest recommendation from the last 60 s
      policies:
        - type: Pods
          value: 4                     # add up to 4 Pods per period
          periodSeconds: 60
        - type: Percent
          value: 100                   # or up to 100% of the current count
          periodSeconds: 60
      selectPolicy: Max                # apply whichever policy allows the bigger change
    scaleDown:
      stabilizationWindowSeconds: 300  # use the highest recommendation from the last 5 min
      policies:
        - type: Pods
          value: 1                     # remove 1 Pod per period
          periodSeconds: 60
---
# The Deployment must set resource requests (very important!)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production   # same namespace as the HPA that targets it
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web-app
          image: myapp:latest
          resources:
            requests:        # HPA computes utilization against these requests
              cpu: "250m"    # 0.25 CPU core
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          ports:
            - containerPort: 8080
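HPA's core rule, as documented by Kubernetes, is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), where CPU utilization is measured as a percentage of the Pod's request. The Python sketch below reproduces only that formula, the default 10% tolerance, and the min/max clamp; it deliberately ignores the behavior policies, stabilization windows, and not-ready Pod handling, and the usage figures are made up for illustration.

```python
import math


def cpu_utilization_percent(usage_millicores: float, request_millicores: float) -> float:
    """CPU utilization as HPA sees it: usage relative to the Pod's CPU request."""
    return 100.0 * usage_millicores / request_millicores


def hpa_desired_replicas(current_replicas: int, current_value: float,
                         target_value: float, min_replicas: int,
                         max_replicas: int, tolerance: float = 0.1) -> int:
    """Simplified HPA rule: ceil(currentReplicas * currentValue / targetValue).

    Ignores stabilization windows, scaling policies and not-yet-ready Pods.
    """
    ratio = current_value / target_value
    # Within the tolerance band (10% by default) HPA leaves the replica count alone
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # The result is always clamped to minReplicas..maxReplicas
    return max(min_replicas, min(max_replicas, desired))


# 4 Pods, each using 225m CPU against the 250m request in the Deployment above
util = cpu_utilization_percent(225, 250)
print(util, hpa_desired_replicas(4, util, 70, min_replicas=2, max_replicas=20))  # 90.0 6
```

At 90% utilization against a 70% target, four Pods become ceil(4 × 90 / 70) = 6, while a reading of 72% stays inside the tolerance band and changes nothing.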

HPA with Custom Metrics

# HPA with custom metrics from Prometheus
# (use this instead of web-app-hpa: don't point two HPAs at the same Deployment)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa-custom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 30
  metrics:
    # CPU (baseline)
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Custom metric: requests per second per Pod
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"  # target an average of 100 RPS per Pod
    # External metric: queue length from RabbitMQ
    - type: External
      external:
        metric:
          name: rabbitmq_queue_messages
          selector:
            matchLabels:
              queue: "orders"
        target:
          type: AverageValue
          averageValue: "30"  # target about 30 messages per Pod

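# Prometheus Adapter must be installed (with rules that expose these metric names)
# so that HPA can read them through the custom/external metrics APIs
# helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# helm install prometheus-adapter prometheus-community/prometheus-adapter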
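When an HPA lists several metrics, as web-app-hpa-custom does, Kubernetes computes a replica proposal for each metric and uses the largest one. The sketch below shows that rule with hypothetical readings; it skips the tolerance check from the plain HPA formula and assumes the external metric is a queue total that HPA spreads across Pods (the AverageValue behavior).

```python
import math


def proposal_from_per_pod_average(current_replicas: int, per_pod_average: float,
                                  target_average: float) -> int:
    """Resource/Pods metric: scale current replicas by actual/target per-Pod value."""
    return math.ceil(current_replicas * per_pod_average / target_average)


def proposal_from_total(total_value: float, target_average: float) -> int:
    """External metric with AverageValue: spread the total across Pods."""
    return math.ceil(total_value / target_average)


def combine_proposals(proposals: list[int], min_replicas: int, max_replicas: int) -> int:
    # HPA evaluates every metric and keeps the largest proposal
    return max(min_replicas, min(max_replicas, max(proposals)))


# Hypothetical readings for web-app-hpa-custom with 5 replicas running
proposals = [
    proposal_from_per_pod_average(5, 60, 70),    # CPU 60% vs 70% target
    proposal_from_per_pod_average(5, 150, 100),  # 150 RPS per Pod vs 100
    proposal_from_total(120, 30),                # 120 queued messages vs 30 per Pod
]
print(proposals, combine_proposals(proposals, 2, 30))  # [5, 8, 4] 8
```

The RPS metric wins here, so the HPA would move to 8 replicas even though CPU alone would keep it at 5.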

VPA — Vertical Pod Autoscaler

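VPA adjusts a Pod's resource requests/limits (CPU, memory) automatically. Instead of adding more Pods (horizontal), VPA increases or decreases the resources of each Pod (vertical). VPA is not built into Kubernetes, so its components must be installed separately: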

# vpa.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"  # "Auto" = apply new requests automatically (evicts and recreates Pods)
                        # "Off" = recommend only, change nothing
                        # "Initial" = apply only when new Pods are created
  resourcePolicy:
    containerPolicies:
      - containerName: web-app
        minAllowed:
          cpu: "100m"
          memory: "128Mi"
        maxAllowed:
          cpu: "2"
          memory: "4Gi"
        controlledResources: ["cpu", "memory"]

# View VPA recommendations
# kubectl describe vpa web-app-vpa
# Output (abbreviated):
# Recommendation:
#   Container: web-app
#     Lower Bound:  cpu: 150m, memory: 200Mi
#     Target:       cpu: 350m, memory: 400Mi
#     Upper Bound:  cpu: 800m, memory: 1Gi
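The effect of a recommendation on a Pod can be worked through by hand. Per the VPA documentation, the recommended request is capped by minAllowed/maxAllowed, and by default VPA scales the limit proportionally to keep the original limit:request ratio. The sketch below applies both rules to CPU only; its quantity parser handles just plain cores and the `m` suffix, an assumption made to keep the example short.

```python
def parse_cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '350m' or '2' into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def apply_vpa_cpu_target(request: str, limit: str, target: str,
                         min_allowed: str, max_allowed: str) -> tuple[int, int]:
    """Return (new_request, new_limit) in millicores.

    The target is clamped to minAllowed/maxAllowed, and the limit is scaled so
    that the original limit:request ratio is preserved.
    """
    req, lim = parse_cpu_millicores(request), parse_cpu_millicores(limit)
    low, high = parse_cpu_millicores(min_allowed), parse_cpu_millicores(max_allowed)
    new_request = max(low, min(high, parse_cpu_millicores(target)))
    new_limit = round(lim * new_request / req)
    return new_request, new_limit


# web-app requests 250m / limits 500m; sample Target 350m; policy 100m..2 cores
print(apply_vpa_cpu_target("250m", "500m", "350m", "100m", "2"))  # (350, 700)
```

Because the original limit is twice the request, a 350m request comes with a 700m limit; a target above maxAllowed would be cut to 2000m first.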

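VPA vs HPA: use VPA when the application cannot scale horizontally (e.g., stateful apps, databases) or when you want to right-size resources. Use HPA when the application is stateless and can scale horizontally. A common setup is HPA as the primary scaler plus VPA in "Off" mode for recommendations, because VPA in "Auto" mode and HPA acting on the same CPU/memory metrics would work against each other.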

KEDA — Kubernetes Event-Driven Autoscaling

KEDA is an event-driven autoscaler that extends HPA: it creates and manages an HPA behind the scenes and feeds it metrics from more than 60 scalers for event sources such as Kafka, RabbitMQ, Redis, Prometheus, Cron, AWS SQS, Azure Queue, and many more. KEDA's standout feature is Scale to Zero (reducing Pods to 0 when there are no events), which plain HPA does not support by default.

# Install KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace

KEDA + Kafka

# keda-kafka.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor
  namespace: production
spec:
  scaleTargetRef:
    name: order-processor  # Deployment name
  pollingInterval: 15      # check the trigger every 15 seconds
  cooldownPeriod: 60       # wait 60 s after the trigger was last active before scaling to zero
  minReplicaCount: 0       # Scale to Zero allowed!
  maxReplicaCount: 50
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.production.svc:9092
        consumerGroup: order-processor-group
        topic: orders
        lagThreshold: "100"  # target a consumer lag of about 100 messages per replica
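KEDA's Kafka scaler roughly divides the consumer group's total lag by lagThreshold, and by default it will not run more replicas than the topic has partitions, because extra consumers would sit idle (the scaler's allowIdleConsumers option defaults to false). The sketch below models only those two rules; activation thresholds, the cooldownPeriod, and the hypothetical per-partition lags are simplifications for illustration.

```python
import math


def kafka_desired_replicas(partition_lags: list[int], lag_threshold: int,
                           max_replicas: int, allow_idle_consumers: bool = False) -> int:
    """Approximate KEDA Kafka scaling: total lag / lagThreshold, rounded up."""
    total_lag = sum(partition_lags)
    if total_lag == 0:
        return 0  # no lag: eligible for scale to zero once cooldownPeriod has passed
    desired = math.ceil(total_lag / lag_threshold)
    if not allow_idle_consumers:
        # More consumers than partitions would sit idle, so cap at the partition count
        desired = min(desired, len(partition_lags))
    return min(desired, max_replicas)


# Hypothetical 'orders' topic with 4 partitions and 600 messages of total lag
print(kafka_desired_replicas([300, 50, 0, 250], lag_threshold=100, max_replicas=50))  # 4
```

Even though 600 / 100 suggests 6 replicas, only 4 partitions exist, so maxReplicaCount: 50 is effectively capped by the topic's partition count.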

KEDA + RabbitMQ

# keda-rabbitmq.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: email-sender
spec:
  scaleTargetRef:
    name: email-sender
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
    - type: rabbitmq
      metadata:
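        host: amqp://user:password@rabbitmq.default.svc:5672/  # in production, supply via TriggerAuthentication instead of inline
        queueName: emails
        mode: QueueLength  # replaces the deprecated queueLength field
        value: "50"        # target about 50 messages per replica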

KEDA + Prometheus

# keda-prometheus.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-server
spec:
  scaleTargetRef:
    name: api-server
  minReplicaCount: 1
  maxReplicaCount: 30
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(rate(http_requests_total{service="api-server"}[2m]))
        threshold: "500"  # target about 500 RPS per replica (query result / replicas)

KEDA + Cron (Scheduled Scaling)

# keda-cron.yaml: scale on a schedule
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: batch-processor
spec:
  scaleTargetRef:
    name: batch-processor
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: cron
      metadata:
        timezone: Asia/Bangkok
        start: 0 8 * * *     # start at 08:00 every day
        end: 0 20 * * *      # end at 20:00 every day
        desiredReplicas: "5" # run 5 Pods during this window
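The cron trigger's window is evaluated in the configured timezone: inside [start, end) it asks for desiredReplicas, and outside it the workload falls back to minReplicaCount. The sketch below checks that for the batch-processor example; it handles only daily `M H * * *` expressions, ignores windows that cross midnight and the cooldownPeriod, and uses a fixed UTC+7 offset as a stand-in for Asia/Bangkok (which has no daylight saving).

```python
from datetime import datetime, time, timedelta, timezone

# Asia/Bangkok is UTC+7 with no daylight saving, so a fixed offset stands in here
BANGKOK = timezone(timedelta(hours=7))


def parse_daily_cron(expr: str) -> time:
    """Parse a daily 'M H * * *' expression; other cron forms are out of scope."""
    minute, hour, *rest = expr.split()
    if rest != ["*", "*", "*"]:
        raise ValueError(f"only daily 'M H * * *' schedules are handled: {expr!r}")
    return time(int(hour), int(minute))


def cron_replicas(now: datetime, start: str, end: str,
                  desired: int, min_replicas: int) -> int:
    """desiredReplicas inside [start, end), otherwise fall back to minReplicaCount."""
    local = now.astimezone(BANGKOK).time()
    active = parse_daily_cron(start) <= local < parse_daily_cron(end)
    return desired if active else min_replicas


utc = timezone.utc
# 02:00 UTC is 09:00 in Bangkok (inside the window); 14:00 UTC is 21:00 (outside)
print(cron_replicas(datetime(2026, 1, 5, 2, 0, tzinfo=utc), "0 8 * * *", "0 20 * * *", 5, 0))   # 5
print(cron_replicas(datetime(2026, 1, 5, 14, 0, tzinfo=utc), "0 8 * * *", "0 20 * * *", 5, 0))  # 0
```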

Cluster Autoscaler

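Cluster Autoscaler adjusts the number of Nodes in the cluster automatically. When Pods are created but stay Pending because no Node has enough free resources, Cluster Autoscaler asks the cloud provider to add a new Node. When a Node's utilization stays low (the sum of its Pods' requests is small relative to the Node's allocatable capacity) and its Pods can be moved elsewhere, it removes that Node.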

# Cluster Autoscaler setup (EKS)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
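  template:
    metadata:
      labels:
        app: cluster-autoscaler   # must match spec.selector
    spec:
      serviceAccountName: cluster-autoscaler  # assumes a ServiceAccount with the required RBAC/IAM permissions
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0  # match your Kubernetes minor version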
          command:
            - ./cluster-autoscaler
            - --v=4
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste  # pick the node group that leaves the least idle CPU/memory
            # replace my-cluster with your cluster name
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
            - --scale-down-delay-after-add=5m     # wait 5 minutes after adding a Node before scaling down
            - --scale-down-unneeded-time=5m        # a Node must be unneeded for 5 minutes before removal
            - --scale-down-utilization-threshold=0.5  # removal candidate if requests < 50% of allocatable
            - --max-node-provision-time=15m
            - --balance-similar-node-groups
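The --scale-down-utilization-threshold flag compares requested resources (not live usage) with the Node's allocatable capacity; Cluster Autoscaler computes CPU and memory separately, and both must be below the threshold. The sketch below shows only that check; the Node size and Pod requests are hypothetical, and the other conditions (staying unneeded for --scale-down-unneeded-time, Pods fitting elsewhere) are not modeled.

```python
def node_utilization(pod_requests: list[tuple[int, int]],
                     allocatable_cpu_m: int, allocatable_mem_mi: int) -> float:
    """Requested / allocatable, taking the higher of CPU and memory.

    pod_requests holds (cpu millicores, memory MiB) for each Pod on the Node.
    """
    cpu = sum(c for c, _ in pod_requests) / allocatable_cpu_m
    mem = sum(m for _, m in pod_requests) / allocatable_mem_mi
    return max(cpu, mem)


def is_scale_down_candidate(utilization: float, threshold: float = 0.5) -> bool:
    # Mirrors --scale-down-utilization-threshold only; other removal rules are not modeled
    return utilization < threshold


# Hypothetical Node (1930m CPU / 7000Mi allocatable) running three web-app Pods
pods = [(250, 256)] * 3
u = node_utilization(pods, allocatable_cpu_m=1930, allocatable_mem_mi=7000)
print(round(u, 3), is_scale_down_candidate(u))  # 0.389 True
```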

Karpenter — Just-in-Time Nodes (AWS)

Karpenter is a newer-generation Node provisioner, originally built by AWS, designed as an alternative to Cluster Autoscaler. Its advantages over Cluster Autoscaler:

Feature	Cluster Autoscaler	Karpenter
Provisioning Speed	Minutes (via ASG)	Seconds (calls the EC2 API directly)
Instance Type Selection	Defined in the ASG (fixed)	Picks the best-fitting instance type automatically
Spot Instance	Requires a separate ASG	Mixes On-demand + Spot automatically
Node Consolidation	Only removes underutilized Nodes	Moves Pods to pack them onto fewer or cheaper Nodes
Multi-AZ	Requires one ASG per AZ	Spreads across AZs automatically
Cloud Support	AWS, GCP, Azure	AWS (primary), Azure (beta)

# karpenter-nodepool.yaml (Karpenter v1 API)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand", "spot"]  # use both On-demand and Spot
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values:
            - m5.large
            - m5.xlarge
            - m6i.large
            - m6i.xlarge
            - c5.large
            - c5.xlarge
        - key: "topology.kubernetes.io/zone"
          operator: In
          values: ["ap-southeast-1a", "ap-southeast-1b", "ap-southeast-1c"]
      nodeClassRef:             # points to an EC2NodeClass named "default"
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      expireAfter: 720h         # recycle Nodes every 30 days
  limits:
    cpu: "100"          # no more than 100 CPUs in total
    memory: "400Gi"
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # move Pods and merge Nodes when underused
    consolidateAfter: 1m                           # wait 1 minute before consolidating

HPA + VPA with Goldilocks

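Goldilocks is a tool from Fairwinds that helps right-size resources using VPA recommendations. It creates a VPA in "Off" mode for each workload in the namespaces you label, so the VPA recommender must already be installed in the cluster: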

# Install Goldilocks
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks --namespace goldilocks --create-namespace

# Label the namespace you want to monitor
kubectl label namespace production goldilocks.fairwinds.com/enabled=true

# Open the Goldilocks dashboard
kubectl port-forward -n goldilocks svc/goldilocks-dashboard 8080:80
# Open http://localhost:8080 to see recommendations for every Deployment

# Recommended workflow:
# 1. Install VPA in "Off" mode (recommend only)
# 2. Install the Goldilocks dashboard
# 3. Review the recommendations and adjust resource requests/limits accordingly
# 4. Use HPA for horizontal scaling
# Result: right-sized Pods + autoscaling = optimized cost

Scaling to Zero with KEDA

Scale to Zero is the ability to reduce Pods to 0 when there is no workload. It is a powerful feature for cost optimization:

# Example: a worker that scales to zero when the queue is empty
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: image-processor
spec:
  scaleTargetRef:
    name: image-processor
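  idleReplicaCount: 0    # 0 Pods when there is no work
  minReplicaCount: 1     # at least 1 Pod while there is work
  maxReplicaCount: 10
  cooldownPeriod: 300    # wait 5 minutes before scaling to zero
  triggers:
    - type: aws-sqs-queue  # needs AWS credentials, e.g. a TriggerAuthentication via authenticationRef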
      metadata:
        queueURL: https://sqs.ap-southeast-1.amazonaws.com/123456789/image-queue
        queueLength: "5"
        awsRegion: ap-southeast-1

# When messages arrive in SQS:
# 0 Pods → KEDA detects them → scales to 1 Pod → processes messages
# More messages → scales to 2, 3, ... 10 Pods
# Queue empty → waits 5 minutes → scales to 0 Pods
# This saves a lot of money for batch/event-driven workloads
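The flow above can be modeled as a small function of queue length and idle time for the image-processor ScaledObject. This is a rough sketch, not KEDA's implementation: it assumes HPA sizes the workload to about queueLength messages per Pod while active, and it ignores HPA stabilization and polling delays.

```python
import math


def keda_replicas(queue_length: int, seconds_since_last_message: float, *,
                  target_per_pod: int = 5, min_replicas: int = 1,
                  max_replicas: int = 10, idle_replicas: int = 0,
                  cooldown_seconds: int = 300) -> int:
    """Rough model of the image-processor ScaledObject above."""
    if queue_length > 0:
        # Active: KEDA wakes the workload, then HPA sizes it to ~queueLength per Pod
        wanted = math.ceil(queue_length / target_per_pod)
        return max(min_replicas, min(max_replicas, wanted))
    if seconds_since_last_message < cooldown_seconds:
        return min_replicas   # queue empty, but cooldownPeriod has not elapsed yet
    return idle_replicas      # idle long enough: drop to idleReplicaCount


for queue, idle_for in [(0, 900), (3, 0), (23, 0), (500, 0), (0, 120), (0, 301)]:
    print(queue, idle_for, keda_replicas(queue, idle_for))
```

The sequence prints 0, 1, 5, 10, 1, and 0 replicas: the workload wakes up on the first message, caps at maxReplicaCount, and only returns to zero after the 5-minute cooldown.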

Autoscaling Strategies

Reactive Autoscaling

Scale based on current metrics (CPU, memory, RPS): when a metric crosses its threshold, scale up; when it falls back, scale down.

Pros: simple, easy to understand, and works for most workloads.

Cons: there is lag time (new Pods must start and become Ready first), so it may not keep up when traffic spikes very quickly.

Predictive Autoscaling

Use ML/AI to analyze past traffic patterns and scale ahead of time. For example, if traffic spikes every Monday morning, scale up from 07:30.

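Tools: the KEDA Cron scaler (manual, schedule-based). Upstream Kubernetes HPA has no built-in predictive mode, so model-driven forecasting needs third-party tools such as the open-source Predictive Horizontal Pod Autoscaler project.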
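To make the idea concrete, here is a deliberately naive forecast: average the traffic seen at the same hour in previous weeks, add headroom, and convert it to a replica count that could be fed into a schedule such as a KEDA Cron desiredReplicas. It is not the algorithm of any particular tool; the per-Pod capacity, headroom factor, and traffic history are all assumptions for illustration.

```python
import math
from statistics import mean


def forecast_replicas(same_hour_history_rps: list[float], rps_per_pod: float,
                      headroom: float = 1.2, min_replicas: int = 2,
                      max_replicas: int = 20) -> int:
    """Toy forecast: average the same hour over previous weeks, then add headroom.

    same_hour_history_rps holds, e.g., the Monday 08:00 RPS of the last N weeks.
    """
    expected_rps = mean(same_hour_history_rps) * headroom
    replicas = math.ceil(expected_rps / rps_per_pod)
    return max(min_replicas, min(max_replicas, replicas))


# Hypothetical Monday 08:00 traffic over the past four weeks, ~100 RPS per Pod
print(forecast_replicas([820, 760, 900, 840], rps_per_pod=100))  # 10
```

Real predictive scalers use seasonality-aware models, but the output is used the same way: pre-scale before the spike, and let reactive HPA handle the error.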

Mixed Strategy (recommended)

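Use KEDA Cron for predictable patterns (mornings, weekdays) + HPA-style metric scaling for unexpected spikes + Cluster Autoscaler/Karpenter for node-level scaling. If a single workload needs both schedule and metric scaling, put the cron and metric triggers in the same ScaledObject: KEDA manages its own HPA, so it should not share a workload with a separately defined HPA.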

Pod Disruption Budgets (PDB)

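A PDB sets the minimum number of Pods that must remain available during voluntary disruptions such as node drains, upgrades, maintenance, and node scale-down by Cluster Autoscaler/Karpenter, so those evictions don't leave the service unavailable. It does not limit HPA scale-down, which lowers the replica count directly instead of evicting Pods: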

# pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
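  minAvailable: 2         # at least 2 Pods must always be Available
  # or use maxUnavailable: 1  # evict at most 1 Pod at a time
  # note: with minAvailable: 2 and only 2 replicas, no Pod can be evicted, which blocks node drains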
  selector:
    matchLabels:
      app: web-app
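How many evictions a PDB permits at a given moment follows from its status arithmetic: disruptionsAllowed is the number of currently healthy Pods minus the desired healthy count, never below zero, where the desired count is minAvailable or, with maxUnavailable, the expected Pod count minus maxUnavailable. The sketch below handles integer values only (percentages are out of scope) and shows why minAvailable: 2 blocks drains when web-app sits at its HPA minimum of 2 replicas.

```python
from typing import Optional


def disruptions_allowed(current_healthy: int, expected_pods: int,
                        min_available: Optional[int] = None,
                        max_unavailable: Optional[int] = None) -> int:
    """How many Pods an eviction (drain, node scale-down) may remove right now.

    Integer values only; percentage forms of minAvailable/maxUnavailable are not handled.
    """
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected_pods - max_unavailable
    return max(0, current_healthy - desired_healthy)


# web-app running at its HPA minimum of 2 replicas
print(disruptions_allowed(2, 2, min_available=2))    # 0: node drains would be blocked
print(disruptions_allowed(2, 2, max_unavailable=1))  # 1: one Pod may be evicted at a time
```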

Autoscaling Best Practices

1. Always set resource requests: HPA calculates Utilization as a percentage of requests. Without requests, HPA cannot compute CPU/memory Utilization targets, so those metrics won't drive scaling.

2. Set PDBs: they prevent too many Pods from being evicted at the same time during node scale-down or drains.

3. Use readiness probes: when calculating CPU utilization, HPA sets aside Pods that are not yet Ready (for example, still starting up), so startup CPU spikes don't trigger extra scale-ups.

4. Scale down slowly: set stabilizationWindowSeconds so that scale-down is slower than scale-up, to prevent flapping (scaling up and down repeatedly).

5. Monitor and alert: set alerts for when an HPA hits maxReplicas (you may need to raise the max) and for when Cluster Autoscaler cannot add Nodes (quota limits, instance type unavailable).

6. Test autoscaling: use load testing tools (k6, Locust, Vegeta) to confirm that autoscaling works correctly before going to production.
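Point 4 relies on how HPA stabilization works, as described in the Kubernetes docs: when scaling down, HPA uses the highest recommendation computed within the scale-down window, and when scaling up, the lowest recommendation within the scale-up window. The sketch below applies only that rule with web-app-hpa's windows (60 s up, 300 s down); the Pods/Percent rate-limit policies are not modeled, and the recommendation history is hypothetical.

```python
def stabilized_replicas(history: list[tuple[float, int]], now: float,
                        current_replicas: int, up_window: float = 60,
                        down_window: float = 300) -> int:
    """Apply HPA-style stabilization to raw per-sync recommendations.

    history: (timestamp in seconds, recommended replicas) pairs, including the
    recommendation computed at `now`. Scaling policies are not modeled.
    """
    # Scale up no further than the lowest recommendation in the scale-up window
    up_limit = min(r for t, r in history if now - t <= up_window)
    # Scale down no further than the highest recommendation in the scale-down window
    down_limit = max(r for t, r in history if now - t <= down_window)
    result = current_replicas
    if result < up_limit:
        result = up_limit
    if result > down_limit:
        result = down_limit
    return result


# Load drops: raw recommendations fall from 10 to 3
history = [(0, 10), (100, 4), (200, 3)]
print(stabilized_replicas(history, now=200, current_replicas=10))  # 10: still inside 5 min
history.append((400, 3))
print(stabilized_replicas(history, now=400, current_replicas=10))  # 4
```

A brief dip in load therefore cannot shrink the Deployment until it has persisted for the whole 5-minute window, which is exactly what prevents flapping.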

Testing Autoscaling

# Test HPA with a load generator (web-app-service is an assumed Service in front of web-app)
# 1. Create a load generator Pod
kubectl run loadgen -n production --image=busybox --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://web-app-service; done"

# 2. Watch the HPA status
kubectl get hpa web-app-hpa -n production -w
# NAME          REFERENCE        TARGETS   MINPODS   MAXPODS   REPLICAS
# web-app-hpa   Deployment/...   72%/70%   2         20        5

# 3. Watch Pods being added
kubectl get pods -n production -l app=web-app -w

# 4. Delete the load generator
kubectl delete pod loadgen -n production

# 5. Watch it scale down (at least 5 minutes for the stabilization window, then 1 Pod per minute)
kubectl get hpa web-app-hpa -n production -w

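# Test KEDA (Kafka)
# 1. Send messages to the Kafka topic from inside the cluster (kcat, formerly kafkacat; one message per line)
kcat -P -b kafka.production.svc:9092 -t orders < test_messages.txt

# 2. Watch KEDA scale the Pods
kubectl get pods -n production -l app=order-processor -w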

# Test with k6 (load testing tool)
# k6 run --vus 100 --duration 10m load-test.js
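During a load test, or as a periodic check, it helps to see which HPAs have hit their ceiling, as best practice 5 suggests. The sketch below reads the JSON printed by `kubectl get hpa -A -o json` (for example saved to a file first) and uses the standard autoscaling/v2 fields spec.maxReplicas and status.currentReplicas; the sample input is hypothetical.

```python
import json


def hpas_at_max(hpa_list_json: str) -> list[str]:
    """List 'namespace/name' for every HPA whose currentReplicas reached maxReplicas.

    Expects the JSON printed by `kubectl get hpa -A -o json`.
    """
    hits = []
    for item in json.loads(hpa_list_json).get("items", []):
        current = item.get("status", {}).get("currentReplicas", 0)
        if current >= item["spec"]["maxReplicas"]:
            meta = item["metadata"]
            hits.append(f"{meta.get('namespace', 'default')}/{meta['name']}")
    return hits


# Hypothetical, trimmed-down output for the two HPAs in this article
sample = json.dumps({"items": [
    {"metadata": {"namespace": "production", "name": "web-app-hpa"},
     "spec": {"maxReplicas": 20}, "status": {"currentReplicas": 20}},
    {"metadata": {"namespace": "production", "name": "web-app-hpa-custom"},
     "spec": {"maxReplicas": 30}, "status": {"currentReplicas": 7}},
]})
print(hpas_at_max(sample))  # ['production/web-app-hpa']
```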

Cost Optimization with Autoscaling

Autoscaling can cut cloud costs by roughly 30-60%, depending heavily on the workload. Typical ranges per strategy:

Strategy	Potential savings	How
Scale to Zero (KEDA)	60-80%	Batch jobs and event processors that don't need to run all the time
Right-sizing (VPA/Goldilocks)	20-40%	Reduce over-provisioned resources
Spot Instances (Karpenter)	50-70%	Use Spot instances for stateless workloads
Schedule Scaling (KEDA Cron)	30-50%	Scale down at night/on weekends
Node Consolidation (Karpenter)	15-25%	Pack Pods onto fewer Nodes
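The Schedule Scaling row can be sanity-checked with pod-hour arithmetic. The sketch below compares running the batch-processor's 5 Pods all day with running them only during the 08:00-20:00 cron window. Pod-hours only turn into real savings if the freed capacity also leads to fewer Nodes (via Cluster Autoscaler or Karpenter), so treat the result as an upper bound under that assumption.

```python
def pod_hours(schedule: list[tuple[int, int]]) -> int:
    """schedule: (hours, replicas) segments covering one day."""
    return sum(hours * replicas for hours, replicas in schedule)


def savings_percent(baseline: int, optimized: int) -> float:
    return 100.0 * (baseline - optimized) / baseline


always_on = pod_hours([(24, 5)])           # 5 Pods around the clock
scheduled = pod_hours([(12, 5), (12, 0)])  # KEDA Cron: 5 Pods 08:00-20:00, else 0
print(always_on, scheduled, savings_percent(always_on, scheduled))  # 120 60 50.0
```

A 12-hour daily window halves pod-hours, which lands at the top of the 30-50% range; workloads that need some capacity overnight will save less.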

Summary

Kubernetes autoscaling works at several levels, and each tool suits a different situation. HPA is the baseline every production cluster should have, VPA helps right-size resources, KEDA fits event-driven workloads and Scale to Zero, and Cluster Autoscaler/Karpenter handle the Node level.

For production, a good combination is HPA (horizontal) + VPA in Off mode (recommendations) + Karpenter or Cluster Autoscaler (Nodes) + PDBs (disruption protection) + KEDA for event-driven workloads that need Scale to Zero. Combining these tools makes your Kubernetes cluster both reliable and cost-optimized.