What Is Rate Limiting? Protecting APIs with Rate Limits, Throttling, and DDoS Protection (2026)

APIs sit at the heart of almost every application today, from mobile and web apps to IoT systems, so protecting them from attacks and excessive use is something every developer has to take seriously. Rate limiting is usually the first control to put in place when an API is exposed to the outside world. Without it, a single client stuck in a request loop can overwhelm a server within seconds.

This article covers rate limiting end to end: the basic concepts, the common algorithms, implementations in Node.js and Python, Redis for distributed systems, Nginx rate limiting configuration, and layered DDoS protection. The content is based on best practices as of 2026.

Why Does Rate Limiting Matter?

Rate limiting restricts how many requests a user or client can send to an API within a given period, for example 100 requests per minute. Requests beyond that limit receive an HTTP 429 Too Many Requests response.

The main reasons to use rate limiting are:

Preventing abuse: Some clients send too many requests by accident (a script stuck in a loop) or on purpose (data scraping). Rate limiting keeps one client from degrading the service for everyone else.
Mitigating DDoS attacks: A Distributed Denial of Service attack floods the server with requests to take it down. Rate limiting is one of the first lines of defense.
Controlling cost: Every API request consumes CPU, memory, database queries, and bandwidth. With unlimited usage, costs can grow out of control.
Guaranteeing quality of service (QoS): All users get fair access, and no single user monopolizes resources.
Preventing brute-force attacks: Limiting the number of login attempts makes password guessing much harder.
Enforcing SLAs: Many organizations define SLAs that specify the number of requests allowed under each subscription plan.

Rate Limiting Algorithms Explained

Several algorithms can implement rate limiting, each with its own trade-offs. Which one to choose depends on what the system needs.

1. Fixed Window Counter

This is the simplest approach. Time is divided into fixed windows (for example, one minute each) and requests are counted per window. Once the count reaches the limit, further requests in that window are rejected.

# Fixed Window Counter - Python pseudocode
import time

class FixedWindowCounter:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # key -> (window_start, count)

    def is_allowed(self, key):
        now = time.time()
        window_start = int(now // self.window) * self.window

        if key not in self.counters or self.counters[key][0] != window_start:
            self.counters[key] = (window_start, 0)

        if self.counters[key][1] >= self.limit:
            return False

        self.counters[key] = (window_start, self.counters[key][1] + 1)
        return True

# Usage: 100 requests per 60 seconds
limiter = FixedWindowCounter(limit=100, window_seconds=60)

Pros: Easy to implement, low memory use, and fast. A good fit for systems that do not need high precision.

Cons: It suffers from a boundary problem. If a client sends a burst at the end of one window and another at the start of the next, up to twice the limit can get through in a short span. For example, with a limit of 100 per minute, a client that sends 100 requests at second 59 and another 100 at second 60 gets 200 requests through in about 2 seconds.
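
The boundary effect is easy to reproduce. The sketch below is a single-key fixed window counter with an injectable clock; the `FakeClock` helper, the `FixedWindow` class, and the chosen timestamps are assumptions made only for this illustration.

```python
class FakeClock:
    """Hypothetical manual clock so the example is deterministic."""
    def __init__(self, start=0.0):
        self.now = start

    def __call__(self):
        return self.now


class FixedWindow:
    """Single-key fixed window counter driven by an injectable clock."""
    def __init__(self, limit, window_seconds, clock):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.window_start = None
        self.count = 0

    def is_allowed(self):
        start = int(self.clock() // self.window) * self.window
        if start != self.window_start:
            # A new window has begun, so the counter starts over
            self.window_start, self.count = start, 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


clock = FakeClock()
limiter = FixedWindow(limit=100, window_seconds=60, clock=clock)

clock.now = 59.0  # last second of the first window
late = sum(limiter.is_allowed() for _ in range(150))

clock.now = 60.0  # first second of the next window
early = sum(limiter.is_allowed() for _ in range(150))

print(late, early, late + early)  # 100 100 200
```

Each window on its own respects the limit of 100, yet 200 requests are admitted within roughly one second of wall-clock time.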

2. Sliding Window Log

This approach records the timestamp of every request and counts how many fall within the last window-length of time. It removes the boundary problem of the fixed window.

# Sliding Window Log - Python pseudocode
import time
from collections import defaultdict

class SlidingWindowLog:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.logs = defaultdict(list)  # key -> [timestamps]

    def is_allowed(self, key):
        now = time.time()
        cutoff = now - self.window

        # Drop timestamps that fall outside the window
        self.logs[key] = [ts for ts in self.logs[key] if ts > cutoff]

        if len(self.logs[key]) >= self.limit:
            return False

        self.logs[key].append(now)
        return True

# Usage: 100 requests per 60 seconds
limiter = SlidingWindowLog(limit=100, window_seconds=60)

Pros: Very accurate, no boundary problem, and the full request history within the window is visible.

Cons: High memory use, because a timestamp is stored for every request. Not suitable for very high-traffic systems.

3. Sliding Window Counter

This approach combines the strengths of the Fixed Window Counter (low memory) and the Sliding Window Log (accuracy) by weighting the counts of the current window and the previous window.

# Sliding Window Counter - Python pseudocode
import time

class SlidingWindowCounter:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # key -> {window: count}

    def is_allowed(self, key):
        now = time.time()
        current_window = int(now // self.window) * self.window
        previous_window = current_window - self.window

        if key not in self.counters:
            self.counters[key] = {}

        prev_count = self.counters[key].get(previous_window, 0)
        curr_count = self.counters[key].get(current_window, 0)

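        # Weight the previous window by the share of it that still overlaps
        # the sliding window ending now
        elapsed = now - current_window
        weight = 1 - (elapsed / self.window)
        estimated = prev_count * weight + curr_count

        if estimated >= self.limit:
            return False

        # Keep only the two windows that matter so old entries do not pile up
        self.counters[key] = {previous_window: prev_count, current_window: curr_count + 1}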
        return True

Pros: Low memory use and reasonably accurate. A popular choice in production systems.

Cons: The result is an estimate, not an exact count, but in practice it is good enough for most workloads.
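
A worked example makes the weighting concrete. The numbers below are made up for illustration: a 60-second window, 80 requests counted in the previous window, 30 so far in the current one, and 15 seconds elapsed in the current window.

```python
def sliding_window_estimate(prev_count, curr_count, elapsed, window):
    """Estimate the request count over the last `window` seconds."""
    # Share of the previous window that still overlaps the sliding window
    weight = 1 - (elapsed / window)
    return prev_count * weight + curr_count


estimate = sliding_window_estimate(prev_count=80, curr_count=30, elapsed=15, window=60)
print(estimate)  # 80 * 0.75 + 30 = 90.0
```

With a limit of 100 this request would still be admitted, because the estimate of 90 is below the limit. The estimate assumes the previous window's requests were spread evenly over that window, which is why it is approximate.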

4. Token Bucket Algorithm

Picture a bucket that is refilled with tokens at a steady rate. Each request consumes one token, and when the bucket is empty the request has to wait or is rejected. Bursts are allowed up to the number of tokens that have accumulated.

# Token Bucket - Python pseudocode
import time

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # maximum number of tokens
        self.refill_rate = refill_rate  # tokens added per second
        self.buckets = {}              # key -> (tokens, last_refill)

    def is_allowed(self, key, tokens_needed=1):
        now = time.time()

        if key not in self.buckets:
            self.buckets[key] = (self.capacity, now)

        current_tokens, last_refill = self.buckets[key]

        # Refill tokens based on the elapsed time
        elapsed = now - last_refill
        new_tokens = min(self.capacity, current_tokens + elapsed * self.refill_rate)

        if new_tokens < tokens_needed:
            return False

        self.buckets[key] = (new_tokens - tokens_needed, now)
        return True

# Usage: bucket holds 100 tokens, refilled at 10 tokens per second
limiter = TokenBucket(capacity=100, refill_rate=10)

Pros: Handles burst traffic well, uses little memory, and is very popular in real systems. Amazon API Gateway uses this algorithm.

Cons: Capacity and refill rate have to be chosen carefully. Otherwise the limiter either lets through bursts that are too large or is too restrictive.
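
One way to reason about the two parameters: capacity bounds the burst, and refill rate bounds the sustained throughput. The helper names below are hypothetical and the sketch assumes a bucket that starts full.

```python
def max_requests(capacity, refill_rate, seconds):
    """Upper bound on requests a full bucket can admit within `seconds`."""
    return capacity + refill_rate * seconds


def seconds_to_refill(capacity, refill_rate):
    """Time an empty bucket needs to become full again."""
    return capacity / refill_rate


print(max_requests(100, 10, 1))    # 110: worst-case burst within one second
print(max_requests(100, 10, 60))   # 700: worst case over one minute
print(seconds_to_refill(100, 10))  # 10.0 seconds of idle time to restore the full burst
```

If a backend cannot absorb 110 requests in one second, the capacity is too high even though the sustained rate of 10 per second looks modest.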

5. Leaky Bucket Algorithm

Picture a bucket with a hole in the bottom. Incoming requests fill the bucket and are processed at a constant rate (the leak rate). When the bucket is full, new requests are rejected. The result is a steady, constant output rate.

# Leaky Bucket - Python pseudocode
import time
from collections import deque

class LeakyBucket:
    def __init__(self, capacity, leak_rate):
        self.capacity = capacity      # bucket size
        self.leak_rate = leak_rate    # requests drained per second
        self.buckets = {}            # key -> (queue_size, last_leak)

    def is_allowed(self, key):
        now = time.time()

        if key not in self.buckets:
            self.buckets[key] = (0, now)

        queue_size, last_leak = self.buckets[key]

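        # Drain the queue according to the elapsed time
        elapsed = now - last_leak
        leaked = int(elapsed * self.leak_rate)
        queue_size = max(0, queue_size - leaked)

        if leaked > 0:
            # Advance only by the time that whole leaks account for,
            # so the fractional remainder is not lost
            last_leak += leaked / self.leak_rate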

        if queue_size >= self.capacity:
            return False

        self.buckets[key] = (queue_size + 1, last_leak)
        return True

# Usage: bucket holds 50, drains 5 per second
limiter = LeakyBucket(capacity=50, leak_rate=5)

Pros: The output rate is constant, which suits systems that need strict control over the processing rate.

Cons: Bursts are never passed through to the backend. Queued requests wait their turn, which can hurt the user experience in some situations.
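
The smoothing effect shows up when the bucket is modeled as an actual queue. The function below is a simplified sketch, not the counter-based class from this section: `leaky_bucket_schedule` is a hypothetical helper that returns when each request would leave the bucket.

```python
def leaky_bucket_schedule(arrivals, leak_rate, capacity):
    """Return the departure time of each request, or None if it is dropped."""
    interval = 1.0 / leak_rate
    pending = []            # departure times of requests still in the bucket
    last_departure = None   # departure time of the most recently admitted request
    schedule = []
    for t in arrivals:
        # Requests whose departure time has passed have already leaked out
        pending = [d for d in pending if d > t]
        if len(pending) >= capacity:
            schedule.append(None)  # bucket is full, the request is rejected
            continue
        if last_departure is None:
            depart = t
        else:
            depart = max(t, last_departure + interval)
        last_departure = depart
        pending.append(depart)
        schedule.append(depart)
    return schedule


# Five requests arrive at t=0; the bucket holds 3 and leaks 2 per second
print(leaky_bucket_schedule([0, 0, 0, 0, 0], leak_rate=2, capacity=3))
# [0, 0.5, 1.0, 1.5, None]
```

The burst of five is turned into one request every 0.5 seconds, and the fifth request is dropped because three are already waiting.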

Comparing the 5 Algorithms

Algorithm	Memory	Accuracy	Burst	Complexity	Commonly used by
Fixed Window	Low	Low	Boundary bursts	Easy	Small systems
Sliding Log	High	Very high	None	Medium	Small systems
Sliding Counter	Low	Good	Slight	Medium	Cloudflare
Token Bucket	Low	Good	Supported	Medium	AWS, Stripe
Leaky Bucket	Low	Good	Not supported	Medium	Nginx

Implementing Rate Limiting with Node.js

For Node.js with Express, the express-rate-limit library is a widely used option that is easy to set up.

// Install
// npm install express-rate-limit

const express = require('express');
const rateLimit = require('express-rate-limit');
const app = express();

// Basic rate limiter
const generalLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,  // 15 minutes
  max: 100,                   // 100 requests per window
  standardHeaders: true,      // send RateLimit-* headers
  legacyHeaders: false,       // disable the legacy X-RateLimit-* headers
  message: {
    status: 429,
    error: 'Too Many Requests',
    message: 'Please wait a moment and try again.',
    retryAfter: '15 minutes'
  }
});

// Rate limiter for login (stricter)
const loginLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 5,    // 5 attempts per 15 minutes
  message: {
    status: 429,
    error: 'Too many login attempts',
    message: 'Too many failed login attempts. Please wait 15 minutes.'
  },
  skipSuccessfulRequests: true  // do not count successful requests
});

// Rate limiter per API key
const apiKeyLimiter = rateLimit({
  windowMs: 60 * 1000,  // 1 minute
  max: 60,
  keyGenerator: (req) => {
    return req.headers['x-api-key'] || req.ip;
  },
  handler: (req, res) => {
    res.status(429).json({
      error: 'Rate limit exceeded',
      retryAfter: Math.ceil((req.rateLimit.resetTime - Date.now()) / 1000)  // seconds until the window resets
    });
  }
});

// Usage
app.use(generalLimiter);
app.use('/api/auth/login', loginLimiter);
app.use('/api/v1', apiKeyLimiter);

app.get('/api/data', (req, res) => {
  res.json({ message: 'Success', data: [] });
});

app.listen(3000);

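Tip: Set skipSuccessfulRequests: true on the login endpoint so that successful logins are not counted (by default the library treats responses with a status code below 400 as successful). Legitimate users are then rarely limited, while clients that keep failing, such as brute-force attempts, still hit the limit.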

Implementing Rate Limiting with Python

FastAPI + SlowApi

# pip install slowapi

from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.get("/api/data")
@limiter.limit("100/minute")
async def get_data(request: Request):
    return {"message": "Success", "data": []}

@app.post("/api/auth/login")
@limiter.limit("5/15minutes")
async def login(request: Request):
    return {"token": "jwt_token_here"}

# Rate limit with a different key function
def get_api_key(request: Request):
    return request.headers.get("X-API-Key", get_remote_address(request))

@app.get("/api/premium")
@limiter.limit("1000/hour", key_func=get_api_key)
async def premium_endpoint(request: Request):
    return {"premium": True}

Flask + Flask-Limiter

# pip install Flask-Limiter

from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)
limiter = Limiter(
    get_remote_address,
    app=app,
    default_limits=["200 per day", "50 per hour"],
    storage_uri="redis://localhost:6379"  # use Redis as the shared store
)

@app.route("/api/data")
@limiter.limit("100/minute")
def get_data():
    return {"message": "Success"}

@app.route("/api/upload", methods=["POST"])
@limiter.limit("10/hour")
def upload_file():
    return {"uploaded": True}

# Exempt from rate limiting
@app.route("/health")
@limiter.exempt
def health_check():
    return {"status": "ok"}

Redis-based Distributed Rate Limiting

In production systems with multiple servers, keeping counters in each server's own memory does not work: the load balancer may send the same user to different machines, so the effective limit is no longer accurate. A central store such as Redis lets every server read and write the same counters.
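
The effect of per-server counters can be shown without any infrastructure. In this sketch, `LocalCounter` is a hypothetical in-memory limiter and round-robin distribution across three servers is assumed.

```python
import itertools


class LocalCounter:
    """In-memory counter as it would exist separately on each app server."""
    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def is_allowed(self):
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


# Three servers, each enforcing "100 requests" on its own
servers = [LocalCounter(limit=100) for _ in range(3)]
round_robin = itertools.cycle(servers)

# One client sends 1000 requests through the load balancer
admitted = sum(next(round_robin).is_allowed() for _ in range(1000))
print(admitted)  # 300
```

The client gets three times the intended limit, and the factor grows with every server added behind the load balancer.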

# Redis Token Bucket Implementation
import redis
import time

r = redis.Redis(host='localhost', port=6379, db=0)

# Lua script for an atomic token bucket (prevents race conditions)
TOKEN_BUCKET_SCRIPT = """
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])

local bucket = redis.call('hmget', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1])
local last_refill = tonumber(bucket[2])

if tokens == nil then
    tokens = capacity
    last_refill = now
end

local elapsed = now - last_refill
local new_tokens = math.min(capacity, tokens + elapsed * refill_rate)

local allowed = 0
if new_tokens >= requested then
    new_tokens = new_tokens - requested
    allowed = 1
end

-- HSET replaces the deprecated HMSET; the state is saved in both outcomes
redis.call('hset', key, 'tokens', new_tokens, 'last_refill', now)
redis.call('expire', key, math.ceil(capacity / refill_rate) * 2)
return allowed
"""

# Register the Lua script (register_script returns a callable Script object)
token_bucket_sha = r.register_script(TOKEN_BUCKET_SCRIPT)

def check_rate_limit(user_id, capacity=100, refill_rate=10):
    key = f"rate_limit:{user_id}"
    now = time.time()
    result = token_bucket_sha(
        keys=[key],
        args=[capacity, refill_rate, now, 1]
    )
    return bool(result)

# Usage
if check_rate_limit("user_123"):
    print("Request allowed")
else:
    print("Rate limit exceeded")

Why a Lua script? Redis executes a Lua script atomically, so there is no race condition even when many servers call it at the same time. With a separate GET followed by a SET, several concurrent requests from the same user could all read the same token count and all conclude that enough tokens remain.
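
The race can be replayed step by step. The sketch below uses a plain dictionary in place of Redis and interleaves two hypothetical servers by hand, so the outcome is deterministic; all names are illustrative.

```python
store = {"tokens": 1}  # stand-in for the shared Redis key, one token left


def read_tokens():
    return store["tokens"]


def write_tokens(value):
    store["tokens"] = value


# Non-atomic: both servers read before either one writes
seen_a = read_tokens()
seen_b = read_tokens()
allowed_a = seen_a >= 1
allowed_b = seen_b >= 1
if allowed_a:
    write_tokens(seen_a - 1)
if allowed_b:
    write_tokens(seen_b - 1)
admitted_racy = int(allowed_a) + int(allowed_b)


# Atomic: check and decrement happen as one indivisible step
def take_token_atomic():
    if store["tokens"] >= 1:
        store["tokens"] -= 1
        return True
    return False


store["tokens"] = 1
admitted_atomic = int(take_token_atomic()) + int(take_token_atomic())

print(admitted_racy, admitted_atomic)  # 2 1
```

With one token available, the read-then-write sequence admits two requests, while the atomic version admits one. The Lua script gives Redis the second behavior.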

Redis Sliding Window Log

# Redis sliding window with a sorted set (one member per request, i.e. a sliding window log)
import redis
import time
import uuid

r = redis.Redis(host='localhost', port=6379, db=0)

SLIDING_WINDOW_SCRIPT = """
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local member = ARGV[4]

-- Remove requests that are older than the window
redis.call('zremrangebyscore', key, 0, now - window)

-- Count the requests currently in the window
local count = redis.call('zcard', key)

if count < limit then
    redis.call('zadd', key, now, member)
    redis.call('expire', key, window)
    return 1
else
    return 0
end
"""

sliding_sha = r.register_script(SLIDING_WINDOW_SCRIPT)

def check_sliding_window(user_id, limit=100, window=60):
    key = f"sliding:{user_id}"
    now = time.time()
    member = str(uuid.uuid4())
    result = sliding_sha(
        keys=[key],
        args=[now, window, limit, member]
    )
    return bool(result)

Nginx Rate Limiting

Nginx ships with the ngx_http_limit_req_module module, which uses the leaky bucket algorithm. It lets you configure rate limiting without writing any application code and acts as a first gate before requests reach the application.

# nginx.conf

http {
    # Assumes an "upstream backend { ... }" block is defined elsewhere in this file
    # Define the zones used for rate limiting
    # $binary_remote_addr keys on the client IP (uses less memory than $remote_addr)
    # zone=api:10m names the zone "api" and gives it 10 MB of shared memory
    # (per the nginx docs, 1 MB holds about 16,000 64-byte states or about 8,000 128-byte states)
    # rate=10r/s means 10 requests per second
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

    # Zone for login
    limit_req_zone $binary_remote_addr zone=login:5m rate=1r/s;

    # Zone keyed by API key, falling back to the client IP
    map $http_x_api_key $api_key {
        default $binary_remote_addr;
        "~.+" $http_x_api_key;
    }
    limit_req_zone $api_key zone=api_key:10m rate=30r/s;

    # Return 429 instead of the default 503 when a request is rejected
    limit_req_status 429;

    server {
        listen 80;
        server_name api.example.com;

        # API endpoints
        location /api/ {
            # burst=20 allows a burst of up to 20 extra requests
            # nodelay forwards the burst immediately instead of spacing it out
            limit_req zone=api burst=20 nodelay;

            proxy_pass http://backend;
        }

        # Login endpoint (stricter)
        location /api/auth/login {
            limit_req zone=login burst=5 nodelay;
            proxy_pass http://backend;
        }

        # Premium API
        location /api/v2/ {
            limit_req zone=api_key burst=50 nodelay;
            proxy_pass http://backend;
        }

        # Health check is not rate limited
        location /health {
            proxy_pass http://backend;
        }

        # Custom 429 response
        error_page 429 = @rate_limited;
        location @rate_limited {
            default_type application/json;
            return 429 '{"error":"Too Many Requests","retry_after":60}';
        }
    }
}

burst and nodelay: burst=20 lets up to 20 requests beyond the rate be accepted and queued. Without nodelay, the queued requests are delayed so that they are forwarded at the configured rate. With nodelay, they are forwarded immediately, while the burst slots still free up only at the configured rate.
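
A simplified model helps to see the difference. The function below is only an approximation of the behavior described above, for requests from one key that all arrive at the same instant; it is not how Nginx is implemented, and the name `nginx_burst_outcomes` is hypothetical.

```python
def nginx_burst_outcomes(n_requests, rate_per_sec, burst, nodelay):
    """Delay in seconds for each simultaneous request, or "rejected"."""
    interval = 1.0 / rate_per_sec
    outcomes = []
    for i in range(n_requests):
        if i > burst:
            # One request fits the rate, `burst` more fit the queue, the rest are refused
            outcomes.append("rejected")
        elif nodelay:
            outcomes.append(0.0)
        else:
            outcomes.append(i * interval)
    return outcomes


# rate=4r/s, burst=3, five requests at once
print(nginx_burst_outcomes(5, 4, 3, nodelay=False))
# [0.0, 0.25, 0.5, 0.75, 'rejected']
print(nginx_burst_outcomes(5, 4, 3, nodelay=True))
# [0.0, 0.0, 0.0, 0.0, 'rejected']
```

In both cases the same number of requests is accepted. The only difference is whether the accepted burst is spaced out or forwarded at once.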

API Gateway Rate Limiting

Kong Gateway

Kong is a popular API gateway with a built-in Rate Limiting plugin. It supports both local counters and Redis-based distributed rate limiting.

# Enable Kong's Rate Limiting plugin
# (field names for the Redis settings differ between Kong versions; check the docs for yours)
curl -X POST http://localhost:8001/services/my-api/plugins \
  --data "name=rate-limiting" \
  --data "config.minute=100" \
  --data "config.hour=1000" \
  --data "config.policy=redis" \
  --data "config.redis_host=redis" \
  --data "config.redis_port=6379" \
  --data "config.limit_by=consumer" \
  --data "config.fault_tolerant=true"

# Tiered rate limiting by consumer group (rate-limiting-advanced is a Kong Enterprise plugin)
# Free Tier: 100/hour
curl -X POST http://localhost:8001/consumer_groups/free/plugins \
  --data "name=rate-limiting-advanced" \
  --data "config.limit=[100]" \
  --data "config.window_size=[3600]"

# Pro Tier: 10,000/hour
curl -X POST http://localhost:8001/consumer_groups/pro/plugins \
  --data "name=rate-limiting-advanced" \
  --data "config.limit=[10000]" \
  --data "config.window_size=[3600]"

AWS API Gateway

# AWS API Gateway throttling
# Configured through the AWS CLI

# Account level: the default quota is 10,000 requests/second with a burst of 5,000.
# It can be read with get-account; raising it goes through a Service Quotas
# request rather than update-account.
aws apigateway get-account

# Stage Level
aws apigateway update-stage \
  --rest-api-id abc123 \
  --stage-name prod \
  --patch-operations \
    op=replace,path=/*/*/throttling/rateLimit,value=1000 \
    op=replace,path=/*/*/throttling/burstLimit,value=500

# Usage plan for API keys
aws apigateway create-usage-plan \
  --name "Basic Plan" \
  --throttle burstLimit=50,rateLimit=100 \
  --quota limit=10000,period=MONTH

Rate Limiting Headers

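When you implement rate limiting, send HTTP headers that tell the client where it stands. Some are standardized (the 429 status code in RFC 6585 and Retry-After in RFC 7231, now superseded by RFC 9110), while the X-RateLimit-* headers are a widely used de facto convention.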

Header	Meaning	Example
`X-RateLimit-Limit`	Maximum number of requests per window	100
`X-RateLimit-Remaining`	Number of requests remaining	45
`X-RateLimit-Reset`	Time at which the counter resets (Unix timestamp)	1717200000
`Retry-After`	Number of seconds to wait before retrying	30
`RateLimit-Policy`	Rate limit policy (IETF draft)	100;w=60

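// Express middleware for rate limit headers
// Assumes an earlier middleware set req.rateUsage to the number of requests
// counted in the current window, including this one
function rateLimitHeaders(req, res, next) {
    const limit = 100;
    const remaining = Math.max(0, limit - req.rateUsage);
    const nowSeconds = Math.floor(Date.now() / 1000);
    const resetTime = Math.ceil(Date.now() / 60000) * 60;  // next minute boundary (Unix seconds)
    const retryAfter = Math.max(1, resetTime - nowSeconds);

    res.set({
        'X-RateLimit-Limit': limit,
        'X-RateLimit-Remaining': remaining,
        'X-RateLimit-Reset': resetTime,
        'RateLimit-Policy': `${limit};w=60`
    });

    // Reject only once the count has gone past the limit
    if (req.rateUsage > limit) {
        res.set('Retry-After', retryAfter);
        return res.status(429).json({
            error: 'Rate limit exceeded',
            retryAfter: retryAfter,
            limit: limit,
            resetAt: new Date(resetTime * 1000).toISOString()
        });
    }

    next();
}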
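
On the client side, the same headers can be turned into a wait time. This is a minimal sketch that assumes `Retry-After` carries a number of seconds and `X-RateLimit-Reset` carries a Unix timestamp, as in the table above; the function name is hypothetical.

```python
def seconds_until_retry(headers, now):
    """Return how long to wait in seconds, or None if no usable header is present."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.strip().isdigit():
        return int(retry_after)

    reset = headers.get("X-RateLimit-Reset")
    if reset is not None and reset.strip().isdigit():
        # Never return a negative wait if the reset time has already passed
        return max(0, int(reset) - int(now))

    return None


print(seconds_until_retry({"Retry-After": "30"}, now=1717199970))                # 30
print(seconds_until_retry({"X-RateLimit-Reset": "1717200000"}, now=1717199970))  # 30
print(seconds_until_retry({}, now=1717199970))                                   # None
```

`Retry-After` is checked first because it states the wait directly. A `Retry-After` value in HTTP date form is not handled here and falls through to the next header.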

Rate Limiting Strategies

Choosing the right strategy matters a great deal, and it depends on the nature of the API and its users.

1. Per IP Address

Suitable for public APIs that do not require authentication. The limitations are that many users may share one IP address (for example behind NAT or a corporate proxy) and that attackers can spread requests across many IPs (rotating proxies).

2. Per User (Authenticated)

Suitable for APIs that require login. The limit is more accurate than per-IP limiting because it is tied to the actual user identity, no matter how often the IP address changes.

3. Per API Key

Suitable for B2B APIs offered to developers or organizations. It makes it possible to set different limits per plan, for example Free at 100/hour, Pro at 10,000/hour, and Enterprise at 100,000/hour.

4. Tiered Rate Limiting

Rate limits are defined at several levels according to the user's plan. This is a very common approach in modern SaaS products.

# Tiered Rate Limiting Configuration
RATE_LIMITS = {
    "free": {
        "requests_per_minute": 20,
        "requests_per_hour": 500,
        "requests_per_day": 5000,
        "burst": 5
    },
    "pro": {
        "requests_per_minute": 200,
        "requests_per_hour": 10000,
        "requests_per_day": 100000,
        "burst": 50
    },
    "enterprise": {
        "requests_per_minute": 2000,
        "requests_per_hour": 100000,
        "requests_per_day": 1000000,
        "burst": 500
    }
}
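
A configuration like this is typically consulted on every request. The sketch below uses a reduced, hypothetical copy of the table (`LIMITS`) so that it runs on its own, and assumes the caller already knows the client's usage per window.

```python
# Reduced copy of the tier table, for illustration only
LIMITS = {
    "free": {"requests_per_minute": 20, "requests_per_hour": 500},
    "pro": {"requests_per_minute": 200, "requests_per_hour": 10000},
}


def exceeded_windows(plan, usage):
    """Return the names of all windows in which the plan's limit has been reached."""
    limits = LIMITS[plan]
    return [name for name, cap in limits.items() if usage.get(name, 0) >= cap]


def is_allowed(plan, usage):
    """A request is allowed only if no window is exhausted."""
    return not exceeded_windows(plan, usage)


usage = {"requests_per_minute": 5, "requests_per_hour": 500}
print(exceeded_windows("free", usage))  # ['requests_per_hour']
print(is_allowed("free", usage))        # False
print(is_allowed("pro", usage))         # True
```

Checking every window matters: this client is well under the per-minute limit of the free plan but has used up its hourly allowance.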

API Throttling vs Rate Limiting

Throttling and rate limiting are often confused. Both restrict requests, but they differ in important ways.

Aspect	Rate Limiting	Throttling
How it works	Rejects requests over the limit (429)	Slows requests down by delaying or queueing them
Outcome	The request is rejected immediately	The request is processed, but more slowly
User experience	Gets an error and has to retry	Slower, but still gets a result
Use when	The system needs strict protection	Users should always get a result

In practice, well-designed systems often use both: throttling first (slow down), then rate limiting (reject) if the load is still too high. This is a form of graceful degradation and gives users a better experience, because they are not rejected the moment they slightly exceed the limit.
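
The two-stage idea can be expressed as a small decision function. The soft and hard limits, the linear delay curve, and the name `decide` are assumptions made for this sketch, not a standard.

```python
def decide(current_rate, soft_limit, hard_limit, max_delay=2.0):
    """Return (action, delay_seconds) for a client's current request rate."""
    if current_rate <= soft_limit:
        return ("allow", 0.0)
    if current_rate > hard_limit:
        return ("reject", 0.0)  # answer with HTTP 429
    # Between the two limits: delay grows linearly up to max_delay
    overshoot = (current_rate - soft_limit) / (hard_limit - soft_limit)
    return ("delay", round(overshoot * max_delay, 3))


print(decide(80, soft_limit=100, hard_limit=150))   # ('allow', 0.0)
print(decide(125, soft_limit=100, hard_limit=150))  # ('delay', 1.0)
print(decide(200, soft_limit=100, hard_limit=150))  # ('reject', 0.0)
```

A client slightly over the soft limit only notices added latency. Rejection is reserved for clients that keep pushing past the hard limit.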

End-to-End DDoS Protection

Rate limiting is only one part of DDoS defense. A robust system needs several layers of protection working together, following the Defense in Depth principle.

Layer 1: CDN / Edge Protection

Cloudflare is a popular choice that offers DDoS protection at the network level (L3/L4) and the application level (L7). Its rate limiting rules can be configured through the dashboard or the API, and it supports bot detection and automatic challenges (CAPTCHA).

# Cloudflare Rate Limiting Rule (API)
curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/rulesets" \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{
    "name": "API Rate Limit",
    "kind": "zone",
    "phase": "http_ratelimit",
    "rules": [
      {
        "action": "block",
        "ratelimit": {
          "characteristics": ["ip.src"],
          "period": 60,
          "requests_per_period": 100,
          "mitigation_timeout": 600
        },
        "expression": "(http.request.uri.path contains \"/api/\")"
      }
    ]
  }'

Layer 2: AWS Shield & WAF

AWS Shield Standard is free for all AWS resources and protects against L3/L4 DDoS attacks. AWS Shield Advanced adds a higher level of protection, including access to the Shield Response Team (SRT, formerly the DDoS Response Team) and cost protection.

AWS WAF (Web Application Firewall) works with CloudFront, ALB, and API Gateway. Rules can be fine-grained, for example rate-based rules, IP set rules, and geographic match rules.

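# AWS WAF rate-based rule
# Limit is the number of requests per IP within the evaluation window (5 minutes by default)
aws wafv2 create-rule-group \
  --name "api-rate-limit" \
  --scope REGIONAL \
  --capacity 100 \
  --visibility-config SampledRequestsEnabled=true,CloudWatchMetricsEnabled=true,MetricName=api-rate-limit \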
  --rules '[
    {
      "Name": "RateLimit100",
      "Priority": 1,
      "Action": {"Block": {}},
      "Statement": {
        "RateBasedStatement": {
          "Limit": 100,
          "AggregateKeyType": "IP"
        }
      },
      "VisibilityConfig": {
        "SampledRequestsEnabled": true,
        "CloudWatchMetricsEnabled": true,
        "MetricName": "RateLimit100"
      }
    }
  ]'

Layer 3: Application Level

This is the in-application rate limiting covered in the earlier sections. It is the last line of defense and can apply domain-specific logic, such as limits per user plan or per API endpoint.

Bot Detection and CAPTCHA Integration

Beyond rate limiting, bot detection is another important way to protect an API from misuse. There are several detection techniques.

User-Agent analysis: Check whether the User-Agent looks like a real browser or a bot. It is easy to spoof, so treat it as one signal among several.
Behavioral analysis: Analyze behavior such as mouse movement speed, click patterns, and time spent on each page. Bots tend to behave abnormally.
JavaScript challenge: Send JavaScript for the client to execute. Simple bots without a JS engine fail, but advanced bots built on tools such as Puppeteer can pass.
CAPTCHA: Use reCAPTCHA, hCaptcha, or Cloudflare Turnstile as a final check when a client is suspected to be a bot. Use it only when necessary, because it hurts UX.
Fingerprinting: Build a fingerprint from browser data such as screen resolution, installed fonts, and WebGL renderer. It helps identify a client even when the IP address changes.

// reCAPTCHA v3 integration (no user click required)
// Frontend
async function submitForm(email, password) {
  const token = await grecaptcha.execute('SITE_KEY', {action: 'login'});

  const response = await fetch('/api/auth/login', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-Recaptcha-Token': token
    },
    body: JSON.stringify({ email, password })
  });
  return response;
}

// Backend (Node.js)
// Assumes the secret key is provided through the RECAPTCHA_SECRET_KEY environment variable
async function verifyRecaptcha(token) {
  const response = await fetch('https://www.google.com/recaptcha/api/siteverify', {
    method: 'POST',
    // Send the secret in the form body so it does not end up in URLs or access logs
    body: new URLSearchParams({
      secret: process.env.RECAPTCHA_SECRET_KEY,
      response: token
    })
  });
  const data = await response.json();
  // score ranges from 0.0 (likely a bot) to 1.0 (likely human)
  return data.success && data.action === 'login' && data.score >= 0.5;
}

API Key Management

Managing API keys systematically is an important part of API security. It makes it possible to track, control, and revoke access to the API.

Generate secure keys: Use a cryptographically secure random generator with at least 32 bytes (256 bits) of randomness.
Hash keys before storing them: Store only the hash of the key in the database, never the key itself, much like password storage.
Set an expiry date: Keys should expire, so that a leaked key cannot be used indefinitely.
Define scopes: Each key should carry only the permissions it needs (Principle of Least Privilege).
Rotation policy: Require key rotation every 90 days or after a security incident.
Audit log: Record every use of a key so that usage can be reviewed later and abnormal behavior can be detected.

import hashlib, secrets, datetime

def generate_api_key():
    """Generate a secure API key and the hash to store for it."""
    prefix = "sk_live_"  # the prefix makes the key type easy to identify
    key = secrets.token_urlsafe(32)
    full_key = f"{prefix}{key}"
    key_hash = hashlib.sha256(full_key.encode()).hexdigest()
    return full_key, key_hash

def store_api_key(user_id, key_hash, scopes, expires_days=90):
    """Build the record to store in the database (hash only, never the raw key)."""
    # Timezone-aware timestamp taken once, so both fields are consistent
    now = datetime.datetime.now(datetime.timezone.utc)
    return {
        "user_id": user_id,
        "key_hash": key_hash,
        "scopes": scopes,
        "created_at": now,
        "expires_at": now + datetime.timedelta(days=expires_days),
        "is_active": True,
        "last_used": None
    }
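
Verification is the other half of hash-only storage. The sketch below assumes the same SHA-256 scheme as the generation code; `hash_key` and `verify_api_key` are hypothetical helper names.

```python
import hashlib
import hmac
import secrets


def hash_key(full_key):
    """Hash a key the same way it was hashed when it was issued."""
    return hashlib.sha256(full_key.encode()).hexdigest()


def verify_api_key(presented_key, stored_hash):
    """Compare in constant time to avoid leaking information through timing."""
    return hmac.compare_digest(hash_key(presented_key), stored_hash)


issued = "sk_live_" + secrets.token_urlsafe(32)
stored = hash_key(issued)  # only this value goes into the database

print(verify_api_key(issued, stored))        # True
print(verify_api_key(issued + "x", stored))  # False
```

A real lookup would also check the stored record's expiry, active flag, and scopes before accepting the request.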

Monitoring and Alerting on Rate Limits

Monitoring rate limiting shows whether the configured limits are appropriate and helps detect attacks in time. The key metrics to collect are:

Total requests: The total number of requests, broken down by endpoint, method, and status code.
Rate limited requests (429): The number of requests that were rate limited. A high count may mean that the limit is too low or that an attack is under way.
Top rate limited users/IPs: Who gets rate limited most often. These may be bots, or users who need a higher plan.
Request latency: Rising latency may mean that the system is under heavy load and the limits need adjusting.
Error rate: The ratio of failed requests (5xx) to total requests.

# Prometheus metrics for rate limiting (Python)
from prometheus_client import Counter, Histogram, Gauge

# Counter for all requests
http_requests_total = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status']
)

# Counter for rate limited requests
rate_limited_total = Counter(
    'rate_limited_requests_total',
    'Total rate limited requests',
    ['endpoint', 'limit_type']
)

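# Gauge for current usage
# Note: a user_id label can create a very large number of time series
rate_limit_usage = Gauge(
    'rate_limit_usage_ratio',
    'Current rate limit usage ratio',
    ['user_id', 'plan']
)

# Histogram for request latency
http_request_duration_seconds = Histogram(
    'http_request_duration_seconds',
    'HTTP request latency in seconds',
    ['method', 'endpoint']
)

# Alert rule in Prometheus
# alert.rules.yml
# groups:
#   - name: rate_limiting
#     rules:
#       - alert: HighRateLimitRate
#         expr: sum(rate(rate_limited_requests_total[5m])) > 100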
#         for: 5m
#         labels:
#           severity: warning
#         annotations:
#           summary: "High rate limiting detected"

Designing a Rate Limit Policy

A good policy takes several factors into account rather than picking numbers at random. Some guidelines:

Analyze traffic patterns: Look at historical data, such as the p50, p95, and p99 of requests per user, to set sensible limits.
Set limits per endpoint: Lightweight endpoints such as GET /status should have higher limits than heavy ones such as POST /report/generate.
Use multiple windows: Define per-second, per-minute, per-hour, and per-day limits to guard against both bursts and sustained abuse.
Whitelist internal services: Internal services should not be limited in the same way as external clients.
Graceful degradation: When load is high, lower the limits gradually instead of cutting clients off abruptly.
Communicate clearly: Write documentation that explains the rate limits of each plan so developers understand them.
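
The first guideline can be turned into a small calculation. This sketch assumes a list of observed requests per minute per user and applies a headroom factor to the p99; the factor of 1.5 and the function name `suggest_limit` are arbitrary choices for illustration.

```python
import statistics


def suggest_limit(per_user_rpm, headroom=1.5):
    """Derive a candidate per-minute limit from observed per-user traffic."""
    # quantiles(n=100) returns the 99 cut points p1..p99
    cuts = statistics.quantiles(per_user_rpm, n=100, method="inclusive")
    p50, p95, p99 = cuts[49], cuts[94], cuts[98]
    return {"p50": p50, "p95": p95, "p99": p99, "limit": int(p99 * headroom)}


# Hypothetical sample: 101 users with 1..101 requests per minute
observed = list(range(1, 102))
print(suggest_limit(observed))
```

Setting the limit somewhat above the p99 leaves nearly all legitimate users untouched, while still capping the outliers that dominate load.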

Client-side Handling of 429 Responses

Clients also need to handle 429 responses properly. Retrying immediately only makes the problem worse.

// JavaScript: exponential backoff with jitter
// Network errors thrown by fetch propagate to the caller
async function fetchWithRetry(url, options = {}, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, options);

    if (response.status !== 429) {
      // Only 429 is retried here; other HTTP errors are surfaced immediately
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`);
      }
      return response;
    }

    if (attempt === maxRetries) {
      throw new Error('Rate limit exceeded after max retries');
    }

    // Retry-After may be a number of seconds or an HTTP date
    const retryAfter = response.headers.get('Retry-After');
    let waitTime = NaN;

    if (retryAfter) {
      const seconds = Number(retryAfter);
      waitTime = Number.isNaN(seconds)
        ? Date.parse(retryAfter) - Date.now()
        : seconds * 1000;
    }

    if (!(waitTime >= 0)) {
      // Fall back to exponential backoff with jitter
      const baseDelay = 1000;  // 1 second
      const maxDelay = 30000;  // 30 seconds
      waitTime = Math.min(
        maxDelay,
        baseDelay * Math.pow(2, attempt) + Math.random() * 1000
      );
    }

    console.log(`Rate limited. Waiting ${waitTime}ms before retry...`);
    await new Promise(resolve => setTimeout(resolve, waitTime));
  }
}

// Usage
const data = await fetchWithRetry('/api/data');
const json = await data.json();

# Python: httpx with retry
import httpx
from tenacity import retry, retry_if_result, stop_after_attempt, wait_random_exponential

def is_rate_limited(response):
    return response.status_code == 429

@retry(
    retry=retry_if_result(is_rate_limited),
    wait=wait_random_exponential(multiplier=1, max=60),  # exponential backoff with jitter
    stop=stop_after_attempt(5)  # give up after 5 attempts; tenacity then raises RetryError
)
def fetch_data(url):
    response = httpx.get(url, headers={"X-API-Key": "your_key"})
    return response

# Usage
result = fetch_data("https://api.example.com/data")

What is jitter? Jitter adds a small random amount of time to the retry delay to avoid the "thundering herd" problem. Without it, many clients that were rate limited at the same moment would all retry at the same moment and cause another burst.
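
One common variant, often called full jitter, draws the whole delay at random instead of adding a small random term to it. The sketch below is an illustration with assumed defaults (1 second base, 30 second cap), and it differs from the additive jitter used in the JavaScript example.

```python
import random


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random):
    """Full jitter: pick a uniform delay between 0 and the capped exponential ceiling."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


rng = random.Random(42)  # seeded only to make the demonstration repeatable
for attempt in range(6):
    delay = backoff_delay(attempt, rng=rng)
    print(f"attempt {attempt}: wait {delay:.2f}s (ceiling {min(30.0, 2 ** attempt)}s)")
```

Because every client draws a different delay from the same range, retries spread out over time instead of arriving together.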

Best Practices Summary

Use multiple layers of protection: CDN (Cloudflare) + Nginx rate limit + application rate limit + database connection limit.
Pick the right algorithm: Token Bucket for general APIs, Leaky Bucket for systems that need a consistent rate.
Use Redis for distributed systems: Do not keep counters in the memory of individual app servers.
Always send rate limit headers: They help clients handle 429 responses properly.
Monitor continuously: Set alerts for an unusually high rate of 429 responses.
Document clearly: State the limits of each plan in the API documentation.
Test rate limiting: Use a load testing tool such as k6 or wrk to verify that the limits behave as intended.
Do not rate limit health checks: Monitoring endpoints should not be rate limited.
Log rate limit events: Record who was rate limited and when, for later analysis.
Adjust limits based on feedback: If legitimate users are rate limited often, the limits may need to be raised.

Conclusion

Rate limiting is the first and one of the most important lines of defense for an API. Every API exposed to external clients needs it, and internal APIs benefit from it too, because it helps prevent cascading failures. The choice of algorithm, whether Token Bucket for flexibility or Sliding Window Counter for accuracy, depends on the nature of the system. What matters most is having several layers of protection, from the CDN level through the Nginx level down to the application level, and monitoring continuously so the policy can be tuned to real traffic.

Start today by adding rate limiting to your API, even for a small project. Once an API is already being abused, adding rate limiting after the fact tends to be harder and affects legitimate users. Building it in from the start is easier and much safer.