Zero Downtime Deployment
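This article covers zero downtime deployment strategies (Blue-Green, Canary, Rolling Update) together with the pieces that support them: load balancers, health checks, Better Uptime monitoring, status pages, incident management, and on-call.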
| Strategy | Downtime | Rollback | Risk | Cost |
|---|---|---|---|---|
| Blue-Green | None | Instant (switch traffic) | Low | High (2x infrastructure) |
| Canary | None | Fast (reduce traffic) | Very low | Medium |
| Rolling | None | Slow (rolled back pod by pod) | Medium | Low |
| Recreate | Yes | Redeploy | High | Low |
Deployment Strategies
# === Zero Downtime Deployment Strategies ===

# Kubernetes Blue-Green Deployment
# apiVersion: v1
# kind: Service
# metadata:
#   name: my-app
# spec:
#   selector:
#     app: my-app
#     version: green  # Switch between blue/green
#   ports:
#     - port: 80
#       targetPort: 8080
#
# ---
# apiVersion: apps/v1
# kind: Deployment
# metadata:
#   name: my-app-green
# spec:
#   replicas: 3
#   selector:
#     matchLabels:
#       app: my-app
#       version: green
#   template:
#     metadata:
#       labels:
#         app: my-app
#         version: green
#     spec:
#       containers:
#         - name: my-app
#           image: my-app:2.0.0
#           ports:
#             - containerPort: 8080
#           readinessProbe:
#             httpGet:
#               path: /health
#               port: 8080
#             initialDelaySeconds: 5
#             periodSeconds: 10
#           livenessProbe:
#             httpGet:
#               path: /health
#               port: 8080
#             initialDelaySeconds: 15
#             periodSeconds: 20

# Rolling Update (Kubernetes Default)
# apiVersion: apps/v1
# kind: Deployment
# spec:
#   strategy:
#     type: RollingUpdate
#     rollingUpdate:
#       maxSurge: 25%
#       maxUnavailable: 25%
from dataclasses import dataclass


@dataclass
class DeploymentStrategy:
    name: str
    downtime: str
    rollback_time: str
    traffic_control: str
    complexity: str
    use_case: str


strategies = [
    DeploymentStrategy("Blue-Green", "0", "< 1 min", "All-or-nothing", "Medium",
                       "Critical apps that need fast rollback"),
    DeploymentStrategy("Canary", "0", "< 5 min", "Gradual %", "High",
                       "Apps that must be tested against real traffic"),
    DeploymentStrategy("Rolling Update", "0", "5-15 min", "Per Instance", "Low",
                       "General use; the Kubernetes default"),
    DeploymentStrategy("Feature Flag", "0", "Instant", "Per Feature", "Medium",
                       "Toggling features without redeploying"),
]

# Print a short summary of each strategy
print("=== Deployment Strategies ===")
for s in strategies:
    print(f"\n  [{s.name}]")
    print(f"    Downtime: {s.downtime} | Rollback: {s.rollback_time}")
    print(f"    Traffic: {s.traffic_control} | Complexity: {s.complexity}")
    print(f"    Use: {s.use_case}")
Monitoring Setup
# === Better Uptime + Monitoring Setup ===

# Better Uptime API
# curl -X POST https://betteruptime.com/api/v2/monitors \
#   -H "Authorization: Bearer YOUR_API_TOKEN" \
#   -H "Content-Type: application/json" \
#   -d '{
#     "monitor_type": "status",
#     "url": "https://example.com",
#     "pronounceable_name": "Main Website",
#     "check_frequency": 30,
#     "http_method": "get",
#     "expected_status_codes": [200],
#     "regions": ["us", "eu", "as"],
#     "confirmation_period": 60,
#     "call": true,
#     "sms": true,
#     "email": true
#   }'

# Deployment Health Check Script
# import requests
# import time
#
# def wait_for_healthy(url, timeout=300, interval=5):
#     """Poll {url}/health until it reports healthy or the timeout expires."""
#     start = time.time()
#     while time.time() - start < timeout:
#         try:
#             response = requests.get(f"{url}/health", timeout=5)
#             if response.status_code == 200:
#                 data = response.json()
#                 if data.get("status") == "healthy":
#                     print(f"Healthy after {time.time()-start:.0f}s")
#                     return True
#         except requests.RequestException:
#             pass  # Instance not reachable yet; retry after the interval
#         time.sleep(interval)
#     raise TimeoutError(f"Not healthy after {timeout}s")
#
# # Canary Deployment Script
# # set_traffic_split() and get_metrics() are placeholders that are not
# # defined here; they stand for your load balancer and metrics backend.
# def canary_deploy(service, new_version, steps=(5, 25, 50, 100)):
#     for pct in steps:
#         print(f"Setting traffic to {pct}% for {new_version}")
#         set_traffic_split(service, new_version, pct)
#         time.sleep(300)  # Wait 5 min
#
#         metrics = get_metrics(service, window="5m")
#         if metrics["error_rate"] > 0.01:
#             print(f"Error rate {metrics['error_rate']:.2%} > 1%, rolling back")
#             set_traffic_split(service, new_version, 0)
#             return False
#         if metrics["p99_latency"] > 2000:
#             print(f"P99 {metrics['p99_latency']}ms > 2000ms, rolling back")
#             set_traffic_split(service, new_version, 0)
#             return False
#
#         print(f"Metrics OK at {pct}%")
#     return True
monitors = {
    "Website": {"type": "HTTPS", "interval": 30, "regions": 3, "uptime": "99.95%"},
    "API": {"type": "HTTPS", "interval": 30, "regions": 3, "uptime": "99.98%"},
    "Database": {"type": "TCP", "interval": 60, "regions": 1, "uptime": "99.99%"},
    "CDN": {"type": "HTTPS", "interval": 60, "regions": 5, "uptime": "99.99%"},
    "Cron Jobs": {"type": "Heartbeat", "interval": 300, "regions": 1, "uptime": "99.90%"},
}

print("\nMonitoring Dashboard:")
for name, info in monitors.items():
    print(f"  [{info['uptime']}] {name} - {info['type']} every {info['interval']}s "
          f"({info['regions']} regions)")
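The uptime percentages in the dashboard above are easier to reason about as a downtime budget. The following is a minimal sketch of that arithmetic; the function name `allowed_downtime_minutes` and the 30-day period are assumptions made for illustration and are not part of Better Uptime.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Return the minutes of downtime an uptime target permits over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


# Targets taken from the monitoring dashboard in this article
for target in (99.90, 99.95, 99.99):
    budget = allowed_downtime_minutes(target)
    print(f"{target:.2f}% -> {budget:.1f} min per 30 days")
```

Over 30 days, a 99.95% target leaves about 21.6 minutes of downtime, so a single 25-minute incident would already exceed it.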
Status Page and Incidents
# === Status Page & Incident Management ===
from dataclasses import dataclass


@dataclass
class Incident:
    title: str
    severity: str
    status: str
    duration_min: int
    root_cause: str
    resolution: str


incidents = [
    Incident("API Latency Spike", "Minor", "Resolved", 12,
             "Database connection pool exhaustion",
             "Increased pool size from 20 to 50"),
    Incident("CDN Cache Miss", "Major", "Resolved", 25,
             "Cache invalidation after deployment",
             "Pre-warm cache before switching traffic"),
    Incident("SSL Certificate Renewal", "Maintenance", "Completed", 5,
             "Scheduled certificate rotation",
             "Auto-renewed via Let's Encrypt"),
]

print("Incident History:")
for inc in incidents:
    print(f"\n  [{inc.severity}] {inc.title} - {inc.status}")
    print(f"    Duration: {inc.duration_min} min")
    print(f"    Cause: {inc.root_cause}")
    print(f"    Fix: {inc.resolution}")

# Deploy Checklist
checklist = [
    "Pre-deploy: Run full test suite",
    "Pre-deploy: Check database migrations",
    "Pre-deploy: Notify team via Slack",
    "Deploy: Use chosen strategy (Blue-Green/Canary/Rolling)",
    "Post-deploy: Verify health checks pass",
    "Post-deploy: Check error rates < 0.1%",
    "Post-deploy: Check P99 latency < 2s",
    "Post-deploy: Verify monitoring alerts normal",
    "Post-deploy: Run smoke tests",
    "Rollback: If metrics exceed thresholds, rollback immediately",
]

print("\n\nDeploy Checklist:")
for i, item in enumerate(checklist, 1):
    print(f"  {i}. {item}")
Tips
- Health Check: every instance must expose a health check endpoint.
- Canary: start at 5%, wait 5 minutes, and review the metrics before increasing traffic.
- Rollback: automate rollback when the error rate rises.
- Status Page: notify customers when an incident occurs; it builds trust.
- Pre-warm: pre-warm the cache after deploying, before the new version receives traffic.
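The Canary and Rollback tips above can be combined into one loop. This is a minimal sketch rather than a production script: `set_traffic`, `get_metrics`, and `sleep` are hypothetical callables supplied by the caller, so the logic can run without a real load balancer or metrics backend. The default thresholds (1% error rate, 2000 ms P99 latency) match the commented canary script earlier in this article.

```python
import time
from typing import Callable, Dict, Sequence


def canary_deploy(
    set_traffic: Callable[[int], None],
    get_metrics: Callable[[], Dict[str, float]],
    steps: Sequence[int] = (5, 25, 50, 100),
    wait_seconds: float = 300,
    max_error_rate: float = 0.01,
    max_p99_ms: float = 2000,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Raise traffic step by step; send traffic back to 0% if a threshold is exceeded."""
    for pct in steps:
        set_traffic(pct)
        sleep(wait_seconds)  # let metrics accumulate at this traffic level
        metrics = get_metrics()
        too_many_errors = metrics["error_rate"] > max_error_rate
        too_slow = metrics["p99_latency"] > max_p99_ms
        if too_many_errors or too_slow:
            set_traffic(0)  # automated rollback
            return False
    return True


# Dry run with fake callables: record each traffic step and skip the waiting.
history = []
healthy = lambda: {"error_rate": 0.002, "p99_latency": 850}
print(canary_deploy(history.append, healthy, sleep=lambda _: None), history)
```

Injecting the callables keeps the rollback decision testable in isolation; in a real pipeline they would wrap whatever traffic-splitting and metrics tools are in use.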
Applying This Knowledge in Practice
Recommended learning resources include official documentation, which is always the most up to date; online courses from Coursera, Udemy, and edX; quality YouTube channels in both Thai and English; and communities such as Discord, Reddit, and Stack Overflow, where you can exchange experience with developers around the world.
Comparing Pros and Cons
The comparison table shows that the advantages clearly outweigh the disadvantages, particularly in performance and scalability. Most of the disadvantages can be addressed through systematic learning and appropriate resource planning.
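What Is Better Uptime?
Better Uptime is a monitoring platform that checks uptime every 30 seconds and alerts by phone call, SMS, Slack, and email. It also provides status pages, incident management, on-call scheduling, and heartbeat monitoring.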
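What Is Zero Downtime Deployment?
Zero downtime deployment means releasing a new version without any downtime. It relies on strategies such as Blue-Green, Canary, and Rolling Update, combined with a load balancer and health checks, so that users do not notice the change.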
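Health checks only work if every instance exposes an endpoint to probe. The following is a minimal sketch using only the Python standard library; the `/health` path and the `{"status": "healthy"}` body follow the probes and health check script shown earlier in this article, while `HealthHandler` and `check_health` are names chosen here for illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answer GET /health with a small JSON status document."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "healthy"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def check_health(base_url: str) -> bool:
    """Return True if {base_url}/health answers 200 with status 'healthy'."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return resp.status == 200 and json.load(resp).get("status") == "healthy"


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(check_health(f"http://127.0.0.1:{server.server_port}"))
    server.shutdown()
    server.server_close()
```

A real endpoint would usually also verify its dependencies, such as the database connection, before reporting healthy.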
How Do Blue-Green and Canary Differ?
Blue-Green runs two environments and switches all traffic at once, which makes rollback fast. Canary shifts traffic gradually (5%, 25%, 50%, 100%) and checks metrics at each step, which reduces risk.
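With the Kubernetes manifests shown earlier, the Blue-Green switch amounts to changing the `version` label in the Service selector. The sketch below only builds the patch document; `build_selector_patch` is a hypothetical helper, and the `app` and `version` label names are taken from the manifests in this article.

```python
import json


def build_selector_patch(app: str, color: str) -> dict:
    """Build a Service patch that points the selector at the blue or green Deployment."""
    if color not in ("blue", "green"):
        raise ValueError(f"color must be 'blue' or 'green', got {color!r}")
    return {"spec": {"selector": {"app": app, "version": color}}}


# Switching to green, and back to blue for a rollback, is the same operation.
print(json.dumps(build_selector_patch("my-app", "green")))
print(json.dumps(build_selector_patch("my-app", "blue")))
```

The resulting JSON could be passed to something like `kubectl patch service my-app -p '<json>'`. Because rollback is just the opposite patch, it takes effect as soon as the selector changes.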
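How Does a Rolling Update Work?
A rolling update replaces instances one at a time, and each new instance must pass its health check before the next one is updated. It is the Kubernetes default, tuned with maxSurge and maxUnavailable, and it can be rolled back if problems appear.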
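To see what `maxSurge` and `maxUnavailable` mean for a concrete Deployment, the percentages can be converted to pod counts. This is a minimal sketch of that arithmetic, assuming percentage values only; Kubernetes rounds `maxSurge` up and `maxUnavailable` down. The function name `rolling_update_bounds` is an assumption made for illustration.

```python
def rolling_update_bounds(replicas: int, max_surge_pct: int = 25,
                          max_unavailable_pct: int = 25) -> dict:
    """Convert percentage settings into pod-count limits during a rolling update."""
    surge = (replicas * max_surge_pct + 99) // 100       # rounded up
    unavailable = replicas * max_unavailable_pct // 100  # rounded down
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


for replicas in (3, 10):
    print(replicas, rolling_update_bounds(replicas))
```

With 3 replicas and the default 25% values, at most 4 pods exist at once and all 3 must stay available, so pods are replaced strictly one at a time. With 10 replicas, up to 13 pods may exist and at least 8 stay available.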
Summary
Zero downtime deployment combines a release strategy (Blue-Green, Canary, or Rolling Update on Kubernetes behind a load balancer) with health checks, Better Uptime monitoring, a status page, incident management, automated rollback, cache pre-warming, and a deploy checklist.
