TF Serving Progressive Delivery
| Strategy | Traffic Split | Duration | Risk | Automation |
|---|---|---|---|---|
| Canary | 5% → 25% → 50% → 100% | 30-60 min | Very Low | Argo Rollouts / Flagger |
| A/B Testing | 50% / 50% | 1-7 days | Medium | Istio + Custom Metrics |
| Blue-Green | 0% → 100% | Instant | High (All or Nothing) | Kubernetes Service Switch |
| Shadow | 100% Mirror (No Response) | 1-3 days | None | Istio Mirror |
| Feature Flag | Per User Segment | As needed | Low | LaunchDarkly / Unleash |
TF Serving Setup
# === TensorFlow Serving Deployment ===
# Docker Run
# docker run -p 8501:8501 -p 8500:8500 \
#   --mount type=bind,source=/models/my_model,target=/models/my_model \
#   -e MODEL_NAME=my_model \
#   tensorflow/serving
# models.config (passed via --model_config_file)
# model_config_list {
#   config {
#     name: "my_model"
#     base_path: "/models/my_model"
#     model_platform: "tensorflow"
#     model_version_policy {
#       specific { versions: 1 versions: 2 }
#     }
#     version_labels { key: "stable" value: 1 }
#     version_labels { key: "canary" value: 2 }
#   }
# }
# Kubernetes Deployment
# apiVersion: apps/v1
# kind: Deployment
# metadata:
#   name: tf-serving-v1
# spec:
#   replicas: 3
#   template:
#     spec:
#       containers:
#         - name: tf-serving
#           image: tensorflow/serving:latest
#           args: ["--model_config_file=/config/model.config"]
#           ports:
#             - containerPort: 8501  # REST
#             - containerPort: 8500  # gRPC
#           resources:
#             requests: { cpu: "2", memory: "4Gi" }
#             limits: { cpu: "4", memory: "8Gi", nvidia.com/gpu: "1" }
#           readinessProbe:
#             httpGet:
#               path: /v1/models/my_model
#               port: 8501
from dataclasses import dataclass

@dataclass
class ServingConfig:
    config: str
    value: str
    purpose: str
    tip: str

configs = [
    ServingConfig("model_config_file",
                  "/config/model.config",
                  "Defines model name, path, and version policy",
                  "Use specific versions for canary"),
    ServingConfig("enable_batching",
                  "true + batching_parameters_file",
                  "Groups requests to increase throughput",
                  "max_batch_size=32 batch_timeout=10ms"),
    ServingConfig("rest_api_port",
                  "8501",
                  "REST API endpoint",
                  "POST /v1/models/{name}:predict"),
    ServingConfig("grpc_port",
                  "8500",
                  "gRPC endpoint (faster than REST)",
                  "Use gRPC for internal services"),
    ServingConfig("monitoring_config_file",
                  "prometheus_config",
                  "Exports metrics for Prometheus",
                  "Track latency, throughput, error rate"),
]

print("=== TF Serving Config ===")
for c in configs:
    print(f"  [{c.config}] = {c.value}")
    print(f"    Purpose: {c.purpose}")
    print(f"    Tip: {c.tip}")
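The REST endpoint above can be exercised from a short client. A minimal sketch, assuming TF Serving runs locally on port 8501 and serves `my_model` with the `stable`/`canary` version labels from the config file; here we only build the URL and request body rather than sending a live request:

```python
import json

REST_PORT = 8501  # rest_api_port from the table above

def predict_url(model, label=None, host="localhost"):
    """Build the TF Serving REST predict URL; label targets a version_labels entry."""
    base = f"http://{host}:{REST_PORT}/v1/models/{model}"
    if label:
        base += f"/labels/{label}"
    return base + ":predict"

def predict_payload(instances):
    """TF Serving row-format ("instances") request body."""
    return json.dumps({"instances": instances})

# In real use, POST this body to the URL with requests/urllib.
url = predict_url("my_model", label="canary")
body = predict_payload([[1.0, 2.0, 3.0]])
print(url)   # http://localhost:8501/v1/models/my_model/labels/canary:predict
print(body)  # {"instances": [[1.0, 2.0, 3.0]]}
```

Targeting the `canary` label rather than a hard-coded version number lets the client stay unchanged while the label is remapped between versions in `models.config`.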
Canary with Istio
# === Canary Deployment with Istio ===
# VirtualService (Traffic Split)
# apiVersion: networking.istio.io/v1beta1
# kind: VirtualService
# metadata:
#   name: tf-serving
# spec:
#   hosts: ["tf-serving"]
#   http:
#     - route:
#         - destination:
#             host: tf-serving-v1
#             port: { number: 8501 }
#           weight: 95
#         - destination:
#             host: tf-serving-v2
#             port: { number: 8501 }
#           weight: 5
# Argo Rollouts
# apiVersion: argoproj.io/v1alpha1
# kind: Rollout
# metadata:
#   name: tf-serving
# spec:
#   strategy:
#     canary:
#       steps:
#         - setWeight: 5
#         - pause: { duration: 10m }
#         - analysis:
#             templates: [{ templateName: tf-serving-analysis }]
#         - setWeight: 25
#         - pause: { duration: 10m }
#         - setWeight: 50
#         - pause: { duration: 10m }
#         - setWeight: 100
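The 95/5 VirtualService split above can be sanity-checked with a tiny weighted-routing simulation. This is only a sketch of the behavior; the real routing decision happens in the Envoy sidecar, not in application code:

```python
import random

def route(weights, rng=random):
    """Pick a destination host according to Istio-style integer weights."""
    hosts = list(weights)
    return rng.choices(hosts, weights=[weights[h] for h in hosts], k=1)[0]

# Weights matching the VirtualService above.
weights = {"tf-serving-v1": 95, "tf-serving-v2": 5}
rng = random.Random(0)  # seeded for reproducibility
sample = [route(weights, rng) for _ in range(10_000)]
share_v2 = sample.count("tf-serving-v2") / len(sample)
print(f"v2 share ~ {share_v2:.3f}")  # close to 0.05
```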
@dataclass
class CanaryStep:
    step: int
    weight: str
    duration: str
    action: str
    rollback_trigger: str

steps = [
    CanaryStep(1, "5% Canary", "10 min",
               "Deploy V2, send 5% of traffic, monitor against baseline",
               "Error rate > 1% or latency P99 > 2x"),
    CanaryStep(2, "25% Canary", "10 min",
               "Increase traffic if step 1 is OK",
               "Error rate > 0.5% or latency P99 > 1.5x"),
    CanaryStep(3, "50% Canary", "10 min",
               "Increase traffic, check A/B significance",
               "Accuracy drop > 2% or business metric drop"),
    CanaryStep(4, "100% Production", "Permanent",
               "Full promotion, remove V1",
               "Monitor for 24 hours after promotion"),
]

print("=== Canary Steps ===")
for s in steps:
    print(f"  Step {s.step}: {s.weight} ({s.duration})")
    print(f"    Action: {s.action}")
    print(f"    Rollback if: {s.rollback_trigger}")
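The rollback triggers listed per step can be encoded as a simple gate. A sketch using the step-1 thresholds (error rate above 1% or P99 latency above 2x baseline); the function names and defaults are illustrative, not from any library:

```python
def should_rollback(error_rate, p99_ms, baseline_p99_ms,
                    max_error_rate=0.01, max_p99_ratio=2.0):
    """Return True if the canary violates either rollback trigger."""
    return error_rate > max_error_rate or p99_ms > max_p99_ratio * baseline_p99_ms

print(should_rollback(error_rate=0.002, p99_ms=60, baseline_p99_ms=50))   # False
print(should_rollback(error_rate=0.02,  p99_ms=60, baseline_p99_ms=50))   # True (errors)
print(should_rollback(error_rate=0.002, p99_ms=120, baseline_p99_ms=50))  # True (latency)
```

In an Argo Rollouts setup this logic lives in the AnalysisTemplate's metric queries rather than in application code; the sketch just makes the decision rule explicit.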
Monitoring & Rollback
# === Model Monitoring ===
@dataclass
class ModelMetric:
    metric: str
    source: str
    baseline: str
    alert: str
    rollback: str

metrics = [
    ModelMetric("Inference Latency P99",
                "Prometheus (TF Serving Metrics)",
                "< 50ms",
                "> 75ms (1.5x)",
                "> 100ms (2x) → Auto Rollback"),
    ModelMetric("Error Rate (5xx)",
                "Prometheus / Istio Metrics",
                "< 0.1%",
                "> 0.5%",
                "> 1% → Auto Rollback"),
    ModelMetric("Model Accuracy",
                "Custom Metrics / Evidently AI",
                "> 95%",
                "< 93% (2% drop)",
                "< 90% (5% drop) → Auto Rollback"),
    ModelMetric("Data Drift",
                "Evidently AI / Alibi Detect",
                "KS-test p > 0.05",
                "p < 0.05 (Drift Detected)",
                "Multiple Features Drift → Retrain"),
    ModelMetric("GPU Utilization",
                "NVIDIA DCGM / Prometheus",
                "< 80%",
                "> 85%",
                "> 95% → Scale Up / Rollback"),
    ModelMetric("Business Metric (CTR)",
                "Analytics Platform",
                "CTR 3.5%",
                "CTR < 3.0% (15% drop)",
                "CTR < 2.5% (30% drop) → Rollback"),
]

print("=== Model Metrics ===")
for m in metrics:
    print(f"  [{m.metric}] Baseline: {m.baseline}")
    print(f"    Source: {m.source}")
    print(f"    Alert: {m.alert}")
    print(f"    Rollback: {m.rollback}")
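Each metric above has a three-level outcome: fine, alert, or rollback. A minimal evaluator sketch, with hypothetical `Threshold` names and toy values taken from the latency and accuracy rows; real deployments would wire these thresholds into Prometheus alert rules or an Argo Rollouts AnalysisTemplate:

```python
from dataclasses import dataclass

@dataclass
class Threshold:
    metric: str
    alert: float
    rollback: float
    higher_is_bad: bool = True  # latency/errors: high is bad; accuracy: low is bad

def evaluate(t: Threshold, value: float) -> str:
    """Classify a metric reading as ok / alert / rollback."""
    if t.higher_is_bad:
        if value > t.rollback:
            return "rollback"
        if value > t.alert:
            return "alert"
    else:
        if value < t.rollback:
            return "rollback"
        if value < t.alert:
            return "alert"
    return "ok"

latency = Threshold("p99_ms", alert=75, rollback=100)
accuracy = Threshold("accuracy", alert=0.93, rollback=0.90, higher_is_bad=False)
print(evaluate(latency, 80))     # alert
print(evaluate(latency, 120))    # rollback
print(evaluate(accuracy, 0.96))  # ok
```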
Tips
- Canary: start at 5% and monitor for 10 minutes before increasing
- gRPC: use gRPC for internal traffic; it is 2-5x faster than REST
- Batching: enable batching to increase throughput 2-4x
- Argo: use Argo Rollouts to automate the canary progression
- Drift: check for data drift weekly; retrain if drift is detected
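The weekly drift check can be sketched with a two-sample Kolmogorov-Smirnov statistic, the test the monitoring table refers to. A pure-Python illustration with toy data; production setups would use Evidently AI or Alibi Detect as noted above, which also supply p-values:

```python
import bisect

def ks_statistic(a, b):
    """Two-sample KS statistic: the max gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)

    def ecdf(xs, x):
        # Fraction of samples <= x (xs must be sorted).
        return bisect.bisect_right(xs, x) / len(xs)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in sorted(set(a) | set(b)))

baseline = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # training-time feature values (toy data)
current  = [0.9, 1.0, 1.1, 1.2, 1.3, 1.4]  # production values, clearly shifted
print(ks_statistic(baseline, current))      # 1.0 -> distributions fully separated
print(ks_statistic(baseline, baseline))     # 0.0 -> identical distributions
```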
What is TensorFlow Serving?
A production ML serving system written in C++ that exposes gRPC and REST APIs with low latency. It supports model versioning, hot-reload, batching, and GPUs, and serves SavedModels via Docker or Kubernetes.
What is Progressive Delivery?
Gradually shifting traffic to a new version instead of cutting over at once: canary (5% → 100%), A/B testing, blue-green, shadow, and feature flags, implemented with Istio, Argo Rollouts, or Flagger on Kubernetes.
How do you run a canary deploy?
Deploy V2, route 5% of traffic to it via Istio VirtualService weights, monitor latency, accuracy, and errors, then step up to 25%, 50%, and 100%; Argo Rollouts can automate both promotion and rollback.
How do you monitor a model?
Track latency P99, error rate, accuracy, data drift, GPU utilization, and business metrics via Prometheus, Grafana, and Evidently AI, with auto-rollback thresholds on each.
Summary
TensorFlow Serving with progressive delivery: canary releases driven by Istio and Argo Rollouts, monitoring of latency, accuracy, and drift, and automatic rollback in production.
