SiamCafe.net Blog
Technology

TensorFlow Serving Progressive Delivery

2026-02-14 · อ. บอม — SiamCafe.net · 11,485 words

TF Serving Progressive Delivery

Tags: TensorFlow Serving, Progressive Delivery, Canary, A/B Testing, Blue-Green, Shadow, Istio, Argo Rollouts, Flagger, Kubernetes, Production

| Strategy | Traffic Split | Duration | Risk | Automation |
|---|---|---|---|---|
| Canary | 5% → 25% → 50% → 100% | 30-60 min | Very low | Argo Rollouts / Flagger |
| A/B Testing | 50% / 50% | 1-7 days | Medium | Istio + Custom Metrics |
| Blue-Green | 0% → 100% | Instant | High (all or nothing) | Kubernetes Service switch |
| Shadow | 100% mirror (no response) | 1-3 days | None | Istio Mirror |
| Feature Flag | Per user segment | As needed | Low | LaunchDarkly / Unleash |
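The traffic progressions in the table can be expressed as a small lookup helper. This is an illustrative sketch only: the function name and structure are made up here, but the schedule values mirror the table.

```python
# Illustrative mapping from strategy name to v2 traffic weights (in %).
def traffic_schedule(strategy):
    """Return the percentage of traffic sent to the new version at each step."""
    schedules = {
        "canary": [5, 25, 50, 100],   # stepwise promotion
        "a/b": [50],                  # 50/50 split held for the test window
        "blue-green": [0, 100],       # instant all-or-nothing switch
        "shadow": [100],              # mirrored traffic, responses discarded
    }
    return schedules[strategy]

print(traffic_schedule("canary"))  # [5, 25, 50, 100]
```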

TF Serving Setup

# === TensorFlow Serving Deployment ===

# Docker Run
# docker run -p 8501:8501 -p 8500:8500 \
#   --mount type=bind,source=/models/my_model,target=/models/my_model \
#   -e MODEL_NAME=my_model \
#   tensorflow/serving

# models.config (passed via --model_config_file)
# model_config_list {
#   config {
#     name: "my_model"
#     base_path: "/models/my_model"
#     model_platform: "tensorflow"
#     model_version_policy {
#       specific { versions: 1 versions: 2 }
#     }
#     version_labels {
#       key: "stable"
#       value: 1
#     }
#     version_labels {
#       key: "canary"
#       value: 2
#     }
#   }
# }
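With the version labels above, clients can pin a request to the stable or canary version through TF Serving's REST paths (`/versions/<v>:predict` and `/labels/<label>:predict`). A minimal sketch, assuming the server runs on localhost:8501; `predict_url` is a hypothetical helper, not part of any library:

```python
import json

BASE = "http://localhost:8501/v1/models"

def predict_url(model, version=None, label=None):
    """Build a TF Serving REST predict URL for a pinned version or a version label."""
    if version is not None:
        return f"{BASE}/{model}/versions/{version}:predict"
    if label is not None:
        return f"{BASE}/{model}/labels/{label}:predict"
    return f"{BASE}/{model}:predict"  # default: highest available version

# Request body uses the standard "instances" format.
payload = json.dumps({"instances": [[1.0, 2.0, 3.0]]})

print(predict_url("my_model", label="canary"))
# http://localhost:8501/v1/models/my_model/labels/canary:predict
```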

# Kubernetes Deployment
# apiVersion: apps/v1
# kind: Deployment
# metadata:
#   name: tf-serving-v1
# spec:
#   replicas: 3
#   selector:
#     matchLabels: { app: tf-serving, version: v1 }
#   template:
#     metadata:
#       labels: { app: tf-serving, version: v1 }
#     spec:
#       containers:
#         - name: tf-serving
#           image: tensorflow/serving:latest
#           args: ["--model_config_file=/config/model.config"]
#           ports:
#             - containerPort: 8501  # REST
#             - containerPort: 8500  # gRPC
#           resources:
#             requests: { cpu: "2", memory: "4Gi" }
#             limits: { cpu: "4", memory: "8Gi", nvidia.com/gpu: "1" }
#           readinessProbe:
#             httpGet:
#               path: /v1/models/my_model
#               port: 8501
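The readiness probe above calls `GET /v1/models/my_model`, which returns a `model_version_status` list. A sketch of parsing that response to see which versions are actually serving; `available_versions` is an illustrative helper and the sample response is hand-written to match the documented shape:

```python
def available_versions(status_response):
    """Versions reported AVAILABLE by TF Serving's model status endpoint."""
    return [
        v["version"]
        for v in status_response.get("model_version_status", [])
        if v.get("state") == "AVAILABLE"
    ]

# Example response while version 2 (canary) is still loading.
sample = {
    "model_version_status": [
        {"version": "1", "state": "AVAILABLE", "status": {"error_code": "OK"}},
        {"version": "2", "state": "LOADING", "status": {"error_code": "OK"}},
    ]
}
print(available_versions(sample))  # ['1']
```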

from dataclasses import dataclass

@dataclass
class ServingConfig:
    config: str
    value: str
    purpose: str
    tip: str

configs = [
    ServingConfig("model_config_file",
        "/config/model.config",
        "Defines model name, base path, and version policy",
        "Use specific versions for canary releases"),
    ServingConfig("enable_batching",
        "true + batching_parameters_file",
        "Batches requests together to raise throughput",
        "max_batch_size=32, batch_timeout=10ms"),
    ServingConfig("rest_api_port",
        "8501",
        "REST API endpoint",
        "POST /v1/models/{name}:predict"),
    ServingConfig("grpc_port",
        "8500",
        "gRPC endpoint (faster than REST)",
        "Use gRPC for internal service-to-service calls"),
    ServingConfig("monitoring_config_file",
        "prometheus_config",
        "Exports metrics for Prometheus",
        "Track latency, throughput, and error rate"),
]

print("=== TF Serving Config ===")
for c in configs:
    print(f"  [{c.config}] = {c.value}")
    print(f"    Purpose: {c.purpose}")
    print(f"    Tip: {c.tip}")

Canary with Istio

# === Canary Deployment with Istio ===

# VirtualService (Traffic Split)
# apiVersion: networking.istio.io/v1beta1
# kind: VirtualService
# metadata:
#   name: tf-serving
# spec:
#   hosts: ["tf-serving"]
#   http:
#     - route:
#         - destination:
#             host: tf-serving-v1
#             port: { number: 8501 }
#           weight: 95
#         - destination:
#             host: tf-serving-v2
#             port: { number: 8501 }
#           weight: 5
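Istio selects a backend per request by weighted random choice; the 95/5 split can be simulated deterministically by hashing a request id into 100 buckets. This sketch only illustrates the proportions — real Istio weight-based routing is random per request and not sticky unless a consistentHash policy is configured:

```python
import zlib

def route(request_id, canary_weight=5):
    """Send ~canary_weight% of request ids to v2, the rest to v1."""
    bucket = zlib.crc32(request_id.encode()) % 100
    return "tf-serving-v2" if bucket < canary_weight else "tf-serving-v1"

# Over many requests the canary share should land close to the 5% weight.
hits = sum(route(f"req-{i}") == "tf-serving-v2" for i in range(10_000))
print(f"canary share: {hits / 100:.1f}%")
```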

# Argo Rollouts
# apiVersion: argoproj.io/v1alpha1
# kind: Rollout
# metadata:
#   name: tf-serving
# spec:
#   strategy:
#     canary:
#       steps:
#         - setWeight: 5
#         - pause: { duration: 10m }
#         - analysis:
#             templates: [{ templateName: tf-serving-analysis }]
#         - setWeight: 25
#         - pause: { duration: 10m }
#         - setWeight: 50
#         - pause: { duration: 10m }
#         - setWeight: 100
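The `analysis` step hands the promote/abort decision to an AnalysisTemplate that queries metrics. A toy version of that gate, using the step-1 thresholds (abort when error rate exceeds 1% or P99 exceeds 2x baseline); the function name and defaults are illustrative, not Argo's API:

```python
def analysis_gate(error_rate, p99_ms, baseline_p99_ms,
                  max_error_rate=0.01, max_p99_ratio=2.0):
    """Return 'promote' only if the canary stays inside both thresholds."""
    if error_rate > max_error_rate:
        return "abort"
    if p99_ms > max_p99_ratio * baseline_p99_ms:
        return "abort"
    return "promote"

print(analysis_gate(0.002, 60.0, 50.0))  # promote
print(analysis_gate(0.02, 60.0, 50.0))   # abort (error rate 2% > 1%)
```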

@dataclass
class CanaryStep:
    step: int
    weight: str
    duration: str
    action: str
    rollback_trigger: str

steps = [
    CanaryStep(1, "5% Canary", "10 min",
        "Deploy v2, send 5% of traffic, monitor against baseline",
        "Error rate > 1% or latency P99 > 2x baseline"),
    CanaryStep(2, "25% Canary", "10 min",
        "Increase traffic if step 1 is healthy",
        "Error rate > 0.5% or latency P99 > 1.5x baseline"),
    CanaryStep(3, "50% Canary", "10 min",
        "Increase traffic and check A/B significance",
        "Accuracy drop > 2% or business metric drop"),
    CanaryStep(4, "100% Production", "Permanent",
        "Full promotion; remove v1",
        "Monitor for 24 h after promotion"),
]

print("=== Canary Steps ===")
for s in steps:
    print(f"  Step {s.step}: {s.weight} ({s.duration})")
    print(f"    Action: {s.action}")
    print(f"    Rollback if: {s.rollback_trigger}")

Monitoring & Rollback

# === Model Monitoring ===

@dataclass
class ModelMetric:
    metric: str
    source: str
    baseline: str
    alert: str
    rollback: str

metrics = [
    ModelMetric("Inference Latency P99",
        "Prometheus (TF Serving Metrics)",
        "< 50ms",
        "> 75ms (1.5x)",
        "> 100ms (2x) → Auto Rollback"),
    ModelMetric("Error Rate (5xx)",
        "Prometheus / Istio Metrics",
        "< 0.1%",
        "> 0.5%",
        "> 1% → Auto Rollback"),
    ModelMetric("Model Accuracy",
        "Custom Metrics / Evidently AI",
        "> 95%",
        "< 93% (2% drop)",
        "< 90% (5% drop) → Auto Rollback"),
    ModelMetric("Data Drift",
        "Evidently AI / Alibi Detect",
        "KS-test p > 0.05",
        "p < 0.05 (Drift Detected)",
        "Multiple Features Drift → Retrain"),
    ModelMetric("GPU Utilization",
        "NVIDIA DCGM / Prometheus",
        "< 80%",
        "> 85%",
        "> 95% → Scale Up / Rollback"),
    ModelMetric("Business Metric (CTR)",
        "Analytics Platform",
        "CTR 3.5%",
        "CTR < 3.0% (15% drop)",
        "CTR < 2.5% (30% drop) → Rollback"),
]

print("=== Model Metrics ===")
for m in metrics:
    print(f"  [{m.metric}] Baseline: {m.baseline}")
    print(f"    Source: {m.source}")
    print(f"    Alert: {m.alert}")
    print(f"    Rollback: {m.rollback}")

Tips

What is TensorFlow Serving?

A production ML serving system written in C++ that exposes gRPC and REST APIs, with low-latency inference, model versioning, hot-reload, request batching, and GPU support; it serves SavedModel artifacts and runs on Docker and Kubernetes.

What is Progressive Delivery?

Gradually shifting traffic to a new version: canary from 5% up to 100%, A/B testing, blue-green, shadow, or feature flags, driven by Istio, Argo Rollouts, or Flagger on Kubernetes.

How do you run a canary deploy?

Deploy v2, send it 5% of traffic via Istio VirtualService weights, monitor latency, accuracy, and error rate, then step up through 25%, 50%, and 100%; Argo Rollouts can automate the steps and auto-rollback on failure.

How do you monitor the model?

Track latency P99, error rate, accuracy, data drift, GPU utilization, and business metrics with Prometheus, Grafana, and Evidently AI, and wire auto-rollback thresholds to each.

Summary

TensorFlow Serving with progressive delivery (canary releases via Istio and Argo Rollouts, plus monitoring of latency, accuracy, and drift with automatic rollback) is how new model versions reach production safely.

📖 Related Articles

- TensorFlow Serving Hexagonal Architecture
- TensorFlow Serving Code Review Best Practice
- TensorFlow Serving Certification Path
- TensorFlow Serving Load Testing Strategy
- TensorFlow Serving Scaling Strategy
