TensorFlow Serving Progressive Delivery — Deploy
TF Serving Progressive Delivery

Progressive delivery for TensorFlow Serving in production on Kubernetes: canary, A/B testing, blue-green, and shadow deployments, automated with Istio, Argo Rollouts, and Flagger.
| Strategy | Traffic Split | Duration | Risk | Automation |
|---|---|---|---|---|
| Canary | 5% → 25% → 50% → 100% | 30-60 minutes | Very low | Argo Rollouts / Flagger |
| A/B Testing | 50% / 50% | 1-7 days | Medium | Istio + Custom Metrics |
| Blue-Green | 0% → 100% | Immediate | High (all or nothing) | Kubernetes Service Switch |
| Shadow | 100% mirrored (responses discarded) | 1-3 days | None | Istio Mirror |
| Feature Flag | Per user segment | As needed | Low | LaunchDarkly / Unleash |
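To make the "Traffic Split" column concrete, here is a minimal sketch of percentage-based routing. It uses hash bucketing so that the same caller always lands on the same variant; this is an illustrative assumption, not how Istio implements weighted routes (Istio picks a destination per request). The function name `pick_variant` and the `user-N` keys are hypothetical.

```python
import hashlib


def pick_variant(request_key: str, canary_weight: int) -> str:
    """Return 'canary' for roughly canary_weight percent of keys, else 'stable'."""
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")
    # Hash the key so the same caller always falls into the same bucket (0-99).
    digest = hashlib.sha256(request_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "canary" if bucket < canary_weight else "stable"


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(10_000)]
    # Walk through the canary weights from the table and report the observed share.
    for weight in (5, 25, 50, 100):
        share = sum(pick_variant(k, weight) == "canary" for k in keys) / len(keys)
        print(f"weight={weight:>3}%  observed canary share={share:.1%}")
```

Sticky bucketing of this kind is closer to the Feature Flag row than to a mesh-level canary, but the weight arithmetic is the same.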
TF Serving Setup
=== TensorFlow Serving Deployment ===
Docker Run
docker run -p 8501:8501 -p 8500:8500 \
  --mount type=bind,source=/models/my_model,target=/models/my_model \
  -e MODEL_NAME=my_model \
  tensorflow/serving
Model config file (/config/model.config)
model_config_list {
  config {
    name: "my_model"
    base_path: "/models/my_model"
    model_platform: "tensorflow"
    model_version_policy {
      specific { versions: 1 versions: 2 }
    }
    # version_labels is a map field: one block per label.
    version_labels { key: "stable" value: 1 }
    version_labels { key: "canary" value: 2 }
  }
}
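The version labels in the config above can be targeted through the TensorFlow Serving REST API, which accepts either a `/versions/<n>` or a `/labels/<label>` path segment before `:predict`. The sketch below only builds the URL and request body; the host `localhost:8501` and the helper names are assumptions for illustration, and no request is sent.

```python
import json
from typing import Optional


def predict_url(host: str, model: str, label: Optional[str] = None,
                version: Optional[int] = None) -> str:
    """Build a REST predict URL, optionally pinned to a label or a version."""
    if label is not None and version is not None:
        raise ValueError("pass either label or version, not both")
    path = f"/v1/models/{model}"
    if label is not None:
        path += f"/labels/{label}"
    elif version is not None:
        path += f"/versions/{version}"
    return f"http://{host}{path}:predict"


def predict_body(instances: list) -> str:
    """Serialize inputs in the row-oriented 'instances' request format."""
    return json.dumps({"instances": instances})


if __name__ == "__main__":
    print(predict_url("localhost:8501", "my_model", label="stable"))
    print(predict_url("localhost:8501", "my_model", label="canary"))
    print(predict_body([[1.0, 2.0, 3.0]]))
```

Routing by label rather than by version number means a client does not need to change when a newer version is promoted to "stable".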
Kubernetes Deployment
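apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-serving-v1
spec:
  replicas: 3
  # apps/v1 requires a selector matching the pod template labels;
  # the label names below are illustrative.
  selector:
    matchLabels: { app: tf-serving, version: v1 }
  template:
    metadata:
      labels: { app: tf-serving, version: v1 }
    spec:
      containers:
        - name: tf-serving
          # GPU workloads need a GPU image tag such as latest-gpu.
          image: tensorflow/serving:latest
          # /config must be mounted into the container (volume not shown).
          args: ["--model_config_file=/config/model.config"]
          ports:
            - containerPort: 8501  # REST
            - containerPort: 8500  # gRPC
          resources:
            requests: { cpu: "2", memory: "4Gi" }
            limits: { cpu: "4", memory: "8Gi", nvidia.com/gpu: "1" }
          readinessProbe:
            httpGet:
              path: /v1/models/my_model
              port: 8501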
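The readiness probe above only checks that the model status endpoint answers with a success code. A stricter check, sketched here, parses the status payload and confirms that specific versions are in the `AVAILABLE` state. The sample payload is hand-written to match the documented response shape, and the helper names are hypothetical.

```python
import json


def available_versions(status_json: str) -> list:
    """Return the sorted version numbers whose state is AVAILABLE."""
    payload = json.loads(status_json)
    return sorted(
        int(entry["version"])
        for entry in payload.get("model_version_status", [])
        if entry.get("state") == "AVAILABLE"
    )


def is_ready(status_json: str, required_versions) -> bool:
    """True only if every required version is currently being served."""
    return set(required_versions).issubset(available_versions(status_json))


# Hand-written sample of a GET /v1/models/my_model response body.
SAMPLE = json.dumps({
    "model_version_status": [
        {"version": "2", "state": "LOADING",
         "status": {"error_code": "OK", "error_message": ""}},
        {"version": "1", "state": "AVAILABLE",
         "status": {"error_code": "OK", "error_message": ""}},
    ]
})

if __name__ == "__main__":
    print("available:", available_versions(SAMPLE))
    print("stable ready:", is_ready(SAMPLE, [1]))
    print("stable + canary ready:", is_ready(SAMPLE, [1, 2]))
```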
from dataclasses import dataclass


@dataclass
class ServingConfig:
    config: str   # tensorflow_model_server flag name
    value: str
    purpose: str
    tip: str


configs = [
    ServingConfig("model_config_file",
                  "/config/model.config",
                  "Defines model name, path, and version policy",
                  "Use specific versions for canary"),
    ServingConfig("enable_batching",
                  "true + batching_parameters_file",
                  "Batches requests to increase throughput",
                  "max_batch_size=32, batch_timeout_micros=10000 (10 ms)"),
    ServingConfig("rest_api_port",
                  "8501",
                  "REST API endpoint",
                  "POST /v1/models/{name}:predict"),
    # The gRPC port flag is named --port (there is no --grpc_port flag).
    ServingConfig("port",
                  "8500",
                  "gRPC endpoint (faster than REST)",
                  "Use gRPC for internal services"),
    ServingConfig("monitoring_config_file",
                  "file containing a prometheus_config block",
                  "Exports metrics for Prometheus",
                  "Watch latency, throughput, and error rate"),
]

print("=== TF Serving Config ===")
for c in configs:
    print(f"  [{c.config}] = {c.value}")
    print(f"    Purpose: {c.purpose}")
    print(f"    Tip: {c.tip}")
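The batching tip above refers to a separate parameters file passed via `--batching_parameters_file`. As a rough sketch, the file is protobuf text format with each value wrapped in a `{ value: ... }` block. The helper below renders only the two parameters named in the tip; the function name is hypothetical, and other tuning fields are deliberately left out.

```python
def render_batching_parameters(max_batch_size: int = 32,
                               batch_timeout_micros: int = 10_000) -> str:
    """Render a minimal batching parameters file in protobuf text format."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    if batch_timeout_micros < 0:
        raise ValueError("batch_timeout_micros must not be negative")
    lines = [
        # Largest number of requests merged into one batch.
        f"max_batch_size {{ value: {max_batch_size} }}",
        # How long to wait for a batch to fill, in microseconds (10 ms here).
        f"batch_timeout_micros {{ value: {batch_timeout_micros} }}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_batching_parameters(), end="")
```

A larger timeout tends to raise throughput at the cost of tail latency, so the two values are usually tuned together against the latency budget.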
Canary with Istio

=== Canary Deployment with Istio ===
VirtualService (Traffic Split)
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: tf-serving
spec:
  hosts: ["tf-serving"]
  http:
    - route:
        - destination:
            host: tf-serving-v1
            port: { number: 8501 }
          weight: 95
        - destination:
            host: tf-serving-v2
            port: { number: 8501 }
          weight: 5
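When the split is adjusted by hand rather than by a controller, it helps to generate the manifest so the two weights always sum to 100. This sketch rebuilds the VirtualService above as a Python dict for a given canary weight and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The function name `virtual_service` is hypothetical; hosts and port are copied from the manifest above.

```python
import json


def virtual_service(canary_weight: int) -> dict:
    """Build the tf-serving VirtualService for a given canary percentage."""
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")

    def route(host: str, weight: int) -> dict:
        return {
            "destination": {"host": host, "port": {"number": 8501}},
            "weight": weight,
        }

    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": "tf-serving"},
        "spec": {
            "hosts": ["tf-serving"],
            "http": [{
                "route": [
                    # Stable gets whatever the canary does not take.
                    route("tf-serving-v1", 100 - canary_weight),
                    route("tf-serving-v2", canary_weight),
                ]
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(virtual_service(25), indent=2))
```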
Argo Rollouts
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: tf-serving
spec:
  # replicas, selector, and the pod template are omitted here for brevity.
  strategy:
    canary:
      steps:
        - setWeight: 5
        - pause: { duration: 10m }
        - analysis:
            templates: [{ templateName: tf-serving-analysis }]
        - setWeight: 25
        - pause: { duration: 10m }
        - setWeight: 50
        - pause: { duration: 10m }
        - setWeight: 100
from dataclasses import dataclass


@dataclass
class CanaryStep:
    step: int
    weight: str
    duration: str
    action: str
    rollback_trigger: str


steps = [
    CanaryStep(1, "5% Canary", "10 min",
               "Deploy v2, send 5% of traffic, monitor against baseline",
               "Error rate > 1%, latency P99 > 2x baseline"),
    CanaryStep(2, "25% Canary", "10 min",
               "Increase traffic if step 1 passed",
               "Error rate > 0.5%, latency P99 > 1.5x baseline"),
    CanaryStep(3, "50% Canary", "10 min",
               "Increase traffic, check A/B significance",
               "Accuracy drop > 2%, business metric drop"),
    CanaryStep(4, "100% Production", "permanent",
               "Full promotion, remove v1",
               "Monitor for 24 hours after promotion"),
]

print("=== Canary Steps ===")
for s in steps:
    print(f"  Step {s.step}: {s.weight} ({s.duration})")
    print(f"    Action: {s.action}")
    print(f"    Rollback if: {s.rollback_trigger}")
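The rollback triggers for the first two steps are numeric, so they can be evaluated mechanically. The following is a minimal sketch of such a gate, using the thresholds from the steps above (error rate and P99 latency relative to baseline). The names `Gate`, `GATES`, and `rollback_reasons` are hypothetical, and in practice the inputs would come from Prometheus queries rather than function arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    max_error_rate: float      # fraction, e.g. 0.01 == 1%
    max_latency_ratio: float   # canary P99 divided by baseline P99


# Thresholds keyed by canary weight, taken from steps 1 and 2.
GATES = {
    5: Gate(max_error_rate=0.01, max_latency_ratio=2.0),
    25: Gate(max_error_rate=0.005, max_latency_ratio=1.5),
}


def rollback_reasons(weight: int, error_rate: float,
                     canary_p99_ms: float, baseline_p99_ms: float) -> list:
    """Return the list of violated thresholds; an empty list means proceed."""
    gate = GATES[weight]
    reasons = []
    if error_rate > gate.max_error_rate:
        reasons.append(f"error rate {error_rate:.2%} > {gate.max_error_rate:.2%}")
    limit_ms = gate.max_latency_ratio * baseline_p99_ms
    if canary_p99_ms > limit_ms:
        reasons.append(f"P99 {canary_p99_ms:.0f}ms > {limit_ms:.0f}ms")
    return reasons


if __name__ == "__main__":
    # Same observations, evaluated against the 5% and the 25% gate.
    for weight in (5, 25):
        reasons = rollback_reasons(weight, error_rate=0.002,
                                   canary_p99_ms=80, baseline_p99_ms=50)
        print(weight, "rollback" if reasons else "promote", reasons)
```

The gates tighten as the weight grows, so a canary that passes at 5% can still fail at 25% with identical measurements.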
Monitoring & Rollback
# === Model Monitoring ===
from dataclasses import dataclass


@dataclass
class ModelMetric:
    metric: str
    source: str
    baseline: str
    alert: str
    rollback: str


metrics = [
    ModelMetric("Inference Latency P99",
                "Prometheus (TF Serving Metrics)",
                "< 50ms",
                "> 75ms (1.5x)",
                "> 100ms (2x) → Auto Rollback"),
    ModelMetric("Error Rate (5xx)",
                "Prometheus / Istio Metrics",
                "< 0.1%",
                "> 0.5%",
                "> 1% → Auto Rollback"),
    ModelMetric("Model Accuracy",
                "Custom Metrics / Evidently AI",
                "> 95%",
                "< 93% (2-point drop)",
                "< 90% (5-point drop) → Auto Rollback"),
    ModelMetric("Data Drift",
                "Evidently AI / Alibi Detect",
                "KS-test p > 0.05",
                "p < 0.05 (Drift Detected)",
                "Multiple Features Drift → Retrain"),
    ModelMetric("GPU Utilization",
                "NVIDIA DCGM / Prometheus",
                "< 80%",
                "> 85%",
                "> 95% → Scale Up / Rollback"),
    ModelMetric("Business Metric (CTR)",
                "Analytics Platform",
                "CTR 3.5%",
                "CTR < 3.0% (~14% drop)",
                "CTR < 2.5% (~29% drop) → Rollback"),
]

print("=== Model Metrics ===")
for m in metrics:
    print(f"  [{m.metric}] Baseline: {m.baseline}")
    print(f"    Source: {m.source}")
    print(f"    Alert: {m.alert}")
    print(f"    Rollback: {m.rollback}")
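For the business metric row, a drop in CTR only matters if it is larger than sampling noise. One common way to check this, shown here as a sketch rather than the article's prescribed method, is a two-proportion z-test on clicks and impressions for the stable and canary models. The function name and the sample counts are hypothetical.

```python
import math


def two_proportion_z_test(clicks_a: int, views_a: int,
                          clicks_b: int, views_b: int):
    """Return (z, two-sided p-value) for the difference in click-through rate."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that both models share one CTR.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if std_err == 0:
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: stable at 3.5% CTR, canary at 2.5% CTR.
    z, p = two_proportion_z_test(350, 10_000, 250, 10_000)
    print(f"z={z:.2f} p={p:.5f} significant={p < 0.05}")
```

With small canary weights the canary arm collects few impressions, so this test usually needs the longer A/B durations from the strategy table before it becomes decisive.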
Tips
- Canary: start at 5% and monitor for 10 minutes before increasing the weight.
- gRPC: use gRPC for internal traffic; it can be 2-5x faster than REST.
- Batching: enable batching to raise throughput by roughly 2-4x.
- Argo: use Argo Rollouts to automate the canary.
- Drift: check for data drift every week and retrain when drift is detected.
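The weekly drift check can be sketched with a two-sample Kolmogorov-Smirnov test on a single feature. To stay dependency-free, this version computes the KS statistic directly and compares it with the standard large-sample critical value instead of computing a p-value; tools such as Evidently AI or Alibi Detect (named earlier) would normally do this for you. Function names and the synthetic data are assumptions.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(reference, current) -> float:
    """Largest gap between the two empirical CDFs."""
    ref_sorted, cur_sorted = sorted(reference), sorted(current)
    gap = 0.0
    # The maximum gap is always reached at one of the observed values.
    for x in ref_sorted + cur_sorted:
        cdf_ref = bisect_right(ref_sorted, x) / len(ref_sorted)
        cdf_cur = bisect_right(cur_sorted, x) / len(cur_sorted)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def drift_detected(reference, current, alpha: float = 0.05) -> bool:
    """Reject 'same distribution' when the statistic exceeds the critical value."""
    n, m = len(reference), len(current)
    # Large-sample approximation; c(0.05) is about 1.358.
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


if __name__ == "__main__":
    rng = random.Random(0)
    training = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    shifted = [rng.gauss(0.5, 1.0) for _ in range(2000)]
    print("shifted feature drifted:", drift_detected(training, shifted))
```

Running the check per feature and retraining only when several features drift matches the "Multiple Features Drift → Retrain" rule in the metrics table.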
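What is TensorFlow Serving?
TensorFlow Serving is a production ML serving system written in C++ for low latency. It serves models in the SavedModel format over gRPC and REST APIs, supports model versioning with hot reload, request batching, and GPUs, and runs on Docker and Kubernetes.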





