SiamCafe.net Blog

MLOps Pipeline Container Orchestration

2026-01-08 · อ. บอม — SiamCafe.net · 8,362 words

MLOps Container

MLOps Pipeline Container Orchestration Kubernetes Kubeflow MLflow Model Training Serving Docker GPU CI/CD Monitoring Drift Retraining Production

Platform    Type                  K8s Required   Complexity   Best For
Kubeflow    Full Platform         Yes            High         Enterprise
MLflow      Tracking + Registry   No             Low          All sizes
Vertex AI   Managed (GCP)         No             Medium       GCP users
SageMaker   Managed (AWS)         No             Medium       AWS users

ML Pipeline

# === MLOps Pipeline ===

# Dockerfile — ML Model Serving
# FROM python:3.11-slim
# WORKDIR /app
# COPY requirements.txt .
# RUN pip install --no-cache-dir -r requirements.txt
# COPY model/ ./model/
# COPY app.py .
# EXPOSE 8000
# CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]

# FastAPI — Model Serving
# from fastapi import FastAPI
# import joblib
# import numpy as np
#
# app = FastAPI()
# model = joblib.load("model/model.pkl")
#
# @app.post("/predict")
# async def predict(features: list[float]):
#     prediction = model.predict([features])
#     return {"prediction": prediction.tolist()}
#
# @app.get("/health")
# async def health():
#     return {"status": "healthy", "model_version": "v1.2.3"}

# MLflow — Experiment Tracking
# import mlflow
# import mlflow.sklearn
#
# mlflow.set_tracking_uri("http://mlflow-server:5000")
# mlflow.set_experiment("fraud-detection")
#
# with mlflow.start_run():
#     mlflow.log_param("n_estimators", 100)
#     mlflow.log_param("max_depth", 10)
#     mlflow.log_metric("accuracy", 0.95)
#     mlflow.log_metric("f1_score", 0.92)
#     mlflow.sklearn.log_model(model, "model")
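Once runs are logged like this, choosing a model is just a query over runs ordered by metric. A dependency-free sketch of that selection (the run records below are made up for illustration; against a live tracking server you would query MLflow instead):

```python
# Hypothetical logged runs, each with its params and evaluation metric.
runs = [
    {"run_id": "a1", "params": {"n_estimators": 100, "max_depth": 10}, "f1_score": 0.92},
    {"run_id": "b2", "params": {"n_estimators": 200, "max_depth": 8},  "f1_score": 0.94},
    {"run_id": "c3", "params": {"n_estimators": 50,  "max_depth": 12}, "f1_score": 0.89},
]

# Promote the run with the best f1_score to the registry.
best = max(runs, key=lambda r: r["f1_score"])
print(f"best run: {best['run_id']} f1={best['f1_score']}")
```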

from dataclasses import dataclass

@dataclass
class MLPipeline:
    name: str
    stage: str
    container: str
    gpu: bool
    duration: str
    status: str

pipelines = [
    MLPipeline("Data Ingestion", "Extract", "data-loader:v2", False, "15 min", "Completed"),
    MLPipeline("Feature Engineering", "Transform", "feature-eng:v3", False, "30 min", "Completed"),
    MLPipeline("Model Training", "Train", "trainer:v5-gpu", True, "2 hours", "Running"),
    MLPipeline("Evaluation", "Evaluate", "evaluator:v2", False, "10 min", "Pending"),
    MLPipeline("Model Registry", "Register", "mlflow:v2", False, "2 min", "Pending"),
    MLPipeline("Deployment", "Deploy", "kserve:v1", True, "5 min", "Pending"),
]

print("=== ML Pipeline ===")
for p in pipelines:
    gpu_str = "GPU" if p.gpu else "CPU"
    print(f"  [{p.status}] {p.name} ({p.stage})")
    print(f"    Container: {p.container} | {gpu_str} | Duration: {p.duration}")

Kubernetes Deployment

# === K8s ML Deployment ===

# KServe — Model Serving
# apiVersion: serving.kserve.io/v1beta1
# kind: InferenceService
# metadata:
#   name: fraud-detector
# spec:
#   predictor:
#     model:
#       modelFormat:
#         name: sklearn
#       storageUri: s3://models/fraud-detector/v1.2.3
#       resources:
#         requests:
#           cpu: "2"
#           memory: "4Gi"
#         limits:
#           cpu: "4"
#           memory: "8Gi"
#           nvidia.com/gpu: "1"
#     minReplicas: 2
#     maxReplicas: 10
#     scaleTarget: 10  # concurrent requests
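The scaleTarget setting means roughly one replica per 10 concurrent requests, clamped between minReplicas and maxReplicas. A simplified sketch of that math (the real KServe/Knative autoscaler also smooths over time windows, so this is only the steady-state formula):

```python
import math

def desired_replicas(concurrent_requests: int, scale_target: int = 10,
                     min_replicas: int = 2, max_replicas: int = 10) -> int:
    """Concurrency-based autoscaling: one replica per scale_target
    in-flight requests, clamped to the configured replica bounds."""
    needed = math.ceil(concurrent_requests / scale_target)
    return max(min_replicas, min(max_replicas, needed))

print(desired_replicas(5))    # below min -> stays at minReplicas
print(desired_replicas(45))
print(desired_replicas(500))  # capped at maxReplicas
```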

# Canary Deployment
# spec:
#   predictor:
#     canaryTrafficPercent: 10
#     model:
#       storageUri: s3://models/fraud-detector/v1.3.0
#   # 90% -> v1.2.3, 10% -> v1.3.0

# GPU Node Pool
# kubectl label nodes gpu-node-1 accelerator=nvidia-a100
# nodeSelector:
#   accelerator: nvidia-a100
# tolerations:
#   - key: nvidia.com/gpu
#     operator: Exists
#     effect: NoSchedule

@dataclass
class ModelDeployment:
    model: str
    version: str
    replicas: int
    gpu: str
    rps: int
    latency_p99_ms: int
    accuracy: float

deployments = [
    ModelDeployment("fraud-detector", "v1.2.3", 4, "T4", 500, 45, 0.952),
    ModelDeployment("recommendation", "v3.1.0", 6, "A100", 2000, 25, 0.891),
    ModelDeployment("sentiment-analysis", "v2.0.1", 2, "None", 300, 80, 0.934),
    ModelDeployment("image-classifier", "v1.5.2", 3, "T4", 100, 120, 0.967),
    ModelDeployment("text-embedding", "v1.0.0", 8, "A100", 5000, 15, 0.0),
]

print("\n=== Model Deployments ===")
for d in deployments:
    # An embedding model has no accuracy metric, so 0.0 means "not applicable".
    acc_str = f"{d.accuracy:.1%}" if d.accuracy else "N/A"
    print(f"  [{d.model}] {d.version} x{d.replicas}")
    print(f"    GPU: {d.gpu} | RPS: {d.rps} | p99: {d.latency_p99_ms}ms | Acc: {acc_str}")

Monitoring and Retraining

# === Model Monitoring ===

# Prometheus Metrics for ML
# - model_prediction_latency_seconds
# - model_prediction_total (by class)
# - model_accuracy_score
# - feature_drift_score
# - data_quality_score

# Grafana Dashboard
# - Prediction latency p50/p95/p99
# - Prediction volume per model
# - Feature distribution shift
# - Accuracy over time
# - GPU utilization per model

# Auto-retrain Trigger
# if feature_drift > threshold:
#     trigger_pipeline("retrain")
# if accuracy < sla_target:
#     trigger_pipeline("retrain")
# schedule: weekly retrain with fresh data
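The drift trigger above is pseudocode. One common way to put a number on feature_drift is the Population Stability Index (PSI), computed over binned proportions of a feature in training vs. live traffic. A minimal dependency-free sketch (the bin values and the 0.3 threshold are illustrative):

```python
import math

def psi(expected: list[float], actual: list[float]) -> float:
    """Population Stability Index over pre-binned proportions.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant drift."""
    return sum((a - e) * math.log(a / e)
               for e, a in zip(expected, actual) if e > 0 and a > 0)

# Hypothetical binned distributions of one feature: training vs. live.
train_bins = [0.25, 0.25, 0.25, 0.25]
live_bins  = [0.10, 0.20, 0.30, 0.40]

drift = psi(train_bins, live_bins)
print(f"drift score: {drift:.3f}")
if drift > 0.3:  # example threshold
    print("trigger_pipeline('retrain')")
```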

monitoring = {
    "Prediction Latency p99": "45ms (target < 100ms)",
    "Throughput": "3,500 predictions/sec",
    "Model Accuracy (7d)": "95.2% (target > 93%)",
    "Feature Drift Score": "0.12 (threshold: 0.3)",
    "Data Quality Score": "98.5%",
    "GPU Utilization": "72% average",
    "Failed Predictions": "0.05%",
    "Auto-retrain Count (30d)": "2 triggered",
}

print("ML Monitoring Dashboard:")
for k, v in monitoring.items():
    print(f"  {k}: {v}")
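The dashboard targets above can be checked programmatically. A sketch of a simple SLA evaluator (the numeric readings mirror the dict; the check functions and names are illustrative):

```python
# Each entry: metric name, current reading, and its SLA predicate.
sla_checks = [
    ("latency_p99_ms", 45.0,   lambda v: v < 100),   # target < 100ms
    ("accuracy_7d",    0.952,  lambda v: v > 0.93),  # target > 93%
    ("drift_score",    0.12,   lambda v: v < 0.3),   # threshold 0.3
    ("failed_ratio",   0.0005, lambda v: v < 0.01),
]

alerts = [name for name, value, ok in sla_checks if not ok(value)]
print("alerts:", alerts or "none")
```

In production these readings would come from Prometheus queries rather than literals, with the failing names fed to the alerting system.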

best_practices = [
    "Docker Image: Pin every library version in requirements.txt",
    "MLflow: Track every experiment's parameters, metrics, and artifacts",
    "KServe: Use a canary rollout for every new model version",
    "GPU: Use separate node pools for GPU and CPU workloads",
    "Monitoring: Alert when accuracy drops below the SLA",
    "Drift: Check feature drift daily and auto-retrain when drift is detected",
    "Registry: Version every model in the MLflow Registry",
]

print("\n\nBest Practices:")
for i, p in enumerate(best_practices, 1):
    print(f"  {i}. {p}")

Tips

Real-World Deployment in Organizations

For medium to large organizations, a Three-Tier Architecture is recommended: a Core Layer at the heart of the system, a Distribution Layer that distributes traffic, and an Access Layer that connects directly to users. Clear layer separation makes troubleshooting easier and lets the system scale with demand.

Network security matters just as much: deploy a Next-Generation Firewall capable of Deep Packet Inspection, use network segmentation with a separate VLAN per department, install IDS/IPS to detect attacks, and run a security audit at least twice a year.

What is MLOps?

MLOps applies DevOps practices to machine learning: CI/CD for models, version control for models and data, monitoring, drift detection, and automated retraining. Platforms such as MLflow, Kubeflow, Vertex AI, and SageMaker support running models in production.

Why use containers for ML?

Containers give ML workloads reproducibility (locked dependencies), scalability, and portability, plus GPU sharing and isolation. On Kubernetes you also get auto-scaling, and the environment stays identical from local development to the cloud.

How do Kubeflow and MLflow differ?

Kubeflow is a full platform on Kubernetes covering pipelines, notebooks, and serving, but it is heavyweight. MLflow focuses on tracking, a model registry, and lightweight serving, and does not require Kubernetes. The two can be used together.

How do you deploy an ML model?

Package the model with Docker and expose it as a REST API with FastAPI or Flask, then deploy on Kubernetes with KServe or Seldon. Roll out with A/B or canary releases, enable auto-scaling, schedule onto GPU node pools, and monitor latency, accuracy, and drift.

Summary

An MLOps pipeline on container orchestration combines Kubernetes, Kubeflow, MLflow, and Docker: GPU-aware scheduling for training, KServe with canary rollouts for serving, and continuous monitoring whose drift signals feed retraining back into the CI/CD loop. That closed loop is what keeps models reliable in production.
