MLflow Experiments in a Remote Work Setup

What Is MLflow?

MLflow is an open-source platform, originally developed by Databricks, for managing the full machine learning lifecycle: experiment tracking, model management (Model Registry), and model deployment (Model Serving). It is widely used by data scientists and ML engineers because it is easy to adopt, supports many frameworks, and integrates well with cloud services.

When an ML team works remotely, a central tracking server is essential: everyone needs to log experiment results to the same place, compare model performance across experiments, and reproduce results at any time.

MLflow Remote Tracking Architecture

MLflow Tracking Server: records experiment metadata (parameters, metrics, tags) in the backend store
Backend Store: PostgreSQL or MySQL, holding the experiment metadata
Artifact Store: S3, GCS, or MinIO, holding model files, plots, and data samples
Reverse Proxy: Nginx with HTTPS for transport security
Authentication: Basic Auth, OAuth2 Proxy, or a VPN
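
To make the relationship between these components concrete, here is a minimal sketch that models the two stores as a small config object and derives the `mlflow server` command line from it. The class name `TrackingServerConfig` and the placeholder password are illustrative assumptions; the flags themselves are the ones used in the setup later in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackingServerConfig:
    """Hypothetical helper: groups the settings the tracking server needs."""

    backend_store_uri: str  # where parameters, metrics, and tags are stored
    artifact_root: str      # where model files, plots, and samples are stored
    host: str = "0.0.0.0"
    port: int = 5000

    def server_command(self) -> list[str]:
        """Return the argv for `mlflow server` with this configuration."""
        return [
            "mlflow", "server",
            "--backend-store-uri", self.backend_store_uri,
            "--default-artifact-root", self.artifact_root,
            "--host", self.host,
            "--port", str(self.port),
        ]


# "change-me" is a placeholder, not a real credential.
config = TrackingServerConfig(
    backend_store_uri="postgresql://mlflow:change-me@postgres:5432/mlflow",
    artifact_root="s3://mlflow-artifacts/",
)
print(" ".join(config.server_command()))
```

Keeping the two URIs side by side makes it easier to see that metadata and artifacts live in different systems and have to be backed up separately.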

Installing the MLflow Tracking Server

# docker-compose.yml for the MLflow Tracking Server
version: "3.8"

services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: mlflow
      POSTGRES_USER: mlflow
      # Blank in the source; supply the value through .env
      # (the postgres image refuses to initialize with an empty password)
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U mlflow"]
      interval: 10s
      timeout: 5s
      retries: 5

  minio:
    image: minio/minio:latest
    command: server /data --console-address ":9001"
    environment:
      # Blank in the source; these demo values match the create-bucket step
      # and the client script. Change them for any real deployment.
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin123
    volumes:
      - minio_data:/data
    ports:
      - "9000:9000"
      - "9001:9001"
    healthcheck:
      test: ["CMD", "mc", "ready", "local"]
      interval: 10s
      timeout: 5s
      retries: 5

  create-bucket:
    image: minio/mc
    depends_on:
      minio:
        condition: service_healthy
    # Note: "mc anonymous set download" makes the bucket publicly readable
    entrypoint: >
      /bin/sh -c "
      mc alias set myminio http://minio:9000 minioadmin minioadmin123;
      mc mb --ignore-existing myminio/mlflow-artifacts;
      mc anonymous set download myminio/mlflow-artifacts;
      exit 0;
      "

  mlflow:
    image: python:3.11-slim
    depends_on:
      postgres:
        condition: service_healthy
      minio:
        condition: service_healthy
    # All flags stay at the same indentation so the folded scalar (>) joins
    # them into a single command line; more-indented lines would keep their
    # newlines and bash would run each flag as a separate command.
    command: >
      bash -c "
      pip install mlflow psycopg2-binary boto3 &&
      mlflow server
      --backend-store-uri postgresql://mlflow:${POSTGRES_PASSWORD}@postgres:5432/mlflow
      --default-artifact-root s3://mlflow-artifacts/
      --host 0.0.0.0
      --port 5000
      --serve-artifacts
      "
    environment:
      MLFLOW_S3_ENDPOINT_URL: http://minio:9000
      # Blank in the source; must match the MinIO root credentials above
      AWS_ACCESS_KEY_ID: minioadmin
      AWS_SECRET_ACCESS_KEY: minioadmin123
    ports:
      # This mapping bypasses the Nginx authentication; restrict or remove it in production
      - "5000:5000"

  nginx:
    image: nginx:alpine
    ports:
      - "443:443"
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf
      - ./ssl:/etc/nginx/ssl
      - ./htpasswd:/etc/nginx/.htpasswd
    depends_on:
      - mlflow

volumes:
  postgres_data:
  minio_data:


---

# nginx.conf: reverse proxy with Basic Auth
server {
    listen 80;
    server_name mlflow.company.com;
    # Redirect all plain HTTP traffic to HTTPS
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl;
    server_name mlflow.company.com;

    ssl_certificate /etc/nginx/ssl/fullchain.pem;
    ssl_certificate_key /etc/nginx/ssl/privkey.pem;

    auth_basic "MLflow Tracking Server";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://mlflow:5000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}



---

# Create the htpasswd file (do this before "docker compose up" so the bind mount finds a file)
sudo apt install apache2-utils
htpasswd -c ./htpasswd mluser

# Deploy
docker compose up -d

# Verify
docker compose logs mlflow
curl -u mluser:password "https://mlflow.company.com/api/2.0/mlflow/experiments/search?max_results=10"
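
The same check can be scripted from Python with only the standard library. The sketch below builds the Basic Auth request without sending it; the helper name `build_search_request` is an assumption for illustration, and the `max_results` query parameter is included on the assumption that the server expects a positive page size.

```python
import base64
import urllib.request


def build_search_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a Basic Auth request to the experiments search endpoint."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    url = base_url.rstrip("/") + "/api/2.0/mlflow/experiments/search?max_results=10"
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


request = build_search_request("https://mlflow.company.com", "mluser", "password")
print(request.full_url)

# To send it against a live server:
#   with urllib.request.urlopen(request, timeout=10) as response:
#       print(response.status)
```

A 401 response from the live call would point at the htpasswd file or the credentials, while a 502 usually means Nginx cannot reach the `mlflow` container.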

Using the MLflow Client Remotely

# Python script that logs experiments from a remote machine
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, classification_report
import os
import json

# Configure the tracking URI (remote server).
# Credentials are hard-coded for the demo only; prefer exporting them in the shell.
os.environ["MLFLOW_TRACKING_URI"] = "https://mlflow.company.com"
os.environ["MLFLOW_TRACKING_USERNAME"] = "mluser"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "password"

# Configure the artifact store (MinIO/S3)
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "https://minio.company.com"
os.environ["AWS_ACCESS_KEY_ID"] = "minioadmin"
os.environ["AWS_SECRET_ACCESS_KEY"] = "minioadmin123"

# Load the data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Create or select the experiment
mlflow.set_experiment("iris-classification")

# Parameter combinations to try
param_grid = [
    {"n_estimators": 50, "max_depth": 3, "min_samples_split": 2},
    {"n_estimators": 100, "max_depth": 5, "min_samples_split": 5},
    {"n_estimators": 200, "max_depth": 10, "min_samples_split": 10},
    {"n_estimators": 100, "max_depth": None, "min_samples_split": 2},
]

best_accuracy = 0
best_run_id = None

for params in param_grid:
    with mlflow.start_run(run_name=f"rf-{params['n_estimators']}-{params['max_depth']}"):
        # Log parameters and tags
        mlflow.log_params(params)
        mlflow.set_tag("model_type", "RandomForest")
        mlflow.set_tag("engineer", "bom")
        mlflow.set_tag("dataset", "iris")

        # Train the model
        model = RandomForestClassifier(**params, random_state=42, n_jobs=-1)
        model.fit(X_train, y_train)

        # Predict and evaluate
        y_pred = model.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred, average="weighted")

        # Log metrics
        mlflow.log_metrics({
            "accuracy": accuracy,
            "f1_score": f1,
            "train_samples": len(X_train),
            "test_samples": len(X_test),
        })

        # Log the classification report as a JSON artifact
        report = classification_report(y_test, y_pred, output_dict=True)
        with open("classification_report.json", "w") as f:
            json.dump(report, f, indent=2)
        mlflow.log_artifact("classification_report.json")

        # Log the model; each run registers a new version of "iris-classifier"
        mlflow.sklearn.log_model(
            model, "model",
            registered_model_name="iris-classifier",
        )

        # Track the best run so far
        if accuracy > best_accuracy:
            best_accuracy = accuracy
            best_run_id = mlflow.active_run().info.run_id

        print(f"  Params: {params}")
        print(f"  Accuracy: {accuracy:.4f} | F1: {f1:.4f}")

print(f"\nBest Run: {best_run_id} (Accuracy: {best_accuracy:.4f})")
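
The script above sets credentials inline, which is convenient for a demo but easy to leak through version control. One alternative, sketched here under the assumption that each team member exports the variables in their shell, is to fail fast when any of them is missing. The helper names `missing_settings` and `require_settings` are hypothetical; the variable names are the ones used in the script.

```python
import os

# Environment variables the remote logging script relies on.
REQUIRED_VARS = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
    "MLFLOW_S3_ENDPOINT_URL",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_settings(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def require_settings(env=None):
    """Raise a RuntimeError listing every missing variable."""
    missing = missing_settings(env)
    if missing:
        raise RuntimeError("Missing MLflow settings: " + ", ".join(missing))


# Example with an incomplete environment: only the tracking URI is set.
example_env = {"MLFLOW_TRACKING_URI": "https://mlflow.company.com"}
print(missing_settings(example_env))
```

Calling `require_settings()` at the top of a training script surfaces a misconfigured machine before any training time is spent.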

Model Registry and Deployment

# Manage model versions with the Model Registry
import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient

client = MlflowClient()

# List all versions of the model
model_name = "iris-classifier"
versions = client.search_model_versions(f"name='{model_name}'")
for v in versions:
    print(f"Version {v.version}: Run={v.run_id}, Stage={v.current_stage}")

# Note: registry stages are deprecated since MLflow 2.9 in favor of aliases.
# The calls below still work on MLflow 2.x but emit a deprecation warning.

# Promote the model to Staging
client.transition_model_version_stage(
    name=model_name,
    version=2,
    stage="Staging",
    archive_existing_versions=False,
)

# Promote to Production after testing
client.transition_model_version_stage(
    name=model_name,
    version=2,
    stage="Production",
    archive_existing_versions=True,
)

# Load the Production model
model_uri = f"models:/{model_name}/Production"
model = mlflow.sklearn.load_model(model_uri)

# Predict
prediction = model.predict([[5.1, 3.5, 1.4, 0.2]])
print(f"Prediction: {prediction}")
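
The registry code above addresses a model through a `models:/` URI. MLflow accepts three forms of that URI: by version number, by stage, and (in versions that support aliases) by alias. The helper below is a small illustrative sketch that builds each form; the function name `registry_model_uri` and the alias name `champion` are assumptions, not part of the original setup.

```python
def registry_model_uri(name, *, version=None, stage=None, alias=None):
    """Build a models:/ URI from exactly one of version, stage, or alias."""
    chosen = [value for value in (version, stage, alias) if value is not None]
    if len(chosen) != 1:
        raise ValueError("pass exactly one of version, stage, or alias")
    if alias is not None:
        # Alias form uses "@" instead of a path segment.
        return f"models:/{name}@{alias}"
    selector = version if version is not None else stage
    return f"models:/{name}/{selector}"


print(registry_model_uri("iris-classifier", stage="Production"))
print(registry_model_uri("iris-classifier", version=2))
print(registry_model_uri("iris-classifier", alias="champion"))
```

Any of these strings can be passed to `mlflow.sklearn.load_model`. Because stages are deprecated in recent MLflow releases, the alias form is likely the safer choice for new projects.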



---

# Serve the model as a REST API
# mlflow models serve -m "models:/iris-classifier/Production" -p 8080

# Test the API
# curl -X POST http://localhost:8080/invocations \
#   -H "Content-Type: application/json" \
#   -d '{"inputs": [[5.1, 3.5, 1.4, 0.2]]}'
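
For callers that prefer Python over curl, the same request can be assembled with the standard library. This is a minimal sketch that only builds the request; the helper name `build_invocation_request` is hypothetical, and the URL assumes the model is served locally on port 8080 as in the command above.

```python
import json
import urllib.request


def build_invocation_request(rows, url="http://localhost:8080/invocations"):
    """Build (but do not send) a POST request carrying feature rows as JSON."""
    body = json.dumps({"inputs": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_invocation_request([[5.1, 3.5, 1.4, 0.2]])
print(request.get_method(), request.full_url)

# To send it against a running server:
#   with urllib.request.urlopen(request, timeout=10) as response:
#       print(json.loads(response.read()))
```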

Kubernetes Deployment for Production

# kubernetes/mlflow-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow-tracking
  namespace: ml-platform
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mlflow-tracking
  template:
    metadata:
      labels:
        app: mlflow-tracking
    spec:
      containers:
        - name: mlflow
          # The stock image may not bundle psycopg2 or boto3; if the server
          # fails on import, build a derived image that installs them.
          image: ghcr.io/mlflow/mlflow:2.10.0
          command:
            - mlflow
            - server
            - --backend-store-uri
            # $(VAR) is expanded by Kubernetes from the env entries below
            - postgresql://$(POSTGRES_USER):$(POSTGRES_PASSWORD)@postgres:5432/mlflow
            - --default-artifact-root
            - s3://mlflow-artifacts/
            - --host
            - "0.0.0.0"
            - --port
            - "5000"
            - --serve-artifacts
          env:
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: mlflow-secrets
                  key: postgres-user
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mlflow-secrets
                  key: postgres-password
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: mlflow-secrets
                  key: s3-access-key
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: mlflow-secrets
                  key: s3-secret-key
            - name: MLFLOW_S3_ENDPOINT_URL
              value: "https://s3.company.com"
          ports:
            - containerPort: 5000
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 2000m
              memory: 4Gi
          livenessProbe:
            httpGet:
              path: /health
              port: 5000
            initialDelaySeconds: 30
            periodSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  name: mlflow-tracking
  namespace: ml-platform
spec:
  selector:
    app: mlflow-tracking
  ports:
    - port: 5000
      targetPort: 5000
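
One detail worth checking in the manifest above: the database user and password are substituted directly into the backend store URI, so a password containing characters such as `@` or `/` would produce an invalid URI. A possible safeguard, sketched here with a hypothetical helper named `backend_store_uri`, is to percent-encode the credentials before storing the finished URI in the secret.

```python
from urllib.parse import quote


def backend_store_uri(user, password, host="postgres", port=5432, database="mlflow"):
    """Build a PostgreSQL URI with percent-encoded credentials."""
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"postgresql://{safe_user}:{safe_password}@{host}:{port}/{database}"


# "p@ss/word" is a made-up password chosen to show the encoding.
print(backend_store_uri("mlflow", "p@ss/word"))
```

With this approach the whole URI could be stored as a single secret key and passed to `--backend-store-uri`, instead of assembling it from separate variables in the container command.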

Best Practices for a Remote MLflow Setup

Use PostgreSQL as the backend store: SQLite allows only one writer at a time and is not designed for concurrent access over a network, so a team needs PostgreSQL or MySQL.
Use object storage for artifacts: S3, GCS, or MinIO suits this better than a local filesystem because it scales and is reachable from anywhere.
Set a naming convention: standardize the format of experiment names, run names, and tags.
Log everything: parameters, metrics, artifacts, environment information, and the Git commit hash, so that results are reproducible.
Use the Model Registry: manage model versions with Staging/Production stages (or aliases, which replace stages in recent MLflow versions) instead of copying files by hand.
Back up regularly: back up the PostgreSQL database and the artifact store every day.
Monitor the server: track disk usage, memory, and CPU on the tracking server, because artifacts can consume a lot of space.
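
As an illustration of the naming-convention and log-everything points, the sketch below derives a deterministic run name from the model type and parameters, and collects a few reproducibility tags. The helper names and the exact name format are assumptions; teams should adapt them to their own convention.

```python
import platform
import subprocess


def run_name(model_type: str, params: dict) -> str:
    """Build a run name with parameters in sorted order, e.g. rf-max_depth=5-n_estimators=100."""
    parts = [f"{key}={params[key]}" for key in sorted(params)]
    return "-".join([model_type, *parts])


def git_commit(default: str = "unknown") -> str:
    """Return the current Git commit hash, or a default outside a repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return default
    return result.stdout.strip() or default


def reproducibility_tags() -> dict:
    """Collect tags that help reproduce a run later."""
    return {
        "git_commit": git_commit(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    }


print(run_name("rf", {"n_estimators": 100, "max_depth": 5}))
print(sorted(reproducibility_tags()))
```

The resulting dictionary can be passed to `mlflow.set_tags` inside an active run, and the name to the `run_name` argument of `mlflow.start_run`.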

What is MLflow and what is it used for?

MLflow is an open-source platform for managing the machine learning lifecycle. It covers Experiment Tracking, which records the parameters, metrics, and artifacts of every run; the Model Registry, which manages model versions; and Model Serving, which deploys a model as a REST API.
