Uptime Kuma for MLOps
Uptime Kuma is a free, open-source, self-hosted monitoring tool that fits MLOps workflows well: HTTP health checks for ML APIs, push heartbeats for pipelines, Docker deployment, public status pages, and alerts to Telegram, Slack, and Discord.
| Monitor Tool | Self-hosted | Price | Notifications | Best for |
|---|---|---|---|---|
| Uptime Kuma | Yes | Free | 20+ | Self-hosted teams |
| BetterUptime | No | $20+/mo | 10+ | SaaS |
| UptimeRobot | No | Free/Paid | 5+ | Simple checks |
| Grafana + Prometheus | Yes | Free | 10+ | Advanced metrics |
Installation and Setup
# === Uptime Kuma Setup ===
# Docker — Single Command
# docker run -d \
# --restart=always \
# --name uptime-kuma \
# -p 3001:3001 \
# -v uptime-kuma:/app/data \
# louislam/uptime-kuma:1
# Docker Compose — Production
# version: "3"
# services:
#   uptime-kuma:
#     image: louislam/uptime-kuma:1
#     container_name: uptime-kuma
#     restart: always
#     ports:
#       - "3001:3001"
#     volumes:
#       - ./data:/app/data
#     environment:
#       - NODE_EXTRA_CA_CERTS=/app/data/certs/ca.pem
#
#   caddy:
#     image: caddy:2
#     ports:
#       - "80:80"
#       - "443:443"
#     volumes:
#       - ./Caddyfile:/etc/caddy/Caddyfile
#     depends_on:
#       - uptime-kuma
# Caddyfile — Auto SSL
# status.example.com {
#     reverse_proxy uptime-kuma:3001
# }
# API — Create Monitor Programmatically
# NOTE: the REST endpoint below is hypothetical. Uptime Kuma v1 exposes a
# Socket.io API rather than an official REST API for creating monitors,
# so treat this as a sketch of the payload shape only.
# import requests
#
# KUMA_URL = "http://localhost:3001"
# TOKEN = "your-api-token"
#
# def create_monitor(name, url, monitor_type="http"):
#     response = requests.post(
#         f"{KUMA_URL}/api/monitors",
#         headers={"Authorization": f"Bearer {TOKEN}"},
#         json={
#             "name": name,
#             "url": url,
#             "type": monitor_type,
#             "interval": 60,
#             "retryInterval": 30,
#             "maxretries": 3,
#             "accepted_statuscodes": ["200-299"],
#         },
#         timeout=10,
#     )
#     response.raise_for_status()
#     return response.json()
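In practice, Uptime Kuma v1 manages monitors over a Socket.io interface rather than an official REST API, so programmatic setup commonly goes through the community `uptime-kuma-api` Python package (`pip install uptime-kuma-api`). A hedged sketch; the package's call signatures should be verified against the installed version, and the URL and credentials are placeholders:

```python
def monitor_kwargs(name, url, interval=60, retries=3):
    """Build keyword arguments for a new HTTP monitor (pure, easy to test)."""
    return {
        "name": name,
        "url": url,
        "interval": interval,   # seconds between checks
        "maxretries": retries,  # retries before the monitor is marked down
    }

def create_http_monitor(base_url, username, password, name, target_url):
    """Create one HTTP monitor on a running Uptime Kuma instance.

    Uses the community uptime-kuma-api package (a third-party Socket.io
    wrapper); not called here because it needs a live server.
    """
    from uptime_kuma_api import UptimeKumaApi, MonitorType
    with UptimeKumaApi(base_url) as api:
        api.login(username, password)
        return api.add_monitor(type=MonitorType.HTTP,
                               **monitor_kwargs(name, target_url))
```

Keeping the config builder separate from the network call makes it testable and lets the same dict drive either the wrapper or any future official API.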
from dataclasses import dataclass

@dataclass
class Monitor:
    name: str
    type: str
    target: str
    interval: int
    uptime_30d: float
    avg_ms: int
    status: str

monitors = [
    Monitor("ML API /predict", "HTTP", "https://ml.example.com/predict", 30, 99.95, 85, "Up"),
    Monitor("ML API /health", "HTTP", "https://ml.example.com/health", 60, 99.99, 12, "Up"),
    Monitor("Model Registry", "HTTP", "https://mlflow.example.com", 60, 99.90, 150, "Up"),
    Monitor("GPU Server SSH", "TCP", "gpu-01:22", 60, 99.85, 5, "Up"),
    Monitor("Training Pipeline", "Push", "Heartbeat every 1h", 3600, 98.5, 0, "Up"),
    Monitor("Inference GPU", "TCP", "gpu-02:8080", 30, 99.92, 8, "Up"),
    Monitor("Data Pipeline", "Push", "Heartbeat every 6h", 21600, 99.0, 0, "Up"),
]

print("=== Uptime Kuma Monitors ===")
for m in monitors:
    print(f"  [{m.status}] {m.name} ({m.type})")
    print(f"    Target: {m.target} | Interval: {m.interval}s")
    print(f"    Uptime: {m.uptime_30d}% | Avg: {m.avg_ms}ms")
MLOps Monitoring
# === MLOps Workflow Monitoring ===
# Push Monitor — Training Pipeline Heartbeat
# import requests
#
# PUSH_URL = "https://status.example.com/api/push/abc123"
#
# def report_training_step(epoch, loss, accuracy):
#     """Send a heartbeat after each training epoch."""
#     msg = f"Epoch {epoch}: loss={loss:.4f}, acc={accuracy:.4f}"
#     # pass msg via params= so it is URL-encoded correctly
#     requests.get(PUSH_URL, params={"status": "up", "msg": msg, "ping": ""}, timeout=10)
#
# def report_training_failed(error):
#     """Report a training failure."""
#     requests.get(PUSH_URL, params={"status": "down", "msg": str(error)}, timeout=10)
#
# # In the training loop:
# for epoch in range(100):
#     loss, acc = train_epoch(model, dataloader)
#     report_training_step(epoch, loss, acc)
#
# Keyword Monitor — Check Model Version
# Monitor URL: https://ml.example.com/health
# Expected keyword: "model_version":"v2.1"
# Alert if the model version changes unexpectedly
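The same version check can also be enforced client-side, for example as a pre-deploy gate. A minimal sketch; the `model_version` field name follows the health-response example above, and the function is an illustration, not part of Uptime Kuma:

```python
import json

def model_version_ok(health_body: str, expected: str) -> bool:
    """Return True if the /health JSON reports the expected model version.

    Mirrors Uptime Kuma's keyword monitor, which alerts when the configured
    keyword is absent from the response body.
    """
    try:
        payload = json.loads(health_body)
    except json.JSONDecodeError:
        return False  # non-JSON body counts as a failed check
    return payload.get("model_version") == expected
```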
# HTTP Monitor — Inference Latency
# Monitor: https://ml.example.com/predict
# Method: POST
# Body: {"text": "test input"}
# Expected status: 200
# Max response time: 500ms (alert if exceeded)
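The 500 ms latency rule can be probed from a client as well. A minimal sketch that times any zero-argument request callable against the budget; `probe_latency` and the example callable are illustrative assumptions:

```python
import time

def probe_latency(request_fn, threshold_ms=500):
    """Time one request and report whether it exceeded the budget.

    request_fn is any zero-argument callable, e.g.
    lambda: requests.post("https://ml.example.com/predict",
                          json={"text": "test input"}, timeout=5)
    """
    start = time.perf_counter()
    request_fn()
    elapsed_ms = (time.perf_counter() - start) * 1000
    return elapsed_ms, elapsed_ms > threshold_ms
```

Passing the request as a callable keeps the timing logic independent of any HTTP library and trivially testable.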
@dataclass
class MLWorkflow:
    stage: str
    monitor_type: str
    check: str
    alert_condition: str
    severity: str

workflows = [
    MLWorkflow("Data Ingestion", "Push", "Heartbeat every 1h", "Missing > 2h", "Warning"),
    MLWorkflow("Feature Engineering", "Push", "Heartbeat every 30m", "Missing > 1h", "Warning"),
    MLWorkflow("Model Training", "Push", "Heartbeat every epoch", "Missing > 2h", "Critical"),
    MLWorkflow("Model Evaluation", "HTTP", "Eval endpoint", "Score < threshold", "Critical"),
    MLWorkflow("Model Registry", "HTTP", "MLflow /health", "Down > 3min", "High"),
    MLWorkflow("Model Serving", "HTTP", "/predict latency", "p99 > 500ms", "Critical"),
    MLWorkflow("GPU Health", "TCP", "GPU server port", "Unreachable", "Critical"),
    MLWorkflow("Data Quality", "Push", "DQ check every 6h", "Missing > 12h", "Warning"),
]

print("\n=== MLOps Workflow Monitors ===")
for w in workflows:
    print(f"  [{w.severity}] {w.stage}")
    print(f"    Type: {w.monitor_type} | Check: {w.check}")
    print(f"    Alert: {w.alert_condition}")
Alerts and Status Page
# === Alert Configuration ===
# Telegram Alert
# Bot Token: from @BotFather
# Chat ID: from @userinfobot
# Setup in Uptime Kuma: Settings > Notifications > Telegram
# Line Notify
# Token: from https://notify-bot.line.me/
# Setup: Notifications > Line Notify > Token
# Note: LINE has announced the discontinuation of LINE Notify; verify the service is still available
# Discord Webhook
# URL: Server Settings > Integrations > Webhooks
# Setup: Notifications > Discord > Webhook URL
# Status Page
# Uptime Kuma > Status Pages > New
# Public URL: https://status.example.com
# Groups:
# - ML API Services
# - Training Pipeline
# - Infrastructure
# Custom Domain + SSL via Caddy/Nginx
@dataclass
class AlertChannel:
    channel: str
    setup: str
    speed: str
    use_case: str
    enabled: bool

channels = [
    AlertChannel("Telegram", "Bot Token + Chat ID", "< 5s", "Primary on-call", True),
    AlertChannel("Slack", "Webhook URL", "< 5s", "Team channel", True),
    AlertChannel("Discord", "Webhook URL", "< 5s", "Dev team", True),
    AlertChannel("Line Notify", "Token", "< 10s", "Thai team", True),
    AlertChannel("Email", "SMTP config", "< 60s", "Management", True),
    AlertChannel("PagerDuty", "Integration key", "< 5s", "Critical only", False),
    AlertChannel("Webhook", "Custom URL", "< 5s", "Automation", True),
]

print("Alert Channels:")
for c in channels:
    status = "ON" if c.enabled else "OFF"
    print(f"  [{status}] {c.channel}")
    print(f"    Setup: {c.setup} | Speed: {c.speed} | Use: {c.use_case}")
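Uptime Kuma attaches notification channels per monitor in its UI; when driving alerts through a custom webhook instead, a severity-based routing convention can be layered on top. The mapping below is an illustrative assumption, not a built-in Kuma feature:

```python
# Illustrative severity -> channel routing; tune to your on-call setup.
ROUTES = {
    "Critical": ["Telegram", "PagerDuty", "Slack"],
    "High": ["Telegram", "Slack"],
    "Warning": ["Slack", "Email"],
}

def channels_for(severity: str) -> list[str]:
    """Return the alert channels for a severity, falling back to Email."""
    return ROUTES.get(severity, ["Email"])
```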
# Maintenance Window
maintenance = [
    "Set a maintenance window before each deploy to prevent false alerts",
    "Use the API to schedule maintenance automatically from CI/CD",
    "Show scheduled maintenance to customers on the status page",
    "Back up /app/data/kuma.db every day",
    "Update the image monthly: docker pull louislam/uptime-kuma:1",
]

print("\nMaintenance Tips:")
for i, m in enumerate(maintenance, 1):
    print(f"  {i}. {m}")
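Scheduling maintenance from CI/CD starts with building the window payload. A minimal sketch; the `start`/`end` field names are assumptions, so check the maintenance endpoint of your Uptime Kuma version before sending it:

```python
from datetime import datetime, timedelta, timezone

def maintenance_window(minutes, start=None):
    """Build a start/end pair for a deploy-time maintenance window.

    The field names "start"/"end" are illustrative placeholders for
    whatever your Uptime Kuma version's maintenance API expects.
    """
    start = start or datetime.now(timezone.utc)
    end = start + timedelta(minutes=minutes)
    return {"start": start.isoformat(), "end": end.isoformat()}
```

A deploy job would call this just before rollout and POST the result (for example via the maintenance support in the community uptime-kuma-api wrapper).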
Tips
- Docker: install with a single docker command
- Push: use push monitors for ML pipeline heartbeats
- Keyword: check the model version in the health response
- SSL: use Caddy's automatic HTTPS for the status page
- Backup: back up kuma.db daily; it holds all configuration and history
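On the backup tip: copying kuma.db with plain cp while Uptime Kuma is writing can yield a corrupt file, whereas SQLite's online backup API produces a consistent copy. A sketch using only the Python standard library; the paths are placeholders:

```python
import sqlite3

def backup_sqlite(src_path, dest_path):
    """Copy a SQLite database using SQLite's online backup API,
    which stays consistent even if the app is writing to the file."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # pages are copied under SQLite's own locking
    finally:
        dest.close()
        src.close()

# e.g. backup_sqlite("/app/data/kuma.db", "/backups/kuma-snapshot.db")
```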
What is Uptime Kuma?
Uptime Kuma is a free, open-source, self-hosted monitoring tool. It runs in Docker, supports HTTP, TCP, DNS, and push (heartbeat) monitors, ships a public status page, and offers 20+ alert channels including Telegram, Slack, Discord, and LINE.
How do you use Uptime Kuma with MLOps?
Use HTTP monitors for ML API health endpoints, push monitors for training-pipeline heartbeats, keyword monitors to verify the served model version, TCP monitors for GPU servers, and latency thresholds to alert on slow inference.
How do you install Uptime Kuma?
Run docker run louislam/uptime-kuma:1 with port 3001 exposed and create the admin account on first login. For production, use Docker Compose behind Caddy or Nginx with Let's Encrypt SSL, and automate backups.
What alert integrations are available?
Telegram, Discord, Slack, Line Notify, Email, PagerDuty, Opsgenie, Microsoft Teams, Pushover, Gotify, ntfy, and custom webhooks: more than 20 channels in total, assignable per monitor with retry and escalation settings.
Summary
Uptime Kuma covers MLOps monitoring end to end: push heartbeats for pipelines, HTTP and TCP checks for services and GPU servers, alerts to Telegram, Slack, Discord, and Line, and a public status page, all deployable in production with Docker and Caddy SSL.
