SiamCafe.net Blog
Cybersecurity

Uptime Kuma: Monitoring and Alerting with a Free Self-Hosted Uptime Monitor

2025-12-27 · อ. บอม — SiamCafe.net · 1,488 words

What is Uptime Kuma?

Uptime Kuma is an open-source, self-hosted monitoring tool for tracking the uptime of websites, APIs, DNS, TCP ports, and other services. It has a clean, modern UI, is easy to set up, supports notifications through many channels, and is built with Node.js and Vue.js.

Key features: Monitor types covering many protocols (HTTP, HTTPS, TCP, Ping, DNS, Docker, gRPC, MQTT); Status Pages for publishing a public status page to your users; Notifications with 90+ integrations (Telegram, Slack, Discord, Email, LINE, Webhook); a Dashboard showing uptime %, response time, and certificate expiry; and Maintenance Windows to suppress alerts during planned maintenance.

Why choose Uptime Kuma: it is open-source with no license cost; it is self-hosted, so your data stays on your own server; it is far simpler than Prometheus+Grafana for plain uptime monitoring; it installs with a single Docker command; and it has an active community with regular updates.

Installing and Configuring Uptime Kuma

Steps to install Uptime Kuma:

# === Uptime Kuma Installation ===

# 1. Docker (recommended)
docker run -d \
  --name uptime-kuma \
  --restart=always \
  -p 3001:3001 \
  -v uptime-kuma:/app/data \
  louislam/uptime-kuma:1

# 2. Docker Compose (Production)
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    container_name: uptime-kuma
    restart: always
    ports:
      - "3001:3001"
    volumes:
      - uptime-kuma-data:/app/data
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      - NODE_ENV=production
    healthcheck:
      test: ["CMD", "node", "extra/healthcheck.js"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  uptime-kuma-data:
    driver: local
EOF

docker compose up -d

# 3. Kubernetes Deployment
cat > uptime-kuma-k8s.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: uptime-kuma
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: uptime-kuma
  template:
    metadata:
      labels:
        app: uptime-kuma
    spec:
      containers:
        - name: uptime-kuma
          image: louislam/uptime-kuma:1
          ports:
            - containerPort: 3001
          volumeMounts:
            - name: data
              mountPath: /app/data
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /
              port: 3001
            initialDelaySeconds: 30
            periodSeconds: 30
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: uptime-kuma-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: uptime-kuma
  namespace: monitoring
spec:
  selector:
    app: uptime-kuma
  ports:
    - port: 3001
      targetPort: 3001
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: uptime-kuma
  namespace: monitoring
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
spec:
  tls:
    - hosts: ["status.example.com"]
      secretName: uptime-kuma-tls
  rules:
    - host: status.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: uptime-kuma
                port:
                  number: 3001
EOF

kubectl apply -f uptime-kuma-k8s.yaml

# 4. Reverse Proxy (Nginx)
cat > /etc/nginx/sites-available/uptime-kuma << 'NGINX'
server {
    listen 443 ssl http2;
    server_name status.example.com;
    
    ssl_certificate /etc/letsencrypt/live/status.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/status.example.com/privkey.pem;
    
    location / {
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
NGINX

echo "Uptime Kuma installed"
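After installation, Uptime Kuma's TCP monitor type performs essentially a connect-and-time probe. The sketch below illustrates that idea in Python (this is an illustrative re-implementation, not Uptime Kuma's actual code; host and port are example values):

```python
#!/usr/bin/env python3
# tcp_check.py - minimal sketch of the kind of probe a "tcp" monitor performs
import socket
import time

def tcp_check(host: str, port: int, timeout: float = 5.0):
    """Return (is_up, latency_ms): open a TCP connection and time it."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, (time.monotonic() - start) * 1000
    except OSError:
        return False, None

if __name__ == "__main__":
    # Example: probe the Uptime Kuma port itself on localhost
    up, latency = tcp_check("localhost", 3001)
    print(f"up={up}" + (f" latency={latency:.1f}ms" if up else ""))
```

The same connect-refused/timeout distinction is what separates a DOWN alert from a slow response in the dashboard.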

Setting Up Monitors and Alerting

Create monitors for the services you run:

#!/usr/bin/env python3
# uptime_kuma_api.py - Uptime Kuma monitor and notification configuration reference
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("kuma")

class UptimeKumaManager:
    """Manage Uptime Kuma monitors via API"""
    
    def __init__(self, base_url="http://localhost:3001"):
        self.base_url = base_url
    
    def monitor_configs(self):
        """Common monitor configurations"""
        return {
            "website_monitors": [
                {
                    "name": "Main Website",
                    "type": "http",
                    "url": "https://www.example.com",
                    "interval": 60,
                    "retryInterval": 30,
                    "maxretries": 3,
                    "accepted_statuscodes": ["200-299"],
                    "keyword": "Welcome",
                    "description": "Monitor main website availability",
                },
                {
                    "name": "API Health",
                    "type": "http",
                    "url": "https://api.example.com/health",
                    "interval": 30,
                    "method": "GET",
                    "accepted_statuscodes": ["200"],
                    "headers": '{"Authorization": "Bearer health-check-token"}',
                    "body": None,
                    "maxredirects": 3,
                },
                {
                    "name": "Admin Panel",
                    "type": "http",
                    "url": "https://admin.example.com",
                    "interval": 120,
                    "accepted_statuscodes": ["200-399"],
                    "keyword": "Dashboard",
                },
            ],
            "infrastructure_monitors": [
                {
                    "name": "Database (PostgreSQL)",
                    "type": "tcp",
                    "hostname": "db.internal",
                    "port": 5432,
                    "interval": 30,
                },
                {
                    "name": "Redis Cache",
                    "type": "tcp",
                    "hostname": "redis.internal",
                    "port": 6379,
                    "interval": 30,
                },
                {
                    "name": "DNS Resolution",
                    "type": "dns",
                    "hostname": "example.com",
                    "dns_resolve_server": "8.8.8.8",
                    "port": 53,
                    "interval": 300,
                },
                {
                    "name": "Mail Server",
                    "type": "tcp",
                    "hostname": "mail.example.com",
                    "port": 587,
                    "interval": 120,
                },
            ],
            "ssl_monitors": [
                {
                    "name": "SSL Certificate - Main",
                    "type": "http",
                    "url": "https://www.example.com",
                    "interval": 3600,
                    "expiryNotification": True,
                    "description": "Alert 30 days before SSL expiry",
                },
            ],
            "docker_monitors": [
                {
                    "name": "Nginx Container",
                    "type": "docker",
                    "docker_container": "nginx",
                    "docker_host": "/var/run/docker.sock",
                    "interval": 30,
                },
                {
                    "name": "App Container",
                    "type": "docker",
                    "docker_container": "app",
                    "docker_host": "/var/run/docker.sock",
                    "interval": 30,
                },
            ],
        }
    
    def notification_configs(self):
        """Notification channel configurations"""
        return {
            "telegram": {
                "type": "telegram",
                "config": {
                    "telegramBotToken": "",
                    "telegramChatID": "",
                },
            },
            "slack": {
                "type": "slack",
                "config": {
                    "slackwebhookURL": "",
                    "slackchannelnotify": True,
                },
            },
            "line_notify": {
                "type": "line",
                "config": {
                    "lineNotifyAccessToken": "",
                },
            },
            "discord": {
                "type": "discord",
                "config": {
                    "discordWebhookUrl": "",
                },
            },
            "email_smtp": {
                "type": "smtp",
                "config": {
                    "smtpHost": "smtp.gmail.com",
                    "smtpPort": 587,
                    "smtpSecure": False,
                    "smtpUsername": "",
                    "smtpPassword": "",
                    "smtpFrom": "monitor@example.com",
                    "smtpTo": "team@example.com",
                },
            },
        }

manager = UptimeKumaManager()
configs = manager.monitor_configs()
print("Monitor Configurations:")
for category, monitors in configs.items():
    print(f"\n  {category}:")
    for m in monitors:
        print(f"    {m['name']} ({m['type']}): interval={m['interval']}s")

notifs = manager.notification_configs()
print(f"\nNotification Channels:")
for name, config in notifs.items():
    print(f"  {name}: {config['type']}")
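Before adding many monitors, it can help to estimate the aggregate probe rate the intervals above would generate. This is a rough sketch only: real load also depends on timeouts, retries, and keyword checks.

```python
# Estimate aggregate check load from monitor intervals (rough sketch)
def checks_per_minute(intervals_s):
    """Each monitor fires 60/interval checks per minute on average."""
    return sum(60.0 / i for i in intervals_s)

# Intervals taken from the example configs above
intervals = [60, 30, 120, 30, 30, 300, 120, 3600, 30, 30]
rate = checks_per_minute(intervals)
print(f"{rate:.1f} checks/minute")  # → 12.2 checks/minute
```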

Advanced Monitoring Patterns

Patterns for more advanced monitoring setups:

# === Advanced Monitoring ===

# 1. Status Page Configuration
cat > status_page.yaml << 'EOF'
status_pages:
  public:
    title: "System Status"
    slug: "status"
    description: "Real-time status of our services"
    theme: "auto"
    published: true
    groups:
      - name: "Core Services"
        monitors:
          - "Main Website"
          - "API Health"
          - "Admin Panel"
      - name: "Infrastructure"
        monitors:
          - "Database (PostgreSQL)"
          - "Redis Cache"
          - "DNS Resolution"
      - name: "Communication"
        monitors:
          - "Mail Server"
          - "LINE Bot"
    
  internal:
    title: "Internal System Status"
    slug: "internal-status"
    published: false
    groups:
      - name: "Docker Containers"
        monitors:
          - "Nginx Container"
          - "App Container"
          - "DB Container"
      - name: "Background Jobs"
        monitors:
          - "Cron Job Health"
          - "Queue Worker"

maintenance:
  scheduled:
    - title: "Database Migration"
      start: "2024-06-20T02:00:00+07:00"
      end: "2024-06-20T04:00:00+07:00"
      monitors: ["Database (PostgreSQL)"]
      strategy: "manual"
    - title: "Monthly Server Update"
      cron: "0 2 1 * *"
      duration: 120
      monitors: ["all"]
      strategy: "recurring"
EOF

# 2. Health check endpoint for complex checks
cat > healthcheck.py << 'PYTHON'
#!/usr/bin/env python3
"""Advanced health check endpoint"""
from http.server import HTTPServer, BaseHTTPRequestHandler
import json
import logging

logging.basicConfig(level=logging.INFO)

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            checks = self.run_checks()
            status = 200 if all(c["status"] == "ok" for c in checks.values()) else 503
            
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(json.dumps({
                "status": "healthy" if status == 200 else "unhealthy",
                "checks": checks,
            }).encode())
        else:
            self.send_response(404)
            self.end_headers()
    
    def run_checks(self):
        return {
            "database": {"status": "ok", "latency_ms": 5},
            "redis": {"status": "ok", "latency_ms": 1},
            "disk": {"status": "ok", "usage_pct": 45},
            "memory": {"status": "ok", "usage_pct": 62},
        }

if __name__ == "__main__":
    print("Health check server on :8080/health")
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
PYTHON

echo "Advanced monitoring configured"
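On the Uptime Kuma side, the /health endpoint above is just an HTTP monitor with accepted status 200; the aggregation decision lives in the endpoint. That decision can be expressed as a small pure function mirroring the handler's logic (a sketch; the payload values are hypothetical):

```python
def overall_status(checks: dict) -> tuple:
    """Collapse per-dependency checks into one HTTP status, as the
    /health handler above does: 200 only if every check reports ok."""
    healthy = all(c.get("status") == "ok" for c in checks.values())
    return (200, "healthy") if healthy else (503, "unhealthy")

# Hypothetical payload: one dependency failing takes the whole endpoint down
payload = {
    "database": {"status": "ok", "latency_ms": 5},
    "redis": {"status": "fail", "error": "connection refused"},
}
code, label = overall_status(payload)
print(code, label)  # → 503 unhealthy
```

Keeping the decision in one pure function makes the endpoint easy to unit-test separately from the HTTP server.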

Notification Channels

Set up notification channels and an escalation policy:

#!/usr/bin/env python3
# notification_setup.py - notification channel and escalation setup
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("notify")

class NotificationManager:
    def __init__(self):
        pass
    
    def escalation_policy(self):
        return {
            "levels": {
                "level_1": {
                    "trigger": "Service down for 1 minute",
                    "channels": ["Telegram (on-call)", "Slack (#alerts)"],
                    "wait_before_escalate": "5 minutes",
                },
                "level_2": {
                    "trigger": "Service down for 5 minutes (no ack)",
                    "channels": ["Phone call (on-call)", "Email (team lead)"],
                    "wait_before_escalate": "15 minutes",
                },
                "level_3": {
                    "trigger": "Service down for 20 minutes (no resolution)",
                    "channels": ["Phone (manager)", "Email (all-engineering)", "SMS"],
                    "action": "Activate incident response",
                },
            },
            "on_call_schedule": {
                "rotation": "weekly",
                "team": ["dev_a", "dev_b", "dev_c", "dev_d"],
                "current": "dev_a",
                "backup": "dev_b",
            },
        }
    
    def alert_templates(self):
        return {
            "down": {
                "title": "[DOWN] {monitor_name}",
                "body": "Service: {monitor_name}\nStatus: DOWN\nTime: {time}\nDuration: {duration}\nURL: {url}\nError: {error}",
            },
            "up": {
                "title": "[UP] {monitor_name}",
                "body": "Service: {monitor_name}\nStatus: UP (recovered)\nDowntime: {duration}\nResponse: {response_time}ms",
            },
            "ssl_expiry": {
                "title": "[SSL] Certificate expiring soon",
                "body": "Domain: {domain}\nExpires: {expiry_date}\nDays left: {days_left}\nAction: Renew certificate",
            },
        }
    
    def best_practices(self):
        return {
            "alert_fatigue": [
                "Require 3 retries before alerting (avoids flapping)",
                "Group related alerts together",
                "Use severity levels (critical/warning/info)",
                "Mute alerts during maintenance windows",
                "Review and tune thresholds regularly",
            ],
            "monitoring_coverage": [
                "Monitor all user-facing endpoints",
                "Monitor database connectivity",
                "Monitor SSL certificate expiry (30 days ahead)",
                "Monitor disk space (alert above 80%)",
                "Monitor DNS resolution",
            ],
        }

manager = NotificationManager()
policy = manager.escalation_policy()
print("Escalation Policy:")
for level, info in policy["levels"].items():
    print(f"\n  {level}: {info['trigger']}")
    print(f"    Channels: {', '.join(info['channels'])}")

templates = manager.alert_templates()
print(f"\nAlert Templates:")
for name, tmpl in templates.items():
    print(f"  {name}: {tmpl['title']}")

bp = manager.best_practices()
print(f"\nBest Practices (Alert Fatigue):")
for tip in bp["alert_fatigue"][:3]:
    print(f"  - {tip}")
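The alert_templates above use str.format-style placeholders, so rendering one for a concrete incident is a one-liner. A sketch (the incident values are hypothetical):

```python
# Render an alert template with str.format (incident values are hypothetical)
template = {
    "title": "[DOWN] {monitor_name}",
    "body": "Service: {monitor_name}\nStatus: DOWN\nTime: {time}\nDuration: {duration}",
}
incident = {
    "monitor_name": "API Health",
    "time": "2025-12-27 10:15:00+07:00",
    "duration": "2m 30s",
}
print(template["title"].format(**incident))  # → [DOWN] API Health
print(template["body"].format(**incident))
```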

High Availability and Backup

Set up HA and backups for Uptime Kuma:

# === HA and Backup ===

# 1. Backup script
cat > backup_kuma.sh << 'BASH'
#!/bin/bash
# Backup Uptime Kuma data
BACKUP_DIR="/backup/uptime-kuma"
DATE=$(date +%Y%m%d_%H%M%S)
CONTAINER="uptime-kuma"

mkdir -p "$BACKUP_DIR"

# Stop container briefly for consistent backup
docker stop $CONTAINER

# Copy SQLite database
docker cp $CONTAINER:/app/data/kuma.db "$BACKUP_DIR/kuma_${DATE}.db"

# Start container
docker start $CONTAINER

# Compress
gzip "$BACKUP_DIR/kuma_${DATE}.db"

# Keep only last 30 backups
ls -t "$BACKUP_DIR"/kuma_*.db.gz | tail -n +31 | xargs rm -f 2>/dev/null

echo "Backup complete: kuma_${DATE}.db.gz"
echo "Size: $(du -h "$BACKUP_DIR/kuma_${DATE}.db.gz" | cut -f1)"
BASH

# 2. Restore script
cat > restore_kuma.sh << 'BASH'
#!/bin/bash
# Restore Uptime Kuma from backup
BACKUP_FILE=$1

if [ -z "$BACKUP_FILE" ]; then
    echo "Usage: $0 <backup_file[.gz]>"
    exit 1
fi

CONTAINER="uptime-kuma"

# Decompress if needed
if [[ "$BACKUP_FILE" == *.gz ]]; then
    gunzip -k "$BACKUP_FILE"
    BACKUP_FILE="${BACKUP_FILE%.gz}"
fi

# Stop container
docker stop $CONTAINER

# Restore database
docker cp "$BACKUP_FILE" $CONTAINER:/app/data/kuma.db

# Start container
docker start $CONTAINER

echo "Restore complete from: $BACKUP_FILE"
BASH

# 3. Cron job for daily backup
cat > /etc/cron.d/uptime-kuma-backup << 'CRON'
# Daily backup at 3 AM
0 3 * * * root /opt/scripts/backup_kuma.sh >> /var/log/kuma-backup.log 2>&1
CRON

# 4. HA with external monitoring
cat > ha_setup.yaml << 'EOF'
high_availability:
  strategy: "Active-Passive with external check"
  
  primary:
    url: "https://status.example.com"
    location: "Server A (Bangkok)"
    
  external_checks:
    description: "Use an external service to monitor Uptime Kuma itself"
    services:
      - name: "UptimeRobot (free)"
        url: "https://uptimerobot.com"
        monitors: 50
        interval: "5 min"
        cost: "Free"
      - name: "Hetrix Tools (free)"
        url: "https://hetrixtools.com"
        monitors: 15
        interval: "1 min"
        cost: "Free"
    
  backup_plan:
    daily_backup: "SQLite DB backup via cron"
    retention: "30 days"
    recovery_time: "< 15 minutes"
    recovery_steps:
      - "Deploy new Uptime Kuma instance"
      - "Restore SQLite backup"
      - "Update DNS to new instance"
      - "Verify monitors are running"
EOF

chmod +x backup_kuma.sh restore_kuma.sh
echo "HA and backup configured"
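A backup is only useful if it restores cleanly, so it is worth verifying that the copied SQLite file opens and passes an integrity check. A minimal sketch (the path in __main__ is hypothetical; point it at a file produced by backup_kuma.sh after gunzip):

```python
#!/usr/bin/env python3
# verify_backup.py - verify an Uptime Kuma SQLite backup is readable
import sqlite3

def verify_sqlite_backup(path: str) -> bool:
    """Open the database read-only and run PRAGMA integrity_check."""
    try:
        conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
        row = conn.execute("PRAGMA integrity_check").fetchone()
        conn.close()
        return row is not None and row[0] == "ok"
    except sqlite3.Error:
        return False

if __name__ == "__main__":
    # Hypothetical path to a decompressed backup
    ok = verify_sqlite_backup("/backup/uptime-kuma/kuma_latest.db")
    print("backup OK" if ok else "backup CORRUPT or unreadable")
```

Running this as a post-backup step in the cron job would catch a truncated copy long before a restore is needed.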

FAQ: Frequently Asked Questions

Q: How do Uptime Kuma and Prometheus+Grafana differ, and which should I use?

A: Uptime Kuma is an uptime monitoring tool focused on service up/down, response time, and SSL expiry, with a friendly UI and very easy setup (a single Docker command), which makes it a good fit for SMBs and startups that just need basic monitoring. Prometheus+Grafana is a full observability stack: it collects detailed metrics (CPU, memory, custom metrics), has powerful alerting and fully customizable dashboards, but setup is much more involved (configuring exporters, learning PromQL, and so on). Many teams use both: Uptime Kuma for uptime monitoring (website/API up/down) and Prometheus+Grafana for infrastructure monitoring (CPU, memory, disk). They complement each other well.

Q: How many monitors can Uptime Kuma handle?

A: There is no hard limit; it depends on hardware. A small server (1 CPU, 512MB RAM) handles roughly 100-200 monitors; a larger server (2 CPU, 2GB RAM) around 500-1000. The check interval matters a lot: a 60s interval is far lighter than 10s. Tips: match intervals to importance (critical services 30s, standard services 60s, non-critical 300s); remove unnecessary monitors (you do not need to monitor internal services that already have their own health checks); and use keyword checks only where needed, since they add overhead.

Q: How do I set up LINE Notify?

A: Previously, you would go to notify-bot.line.me, log in with your LINE account, generate a token for a 1-on-1 chat or a group, and copy it. Then in Uptime Kuma go to Settings → Notifications → Add Notification, choose "LINE Notify", paste the Access Token, give it a name such as "LINE Alert", and press Test; if a message arrives, it works, and you can assign the notification to your monitors. Note, however, that LINE Notify was discontinued on March 31, 2025, so you should migrate to the LINE Messaging API via a LINE Official Account, or switch to an alternative such as a Telegram Bot (free and easy) or a Discord Webhook.

Q: Which monitors should a beginner start with?

A: A practical starting set: an HTTP monitor on your main website URL (interval 60s, a keyword that should appear on the page, accepted status 200); an API monitor on the /health endpoint at interval 30s; an SSL certificate monitor on the HTTPS endpoint with alerts 30 days before expiry; a DNS monitor for the domain name against DNS server 8.8.8.8 at interval 300s; and, if the database is reachable, a TCP monitor on port 3306 (MySQL) or 5432 (PostgreSQL). For notifications: LINE Notify or Telegram for immediate alerts, Email for daily/weekly summaries, and Slack/Discord for the team channel. Also create a public status page for your users, e.g. status.example.co.th.
