
Uptime Kuma Monitoring Real-Time Processing — Real-Time Status Monitoring

2025-11-12 · Ajarn Bom — SiamCafe.net · 1,679 words

What is Uptime Kuma?

Uptime Kuma is an open-source, self-hosted monitoring tool for checking the status of websites, APIs, databases, and other services. It is built with Node.js, ships a clean and easy-to-use web UI, and supports notifications over many channels, such as LINE, Telegram, Discord, Slack, and Email.

Key strengths of Uptime Kuma:
- Self-hosted: your data stays on your own machine, with no third-party dependency
- Clean, responsive UI that works on any device
- Many monitor types: HTTP(s), TCP, Ping, DNS, Docker, Push, Steam Game Server
- 90+ notification services
- Status pages for publishing service status to your users
- Maintenance windows to suppress alerts during scheduled maintenance

Real-time processing, in a monitoring context, means processing monitoring data as it arrives, so you can detect anomalies, trigger alerts, and automate responses the moment a problem occurs instead of waiting for batch processing.

Installing Uptime Kuma

Setting up Uptime Kuma

# === Uptime Kuma Installation ===

# 1. Docker Installation (Recommended)
docker run -d \
  --name uptime-kuma \
  --restart=always \
  -p 3001:3001 \
  -v uptime-kuma:/app/data \
  louislam/uptime-kuma:1

# 2. Docker Compose
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    container_name: uptime-kuma
    restart: always
    ports:
      - "3001:3001"
    volumes:
      - uptime-kuma-data:/app/data
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      - TZ=Asia/Bangkok

volumes:
  uptime-kuma-data:
EOF

docker-compose up -d

# 3. Access Web UI
# URL: http://your-server:3001
# Create admin account on first visit

# 4. Nginx Reverse Proxy with SSL
cat > /etc/nginx/sites-available/uptime-kuma << 'NGINX'
server {
    listen 443 ssl http2;
    server_name status.example.com;

    ssl_certificate /etc/letsencrypt/live/status.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/status.example.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

server {
    listen 80;
    server_name status.example.com;
    return 301 https://$server_name$request_uri;
}
NGINX

sudo ln -s /etc/nginx/sites-available/uptime-kuma /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

# 5. Node.js Installation (Alternative)
# ===================================
# git clone https://github.com/louislam/uptime-kuma.git
# cd uptime-kuma
# npm run setup
# npm run start-server
# Or use PM2:
# pm2 start server/server.js --name uptime-kuma

echo "Uptime Kuma installed"

Configuring Monitors and Notifications

Configuring monitors and alerts

# === Monitor Configuration ===

# 1. HTTP(s) Monitor
# ===================================
# Settings > Add New Monitor
# Monitor Type: HTTP(s)
# Friendly Name: Production Website
# URL: https://example.com
# Heartbeat Interval: 60 seconds
# Retries: 3
# Heartbeat Retry Interval: 20 seconds
# Request Timeout: 30 seconds
# HTTP Method: GET
# Accepted Status Codes: 200-299
# Max Redirects: 10
#
# Advanced:
# - Enable certificate expiry notification (30 days)
# - Enable keyword monitoring (check for specific text)
# - Custom headers if needed (Authorization, etc.)

# 2. API Health Check
# ===================================
# Monitor Type: HTTP(s) - Keyword
# URL: https://api.example.com/health
# Keyword: "status":"ok"
# Method: GET
# Expected Status: 200
# Interval: 30 seconds

# 3. TCP Port Monitor
# ===================================
# Monitor Type: TCP Port
# Hostname: db.example.com
# Port: 3306 (MySQL)
# Interval: 60 seconds

# 4. Docker Container Monitor
# ===================================
# Monitor Type: Docker Container
# Container Name: nginx
# Docker Host: /var/run/docker.sock
# Interval: 30 seconds

# 5. DNS Monitor
# ===================================
# Monitor Type: DNS
# Hostname: example.com
# DNS Server: 8.8.8.8
# Expected Type: A
# Expected Value: 1.2.3.4

# 6. Notification Setup
# ===================================
# LINE Notify:
# Type: LINE Notify
# Token: (get from https://notify-bot.line.me/)
#
# Telegram:
# Type: Telegram
# Bot Token: (from @BotFather)
# Chat ID: (your chat/group ID)
#
# Discord:
# Type: Discord
# Webhook URL: (from Discord channel settings)
#
# Slack:
# Type: Slack
# Webhook URL: (from Slack app settings)
#
# Email (SMTP):
# Type: SMTP
# Hostname: smtp.gmail.com
# Port: 587
# Security: STARTTLS
# Username: your@gmail.com
# Password: app-specific-password

echo "Monitors configured"
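Besides the pull-style monitors above, Uptime Kuma also supports Push monitors, where your own job calls a unique URL (`/api/push/<token>?status=...&msg=...&ping=...`) on each run. A minimal sketch — the base URL and `YOUR_PUSH_TOKEN` are placeholders; copy the real push URL from the monitor's settings page:

```python
#!/usr/bin/env python3
# push_heartbeat.py — send a heartbeat to an Uptime Kuma Push monitor.
# KUMA_URL and TOKEN are placeholders; use the push URL shown in the
# monitor's settings (Monitor Type: Push).
import urllib.parse
import urllib.request

KUMA_URL = "https://status.example.com"  # assumed Uptime Kuma base URL
TOKEN = "YOUR_PUSH_TOKEN"                # from the Push monitor settings

def build_push_url(base, token, status="up", msg="OK", ping_ms=None):
    """Build the push endpoint URL: /api/push/<token>?status=...&msg=...&ping=..."""
    params = {"status": status, "msg": msg}
    if ping_ms is not None:
        params["ping"] = str(ping_ms)
    return f"{base}/api/push/{token}?" + urllib.parse.urlencode(params)

def send_heartbeat(status="up", msg="OK", ping_ms=None):
    """Fire the heartbeat; Uptime Kuma replies 200 on success."""
    url = build_push_url(KUMA_URL, TOKEN, status, msg, ping_ms)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status

if __name__ == "__main__":
    # Print the URL instead of calling it, so this runs without a server.
    print(build_push_url(KUMA_URL, TOKEN, ping_ms=120))
```

Call `send_heartbeat()` at the end of a cron job or backup script; if the heartbeat stops arriving within the monitor's interval, Uptime Kuma marks it down.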

Real-Time Processing Pipeline

Building a real-time monitoring pipeline

#!/usr/bin/env python3
# realtime_monitor.py — Real-Time Monitoring Pipeline
import json
import logging
from datetime import datetime

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("realtime")

class RealTimeMonitorPipeline:
    def __init__(self):
        self.monitors = {}
        self.alerts = []
        self.metrics_buffer = []
    
    def ingest_heartbeat(self, monitor_id, data):
        """Process incoming heartbeat data in real-time"""
        processed = {
            "monitor_id": monitor_id,
            "timestamp": datetime.utcnow().isoformat(),
            "status": data.get("status"),  # up/down/pending
            "response_time_ms": data.get("response_time"),
            "status_code": data.get("status_code"),
            "message": data.get("message", ""),
        }
        
        self.metrics_buffer.append(processed)
        
        # Check for anomalies
        anomalies = self._detect_anomalies(monitor_id, processed)
        if anomalies:
            self._trigger_alert(monitor_id, anomalies)
        
        return processed
    
    def _detect_anomalies(self, monitor_id, current):
        """Real-time anomaly detection"""
        anomalies = []
        
        # Rule 1: Service down
        if current["status"] == "down":
            anomalies.append({
                "type": "service_down",
                "severity": "critical",
                "message": f"Monitor {monitor_id} is DOWN: {current['message']}",
            })
        
        # Rule 2: High response time
        if current["response_time_ms"] and current["response_time_ms"] > 5000:
            anomalies.append({
                "type": "high_latency",
                "severity": "warning",
                "message": f"Response time {current['response_time_ms']}ms exceeds threshold",
            })
        
        # Rule 3: Server error (5xx) status code
        if current["status_code"] and current["status_code"] >= 500:
            anomalies.append({
                "type": "server_error",
                "severity": "high",
                "message": f"HTTP {current['status_code']} error",
            })
        
        # Rule 4: Response time trend (increasing)
        recent = [m for m in self.metrics_buffer[-10:] if m["monitor_id"] == monitor_id]
        if len(recent) >= 5:
            times = [m["response_time_ms"] for m in recent if m["response_time_ms"]]
            if len(times) >= 5 and all(times[i] < times[i+1] for i in range(len(times)-1)):
                anomalies.append({
                    "type": "degrading_performance",
                    "severity": "warning",
                    "message": "Response time consistently increasing",
                })
        
        return anomalies
    
    def _trigger_alert(self, monitor_id, anomalies):
        """Trigger alerts for detected anomalies"""
        for anomaly in anomalies:
            alert = {
                "monitor_id": monitor_id,
                "timestamp": datetime.utcnow().isoformat(),
                **anomaly,
            }
            self.alerts.append(alert)
            logger.warning(f"ALERT: {alert['severity']} - {alert['message']}")
    
    def get_dashboard(self):
        """Get real-time dashboard data"""
        total = len(self.metrics_buffer)
        up_count = sum(1 for m in self.metrics_buffer if m["status"] == "up")
        
        return {
            "total_heartbeats": total,
            "uptime_pct": round(up_count / max(total, 1) * 100, 2),
            "active_alerts": len([a for a in self.alerts if a["severity"] == "critical"]),
            "avg_response_ms": round(
                sum(m["response_time_ms"] for m in self.metrics_buffer if m["response_time_ms"]) / 
                max(sum(1 for m in self.metrics_buffer if m["response_time_ms"]), 1), 0
            ),
        }

pipeline = RealTimeMonitorPipeline()

# Simulate heartbeats
pipeline.ingest_heartbeat("web-prod", {"status": "up", "response_time": 150, "status_code": 200})
pipeline.ingest_heartbeat("web-prod", {"status": "up", "response_time": 180, "status_code": 200})
pipeline.ingest_heartbeat("api-prod", {"status": "down", "response_time": None, "status_code": 503, "message": "Connection refused"})
pipeline.ingest_heartbeat("web-prod", {"status": "up", "response_time": 6000, "status_code": 200})

dashboard = pipeline.get_dashboard()
print("Dashboard:", json.dumps(dashboard, indent=2))
print("Alerts:", json.dumps(pipeline.alerts, indent=2))

Advanced Monitoring Strategies

Advanced monitoring strategies

# === Advanced Monitoring ===

# 1. Multi-Location Monitoring
# ===================================
# Deploy Uptime Kuma at multiple locations:
# - Bangkok (primary)
# - Singapore
# - Tokyo
# - US West
#
# Compare results:
# - If DOWN from all locations → actual outage
# - If DOWN from 1 location → network issue at that location
# - Monitor from user's perspective (different ISPs)

# 2. Synthetic Monitoring
# ===================================
# Simulate real user workflows:
# Step 1: GET /login → expect 200
# Step 2: POST /api/auth → expect 200 + token
# Step 3: GET /api/dashboard → expect 200 + data
# Step 4: POST /api/logout → expect 200
#
# Use HTTP(s) monitors with chained checks
# Or use Push monitors with custom scripts

# 3. Status Page Configuration
# ===================================
# Uptime Kuma > Status Pages > Add
# Title: "Service Status"
# Slug: "status"
# Theme: Auto
# Published: Yes
#
# Groups:
# - Core Services: Website, API, Database
# - Infrastructure: CDN, DNS, Email
# - Third-party: Payment Gateway, SMS
#
# Custom domain: status.example.com
# Show history: 90 days

# 4. Maintenance Windows
# ===================================
# Settings > Maintenance > Add
# Title: "Database Maintenance"
# Strategy: Manual / Recurring
# Recurring: Every Sunday 02:00-04:00
# Affected monitors: db-primary, db-replica
# Status page message: "Scheduled database maintenance"

# 5. Monitor Groups
# ===================================
# Organize monitors by:
# - Environment: Production, Staging, Development
# - Type: Frontend, Backend, Database, Infrastructure
# - Priority: Critical, High, Medium, Low
# - Team: DevOps, Backend, Frontend

# 6. Response Time Thresholds
# ===================================
# Web pages: < 2 seconds (warning), < 5 seconds (critical)
# APIs: < 500ms (warning), < 2 seconds (critical)
# Database: < 100ms (warning), < 500ms (critical)
# DNS: < 50ms (warning), < 200ms (critical)

echo "Advanced monitoring configured"
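The synthetic-monitoring idea above — a chained login → dashboard workflow reported through a Push monitor — can be sketched as follows. The step names mirror the example workflow; the aggregation rules (fail at the first failing step, sum latencies) are illustrative assumptions, not Uptime Kuma behavior:

```python
#!/usr/bin/env python3
# synthetic_result.py — aggregate per-step results of a synthetic workflow
# check into a single up/down verdict that a Push monitor can report.
# Step names and latencies below are hypothetical examples.

def aggregate(steps):
    """steps: list of dicts {name, ok (bool), ms (float)} in execution order.
    The workflow fails at the first failing step; latency is summed so far."""
    total_ms = 0.0
    for step in steps:
        total_ms += step["ms"]
        if not step["ok"]:
            return {"status": "down",
                    "msg": f"failed at step: {step['name']}",
                    "ping": round(total_ms)}
    return {"status": "up",
            "msg": f"{len(steps)} steps passed",
            "ping": round(total_ms)}

if __name__ == "__main__":
    result = aggregate([
        {"name": "GET /login", "ok": True, "ms": 120.0},
        {"name": "POST /api/auth", "ok": True, "ms": 95.5},
        {"name": "GET /api/dashboard", "ok": False, "ms": 2100.0},
    ])
    print(result)
```

The resulting `status`, `msg`, and `ping` values map directly onto the Push monitor's query parameters, so one synthetic run becomes one heartbeat.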

Integration and Automation

Integrating Uptime Kuma with other tools

#!/usr/bin/env python3
# kuma_integration.py — Uptime Kuma Integrations
import json
import logging
from datetime import datetime

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("integration")

class KumaIntegration:
    def __init__(self, kuma_url, api_key=None):
        self.kuma_url = kuma_url
        self.api_key = api_key
    
    def webhook_handler(self, payload):
        """Handle Uptime Kuma webhook notifications"""
        event_type = payload.get("type", "unknown")
        monitor = payload.get("monitor", {})
        heartbeat = payload.get("heartbeat", {})
        
        result = {
            "event": event_type,
            "monitor_name": monitor.get("name"),
            "monitor_url": monitor.get("url"),
            "status": heartbeat.get("status"),
            "response_time": heartbeat.get("ping"),
            "message": heartbeat.get("msg"),
            "timestamp": datetime.utcnow().isoformat(),
        }
        
        # Route to appropriate handler
        if event_type == "down":
            self._handle_down(result)
        elif event_type == "up":
            self._handle_recovery(result)
        
        return result
    
    def _handle_down(self, event):
        """Handle service down event"""
        actions = [
            f"Send PagerDuty alert for {event['monitor_name']}",
            f"Create Jira incident ticket",
            f"Post to #incidents Slack channel",
            f"Start auto-remediation runbook",
        ]
        logger.critical(f"SERVICE DOWN: {event['monitor_name']}")
        return {"actions_taken": actions}
    
    def _handle_recovery(self, event):
        """Handle service recovery"""
        actions = [
            f"Resolve PagerDuty alert for {event['monitor_name']}",
            f"Update Jira ticket to resolved",
            f"Post recovery to #incidents Slack channel",
        ]
        logger.info(f"SERVICE RECOVERED: {event['monitor_name']}")
        return {"actions_taken": actions}
    
    def auto_remediation(self, monitor_name, issue_type):
        """Automated remediation based on issue type"""
        playbooks = {
            "high_memory": [
                "Clear application cache",
                "Restart PHP-FPM",
                "If still high, restart application",
            ],
            "disk_full": [
                "Clear log files older than 7 days",
                "Remove temp files",
                "Compress old logs",
            ],
            "service_down": [
                "Check if service process is running",
                "Attempt service restart",
                "If failed, check dependencies",
                "Escalate to on-call engineer",
            ],
            "ssl_expiring": [
                "Run certbot renew",
                "Verify new certificate",
                "Reload Nginx",
            ],
        }
        
        return {
            "monitor": monitor_name,
            "issue": issue_type,
            "playbook": playbooks.get(issue_type, ["Escalate to on-call"]),
            "auto_executed": issue_type in ["high_memory", "ssl_expiring"],
        }

integration = KumaIntegration("http://localhost:3001")

# Simulate webhook
down_event = integration.webhook_handler({
    "type": "down",
    "monitor": {"name": "Production API", "url": "https://api.example.com"},
    "heartbeat": {"status": 0, "ping": None, "msg": "Connection timeout"},
})
print("Down Event:", json.dumps(down_event, indent=2))

# Auto-remediation
remedy = integration.auto_remediation("Production API", "service_down")
print("\nRemediation:", json.dumps(remedy, indent=2))

FAQ — Frequently Asked Questions

Q: How does Uptime Kuma differ from UptimeRobot?

A: Uptime Kuma is self-hosted: your data stays on your own server, monitors and notifications are free and unlimited with no subscription fee, but you have to maintain the server yourself. UptimeRobot is a SaaS, so there is no infrastructure to manage; the free plan offers 50 monitors at a 5-minute interval, while paid plans (from $7/mo) add 1-minute intervals and multi-location monitoring. For a small team with the technical skills, Uptime Kuma is the recommendation (it saves on cost); for teams that do not want to maintain a server, use UptimeRobot or Better Uptime.

Q: How many resources does Uptime Kuma use?

A: Uptime Kuma is very light on resources. RAM is roughly 100-300 MB depending on the number of monitors; CPU usage is minimal (< 5%) for 50-100 monitors; disk depends on the retention period (180 days by default), roughly 500 MB-2 GB. A small VPS with 1 vCPU and 1 GB RAM is enough for 100+ monitors. It runs as a single Docker container and needs no external database (it uses built-in SQLite).

Q: What monitor interval should I set?

A: It depends on the criticality of the service. Critical services (payment, auth): 30 seconds; important services (API, website): 60 seconds; normal services (internal tools): 5 minutes; low priority (dev, staging): 10-15 minutes. Too short an interval can put load on the target service and waste resources; too long an interval delays outage detection. For a production website, 60 seconds is a sensible default.

Q: What should I monitor in production?

A: Must-have: the website/landing page (HTTP 200), critical API endpoints (health checks), database connectivity (TCP port), SSL certificate expiry, and DNS resolution. Nice-to-have: response time trends, third-party services (payment, email), CDN status, background job processing, and disk space and server resources (via a Push monitor plus a script). Start by monitoring what is user-facing, then add infrastructure monitors. Configure notifications to LINE or Telegram for critical alerts.
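The "Push monitor plus a script" idea for disk space can be sketched like this; `PUSH_URL` and the 90% threshold are placeholder assumptions to adapt to your own setup:

```python
#!/usr/bin/env python3
# disk_push.py — check local disk usage and report it to an Uptime Kuma
# Push monitor. PUSH_URL is a placeholder (copy the real push URL from the
# monitor's settings); the 90% threshold is an example value.
import shutil
import urllib.parse
import urllib.request

PUSH_URL = "https://status.example.com/api/push/YOUR_TOKEN"  # placeholder
THRESHOLD_PCT = 90  # report "down" when the filesystem is fuller than this

def disk_status(path="/", threshold=THRESHOLD_PCT):
    """Return (status, message) for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    used_pct = usage.used / usage.total * 100
    status = "up" if used_pct < threshold else "down"
    return status, f"{path} {used_pct:.1f}% used"

def report(path="/"):
    """Send the current disk status as a Push monitor heartbeat."""
    status, msg = disk_status(path)
    query = urllib.parse.urlencode({"status": status, "msg": msg})
    with urllib.request.urlopen(f"{PUSH_URL}?{query}", timeout=10) as resp:
        return resp.status

if __name__ == "__main__":
    # Print locally instead of pushing, so this runs without a server.
    print(disk_status("/"))
    # Schedule via cron, e.g.: */5 * * * * /usr/bin/python3 /opt/disk_push.py
```

If the script stops running (server down, cron broken) the heartbeat stops too, so the same monitor also catches host-level failures.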
