Skaffold Dev Monitoring and Alerting: Monitoring the Dev Environment
Skaffold Dev Monitoring

Monitoring a Skaffold dev environment means collecting metrics, dashboards, logs, traces, and alerts, plus health checks, so that pod status, resource usage, and application health stay visible while you develop. The usual tools:
| Tool | Purpose | Data Type | Query Language | Best For |
|---|---|---|---|---|
| Prometheus | Metrics Collection | Time Series | PromQL | Metrics |
| Grafana | Visualization | Dashboard | Multiple | Dashboard |
| Loki | Log Aggregation | Logs | LogQL | Logs |
| Jaeger | Distributed Tracing | Traces | Search | Tracing |
| Alertmanager | Alert Routing | Alerts | Routing rules (YAML) | Notification |
Prometheus Setup
=== Prometheus + Grafana for Skaffold Dev ===
Install with Helm:
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set grafana.adminPassword=admin123
```
Skaffold config (skaffold.yaml), including port forwards for the monitoring services:
```yaml
apiVersion: skaffold/v4beta7
kind: Config
metadata:
  name: my-app
build:
  artifacts:
    - image: my-app
      docker:
        dockerfile: Dockerfile
# In the v4 schema, manifests are listed under `manifests`, not `deploy.kubectl`
manifests:
  rawYaml:
    - k8s/*.yaml
deploy:
  kubectl: {}
portForward:
  - resourceType: service
    resourceName: my-app
    port: 8080
    localPort: 8080
  - resourceType: service
    resourceName: monitoring-grafana
    namespace: monitoring
    port: 80
    localPort: 3000
  - resourceType: service
    resourceName: monitoring-kube-prometheus-prometheus
    namespace: monitoring
    port: 9090
    localPort: 9090
```
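With the port forward in place, Prometheus is reachable at localhost:9090, so you can query it from a script during `skaffold dev`. This is a minimal sketch. It assumes the `localPort: 9090` forward above and uses Prometheus's instant-query HTTP endpoint `/api/v1/query`. The helper names are hypothetical. The parser works on the standard JSON response shape, so you can test it without a cluster.

```python
import json
import urllib.parse
import urllib.request

# Assumes the Skaffold port forward above: Prometheus on localhost:9090
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(expr: str, base: str = PROMETHEUS_URL) -> str:
    """Build an instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_vector(payload: dict) -> list:
    """Turn an instant-vector response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    pairs = []
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # sample is [unix_ts, "value as string"]
        pairs.append((series["metric"], float(value)))
    return pairs


def query(expr: str) -> list:
    """Run an instant query against the port-forwarded Prometheus (needs the cluster)."""
    with urllib.request.urlopen(build_query_url(expr), timeout=5) as resp:
        return parse_vector(json.load(resp))
```

With the dev cluster running, `query("up")` returns one pair per scrape target, with value 1.0 for targets that are up and 0.0 for targets that are down.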
Application metrics (Python Flask):
```python
import time

from flask import Flask, Response, g, request
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest

app = Flask(__name__)

# Request counter, labelled by method, endpoint and HTTP status
REQUEST_COUNT = Counter('http_requests_total',
                        'Total HTTP requests', ['method', 'endpoint', 'status'])
# Latency histogram; exported as http_request_duration_seconds_bucket/_sum/_count
REQUEST_LATENCY = Histogram('http_request_duration_seconds',
                            'HTTP request latency', ['method', 'endpoint'])

@app.before_request
def before_request():
    # Store the start time on Flask's per-request context object
    g.start_time = time.perf_counter()

@app.after_request
def after_request(response):
    latency = time.perf_counter() - g.start_time
    REQUEST_COUNT.labels(request.method, request.path, response.status_code).inc()
    REQUEST_LATENCY.labels(request.method, request.path).observe(latency)
    return response

@app.route('/metrics')
def metrics():
    # Serve the Prometheus text exposition format with its proper content type
    return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)
```
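Using `request.path` as the `endpoint` label creates one time series per distinct URL (`/users/1`, `/users/2`, ...), which can blow up Prometheus cardinality. In Flask, `request.url_rule.rule` gives the route template instead. Requests with no matching route, such as 404s, need a fallback. This is a minimal sketch: `normalize_endpoint` is a hypothetical helper, and collapsing numeric and UUID-like segments to `:id` is an assumed convention.

```python
import re
from typing import Optional

# Path segments that look like IDs: all digits, or a UUID
_ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)


def normalize_endpoint(path: str, route_template: Optional[str] = None) -> str:
    """Return a low-cardinality endpoint label (hypothetical helper).

    Prefer the framework's route template (in Flask: request.url_rule.rule);
    otherwise collapse ID-like path segments to ':id'.
    """
    if route_template:
        return route_template
    segments = [":id" if _ID_SEGMENT.match(s) else s for s in path.split("/")]
    return "/".join(segments)
```

In `after_request` you would pass `request.url_rule.rule if request.url_rule else None` as the template and use the result in place of `request.path` for both metrics.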
ServiceMonitor (Prometheus scrape config):
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  labels:
    release: monitoring   # kube-prometheus-stack selects ServiceMonitors by its release label
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: http          # named port on the my-app Service
      path: /metrics
      interval: 15s
```
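A ServiceMonitor picks Services by label: every key/value in `spec.selector.matchLabels` must be present on the Service. Prometheus then scrapes the Service port whose *name* matches `endpoints[].port`. This is a minimal sketch of the `matchLabels` part of that logic. It ignores `matchExpressions` and namespace selection, and the function names and Service dicts are hypothetical.

```python
def matches_labels(match_labels: dict, labels: dict) -> bool:
    """True if every matchLabels key/value is present in the object's labels."""
    return all(labels.get(k) == v for k, v in match_labels.items())


def scrape_targets(service_monitor: dict, services: list) -> list:
    """Return (service name, port name) pairs a ServiceMonitor would select."""
    selector = service_monitor["spec"]["selector"]["matchLabels"]
    wanted_ports = {ep["port"] for ep in service_monitor["spec"]["endpoints"]}
    targets = []
    for svc in services:
        if not matches_labels(selector, svc["metadata"].get("labels", {})):
            continue
        for port in svc["spec"].get("ports", []):
            # Endpoints reference the Service port by name, so an unnamed port is skipped
            if port.get("name") in wanted_ports:
                targets.append((svc["metadata"]["name"], port["name"]))
    return targets
```

A common dev-loop surprise follows from this: a Service labelled `app: my-app` whose port has no `name: http` is silently not scraped.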
```python
from dataclasses import dataclass

@dataclass
class MetricConfig:
    metric: str
    type_val: str
    labels: str
    alert_threshold: str
    dashboard: str

metrics = [
    MetricConfig("http_requests_total", "Counter", "method endpoint status", "> 100 errors/min", "Request Rate"),
    MetricConfig("http_request_duration_seconds", "Histogram", "method endpoint", "p99 > 500ms", "Latency"),
    MetricConfig("container_cpu_usage_seconds_total", "Counter", "pod namespace", "> 80% limit", "CPU Usage"),
    MetricConfig("container_memory_working_set_bytes", "Gauge", "pod namespace", "> 80% limit", "Memory"),
    MetricConfig("kube_pod_status_phase", "Gauge", "pod phase", "phase != Running", "Pod Status"),
]

print("=== Monitoring Metrics ===")
for m in metrics:
    print(f" [{m.metric}] Type: {m.type_val}")
    print(f"   Labels: {m.labels}")
    print(f"   Alert: {m.alert_threshold} | Dashboard: {m.dashboard}")
```
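`container_cpu_usage_seconds_total` is a counter of CPU-seconds, so the "> 80% limit" check first needs a rate. CPU-seconds used per second equals cores in use, and that figure is then divided by the CPU limit in cores. This is a minimal sketch of the arithmetic on two samples. PromQL's `rate()` also extrapolates to the window edges, which this ignores. The sample values are made up for illustration.

```python
def cpu_rate(prev_value: float, prev_ts: float, cur_value: float, cur_ts: float) -> float:
    """Cores in use = increase in CPU-seconds / elapsed wall-clock seconds."""
    if cur_ts <= prev_ts:
        raise ValueError("samples must be in time order")
    increase = cur_value - prev_value
    if increase < 0:
        # Counter reset (e.g. container restart): assume it restarted from zero
        increase = cur_value
    return increase / (cur_ts - prev_ts)


def cpu_percent_of_limit(cores_in_use: float, limit_cores: float) -> float:
    """Usage as a percentage of the container's CPU limit."""
    return 100.0 * cores_in_use / limit_cores


# Hypothetical samples 60 s apart: 27 CPU-seconds used in 60 s = 0.45 cores
cores = cpu_rate(1200.0, 0.0, 1227.0, 60.0)
pct = cpu_percent_of_limit(cores, limit_cores=0.5)  # limit: 500m
print(f"{cores:.2f} cores = {pct:.0f}% of limit, alert: {pct > 80}")
```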
Log Aggregation
=== Loki Log Aggregation ===
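Install the Loki stack:
```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
  --namespace monitoring \
  --set promtail.enabled=true \
  --set grafana.enabled=false  # reuse the Grafana from kube-prometheus-stack
```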
Structured logging (Python):
```python
import json
import logging
from datetime import datetime, timezone

class JSONFormatter(logging.Formatter):
    def format(self, record):
        log_entry = {
            # Use the record's own creation time, in UTC
            "timestamp": datetime.fromtimestamp(record.created, timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
            "module": record.module,
            "function": record.funcName,
            "line": record.lineno,
        }
        # Optional field, e.g. logger.info(..., extra={"correlation_id": "..."})
        if hasattr(record, 'correlation_id'):
            log_entry['correlation_id'] = record.correlation_id
        if record.exc_info:
            log_entry['exception'] = self.formatException(record.exc_info)
        return json.dumps(log_entry)

handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger = logging.getLogger('my-app')
logger.addHandler(handler)
logger.setLevel(logging.INFO)  # without this, the effective level is WARNING
```
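The formatter only adds `correlation_id` when a record carries one, for example via `extra={"correlation_id": ...}`. Passing it on every call is easy to forget. Instead, a `logging.Filter` can stamp the ID onto every record from a `contextvars.ContextVar` that is set once per request. This is a minimal sketch that assumes one correlation ID per request. `correlation_id_var` and `CorrelationIdFilter` are hypothetical names, and the compact formatter here stands in for the `JSONFormatter` above.

```python
import contextvars
import io
import json
import logging

# Holds the current request's correlation ID (threads and asyncio tasks get their own context)
correlation_id_var = contextvars.ContextVar("correlation_id", default=None)


class CorrelationIdFilter(logging.Filter):
    """Copy the context's correlation ID onto every record that passes through."""

    def filter(self, record: logging.LogRecord) -> bool:
        cid = correlation_id_var.get()
        if cid is not None:
            record.correlation_id = cid
        return True  # never drop records, only annotate them


class MiniJSONFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        entry = {"level": record.levelname, "message": record.getMessage()}
        if hasattr(record, "correlation_id"):
            entry["correlation_id"] = record.correlation_id
        return json.dumps(entry)


stream = io.StringIO()  # stands in for stdout, which Promtail would collect
handler = logging.StreamHandler(stream)
handler.setFormatter(MiniJSONFormatter())
handler.addFilter(CorrelationIdFilter())

log = logging.getLogger("my-app.demo")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False

# e.g. in a Flask before_request: take the ID from a request header or generate one
token = correlation_id_var.set("req-123")
try:
    log.info("order created")
finally:
    correlation_id_var.reset(token)
log.info("no request context")

print(stream.getvalue())
```

In Loki you can then pull every line for one request with `{app="my-app"} | json | correlation_id="req-123"`.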
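LogQL queries in Grafana:
```logql
{app="my-app"} | json | level="error"
{app="my-app"} | json | latency_ms > 500
{namespace="production"} |= "Exception"
rate({app="my-app"} | json | level="error" [5m])
```
The `latency_ms` query only matches if the application logs a `latency_ms` field. The JSON formatter above does not add one by default.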
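To see what the last query computes: `| json` parses each line's JSON fields into labels, and `level="error"` keeps only matching lines. `rate(...[5m])` then divides the number of matching lines in the window by the window length in seconds. The rough Python emulation below is for intuition only. It counts a single stream at a single instant, whereas Loki evaluates per label stream and per query step. The log lines are made up.

```python
import json

WINDOW_SECONDS = 5 * 60  # the [5m] range


def error_rate(lines, now: float, window: float = WINDOW_SECONDS) -> float:
    """Emulate rate({...} | json | level="error" [window]) for one stream at time `now`.

    `lines` is an iterable of (unix_timestamp, raw_log_line) pairs.
    """
    count = 0
    for ts, raw in lines:
        if not (now - window < ts <= now):
            continue  # outside the range window
        try:
            fields = json.loads(raw)  # the `| json` stage
        except json.JSONDecodeError:
            continue  # simplification: skip lines that are not valid JSON
        if isinstance(fields, dict) and fields.get("level") == "error":  # the label filter
            count += 1
    return count / window  # matching lines per second
```

For example, two error lines in the last five minutes give 2 / 300 ≈ 0.0067 lines per second.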
```python
from dataclasses import dataclass

@dataclass
class LogQuery:
    name: str
    logql: str
    purpose: str
    alert: bool

queries = [
    LogQuery("Error Logs", '{app="my-app"} | json | level="error"', "View all errors", True),
    LogQuery("Slow Requests", '{app="my-app"} | json | latency_ms > 500', "Slow requests", True),
    LogQuery("Exception Trace", '{app="my-app"} |= "Exception"', "View stack traces", False),
    LogQuery("Error Rate", 'rate({app="my-app"} | json | level="error" [5m])', "Error rate", True),
    LogQuery("Request Volume", 'rate({app="my-app"} [1m])', "Request volume", False),
]

print("\n=== LogQL Queries ===")
for q in queries:
    alert_tag = "[ALERT]" if q.alert else ""
    print(f" [{q.name}] {alert_tag}")
    print(f"   Query: {q.logql}")
    print(f"   Purpose: {q.purpose}")
```
Alerting
=== Alerting Configuration ===
PrometheusRule: alert definitions
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  labels:
    release: monitoring
spec:
  groups:
    - name: my-app
      rules:
        - alert: HighErrorRate
          # Share of 5xx responses (5%), not a raw per-second rate of errors
          expr: |
            sum by (namespace, instance) (rate(http_requests_total{status=~"5.."}[5m]))
              / sum by (namespace, instance) (rate(http_requests_total[5m])) > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High error rate on {{ $labels.instance }}"
            description: "Error rate is {{ $value | humanizePercentage }}"
        - alert: HighLatency
          # Aggregate buckets but keep `le`, which histogram_quantile needs
          expr: histogram_quantile(0.99, sum by (namespace, le) (rate(http_request_duration_seconds_bucket[5m]))) > 0.5
          for: 5m
          labels:
            severity: warning
        - alert: PodCrashLooping
          expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
          for: 5m
          labels:
            severity: critical
```
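`histogram_quantile` works on cumulative `_bucket` values, one per `le` upper bound. That is why the buckets must keep their `le` label when aggregated. The function finds the bucket that contains the requested rank and interpolates linearly inside it. The sketch below simplifies that calculation for the HighLatency rule. It takes per-bucket values already summed by `le`, only accepts 0 < q ≤ 1, and skips some edge cases Prometheus handles. The sample numbers are made up.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Simplified PromQL histogram_quantile.

    `buckets` is a list of (le, cumulative_value) pairs, e.g. from
    sum by (le) (rate(http_request_duration_seconds_bucket[5m])).
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only accepts 0 < q <= 1")
    buckets = sorted(buckets, key=lambda b: b[0])
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("need a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for i, (le, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(le):
                # Rank falls in the +Inf bucket: report the highest finite bound
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            if i == 0 and le <= 0:
                return le
            # Linear interpolation inside (prev_le, le]
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return math.nan


# Hypothetical buckets: 99 of 100 requests finished within 1 s
buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (math.inf, 100)]
print(histogram_quantile(0.99, buckets))  # above 0.5, so HighLatency would be pending
```

Because of the interpolation, the reported p99 depends on the bucket boundaries. With these buckets, every latency between 0.5 s and 1 s maps into the same bucket.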
Alertmanager: Slack notification
```yaml
# The webhook URL lives in a Secret in the same namespace as the AlertmanagerConfig
apiVersion: v1
kind: Secret
metadata:
  name: slack-webhook
stringData:
  url: 'https://hooks.slack.com/services/...'
---
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: slack-alerts
spec:
  route:
    receiver: slack
    groupBy: [alertname, namespace]
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 4h
  receivers:
    - name: slack
      slackConfigs:
        - channel: '#alerts'
          # In AlertmanagerConfig, apiURL references a Secret key, not a literal URL
          apiURL:
            name: slack-webhook
            key: url
          title: '[{{ .Status }}] {{ .CommonLabels.alertname }}'
          text: '{{ .CommonAnnotations.description }}'
```
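With `groupBy: [alertname, namespace]`, Alertmanager batches firing alerts that share those label values into one notification. `groupWait` delays the first message for a new group, `groupInterval` spaces out updates when a group changes, and `repeatInterval` re-sends an unchanged group. This is a minimal sketch of the grouping step only, with no timers. The alert dicts are hypothetical, and treating a missing label as "" is a simplification.

```python
from collections import defaultdict

GROUP_BY = ("alertname", "namespace")


def group_alerts(alerts: list, group_by=GROUP_BY) -> dict:
    """Bucket alerts by the values of the groupBy labels (missing labels count as "")."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


sample_alerts = [
    {"labels": {"alertname": "HighErrorRate", "namespace": "dev", "instance": "10.0.0.5:8080"}},
    {"labels": {"alertname": "HighErrorRate", "namespace": "dev", "instance": "10.0.0.6:8080"}},
    {"labels": {"alertname": "PodCrashLooping", "namespace": "dev", "pod": "my-app-1"}},
]
for (alertname, namespace), members in group_alerts(sample_alerts).items():
    # Mirrors the Slack title template: [{{ .Status }}] {{ .CommonLabels.alertname }}
    print(f"[FIRING] {alertname} ({namespace}): {len(members)} alert(s)")
```

Here the two HighErrorRate alerts become one Slack message instead of two, which is the main defence against alert floods.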
```python
from dataclasses import dataclass

@dataclass
class AlertRule:
    name: str
    condition: str
    severity: str
    action: str
    notification: str

alerts = [
    AlertRule("HighErrorRate", "5xx > 5% for 5min", "Critical", "Investigate immediately", "Slack + PagerDuty"),
    AlertRule("HighLatency", "p99 > 500ms for 5min", "Warning", "Check slow endpoints", "Slack"),
    AlertRule("PodCrashLooping", "Restart > 0 in 15min", "Critical", "Check pod logs", "Slack + PagerDuty"),
    AlertRule("HighCPU", "CPU > 80% for 10min", "Warning", "Scale up or optimize", "Slack"),
    AlertRule("HighMemory", "Memory > 85% for 5min", "Warning", "Check memory leaks", "Slack"),
    AlertRule("DiskFull", "Disk > 90%", "Critical", "Clean up or expand", "Slack + PagerDuty"),
]

print("Alert Rules:")
for a in alerts:
    print(f" [{a.severity}] {a.name}")
    print(f"   Condition: {a.condition}")
    print(f"   Action: {a.action} | Notify: {a.notification}")
```
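The `for: 5m` clause, and "for 5min" in the conditions above, means the condition must stay true on every evaluation for that long. The alert is *pending* at first and only becomes *firing*, and gets sent to Alertmanager, once the duration has elapsed. A single false evaluation resets it. This is a minimal state-machine sketch that assumes evaluations at a fixed interval. The function and the sample error-ratio series are hypothetical.

```python
def alert_states(samples: list, threshold: float, for_seconds: float, interval: float) -> list:
    """Return 'inactive' / 'pending' / 'firing' for each evaluation.

    samples: metric values at successive evaluations, `interval` seconds apart.
    """
    states = []
    active_since = None
    for i, value in enumerate(samples):
        t = i * interval
        if value > threshold:
            if active_since is None:
                active_since = t  # condition just became true
            states.append("firing" if t - active_since >= for_seconds else "pending")
        else:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
    return states


# Error ratio evaluated every 60 s against HighErrorRate (> 0.05 for 5m)
ratios = [0.01, 0.08, 0.09, 0.07, 0.06, 0.08, 0.09, 0.02]
print(alert_states(ratios, threshold=0.05, for_seconds=300, interval=60))
```

This is why a short spike during a `skaffold dev` rebuild does not page anyone. The spike has to outlast `for` before it leaves the pending state.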
Tips
- Metrics: collect the RED metrics (Rate, Errors, Duration) for every service.
- Structured logs: use a JSON log format so logs are easy to search.
- Alerts: alert only on conditions that require action; too many alerts and people start ignoring them.
- Dashboards: build a dashboard for every service to get an overview.
- Dev parity: use the same monitoring setup in Dev and Production.
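The RED metrics from the first tip map onto the Flask instrumentation above. Rate comes from `http_requests_total`, Errors from its 5xx share, and Duration from `http_request_duration_seconds`. The small sketch below computes all three from raw request records over one window to show what each number means. The `RequestRecord` type is hypothetical. The p99 uses a simple nearest-rank method rather than Prometheus's bucket interpolation.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestRecord:  # hypothetical record of one handled request
    status: int
    duration_s: float


def red_metrics(requests: list, window_s: float) -> dict:
    """Rate (req/s), Errors (5xx ratio) and Duration (nearest-rank p99) for one window."""
    if not requests:
        p99: Optional[float] = None
        return {"rate": 0.0, "error_ratio": 0.0, "p99_s": p99}
    errors = sum(1 for r in requests if 500 <= r.status <= 599)
    durations = sorted(r.duration_s for r in requests)
    # Nearest-rank percentile: the ceil(0.99 * n)-th smallest value
    idx = math.ceil(0.99 * len(durations)) - 1
    return {
        "rate": len(requests) / window_s,
        "error_ratio": errors / len(requests),
        "p99_s": durations[idx],
    }


# Made-up window: 100 requests in 60 s, 5 of them slow 503s
window = [RequestRecord(200, 0.1)] * 95 + [RequestRecord(503, 0.9)] * 5
print(red_metrics(window, 60.0))
```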
What is Skaffold Dev Monitoring?
It means watching the application you are developing with Skaffold: pod status, logs, and port forwarding. Prometheus, Grafana, Loki, and Alertmanager add metrics, dashboards, and alerts, so bugs and performance problems show up early.
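How do you set up Prometheus and Grafana?
Install kube-prometheus-stack with Helm and add a ServiceMonitor for each app. Build dashboards for CPU, memory, request rate, errors, and latency. Then define alert rules, for example CPU above 80% or error rate above 5%, which Alertmanager routes to Slack or email.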
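How do you aggregate logs?
Promtail collects pod stdout and ships it to Loki, which you query from Grafana with LogQL using labels such as app and namespace. Log in structured JSON with a log level and a correlation ID so that a single request can be traced.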
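How do you configure health checks?
A liveness probe restarts the container when it fails. A readiness probe decides whether the pod receives traffic. A startup probe holds off the other probes until a slow-starting container is up. Use httpGet on /health and /ready and tune initialDelaySeconds and periodSeconds. Have the readiness check cover dependencies such as the database and Redis.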
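The two probes need endpoints that answer differently. `/health` (liveness) should only report that the process is alive. `/ready` (readiness) should also check dependencies, so that a database or Redis outage takes the pod out of the Service instead of restarting it. This is a minimal standard-library sketch. The dependency checks are hypothetical placeholders for real pings, and `probe` imitates the kubelet rule that a 2xx or 3xx status counts as success.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical dependency checks; replace with real database / Redis pings
DEPENDENCY_CHECKS = {
    "database": lambda: True,
    "redis": lambda: True,
}


class HealthHandler(BaseHTTPRequestHandler):
    def _send(self, status: int, body: dict) -> None:
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self) -> None:
        if self.path == "/health":
            # Liveness: the process can serve HTTP; dependencies are deliberately not checked
            self._send(200, {"status": "ok"})
        elif self.path == "/ready":
            results = {}
            for name, check in DEPENDENCY_CHECKS.items():
                try:
                    results[name] = bool(check())
                except Exception:
                    results[name] = False
            ready = all(results.values())
            # Readiness: 503 removes the pod from the Service endpoints until it recovers
            self._send(200 if ready else 503, {"ready": ready, "checks": results})
        else:
            self._send(404, {"error": "not found"})

    def log_message(self, format, *args) -> None:
        pass  # keep probe traffic out of the application logs


def start_server(port: int = 8080) -> ThreadingHTTPServer:
    """Serve the health endpoints in a background thread."""
    server = ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of a GET; a kubelet httpGet probe treats 200-399 as success."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

In the Deployment, point `livenessProbe.httpGet.path` at `/health` and `readinessProbe.httpGet.path` at `/ready`, and tune `initialDelaySeconds` and `periodSeconds` to the app's startup time.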
Summary
Prometheus, Grafana, Loki, and Alertmanager give a Skaffold dev environment the same observability as production: metrics and dashboards (PromQL), log aggregation (LogQL), health checks, and alerts.




