SiamCafe.net Blog

Skaffold Dev Monitoring และ Alerting

2026-05-14 · อ. บอม — SiamCafe.net · 8,778 words

Skaffold Dev Monitoring

This guide covers monitoring and alerting for a Skaffold dev workflow: collecting metrics with Prometheus, building Grafana dashboards, aggregating logs with Loki, wiring alerts on pod status and resource usage, and adding health checks for end-to-end observability.

Tool         | Purpose             | Data Type   | Query Language | Best For
Prometheus   | Metrics Collection  | Time Series | PromQL         | Metrics
Grafana      | Visualization       | Dashboards  | Multiple       | Dashboards
Loki         | Log Aggregation     | Logs        | LogQL          | Logs
Jaeger       | Distributed Tracing | Traces      | Search         | Tracing
Alertmanager | Alert Routing       | Alerts      | Rules          | Notifications

Prometheus Setup

# === Prometheus + Grafana for Skaffold Dev ===

# Install with Helm
# helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# helm install monitoring prometheus-community/kube-prometheus-stack \
# --namespace monitoring --create-namespace \
# --set grafana.adminPassword=admin123

# Skaffold config — include monitoring port-forwards
# (in the skaffold/v4 schema, manifests are a top-level section)
# apiVersion: skaffold/v4beta7
# kind: Config
# metadata:
#   name: my-app
# build:
#   artifacts:
#     - image: my-app
#       docker:
#         dockerfile: Dockerfile
# manifests:
#   rawYaml:
#     - k8s/*.yaml
# deploy:
#   kubectl: {}
# portForward:
#   - resourceType: service
#     resourceName: my-app
#     port: 8080
#     localPort: 8080
#   - resourceType: service
#     resourceName: monitoring-grafana
#     namespace: monitoring
#     port: 80
#     localPort: 3000
#   - resourceType: service
#     resourceName: monitoring-kube-prometheus-prometheus
#     namespace: monitoring
#     port: 9090
#     localPort: 9090

# Application Metrics — Python Flask
# import time
# from prometheus_client import Counter, Histogram, generate_latest
# from flask import Flask, Response, request
#
# app = Flask(__name__)
#
# REQUEST_COUNT = Counter('http_requests_total',
#                         'Total HTTP requests', ['method', 'endpoint', 'status'])
# REQUEST_LATENCY = Histogram('http_request_duration_seconds',
#                             'HTTP request latency', ['method', 'endpoint'])
#
# @app.before_request
# def before_request():
#     request.start_time = time.time()
#
# @app.after_request
# def after_request(response):
#     latency = time.time() - request.start_time
#     REQUEST_COUNT.labels(request.method, request.path, response.status_code).inc()
#     REQUEST_LATENCY.labels(request.method, request.path).observe(latency)
#     return response
#
# @app.route('/metrics')
# def metrics():
#     return Response(generate_latest(), mimetype='text/plain')

# ServiceMonitor — Prometheus scrape config
# apiVersion: monitoring.coreos.com/v1
# kind: ServiceMonitor
# metadata:
#   name: my-app
#   labels:
#     release: monitoring
# spec:
#   selector:
#     matchLabels:
#       app: my-app
#   endpoints:
#     - port: http
#       path: /metrics
#       interval: 15s
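The ServiceMonitor above tells Prometheus to scrape the app's /metrics endpoint. As a dependency-free illustration of what such an endpoint serves, here is a minimal sketch in the Prometheus text exposition format using only the standard library (the toy counter store and handler are illustrative; a real app would use prometheus_client as shown above):

```python
# Minimal /metrics endpoint in the Prometheus text exposition format,
# standard library only (a sketch, not a replacement for prometheus_client).
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading
import urllib.request

REQUEST_COUNT = {"GET /": 0}  # toy counter store: "METHOD path" -> count

def render_metrics() -> str:
    # One line per labeled sample: metric_name{label="value"} sample
    lines = ["# TYPE http_requests_total counter"]
    for key, value in REQUEST_COUNT.items():
        method, path = key.split(" ")
        lines.append(f'http_requests_total{{method="{method}",endpoint="{path}"}} {value}')
    return "\n".join(lines) + "\n"

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = render_metrics().encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.end_headers()
            self.wfile.write(body)
        else:
            REQUEST_COUNT["GET /"] += 1  # count every non-metrics request
            self.send_response(200)
            self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0 = pick a free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    urllib.request.urlopen(f"http://127.0.0.1:{port}/")  # bump the counter once
    print(urllib.request.urlopen(f"http://127.0.0.1:{port}/metrics").read().decode())
    server.shutdown()
```

With the 15s scrape interval above, Prometheus would fetch this text every 15 seconds and turn each sample line into a time series.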

from dataclasses import dataclass

@dataclass
class MetricConfig:
    metric: str
    type_val: str
    labels: str
    alert_threshold: str
    dashboard: str

metrics = [
    MetricConfig("http_requests_total", "Counter", "method endpoint status", "> 100 errors/min", "Request Rate"),
    MetricConfig("http_request_duration_seconds", "Histogram", "method endpoint", "p99 > 500ms", "Latency"),
    MetricConfig("container_cpu_usage_seconds_total", "Counter", "pod namespace", "> 80% limit", "CPU Usage"),
    MetricConfig("container_memory_working_set_bytes", "Gauge", "pod namespace", "> 80% limit", "Memory"),
    MetricConfig("kube_pod_status_phase", "Gauge", "pod phase", "phase != Running", "Pod Status"),
]

print("=== Monitoring Metrics ===")
for m in metrics:
    print(f" [{m.metric}] Type: {m.type_val}")
    print(f" Labels: {m.labels}")
    print(f" Alert: {m.alert_threshold} | Dashboard: {m.dashboard}")
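The alert thresholds in the list above reduce to simple comparisons. A minimal sketch of such a threshold check, assuming two rule styles from the Alert column (ratio of a resource limit vs. an absolute count; the names and numbers are illustrative):

```python
# Sketch: evaluate a metric sample against a threshold rule, mirroring
# the Alert column above (rule kinds and thresholds are illustrative).
def breaches(metric: str, value: float, limit: float, kind: str = "ratio") -> bool:
    """Return True if the sample should fire an alert.

    kind="ratio":    value is usage measured against a resource limit
                     (fires above 80% of the limit, as in the table).
    kind="absolute": value is compared to the limit directly
                     (e.g. "> 100 errors/min").
    """
    if kind == "ratio":
        return value / limit > 0.8
    return value > limit

print(breaches("container_cpu_usage_seconds_total", 0.9, 1.0))        # 90% of limit
print(breaches("http_errors_per_min", 42, 100, "absolute"))           # below limit
```

In practice these comparisons live in PromQL alert expressions (shown in the Alerting section), not application code; the sketch only makes the arithmetic explicit.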

Log Aggregation

# === Loki Log Aggregation ===

# Install Loki stack (Promtail ships pod logs to Loki)
# helm repo add grafana https://grafana.github.io/helm-charts
# helm install loki grafana/loki-stack \
#   --namespace monitoring \
#   --set promtail.enabled=true \
#   --set grafana.enabled=false   # reuse the existing Grafana

# Structured Logging — Python
# import logging
# import json
# from datetime import datetime
#
# class JSONFormatter(logging.Formatter):
#     def format(self, record):
#         log_entry = {
#             "timestamp": datetime.utcnow().isoformat(),
#             "level": record.levelname,
#             "message": record.getMessage(),
#             "logger": record.name,
#             "module": record.module,
#             "function": record.funcName,
#             "line": record.lineno,
#         }
#         if hasattr(record, 'correlation_id'):
#             log_entry['correlation_id'] = record.correlation_id
#         if record.exc_info:
#             log_entry['exception'] = self.formatException(record.exc_info)
#         return json.dumps(log_entry)
#
# handler = logging.StreamHandler()
# handler.setFormatter(JSONFormatter())
# logger = logging.getLogger('my-app')
# logger.addHandler(handler)

# LogQL Queries in Grafana
# {app="my-app"} | json | level="error"
# {app="my-app"} | json | latency_ms > 500
# {namespace="production"} |= "Exception"
# rate({app="my-app"} | json | level="error" [5m])
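The LogQL pipelines above parse each line as JSON and filter on extracted fields. A sketch of the same filtering in plain Python, to show what `| json | level="error"` and a latency filter actually do (the sample log lines and field names are illustrative):

```python
# Sketch: emulate the LogQL pipeline `| json | level="error"` over
# structured log lines (sample data is illustrative).
import json

LOG_LINES = [
    '{"level": "info",  "message": "started",    "latency_ms": 12}',
    '{"level": "error", "message": "db timeout", "latency_ms": 840}',
    '{"level": "info",  "message": "ok",         "latency_ms": 35}',
]

def logql_like(lines, **filters):
    """Parse each line as JSON and keep entries matching every filter."""
    for line in lines:
        entry = json.loads(line)
        if all(entry.get(k) == v for k, v in filters.items()):
            yield entry

# `| json | level="error"` — keep only error-level entries
errors = list(logql_like(LOG_LINES, level="error"))
print(errors)

# `| json | latency_ms > 500` — keep only slow requests
slow = [e for e in (json.loads(line) for line in LOG_LINES) if e["latency_ms"] > 500]
print(len(slow))
```

Loki does the same parsing at query time across every matching stream, which is why consistent JSON field names (as in the JSONFormatter above) matter.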

@dataclass
class LogQuery:
    name: str
    logql: str
    purpose: str
    alert: bool

queries = [
    LogQuery("Error Logs", '{app="my-app"} | json | level="error"', "View all errors", True),
    LogQuery("Slow Requests", '{app="my-app"} | json | latency_ms > 500', "Slow requests", True),
    LogQuery("Exception Trace", '{app="my-app"} |= "Exception"', "View stack traces", False),
    LogQuery("Error Rate", 'rate({app="my-app"} | json | level="error" [5m])', "Error rate", True),
    LogQuery("Request Volume", 'rate({app="my-app"} [1m])', "Request volume", False),
]

print("\n=== LogQL Queries ===")
for q in queries:
    alert_tag = "[ALERT]" if q.alert else ""
    print(f" [{q.name}] {alert_tag}")
    print(f" Query: {q.logql}")
    print(f" Purpose: {q.purpose}")

Alerting

# === Alerting Configuration ===

# PrometheusRule — Alert Definitions
# apiVersion: monitoring.coreos.com/v1
# kind: PrometheusRule
# metadata:
#   name: my-app-alerts
#   labels:
#     release: monitoring
# spec:
#   groups:
#     - name: my-app
#       rules:
#         - alert: HighErrorRate
#           expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
#           for: 5m
#           labels:
#             severity: critical
#           annotations:
#             summary: "High error rate on {{ $labels.instance }}"
#             description: "Error rate is {{ $value | humanizePercentage }}"
#
#         - alert: HighLatency
#           expr: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) > 0.5
#           for: 5m
#           labels:
#             severity: warning
#
#         - alert: PodCrashLooping
#           expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
#           for: 5m
#           labels:
#             severity: critical
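The HighErrorRate expression is, at its core, a per-second counter rate compared against a threshold: `rate()` over a 5-minute window is the counter's increase divided by the window length. A sketch of that arithmetic (the sample counts are illustrative):

```python
# Sketch: the quantity behind `rate(http_requests_total{status=~"5.."}[5m]) > 0.05`
# — counter increase over a window, divided by the window length in seconds.
def error_rate(count_start: int, count_end: int, window_s: float) -> float:
    """Per-second rate of a monotonically increasing counter over a window."""
    return (count_end - count_start) / window_s

# 18 new 5xx responses over 5 minutes -> 0.06 req/s, above the 0.05 threshold
r = error_rate(1000, 1018, 300)
print(r, r > 0.05)
```

The `for: 5m` clause then requires the comparison to stay true for five consecutive minutes before the alert fires, filtering out short spikes.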

# Alertmanager — Slack Notification
# apiVersion: monitoring.coreos.com/v1alpha1
# kind: AlertmanagerConfig
# spec:
#   route:
#     receiver: slack
#     groupBy: [alertname, namespace]
#     groupWait: 30s
#     groupInterval: 5m
#     repeatInterval: 4h
#   receivers:
#     - name: slack
#       slackConfigs:
#         - channel: '#alerts'
#           apiURL: 'https://hooks.slack.com/services/...'
#           title: '[{{ .Status }}] {{ .CommonLabels.alertname }}'
#           text: '{{ .CommonAnnotations.description }}'
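The `groupBy: [alertname, namespace]` setting batches alerts that share those label values into a single notification instead of one message per firing alert. A sketch of that grouping in plain Python (the sample alerts are illustrative):

```python
# Sketch: Alertmanager-style grouping — alerts sharing the groupBy labels
# are batched into one notification (sample alerts are illustrative).
from collections import defaultdict

incoming = [
    {"alertname": "HighErrorRate", "namespace": "prod", "pod": "api-1"},
    {"alertname": "HighErrorRate", "namespace": "prod", "pod": "api-2"},
    {"alertname": "PodCrashLooping", "namespace": "dev", "pod": "worker-0"},
]

def group_alerts(alert_list, group_by=("alertname", "namespace")):
    """Batch alerts that share the group_by label values."""
    groups = defaultdict(list)
    for alert in alert_list:
        key = tuple(alert[label] for label in group_by)
        groups[key].append(alert)
    return dict(groups)

for key, batch in group_alerts(incoming).items():
    print(key, "->", len(batch), "alert(s) in one notification")
```

With this grouping, two pods firing HighErrorRate in `prod` produce one Slack message; `groupWait: 30s` is the pause that lets late-arriving alerts join the batch.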

@dataclass
class AlertRule:
    name: str
    condition: str
    severity: str
    action: str
    notification: str

alerts = [
    AlertRule("HighErrorRate", "5xx > 5% for 5min", "Critical", "Investigate immediately", "Slack + PagerDuty"),
    AlertRule("HighLatency", "p99 > 500ms for 5min", "Warning", "Check slow endpoints", "Slack"),
    AlertRule("PodCrashLooping", "Restart > 0 in 15min", "Critical", "Check pod logs", "Slack + PagerDuty"),
    AlertRule("HighCPU", "CPU > 80% for 10min", "Warning", "Scale up or optimize", "Slack"),
    AlertRule("HighMemory", "Memory > 85% for 5min", "Warning", "Check memory leaks", "Slack"),
    AlertRule("DiskFull", "Disk > 90%", "Critical", "Clean up or expand", "Slack + PagerDuty"),
]

print("Alert Rules:")
for a in alerts:
    print(f" [{a.severity}] {a.name}")
    print(f" Condition: {a.condition}")
    print(f" Action: {a.action} | Notify: {a.notification}")
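The notification column above follows one rule: everything posts to Slack, and Critical alerts additionally page on-call. A sketch of that severity-based routing (the channel names are illustrative):

```python
# Sketch: severity-based routing matching the rules above — every alert goes
# to Slack, Critical alerts also page on-call (channel names are illustrative).
def route(severity: str) -> list[str]:
    channels = ["slack"]                 # baseline: all alerts post to Slack
    if severity.lower() == "critical":
        channels.append("pagerduty")     # Critical additionally pages on-call
    return channels

print(route("Critical"))
print(route("Warning"))
```

In a real cluster this lives in the Alertmanager route tree as a child route matching `severity: critical`, not in application code.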

Tips

What is Skaffold Dev Monitoring?

It means watching the application while `skaffold dev` runs: pod status, logs, and port-forwards, with Prometheus, Grafana, Loki, and Alertmanager supplying metrics, dashboards, and alerts so bugs and performance problems surface early in development.

How do I set up Prometheus and Grafana?

Install the kube-prometheus-stack Helm chart, add a ServiceMonitor so Prometheus scrapes your app, and build dashboards for CPU, memory, request rate, errors, and latency. Define alert rules (for example CPU above 80% of its limit, or error rate above 5%) and let Alertmanager route them to Slack or email.

How does log aggregation work?

Promtail tails pod stdout and ships it to Loki; Grafana then queries the logs with LogQL using labels such as app and namespace. Use structured JSON logging with log levels and a correlation ID so a single request can be traced across services.

How do I configure health checks?

Use a liveness probe to restart a stuck container, a readiness probe to gate traffic, and a startup probe for slow-starting apps. A typical setup is an httpGet probe against /health or /ready with initialDelaySeconds and periodSeconds tuned to the app, where /ready also verifies dependencies such as the database and Redis.
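A readiness endpoint of that kind can be sketched as a function that aggregates dependency checks and returns 503 as soon as any of them fails (the check functions here are illustrative stand-ins for real connection pings):

```python
# Sketch: a /ready handler body — succeed only when every dependency check
# passes (check functions are illustrative stand-ins for real pings).
def check_database() -> bool:
    return True   # stand-in for a real connection ping, e.g. SELECT 1

def check_redis() -> bool:
    return True   # stand-in for a PING command against Redis

def readiness() -> tuple[int, dict]:
    """Return (http_status, per-dependency results) for a readiness probe."""
    checks = {"database": check_database(), "redis": check_redis()}
    status = 200 if all(checks.values()) else 503
    return status, checks

status, checks = readiness()
print(status, checks)
```

Kubernetes only needs the status code: 200 keeps the pod in the Service endpoints, 503 removes it from rotation until the dependencies recover.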

Summary

Monitoring and alerting complete a Skaffold dev setup: Prometheus metrics, Grafana dashboards, Loki log aggregation, health checks, and PromQL/LogQL-driven alerts give you production-grade observability from the first `skaffold dev` run.
