Skaffold Dev Monitoring
| Tool | Purpose | Data Type | Query Language | Best For |
|---|---|---|---|---|
| Prometheus | Metrics Collection | Time Series | PromQL | Metrics |
| Grafana | Visualization | Dashboard | Multiple | Dashboard |
| Loki | Log Aggregation | Logs | LogQL | Logs |
| Jaeger | Distributed Tracing | Traces | Search | Tracing |
| Alertmanager | Alert Routing | Alerts | Rules | Notification |
Prometheus Setup
# === Prometheus + Grafana for Skaffold Dev ===
# Install with Helm
# helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# helm install monitoring prometheus-community/kube-prometheus-stack \
#   --namespace monitoring --create-namespace \
#   --set grafana.adminPassword=admin123
# Skaffold config — include monitoring
# apiVersion: skaffold/v4beta7
# kind: Config
# metadata:
#   name: my-app
# build:
#   artifacts:
#     - image: my-app
#       docker:
#         dockerfile: Dockerfile
# manifests:
#   rawYaml:
#     - k8s/*.yaml
# deploy:
#   kubectl: {}
# portForward:
#   - resourceType: service
#     resourceName: my-app
#     port: 8080
#     localPort: 8080
#   - resourceType: service
#     resourceName: monitoring-grafana
#     namespace: monitoring
#     port: 80
#     localPort: 3000
#   - resourceType: service
#     resourceName: monitoring-kube-prometheus-prometheus
#     namespace: monitoring
#     port: 9090
#     localPort: 9090
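With the port-forwards above active, Prometheus answers on localhost:9090. A minimal sketch of calling its instant-query HTTP API from Python — it assumes `skaffold dev` and the forward are running, so only the URL builder is exercised at import time:

```python
import json
import urllib.parse
import urllib.request

def prometheus_query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base}/api/v1/query?{urllib.parse.urlencode({'query': promql})}"

def instant_query(promql: str, base: str = "http://localhost:9090") -> dict:
    """Run an instant query against the forwarded Prometheus."""
    with urllib.request.urlopen(prometheus_query_url(base, promql)) as resp:
        return json.load(resp)

# e.g. instant_query('rate(http_requests_total[5m])') once the forward is up
print(prometheus_query_url("http://localhost:9090", "up"))
```

The builder URL-encodes the PromQL expression, so brackets and braces in queries survive the round trip.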
# Application Metrics — Python Flask
# import time
# from prometheus_client import Counter, Histogram, generate_latest
# from flask import Flask, Response, request
#
# app = Flask(__name__)
#
# REQUEST_COUNT = Counter('http_requests_total',
#     'Total HTTP requests', ['method', 'endpoint', 'status'])
# REQUEST_LATENCY = Histogram('http_request_duration_seconds',
#     'HTTP request latency', ['method', 'endpoint'])
#
# @app.before_request
# def before_request():
#     request.start_time = time.time()
#
# @app.after_request
# def after_request(response):
#     latency = time.time() - request.start_time
#     REQUEST_COUNT.labels(request.method, request.path, response.status_code).inc()
#     REQUEST_LATENCY.labels(request.method, request.path).observe(latency)
#     return response
#
# @app.route('/metrics')
# def metrics():
#     return Response(generate_latest(), mimetype='text/plain')
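To make the labelled-Counter semantics concrete, here is a toy counter that mimics what `prometheus_client` does for `.labels(...).inc()` and the text exposition format — an illustration only, not the library's actual implementation:

```python
from collections import defaultdict

class _Child:
    """Handle returned by labels(); increments one label combination."""
    def __init__(self, store, key):
        self._store, self._key = store, key

    def inc(self, amount: float = 1.0):
        self._store[self._key] += amount

class LabelledCounter:
    def __init__(self, name: str, label_names: tuple):
        self.name = name
        self.label_names = label_names
        self._values = defaultdict(float)  # label-value tuple -> count

    def labels(self, *label_values) -> _Child:
        return _Child(self._values, tuple(label_values))

    def expose(self) -> str:
        """Render in the Prometheus text exposition format."""
        lines = []
        for key, value in sorted(self._values.items()):
            labels = ",".join(f'{n}="{v}"'
                              for n, v in zip(self.label_names, key))
            lines.append(f"{self.name}{{{labels}}} {value}")
        return "\n".join(lines)

REQUEST_COUNT = LabelledCounter("http_requests_total",
                                ("method", "endpoint", "status"))
REQUEST_COUNT.labels("GET", "/", "200").inc()
REQUEST_COUNT.labels("GET", "/", "200").inc()
REQUEST_COUNT.labels("POST", "/login", "500").inc()
print(REQUEST_COUNT.expose())
```

Each unique label combination becomes its own time series — which is why high-cardinality labels (user IDs, request IDs) should never be used as metric labels.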
# ServiceMonitor — Prometheus scrape config
# apiVersion: monitoring.coreos.com/v1
# kind: ServiceMonitor
# metadata:
#   name: my-app
#   labels:
#     release: monitoring
# spec:
#   selector:
#     matchLabels:
#       app: my-app
#   endpoints:
#     - port: http
#       path: /metrics
#       interval: 15s
from dataclasses import dataclass

@dataclass
class MetricConfig:
    metric: str
    type_val: str
    labels: str
    alert_threshold: str
    dashboard: str

metrics = [
    MetricConfig("http_requests_total", "Counter", "method, endpoint, status", "> 100 errors/min", "Request Rate"),
    MetricConfig("http_request_duration_seconds", "Histogram", "method, endpoint", "p99 > 500ms", "Latency"),
    MetricConfig("container_cpu_usage_seconds_total", "Counter", "pod, namespace", "> 80% limit", "CPU Usage"),
    MetricConfig("container_memory_working_set_bytes", "Gauge", "pod, namespace", "> 80% limit", "Memory"),
    MetricConfig("kube_pod_status_phase", "Gauge", "pod, phase", "phase != Running", "Pod Status"),
]

print("=== Monitoring Metrics ===")
for m in metrics:
    print(f" [{m.metric}] Type: {m.type_val}")
    print(f" Labels: {m.labels}")
    print(f" Alert: {m.alert_threshold} | Dashboard: {m.dashboard}")
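The Counter-type thresholds in this table are evaluated as rates, not raw values. A sketch of how PromQL's `rate()` derives a per-second value from two scrapes of a monotonic counter (simplified: real PromQL also reconstructs counter resets):

```python
def counter_rate(v_start: float, v_end: float, seconds: float) -> float:
    """Per-second increase of a monotonic counter between two scrapes.
    A drop means the counter reset (e.g. pod restart); clamped to 0 here."""
    return max(v_end - v_start, 0.0) / seconds

# 1200 new errors counted over a 60 s window -> 20 errors/s
errors_per_sec = counter_rate(3_000, 4_200, 60)
print(f"{errors_per_sec:.1f} errors/s")
print("ALERT" if errors_per_sec * 60 > 100 else "ok")  # "> 100 errors/min" threshold
```

This is why restarting a pod does not produce a huge negative spike: the delta is computed on the increase, not the absolute counter value.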
Log Aggregation
# === Loki Log Aggregation ===
# Install Loki Stack
# helm repo add grafana https://grafana.github.io/helm-charts
# helm install loki grafana/loki-stack \
#   --namespace monitoring \
#   --set promtail.enabled=true \
#   --set grafana.enabled=false   # reuse the existing Grafana
# Structured Logging — Python
# import logging
# import json
# from datetime import datetime, timezone
#
# class JSONFormatter(logging.Formatter):
#     def format(self, record):
#         log_entry = {
#             "timestamp": datetime.now(timezone.utc).isoformat(),
#             "level": record.levelname,
#             "message": record.getMessage(),
#             "logger": record.name,
#             "module": record.module,
#             "function": record.funcName,
#             "line": record.lineno,
#         }
#         if hasattr(record, 'correlation_id'):
#             log_entry['correlation_id'] = record.correlation_id
#         if record.exc_info:
#             log_entry['exception'] = self.formatException(record.exc_info)
#         return json.dumps(log_entry)
#
# handler = logging.StreamHandler()
# handler.setFormatter(JSONFormatter())
# logger = logging.getLogger('my-app')
# logger.addHandler(handler)
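A runnable, stripped-down version of the formatter idea above — format one record and parse it back the way Loki's `| json` stage would (field set reduced for brevity):

```python
import json
import logging

class JSONFormatter(logging.Formatter):
    """Reduced field set; the full version is shown above."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname.lower(),
            "message": record.getMessage(),
            "logger": record.name,
        })

# Wire it up exactly as in the snippet above
handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger = logging.getLogger("my-app")
logger.addHandler(handler)

# Format one record directly and parse it back, as `| json` would
record = logging.LogRecord("my-app", logging.ERROR, __file__, 1,
                           "db connection failed", None, None)
parsed = json.loads(JSONFormatter().format(record))
print(parsed)  # {'level': 'error', 'message': 'db connection failed', 'logger': 'my-app'}
```

Because every line is one JSON object, Loki can extract `level`, `logger`, etc. as queryable labels without any custom parsing rules.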
# LogQL Queries in Grafana
# {app="my-app"} | json | level="error"
# {app="my-app"} | json | latency_ms > 500
# {namespace="production"} |= "Exception"
# rate({app="my-app"} | json | level="error" [5m])
@dataclass
class LogQuery:
    name: str
    logql: str
    purpose: str
    alert: bool

queries = [
    LogQuery("Error Logs", '{app="my-app"} | json | level="error"', "View all errors", True),
    LogQuery("Slow Requests", '{app="my-app"} | json | latency_ms > 500', "Find slow requests", True),
    LogQuery("Exception Trace", '{app="my-app"} |= "Exception"', "View stack traces", False),
    LogQuery("Error Rate", 'rate({app="my-app"} | json | level="error" [5m])', "Error rate", True),
    LogQuery("Request Volume", 'rate({app="my-app"} [1m])', "Request volume", False),
]

print("\n=== LogQL Queries ===")
for q in queries:
    alert_tag = "[ALERT]" if q.alert else ""
    print(f" [{q.name}] {alert_tag}")
    print(f" Query: {q.logql}")
    print(f" Purpose: {q.purpose}")
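What the Error Rate query computes can be emulated in plain Python: parse the JSON lines, keep the ones with `level="error"`, and divide by the `[5m]` window:

```python
import json

WINDOW_SECONDS = 300  # the [5m] range

log_lines = [
    '{"ts": 10, "level": "info",  "message": "started"}',
    '{"ts": 40, "level": "error", "message": "timeout"}',
    '{"ts": 95, "level": "error", "message": "timeout"}',
]

# `| json` stage: parse each line; `level="error"`: label filter
errors = [e for e in (json.loads(line) for line in log_lines)
          if e["level"] == "error"]
rate = len(errors) / WINDOW_SECONDS  # errors per second over the window
print(f"{len(errors)} errors -> {rate:.4f} errors/s")
```

Loki does the same work at query time, streaming over the stored log chunks instead of an in-memory list.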
Alerting
# === Alerting Configuration ===
# PrometheusRule — Alert Definitions
# apiVersion: monitoring.coreos.com/v1
# kind: PrometheusRule
# metadata:
#   name: my-app-alerts
#   labels:
#     release: monitoring
# spec:
#   groups:
#     - name: my-app
#       rules:
#         - alert: HighErrorRate
#           expr: |
#             sum(rate(http_requests_total{status=~"5.."}[5m]))
#               / sum(rate(http_requests_total[5m])) > 0.05
#           for: 5m
#           labels:
#             severity: critical
#           annotations:
#             summary: "High error rate on {{ $labels.instance }}"
#             description: "Error rate is {{ $value | humanizePercentage }}"
#
#         - alert: HighLatency
#           expr: histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 0.5
#           for: 5m
#           labels:
#             severity: warning
#
#         - alert: PodCrashLooping
#           expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
#           for: 5m
#           labels:
#             severity: critical
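The HighLatency rule leans on `histogram_quantile`. A sketch of the linear interpolation it performs over cumulative `le` buckets (simplified: Prometheus treats the `+Inf` bucket specially and this version assumes non-empty buckets):

```python
def histogram_quantile(q: float, buckets: list) -> float:
    """buckets: sorted (upper_bound_seconds, cumulative_count) pairs."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            # Linear interpolation within the bucket, as Prometheus does
            return prev_bound + (bound - prev_bound) * (
                (rank - prev_count) / (count - prev_count))
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# Cumulative request-duration buckets: 990 of 1000 requests took <= 0.5s
buckets = [(0.1, 800), (0.25, 950), (0.5, 990), (1.0, 1000)]
print(f"p99 = {histogram_quantile(0.99, buckets):.3f}s")  # right at the 0.5s threshold
```

The interpolation is why bucket boundaries matter: a p99 between two coarse buckets is only an estimate, so choose bucket edges close to your SLO thresholds.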
# Alertmanager — Slack Notification
# apiVersion: monitoring.coreos.com/v1alpha1
# kind: AlertmanagerConfig
# metadata:
#   name: my-app-alertmanager
# spec:
#   route:
#     receiver: slack
#     groupBy: [alertname, namespace]
#     groupWait: 30s
#     groupInterval: 5m
#     repeatInterval: 4h
#   receivers:
#     - name: slack
#       slackConfigs:
#         - channel: '#alerts'
#           apiURL:   # webhook URL comes from a Secret, not inline YAML
#             name: slack-webhook-secret   # hypothetical Secret name
#             key: url
#           title: '[{{ .Status }}] {{ .CommonLabels.alertname }}'
#           text: '{{ .CommonAnnotations.description }}'
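`groupBy: [alertname, namespace]` means Alertmanager sends one notification per unique label pair, not one per alert. A sketch of that grouping:

```python
from collections import defaultdict

GROUP_BY = ("alertname", "namespace")  # mirrors groupBy above

firing_alerts = [
    {"alertname": "HighErrorRate", "namespace": "prod", "pod": "api-1"},
    {"alertname": "HighErrorRate", "namespace": "prod", "pod": "api-2"},
    {"alertname": "HighLatency",   "namespace": "prod", "pod": "api-1"},
]

groups = defaultdict(list)
for alert in firing_alerts:
    groups[tuple(alert[label] for label in GROUP_BY)].append(alert)

for key, members in groups.items():
    print(f"{key}: one Slack message covering {len(members)} alert(s)")
```

`groupWait` then delays the first message of each group so late-arriving alerts with the same key ride along instead of triggering separate pings.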
@dataclass
class AlertRule:
    name: str
    condition: str
    severity: str
    action: str
    notification: str

alerts = [
    AlertRule("HighErrorRate", "5xx > 5% for 5min", "Critical", "Investigate immediately", "Slack + PagerDuty"),
    AlertRule("HighLatency", "p99 > 500ms for 5min", "Warning", "Check slow endpoints", "Slack"),
    AlertRule("PodCrashLooping", "Restart > 0 in 15min", "Critical", "Check pod logs", "Slack + PagerDuty"),
    AlertRule("HighCPU", "CPU > 80% for 10min", "Warning", "Scale up or optimize", "Slack"),
    AlertRule("HighMemory", "Memory > 85% for 5min", "Warning", "Check memory leaks", "Slack"),
    AlertRule("DiskFull", "Disk > 90%", "Critical", "Clean up or expand", "Slack + PagerDuty"),
]

print("Alert Rules:")
for a in alerts:
    print(f" [{a.severity}] {a.name}")
    print(f" Condition: {a.condition}")
    print(f" Action: {a.action} | Notify: {a.notification}")
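The "for 5min" in each condition means the expression must hold continuously before the alert fires. A sketch of that pending/firing state machine (timestamps in seconds; `alert_state` is a hypothetical helper, not a Prometheus API):

```python
def alert_state(true_since, now, for_seconds=300):
    """true_since: when the expr first became true (None = currently false)."""
    if true_since is None:
        return "inactive"
    return "firing" if now - true_since >= for_seconds else "pending"

print(alert_state(None, 1000))  # inactive
print(alert_state(900, 1000))   # pending: true for only 100s of the 300s window
print(alert_state(600, 1000))   # firing: continuously true for 400s
```

If the expression flips back to false at any point, the pending timer resets — that is what keeps short blips from paging anyone.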
Tips
- Metrics: collect RED metrics (Rate, Errors, Duration) for every service
- Structured logs: use a JSON log format so logs are easy to search
- Alerts: alert only on conditions that require action — too many alerts breed alert fatigue
- Dashboards: build a dashboard per service so you can see the big picture
- Dev parity: use the same monitoring stack in dev and production
What is Skaffold Dev Monitoring?
Watching the health of your application while it runs under `skaffold dev`: pod status, logs, and port-forwarded access to Prometheus, Grafana, Loki, and Alertmanager, so bugs and performance problems surface on metrics dashboards as you develop.
How do you set up Prometheus and Grafana?
Install the kube-prometheus-stack Helm chart, add a ServiceMonitor for your app, and build dashboards for CPU, memory, request rate, error rate, and latency. Define alert rules (e.g. CPU > 80%, error rate > 5%) and route them through Alertmanager to Slack or email.
How does log aggregation work?
Promtail ships each pod's stdout to Loki, and Grafana queries it with LogQL using labels such as app and namespace. Use structured JSON logging with a log level and a correlation ID so a single request can be traced across services.
How do you configure health checks?
A liveness probe restarts the container when it fails, a readiness probe gates traffic, and a startup probe covers slow boots. Typically each is an httpGet against /health or /ready with initialDelaySeconds and periodSeconds, where the readiness handler also checks dependencies such as the database and Redis.
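The readiness handler described above can be sketched as follows — `check_database` and `check_redis` are hypothetical stand-ins for real dependency pings:

```python
def check_database() -> bool:
    return True  # stand-in: e.g. run `SELECT 1` against the real DB

def check_redis() -> bool:
    return True  # stand-in: e.g. send PING to the real Redis

def readiness():
    """Status code + body the /ready endpoint would serve."""
    checks = {"database": check_database(), "redis": check_redis()}
    ok = all(checks.values())
    return 200 if ok else 503, {"status": "ready" if ok else "not ready",
                                "checks": checks}

status, body = readiness()
print(status, body)
```

Returning 503 while any dependency is down makes the readiness probe fail, so Kubernetes stops routing traffic to the pod without restarting it — that is the key difference from a liveness probe.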
Summary
Skaffold dev monitoring combines Prometheus metrics (PromQL), Loki log aggregation (LogQL), Grafana dashboards, health checks, and Alertmanager alerting — giving the dev loop the same observability you rely on in production.
