
Skaffold Dev Monitoring and Alerting: Monitoring the Dev Environment


This article covers monitoring and alerting for a Skaffold dev environment: Prometheus and Grafana for metrics and dashboards, Loki for log aggregation, health checks for observability, and alerts on pod status and resource usage.

| Tool | Purpose | Data Type | Query Language | Best for |
|---|---|---|---|---|
| Prometheus | Metrics collection | Time series | PromQL | Metrics |
| Grafana | Visualization | Dashboard | Multiple | Dashboard |
| Loki | Log aggregation | Logs | LogQL | Logs |
| Jaeger | Distributed tracing | Traces | Search | Tracing |
| Alertmanager | Alert routing | Alerts | Rules | Notification |

Prometheus Setup

=== Prometheus + Grafana for Skaffold Dev ===

# Install with Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set grafana.adminPassword=admin123

# Skaffold config: include monitoring
apiVersion: skaffold/v4beta7
kind: Config
metadata:
  name: my-app
build:
  artifacts:
    - image: my-app
      docker:
        dockerfile: Dockerfile
# In v4beta schemas, raw manifests are listed at the top level
manifests:
  rawYaml:
    - k8s/*.yaml
deploy:
  kubectl: {}
portForward:
  - resourceType: service
    resourceName: my-app
    port: 8080
    localPort: 8080
  - resourceType: service
    resourceName: monitoring-grafana
    namespace: monitoring
    port: 80
    localPort: 3000
  - resourceType: service
    resourceName: monitoring-kube-prometheus-prometheus
    namespace: monitoring
    port: 9090
    localPort: 9090

# Application metrics: Python Flask
import time

from flask import Flask, Response, g, request
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest

app = Flask(__name__)

REQUEST_COUNT = Counter('http_requests_total',
                        'Total HTTP requests', ['method', 'endpoint', 'status'])
REQUEST_LATENCY = Histogram('http_request_duration_seconds',
                            'HTTP request latency', ['method', 'endpoint'])

@app.before_request
def before_request():
    # Record the start time on Flask's per-request context
    g.start_time = time.time()

@app.after_request
def after_request(response):
    latency = time.time() - g.start_time
    # Note: request.path as a label can create many series if paths contain IDs
    REQUEST_COUNT.labels(request.method, request.path, response.status_code).inc()
    REQUEST_LATENCY.labels(request.method, request.path).observe(latency)
    return response

@app.route('/metrics')
def metrics():
    # Expose all registered metrics in the Prometheus text format
    return Response(generate_latest(), content_type=CONTENT_TYPE_LATEST)

# ServiceMonitor: Prometheus scrape config
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  labels:
    release: monitoring   # matches the Helm release name used above
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: http          # name of the Service port that serves /metrics
      path: /metrics
      interval: 15s

from dataclasses import dataclass

@dataclass
class MetricConfig:
    metric: str
    type_val: str
    labels: str
    alert_threshold: str
    dashboard: str

metrics = [
    MetricConfig("http_requests_total", "Counter", "method endpoint status", "> 100 errors/min", "Request Rate"),
    MetricConfig("http_request_duration_seconds", "Histogram", "method endpoint", "p99 > 500ms", "Latency"),
    MetricConfig("container_cpu_usage_seconds_total", "Counter", "pod namespace", "> 80% limit", "CPU Usage"),
    MetricConfig("container_memory_working_set_bytes", "Gauge", "pod namespace", "> 80% limit", "Memory"),
    MetricConfig("kube_pod_status_phase", "Gauge", "pod phase", "phase != Running", "Pod Status"),
]

print("=== Monitoring Metrics ===")
for m in metrics:
    print(f" [{m.metric}] Type: {m.type_val}")
    print(f" Labels: {m.labels}")
    print(f" Alert: {m.alert_threshold} | Dashboard: {m.dashboard}")
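The "p99 > 500ms" threshold listed for http_request_duration_seconds is not stored directly; it is estimated from the histogram's cumulative buckets. The sketch below shows the linear-interpolation idea behind PromQL's histogram_quantile. It is a simplified illustration, not Prometheus's exact implementation, and the bucket bounds and counts are made-up sample data.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    buckets = sorted(buckets)
    total = buckets[-1][1]              # the +Inf bucket holds the total count
    if total == 0:
        return math.nan
    rank = q * total                    # how many observations lie at or below the quantile
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                return prev_bound       # cannot interpolate; fall back to the lower bound
            # Assume observations are spread evenly inside the bucket
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound

# Hypothetical cumulative buckets: (upper bound in seconds, observations <= bound)
sample_buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (math.inf, 100)]

p50 = histogram_quantile(0.5, sample_buckets)
p99 = histogram_quantile(0.99, sample_buckets)
print(f"p50={p50:.3f}s p99={p99:.3f}s alert={p99 > 0.5}")
```

Because the estimate depends on bucket boundaries, the accuracy of a p99 alert is limited by how finely the buckets are spaced around the threshold.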

Log Aggregation

=== Loki Log Aggregation ===


# Install Loki Stack
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install loki grafana/loki-stack \
  --namespace monitoring \
  --set promtail.enabled=true \
  --set grafana.enabled=false   # use existing Grafana

# Structured logging: Python
import json
import logging
from datetime import datetime, timezone

class JSONFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        log_entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
            "module": record.module,
            "function": record.funcName,
            "line": record.lineno,
        }
        # Optional field, present only when the caller attaches it to the record
        if hasattr(record, 'correlation_id'):
            log_entry['correlation_id'] = record.correlation_id
        if record.exc_info:
            log_entry['exception'] = self.formatException(record.exc_info)
        return json.dumps(log_entry)

handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger = logging.getLogger('my-app')
logger.setLevel(logging.INFO)
logger.addHandler(handler)
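The formatter only emits correlation_id when the log record carries that attribute, so the caller has to attach it. One way to do that with the standard library is the `extra` argument of the logging calls. The sketch below uses a compact stand-in for the formatter above, and the logger name `my-app-demo` and the ID `req-123` are illustrative assumptions.

```python
import io
import json
import logging

class JSONFormatter(logging.Formatter):
    """Compact stand-in for the formatter above (only the fields used here)."""

    def format(self, record):
        entry = {"level": record.levelname, "message": record.getMessage()}
        if hasattr(record, "correlation_id"):
            entry["correlation_id"] = record.correlation_id
        return json.dumps(entry)

def build_logger(stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("my-app-demo")   # hypothetical logger name
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JSONFormatter())
    logger.addHandler(handler)
    return logger

stream = io.StringIO()
logger = build_logger(stream)

# `extra` sets attributes on the LogRecord, which the formatter then picks up
logger.info("order created", extra={"correlation_id": "req-123"})

line = stream.getvalue().strip()
print(line)
```

In a real service the stream would be stdout, which is where Promtail collects pod logs from.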

LogQL Queries in Grafana

{app="my-app"} | json | level="error"

{app="my-app"} | json | latency_ms > 500

{namespace="production"} |= "Exception"

rate({app="my-app"} | json | level="error" [5m])

from dataclasses import dataclass

@dataclass
class LogQuery:
    name: str
    logql: str
    purpose: str
    alert: bool

queries = [
    LogQuery("Error Logs", '{app="my-app"} | json | level="error"', "View all errors", True),
    LogQuery("Slow Requests", '{app="my-app"} | json | latency_ms > 500', "Find slow requests", True),
    LogQuery("Exception Trace", '{app="my-app"} |= "Exception"', "View stack traces", False),
    LogQuery("Error Rate", 'rate({app="my-app"} | json | level="error" [5m])', "Error rate", True),
    LogQuery("Request Volume", 'rate({app="my-app"} [1m])', "Request volume", False),
]

print("\n=== LogQL Queries ===")
for q in queries:
    alert_tag = "[ALERT]" if q.alert else ""
    print(f" [{q.name}] {alert_tag}")
    print(f" Query: {q.logql}")
    print(f" Purpose: {q.purpose}")

Alerting

=== Alerting Configuration ===

# PrometheusRule: alert definitions
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  labels:
    release: monitoring
spec:
  groups:
    - name: my-app
      rules:
        - alert: HighErrorRate
          # Share of 5xx responses among all requests, per instance
          expr: |
            sum by (instance) (rate(http_requests_total{status=~"5.."}[5m]))
              / sum by (instance) (rate(http_requests_total[5m])) > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High error rate on {{ $labels.instance }}"
            description: "Error rate is {{ $value | humanizePercentage }}"
        - alert: HighLatency
          # Aggregate buckets by le before estimating the quantile
          expr: histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 0.5
          for: 5m
          labels:
            severity: warning
        - alert: PodCrashLooping
          expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
          for: 5m
          labels:
            severity: critical
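The HighErrorRate alert is described as a percentage of requests (5% over 5 minutes), which means dividing the rate of 5xx responses by the rate of all responses. The sketch below works through that arithmetic on two counter samples per status code. The counter values are made-up, and the helper names are hypothetical rather than part of any Prometheus API.

```python
def per_second_rate(first, last, window_seconds):
    """Average per-second increase of a counter between two samples."""
    return (last - first) / window_seconds

def error_ratio(samples, window_seconds=300):
    """Fraction of requests that returned 5xx within the window.

    `samples` maps a status code to (counter at window start, counter at window end).
    """
    total = sum(per_second_rate(a, b, window_seconds) for a, b in samples.values())
    errors = sum(
        per_second_rate(a, b, window_seconds)
        for status, (a, b) in samples.items()
        if status.startswith("5")
    )
    return errors / total if total else 0.0

# Hypothetical http_requests_total values at the start and end of a 5-minute window
samples = {"200": (1000, 1900), "404": (50, 90), "500": (10, 70)}

ratio = error_ratio(samples)
print(f"error ratio: {ratio:.1%}, fires: {ratio > 0.05}")
```

A raw `rate(...) > 0.05` on the 5xx series alone compares requests per second, not a percentage, so the threshold would mean something different on a busy service than on an idle one.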

# Alertmanager: Slack notification
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: my-app-slack   # hypothetical name
spec:
  route:
    receiver: slack
    groupBy: [alertname, namespace]
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 4h
  receivers:
    - name: slack
      slackConfigs:
        - channel: '#alerts'
          # apiURL references a Secret key that holds the webhook URL
          # ('https://hooks.slack.com/services/...'); the Secret name and key are assumptions
          apiURL:
            name: slack-webhook
            key: url
          title: '[{{ .Status }}] {{ .CommonLabels.alertname }}'
          text: '{{ .CommonAnnotations.description }}'
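The groupBy setting in the route decides which alerts are bundled into one notification: alerts that share the same alertname and namespace values end up in the same group. The sketch below illustrates that grouping on a few made-up alerts. It models only the grouping key, not Alertmanager's timing (groupWait, groupInterval, repeatInterval).

```python
from collections import defaultdict

def group_alerts(alerts, group_by=("alertname", "namespace")):
    """Bucket alerts by the values of the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        # A missing label is treated as an empty value
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)

# Hypothetical firing alerts
alerts = [
    {"labels": {"alertname": "HighLatency", "namespace": "dev", "pod": "my-app-1"}},
    {"labels": {"alertname": "HighLatency", "namespace": "dev", "pod": "my-app-2"}},
    {"labels": {"alertname": "PodCrashLooping", "namespace": "dev", "pod": "my-app-2"}},
]

groups = group_alerts(alerts)
for key, members in groups.items():
    print(key, len(members))
```

Here the two HighLatency alerts would arrive as a single Slack message, while PodCrashLooping would be sent separately.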

from dataclasses import dataclass

@dataclass
class AlertRule:
    name: str
    condition: str
    severity: str
    action: str
    notification: str

alerts = [
    AlertRule("HighErrorRate", "5xx > 5% for 5min", "Critical", "Investigate immediately", "Slack + PagerDuty"),
    AlertRule("HighLatency", "p99 > 500ms for 5min", "Warning", "Check slow endpoints", "Slack"),
    AlertRule("PodCrashLooping", "Restart > 0 in 15min", "Critical", "Check pod logs", "Slack + PagerDuty"),
    AlertRule("HighCPU", "CPU > 80% for 10min", "Warning", "Scale up or optimize", "Slack"),
    AlertRule("HighMemory", "Memory > 85% for 5min", "Warning", "Check memory leaks", "Slack"),
    AlertRule("DiskFull", "Disk > 90%", "Critical", "Clean up or expand", "Slack + PagerDuty"),
]

print("Alert Rules:")
for a in alerts:
    print(f" [{a.severity}] {a.name}")
    print(f" Condition: {a.condition}")
    print(f" Action: {a.action} | Notify: {a.notification}")

Tips

  • Metrics: Collect RED metrics (Rate, Errors, Duration) for every service.
  • Structured Log: Use a JSON log format so logs are easy to search.
  • Alert: Alert only on conditions that require action; too many alerts and people start ignoring them.
  • Dashboard: Build a dashboard for every service to get an overview.
  • Dev Parity: Use the same monitoring setup in dev and production.

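What is Skaffold Dev Monitoring?

It means checking the state of an application running under Skaffold dev: pod status, logs, and port forwarding, combined with Prometheus, Grafana, Loki, and Alertmanager for metrics and dashboards, so that bugs and performance problems show up during development.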

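How do you set up Prometheus and Grafana?

Install kube-prometheus-stack with Helm, add a ServiceMonitor for the application, and build dashboards for CPU, memory, request rate, errors, and latency. Then define alert rules (for example CPU above 80% or an error rate above 5%) and route them through Alertmanager to Slack or email.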

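How do you set up log aggregation?

Run Loki with Promtail, which collects pod stdout, and query the logs in Grafana with LogQL using labels such as app and namespace. Emit structured JSON logs with a log level and a correlation ID so that a single request can be followed.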

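How do you configure health checks?

Use a liveness probe so Kubernetes restarts a stuck container, a readiness probe to control whether the pod receives traffic, and a startup probe for slow-starting applications. The probes typically use httpGet against endpoints such as /health and /ready, tuned with initialDelaySeconds and periodSeconds. The readiness endpoint can also check dependencies such as the database and Redis.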
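As a rough sketch of what could sit behind /health and /ready, the functions below return an HTTP status code and a JSON-ready body. Liveness only reports that the process is up, while readiness runs a set of dependency checks. The check names and the failing Redis check are hypothetical, and wiring these functions into Flask routes is left out to keep the example dependency-free.

```python
def check_dependencies(checks):
    """Run each named check; a check is a zero-argument callable returning True when healthy."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises counts as unhealthy instead of crashing the probe
            results[name] = False
    return results

def liveness():
    """Body for /health: the process is running and can answer requests."""
    return 200, {"alive": True}

def readiness(checks):
    """Body for /ready: 200 only when every dependency check passes, else 503."""
    results = check_dependencies(checks)
    status = 200 if all(results.values()) else 503
    return status, {"ready": status == 200, "checks": results}

def failing_redis():
    """Hypothetical check that simulates an unreachable Redis."""
    raise ConnectionError("redis unreachable")

checks = {"database": lambda: True, "redis": failing_redis}
print(liveness())
print(readiness(checks))
```

Keeping dependency checks out of the liveness endpoint avoids restarting a healthy container just because the database is briefly unavailable.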

Summary

Monitoring and alerting for a Skaffold dev environment combines Prometheus and Grafana for metrics and dashboards, Loki for log aggregation, health checks, and alerts queried with PromQL and LogQL, mirroring the setup used in production.

XM Legend · Forex trader and instructor for 13 years

Founder of SiamCafe since 1997 · Forex trader for more than 13 years, recognized as an XM Legend · Shares knowledge of Forex, IT, AI, and trading from real experience in live markets