SigNoz Observability Edge Deployment

2026-04-08 · อ. บอม, SiamCafe.net · 9,814 words

SigNoz Edge Observability

This article covers deploying SigNoz observability at the edge: OpenTelemetry Collectors on edge nodes and IoT devices, ClickHouse as the central store, and metrics, traces, logs, and alerting for production use.

| Component | Location | Purpose | Resources |
|---|---|---|---|
| OTel Collector | Edge node | Collect, buffer, and export telemetry | 100 MB RAM, 1 vCPU |
| Application | Edge node | Business logic + OTel SDK | Depends on the app |
| SigNoz Server | Central cloud | Query, dashboards, alerts | 4 vCPU, 8 GB RAM |
| ClickHouse | Central cloud | Long-term storage | 8 vCPU, 32 GB RAM, SSD |
| Kafka (optional) | Central cloud | Buffer between Collector and SigNoz | 3 brokers, 4 GB each |

Edge OTel Collector Config

# === OpenTelemetry Collector for Edge ===

# otel-collector-edge.yaml
# extensions:
#   file_storage:                  # disk-backed storage used by the sending queue
#     directory: /var/otel/buffer
#     timeout: 10s
#
# receivers:
#   otlp:
#     protocols:
#       grpc: { endpoint: "0.0.0.0:4317" }
#       http: { endpoint: "0.0.0.0:4318" }
#
# processors:
#   batch:
#     send_batch_size: 1000
#     timeout: 30s
#   memory_limiter:
#     check_interval: 5s
#     limit_mib: 100
#   filter/edge:
#     traces:
#       span:                      # spans matching a condition are dropped
#         - 'attributes["http.status_code"] == 200 and kind == SPAN_KIND_CLIENT'
#   resource:
#     attributes:
#       - key: edge.location
#         value: "factory-bangkok-01"
#         action: upsert
#
# exporters:
#   otlp/signoz:
#     endpoint: "signoz-central.example.com:4317"
#     tls: { insecure: false }
#     retry_on_failure:
#       enabled: true
#       initial_interval: 5s
#       max_interval: 300s
#     sending_queue:
#       enabled: true
#       num_consumers: 2
#       queue_size: 5000
#       storage: file_storage      # refers to the extension declared above
#
# service:
#   extensions: [file_storage]     # file_storage is an extension, not an exporter
#   pipelines:
#     traces:
#       receivers: [otlp]
#       processors: [memory_limiter, filter/edge, resource, batch]
#       exporters: [otlp/signoz]

from dataclasses import dataclass

@dataclass
class CollectorConfig:
    component: str
    setting: str   # config key(s) and value(s)
    purpose: str   # what the setting does
    benefit: str   # why it matters on an edge node

configs = [
    CollectorConfig("Batch Processor",
        "batch.send_batch_size: 1000, timeout: 30s",
        "Group telemetry into batches to reduce network calls",
        "Cuts bandwidth by 60-80% compared with real-time export"),
    CollectorConfig("Memory Limiter",
        "memory_limiter.limit_mib: 100",
        "Cap the memory the Collector may use",
        "Prevents OOM on resource-constrained edge nodes"),
    CollectorConfig("Filter Processor",
        "filter: drop healthy spans",
        "Drop unneeded spans (200 OK client spans)",
        "Cuts volume by 30-50%, keeping error spans"),
    CollectorConfig("Resource Attribute",
        "resource.attributes: edge.location",
        "Tag telemetry with an edge location label",
        "Lets you filter queries by location in SigNoz"),
    CollectorConfig("Retry + Queue",
        "retry_on_failure + sending_queue + file_storage",
        "Buffer to disk while offline, retry once back online",
        "No telemetry is lost when the network drops"),
]

print("=== Edge Collector Config ===")
for c in configs:
    print(f"  [{c.component}]")
    print(f"    Setting: {c.setting}")
    print(f"    Purpose: {c.purpose}")
    print(f"    Benefit: {c.benefit}")
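The retry_on_failure settings in the config above (initial_interval: 5s, max_interval: 300s) describe an exponential backoff. The sketch below shows roughly how the wait between retries grows until it reaches the cap. The function name backoff_schedule is hypothetical, and the 1.5 multiplier with no random jitter is an assumption for illustration, not a value taken from the config.

```python
def backoff_schedule(initial_s: float, max_s: float,
                     multiplier: float, attempts: int) -> list[float]:
    """Return the wait (in seconds) before each retry, capped at max_s."""
    waits = []
    wait = initial_s
    for _ in range(attempts):
        waits.append(min(wait, max_s))  # never wait longer than the cap
        wait *= multiplier              # grow the next wait exponentially
    return waits


# initial_interval and max_interval come from the edge config above;
# the multiplier is an assumed value and jitter is ignored.
schedule = backoff_schedule(initial_s=5, max_s=300, multiplier=1.5, attempts=12)
for attempt, wait in enumerate(schedule, start=1):
    print(f"retry {attempt:2d}: wait {wait:.1f}s")
```

Under these assumptions the wait reaches the 300s cap after about a dozen failed attempts, so a long outage settles into one retry every five minutes per queued batch.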

SigNoz Dashboard

# === SigNoz Dashboard for Edge Monitoring ===
# The query strings are illustrative pseudo-queries, not literal SigNoz syntax.

from dataclasses import dataclass

@dataclass
class DashPanel:
    panel: str
    query: str
    viz: str
    alert: str

panels = [
    DashPanel("Edge Node Status Map",
        "count by (edge.location) where last_seen > now()-5m",
        "Map/table showing online/offline status per location",
        "Offline > 5m → P1 Alert"),
    DashPanel("Request Latency per Edge",
        "P99(duration) group by edge.location",
        "Heatmap of latency per location per hour",
        "P99 > 2s → P2 Warning"),
    DashPanel("Error Rate per Edge",
        "count(status=ERROR) / count(*) group by edge.location",
        "Bar chart of error % per location",
        "> 5% → P2 Warning, > 10% → P1"),
    DashPanel("Throughput per Edge",
        "rate(span_count) group by edge.location",
        "Time series of req/min per location",
        "< 1 req/min → Check Edge Health"),
    DashPanel("Collector Buffer Usage",
        "otelcol_exporter_queue_size / otelcol_exporter_queue_capacity",
        "Gauge of % buffer full per edge",
        "> 80% → P2 Network Issue"),
    DashPanel("Resource Usage per Edge",
        "system.cpu.utilization, system.memory.usage",
        "Multi-line CPU and RAM per edge node",
        "CPU > 90% or RAM > 85% → P2"),
]

print("=== SigNoz Dashboard Panels ===")
for p in panels:
    print(f"  [{p.panel}]")
    print(f"    Query: {p.query}")
    print(f"    Viz: {p.viz}")
    print(f"    Alert: {p.alert}")
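The "Error Rate per Edge" panel above pairs a ratio with two thresholds (> 5% is P2, > 10% is P1). A minimal sketch of that rule follows. The function error_rate_severity is a hypothetical helper for illustration, not part of SigNoz, and treating an empty window as "no alert" is an assumption.

```python
from typing import Optional


def error_rate_severity(error_spans: int, total_spans: int) -> Optional[str]:
    """Map an error ratio to the severities used in the panel list above."""
    if total_spans == 0:
        return None  # assumption: no traffic means nothing to evaluate here
    rate = error_spans / total_spans
    if rate > 0.10:
        return "P1"
    if rate > 0.05:
        return "P2"
    return None


# Hypothetical per-location counts: (error spans, total spans).
samples = {
    "factory-bangkok-01": (12, 100),
    "factory-bangkok-02": (7, 100),
    "factory-bangkok-03": (1, 100),
}
for location, (errors, total) in samples.items():
    print(location, error_rate_severity(errors, total))
```

Checking the stricter threshold first keeps the rule unambiguous: a location at 12% errors raises only P1, not both severities.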

Scaling & Production

# === Production Scaling ===

from dataclasses import dataclass

@dataclass
class ScaleTier:
    tier: str
    edge_nodes: str
    central_sizing: str
    storage: str
    features: str

tiers = [
    ScaleTier("Small",
        "1-10 Edge Nodes",
        "SigNoz: 2 vCPU 4GB | ClickHouse: 4 vCPU 16GB",
        "100GB SSD (30 days retention)",
        "Basic dashboard, email alerts"),
    ScaleTier("Medium",
        "10-50 Edge Nodes",
        "SigNoz: 4 vCPU 8GB | ClickHouse: 8 vCPU 32GB",
        "500GB SSD (60 days retention)",
        "Multi-location dashboards, sampling, PagerDuty"),
    ScaleTier("Large",
        "50-200 Edge Nodes",
        "SigNoz: 8 vCPU 16GB HA | ClickHouse Cluster 3 nodes",
        "2TB SSD (90 days retention)",
        "Kafka buffer, tail sampling, multi-team RBAC"),
    ScaleTier("Enterprise",
        "200+ Edge Nodes",
        "SigNoz HA + LB | ClickHouse Sharded Cluster",
        "10TB+ SSD (1 year retention)",
        "Multi-region, custom retention, compliance audit"),
]

print("=== Scaling Tiers ===")
for t in tiers:
    print(f"  [{t.tier}] Edge Nodes: {t.edge_nodes}")
    print(f"    Central: {t.central_sizing}")
    print(f"    Storage: {t.storage}")
    print(f"    Features: {t.features}")
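To turn the tier list above into a lookup, a small helper can map a node count to a tier name. This is a sketch only: pick_tier is a hypothetical function, and because the listed ranges share their boundary values (10, 50, 200), assigning each boundary to the smaller tier is an assumption.

```python
def pick_tier(edge_nodes: int) -> str:
    """Return the scaling tier name for a given number of edge nodes."""
    if edge_nodes < 1:
        raise ValueError("edge_nodes must be at least 1")
    if edge_nodes <= 10:
        return "Small"
    if edge_nodes <= 50:
        return "Medium"
    if edge_nodes <= 200:
        return "Large"
    return "Enterprise"


for count in (3, 10, 35, 120, 500):
    print(f"{count:4d} edge nodes -> {pick_tier(count)}")
```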

Tips

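What is SigNoz?

SigNoz is an open-source observability platform for metrics, traces, and logs. It is built on OpenTelemetry and ClickHouse, written in Go with a React frontend, and provides dashboards and alerts. It can be self-hosted as a free alternative to Datadog.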

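What is edge deployment?

Edge deployment means running workloads at the edge, close to users: IoT devices, CDN nodes, retail sites, and telecom sites. These environments can go offline, have limited resources, and sit on unstable networks, so telemetry has to be buffered and delivered store-and-forward.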
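Store-and-forward can be illustrated with a small bounded buffer: records pile up while the link is down and are delivered in order once it returns. This is an in-memory sketch under simplifying assumptions, not how the OTel Collector implements its queue (the Collector config in this article persists the queue to disk via file_storage). The class StoreAndForwardBuffer, the fake_send function, and the link flag are all hypothetical names.

```python
from collections import deque
from typing import Callable, Deque


class StoreAndForwardBuffer:
    """Hold records while offline; deliver them in order when sending works."""

    def __init__(self, capacity: int, send: Callable[[str], bool]) -> None:
        # deque(maxlen=...) silently drops the oldest record when full.
        self._queue: Deque[str] = deque(maxlen=capacity)
        self._send = send

    def publish(self, record: str) -> None:
        self._queue.append(record)
        self.flush()

    def flush(self) -> int:
        """Try to send queued records; stop at the first failure."""
        sent = 0
        while self._queue:
            if not self._send(self._queue[0]):
                break  # still offline: keep the record for a later attempt
            self._queue.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._queue)


link = {"online": False}      # simulated network state
delivered: list[str] = []     # what the "central" side received


def fake_send(record: str) -> bool:
    if link["online"]:
        delivered.append(record)
        return True
    return False


buf = StoreAndForwardBuffer(capacity=3, send=fake_send)
for i in range(5):
    buf.publish(f"span-{i}")  # offline: only the newest 3 records are kept

link["online"] = True
buf.flush()
print(delivered)
```

Dropping the oldest records when the buffer is full is one possible policy; on a constrained edge node the capacity is the trade-off between disk or memory use and how long an outage can be bridged without loss.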

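How should the architecture be designed?

Run an OTel Collector on each edge node to buffer, filter, and batch telemetry, then forward it to a central SigNoz instance backed by ClickHouse, optionally through Kafka. Retries, a sending queue with file storage, sampling, and compression keep delivery reliable and bandwidth low.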

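How should alerting be set up?

Alert on edge nodes going offline, latency, error rate, resource usage (CPU, RAM, network), and Collector buffer fill. Route alerts to Slack or PagerDuty by severity (P1, P2, P3), attach a runbook to each alert, and link to a per-location dashboard.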

Summary

A SigNoz edge deployment pairs OpenTelemetry Collectors that buffer telemetry on each edge node with a central ClickHouse store for metrics, traces, and logs, plus the alerts and dashboards needed to run it in production.
