Monte Carlo Observability Shift Left Security

Monte Carlo Shift Left

This article covers shift-left security and data quality with Monte Carlo data observability: freshness, volume, schema, lineage, and anomaly detection, alongside dbt and Great Expectations.


Monitor Type	What It Checks	Detection	Alert Condition
Freshness	Whether data arrives on time	ML (Auto-baseline)	Data late by > 2 standard deviations
Volume	Whether data volume is normal	ML (Auto-baseline)	Volume change > 30%
Schema	Whether the table structure has changed	Exact Match	Column added/removed/changed
Distribution	Whether values follow their normal distribution	ML (Statistical)	Distribution shift > threshold
Custom SQL	User-defined business rules	SQL Query Result	Result ≠ expected
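The freshness and volume rows above can be illustrated with a small sketch. It is not Monte Carlo's actual model, which the table only describes as an ML auto-baseline; it assumes a plain mean and standard deviation baseline over recent loads, and the names `is_late` and `volume_changed` as well as the sample delays are hypothetical.

```python
from statistics import mean, stdev

def is_late(history_minutes, latest_minutes, k=2.0):
    """Flag a load whose delay exceeds the baseline mean by more than k standard deviations."""
    baseline = mean(history_minutes)
    spread = stdev(history_minutes)
    return latest_minutes > baseline + k * spread

def volume_changed(previous_rows, current_rows, max_change=0.30):
    """Flag a row count that moved by more than max_change relative to the previous load."""
    if previous_rows == 0:
        # No baseline to divide by: treat any rows appearing as a change.
        return current_rows != 0
    return abs(current_rows - previous_rows) / previous_rows > max_change

# Hypothetical delays (minutes after the scheduled time) for recent loads.
delays = [10, 12, 9, 11, 10, 13, 12]

print(is_late(delays, 25))          # far outside the baseline
print(is_late(delays, 12))          # within the baseline
print(volume_changed(1000, 1400))   # 40% jump
print(volume_changed(1000, 1100))   # 10% change
```

The multiplier `k=2.0` and the `max_change=0.30` default mirror the thresholds in the table; a production monitor would learn them from history rather than hard-code them.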

Data Observability Architecture

# === Data Observability Pipeline ===

from dataclasses import dataclass

@dataclass
class ObservabilityLayer:
    layer: str
    tool: str
    checks: str
    when: str
    action_on_fail: str

layers = [
    ObservabilityLayer("Source (Shift Left)",
        "Great Expectations + Custom Validators",
        "Schema Validation, PII Detection, Null Check, Type Check",
        "Before ingestion (at the source)",
        "Reject Bad Data, Alert Data Owner"),
    ObservabilityLayer("Ingestion",
        "Airflow Sensors + Monte Carlo",
        "Freshness, Volume, File Format, Encoding",
        "During ingestion",
        "Retry, Dead Letter Queue, Alert"),
    ObservabilityLayer("Transformation",
        "dbt Tests + Monte Carlo",
        "unique, not_null, accepted_values, relationships, custom SQL",
        "After dbt run (CI/CD)",
        "Block Deploy, Alert Engineer"),
    ObservabilityLayer("Warehouse",
        "Monte Carlo (Auto ML Monitors)",
        "Freshness, Volume, Schema, Distribution, Lineage",
        "Continuous (24/7 monitoring)",
        "Alert SOC, Impact Analysis, Incident"),
    ObservabilityLayer("Consumption",
        "Monte Carlo + Dashboard Monitoring",
        "Dashboard Load Time, Query Performance, User Reports",
        "When a user reports a problem",
        "Root Cause Analysis via Lineage"),
]

print("=== Observability Layers ===")
for entry in layers:
    print(f"\n  [{entry.layer}] Tool: {entry.tool}")
    print(f"    Checks: {entry.checks}")
    print(f"    When: {entry.when}")
    print(f"    On Fail: {entry.action_on_fail}")
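As a rough illustration of the Source (Shift Left) layer listed above, the sketch below validates records before ingestion and separates accepted from rejected ones. The `EXPECTED_SCHEMA` mapping, the `validate_record` and `split_records` helpers, and the sample order fields are all hypothetical; a real pipeline would more likely express these rules in Great Expectations or a dedicated validator.

```python
# Hypothetical expected schema: column name -> required Python type.
EXPECTED_SCHEMA = {"order_id": int, "status": str, "total_amount": float}

def validate_record(record):
    """Return a list of problems found in one record (an empty list means valid)."""
    problems = []
    for column, expected_type in EXPECTED_SCHEMA.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif record[column] is None:
            problems.append(f"null value: {column}")
        elif not isinstance(record[column], expected_type):
            problems.append(f"wrong type: {column}")
    # Columns outside the schema are reported too, since they signal schema drift.
    for column in record:
        if column not in EXPECTED_SCHEMA:
            problems.append(f"unexpected column: {column}")
    return problems

def split_records(records):
    """Separate valid records from rejected ones, keeping the reasons for rejection."""
    accepted, rejected = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected

records = [
    {"order_id": 1, "status": "pending", "total_amount": 19.5},
    {"order_id": 2, "status": None, "total_amount": 5.0},
    {"order_id": "3", "status": "shipped", "total_amount": 7.0, "email": "a@example.com"},
]

accepted, rejected = split_records(records)
print(f"accepted={len(accepted)} rejected={len(rejected)}")
for record, problems in rejected:
    print(record["order_id"], problems)
```

Keeping the rejection reasons next to each record makes it straightforward to alert the data owner with something actionable, which is the "Reject Bad Data, Alert Data Owner" step in the layer definition.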

Shift Left Implementation

# === Shift Left Data Quality in CI/CD ===

# dbt test examples (schema.yml)
# models:
#   - name: orders
#     columns:
#       - name: order_id
#         tests:
#           - unique
#           - not_null
#       - name: status
#         tests:
#           - accepted_values:
#               values: ['pending', 'shipped', 'delivered', 'cancelled']
#       - name: total_amount
#         tests:
#           - not_null
#           - dbt_utils.expression_is_true:
#               expression: ">= 0"

# Great Expectations example (pre-1.0 fluent API; `context.sources` was renamed in GX 1.0)
# import great_expectations as gx
# context = gx.get_context()
# validator = context.sources.pandas_default.read_csv("orders.csv")
# validator.expect_column_values_to_not_be_null("order_id")
# validator.expect_column_values_to_be_between("total_amount", min_value=0)
# validator.expect_column_values_to_be_in_set("status",
#     ["pending", "shipped", "delivered", "cancelled"])
# result = validator.validate()

from dataclasses import dataclass  # needed when this snippet runs on its own

@dataclass
class ShiftLeftCheck:
    check: str
    tool: str
    stage: str
    example: str
    impact: str

checks = [
    ShiftLeftCheck("Schema Validation",
        "Monte Carlo / dbt / Custom",
        "PR Review (CI)",
        "Check schema changes before merging a dbt model",
        "Prevents breaking changes from reaching production"),
    ShiftLeftCheck("Data Contract",
        "dbt Tests / Great Expectations",
        "Ingestion (Pre-load)",
        "Check types, nulls, and ranges before loading into the warehouse",
        "Prevents bad data from entering the warehouse"),
    ShiftLeftCheck("PII Detection",
        "Monte Carlo / Custom Regex",
        "Pre-Ingestion + Post-Transform",
        "Find emails, phone numbers, and ID numbers in columns that should not contain them",
        "Prevents data leaks and compliance violations"),
    ShiftLeftCheck("Business Rule",
        "dbt Tests / Custom SQL Monitor",
        "Post-Transform (dbt run)",
        "Revenue >= 0, User Count > Previous Day × 0.9",
        "Prevents wrong data from reaching dashboards and reports"),
    ShiftLeftCheck("Lineage Impact",
        "Monte Carlo Lineage",
        "Pre-Deploy (CI)",
        "See which dashboards and reports a model change affects",
        "Know the impact before deploying, with fewer surprises"),
]

print("=== Shift Left Checks ===")
for c in checks:
    print(f"\n  [{c.check}] Tool: {c.tool}")
    print(f"    Stage: {c.stage}")
    print(f"    Example: {c.example}")
    print(f"    Impact: {c.impact}")
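The PII Detection check above mentions a custom regex approach. A minimal sketch of that idea follows, assuming rows arrive as dictionaries of strings. The patterns are deliberately simple and hypothetical, as are the `find_pii_columns` helper and the `allowed_columns` parameter; real PII detection needs much broader rules and locale-aware formats.

```python
import re

# Deliberately simple, hypothetical patterns for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def find_pii_columns(rows, allowed_columns=()):
    """Return {column: sorted PII kinds} for columns that hold PII but are not allowed to."""
    findings = {}
    for row in rows:
        for column, value in row.items():
            # Skip columns that are expected to hold PII, and non-string values.
            if column in allowed_columns or not isinstance(value, str):
                continue
            for kind, pattern in PII_PATTERNS.items():
                if pattern.search(value):
                    findings.setdefault(column, set()).add(kind)
    return {column: sorted(kinds) for column, kinds in findings.items()}

rows = [
    {"order_id": "A-1", "note": "contact jane@example.com",
     "customer_email": "jane@example.com"},
    {"order_id": "A-2", "note": "call +66 81 234 5678",
     "customer_email": "bob@example.com"},
]

# customer_email is expected to contain emails, so only the free-text column is flagged.
print(find_pii_columns(rows, allowed_columns={"customer_email"}))
```

Running the same function both before ingestion and after transformation matches the "Pre-Ingestion + Post-Transform" stage given for this check.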

Incident Management

# === Data Incident Response ===

from dataclasses import dataclass  # needed when this snippet runs on its own

@dataclass
class IncidentStep:
    step: str
    action: str
    tool: str
    time: str

incident_flow = [
    IncidentStep("1. Detection",
        "Monte Carlo ML detects the anomaly automatically",
        "Monte Carlo Auto-monitors",
        "0-5 minutes (automatic)"),
    IncidentStep("2. Alert",
        "Send alerts to Slack and PagerDuty, and create an incident",
        "Monte Carlo → Slack/PagerDuty/Jira",
        "Immediately"),
    IncidentStep("3. Triage",
        "Review impact analysis via lineage to see what is affected",
        "Monte Carlo Lineage + Impact Dashboard",
        "5-10 minutes"),
    IncidentStep("4. Root Cause",
        "Use lineage to trace back to the source that has the problem",
        "Monte Carlo Lineage + Query Logs",
        "10-30 minutes"),
    IncidentStep("5. Fix",
        "Fix at the source, fix the dbt model, re-run the pipeline",
        "dbt + Airflow + Git",
        "30-60 minutes"),
    IncidentStep("6. Verify",
        "Confirm the data is back to normal and keep monitoring",
        "Monte Carlo + dbt Tests",
        "Immediately after the fix"),
]

print("=== Incident Response ===")
for s in incident_flow:
    print(f"  [{s.step}] {s.action}")
    print(f"    Tool: {s.tool}")
    print(f"    Time: {s.time}")
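The Triage and Root Cause steps above both rely on lineage: one walks downstream to find what is affected, the other walks upstream to find where the problem started. The sketch below shows that traversal on a small in-memory graph. The `LINEAGE` mapping, the asset names, and the `reachable` and `invert` helpers are hypothetical; Monte Carlo builds and queries its lineage internally, and this does not reproduce its API.

```python
from collections import deque

# Hypothetical lineage: each key feeds the assets in its list (upstream -> downstream).
LINEAGE = {
    "raw.orders": ["stg.orders"],
    "raw.customers": ["stg.customers"],
    "stg.orders": ["mart.revenue"],
    "stg.customers": ["mart.revenue", "mart.customer_ltv"],
    "mart.revenue": ["dashboard.sales"],
    "mart.customer_ltv": [],
    "dashboard.sales": [],
}

def reachable(graph, start):
    """Breadth-first search returning every node reachable from start, in visit order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order

def invert(graph):
    """Flip edge direction so the same search walks upstream instead of downstream."""
    inverted = {node: [] for node in graph}
    for parent, children in graph.items():
        for child in children:
            inverted.setdefault(child, []).append(parent)
    return inverted

# Triage: what does an anomaly in stg.orders affect?
print(reachable(LINEAGE, "stg.orders"))

# Root cause: which upstream assets could explain a broken dashboard?
print(reachable(invert(LINEAGE), "dashboard.sales"))
```

Breadth-first order lists the nearest assets first, which suits root cause analysis because the closest upstream tables are usually the first ones worth checking.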