BigQuery Scheduled Query Observability Stack

2025-11-12 · อ. บอม — SiamCafe.net · 11,064 words


| Component   | Tool                    | Data Source          | Use Case               |
|-------------|-------------------------|----------------------|------------------------|
| Metrics     | Cloud Monitoring        | BigQuery Metrics API | Duration, Bytes, Slots |
| Logs        | Cloud Logging           | Audit Logs, Job Logs | Error, Status, Details |
| Job History | INFORMATION_SCHEMA      | BigQuery Internal    | Performance Analysis   |
| Alerts      | Cloud Monitoring        | Metrics + Logs       | Fail, Slow, Cost       |
| Dashboard   | Looker Studio / Grafana | All above            | Overview, Trend        |

Scheduled Query Setup

# === BigQuery Scheduled Query with Terraform ===

# resource "google_bigquery_data_transfer_config" "daily_etl" {
#   display_name           = "daily_orders_etl"
#   data_source_id         = "scheduled_query"
#   schedule               = "every day 02:00"
#   destination_dataset_id = "analytics"
#   location               = "asia-southeast1"
#
#   params = {
#     destination_table_name_template = "orders_daily_{run_date}"
#     write_disposition               = "WRITE_TRUNCATE"
#     query                           = <<-SQL
#       SELECT
#         DATE(created_at) as order_date,
#         COUNT(*) as order_count,
#         SUM(total_amount) as total_revenue,
#         COUNT(DISTINCT customer_id) as unique_customers
#       FROM `project.raw.orders`
#       WHERE DATE(created_at) = @run_date
#       GROUP BY 1
#     SQL
#   }
#
#   email_preferences {
#     enable_failure_email = true
#   }
# }

# bq CLI
# bq mk --transfer_config \
#   --target_dataset=analytics \
#   --display_name='daily_orders_etl' \
#   --data_source=scheduled_query \
#   --schedule='every day 02:00' \
#   --params='{
#     "query": "SELECT ... FROM ... WHERE DATE(created_at) = @run_date",
#     "destination_table_name_template": "orders_daily_{run_date}",
#     "write_disposition": "WRITE_TRUNCATE"
#   }'

from dataclasses import dataclass

@dataclass
class ScheduledQuery:
    name: str
    schedule: str
    purpose: str
    destination: str
    alert: str

queries = [
    ScheduledQuery("daily_orders_etl",
        "every day 02:00",
        "Aggregate daily orders, revenue, and customers",
        "analytics.orders_daily_{run_date}",
        "Fail → Slack #data-alerts"),
    ScheduledQuery("hourly_metrics_agg",
        "every 1 hours",
        "Aggregate application metrics every hour",
        "analytics.app_metrics_hourly",
        "Fail + Duration > 10min → PagerDuty"),
    ScheduledQuery("data_quality_check",
        "every day 03:00 (after ETL)",
        "Check row count, NULLs, and freshness",
        "monitoring.dq_results",
        "Quality Fail → Slack + Jira Ticket"),
    ScheduledQuery("weekly_report",
        "every monday 06:00",
        "Generate weekly summary report",
        "reports.weekly_summary",
        "Fail → Email to Data Team"),
]

print("=== Scheduled Queries ===")
for q in queries:
    print(f"  [{q.name}] Schedule: {q.schedule}")
    print(f"    Purpose: {q.purpose}")
    print(f"    Destination: {q.destination}")
    print(f"    Alert: {q.alert}")

Monitoring & Alerting

# === Observability Setup ===

# INFORMATION_SCHEMA query for job monitoring
# SELECT
#   job_id,
#   creation_time,
#   end_time,
#   TIMESTAMP_DIFF(end_time, start_time, SECOND) as duration_sec,
#   total_bytes_processed,
#   total_slot_ms,
#   state,
#   error_result.reason as error_reason,
#   error_result.message as error_message
# FROM `region-asia-southeast1`.INFORMATION_SCHEMA.JOBS
# WHERE job_type = 'QUERY'
#   AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
# ORDER BY creation_time DESC
# LIMIT 100;
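The total_bytes_processed column from the query above maps directly to on-demand cost. A minimal sketch, assuming the $6.25 per TiB on-demand rate (verify against your region's current pricing):

```python
# Estimate on-demand query cost from bytes processed.
# Assumption: $6.25 per TiB on-demand rate (check current regional pricing).
TIB = 1024 ** 4
ON_DEMAND_USD_PER_TIB = 6.25

def estimate_cost_usd(total_bytes_processed: int) -> float:
    """Rough on-demand cost for one job, in USD."""
    return total_bytes_processed / TIB * ON_DEMAND_USD_PER_TIB

# Example: a job scanning 2 TiB
print(round(estimate_cost_usd(2 * TIB), 2))  # 12.5
```

Summing this over a day of INFORMATION_SCHEMA.JOBS rows gives the daily spend figure used by the budget alert further down.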

# Data Quality Check Query
# SELECT
#   'orders' as table_name,
#   COUNT(*) as row_count,
#   COUNTIF(customer_id IS NULL) as null_customer_id,
#   COUNTIF(total_amount IS NULL) as null_total_amount,
#   MAX(created_at) as latest_record,
#   TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(created_at), HOUR) as freshness_hours
# FROM `project.raw.orders`
# WHERE DATE(created_at) = CURRENT_DATE();
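The thresholds used later in the alert rules (empty table, null rate above 5%, freshness beyond 4 hours) can be applied to this query's output row with a small pure-Python check; the parameter names mirror the SQL aliases above:

```python
def dq_failures(row_count: int, null_customer_id: int, freshness_hours: float,
                max_null_rate: float = 0.05, max_freshness_hours: float = 4) -> list[str]:
    """Return the list of data-quality rules this result violates."""
    failures = []
    if row_count == 0:
        failures.append("row_count = 0")
    elif null_customer_id / row_count > max_null_rate:
        failures.append("null_rate > 5%")
    if freshness_hours > max_freshness_hours:
        failures.append("freshness > 4hr")
    return failures

# 8% null rate trips the null-rate rule only
print(dq_failures(row_count=10_000, null_customer_id=800, freshness_hours=2))
```

An empty failure list means the run can write a passing row to monitoring.dq_results; a non-empty one feeds the Data Quality Failure alert.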

@dataclass
class AlertRule:
    alert: str
    condition: str
    severity: str
    channel: str
    runbook: str

alerts = [
    AlertRule("Scheduled Query Failed",
        "transfer_config state = FAILED",
        "Critical",
        "Slack #data-alerts + PagerDuty",
        "Check error message → fix query → manual rerun"),
    AlertRule("Query Duration > SLA",
        "duration_sec > 2x average",
        "Warning",
        "Slack #data-alerts",
        "Inspect query plan → optimize partitioning/clustering"),
    AlertRule("Bytes Processed Spike",
        "total_bytes_processed > 3x average",
        "Warning",
        "Slack #data-costs",
        "Check for full table scan → add WHERE clause"),
    AlertRule("Data Quality Failure",
        "row_count = 0 OR null_rate > 5% OR freshness > 4hr",
        "Critical",
        "Slack + Jira Ticket",
        "Inspect source data → notify data producer"),
    AlertRule("Cost Budget Exceeded",
        "daily_bytes > budget_bytes",
        "Warning",
        "Email to Data Lead",
        "Review expensive queries → optimize or limit"),
]

print("=== Alert Rules ===")
for a in alerts:
    print(f"  [{a.alert}] Severity: {a.severity}")
    print(f"    Condition: {a.condition}")
    print(f"    Channel: {a.channel}")
    print(f"    Runbook: {a.runbook}")
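The "duration_sec > 2x average" and "total_bytes_processed > 3x average" conditions above can be evaluated against recent job history from INFORMATION_SCHEMA.JOBS; a hedged sketch using a simple trailing mean as the baseline:

```python
from statistics import mean

def is_anomalous(latest: float, history: list[float], factor: float) -> bool:
    """True if the latest value exceeds factor x the trailing average."""
    if not history:
        return False  # no baseline yet, nothing to compare against
    return latest > factor * mean(history)

durations = [120, 130, 110, 125]          # duration_sec of recent runs
print(is_anomalous(300, durations, 2.0))  # True: 300 > 2 x ~121
print(is_anomalous(140, durations, 2.0))  # False
```

The same function covers the bytes-spike rule with factor=3.0; in production you would likely use a windowed baseline (e.g. the last 7 same-hour runs) rather than a flat mean.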

Dashboard

# === Observability Dashboard ===

@dataclass
class DashboardPanel:
    panel: str
    source: str
    visualization: str
    refresh: str

panels = [
    DashboardPanel("Scheduled Query Status",
        "INFORMATION_SCHEMA.JOBS + Transfer Config API",
        "Status Table: Green=Success Red=Failed Yellow=Running",
        "Every 5 min"),
    DashboardPanel("Query Duration Trend",
        "INFORMATION_SCHEMA.JOBS duration_sec",
        "Line Chart: duration per query over time + SLA line",
        "Every 15 min"),
    DashboardPanel("Bytes Processed Daily",
        "INFORMATION_SCHEMA.JOBS total_bytes_processed",
        "Bar Chart: daily bytes + cost estimate + budget line",
        "Hourly"),
    DashboardPanel("Slot Utilization",
        "Cloud Monitoring bigquery.googleapis.com/slots",
        "Area Chart: slot usage over time + capacity line",
        "Every 5 min"),
    DashboardPanel("Data Quality Scores",
        "monitoring.dq_results table",
        "Scorecard: per table completeness freshness accuracy",
        "After each DQ check"),
    DashboardPanel("Error Log",
        "Cloud Logging BigQuery Audit Log",
        "Log Table: recent errors with details and job_id",
        "Real-time"),
]

print("=== Dashboard Panels ===")
for p in panels:
    print(f"  [{p.panel}] Source: {p.source}")
    print(f"    Viz: {p.visualization}")
    print(f"    Refresh: {p.refresh}")
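The Green/Red/Yellow mapping in the status panel is just a function of job state; a minimal sketch using the INFORMATION_SCHEMA.JOBS state values (PENDING, RUNNING, DONE) together with whether error_result is populated:

```python
def status_color(state: str, has_error: bool) -> str:
    """Map an INFORMATION_SCHEMA.JOBS state to a status-panel color."""
    if state == "DONE":
        return "Red" if has_error else "Green"
    return "Yellow"  # PENDING or RUNNING

print(status_color("DONE", False))    # Green
print(status_color("DONE", True))     # Red
print(status_color("RUNNING", False)) # Yellow
```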

What is a BigQuery Scheduled Query?

A scheduled query runs a SQL statement automatically on a cron-like schedule and writes the result to a destination table with WRITE_TRUNCATE or WRITE_APPEND, using the @run_date parameter for date-aligned runs. Common uses are ETL, aggregation, and report generation, with retries available on failure.
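The destination template orders_daily_{run_date} expands to a date-stamped table name on each run; a sketch of the substitution, assuming the documented YYYYMMDD format for run_date:

```python
from datetime import date

def render_destination(template: str, run_date: date) -> str:
    """Substitute {run_date} the way scheduled-query templates do (YYYYMMDD)."""
    return template.replace("{run_date}", run_date.strftime("%Y%m%d"))

print(render_destination("orders_daily_{run_date}", date(2025, 11, 12)))
# orders_daily_20251112
```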

What is an observability stack?

Metrics (duration, bytes processed, slot usage), logs (audit and error logs), pipeline traces, and alerts for failures, slow runs, and cost spikes, surfaced through dashboards built on Cloud Monitoring, Cloud Logging, Grafana, or Datadog.

How do I set up monitoring?

Build a Cloud Monitoring dashboard with alert policies, enable Cloud Logging audit logs, and query INFORMATION_SCHEMA.JOBS for duration, bytes processed, state, and errors. Add data quality checks for row count and freshness.

What are the best practices?

Alert on every failure, on duration beyond SLA, and on bytes beyond budget; run data quality checks; manage configuration as Terraform code; analyze performance through INFORMATION_SCHEMA; and attach a runbook to each alert routed to Slack, PagerDuty, or Jira.

Summary

A production BigQuery scheduled-query pipeline becomes reliable when wrapped in a full observability stack: monitoring, alerting, and logging feeding a dashboard; INFORMATION_SCHEMA for job history; data quality checks on ETL output; and cost controls on bytes processed.
