ai

BigQuery Scheduled Query Open Source

BigQuery Scheduled Query Open Source

BigQuery + Open Source

BigQuery Scheduled Query Open Source

BigQuery Scheduled Query Open Source Contribution GitHub Archive Data Pipeline ETL Dashboard Looker Studio Community Metrics Contributor Analytics Production

MetricSQL AggregationInsightDashboard
Daily CommitsCOUNT WHERE type=PushEventDevelopment activity trendLine chart
PR Merge RateMerged / Total PRsReview efficiencyGauge
Issue Response TimeAVG(first_comment - created)Community responsivenessMetric card
New ContributorsCOUNT DISTINCT new actorsCommunity growthBar chart
Star GrowthCOUNT WatchEvent per dayProject popularityArea chart
Top ContributorsCOUNT events GROUP BY actorKey contributorsLeaderboard

Scheduled Query Setup

=== BigQuery Scheduled Query ===

SQL Query — Daily GitHub Contribution Summary

CREATE OR REPLACE TABLE `project.dataset.daily_contributions`

PARTITION BY date

AS

SELECT

DATE(created_at) AS date,

repo.name AS repo_name,

type AS event_type,

COUNT(*) AS event_count,

COUNT(DISTINCT actor.login) AS unique_contributors

FROM `githubarchive.day.20*`

WHERE

_TABLE_SUFFIX = FORMAT_DATE('%y%m%d', @run_date)

AND repo.name IN ('kubernetes/kubernetes', 'pytorch/pytorch', 'facebook/react')

GROUP BY date, repo_name, event_type

ORDER BY date DESC, event_count DESC;

Schedule via CLI:

bq mk --transfer_config \

--project_id=my-project \

--data_source=scheduled_query \

--target_dataset=github_analytics \

--display_name="Daily GitHub Contributions" \

--schedule="every 24 hours" \

เนื้อหาเกี่ยวข้อง — Eleventy Static Event Driven Design

--params='{

"query": "SELECT ... FROM githubarchive.day.* ...",

"destination_table_name_template": "daily_contributions_{run_date}",

"write_disposition": "WRITE_TRUNCATE"

}'

Terraform:

resource "google_bigquery_data_transfer_config" "github_daily" {

display_name = "Daily GitHub Contributions"

แนะนำเพิ่มเติม — สัญญาณเทรดรายวัน XM Signal

data_source_id = "scheduled_query"

schedule = "every 24 hours"

destination_dataset_id = google_bigquery_dataset.github.dataset_id

params = {

query = file("queries/daily_contributions.sql")

destination_table_name_template = "daily_contributions"

write_disposition = "WRITE_APPEND"

}

}

from dataclasses import dataclass

@dataclass

class ScheduleConfig:

name: str

schedule: str

query_type: str

destination: str

cost_estimate: str

configs = [

เนื้อหาเกี่ยวข้อง — แนะนำให้อ่าน C# MAUI Container Orchestration

ScheduleConfig("Daily Contributions", "every 24 hours", "Aggregate events by repo/type",

"daily_contributions", "~$0.05/day"),

ScheduleConfig("Weekly Summary", "every week on Monday 08:00", "Weekly rollup metrics",

"weekly_summary", "~$0.10/week"),

ScheduleConfig("Contributor Leaderboard", "every 24 hours", "Top contributors ranked",

"contributor_leaderboard", "~$0.03/day"),

ScheduleConfig("PR Metrics", "every 6 hours", "PR open/merge/close rates",

"pr_metrics", "~$0.08/day"),

ScheduleConfig("Issue Health", "every 12 hours", "Issue response and resolution time",

"issue_health", "~$0.04/day"),

]

print("=== Scheduled Queries ===")

for c in configs:

BigQuery Scheduled Query Open Source

print(f" [{c.name}] Schedule: {c.schedule}")

แนะนำเพิ่มเติม — คอร์สเทรด Forex ที่ iCafeForex

print(f" Query: {c.query_type}")

print(f" Dest: {c.destination} | Cost: {c.cost_estimate}")

GitHub Analytics Queries

=== GitHub Archive Analytics ===

Top Contributors Query

SELECT

actor.login AS contributor,

COUNT(*) AS total_events,

COUNTIF(type = 'PushEvent') AS commits,

COUNTIF(type = 'PullRequestEvent') AS pull_requests,

COUNTIF(type = 'IssuesEvent') AS issues,

COUNTIF(type = 'PullRequestReviewEvent') AS reviews,

เนื้อหาเกี่ยวข้อง — บทความที่เกี่ยวข้อง: Header PHP คืออะไร? คู่มือฉบับสมบูรณ์ 2026 สำหรับนักพัฒนาเว็บไซต์

MIN(created_at) AS first_contribution,

MAX(created_at) AS last_contribution

FROM `githubarchive.day.20*`

WHERE repo.name = 'kubernetes/kubernetes'

AND _TABLE_SUFFIX BETWEEN '240101' AND '241231'

GROUP BY contributor

ORDER BY total_events DESC

LIMIT 50;

Community Health Score

WITH metrics AS (

SELECT

DATE(created_at) AS date,

COUNTIF(type = 'PushEvent') AS pushes,

COUNTIF(type = 'PullRequestEvent') AS prs,

COUNTIF(type = 'IssuesEvent') AS issues,

COUNT(DISTINCT actor.login) AS contributors

FROM `githubarchive.day.20*`

WHERE repo.name = @repo_name

GROUP BY date

)

SELECT

date,

pushes, prs, issues, contributors,

(pushes * 1 + prs * 3 + issues * 2 + contributors * 5) AS health_score

เนื้อหาเกี่ยวข้อง — แนะนำให้อ่าน WordPress WooCommerce GreenOps Sustainability ลด Carbon Footprint สำหรับ E-C…

FROM metrics

ORDER BY date DESC;

@dataclass

class RepoMetric:

repo: str

daily_commits: int

weekly_prs: int

open_issues: int

contributors_30d: int

health_score: int

trend: str

repos = [

RepoMetric("kubernetes/kubernetes", 85, 120, 2500, 450, 92, "Stable"),

RepoMetric("pytorch/pytorch", 65, 95, 1800, 380, 88, "Growing"),

RepoMetric("facebook/react", 25, 40, 800, 150, 75, "Stable"),

RepoMetric("golang/go", 30, 50, 6000, 200, 70, "Stable"),

RepoMetric("rust-lang/rust", 45, 80, 3500, 300, 85, "Growing"),

]

print("\n=== Repository Health Dashboard ===")

for r in repos:

print(f" [{r.repo}] Health: {r.health_score}/100 | Trend: {r.trend}")

print(f" Commits/day: {r.daily_commits} | PRs/week: {r.weekly_prs}")

print(f" Open Issues: {r.open_issues} | Contributors (30d): {r.contributors_30d}")

Cost Optimization

# === BigQuery Cost Optimization ===



@dataclass

class CostTip:

    tip: str

    savings: str

    implementation: str



tips = [

    CostTip("Use partition filter", "50-90% query cost reduction",

        "WHERE _TABLE_SUFFIX = FORMAT_DATE('%y%m%d', @run_date)"),

    CostTip("SELECT specific columns", "30-70% reduction",

        "SELECT actor.login, type instead of SELECT *"),

    CostTip("Use LIMIT for exploration", "Variable savings",

        "LIMIT 1000 when exploring data"),

    CostTip("Materialized Views", "Cache repeated aggregations",

        "CREATE MATERIALIZED VIEW for daily summaries"),

    CostTip("Clustering", "Faster filtered queries",

        "CLUSTER BY repo_name, event_type"),

    CostTip("Budget alerts", "Prevent bill shock",

        "Set budget alert at $10/month threshold"),

]



print("Cost Optimization Tips:")

for t in tips:

    print(f"  [{t.tip}] Savings: {t.savings}")

    print(f"    How: {t.implementation}")



monthly_cost = {

    "Scheduled Queries (5)": "$5-10/month",

    "Ad-hoc Exploration": "$2-5/month",

    "Storage (Summary Tables)": "$0.50/month",

    "Looker Studio": "Free",

    "Total Estimated": "$8-16/month",

    "Free Tier Covers": "Up to 1TB query/month",

}



print(f"\n\nMonthly Cost Estimate:")

for k, v in monthly_cost.items():

    print(f"  [{k}]: {v}")

เคล็ดลับ

  • Partition: ใช้ Partition Filter ทุก Query ลด Cost 50-90%
  • MERGE: ใช้ MERGE Statement สำหรับ Upsert ป้องกัน Duplicate
  • Terraform: จัดการ Scheduled Query ด้วย Terraform เป็น Code
  • Alert: ตั้ง Email Notification เมื่อ Query Fail
  • Budget: ตั้ง Budget Alert ป้องกันค่าใช้จ่ายเกิน

BigQuery Scheduled Query คืออะไร

ตั้งเวลารัน SQL อัตโนมัติ ทุกชั่วโมง วัน สัปดาห์ Destination Table ETL Console CLI Terraform @run_time @run_date Email Notification Service Account

XM Legend · เทรดเดอร์ & ผู้สอน Forex 13 ปี

ผู้ก่อตั้ง SiamCafe ตั้งแต่ปี 1997 · เทรดเดอร์สาย Forex มากกว่า 13 ปี ได้รับการยกย่องเป็น XM Legend · แบ่งปันความรู้ Forex, ไอที, AI และการเทรด จากประสบการณ์จริงในตลาดจริง