OPA Gatekeeper HA
High availability for OPA Gatekeeper, the Kubernetes admission controller: Rego policies, ConstraintTemplates, webhook settings, audit, and Prometheus monitoring.
| Component | Replicas | Purpose | HA Config |
|---|---|---|---|
| Controller Manager | 3 | Webhook that evaluates admission requests | Anti-affinity + PDB |
| Audit Controller | 2 | Checks existing resources | Anti-affinity + PDB |
| Webhook Config | - | How Kubernetes calls Gatekeeper | failurePolicy + timeout |
| Constraint Templates | - | Policy template (Rego) | Version-controlled in Git |
| Constraints | - | Policy instance + parameters | dryrun → deny |
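The Webhook Config row depends on two settings: failurePolicy and timeoutSeconds. The following is a simplified model of how the API server treats a webhook call that fails or exceeds its timeout under each policy. It is only an illustration. The `WebhookResult` type and the `admission_outcome` function are hypothetical, and the real API server handles more cases.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WebhookResult:
    """Hypothetical summary of one API server -> Gatekeeper webhook call."""
    latency_seconds: float    # how long Gatekeeper took to answer
    allowed: Optional[bool]   # Gatekeeper's verdict; None if the call failed outright

def admission_outcome(result: WebhookResult, failure_policy: str, timeout_seconds: float) -> str:
    """Simplified model: an error or a timeout falls back to failurePolicy."""
    if failure_policy not in ("Ignore", "Fail"):
        raise ValueError("failurePolicy must be 'Ignore' or 'Fail'")
    call_failed = result.allowed is None or result.latency_seconds > timeout_seconds
    if call_failed:
        # Ignore = fail open (request admitted unchecked); Fail = fail closed (rejected)
        return "admitted" if failure_policy == "Ignore" else "rejected"
    return "admitted" if result.allowed else "rejected"

if __name__ == "__main__":
    down = WebhookResult(latency_seconds=0.0, allowed=None)    # Gatekeeper unreachable
    slow = WebhookResult(latency_seconds=7.0, allowed=False)   # would deny, but answers after 5 s
    deny = WebhookResult(latency_seconds=0.2, allowed=False)   # normal, timely denial
    for policy in ("Ignore", "Fail"):
        print(policy, [admission_outcome(r, policy, 5) for r in (down, slow, deny)])
```

The model shows the trade-off. With Ignore, an outage or a slow policy silently admits requests that would have been denied, which is one reason to keep the audit controller running. With Fail, the same outage rejects every request the webhook covers.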
HA Configuration
```python
# === OPA Gatekeeper HA Setup ===
# Helm install with HA settings
# helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
# helm install gatekeeper gatekeeper/gatekeeper \
#   --namespace gatekeeper-system --create-namespace \
#   --set replicas=3 \
#   --set audit.replicas=2 \
#   --set-string podAnnotations."prometheus\.io/scrape"=true \
#   --set-string podAnnotations."prometheus\.io/port"=8888 \
#   --set pdb.controllerManager.minAvailable=2 \
#   --set controllerManager.resources.requests.cpu=500m \
#   --set controllerManager.resources.requests.memory=512Mi \
#   --set controllerManager.resources.limits.cpu=1000m \
#   --set controllerManager.resources.limits.memory=1Gi \
#   --set controllerManager.priorityClassName=system-cluster-critical \
#   --set controllerManager.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0].weight=100 \
#   --set controllerManager.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0].podAffinityTerm.topologyKey=kubernetes.io/hostname \
#   --set controllerManager.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0].podAffinityTerm.labelSelector.matchLabels.control-plane=controller-manager
# Notes: annotation values must be strings, hence --set-string.
# A podAffinityTerm without a labelSelector matches no pods, so the anti-affinity needs one.

# Webhook configuration (excerpt; other fields omitted)
# apiVersion: admissionregistration.k8s.io/v1
# kind: ValidatingWebhookConfiguration
# webhooks:
#   - name: validation.gatekeeper.sh
#     failurePolicy: Ignore   # admit requests if Gatekeeper is down (fail open, for safety)
#     timeoutSeconds: 5
#     matchPolicy: Exact

from dataclasses import dataclass

@dataclass
class HAConfig:
    setting: str
    value: str
    purpose: str
    risk_if_missing: str

ha_configs = [
    HAConfig("replicas",
             "3 (Controller Manager)",
             "Webhook HA: if one pod dies, two are still serving requests",
             "Single point of failure: if that pod dies, the webhook stops working"),
    HAConfig("audit.replicas",
             "2",
             "Audit HA: existing resources keep being checked continuously",
             "Violations in existing resources go undetected"),
    HAConfig("podAntiAffinity",
             "preferredDuringScheduling on hostname",
             "Spreads pods across different nodes",
             "All pods land on one node; if that node dies, they all die"),
    HAConfig("PodDisruptionBudget",
             "minAvailable: 2",
             "Stops voluntary disruptions such as node drains from taking down every pod at once",
             "kubectl drain may evict every pod"),
    HAConfig("failurePolicy",
             "Ignore (recommended) or Fail (strict)",
             "Ignore: admit requests when Gatekeeper does not respond; Fail: reject them all",
             "Fail + Gatekeeper down = cluster lock: nothing the webhook covers can be created"),
    HAConfig("priorityClassName",
             "system-cluster-critical",
             "Gatekeeper pods are scheduled ahead of ordinary workloads",
             "When nodes are full, Gatekeeper may be preempted or evicted"),
]

print("=== HA Configuration ===")
for h in ha_configs:
    print(f"  [{h.setting}] = {h.value}")
    print(f"    Purpose: {h.purpose}")
    print(f"    Risk: {h.risk_if_missing}")
```
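The replicas, podAntiAffinity, and PodDisruptionBudget settings work together, and the arithmetic behind them is simple. A PDB with minAvailable permits `healthy - minAvailable` voluntary evictions at a time. Spreading pods across nodes limits how many replicas a single node failure can take out. This is a minimal sketch; the helper names and node names are hypothetical.

```python
from collections import Counter

def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """Voluntary evictions a PDB with minAvailable permits right now (never negative)."""
    return max(0, healthy_pods - min_available)

def survivors_after_worst_node_loss(pod_to_node: dict[str, str]) -> int:
    """Replicas still running if the node hosting the most replicas fails."""
    per_node = Counter(pod_to_node.values())
    return len(pod_to_node) - max(per_node.values(), default=0)

# Hypothetical placements of three controller-manager replicas
spread = {"gk-0": "node-a", "gk-1": "node-b", "gk-2": "node-c"}
packed = {"gk-0": "node-a", "gk-1": "node-a", "gk-2": "node-a"}

print(allowed_disruptions(3, 2))                # 1: a drain evicts one pod at a time
print(allowed_disruptions(2, 2))                # 0: further evictions wait
print(survivors_after_worst_node_loss(spread))  # 2
print(survivors_after_worst_node_loss(packed))  # 0
```

The anti-affinity rule is only *preferred*, so the scheduler can still fall back to something like the packed layout when too few eligible nodes are available.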
ConstraintTemplate
```python
# === ConstraintTemplate Examples ===
# apiVersion: templates.gatekeeper.sh/v1
# kind: ConstraintTemplate
# metadata:
#   name: k8srequiredlabels
# spec:
#   crd:
#     spec:
#       names:
#         kind: K8sRequiredLabels
#       validation:
#         openAPIV3Schema:
#           type: object
#           properties:
#             labels:
#               type: array
#               items: { type: string }
#   targets:
#     - target: admission.k8s.gatekeeper.sh
#       rego: |
#         package k8srequiredlabels
#         violation[{"msg": msg}] {
#           provided := {label | input.review.object.metadata.labels[label]}
#           required := {label | label := input.parameters.labels[_]}
#           missing := required - provided
#           count(missing) > 0
#           msg := sprintf("Missing labels: %v", [missing])
#         }

# Constraint instance
# apiVersion: constraints.gatekeeper.sh/v1beta1
# kind: K8sRequiredLabels
# metadata:
#   name: require-team-label
# spec:
#   enforcementAction: deny  # deny | dryrun | warn
#   match:
#     kinds:
#       - apiGroups: [""]
#         kinds: ["Pod"]
#       - apiGroups: ["apps"]
#         kinds: ["Deployment"]
#     excludedNamespaces: ["kube-system", "gatekeeper-system"]
#   parameters:
#     labels: ["app", "team", "env"]

from dataclasses import dataclass

@dataclass
class PolicyExample:
    name: str
    purpose: str
    rego_logic: str
    parameters: str

policies = [
    PolicyExample("K8sRequiredLabels",
                  "Every resource must carry the required labels",
                  "Check that metadata.labels contains every required label",
                  "labels: ['app', 'team', 'env']"),
    PolicyExample("K8sContainerLimits",
                  "Every container must set resource limits",
                  "Check containers[].resources.limits.cpu/memory",
                  "cpu: '2', memory: '4Gi'"),
    PolicyExample("K8sAllowedRepos",
                  "Only images from trusted registries",
                  "Check that containers[].image starts with an allowed repo",
                  "repos: ['gcr.io/my-project/', 'registry.example.com/']"),
    PolicyExample("K8sBlockPrivileged",
                  "No privileged containers",
                  "Check that securityContext.privileged != true",
                  "None (blocks all)"),
    PolicyExample("K8sDisallowLatest",
                  "No :latest image tag",
                  "Check that the image has a tag and the tag != 'latest'",
                  "None (blocks all)"),
]

print("=== Policy Examples ===")
for p in policies:
    print(f"  [{p.name}] {p.purpose}")
    print(f"    Rego: {p.rego_logic}")
    print(f"    Params: {p.parameters}")
```
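To make the Rego logic concrete, the following Python code restates three of the checks above. It is not Gatekeeper code and does not run in the cluster. It only mirrors what the rules evaluate against `input.review.object`. The function names and the sample Pod are hypothetical. The image parsing is a simplified assumption that treats a digest-pinned image as acceptable.

```python
from typing import Optional

def missing_labels(obj: dict, required: list[str]) -> set[str]:
    """Mirror of the K8sRequiredLabels rule: required label keys minus provided keys."""
    provided = set((obj.get("metadata", {}).get("labels") or {}).keys())
    return set(required) - provided

def image_tag(image: str) -> Optional[str]:
    """Tag of an image reference, or None if it has none.
    A ':' only starts a tag after the last '/', so 'host:5000/app' has no tag."""
    name = image.split("@", 1)[0]            # drop any @sha256:... digest
    last_segment = name.rsplit("/", 1)[-1]
    return last_segment.rsplit(":", 1)[1] if ":" in last_segment else None

def image_violations(obj: dict, allowed_repos: list[str]) -> list[str]:
    """Mirror of K8sAllowedRepos (prefix match) and K8sDisallowLatest (explicit non-latest tag)."""
    msgs = []
    for c in obj.get("spec", {}).get("containers", []):
        name, image = c.get("name", "?"), c.get("image", "")
        if not any(image.startswith(repo) for repo in allowed_repos):
            msgs.append(f"{name}: image {image} is not from an allowed repo")
        # Assumption: a digest-pinned image (@sha256:...) is acceptable without a tag
        if "@" not in image and image_tag(image) in (None, "latest"):
            msgs.append(f"{name}: image {image} needs an explicit tag other than latest")
    return msgs

# Hypothetical Pod, shaped like input.review.object
pod = {
    "metadata": {"labels": {"app": "web", "team": "payments"}},
    "spec": {"containers": [
        {"name": "web", "image": "registry.example.com/web:1.4.2"},
        {"name": "sidecar", "image": "docker.io/library/nginx:latest"},
    ]},
}

print(missing_labels(pod, ["app", "team", "env"]))   # {'env'}
for msg in image_violations(pod, ["gcr.io/my-project/", "registry.example.com/"]):
    print(msg)
```

The `missing_labels` function uses the same set difference as the Rego rule (`required - provided`). A violation is reported exactly when that difference is non-empty.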
Monitoring & Alerting
```python
# === Gatekeeper Monitoring ===
from dataclasses import dataclass

@dataclass
class GKMetric:
    metric: str
    type: str
    alert_condition: str
    action: str

metrics = [
    GKMetric("gatekeeper_violations",
             "Gauge (by enforcement_action)",
             "Increases by more than 10 within 5 minutes",
             "Find which constraints gained violations (constraint status) and notify the team"),
    GKMetric("gatekeeper_validation_request_duration_seconds",
             "Histogram",
             "p99 > 3 seconds",
             "A Rego policy is too complex; optimize it"),
    GKMetric("gatekeeper_audit_duration_seconds",
             "Histogram",
             "> 30 seconds",
             "Cluster too large for the audit cycle; increase the audit interval or narrow constraint scope"),
    GKMetric("gatekeeper_constraint_templates",
             "Gauge (by status)",
             "Count with status != 'active' > 0",
             "A ConstraintTemplate has an error; check its Rego syntax"),
    GKMetric("up{job='gatekeeper'}",
             "Gauge",
             "== 0 (pod down)",
             "Pod restart alert; check for OOM kills and resource limits"),
]

print("=== Monitoring Metrics ===")
for m in metrics:
    print(f"  [{m.metric}] ({m.type})")
    print(f"    Alert: {m.alert_condition}")
    print(f"    Action: {m.action}")
```
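The p99 > 3 seconds latency alert is usually evaluated in PromQL with `histogram_quantile` over the request-duration histogram's `_bucket` series. Current Gatekeeper releases export the validation webhook histogram as `gatekeeper_validation_request_duration_seconds`, so a typical query is `histogram_quantile(0.99, sum by (le) (rate(gatekeeper_validation_request_duration_seconds_bucket[5m])))`. The following sketch reproduces that estimate in Python so a threshold can be sanity-checked offline. The bucket boundaries and counts are made-up sample data. The function follows histogram_quantile's linear interpolation within a bucket but omits some of its edge cases.

```python
import math

def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q (0 < q <= 1) from cumulative (upper_bound, count) buckets.
    The last bucket must have upper bound +Inf, like Prometheus's le="+Inf" series."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need >= 2 buckets, the last with upper bound +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Index of the first finite bucket whose cumulative count reaches the rank
    b = next((i for i, (_, c) in enumerate(buckets[:-1]) if c >= rank), len(buckets) - 1)
    if b == len(buckets) - 1:
        return buckets[-2][0]  # rank lies in the +Inf bucket: report the highest finite bound
    upper, count = buckets[b]
    lower = 0.0
    if b > 0:
        lower, prev = buckets[b - 1]
        count -= prev
        rank -= prev
    # Linear interpolation inside the bucket
    return lower + (upper - lower) * (rank / count)

# Hypothetical cumulative bucket counts (le boundaries made up for illustration)
sample = [(0.1, 900.0), (0.5, 960.0), (1.0, 975.0), (2.5, 985.0), (5.0, 995.0), (math.inf, 1000.0)]
p99 = histogram_quantile(0.99, sample)
print(f"p99 = {p99:.2f}s, alert = {p99 > 3.0}")   # p99 = 3.75s, alert = True
```

The estimate is only as precise as the bucket layout. In this sample the true p99 could be anywhere between 2.5 s and 5 s, so bucket boundaries near the alert threshold make the alert more trustworthy.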
Tips
- failurePolicy: use Ignore in production to avoid a cluster lock
- dryrun: test every policy with dryrun before enforcing it
- PDB: always set a PodDisruptionBudget with minAvailable: 2
- Priority: use the system-cluster-critical PriorityClass
- Audit: choose an audit interval that suits the cluster size, and don't run audits more often than needed
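The dryrun tip can be turned into a simple gate to check before switching a constraint to deny. This is a minimal sketch. It assumes you have fetched the constraint as JSON (for example with `kubectl get k8srequiredlabels require-team-label -o json`) and that the audit has filled in `status.auditTimestamp` and `status.totalViolations`. The `ready_to_enforce` function, its threshold, and the sample status values are hypothetical.

```python
import json

def ready_to_enforce(constraint: dict, max_violations: int = 0) -> tuple[bool, str]:
    """Hypothetical gate: may this dryrun constraint be switched to deny?"""
    spec = constraint.get("spec", {})
    status = constraint.get("status", {})
    action = spec.get("enforcementAction", "deny")  # Gatekeeper treats an unset action as deny
    if action != "dryrun":
        return False, f"enforcementAction is {action!r}, not 'dryrun'"
    if "auditTimestamp" not in status:
        return False, "no audit results yet"
    total = status.get("totalViolations", 0)
    if total > max_violations:
        return False, f"{total} existing resources violate the policy"
    return True, "no existing violations: safe to switch to deny"

# Sample constraint JSON with made-up status values
raw = """{"kind": "K8sRequiredLabels",
 "metadata": {"name": "require-team-label"},
 "spec": {"enforcementAction": "dryrun"},
 "status": {"auditTimestamp": "2024-01-01T00:00:00Z", "totalViolations": 4}}"""
print(ready_to_enforce(json.loads(raw)))
```

Waiting for at least one audit result avoids promoting a constraint that simply has not been evaluated yet.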
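What is OPA Gatekeeper?
Gatekeeper is a policy engine for Kubernetes that runs as an admission controller webhook. Policies are written in Rego inside ConstraintTemplates. The webhook rejects requests that violate them, such as privileged containers, images from untrusted registries, missing labels, or missing resource limits. An audit process also checks resources that already exist.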
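How do you set up HA?
Run 3 controller-manager replicas with pod anti-affinity and protect them with a PodDisruptionBudget of minAvailable: 2. Set failurePolicy: Ignore with a 5-second timeout, use the system-cluster-critical priority class, set resource requests and limits, and run 2 audit replicas.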
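How do you write a ConstraintTemplate?
Write a Rego `violation` rule that returns a `msg`, reading the resource under review from `input.review.object`, and define the parameter schema in the CRD validation section. Then create a Constraint that sets `enforcementAction` (deny, dryrun, or warn) and scopes it with `match.kinds` and `excludedNamespaces`.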
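How do you monitor it?
Scrape Gatekeeper's Prometheus metrics on port 8888, in particular gatekeeper_violations, request duration, audit duration, and constraint template status. Build a Grafana dashboard, send alerts to Slack, and keep an eye on the webhook certificate.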
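Summary
Running OPA Gatekeeper in production means treating the admission webhook as critical infrastructure. Write policies as Rego ConstraintTemplates, run multiple replicas spread with anti-affinity and protected by a PDB, choose the failurePolicy deliberately, and monitor everything with Prometheus.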
