oVirt Virtualization and Site Reliability Engineering (SRE)
oVirt SRE Practices

This article covers SRE practices for running oVirt (KVM virtualization) in production: SLIs, SLOs, and error budgets; automation with Ansible and Terraform; and monitoring with Prometheus and Grafana.
| SLI | SLO Target | Measurement | Alert Threshold |
|---|---|---|---|
| VM Availability | > 99.9% | VM Up Time / Total Time | Any unexpected VM Down |
| VM Boot Time | < 60 seconds | API call → VM Running | > 120 seconds |
| Live Migration | < 30 seconds | Migration Start → Complete | > 60 seconds |
| Storage Latency P99 | < 5ms | Disk I/O Latency | > 10ms |
| Engine API Response | < 2 seconds | API Call Duration | > 5 seconds |
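The 99.9% VM Availability target in the table implies an error budget: the amount of downtime that can be tolerated before the SLO is breached. Over a 30-day window that works out to roughly 43 minutes. A minimal sketch of the arithmetic follows, assuming a 30-day measurement window; the function names `error_budget_minutes` and `budget_remaining` are illustrative and not part of any oVirt tooling.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Return the downtime (in minutes) allowed by an availability SLO."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (100.0 - slo_percent) / 100.0


def budget_remaining(slo_percent: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Return the fraction of the error budget still unspent (may be negative)."""
    budget = error_budget_minutes(slo_percent, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    budget = error_budget_minutes(99.9)
    print(f"Allowed downtime per 30 days: {budget:.1f} minutes")
    print(f"Budget left after 10 minutes down: {budget_remaining(99.9, 10):.0%}")
```

A negative value from `budget_remaining` would mean the SLO has already been missed for the window, which is one possible signal for pausing risky changes.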
Monitoring Setup
# === oVirt Monitoring with Prometheus ===
# Prometheus config (prometheus.yml)
# scrape_configs:
#   - job_name: 'ovirt-hosts'
#     static_configs:
#       - targets: ['host1:9100', 'host2:9100', 'host3:9100']
#   - job_name: 'ovirt-engine'
#     static_configs:
#       - targets: ['engine:9100']
#   - job_name: 'ovirt-exporter'
#     static_configs:
#       - targets: ['ovirt-exporter:9325']
# Alert Rules (alerts.yml)
# groups:
#   - name: ovirt
#     rules:
#       - alert: HostCPUHigh
#         expr: 100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
#         for: 5m
#         labels: { severity: warning }
#       - alert: HostRAMCritical
#         expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.95
#         for: 2m
#         labels: { severity: critical }
#       - alert: VMUnexpectedDown
#         expr: ovirt_vm_status{status!="up"} == 1
#         for: 1m
#         labels: { severity: critical }
from dataclasses import dataclass


@dataclass
class MonitorLayer:
    layer: str
    metrics: str
    tool: str
    alert_example: str


# One entry per monitoring layer, from physical hosts up to the network
layers = [
    MonitorLayer("Host (Physical)",
                 "CPU RAM Disk I/O Network Temperature",
                 "Prometheus + Node Exporter",
                 "CPU > 90% 5min → Warning RAM > 95% → Critical"),
    MonitorLayer("VM (Virtual Machine)",
                 "vCPU vRAM vDisk IOPS Network VM State",
                 "oVirt Exporter + Prometheus",
                 "VM Down Unexpected → Critical IOPS > Threshold"),
    MonitorLayer("oVirt Engine",
                 "API Response Time DB Pool Active Tasks",
                 "Prometheus + Custom Exporter",
                 "API > 5s → Critical DB Connection > 90%"),
    MonitorLayer("Storage",
                 "IOPS Latency Throughput Capacity Used%",
                 "Node Exporter + Storage Exporter",
                 "Latency P99 > 10ms → Warning Capacity > 85%"),
    MonitorLayer("Network",
                 "Bandwidth Packet Loss Latency Errors",
                 "Node Exporter + SNMP Exporter",
                 "Packet Loss > 0.1% → Warning Bond Down → Critical"),
]

print("=== Monitoring Layers ===")
for item in layers:
    print(f" [{item.layer}] Metrics: {item.metrics}")
    print(f" Tool: {item.tool}")
    print(f" Alert: {item.alert_example}")
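The alert examples in the monitoring layers come down to comparing a measured value against a warning or critical threshold. The sketch below shows that comparison in plain Python, using the CPU (> 90%), RAM (> 95%), and storage latency (P99 > 10 ms) thresholds from this section. The `Threshold` class, the metric keys, and the sample readings are hypothetical. In practice Prometheus alert rules would do this evaluation, including the `for:` duration, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Threshold:
    metric: str      # hypothetical metric key, not a real exporter metric name
    limit: float     # alert fires when the reading is strictly above this value
    severity: str    # "warning" or "critical"


THRESHOLDS = [
    Threshold("host_cpu_percent", 90.0, "warning"),
    Threshold("host_ram_percent", 95.0, "critical"),
    Threshold("storage_latency_p99_ms", 10.0, "warning"),
]


def evaluate(readings: Dict[str, float]) -> List[str]:
    """Return one alert line per reading that exceeds its threshold."""
    alerts = []
    for t in THRESHOLDS:
        value = readings.get(t.metric)
        # Missing readings are skipped rather than treated as alerts
        if value is not None and value > t.limit:
            alerts.append(f"{t.severity.upper()}: {t.metric}={value} > {t.limit}")
    return alerts


if __name__ == "__main__":
    sample = {"host_cpu_percent": 93.5, "host_ram_percent": 71.0,
              "storage_latency_p99_ms": 12.0}
    for line in evaluate(sample):
        print(line)
```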
Automation with Ansible

# === Ansible Automation for oVirt ===
# Install: ansible-galaxy collection install ovirt.ovirt
# Playbook: Create VM from Template
# - hosts: localhost
#   connection: local
#   collections:
#     - ovirt.ovirt
#   tasks:
#     - ovirt_auth:
#         url: https://engine.example.com/ovirt-engine/api
#         username: admin@internal
#         password: "{{ vault_ovirt_password }}"
#     - ovirt_vm:
#         auth: "{{ ovirt_auth }}"
#         name: web-server-01
#         template: centos9-template
#         cluster: production
#         memory: 4GiB
#         cpu_cores: 2
#         state: running
#         nics:
#           - name: nic1
#             network: production-net
#     - ovirt_auth:
#         state: absent
#         ovirt_auth: "{{ ovirt_auth }}"
from dataclasses import dataclass  # imported here so this block runs on its own


@dataclass
class AutomationTask:
    task: str
    tool: str
    trigger: str
    playbook: str


# Recurring operational tasks and the playbook or script that automates each
tasks = [
    AutomationTask("VM Provisioning",
                   "Ansible ovirt.ovirt",
                   "Jira Ticket / API Request",
                   "create_vm.yml: Create VM from Template, set Network, IP, DNS"),
    AutomationTask("Host Patching",
                   "Ansible ovirt.ovirt + yum",
                   "Monthly Patch Window",
                   "patch_host.yml: Maintenance → Migrate VMs → Patch → Reboot → Activate"),
    AutomationTask("VM Backup",
                   "Ansible + oVirt API",
                   "Daily Cron 02:00",
                   "backup_vm.yml: Snapshot → Export → Upload S3 → Delete Old"),
    AutomationTask("Capacity Report",
                   "Python + oVirt SDK",
                   "Weekly Monday 09:00",
                   "capacity_report.py: CPU RAM Storage Usage Trend → Email Report"),
    AutomationTask("Disaster Recovery",
                   "Ansible + oVirt API",
                   "DR Drill Quarterly / Actual DR",
                   "dr_failover.yml: Import VM → Start → Verify → Update DNS"),
]

print("=== Automation Tasks ===")
for t in tasks:
    print(f" [{t.task}] Tool: {t.tool}")
    print(f" Trigger: {t.trigger}")
    print(f" Playbook: {t.playbook}")
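The last step of the `backup_vm.yml` flow listed in the tasks, deleting old backups, depends on a retention rule. The sketch below shows one way to select expired backups, assuming a 7-day retention period and backup records that carry a creation timestamp. Both the retention value and the `Backup` record are assumptions made for illustration; the source does not say how the playbook defines either.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Backup:
    vm_name: str           # hypothetical record shape, for illustration only
    created_at: datetime


def expired_backups(backups: List[Backup], now: datetime,
                    retention_days: int = 7) -> List[Backup]:
    """Return backups older than the retention period, oldest first."""
    cutoff = now - timedelta(days=retention_days)
    old = [b for b in backups if b.created_at < cutoff]
    return sorted(old, key=lambda b: b.created_at)


if __name__ == "__main__":
    now = datetime(2026, 1, 15, 2, 0)
    backups = [Backup("web-server-01", now - timedelta(days=d))
               for d in (1, 6, 8, 30)]
    for b in expired_backups(backups, now):
        print(f"delete {b.vm_name} backup from {b.created_at:%Y-%m-%d}")
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the function, keeps the selection deterministic and easy to test before wiring it to a real delete step.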
Capacity Planning
# === Capacity Planning ===
from dataclasses import dataclass  # imported here so this block runs on its own


@dataclass
class CapacityMetric:
    resource: str
    current_usage: str
    threshold: str
    forecast: str
    action: str


# Current usage, alert thresholds, growth forecast, and planned action per resource
capacity = [
    CapacityMetric("CPU (Total Cluster)",
                   "65% average 85% peak",
                   "Warning 70% avg Critical 90% peak",
                   "+5% per month → Full in 7 months",
                   "Add 2 hosts in Q3"),
    CapacityMetric("RAM (Total Cluster)",
                   "72% allocated 55% actual",
                   "Warning 80% allocated Critical 90%",
                   "+8% per month → Full in 4 months",
                   "Upgrade RAM on each host from 64GB to 128GB"),
    CapacityMetric("Storage (NFS)",
                   "4.2TB / 6TB (70%)",
                   "Warning 80% Critical 90%",
                   "+200GB per month → Full in 9 months",
                   "Add a 4TB storage volume in Q3"),
    CapacityMetric("Network (10Gbps Bond)",
                   "3.5Gbps peak 2.1Gbps avg",
                   "Warning 70% peak Critical 90%",
                   "+10% per month → Full in 12 months",
                   "Consider a 25Gbps upgrade in Q4"),
]

print("=== Capacity Planning ===")
for c in capacity:
    print(f" [{c.resource}] Current: {c.current_usage}")
    print(f" Threshold: {c.threshold}")
    print(f" Forecast: {c.forecast}")
    print(f" Action: {c.action}")
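The "full in N months" forecasts for CPU, RAM, and storage are consistent with a simple linear projection: remaining headroom divided by monthly growth, rounded up. The sketch below reproduces that arithmetic under two assumptions: growth is linear (percentage points per month for CPU and RAM, GB per month for storage), and 1 TB is treated as 1000 GB. The function name `months_until_full` is illustrative. The network row is left out because the source does not state what its 10% monthly growth is measured against.

```python
import math


def months_until_full(current: float, capacity: float,
                      growth_per_month: float) -> int:
    """Whole months until `current` reaches `capacity` under linear growth."""
    if growth_per_month <= 0:
        raise ValueError("growth_per_month must be positive")
    remaining = capacity - current
    if remaining <= 0:
        return 0  # already at or over capacity
    return math.ceil(remaining / growth_per_month)


if __name__ == "__main__":
    # CPU and RAM in percent of cluster capacity, storage in GB
    print("CPU:", months_until_full(65, 100, 5), "months")
    print("RAM:", months_until_full(72, 100, 8), "months")
    print("Storage:", months_until_full(4200, 6000, 200), "months")
```

The same function can be pointed at a warning threshold instead of 100% (for example `capacity=80` for RAM) to estimate how much lead time remains before the first alert fires.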
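Tips
- SLO: Set clear SLOs, measure them every week, and use the error budget to govern changes.
- Ansible: Automate every manual task to reduce toil.
- Template: Use standard VM templates so VMs are created quickly and consistently.
- HA: Enable HA for critical VMs so they restart automatically when a host fails.
- Capacity: Analyze trends every month and plan expansion ahead of time.
What is oVirt?
oVirt is a free, open-source KVM virtualization platform and the upstream project of Red Hat Virtualization (RHV). It consists of an Engine and hosts, and provides a web UI, a REST API, live migration, HA, snapshots, templates, and quotas, which makes it suitable for building a private cloud.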