oVirt Virtualization Progressive Delivery — Managing VMs Progressively

2026-05-27 · Bom — SiamCafe.net · 1,671 words

What is oVirt?

oVirt is an open source virtualization management platform built on the KVM hypervisor. It manages virtual machines, storage, and networking across a datacenter, comparable to VMware vSphere but free and open source. It is developed by the Red Hat community and is the upstream project for Red Hat Virtualization (RHV).

oVirt's main components: the oVirt Engine is the management server providing the web UI and REST API for administering everything; oVirt Node is a minimal OS installed on physical hosts to run VMs; VDSM (Virtual Desktop and Server Manager) is the host-side agent that executes commands from the Engine; and storage domains hold VM disks, supporting NFS, iSCSI, FC, GlusterFS, and Ceph.

Progressive delivery on oVirt means deploying new workloads gradually across the virtualization platform, using techniques such as canary VMs, rolling updates, and blue-green VM pools to reduce risk when updating OS images, applications, or configurations across a large fleet of VMs.
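Of the three techniques, blue-green VM pools are the simplest to reason about: two identical pools exist, the idle one is updated and verified, then promoted in a single step. A minimal sketch of the pool-switch bookkeeping (pool names and sizes are illustrative, not an oVirt API):

```python
# Blue-green VM pool sketch: the active pool serves traffic; the idle pool
# is updated and verified, then promoted atomically so rollback is just
# switching back to the previous pool.

class BlueGreenPools:
    def __init__(self, blue_vms, green_vms):
        self.pools = {"blue": blue_vms, "green": green_vms}
        self.active = "blue"

    @property
    def idle(self):
        return "green" if self.active == "blue" else "blue"

    def update_idle(self, new_template):
        # Update every VM in the idle pool (here just re-tagging the template).
        for vm in self.pools[self.idle]:
            vm["template"] = new_template

    def promote(self):
        # Atomic switchover: the freshly updated pool becomes active.
        self.active, previous = self.idle, self.active
        return previous  # kept intact for instant rollback

pools = BlueGreenPools(
    blue_vms=[{"name": f"web-blue-{i}", "template": "v1"} for i in range(3)],
    green_vms=[{"name": f"web-green-{i}", "template": "v1"} for i in range(3)],
)
pools.update_idle("v2")
old = pools.promote()
print(pools.active, old)  # green blue
```

The cost of blue-green is running double the VM capacity during the transition, which is why the canary phases shown later in this article are often preferred on oVirt.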

Installing the oVirt Engine and Hosts

Setting up the oVirt virtualization platform:

# === oVirt Installation ===

# 1. Install oVirt Engine (CentOS Stream 9 / AlmaLinux 9)
# ===================================
sudo dnf install -y centos-release-ovirt45
sudo dnf install -y ovirt-engine

# 2. Run Engine Setup
sudo engine-setup

# Configuration prompts:
# --== PRODUCT OPTIONS ==--
# Configure Engine: Yes
# Configure Data Warehouse: Yes
# Configure Grafana: Yes
#
# --== PACKAGES ==--
# Update packages: Yes
#
# --== NETWORK ==--
# Engine FQDN: engine.example.com
# Firewall manager: firewalld
#
# --== DATABASE ==--
# Local or Remote: Local
# Automatically configure PostgreSQL: Yes
#
# --== ENGINE ==--
# Application mode: Both (Virt + Gluster)
# Default SAN wipe: No
#
# --== STORAGE ==--
# Default storage type: NFS
#
# --== PKI ==--
# Organization name: example.com
#
# --== APACHE ==--
# Configure Apache: Yes
# Setup SSL: Yes

# 3. Access Web UI
# https://engine.example.com/ovirt-engine/
# Login: admin@internal / (password set during setup)

# 4. Add Hypervisor Host
# ===================================
# On host machine:
sudo dnf install -y centos-release-ovirt45
sudo dnf install -y ovirt-host

# Or use oVirt Node (minimal ISO):
# Download from: https://ovirt.org/download/node.html

# In oVirt Engine Web UI:
# Compute > Hosts > New
# Name: host01.example.com
# Address: 192.168.1.101
# SSH Port: 22
# Authentication: Password or SSH Public Key

# 5. Configure Storage Domain
# ===================================
# NFS Storage:
# Storage > Domains > New
# Name: nfs-data
# Domain Function: Data
# Storage Type: NFS
# Export Path: nfs-server:/data/ovirt
# NFS Version: Auto

# 6. Create VM Template
# ===================================
# Compute > Virtual Machines > New
# Name: template-almalinux9
# Cluster: Default
# Template: Blank
# OS: Linux > Red Hat Enterprise Linux 9
# Memory: 4096 MB
# CPUs: 2
# Disk: 40 GB (Thin Provision)
#
# Install OS, configure, then:
# Right-click VM > Make Template
# Template Name: almalinux9-base-v1

echo "oVirt platform installed"
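The host-registration step shown in the Web UI above can also be scripted against the REST API (POST to /ovirt-engine/api/hosts). A sketch of building that request, with the network call left commented out in the same style as this article's other Python examples; the field names reflect my reading of the v4 API and should be verified against your Engine's API docs, and the address, password, and CA path are placeholders:

```python
# Sketch: registering a host via the oVirt REST API instead of the Web UI.
# Field names follow the v4 API as I understand it -- verify against
# your Engine's /ovirt-engine/api documentation before use.
import json

def build_host_payload(name, address, root_password, cluster="Default"):
    """Build the JSON body for POST /ovirt-engine/api/hosts."""
    return {
        "name": name,
        "address": address,
        "root_password": root_password,   # or register with an SSH public key
        "cluster": {"name": cluster},
    }

payload = build_host_payload("host01.example.com", "192.168.1.101", "secret")

# In production (matching the article's other examples):
# response = requests.post(
#     "https://engine.example.com/ovirt-engine/api/hosts",
#     auth=("admin@internal", "password"),
#     headers={"Content-Type": "application/json",
#              "Accept": "application/json"},
#     json=payload,
#     verify="/etc/pki/ovirt-engine/ca.pem",  # CA path is illustrative
# )

print(json.dumps(payload, indent=2))
```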

Managing Virtual Machines

Managing VMs through the oVirt REST API:

#!/usr/bin/env python3
# ovirt_manager.py — oVirt VM Management
import json
import logging
from typing import Dict, List

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ovirt")

class OVirtManager:
    def __init__(self, engine_url, username, password):
        self.engine_url = engine_url
        self.auth = (username, password)
        self.api_url = f"{engine_url}/ovirt-engine/api"
    
    def list_vms(self):
        """List all VMs via REST API"""
        # In production:
        # response = requests.get(
        #     f"{self.api_url}/vms",
        #     auth=self.auth,
        #     headers={"Accept": "application/json"},
        #     verify=False
        # )
        # return response.json()["vm"]
        
        return [
            {"id": "vm-001", "name": "web-01", "status": "up", "memory": 4096, "cpu_cores": 2, "template": "almalinux9-base-v1"},
            {"id": "vm-002", "name": "web-02", "status": "up", "memory": 4096, "cpu_cores": 2, "template": "almalinux9-base-v1"},
            {"id": "vm-003", "name": "web-03", "status": "up", "memory": 4096, "cpu_cores": 2, "template": "almalinux9-base-v1"},
            {"id": "vm-004", "name": "db-01", "status": "up", "memory": 8192, "cpu_cores": 4, "template": "almalinux9-db-v1"},
            {"id": "vm-005", "name": "app-01", "status": "up", "memory": 4096, "cpu_cores": 2, "template": "almalinux9-base-v1"},
        ]
    
    def create_vm_from_template(self, name, template, memory_mb=4096, cpu_cores=2):
        """Create VM from template"""
        vm_config = {
            "name": name,
            "template": {"name": template},
            "cluster": {"name": "Default"},
            "memory": memory_mb * 1024 * 1024,
            "cpu": {"topology": {"cores": cpu_cores, "sockets": 1}},
        }
        
        logger.info(f"Creating VM: {name} from template: {template}")
        return {"id": f"vm-new-{name}", "name": name, "status": "down", "template": template}
    
    def clone_vm(self, source_vm_id, new_name):
        """Clone existing VM"""
        return {"id": f"vm-clone-{new_name}", "name": new_name, "status": "down", "cloned_from": source_vm_id}
    
    def start_vm(self, vm_id):
        logger.info(f"Starting VM: {vm_id}")
        return {"vm_id": vm_id, "status": "up"}
    
    def stop_vm(self, vm_id):
        logger.info(f"Stopping VM: {vm_id}")
        return {"vm_id": vm_id, "status": "down"}
    
    def live_migrate(self, vm_id, target_host):
        """Live migrate VM to another host"""
        logger.info(f"Migrating VM {vm_id} to {target_host}")
        return {"vm_id": vm_id, "target_host": target_host, "status": "migrating"}
    
    def create_snapshot(self, vm_id, description):
        logger.info(f"Creating snapshot for VM: {vm_id}")
        return {"vm_id": vm_id, "snapshot_id": f"snap-{vm_id}", "description": description}

manager = OVirtManager("https://engine.example.com", "admin@internal", "password")

vms = manager.list_vms()
print(f"Total VMs: {len(vms)}")
for vm in vms:
    print(f"  {vm['name']}: {vm['status']} ({vm['memory']}MB, {vm['cpu_cores']} cores)")

new_vm = manager.create_vm_from_template("web-04", "almalinux9-base-v1")
print(f"\nCreated: {json.dumps(new_vm, indent=2)}")

snapshot = manager.create_snapshot("vm-001", "Before update")
print(f"Snapshot: {json.dumps(snapshot, indent=2)}")
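start_vm above returns immediately, but the Engine powers the VM up asynchronously, so real callers typically poll GET /vms/{vm_id} until the status reaches "up". A small polling helper; the status source here is a stub standing in for that REST call:

```python
# Poll a VM's status until it reaches the desired state or we time out.
# `fetch_status` stands in for a real GET /vms/{vm_id} REST call.
import time

def wait_for_status(fetch_status, desired="up", timeout_s=120, interval_s=0.01):
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status == desired:
            return True
        time.sleep(interval_s)
    return False

# Stub: the VM reports two intermediate states, then "up".
states = iter(["wait_for_launch", "powering_up", "up"])
result = wait_for_status(lambda: next(states))
print(result)  # True
```

In production the interval would be a few seconds, and the same helper works for waiting on "down" after stop_vm or on migration completion.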

Progressive Delivery for VM Workloads

Deploying workloads incrementally on oVirt:

#!/usr/bin/env python3
# progressive_vm.py — Progressive VM Delivery
import json
import logging
from datetime import datetime
from typing import Dict, List

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("progressive")

class ProgressiveVMDelivery:
    def __init__(self, vm_pool):
        self.vm_pool = vm_pool
        self.phases = [
            {"name": "canary", "pct": 10, "duration_min": 30},
            {"name": "early", "pct": 25, "duration_min": 60},
            {"name": "half", "pct": 50, "duration_min": 120},
            {"name": "full", "pct": 100, "duration_min": 0},
        ]
        self.current_phase = 0
        self.updated_vms = []
        self.rollback_snapshots = {}
    
    def plan_rollout(self, new_template):
        """Plan progressive rollout of new VM template"""
        total = len(self.vm_pool)
        plan = []
        
        for phase in self.phases:
            count = max(1, int(total * phase["pct"] / 100))
            plan.append({
                "phase": phase["name"],
                "vms_to_update": count,
                "monitoring_duration_min": phase["duration_min"],
                "auto_proceed": phase["pct"] < 50,
            })
        
        return {
            "new_template": new_template,
            "total_vms": total,
            "phases": plan,
            "estimated_total_time_min": sum(p["duration_min"] for p in self.phases),
        }
    
    def execute_phase(self, phase_idx, new_template):
        """Execute a rollout phase"""
        phase = self.phases[phase_idx]
        count = max(1, int(len(self.vm_pool) * phase["pct"] / 100))
        
        vms_to_update = [
            vm for vm in self.vm_pool
            if vm["name"] not in self.updated_vms
        ][:count - len(self.updated_vms)]
        
        results = []
        for vm in vms_to_update:
            # 1. Create snapshot before update
            self.rollback_snapshots[vm["name"]] = {
                "snapshot_id": f"snap-{vm['id']}",
                "template": vm["template"],
            }
            
            # 2. Update VM
            results.append({
                "vm": vm["name"],
                "action": "updated",
                "old_template": vm["template"],
                "new_template": new_template,
                "snapshot_created": True,
            })
            self.updated_vms.append(vm["name"])
        
        return {
            "phase": phase["name"],
            "vms_updated": len(results),
            "total_updated": len(self.updated_vms),
            "total_vms": len(self.vm_pool),
            "details": results,
        }
    
    def check_health(self):
        """Check health of updated VMs"""
        healthy = len(self.updated_vms)
        unhealthy = 0
        
        health_metrics = {
            "total_checked": healthy + unhealthy,
            "healthy": healthy,
            "unhealthy": unhealthy,
            "health_pct": round(healthy / max(healthy + unhealthy, 1) * 100, 1),
            "cpu_avg_pct": 35.2,
            "memory_avg_pct": 52.8,
            "error_rate_pct": 0.1,
        }
        
        return {
            "passed": health_metrics["error_rate_pct"] < 1.0,
            "metrics": health_metrics,
        }
    
    def rollback(self, vm_names=None):
        """Rollback VMs to pre-update snapshot"""
        targets = vm_names or self.updated_vms
        results = []
        
        for name in targets:
            snap = self.rollback_snapshots.get(name)
            if snap:
                results.append({
                    "vm": name,
                    "action": "rollback",
                    "restored_template": snap["template"],
                })
        
        return {
            "rolled_back": len(results),
            "details": results,
        }

# Example
vm_pool = [
    {"id": f"vm-{i:03d}", "name": f"web-{i:02d}", "status": "up",
     "template": "almalinux9-v1", "memory": 4096, "cpu_cores": 2}
    for i in range(1, 11)
]

pd = ProgressiveVMDelivery(vm_pool)
plan = pd.plan_rollout("almalinux9-v2")
print("Plan:", json.dumps(plan, indent=2))

phase1 = pd.execute_phase(0, "almalinux9-v2")
print("\nPhase 1:", json.dumps(phase1, indent=2))

health = pd.check_health()
print("\nHealth:", json.dumps(health, indent=2))
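Tying the pieces together: a rollout driver loops over the phases, gates each one on the health check, and falls back to rollback on the first failed gate. A condensed standalone sketch; health results are injected here so the control flow is visible, whereas in practice they would come from check_health:

```python
# Standalone sketch of the phase-gate control loop: advance while healthy,
# roll back everything touched so far on the first failed health gate.
def run_rollout(phases, health_results):
    updated, log = [], []
    for phase, healthy in zip(phases, health_results):
        updated.append(phase["name"])
        log.append(f"executed {phase['name']} ({phase['pct']}%)")
        if not healthy:
            log.append(f"health gate failed at {phase['name']}; rolling back {updated}")
            return {"status": "rolled_back", "completed": updated, "log": log}
    return {"status": "success", "completed": updated, "log": log}

phases = [
    {"name": "canary", "pct": 10},
    {"name": "early", "pct": 25},
    {"name": "half", "pct": 50},
    {"name": "full", "pct": 100},
]

# Healthy canary and early phases, then a failure at the 50% phase:
result = run_rollout(phases, [True, True, False, True])
print(result["status"])   # rolled_back
```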

High Availability and Live Migration

HA and migration for oVirt:

# === oVirt HA and Live Migration ===

# 1. Enable VM High Availability
# ===================================
# In oVirt Web UI:
# Compute > Virtual Machines > Edit VM > High Availability
# - Highly Available: Yes
# - Priority: High (1 = Low, 100 = High)
# - Resume Behavior: AUTO_RESUME
# - Migration Policy: Cluster Default
#
# Via REST API:
# PUT /ovirt-engine/api/vms/{vm_id}
# {
#   "high_availability": {
#     "enabled": true,
#     "priority": 100
#   }
# }

# 2. Configure Fencing (STONITH)
# ===================================
# Required for HA to work properly
# Compute > Hosts > Edit > Power Management
# Type: ipmilan / ilo / drac5
# Address: 192.168.1.201 (IPMI address)
# Username: admin
# Password: secret
# Options: lanplus=1
#
# Test fencing:
# Compute > Hosts > Right-click > Confirm Host has been Rebooted

# 3. Live Migration
# ===================================
# Requirements:
# - Shared storage (NFS, iSCSI, GlusterFS)
# - Same CPU family on source and destination
# - Network connectivity between hosts
#
# Migrate VM:
# Compute > Virtual Machines > Right-click > Migrate
# Select target host or "Automatic"
#
# CLI:
# curl -X POST \
#   "https://engine.example.com/ovirt-engine/api/vms/{vm_id}/migrate" \
#   -H "Content-Type: application/xml" \
#   -u "admin@internal:password" \
#   -d '<action><host><name>host02</name></host></action>'

# 4. Scheduling Policies
# ===================================
# Cluster > Edit > Scheduling Policy
#
# Available policies:
# - evenly_distributed: Balance VMs across hosts by CPU/memory
# - power_saving: Consolidate VMs to fewer hosts (save power)
# - vm_evenly_distributed: Even number of VMs per host
# - cluster_maintenance: Migrate all VMs off a host
#
# Custom properties:
# CpuOverCommitDurationMinutes: 10
# HighUtilization: 80
# LowUtilization: 20

# 5. Affinity Groups
# ===================================
# Control VM placement:
# Compute > Affinity Groups > New
#
# Positive affinity: VMs should run on same host
# (e.g., app + cache for low latency)
#
# Negative affinity: VMs should NOT run on same host
# (e.g., web-01 and web-02 for HA)
#
# Host affinity: VMs should/must run on specific hosts
# (e.g., GPU VMs on GPU hosts)

# 6. Maintenance Mode
# ===================================
# Before host maintenance:
# Compute > Hosts > Right-click > Maintenance
# This live-migrates all VMs to other hosts
#
# After maintenance:
# Compute > Hosts > Right-click > Activate

echo "HA and migration configured"
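Negative affinity rules like "web-01 and web-02 must not share a host" can be validated before a migration or placement is accepted. A sketch of that check over a placement map; VM and host names are illustrative, and the Engine's scheduler performs this enforcement itself:

```python
# Detect negative-affinity violations: pairs of VMs that the rules say
# must NOT share a host but currently do.
def affinity_violations(placement, negative_pairs):
    """placement: {vm_name: host_name}; negative_pairs: [(vm_a, vm_b), ...]"""
    violations = []
    for a, b in negative_pairs:
        if a in placement and b in placement and placement[a] == placement[b]:
            violations.append((a, b, placement[a]))
    return violations

placement = {"web-01": "host01", "web-02": "host01", "db-01": "host02"}
rules = [("web-01", "web-02"), ("web-01", "db-01")]
print(affinity_violations(placement, rules))  # [('web-01', 'web-02', 'host01')]
```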

Monitoring and Automation

Monitoring the oVirt platform:

#!/usr/bin/env python3
# ovirt_monitor.py — oVirt Monitoring
import json
import logging
from datetime import datetime
from typing import Dict, List

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("monitor")

class OVirtMonitor:
    def __init__(self):
        self.metrics = {}
    
    def collect_host_metrics(self):
        return [
            {"name": "host01", "status": "up", "cpu_pct": 45, "memory_pct": 62, "vms": 8, "storage_pct": 55},
            {"name": "host02", "status": "up", "cpu_pct": 38, "memory_pct": 48, "vms": 6, "storage_pct": 55},
            {"name": "host03", "status": "up", "cpu_pct": 72, "memory_pct": 78, "vms": 12, "storage_pct": 55},
        ]
    
    def collect_vm_metrics(self):
        return {
            "total": 26,
            "running": 24,
            "stopped": 2,
            "migrating": 0,
            "avg_cpu_pct": 42,
            "avg_memory_pct": 58,
        }
    
    def collect_storage_metrics(self):
        return [
            {"name": "nfs-data", "type": "NFS", "total_gb": 2000, "used_gb": 1100, "used_pct": 55},
            {"name": "iscsi-fast", "type": "iSCSI", "total_gb": 500, "used_gb": 320, "used_pct": 64},
        ]
    
    def check_alerts(self, hosts, vms, storage):
        alerts = []
        
        for host in hosts:
            if host["cpu_pct"] > 80:
                alerts.append({"severity": "high", "target": host["name"], "msg": f"CPU {host['cpu_pct']}%"})
            if host["memory_pct"] > 85:
                alerts.append({"severity": "high", "target": host["name"], "msg": f"Memory {host['memory_pct']}%"})
            if host["status"] != "up":
                alerts.append({"severity": "critical", "target": host["name"], "msg": f"Host {host['status']}"})
        
        for stor in storage:
            if stor["used_pct"] > 80:
                alerts.append({"severity": "high", "target": stor["name"], "msg": f"Storage {stor['used_pct']}%"})
        
        return {
            "total_alerts": len(alerts),
            "critical": sum(1 for a in alerts if a["severity"] == "critical"),
            "high": sum(1 for a in alerts if a["severity"] == "high"),
            "alerts": alerts,
        }
    
    def capacity_planning(self, hosts):
        total_cpu = len(hosts) * 100
        used_cpu = sum(h["cpu_pct"] for h in hosts)
        total_mem = len(hosts) * 100
        used_mem = sum(h["memory_pct"] for h in hosts)
        
        cpu_headroom = (total_cpu - used_cpu) / total_cpu * 100
        mem_headroom = (total_mem - used_mem) / total_mem * 100
        
        return {
            "hosts": len(hosts),
            "total_vms": sum(h["vms"] for h in hosts),
            "avg_cpu_pct": round(used_cpu / len(hosts), 1),
            "avg_memory_pct": round(used_mem / len(hosts), 1),
            "cpu_headroom_pct": round(cpu_headroom, 1),
            "memory_headroom_pct": round(mem_headroom, 1),
            "can_add_vms": int(min(cpu_headroom, mem_headroom) / 10),
            "recommendation": "add_host" if min(cpu_headroom, mem_headroom) < 20 else "optimal",
        }

monitor = OVirtMonitor()
hosts = monitor.collect_host_metrics()
vms = monitor.collect_vm_metrics()
storage = monitor.collect_storage_metrics()
alerts = monitor.check_alerts(hosts, vms, storage)
capacity = monitor.capacity_planning(hosts)

print("Hosts:", json.dumps(hosts, indent=2))
print("Alerts:", json.dumps(alerts, indent=2))
print("Capacity:", json.dumps(capacity, indent=2))
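The evenly_distributed scheduling policy mentioned earlier boils down to placing each new VM on the host with the most free headroom, subject to the HighUtilization cap. A simplified selector over the same host-metrics shape used above; the real scheduler also weights pending migrations, NUMA, and custom properties:

```python
# Simplified evenly_distributed placement: pick the "up" host with the
# lowest combined CPU/memory utilization, skipping hosts at or above the
# high-utilization threshold.
def pick_host(hosts, high_utilization=80):
    candidates = [
        h for h in hosts
        if h["status"] == "up"
        and h["cpu_pct"] < high_utilization
        and h["memory_pct"] < high_utilization
    ]
    if not candidates:
        return None  # no host has headroom; time to add capacity
    return min(candidates, key=lambda h: h["cpu_pct"] + h["memory_pct"])["name"]

hosts = [
    {"name": "host01", "status": "up", "cpu_pct": 45, "memory_pct": 62},
    {"name": "host02", "status": "up", "cpu_pct": 38, "memory_pct": 48},
    {"name": "host03", "status": "up", "cpu_pct": 72, "memory_pct": 78},
]
print(pick_host(hosts))  # host02
```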

FAQ — Frequently Asked Questions

Q: How does oVirt differ from Proxmox VE?

A: oVirt is enterprise-grade, designed for large-scale datacenter management, with features such as scheduling policies, affinity groups, a data warehouse, and Grafana dashboards; it suits organizations that want a free VMware alternative. Proxmox VE is easier to use, has a better web UI, installs faster, and has built-in container support (LXC), making it a good fit for SMBs and home labs. Both use the KVM hypervisor. Choose oVirt for enterprises that need advanced scheduling and Red Hat ecosystem integration; choose Proxmox for simplicity and ease of use.

Q: How does progressive delivery on VMs differ from Kubernetes?

A: Kubernetes has built-in rolling updates and canary deployments through the Deployment resource, and service meshes (Istio) make traffic splitting easy. On VMs you implement it yourself: snapshots for rollback, templates for versioning, and scripts for orchestration. The advantages of VMs are stronger isolation, easier handling of stateful workloads, and support for legacy applications that cannot be containerized. Ansible or Terraform with the oVirt provider is recommended for automating progressive delivery on VMs.

Q: Which oVirt storage is recommended?

A: NFS is the simplest and suits getting started and small-to-medium deployments; performance is good on a 10GbE network. iSCSI outperforms NFS for random I/O and suits database VMs. GlusterFS is distributed storage that scales using the hosts' own disks, with no dedicated storage server required. Ceph RBD offers the best performance and the greatest scale but is the most complex to set up. For production: NFS for general workloads, iSCSI for databases, Ceph for large-scale deployments.

Q: How long does live migration take?

A: It depends on the VM's memory size and network bandwidth. A VM with 4 GB of memory on a 10GbE network takes about 5-15 seconds; a 16 GB VM may take 30-60 seconds. VMs with a high dirty-memory rate (databases, in-memory caches) can take longer, because memory pages that change during the migration must be copied again. Convergence policies help: they set a maximum migration time and throttle the VM's CPU beyond it to reduce dirty pages. The actual downtime (the switchover) is only 10-100 milliseconds, which users barely notice.
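The numbers in the answer above follow from a simple model: transfer time is roughly memory divided by effective bandwidth, inflated by re-copying dirtied pages. A back-of-envelope calculator; the efficiency and dirty-page multipliers are rough assumptions for illustration, not an oVirt formula:

```python
# Rough live-migration time estimate: memory over effective link bandwidth,
# multiplied to account for re-copying dirtied pages. Illustrative model only.
def estimate_migration_s(memory_gb, link_gbps=10, efficiency=0.7, dirty_factor=1.5):
    effective_gbps = link_gbps * efficiency      # protocol/CPU overhead (assumed)
    base_s = memory_gb * 8 / effective_gbps      # GB -> gigabits transfer time
    return round(base_s * dirty_factor, 1)       # extra passes for dirty pages

for mem in (4, 16):
    print(f"{mem} GB VM over 10GbE: ~{estimate_migration_s(mem)} s")
# 4 GB VM over 10GbE: ~6.9 s
# 16 GB VM over 10GbE: ~27.4 s
```

These figures land inside the ranges quoted above; a higher dirty_factor models write-heavy VMs that need more copy passes.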
