oVirt High Availability
oVirt is an open source virtualization platform built on the KVM hypervisor. It manages virtual machines, storage, and networks through a web-based management interface. High Availability (HA) in oVirt keeps critical VMs running: when the host they run on fails, oVirt automatically restarts those VMs on another host.
oVirt HA is typically deployed with the hosted engine architecture, in which the oVirt Engine (the management plane) runs as a VM inside the cluster itself. If the host running the Engine fails, agents on the other hosts detect the failure and restart the Engine VM on another host automatically. For workload VMs, a fencing mechanism confirms the host failure before the VMs are restarted on other hosts.
The main components of oVirt HA: the Hosted Engine Agent, which manages the Engine VM; VDSM (Virtual Desktop and Server Manager), which manages VMs on each host; Fencing Agents, which confirm host failures; Shared Storage, which holds VM disks accessible from every host; and the SPM (Storage Pool Manager), which coordinates storage operations.
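As a rough illustration of how these components cooperate (a simplified sketch, not the Engine's real internals; the types, statuses, and placement logic below are assumptions), the failure-handling flow can be modeled as:

```python
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    alive: bool = True
    fenced: bool = False

@dataclass
class VM:
    name: str
    ha_enabled: bool
    host: str

def handle_host_failure(failed, vms, spare_hosts):
    """Fence the failed host first, then restart only HA-flagged VMs elsewhere."""
    actions = []
    failed.fenced = True  # fencing (e.g. IPMI power-off) confirms the host is really down
    actions.append(f"fence {failed.name}")
    for vm in vms:
        if vm.host == failed.name and vm.ha_enabled:
            target = next(h for h in spare_hosts if h.alive)  # naive placement
            vm.host = target.name
            actions.append(f"restart {vm.name} on {target.name}")
    return actions

# A failed host carrying one HA VM and one non-HA VM:
failed = Host("host01", alive=False)
vms = [VM("web-01", True, "host01"), VM("scratch-01", False, "host01")]
print(handle_host_failure(failed, vms, [Host("host02")]))
# ['fence host01', 'restart web-01 on host02']
```

Note the ordering: fencing always precedes restart, and non-HA VMs are left down until an administrator intervenes.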
Installing the oVirt Engine and Hosts
Set up an oVirt cluster for HA
# === oVirt Installation ===
# 1. Install oVirt Engine (on CentOS Stream 9)
# Add oVirt repository
dnf install -y centos-release-ovirt45
dnf module enable -y pki-deps postgresql:15
# Install Engine
dnf install -y ovirt-engine
# 2. Configure Engine
engine-setup
# Answer the prompts:
# Configure Engine: Yes
# Configure Data Warehouse: Yes
# Configure Grafana: Yes
# Application mode: Both (Virt + Gluster)
# Firewall manager: firewalld
# Host FQDN: engine.example.com
# Organization name: MyOrg
# Admin password: ********
# Database: Local
# Default SAN wipe after delete: No
# NFS ISO domain: Yes (optional)
# 3. Install oVirt Node on Hosts
# Option A: oVirt Node (minimal OS image)
# Download oVirt Node ISO and install on bare metal
# Option B: Install on existing CentOS
dnf install -y centos-release-ovirt45
dnf install -y ovirt-host
# 4. Add Host to Cluster via Engine
# Web UI: https://engine.example.com
# Compute → Hosts → New
# Name: host01.example.com
# Hostname: host01.example.com
# Authentication: SSH Public Key or Password
# 5. Configure Network
# Create ovirtmgmt bridge (automatic during host add)
# Add additional networks:
# - storage: for NFS/iSCSI traffic
# - migration: for live migration traffic (dedicated NIC recommended)
# - vm: for VM traffic
# 6. Hosted Engine Deploy (self-hosted)
hosted-engine --deploy
# Answer prompts:
# Storage type: NFS
# Storage connection: nfs.example.com:/hosted-engine
# VM disk size: 80 GB
# VM memory: 16384 MB
# VM CPUs: 4
# Engine FQDN: engine.example.com
# Admin password: ********
# 7. Add Additional Hosts
# Each host runs hosted-engine agent
# On each additional host:
hosted-engine --deploy
# Select: Additional host
echo "oVirt Engine and Hosts installed"
Configure HA for VMs
Enable High Availability for virtual machines
# === VM High Availability Configuration ===
# 1. Via oVirt Engine REST API
# Enable HA for a VM
curl -k -u admin@internal:password \
-H "Content-Type: application/xml" \
-X PUT \
"https://engine.example.com/ovirt-engine/api/vms/VM_ID" \
-d '<vm>
      <high_availability>
        <enabled>true</enabled>
        <priority>100</priority>
      </high_availability>
    </vm>'
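If you are scripting against the API without curl, the same XML body can be assembled with the standard library (a sketch; request sending and authentication are omitted here):

```python
import xml.etree.ElementTree as ET

def ha_payload(enabled=True, priority=100):
    """Build the <vm> body for PUT /ovirt-engine/api/vms/{id}."""
    vm = ET.Element("vm")
    ha = ET.SubElement(vm, "high_availability")
    ET.SubElement(ha, "enabled").text = str(enabled).lower()
    ET.SubElement(ha, "priority").text = str(priority)
    return ET.tostring(vm, encoding="unicode")

print(ha_payload())
# <vm><high_availability><enabled>true</enabled><priority>100</priority></high_availability></vm>
```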
# 2. Via Ansible (recommended for automation)
cat > ovirt_ha_setup.yml << 'EOF'
---
- name: Configure oVirt HA
hosts: localhost
connection: local
gather_facts: false
vars:
engine_url: https://engine.example.com/ovirt-engine/api
engine_user: admin@internal
engine_password: "{{ vault_engine_password }}"
engine_cafile: /etc/pki/ovirt-engine/ca.pem
tasks:
- name: Login to oVirt
ovirt_auth:
url: "{{ engine_url }}"
username: "{{ engine_user }}"
password: "{{ engine_password }}"
ca_file: "{{ engine_cafile }}"
register: ovirt_auth
- name: Create HA VM
ovirt_vm:
auth: "{{ ovirt_auth.ovirt_auth }}"
name: web-server-01
cluster: Default
template: centos9-template
memory: 4GiB
cpu_cores: 2
cpu_sockets: 1
high_availability: true
high_availability_priority: 100
operating_system: rhel_9x64
type: server
state: running
nics:
- name: nic1
profile_name: ovirtmgmt
disks:
- name: web-server-01-disk
size: 50GiB
storage_domain: data-nfs
interface: virtio_scsi
- name: Configure VM HA Policy
ovirt_vm:
auth: "{{ ovirt_auth.ovirt_auth }}"
name: web-server-01
high_availability: true
high_availability_priority: 100
lease:
storage_domain: data-nfs
placement_policy:
affinity: migratable
hosts:
- host01
- host02
- host03
- name: Logout
ovirt_auth:
ovirt_auth: "{{ ovirt_auth.ovirt_auth }}"
state: absent
EOF
ansible-playbook ovirt_ha_setup.yml --ask-vault-pass
echo "HA VMs configured"
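One thing the `high_availability_priority` value in the playbook above controls is restart ordering: after a failure, higher-priority HA VMs are started first as capacity allows. A minimal sketch (the dict shape is illustrative; 100/50/1 mirror the UI's High/Medium/Low presets):

```python
def restart_order(vms):
    """Sort so that higher ha_priority restarts first after a failure."""
    return [vm["name"] for vm in sorted(vms, key=lambda v: -v["ha_priority"])]

queue = restart_order([
    {"name": "db-01", "ha_priority": 100},   # High preset
    {"name": "batch-01", "ha_priority": 1},  # Low preset
    {"name": "web-01", "ha_priority": 50},   # Medium preset
])
print(queue)  # ['db-01', 'web-01', 'batch-01']
```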
Fencing and Power Management
Configure fencing for host failure detection
# === Fencing Configuration ===
# Fencing is the mechanism that isolates a failed host
# before its VMs are restarted on another host,
# preventing split-brain (two hosts running the same VM)
# 1. IPMI/BMC Fencing (physical servers)
# Configure via Engine UI:
# Compute → Hosts → host01 → Power Management
# Type: ipmilan
# Address: 192.168.1.101 (BMC IP)
# Username: admin
# Password: ********
# Options: lanplus=1
# API equivalent:
curl -k -u admin@internal:password \
-H "Content-Type: application/xml" \
-X POST \
"https://engine.example.com/ovirt-engine/api/hosts/HOST_ID/fenceagents" \
-d '<agent>
      <type>ipmilan</type>
      <address>192.168.1.101</address>
      <username>admin</username>
      <password>password</password>
      <order>1</order>
    </agent>'
# 2. Test Fencing
curl -k -u admin@internal:password \
-H "Content-Type: application/xml" \
-X POST \
"https://engine.example.com/ovirt-engine/api/hosts/HOST_ID/fence" \
-d '<action><fence_type>status</fence_type></action>'
# 3. Fencing Policy (Cluster Level)
# Compute → Clusters → Default → Fencing Policy
# Enable fencing: Yes
# Skip if host has live lease: Yes
# Skip if SD is active: Yes
# Skip if connectivity > threshold: Yes
# 4. Ansible Fencing Setup
cat > fencing_setup.yml << 'EOF'
---
- name: Configure Fencing
hosts: localhost
tasks:
- name: Add fence agent
ovirt_host_pm:
auth: "{{ ovirt_auth.ovirt_auth }}"
name: host01
address: 192.168.1.101
username: admin
password: "{{ vault_bmc_password }}"
type: ipmilan
options:
lanplus: 1
order: 1
state: present
- name: Test fence status
ovirt_host_pm:
auth: "{{ ovirt_auth.ovirt_auth }}"
name: host01
state: status
register: fence_status
- name: Show fence status
debug:
var: fence_status
EOF
echo "Fencing configured"
Storage High Availability
Configure storage HA for oVirt
#!/usr/bin/env python3
# storage_ha.py - oVirt Storage HA Configuration
import json
import logging
from typing import Dict, List
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("storage")
class OVirtStorageHA:
def __init__(self):
self.domains = []
def storage_options(self):
return {
"nfs": {
"description": "NFS shared storage",
"ha_level": "Depends on NFS server HA",
"performance": "Good for general workloads",
"setup": "Simple, widely supported",
"recommendation": "Use HA NFS (DRBD, GlusterFS, or NAS appliance)",
},
"iscsi": {
"description": "iSCSI block storage",
"ha_level": "Depends on target HA (multipath recommended)",
"performance": "Better than NFS for I/O intensive",
"setup": "Moderate complexity",
"recommendation": "Use multipath with 2+ paths for HA",
},
"fc": {
"description": "Fibre Channel SAN",
"ha_level": "Very high (dual fabric)",
"performance": "Best for enterprise workloads",
"setup": "Complex, requires FC switches",
"recommendation": "Enterprise only, dual fabric mandatory",
},
"glusterfs": {
"description": "GlusterFS distributed storage",
"ha_level": "Built-in replication (replica 3)",
"performance": "Good, scales horizontally",
"setup": "Moderate, can use oVirt hosts as storage",
"recommendation": "Good for hyper-converged deployment",
},
}
def glusterfs_setup(self):
return {
"description": "Hyper-converged: oVirt hosts also serve GlusterFS storage",
"minimum_nodes": 3,
"replica_count": 3,
"steps": [
"1. Install GlusterFS on all oVirt hosts",
"2. Create GlusterFS volume with replica 3",
"3. Add as storage domain in oVirt Engine",
"4. VMs stored on replicated GlusterFS volume",
"5. If 1 host fails, data still available on 2 others",
],
"commands": [
"gluster peer probe host02",
"gluster peer probe host03",
"gluster volume create vmstore replica 3 host01:/data/brick host02:/data/brick host03:/data/brick",
"gluster volume start vmstore",
"gluster volume set vmstore group virt",
"gluster volume set vmstore storage.owner-uid 36",
"gluster volume set vmstore storage.owner-gid 36",
],
}
def multipath_config(self):
return {
"description": "iSCSI multipath for storage HA",
"config": {
"file": "/etc/multipath.conf",
"content": {
"defaults": {
"user_friendly_names": "yes",
"find_multipaths": "yes",
"path_grouping_policy": "failover",
"path_selector": "round-robin 0",
"failback": "immediate",
"no_path_retry": 5,
},
},
},
}
storage = OVirtStorageHA()
options = storage.storage_options()
print("Storage Options:")
for name, opt in options.items():
print(f" {name}: {opt['ha_level']}")
gluster = storage.glusterfs_setup()
print(f"\nGlusterFS Setup ({gluster['minimum_nodes']} nodes, replica {gluster['replica_count']})")
for cmd in gluster["commands"][:3]:
print(f" {cmd}")
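The replica-3 recommendation in the GlusterFS setup above comes down to quorum arithmetic: the volume stays writable only while a strict majority of each replica set is reachable. A small sketch:

```python
def has_quorum(replica, bricks_up):
    """Client quorum: a strict majority of the replica set must be reachable."""
    return bricks_up > replica // 2

assert has_quorum(3, 3)      # all bricks up
assert has_quorum(3, 2)      # one host down: volume stays writable
assert not has_quorum(3, 1)  # two hosts down: no quorum
assert not has_quorum(2, 1)  # replica 2: losing a single brick already breaks quorum
```

This is why replica 2 is a poor fit for VM storage: any single host failure costs quorum, while replica 3 rides through one failure.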
Monitoring and Disaster Recovery
Monitor the oVirt cluster and plan disaster recovery
# === oVirt Monitoring & DR ===
# 1. Built-in Grafana Dashboards
# oVirt Engine includes Grafana integration
# Access: https://engine.example.com/ovirt-engine-grafana/
# Dashboards:
# - Executive Dashboard (overview)
# - Inventory Dashboard (hosts, VMs, storage)
# - Service Level Dashboard (uptime, SLA)
# - Trend Dashboard (resource trends)
# 2. Prometheus Monitoring
cat > prometheus-ovirt.yml << 'EOF'
scrape_configs:
- job_name: "ovirt-engine"
metrics_path: /ovirt-engine/services/metrics
static_configs:
- targets: ["engine.example.com:443"]
scheme: https
tls_config:
insecure_skip_verify: true
- job_name: "ovirt-hosts"
static_configs:
- targets:
- "host01:9100"
- "host02:9100"
- "host03:9100"
EOF
# 3. Health Check Script
cat > ovirt_health.sh << 'BASH'
#!/bin/bash
# oVirt Cluster Health Check
ENGINE_URL="https://engine.example.com/ovirt-engine/api"
AUTH="admin@internal:password"
echo "=== oVirt Cluster Health ==="
# Check hosts status
echo "Hosts:"
curl -sk -u $AUTH "$ENGINE_URL/hosts" \
-H "Accept: application/json" | \
python3 -c "
import sys, json
data = json.load(sys.stdin)
for h in data.get('host', []):
print(f\" {h['name']}: {h['status']}\")"
# Check VMs with HA enabled
echo -e "\nHA VMs:"
curl -sk -u $AUTH "$ENGINE_URL/vms?search=ha_enabled%3Dtrue" \
-H "Accept: application/json" | \
python3 -c "
import sys, json
data = json.load(sys.stdin)
for vm in data.get('vm', []):
print(f\" {vm['name']}: {vm['status']}\")"
# Check storage domains
echo -e "\nStorage Domains:"
curl -sk -u $AUTH "$ENGINE_URL/storagedomains" \
-H "Accept: application/json" | \
python3 -c "
import sys, json
data = json.load(sys.stdin)
for sd in data.get('storage_domain', []):
avail = int(sd.get('available', 0)) // (1024**3)
used = int(sd.get('used', 0)) // (1024**3)
print(f\" {sd['name']}: {sd['status']} (used: {used}GB, avail: {avail}GB)\")"
BASH
chmod +x ovirt_health.sh
# 4. Backup Engine
# Full backup:
engine-backup --mode=backup --file=engine-backup.tar.gz --log=backup.log
# Restore:
engine-backup --mode=restore --file=engine-backup.tar.gz --log=restore.log
# Schedule daily backup:
echo "0 2 * * * root engine-backup --mode=backup --file=/backup/engine-\$(date +\%Y\%m\%d).tar.gz --log=/var/log/engine-backup.log" > /etc/cron.d/ovirt-backup
echo "Monitoring and DR configured"
FAQ: Frequently Asked Questions
Q: How does oVirt compare with VMware vSphere?
A: oVirt is open source with no license cost; it uses the KVM hypervisor (built into the Linux kernel) and relies mainly on community support, with commercial support available from Red Hat (as RHV, formerly RHEV). It suits limited budgets and teams with Linux skills. VMware vSphere is a commercial product with license costs but more polished features (vMotion, DRS, vSAN), strong vendor support, and a large ecosystem; it suits enterprises that need full support. Performance is comparable, since both KVM and ESXi are mature hypervisors. Interest in oVirt has grown since Broadcom changed VMware's licensing.
Q: Is fencing required for HA?
A: Effectively yes. Fencing is a requirement for reliable VM HA. Without fencing, when a host fails oVirt cannot safely restart its VMs on another host, because it cannot be certain the host is actually down rather than merely unreachable. Restarting anyway risks split-brain (two copies of the same VM running at once, causing data corruption). The fencing agent powers off the failed host first, after which the VMs can be restarted safely. Common fencing options are IPMI/BMC for physical servers and power-switch or PDU-based fencing. Without fencing hardware there is SSH fencing, but it is not recommended for production.
Q: How does live migration work?
A: Live migration moves a running VM from one host to another with no downtime. The steps: 1) copy memory pages from source to destination while the VM keeps running; 2) copy dirty pages (pages modified after being copied), repeating until few remain; 3) pause the VM briefly (typically milliseconds) to copy the last dirty pages; 4) resume the VM on the destination; 5) clean up on the source. Requirements: both hosts must access the same shared storage, have compatible CPUs, and have network connectivity to each other. A dedicated migration network (10 Gbps) is recommended for large VMs.
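The convergence behavior of steps 1-3 can be simulated in a few lines (a toy model: the dirty rate and pause budget are made-up parameters, and it assumes copying outpaces dirtying, i.e. a dirty rate below 1):

```python
def precopy_rounds(total_pages, dirty_rate, pause_budget):
    """Rounds of pre-copy before the final brief pause (assumes dirty_rate < 1)."""
    remaining = total_pages
    rounds = 0
    while remaining > pause_budget:
        rounds += 1
        remaining = int(remaining * dirty_rate)  # pages re-dirtied during this round
    return rounds

# 1M pages, 10% re-dirtied per round, pause can absorb 1000 pages:
print(precopy_rounds(1_000_000, 0.10, 1_000))  # 3
```

When the dirty rate approaches 1 the loop would never converge; real hypervisors handle that case with throttling (auto-converge) or by aborting the migration.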
Q: What happens to the Hosted Engine when the host running the Engine fails?
A: The Hosted Engine Agent on every hosted-engine host continuously monitors the Engine VM. If the host running the Engine fails, the agents on the other hosts detect it, pick the most suitable host (score-based), and start the Engine VM there. The failover typically takes about 3-5 minutes. During that window the workload VMs keep running normally; only management operations are unavailable. At least 3 hosted-engine hosts are recommended for adequate redundancy.
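That score-based selection can be sketched as follows. The 3400 base matches the healthy score the hosted-engine agent commonly reports, but the penalty values and inputs here are illustrative assumptions, not the agent's real weights:

```python
BASE_SCORE = 3400  # a fully healthy hosted-engine host reports 3400

def host_score(gateway_ok, mem_free_mb):
    """Illustrative scoring; the real agent uses its own penalty weights."""
    score = BASE_SCORE
    if not gateway_ok:
        score -= 1600    # made-up penalty: default gateway unreachable
    if mem_free_mb < 16384:
        score -= 800     # made-up penalty: not enough free RAM for the Engine VM
    return score

hosts = {
    "host01": host_score(gateway_ok=False, mem_free_mb=32768),  # 1800
    "host02": host_score(gateway_ok=True, mem_free_mb=8192),    # 2600
    "host03": host_score(gateway_ok=True, mem_free_mb=32768),   # 3400
}
print(max(hosts, key=hosts.get))  # host03
```

The agents publish their scores through the shared hosted-engine storage, so the election works even when the Engine itself is down.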
