SiamCafe.net Blog
Technology

Multus CNI Backup Recovery Strategy

multus cni backup recovery strategy
Multus CNI Backup Recovery Strategy | SiamCafe Blog
2025-09-22· อ. บอม — SiamCafe.net· 11,748 คำ

Multus CNI Backup

Multus CNI Backup Recovery NetworkAttachmentDefinition IPAM SR-IOV Macvlan Disaster Recovery Configuration Export Git Production Operations

Backup ItemLocationMethodFrequencyPriority
NAD ResourcesKubernetes APIkubectl export / VeleroEvery change + dailyCritical
CNI Config Files/etc/cni/net.d/Ansible collect / rsyncDailyHigh
IPAM DatabaseWhereabouts CRDkubectl exportHourlyHigh
SR-IOV ConfigConfigMap + Nodekubectl + AnsibleEvery changeHigh
Helm ValuesGit repositoryGit commitEvery changeMedium
Pod AnnotationsPod specskubectl exportWith deploymentMedium

Backup Procedures

# === Multus CNI Backup Script ===

# Export all NetworkAttachmentDefinitions
# kubectl get network-attachment-definitions -A -o yaml > backup/nad-all.yaml

# Export per namespace
# for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
#   kubectl get nad -n $ns -o yaml > backup/nad-$ns.yaml 2>/dev/null
# done

# Backup CNI config from all nodes
# for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
#   ssh $node "cat /etc/cni/net.d/00-multus.conf" > backup/multus-$node.conf
#   ssh $node "ls /etc/cni/net.d/" > backup/cni-files-$node.txt
# done

# Backup Whereabouts IPAM
# kubectl get ippools.whereabouts.cni.cncf.io -A -o yaml > backup/whereabouts-pools.yaml
# kubectl get overlappingrangeipreservations.whereabouts.cni.cncf.io -A -o yaml > backup/whereabouts-reservations.yaml

# Backup SR-IOV config
# kubectl get sriovnetworknodepolicies.sriovnetwork.openshift.io -A -o yaml > backup/sriov-policies.yaml
# kubectl get sriovnetworks.sriovnetwork.openshift.io -A -o yaml > backup/sriov-networks.yaml

# Velero backup with Multus resources
# velero backup create multus-backup \
#   --include-resources networkattachmentdefinitions, ippools, sriovnetworks \
#   --include-namespaces default, production, telco

from dataclasses import dataclass

@dataclass
class BackupJob:
    name: str
    command: str
    schedule: str
    retention: str
    storage: str

jobs = [
    BackupJob("NAD Export", "kubectl get nad -A -o yaml", "Daily 02:00", "30 days", "Git + S3"),
    BackupJob("IPAM State", "kubectl get ippools -A -o yaml", "Hourly", "7 days", "S3"),
    BackupJob("CNI Config", "Ansible collect /etc/cni/net.d/", "Daily 03:00", "30 days", "Git + S3"),
    BackupJob("SR-IOV Policy", "kubectl get sriovnetworknodepolicies -o yaml", "On change", "30 days", "Git"),
    BackupJob("Velero Full", "velero backup create", "Daily 04:00", "14 days", "S3"),
    BackupJob("Helm Values", "git commit values.yaml", "On change", "Forever", "Git"),
]

print("=== Backup Jobs ===")
for j in jobs:
    print(f"  [{j.name}] Schedule: {j.schedule}")
    print(f"    Command: {j.command}")
    print(f"    Retention: {j.retention} | Storage: {j.storage}")

Recovery Procedures

# === Disaster Recovery Runbook ===

@dataclass
class RecoveryStep:
    order: int
    step: str
    command: str
    verify: str
    rollback: str

steps = [
    RecoveryStep(1, "Install Multus CNI",
        "helm install multus -f values.yaml",
        "kubectl get pods -n kube-system | grep multus",
        "helm uninstall multus"),
    RecoveryStep(2, "Restore NAD resources",
        "kubectl apply -f backup/nad-all.yaml",
        "kubectl get nad -A",
        "kubectl delete -f backup/nad-all.yaml"),
    RecoveryStep(3, "Restore IPAM state",
        "kubectl apply -f backup/whereabouts-pools.yaml",
        "kubectl get ippools -A",
        "kubectl delete ippools -A"),
    RecoveryStep(4, "Restore SR-IOV config",
        "kubectl apply -f backup/sriov-policies.yaml",
        "kubectl get sriovnetworknodepolicies -A",
        "kubectl delete -f backup/sriov-policies.yaml"),
    RecoveryStep(5, "Verify CNI config on nodes",
        "ansible all -m shell -a 'cat /etc/cni/net.d/00-multus.conf'",
        "Compare with backup config",
        "Ansible push backup config"),
    RecoveryStep(6, "Restart affected pods",
        "kubectl rollout restart deployment -n production",
        "kubectl get pods -n production -o wide (check IPs)",
        "kubectl rollout undo deployment -n production"),
    RecoveryStep(7, "Network connectivity test",
        "kubectl exec test-pod -- ping -c 3 10.0.1.1",
        "All pings succeed, no packet loss",
        "Check NAD config, IPAM, routes"),
]

print("=== Recovery Runbook ===")
for s in steps:
    print(f"  Step {s.order}: {s.step}")
    print(f"    Command: {s.command}")
    print(f"    Verify: {s.verify}")
    print(f"    Rollback: {s.rollback}")

Automation and Testing

# === Automated DR Testing ===

# CronJob for automated backup verification
# apiVersion: batch/v1
# kind: CronJob
# metadata:
#   name: multus-backup-verify
# spec:
#   schedule: "0 6 * * 1"  # Every Monday 6:00
#   jobTemplate:
#     spec:
#       template:
#         spec:
#           containers:
#             - name: verify
#               image: bitnami/kubectl
#               command:
#                 - /bin/bash
#                 - -c
#                 - |
#                   # Count NADs
#                   CURRENT=$(kubectl get nad -A --no-headers | wc -l)
#                   BACKUP=$(grep "kind: NetworkAttachmentDefinition" /backup/nad-all.yaml | wc -l)
#                   if [ "$CURRENT" != "$BACKUP" ]; then
#                     echo "WARNING: NAD count mismatch current=$CURRENT backup=$BACKUP"
#                     # Send alert
#                   fi
#                   # Verify IPAM
#                   kubectl get ippools -A --no-headers | wc -l

@dataclass
class DRTest:
    test: str
    frequency: str
    method: str
    pass_criteria: str
    last_result: str

tests = [
    DRTest("NAD backup completeness", "Weekly", "Compare NAD count backup vs live",
        "Count matches, all NADs in backup", "PASS"),
    DRTest("IPAM state consistency", "Daily", "Compare IP assignments backup vs live",
        "No orphaned IPs, no conflicts", "PASS"),
    DRTest("Restore to test cluster", "Monthly", "Apply backup to test cluster, deploy pods",
        "Pods get correct IPs, traffic flows", "PASS"),
    DRTest("SR-IOV recovery", "Quarterly", "Restore SR-IOV config, verify VF binding",
        "VFs allocated, pod network works", "PASS"),
    DRTest("Full DR drill", "Quarterly", "Simulate cluster loss, full recovery",
        "All networks restored < 2 hours", "PASS"),
]

print("DR Test Schedule:")
for t in tests:
    print(f"  [{t.test}] Frequency: {t.frequency}")
    print(f"    Method: {t.method}")
    print(f"    Pass: {t.pass_criteria} | Last: {t.last_result}")

rto_rpo = {
    "NAD Resources": "RTO: 15 min | RPO: Last change (Git)",
    "IPAM State": "RTO: 30 min | RPO: 1 hour",
    "SR-IOV Config": "RTO: 1 hour | RPO: Last change (Git)",
    "Full Network Recovery": "RTO: 2 hours | RPO: 1 hour",
    "Pod Network Connectivity": "RTO: 30 min (after infra) | RPO: N/A",
}

print(f"\n\nRTO/RPO Targets:")
for k, v in rto_rpo.items():
    print(f"  [{k}]: {v}")

เคล็ดลับ

Multus CNI คืออะไร

Meta CNI Plugin Multi-homed Pod หลาย Network Interface Calico Flannel SR-IOV Macvlan NAD Telco NFV Storage Management Network Traffic

ทำไมต้อง Backup Multus Config

NAD Custom Resource IPAM IP Assignment Conflict SR-IOV Hardware Disaster Recovery Configuration Drift Pod Annotation Network Config Git

Backup อะไรบ้าง

NAD kubectl yaml IPAM /etc/cni/net.d/ Multus Config SR-IOV Device Plugin ConfigMap Whereabouts Pod Annotations Helm Values Git

Recovery ทำอย่างไร

Install Multus Apply NAD Restore IPAM Whereabouts SR-IOV Verify CNI Restart Pods Network Test Ping Smoke Test Policy RTO 2 hours

สรุป

Multus CNI Backup Recovery NAD IPAM SR-IOV Velero Git Disaster Recovery Runbook Automated Testing RTO RPO Network Connectivity Production Operations

📖 บทความที่เกี่ยวข้อง

Multus CNI Cloud Native Designอ่านบทความ → Multus CNI High Availability HA Setupอ่านบทความ → Multus CNI SaaS Architectureอ่านบทความ → Multus CNI Metric Collectionอ่านบทความ → Multus CNI Edge Computingอ่านบทความ →

📚 ดูบทความทั้งหมด →