What Is SASE and Why Does It Need Disaster Recovery
SASE (Secure Access Service Edge) is a cloud-native architecture that combines network services (SD-WAN, WAN optimization) with security services (SWG, CASB, FWaaS, ZTNA) in a single platform, so users can reach applications securely from any location and any device.
The main SASE components are SD-WAN, which manages WAN connectivity; Secure Web Gateway (SWG), which filters web traffic; Cloud Access Security Broker (CASB), which controls SaaS access; Firewall as a Service (FWaaS), which delivers firewalling from the cloud; and Zero Trust Network Access (ZTNA), which verifies every access request.
Disaster Recovery (DR) matters for SASE because SASE is the single point through which all traffic passes. If SASE goes down, no user can reach any application. A DR plan must cover network connectivity, security policies, identity management, and data protection.
The major SASE providers are Zscaler, Palo Alto Prisma SASE, Cloudflare One, Netskope, and Fortinet FortiSASE. Their DR capabilities differ, so a DR plan also has to account for vendor-specific features.
Designing a SASE Architecture for DR
A SASE architecture that supports disaster recovery
# === SASE DR Architecture ===
#
# ┌──────────────────────────────────────────────────┐
# │                      Users                       │
# │   Remote Workers | Branch Office | HQ | Mobile   │
# └──────────┬──────────┬──────────┬─────────────────┘
#            │          │          │
#     ┌──────▼──────────▼──────────▼──────┐
#     │    SASE Edge (Primary Region)     │
#     │  ┌──────┐ ┌────┐ ┌─────┐ ┌─────┐  │
#     │  │ ZTNA │ │SWG │ │CASB │ │FWaaS│  │
#     │  └──────┘ └────┘ └─────┘ └─────┘  │
#     │  ┌─────────────────────────────┐  │
#     │  │        SD-WAN Fabric        │  │
#     │  └─────────────────────────────┘  │
#     └──────────────┬────────────────────┘
#                    │
#         ┌──────────┼──────────┐
#         │    Active-Active    │
#    ┌────▼────┐           ┌────▼────┐
#    │ Region  │           │ Region  │
#    │    A    │◄─────────►│    B    │
#    │(Primary)│   Sync    │  (DR)   │
#    └────┬────┘           └────┬────┘
#         │                     │
#    ┌────▼────┐           ┌────▼────┐
#    │ DC/Cloud│           │ DC/Cloud│
#    │  Apps   │           │  Apps   │
#    └─────────┘           └─────────┘
#
# === DR Design Principles ===
# 1. Active-Active: both regions serve traffic at the same time
# 2. Policy Sync: security policies sync in real time
# 3. Identity Federation: SSO/IdP replicated
# 4. DNS Failover: automatic DNS switching
# 5. Zero Data Loss: config/policy backup every 5 minutes
#
# === RTO/RPO Targets ===
# Component         | RTO    | RPO
# ------------------|--------|-------------------
# ZTNA              | 5 min  | 0 (active-active)
# SWG               | 5 min  | 0
# SD-WAN            | 15 min | 5 min
# CASB              | 30 min | 15 min
# Policy Config     | 5 min  | 0
# Logging/Analytics | 1 hour | 15 min
#
# === Network Redundancy ===
# - Dual ISP at every branch
# - SD-WAN with automatic failover
# - Multiple SASE PoPs (Points of Presence)
# - DNS-based load balancing (Route 53/Cloudflare)
# - BGP peering with SASE provider
#
# === Identity Redundancy ===
# - Primary IdP: Azure AD / Okta
# - Secondary IdP: On-premise AD (fallback)
# - Certificate-based auth as backup
# - Local auth cache for offline access
# - MFA provider redundancy
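The RTO targets in the table above can be checked automatically after a failover drill. The following is a minimal sketch, assuming failover durations are measured in seconds; `RTO_TARGETS_MIN` mirrors the table, while the function name `check_rto` and the sample measurements are hypothetical and not part of any vendor tooling.

```python
# Hypothetical sketch: compare measured failover times against the RTO targets above.
RTO_TARGETS_MIN = {
    "ZTNA": 5,
    "SWG": 5,
    "SD-WAN": 15,
    "CASB": 30,
    "Policy Config": 5,
    "Logging/Analytics": 60,
}


def check_rto(measured_seconds):
    """Return {component: bool}; True means the measured time is within its RTO.

    A component with no target raises KeyError, so typos are not silently ignored.
    """
    return {
        name: seconds <= RTO_TARGETS_MIN[name] * 60
        for name, seconds in measured_seconds.items()
    }


if __name__ == "__main__":
    # Illustrative numbers only, not real measurements.
    sample = {"ZTNA": 42.0, "SD-WAN": 1200.0}
    for name, ok in check_rto(sample).items():
        print(f"{name}: {'within RTO' if ok else 'RTO exceeded'}")
```

Keeping the targets in one dictionary means the drill report and the design document can be compared against a single source.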
Configuring Zero Trust Network Access (ZTNA)
Set up ZTNA for secure access with DR in place
# === ZTNA Configuration (Cloudflare Zero Trust) ===
# 1. Install cloudflared
# Linux
curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 \
  -o /usr/local/bin/cloudflared
chmod +x /usr/local/bin/cloudflared
# Login
cloudflared tunnel login
# Create the tunnel (Primary)
cloudflared tunnel create primary-dc
cloudflared tunnel route dns primary-dc app.example.com
# Create the tunnel (DR)
cloudflared tunnel create dr-dc
cloudflared tunnel route dns dr-dc app-dr.example.com
# config.yml (Primary)
# tunnel:
# credentials-file: /etc/cloudflared/credentials.json
# ingress:
#   - hostname: app.example.com
#     service: https://internal-app:443
#     originRequest:
#       noTLSVerify: false
#       connectTimeout: 30s
#       keepAliveTimeout: 90s
#   - hostname: api.example.com
#     service: https://internal-api:8443
#   - service: http_status:404
# config-dr.yml (DR site)
# tunnel:
# credentials-file: /etc/cloudflared/credentials-dr.json
# ingress:
#   - hostname: app.example.com
#     service: https://dr-app:443
#   - hostname: api.example.com
#     service: https://dr-api:8443
#   - service: http_status:404
# Run the tunnel as a service
cloudflared service install
systemctl enable cloudflared
systemctl start cloudflared
# === Access Policies (via API) ===
curl -X POST "https://api.cloudflare.com/client/v4/accounts/{account_id}/access/apps" \
  -H "Authorization: Bearer $CF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Internal App",
    "domain": "app.example.com",
    "type": "self_hosted",
    "session_duration": "24h",
    "policies": [{
      "name": "Allow Corporate Users",
      "decision": "allow",
      "include": [
        {"email_domain": {"domain": "example.com"}},
        {"group": {"id": "corporate-users-group-id"}}
      ],
      "require": [
        {"auth_method": {"auth_method": "mfa"}}
      ]
    }]
  }'
# === Health Check Configuration ===
# Monitor both primary and DR tunnels
# Cloudflare automatically routes to healthy tunnel
# Manual failover script
#!/bin/bash
# failover_ztna.sh
PRIMARY_TUNNEL="primary-dc"
DR_TUNNEL="dr-dc"
# Check primary health
# NOTE: the "ACTIVE" match depends on the output format of your cloudflared version
if ! cloudflared tunnel info "$PRIMARY_TUNNEL" 2>/dev/null | grep -q "ACTIVE"; then
  echo "Primary tunnel DOWN, activating DR"
  # Update DNS to point to DR (replace {zone_id} and {record_id} with real values)
  curl -X PATCH "https://api.cloudflare.com/client/v4/zones/{zone_id}/dns_records/{record_id}" \
    -H "Authorization: Bearer $CF_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"content": "dr-tunnel-cname.cfargotunnel.com"}'
  echo "Failover complete"
fi
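Failover only helps if the DR tunnel serves the same hostnames as the primary one. As a rough sketch, the check below compares the ingress rules of the two configs; it assumes the rules have already been parsed into Python lists of dicts (for example with a YAML parser, not shown), and the function name `ingress_hostname_drift` is hypothetical.

```python
def ingress_hostname_drift(primary_ingress, dr_ingress):
    """Return the hostnames that appear in only one of the two ingress rule lists."""

    def hostnames(rules):
        # The catch-all rule (e.g. http_status:404) has no hostname and is skipped.
        return {rule["hostname"] for rule in rules if "hostname" in rule}

    primary, dr = hostnames(primary_ingress), hostnames(dr_ingress)
    return {
        "missing_on_dr": sorted(primary - dr),
        "missing_on_primary": sorted(dr - primary),
    }


if __name__ == "__main__":
    primary = [
        {"hostname": "app.example.com", "service": "https://internal-app:443"},
        {"hostname": "api.example.com", "service": "https://internal-api:8443"},
        {"service": "http_status:404"},
    ]
    # Deliberately incomplete DR config to show what drift looks like.
    dr = [
        {"hostname": "app.example.com", "service": "https://dr-app:443"},
        {"service": "http_status:404"},
    ]
    print(ingress_hostname_drift(primary, dr))
```

Running a check like this in CI, before a drill, catches a hostname that was added to the primary config but never to the DR config.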
Building a Disaster Recovery Plan for SASE
A DR plan that covers every component
#!/usr/bin/env python3
# sase_dr_plan.py: SASE Disaster Recovery Automation
import json
import logging
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List

import requests

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("sase_dr")


@dataclass
class DRComponent:
    name: str
    primary_endpoint: str
    dr_endpoint: str
    health_check_url: str
    rto_minutes: int
    status: str = "primary"


class SASEDRController:
    def __init__(self, config_file="dr_config.json"):
        self.components = self._load_config(config_file)
        self.failover_log = []

    def _load_config(self, config_file) -> List[DRComponent]:
        # NOTE: config_file is not read yet; the components are hard-coded here.
        return [
            DRComponent("ZTNA", "ztna-primary.example.com", "ztna-dr.example.com",
                        "https://ztna-primary.example.com/health", 5),
            DRComponent("SWG", "swg-primary.example.com", "swg-dr.example.com",
                        "https://swg-primary.example.com/health", 5),
            DRComponent("SD-WAN", "sdwan-primary.example.com", "sdwan-dr.example.com",
                        "https://sdwan-primary.example.com/api/health", 15),
            DRComponent("CASB", "casb-primary.example.com", "casb-dr.example.com",
                        "https://casb-primary.example.com/health", 30),
            DRComponent("FWaaS", "fwaas-primary.example.com", "fwaas-dr.example.com",
                        "https://fwaas-primary.example.com/health", 5),
        ]

    def health_check(self, component: DRComponent) -> bool:
        """Return True if the primary health endpoint answers with HTTP 200."""
        try:
            resp = requests.get(component.health_check_url, timeout=10)
            return resp.status_code == 200
        except requests.RequestException:
            return False

    def check_all_components(self) -> Dict[str, bool]:
        results = {}
        for comp in self.components:
            healthy = self.health_check(comp)
            results[comp.name] = healthy
            if not healthy:
                logger.warning(f"{comp.name} is UNHEALTHY!")
        return results

    def failover_component(self, component: DRComponent) -> bool:
        logger.info(f"Initiating failover for {component.name}")
        start = time.time()
        # 1. Update DNS
        self._update_dns(component.name, component.dr_endpoint)
        # 2. Verify DR endpoint
        dr_healthy = self._check_dr_endpoint(component)
        # 3. Sync policies (if not already synced)
        self._sync_policies(component)
        elapsed = time.time() - start
        component.status = "dr"
        self.failover_log.append({
            "component": component.name,
            "action": "failover",
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "duration_seconds": round(elapsed, 1),
            "dr_healthy": dr_healthy,
        })
        logger.info(f"Failover complete: {component.name} ({elapsed:.1f}s)")
        return dr_healthy

    def failback_component(self, component: DRComponent) -> bool:
        logger.info(f"Initiating failback for {component.name}")
        # Only fail back once the primary answers its health check again.
        primary_healthy = self.health_check(component)
        if not primary_healthy:
            logger.error(f"Primary still unhealthy for {component.name}")
            return False
        self._update_dns(component.name, component.primary_endpoint)
        component.status = "primary"
        self.failover_log.append({
            "component": component.name,
            "action": "failback",
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        logger.info(f"Failback complete: {component.name}")
        return True

    def _update_dns(self, component_name, target):
        logger.info(f"Updating DNS for {component_name} -> {target}")
        # Implement DNS update via Cloudflare/Route 53 API

    def _check_dr_endpoint(self, component) -> bool:
        try:
            resp = requests.get(f"https://{component.dr_endpoint}/health", timeout=10)
            return resp.status_code == 200
        except requests.RequestException:
            return False

    def _sync_policies(self, component):
        logger.info(f"Syncing policies for {component.name}")
        # Implement policy sync logic

    def full_failover(self) -> Dict[str, bool]:
        logger.info("=== FULL SASE FAILOVER INITIATED ===")
        results = {}
        for comp in self.components:
            success = self.failover_component(comp)
            results[comp.name] = success
        failed = [k for k, v in results.items() if not v]
        if failed:
            logger.error(f"Failover PARTIAL: failed components: {failed}")
        else:
            logger.info("Failover COMPLETE: all components on DR")
        return results

    def generate_report(self) -> dict:
        report = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "components": [],
            "failover_history": self.failover_log[-50:],  # last 50 events
        }
        for comp in self.components:
            report["components"].append({
                "name": comp.name,
                "status": comp.status,
                "primary": comp.primary_endpoint,
                "dr": comp.dr_endpoint,
                "rto_minutes": comp.rto_minutes,
                "healthy": self.health_check(comp),
            })
        return report


# Usage
if __name__ == "__main__":
    controller = SASEDRController()
    health = controller.check_all_components()
    print(json.dumps(health, indent=2))
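The controller above accepts a `config_file` argument but builds its component list in code. One way to move that list into a file is sketched below; the JSON layout (a top-level `"components"` array whose keys match the `DRComponent` fields) is an assumption made for illustration, and `load_components` is a hypothetical helper, not part of the original script.

```python
import json
from dataclasses import dataclass


@dataclass
class DRComponent:
    name: str
    primary_endpoint: str
    dr_endpoint: str
    health_check_url: str
    rto_minutes: int
    status: str = "primary"


def load_components(text):
    """Parse a JSON document of the assumed form {"components": [{...}, ...]}."""
    data = json.loads(text)
    # DRComponent(**item) raises TypeError on missing or unknown keys,
    # which surfaces config mistakes at load time.
    return [DRComponent(**item) for item in data["components"]]


if __name__ == "__main__":
    example = """
    {"components": [
      {"name": "ZTNA",
       "primary_endpoint": "ztna-primary.example.com",
       "dr_endpoint": "ztna-dr.example.com",
       "health_check_url": "https://ztna-primary.example.com/health",
       "rto_minutes": 5}
    ]}
    """
    for comp in load_components(example):
        print(comp)
```

Taking the JSON text rather than a path keeps the parsing testable; a caller would read `dr_config.json` and pass its contents in.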
Automation and Failover Scripts
Scripts for automated failover and recovery
#!/bin/bash
# sase_failover.sh: Automated SASE Failover Script
set -euo pipefail
LOG="/var/log/sase_dr.log"
ALERT_WEBHOOK=""
PRIMARY_CHECK_INTERVAL=30
FAILURE_THRESHOLD=3

log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG"; }

alert() {
  log "ALERT: $1"
  [ -n "$ALERT_WEBHOOK" ] && curl -s -X POST "$ALERT_WEBHOOK" \
    -H "Content-Type: application/json" \
    -d "{\"text\":\"[SASE DR] $1\"}" > /dev/null 2>&1 || true
}

# Health check endpoints
ZTNA_PRIMARY="https://ztna-primary.example.com/health"
ZTNA_DR="https://ztna-dr.example.com/health"
SWG_PRIMARY="https://swg-primary.example.com/health"
FW_PRIMARY="https://fwaas-primary.example.com/health"

# Returns 0 only when the endpoint answers with HTTP 200.
# $2 is an optional timeout in seconds (default 10, matching the Python script).
check_endpoint() {
  local url="$1"
  local timeout="${2:-10}"
  local http_code
  http_code=$(curl -s -o /dev/null -w "%{http_code}" --max-time "$timeout" "$url" 2>/dev/null || true)
  [ "$http_code" = "200" ]
}

# Track consecutive failures
ZTNA_FAILURES=0
SWG_FAILURES=0
FW_FAILURES=0

monitor_loop() {
  log "Starting SASE DR monitor (check interval: ${PRIMARY_CHECK_INTERVAL}s)"
  while true; do
    # Check ZTNA
    if check_endpoint "$ZTNA_PRIMARY"; then
      ZTNA_FAILURES=0
    else
      ZTNA_FAILURES=$((ZTNA_FAILURES + 1))
      log "ZTNA check failed ($ZTNA_FAILURES/$FAILURE_THRESHOLD)"
      if [ "$ZTNA_FAILURES" -ge "$FAILURE_THRESHOLD" ]; then
        alert "ZTNA PRIMARY DOWN! Initiating failover..."
        # "|| log" keeps set -e from killing the monitor when failover fails
        failover_ztna || log "ZTNA failover did not complete"
        ZTNA_FAILURES=0
      fi
    fi
    # Check SWG
    if check_endpoint "$SWG_PRIMARY"; then
      SWG_FAILURES=0
    else
      SWG_FAILURES=$((SWG_FAILURES + 1))
      if [ "$SWG_FAILURES" -ge "$FAILURE_THRESHOLD" ]; then
        alert "SWG PRIMARY DOWN! Initiating failover..."
        failover_swg || log "SWG failover did not complete"
        SWG_FAILURES=0
      fi
    fi
    # Check FWaaS
    if check_endpoint "$FW_PRIMARY"; then
      FW_FAILURES=0
    else
      FW_FAILURES=$((FW_FAILURES + 1))
      if [ "$FW_FAILURES" -ge "$FAILURE_THRESHOLD" ]; then
        alert "FWaaS PRIMARY DOWN! Initiating failover..."
        failover_fwaas || log "FWaaS failover did not complete"
        FW_FAILURES=0
      fi
    fi
    sleep "$PRIMARY_CHECK_INTERVAL"
  done
}

failover_ztna() {
  log "Executing ZTNA failover..."
  # 1. Verify DR is healthy
  if ! check_endpoint "$ZTNA_DR"; then
    alert "CRITICAL: Both ZTNA primary and DR are DOWN!"
    return 1
  fi
  # 2. Update DNS records
  # Cloudflare API example
  # curl -X PATCH "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records/$RECORD_ID" \
  #   -H "Authorization: Bearer $CF_TOKEN" \
  #   -H "Content-Type: application/json" \
  #   -d '{"content":"ztna-dr.example.com","proxied":true}'
  # 3. Notify team
  alert "ZTNA failover COMPLETE. Traffic routed to DR site."
  # 4. Log event
  log "ZTNA failover completed successfully"
}

failover_swg() {
  log "Executing SWG failover..."
  alert "SWG failover initiated"
}

failover_fwaas() {
  log "Executing FWaaS failover..."
  alert "FWaaS failover initiated"
}

# === Backup Configuration ===
# Requires ACCOUNT_ID, ZONE_ID and CF_TOKEN to be set in the environment.
backup_sase_config() {
  local BACKUP_DIR="/backup/sase/$(date +%Y%m%d_%H%M)"
  mkdir -p "$BACKUP_DIR"
  # Export ZTNA policies
  curl -s "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/access/apps" \
    -H "Authorization: Bearer $CF_TOKEN" > "$BACKUP_DIR/ztna_apps.json"
  curl -s "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/access/groups" \
    -H "Authorization: Bearer $CF_TOKEN" > "$BACKUP_DIR/ztna_groups.json"
  # Export firewall rules
  curl -s "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/firewall/rules" \
    -H "Authorization: Bearer $CF_TOKEN" > "$BACKUP_DIR/fw_rules.json"
  log "SASE config backed up to $BACKUP_DIR"
}

# Run monitoring
monitor_loop
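The monitor loop above triggers failover only after several consecutive failed checks, so a single dropped request does not cause a switch. The same rule is sketched here in Python so it can be unit-tested in isolation; the class name `FailureCounter` is hypothetical, and the default threshold of 3 simply mirrors `FAILURE_THRESHOLD` in the script.

```python
class FailureCounter:
    """Counts consecutive failed checks and signals when a threshold is reached."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def record(self, healthy):
        """Record one check result; return True when failover should trigger."""
        if healthy:
            # Any successful check resets the streak.
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            # Reset after triggering, as the shell script does.
            self.failures = 0
            return True
        return False


if __name__ == "__main__":
    counter = FailureCounter(threshold=3)
    # Simulated results: two failures, a recovery, then three failures in a row.
    for healthy in [False, False, True, False, False, False]:
        print(f"healthy={healthy} trigger={counter.record(healthy)}")
```

With one counter per component, the three near-identical branches in the shell loop reduce to a single code path.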
Testing the DR Plan and Compliance
Test the DR plan regularly
#!/usr/bin/env python3
# dr_test.py: SASE DR Testing Framework
import json
import logging
import time
from datetime import datetime

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("dr_test")


class DRTestRunner:
    def __init__(self):
        self.results = []

    def run_test_suite(self):
        tests = [
            ("ZTNA Failover", self.test_ztna_failover),
            ("SWG Failover", self.test_swg_failover),
            ("SD-WAN Failover", self.test_sdwan_failover),
            ("Policy Sync", self.test_policy_sync),
            ("DNS Failover", self.test_dns_failover),
            ("Identity Failover", self.test_identity_failover),
            ("Full Site Failover", self.test_full_failover),
            ("Failback", self.test_failback),
        ]
        print(f"\n{'='*60}")
        print(f"SASE DR Test Suite - {datetime.now().strftime('%Y-%m-%d %H:%M')}")
        print(f"{'='*60}\n")
        for name, test_func in tests:
            start = time.time()
            try:
                success, details = test_func()
                elapsed = time.time() - start
                status = "PASS" if success else "FAIL"
                self.results.append({
                    "test": name,
                    "status": status,
                    "duration_s": round(elapsed, 1),
                    "details": details,
                })
                print(f"  [{status}] {name} ({elapsed:.1f}s)")
                if not success:
                    print(f"    Details: {details}")
            except Exception as e:
                # A crashing test is recorded as ERROR instead of aborting the suite.
                elapsed = time.time() - start
                self.results.append({
                    "test": name,
                    "status": "ERROR",
                    "duration_s": round(elapsed, 1),
                    "details": str(e),
                })
                print(f"  [ERROR] {name}: {e}")
        self._print_summary()
        return self.results

    # The test methods below are placeholders that return fixed sample results.
    # Replace each body with a real check before relying on the report.
    def test_ztna_failover(self):
        # Simulate ZTNA primary failure
        # 1. Disable primary tunnel
        # 2. Verify traffic routes to DR
        # 3. Verify access policies work
        # 4. Measure failover time
        return True, "Failover completed in 3.2s"

    def test_swg_failover(self):
        return True, "SWG DR active, policies applied"

    def test_sdwan_failover(self):
        return True, "SD-WAN failover to backup links"

    def test_policy_sync(self):
        # Verify policies are identical on primary and DR
        return True, "All 47 policies synced"

    def test_dns_failover(self):
        return True, "DNS switched in 4.1s"

    def test_identity_failover(self):
        return True, "SSO working via DR IdP"

    def test_full_failover(self):
        return True, "Full site failover in 12.5s"

    def test_failback(self):
        return True, "Failback to primary complete"

    def _print_summary(self):
        passed = sum(1 for r in self.results if r["status"] == "PASS")
        failed = sum(1 for r in self.results if r["status"] == "FAIL")
        errors = sum(1 for r in self.results if r["status"] == "ERROR")
        print(f"\n{'='*60}")
        print(f"Summary: {passed} passed, {failed} failed, {errors} errors")
        print(f"{'='*60}")
        # Compliance check: a test that errored counts as not passing
        verdict = "PASS" if failed == 0 and errors == 0 else "REVIEW NEEDED"
        print("\nCompliance Status:")
        print(f"  ISO 27001: {verdict}")
        print(f"  SOC 2: {verdict}")
        print(f"  NIST CSF: {verdict}")
        # Save report
        report = {
            "date": datetime.now().isoformat(),
            "results": self.results,
            "summary": {"passed": passed, "failed": failed, "errors": errors},
        }
        with open(f"dr_test_report_{datetime.now().strftime('%Y%m%d')}.json", "w") as f:
            json.dump(report, f, indent=2)


if __name__ == "__main__":
    runner = DRTestRunner()
    runner.run_test_suite()
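The `test_policy_sync` step above is a stub. One possible way to implement it is to fingerprint the policy sets exported from the primary and DR sites and compare the hashes. The sketch below assumes each policy is available as a JSON-serializable dict; `policy_fingerprint` and `policies_in_sync` are hypothetical helper names.

```python
import hashlib
import json


def policy_fingerprint(policies):
    """Hash a list of policy dicts, ignoring key order and list order."""
    # sort_keys gives each policy a canonical string; sorting the strings
    # makes the result independent of the order policies were exported in.
    canonical = sorted(json.dumps(p, sort_keys=True) for p in policies)
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


def policies_in_sync(primary, dr):
    """Return True when both sites hold the same set of policies."""
    return policy_fingerprint(primary) == policy_fingerprint(dr)


if __name__ == "__main__":
    primary = [{"name": "Allow Corporate Users", "decision": "allow"}]
    dr = [{"decision": "allow", "name": "Allow Corporate Users"}]
    print("in sync:", policies_in_sync(primary, dr))
```

Ignoring list order suits policies that are evaluated independently. Where rule precedence matters, drop the outer `sorted` so that a reordering also counts as drift.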
FAQ: Frequently Asked Questions
Q: How does SASE differ from a VPN?
A: A VPN is a point-to-point encrypted tunnel that routes all traffic through a VPN server, which adds latency and makes that server a single point of failure. SASE uses a cloud-native architecture with PoPs worldwide, offers more granular security policies (per-app, per-user), applies a zero trust model that verifies every request, and scales better. VPNs were designed for perimeter security, whereas SASE is designed for a cloud-first world.
Q: How often should a DR plan be tested?
A: At minimum, run a tabletop exercise every quarter (3 months), a partial failover test (one component at a time) every 6 months, and a full site failover test every year. Also test after any significant infrastructure change, such as switching SASE vendors, adding a new branch office, or updating security policies.
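The cadence above can be tracked with a small scheduling helper. This is only a sketch: the intervals come from the answer, while the names `TEST_INTERVALS`, `add_months`, and `next_due`, and the sample dates, are hypothetical.

```python
import calendar
from datetime import date

# Intervals in months, taken from the answer above.
TEST_INTERVALS = {"tabletop": 3, "partial_failover": 6, "full_failover": 12}


def add_months(d, months):
    """Return d shifted by `months`, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_due(last_run):
    """Map each test type to its next due date, given {test_type: last run date}."""
    return {
        kind: add_months(when, TEST_INTERVALS[kind])
        for kind, when in last_run.items()
    }


if __name__ == "__main__":
    # Illustrative dates only.
    last_run = {
        "tabletop": date(2024, 11, 30),
        "partial_failover": date(2024, 8, 15),
        "full_failover": date(2024, 1, 15),
    }
    for kind, due in next_due(last_run).items():
        print(f"{kind}: next due {due.isoformat()}")
```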
Q: Active-Active or Active-Passive: which should you choose?
A: Active-Active is the better fit for SASE because RTO approaches zero (traffic is routed to the healthy site automatically), the resources of both sites are put to use, and DR can be exercised at any time. The trade-off is higher cost and complexity. Active-Passive suits organizations with a limited budget that can accept an RTO of 15-30 minutes, but failover must be tested regularly.
Q: How can SASE vendor lock-in be avoided?
A: Use standard protocols (SAML and OIDC for identity, IPsec/WireGuard for tunnels), export policies in a portable format (JSON/YAML), manage SASE configuration with Infrastructure as Code (Terraform), back up all policies and configurations regularly, and evaluate alternative vendors periodically.
