What Is High Availability for Hydrogen?
High Availability (HA) means designing a system so that it keeps running even when some of its components fail. For an e-commerce storefront built with Shopify Hydrogen, downtime translates directly into revenue loss, because customers cannot buy anything while the site is unreachable.
HA for Hydrogen consists of: multi-region deployment (run the app in several regions so that if one region goes down, traffic fails over to another region); load balancing (spread traffic across instances); auto-scaling (add or remove instances to match traffic); CDN caching (reduce load on origin servers); health checks (verify app health continuously); and graceful degradation (keep serving when some services fail).
Typical HA targets for e-commerce are 99.9% uptime (8.76 hours of downtime per year), 99.95% (4.38 hours), and 99.99% (52.6 minutes). For a Shopify store with revenue of 1 million baht per day, 99.9% uptime still means losing roughly 365,000 baht per year to downtime.
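The arithmetic behind these figures can be reproduced with a small sketch. It assumes a 365-day year and revenue spread evenly across every hour of the day; real stores have peak hours, so this is only a rough estimate. The function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours per year the site may be down at a given uptime target."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


def yearly_downtime_cost(uptime_percent: float, revenue_per_day: float) -> float:
    """Revenue lost per year, assuming revenue is spread evenly over each day."""
    revenue_per_hour = revenue_per_day / 24
    return downtime_hours_per_year(uptime_percent) * revenue_per_hour


if __name__ == "__main__":
    for target in (99.9, 99.95, 99.99):
        hours = downtime_hours_per_year(target)
        cost = yearly_downtime_cost(target, revenue_per_day=1_000_000)
        print(f"{target}% uptime: {hours:.2f} h/year down, ~{cost:,.0f} lost/year")
```

Plugging in your own daily revenue gives a quick upper bound on how much an extra "nine" is worth before paying for a more expensive tier.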
Designing the HA Architecture
Architecture patterns for Hydrogen HA
# === Hydrogen HA Architecture ===
cat > ha_architecture.yaml << 'EOF'
hydrogen_ha_architecture:
  tier_1_basic:
    description: "Single region, multi-instance"
    uptime: "99.9%"
    components:
      - "Load Balancer (ALB/NLB)"
      - "2+ Hydrogen instances (ECS/K8s)"
      - "CDN (CloudFront/Cloudflare)"
      - "Health checks"
    cost: "$50-200/month"
    suitable_for: "Small-medium stores"
  tier_2_advanced:
    description: "Multi-AZ, auto-scaling"
    uptime: "99.95%"
    components:
      - "ALB with multi-AZ"
      - "Auto-scaling group (2-10 instances)"
      - "Redis cluster for sessions"
      - "CDN with failover origin"
      - "RDS Multi-AZ (if using database)"
    cost: "$200-1000/month"
    suitable_for: "Medium-large stores"
  tier_3_enterprise:
    description: "Multi-region, active-active"
    uptime: "99.99%"
    components:
      - "Global load balancer (Route53/Cloudflare)"
      - "2+ regions (ap-southeast-1, us-west-2)"
      - "Redis Global Datastore"
      - "Cross-region replication"
      - "Automated failover"
      - "Chaos engineering testing"
    cost: "$1000-5000/month"
    suitable_for: "Enterprise, high-traffic stores"
  edge_deployment:
    description: "Deploy to edge (Cloudflare Workers/Oxygen)"
    uptime: "99.99%+"
    components:
      - "Shopify Oxygen (edge SSR)"
      - "Cloudflare Workers (alternative)"
      - "Edge caching at 300+ PoPs"
      - "Auto-failover built-in"
    cost: "Included with Shopify plan / $5-25/month CF Workers"
    suitable_for: "All stores (recommended starting point)"
EOF
echo "HA architecture defined"
Kubernetes Deployment for HA
Deploy Hydrogen on Kubernetes in an HA configuration
# === Kubernetes HA Deployment ===
# 1. Deployment with anti-affinity
mkdir -p k8s  # make sure the target directory exists
cat > k8s/deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hydrogen-storefront
  labels:
    app: hydrogen
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  selector:
    matchLabels:
      app: hydrogen
  template:
    metadata:
      labels:
        app: hydrogen
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: hydrogen
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: hydrogen
                topologyKey: kubernetes.io/hostname
      containers:
        - name: hydrogen
          image: myregistry/hydrogen-store:latest
          ports:
            - containerPort: 3000
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: 1000m
              memory: 512Mi
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 20
          env:
            - name: NODE_ENV
              value: "production"
            - name: PUBLIC_STORE_DOMAIN
              valueFrom:
                secretKeyRef:
                  name: shopify-secrets
                  key: store-domain
---
apiVersion: v1
kind: Service
metadata:
  name: hydrogen-service
spec:
  selector:
    app: hydrogen
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hydrogen-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hydrogen-storefront
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: hydrogen-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: hydrogen
EOF
kubectl apply -f k8s/deployment.yaml
echo "Kubernetes HA deployment applied"
CDN and Edge Caching
Configure the CDN for performance and availability
# === CDN & Edge Caching Configuration ===
# 1. Cloudflare Configuration
cat > cloudflare_config.yaml << 'EOF'
cloudflare_settings:
  dns:
    - type: CNAME
      name: shop.example.com
      content: hydrogen-lb.example.com
      proxied: true  # Enable Cloudflare proxy
  page_rules:
    - url: "shop.example.com/static/*"
      settings:
        cache_level: "Cache Everything"
        edge_cache_ttl: 86400  # 24 hours
        browser_cache_ttl: 2592000  # 30 days
    - url: "shop.example.com/products/*"
      settings:
        cache_level: "Cache Everything"
        edge_cache_ttl: 3600  # 1 hour
        origin_cache_control: true
    - url: "shop.example.com/cart*"
      settings:
        cache_level: "Bypass"  # Never cache cart
  cache_rules:
    static_assets:
      match: "*.js OR *.css OR *.png OR *.jpg OR *.woff2"
      ttl: 2592000
    html_pages:
      match: "*.html"
      ttl: 300
      stale_while_revalidate: 86400
  workers:
    stale_while_revalidate: |
      addEventListener('fetch', event => {
        // Pass the whole event so the handler can call event.waitUntil()
        event.respondWith(handleRequest(event));
      });
      async function handleRequest(event) {
        const request = event.request;
        const cache = caches.default;
        let response = await cache.match(request);
        if (response) {
          // Serve from cache, revalidate in background
          event.waitUntil(
            fetch(request).then(fresh => cache.put(request, fresh))
          );
          return response;
        }
        response = await fetch(request);
        event.waitUntil(cache.put(request, response.clone()));
        return response;
      }
EOF
# 2. Nginx Caching (Origin)
mkdir -p nginx  # make sure the target directory exists
cat > nginx/caching.conf << 'EOF'
# Note: the "hydrogen-upstream" upstream block must be defined elsewhere in the nginx config.
proxy_cache_path /var/cache/nginx levels=1:2
                 keys_zone=hydrogen:50m max_size=10g inactive=24h;
server {
    listen 80;
    server_name shop.example.com;
    # Static assets - long cache
    location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg|woff2)$ {
        proxy_pass http://hydrogen-upstream;
        proxy_cache hydrogen;
        proxy_cache_valid 200 30d;
        proxy_cache_use_stale error timeout updating http_500 http_502 http_503;
        add_header X-Cache-Status $upstream_cache_status;
        expires 30d;
    }
    # Product pages - short cache with stale-while-revalidate
    location /products/ {
        proxy_pass http://hydrogen-upstream;
        proxy_cache hydrogen;
        proxy_cache_valid 200 5m;
        proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
        proxy_cache_background_update on;
        add_header Cache-Control "public, max-age=300, stale-while-revalidate=86400";
    }
    # Cart/checkout - no cache
    location ~ ^/(cart|checkout|account) {
        proxy_pass http://hydrogen-upstream;
        proxy_no_cache 1;
        proxy_cache_bypass 1;
        add_header Cache-Control "private, no-store";
    }
}
EOF
echo "CDN and caching configured"
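To make the stale-while-revalidate behaviour configured above easier to reason about, here is a minimal in-memory sketch in Python. It is not how Cloudflare or Nginx implement it: the `SWRCache` class, the `fetch` callable, and the injectable `clock` are hypothetical names for illustration, and the refresh happens inline rather than in a background task.

```python
import time
from typing import Callable, Dict, Tuple


class SWRCache:
    """In-memory stale-while-revalidate cache (illustrative only)."""

    def __init__(self, max_age: float, swr_window: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_age = max_age          # seconds a response counts as fresh
        self.swr_window = swr_window    # extra seconds a stale copy may be served
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str, fetch: Callable[[str], str]) -> Tuple[str, str]:
        """Return (value, state) where state is HIT, STALE, or MISS."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None:
            stored_at, value = entry
            age = now - stored_at
            if age <= self.max_age:
                return value, "HIT"
            if age <= self.max_age + self.swr_window:
                try:
                    # Refresh for the next caller; a real CDN does this in the background.
                    self._store[key] = (now, fetch(key))
                except Exception:
                    pass  # origin is down: keep the stale copy in the cache
                return value, "STALE"
        # Nothing usable in the cache: the caller has to wait for the origin.
        value = fetch(key)
        self._store[key] = (now, value)
        return value, "MISS"
```

With `max_age=300` and `swr_window=86400` this mirrors the `max-age=300, stale-while-revalidate=86400` header used for product pages: visitors keep getting a response for up to a day even if the origin is slow or failing, which is where the availability benefit comes from.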
Health Checks and Auto-Recovery
Check health and recover automatically
#!/usr/bin/env python3
# health_check.py: Hydrogen Health Check System
import logging
from typing import Dict

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("health")


class HealthCheckSystem:
    def __init__(self):
        self.checks = {}

    def define_checks(self) -> Dict[str, dict]:
        """Describe each health endpoint, its polling interval, and its thresholds."""
        return {
            "application": {
                "endpoint": "/health",
                "interval": "10s",
                "timeout": "5s",
                "checks": {
                    "http_status": "200 OK",
                    "response_time": "< 500ms",
                    "memory_usage": "< 80%",
                    "cpu_usage": "< 90%",
                },
            },
            "shopify_api": {
                "endpoint": "/health/shopify",
                "interval": "30s",
                "checks": {
                    "storefront_api": "reachable",
                    "api_latency": "< 200ms",
                    "rate_limit_remaining": "> 20%",
                },
            },
            "dependencies": {
                "endpoint": "/health/deps",
                "interval": "30s",
                "checks": {
                    "redis": "connected",
                    "cdn": "reachable",
                },
            },
        }

    def evaluate_health(self, results: Dict[str, dict]) -> Dict[str, str]:
        """Evaluate overall health status"""
        statuses = {"healthy": 0, "degraded": 0, "unhealthy": 0}
        for check_name, check_result in results.items():
            if check_result["status"] == "pass":
                statuses["healthy"] += 1
            elif check_result["status"] == "warn":
                statuses["degraded"] += 1
            else:
                statuses["unhealthy"] += 1
        # Any failing check wins over warnings; warnings win over passes.
        if statuses["unhealthy"] > 0:
            return {"overall": "UNHEALTHY", "action": "RESTART_POD"}
        elif statuses["degraded"] > 0:
            return {"overall": "DEGRADED", "action": "ALERT_TEAM"}
        else:
            return {"overall": "HEALTHY", "action": "NONE"}

    def recovery_actions(self) -> Dict[str, dict]:
        """Describe the automated reaction to each failure mode."""
        return {
            "pod_restart": {
                "trigger": "Health check fails 3 consecutive times",
                "action": "Kubernetes restarts pod (livenessProbe)",
                "escalation": "If restart fails 5 times, alert on-call",
            },
            "auto_scale_up": {
                "trigger": "CPU > 70% for 2 minutes",
                "action": "HPA adds new pods",
                "max_pods": 20,
            },
            "circuit_breaker": {
                "trigger": "Shopify API error rate > 50%",
                "action": "Serve cached content, queue requests",
                "recovery": "Retry after 30 seconds",
            },
            "failover": {
                "trigger": "Primary region health check fails",
                "action": "Route53/Cloudflare switches to secondary region",
                "rto": "< 60 seconds",
            },
        }


health = HealthCheckSystem()
checks = health.define_checks()
print("Health Check Configuration:")
for name, config in checks.items():
    print(f"  {name}: {config['endpoint']} (every {config['interval']})")
recovery = health.recovery_actions()
print("\nRecovery Actions:")
for name, info in recovery.items():
    print(f"  {name}: {info['action']}")
    print(f"    Trigger: {info['trigger']}")
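The `circuit_breaker` entry in `recovery_actions()` (trip when the Shopify API error rate exceeds 50%, retry after 30 seconds) could look roughly like the sketch below. The `CircuitBreaker` class, the `min_calls` guard, the sliding window size, and the `primary`/`fallback` callables are assumptions made for illustration; they are not part of Hydrogen or any Shopify library.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Trips on a high error rate and serves a fallback until a retry window passes."""

    def __init__(self, error_threshold: float = 0.5, window: int = 20,
                 min_calls: int = 4, retry_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.error_threshold = error_threshold
        self.min_calls = min_calls            # avoid tripping on a single failure
        self.retry_after = retry_after
        self.clock = clock
        self.results: Deque[bool] = deque(maxlen=window)  # True means failure
        self.opened_at: Optional[float] = None

    def _should_open(self) -> bool:
        if len(self.results) < self.min_calls:
            return False
        return sum(self.results) / len(self.results) > self.error_threshold

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        trial = False
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.retry_after:
                return fallback()   # open: skip the API and serve cached content
            trial = True            # half-open: let one trial call through
        try:
            result = primary()
        except Exception:
            self.results.append(True)
            if trial or self._should_open():
                self.opened_at = self.clock()  # (re)open and restart the timer
            return fallback()
        self.results.append(False)
        if trial:
            # The trial call succeeded: close the breaker and forget old failures.
            self.opened_at = None
            self.results.clear()
        return result
```

In a storefront, `primary` would be the live Storefront API call and `fallback` would return the last cached response, so shoppers still see product pages while the API recovers.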
Monitoring and Alerting
Set up monitoring and alerting
# === Monitoring & Alerting ===
# 1. Prometheus Metrics
mkdir -p monitoring  # make sure the target directory exists
cat > monitoring/prometheus.yml << 'EOF'
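global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'hydrogen'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: hydrogen
        action: keep
      # Copy the pod label onto each series so queries like up{app="hydrogen"} match
      - source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app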
EOF
# 2. Grafana Dashboard Queries
cat > monitoring/grafana_queries.yaml << 'EOF'
hydrogen_dashboard:
  availability:
    query: "sum(up{app='hydrogen'}) / count(up{app='hydrogen'}) * 100"
    title: "Availability %"
    threshold: 99.9
  request_rate:
    query: "sum(rate(http_requests_total{app='hydrogen'}[5m]))"
    title: "Requests/sec"
  error_rate:
    query: "sum(rate(http_requests_total{app='hydrogen',status=~'5..'}[5m])) / sum(rate(http_requests_total{app='hydrogen'}[5m])) * 100"
    title: "Error Rate %"
    alert_threshold: 1
  latency_p99:
    query: "histogram_quantile(0.99, rate(http_request_duration_seconds_bucket{app='hydrogen'}[5m]))"
    title: "P99 Latency"
    alert_threshold: "2s"
  pod_count:
    # kube-state-metrics exposes the pod phase on kube_pod_status_phase, not on kube_pod_info
    query: "sum(kube_pod_status_phase{pod=~'hydrogen.*',phase='Running'})"
    title: "Running Pods"
  cache_hit_rate:
    query: "sum(rate(cdn_cache_hits[5m])) / sum(rate(cdn_requests_total[5m])) * 100"
    title: "CDN Cache Hit Rate %"
    target: 90
EOF
# 3. Alert Rules
cat > monitoring/alerts.yaml << 'EOF'
groups:
  - name: hydrogen-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{app="hydrogen",status=~"5.."}[5m]))
          / sum(rate(http_requests_total{app="hydrogen"}[5m])) > 0.01
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Hydrogen error rate > 1%"
      - alert: HighLatency
        expr: |
          histogram_quantile(0.99, rate(http_request_duration_seconds_bucket{app="hydrogen"}[5m])) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Hydrogen P99 latency > 2s"
      - alert: PodCrashLooping
        expr: |
          rate(kube_pod_container_status_restarts_total{pod=~"hydrogen.*"}[15m]) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Hydrogen pod crash looping"
      - alert: LowAvailability
        expr: |
          sum(up{app="hydrogen"}) / count(up{app="hydrogen"}) < 0.5
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Less than 50% of Hydrogen pods are up"
EOF
echo "Monitoring and alerting configured"
FAQ: Frequently Asked Questions
Q: Does Shopify Oxygen have HA built in?
A: Yes. Shopify Oxygen is an edge deployment platform with HA built in. It deploys to many edge locations automatically, and Shopify takes care of scaling, failover, and health checks, so there is no Kubernetes or load balancer to configure. The uptime SLA depends on the Shopify plan. For most stores Oxygen is enough. Self-hosting makes sense when you need to customize the infrastructure, run custom middleware or backend services, or keep full control; in that case self-host on Kubernetes or Cloudflare Workers.
Q: Do I need multi-region?
A: It depends on your requirements. If the store serves customers in a single region (for example, one country only), multi-AZ within one region (ap-southeast-1) is enough and can reach 99.95% uptime. If the store serves customers worldwide, multi-region helps both HA and performance (lower latency for users close to each region). With a CDN and edge caching, however, a multi-region origin may not be necessary, because the CDN already serves content from an edge close to the user. Recommendation: start with a single region plus CDN, and add multi-region only if the cost of downtime is high enough to justify it.
Q: How does cache invalidation work when a product changes?
A: The Shopify Storefront API returns cache headers with its responses, and Hydrogen uses the stale-while-revalidate pattern: it serves cached content first, then revalidates in the background. For the CDN cache, use Shopify webhooks (products/update, inventory_levels/update) to trigger a CDN cache purge for only the URLs that changed, for example with the Cloudflare API purge by URL:
curl -X POST "https://api.cloudflare.com/client/v4/zones/{zone}/purge_cache" -H "Authorization: Bearer $CF_API_TOKEN" -H "Content-Type: application/json" -d '{"files":["https://shop.example.com/products/xyz"]}'
For critical changes (price, stock) use a short cache TTL (5 minutes). For static content (images, CSS) use a long cache TTL (30 days) together with versioned URLs.
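A webhook receiver could translate a product update into such a purge call roughly as follows. This is a sketch under stated assumptions: that the product webhook payload carries a `handle` field, that product pages live at `/products/<handle>` as in the example URL, and that the zone ID and API token come from your own configuration. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import List

CF_API = "https://api.cloudflare.com/client/v4"


def urls_to_purge(webhook_payload: dict, shop_origin: str) -> List[str]:
    """Map a product webhook payload to the storefront URLs that should be purged."""
    handle = webhook_payload.get("handle")  # assumed to be present in product payloads
    if not handle:
        return []
    return [f"{shop_origin.rstrip('/')}/products/{handle}"]


def build_purge_request(zone_id: str, api_token: str,
                        urls: List[str]) -> urllib.request.Request:
    """Build (but do not send) a Cloudflare purge-by-URL request."""
    body = json.dumps({"files": urls}).encode("utf-8")
    return urllib.request.Request(
        f"{CF_API}/zones/{zone_id}/purge_cache",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


def purge(zone_id: str, api_token: str, urls: List[str]) -> int:
    """Send the purge request and return the HTTP status code."""
    if not urls:
        return 0  # nothing to purge, no request sent
    request = build_purge_request(zone_id, api_token, urls)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping the URL mapping separate from the network call makes it easy to unit-test which pages a given webhook invalidates without touching the Cloudflare API.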
Q: How does Hydrogen handle flash sales?
A: A flash sale is a sudden traffic spike (10-100x normal). To prepare: pre-scale pods before the sale starts (do not wait for auto-scaling), cache product pages aggressively on the CDN (to reduce origin load), put a queue system in front of checkout (to prevent overload), and apply rate limiting against bots. The Shopify backend handles inventory locking, so you do not need to worry about it. On Kubernetes, raise the HPA minReplicas before the sale and configure the cluster autoscaler to provision nodes in advance. Test with load testing tools (k6, Artillery) before the sale by simulating the expected traffic pattern.
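How far to raise minReplicas before a sale can be estimated with a back-of-the-envelope calculation like the one below. All inputs are assumptions you would measure yourself (normal request rate, per-pod capacity from a load test); the 30% headroom default and the function name are illustrative, and the cap of 20 simply mirrors the HPA maxReplicas used earlier.

```python
import math
from typing import Tuple


def replicas_for_sale(normal_rps: float, spike_multiplier: float,
                      rps_per_pod: float, headroom: float = 0.3,
                      max_replicas: int = 20) -> Tuple[int, bool]:
    """Return (suggested minReplicas, whether the estimate exceeds max_replicas)."""
    if rps_per_pod <= 0:
        raise ValueError("rps_per_pod must be positive")
    expected_rps = normal_rps * spike_multiplier
    # Add headroom so pods are not running at their measured limit.
    needed = max(1, math.ceil(expected_rps * (1 + headroom) / rps_per_pod))
    return min(needed, max_replicas), needed > max_replicas


if __name__ == "__main__":
    # Hypothetical numbers: 50 req/s normally, 10x spike, 120 req/s per pod.
    suggested, over_limit = replicas_for_sale(50, 10, 120)
    print(f"Set minReplicas to {suggested} (exceeds maxReplicas: {over_limit})")
```

If the second value comes back true, the current maxReplicas (and probably the node pool) is too small for the expected spike and should be raised before the sale rather than during it.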
