
Shopify Hydrogen High Availability (HA) Setup: Building an E-Commerce Store That Never Goes Down

2026-02-07 · อ. บอม, SiamCafe.net · 1,163 words

What Is High Availability for Hydrogen?

High Availability (HA) means designing a system so that it keeps serving requests even when some of its components fail. For an e-commerce storefront built with Shopify Hydrogen, downtime means direct revenue loss, because customers cannot buy while the site is unreachable.

HA for Hydrogen consists of: multi-region deployment (run the app in several regions so that if one region fails, traffic fails over to another); load balancing (distribute traffic across instances); auto-scaling (add or remove instances as traffic changes); CDN caching (reduce load on the origin servers); health checks (monitor app health continuously); and graceful degradation (keep serving a reduced experience when some services are down).

Common HA targets for e-commerce are 99.9% uptime (8.76 hours of downtime per year), 99.95% (4.38 hours), and 99.99% (52.6 minutes). For a Shopify store earning 1 million baht per day, 99.9% uptime still means losing about 365,000 baht per year to downtime.
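
The downtime and revenue figures above follow from simple arithmetic. The sketch below reproduces them, assuming a 365-day year and revenue spread evenly over time (real stores have peaks, so an outage during a sale costs more). The function names are illustrative, not from any library.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Return the yearly downtime (in hours) allowed by an uptime target."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


def revenue_lost_per_year(uptime_percent: float, revenue_per_day: float) -> float:
    """Estimate yearly revenue lost, assuming revenue is spread evenly over time."""
    return revenue_per_day * 365 * (1 - uptime_percent / 100)


for target in (99.9, 99.95, 99.99):
    hours = downtime_hours_per_year(target)
    print(f"{target}% uptime -> {hours:.2f} h/year ({hours * 60:.1f} min)")

# Store earning 1 million baht per day at 99.9% uptime
print(f"Estimated loss: {revenue_lost_per_year(99.9, 1_000_000):,.0f} baht/year")
```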

Designing the HA Architecture

Architecture patterns for Hydrogen HA

# === Hydrogen HA Architecture ===

cat > ha_architecture.yaml << 'EOF'
hydrogen_ha_architecture:
  tier_1_basic:
    description: "Single region, multi-instance"
    uptime: "99.9%"
    components:
      - "Load Balancer (ALB/NLB)"
      - "2+ Hydrogen instances (ECS/K8s)"
      - "CDN (CloudFront/Cloudflare)"
      - "Health checks"
    cost: "$50-200/month"
    suitable_for: "Small-medium stores"

  tier_2_advanced:
    description: "Multi-AZ, auto-scaling"
    uptime: "99.95%"
    components:
      - "ALB with multi-AZ"
      - "Auto-scaling group (2-10 instances)"
      - "Redis cluster for sessions"
      - "CDN with failover origin"
      - "RDS Multi-AZ (if using database)"
    cost: "$200-1000/month"
    suitable_for: "Medium-large stores"

  tier_3_enterprise:
    description: "Multi-region, active-active"
    uptime: "99.99%"
    components:
      - "Global load balancer (Route53/Cloudflare)"
      - "2+ regions (ap-southeast-1, us-west-2)"
      - "Redis Global Datastore"
      - "Cross-region replication"
      - "Automated failover"
      - "Chaos engineering testing"
    cost: "$1000-5000/month"
    suitable_for: "Enterprise, high-traffic stores"

  edge_deployment:
    description: "Deploy to edge (Cloudflare Workers/Oxygen)"
    uptime: "99.99%+"
    components:
      - "Shopify Oxygen (edge SSR)"
      - "Cloudflare Workers (alternative)"
      - "Edge caching at 300+ PoPs"
      - "Auto-failover built-in"
    cost: "Included with Shopify plan / $5-25/month CF Workers"
    suitable_for: "All stores (recommended starting point)"
EOF

echo "HA architecture defined"
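
To see why every tier starts with "2+ instances", here is a rough sketch of how redundancy raises availability. It assumes instance failures are independent, which is optimistic: a bad deploy or a shared load balancer can take all instances down at once, so treat the result as an upper bound. The function name and the 99% per-instance figure are hypothetical.

```python
def combined_availability(instance_availability: float, instances: int) -> float:
    """Availability of N redundant instances, assuming independent failures.

    The service is down only when every instance is down at the same time.
    """
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return 1 - (1 - instance_availability) ** instances


# Hypothetical per-instance availability of 99%
for n in (1, 2, 3):
    print(f"{n} instance(s): {combined_availability(0.99, n) * 100:.4f}%")
```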

Kubernetes Deployment for HA

Deploying Hydrogen on Kubernetes in an HA configuration

# === Kubernetes HA Deployment ===

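# 1. Deployment with zone spreading and anti-affinity
mkdir -p k8s  # make sure the target directory exists before writing the manifest
cat > k8s/deployment.yaml << 'EOF'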
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hydrogen-storefront
  labels:
    app: hydrogen
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  selector:
    matchLabels:
      app: hydrogen
  template:
    metadata:
      labels:
        app: hydrogen
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: hydrogen
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: hydrogen
                topologyKey: kubernetes.io/hostname
      containers:
        - name: hydrogen
          image: myregistry/hydrogen-store:latest
          ports:
            - containerPort: 3000
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: 1000m
              memory: 512Mi
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 20
          env:
            - name: NODE_ENV
              value: "production"
            - name: PUBLIC_STORE_DOMAIN
              valueFrom:
                secretKeyRef:
                  name: shopify-secrets
                  key: store-domain
---
apiVersion: v1
kind: Service
metadata:
  name: hydrogen-service
spec:
  selector:
    app: hydrogen
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hydrogen-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hydrogen-storefront
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: hydrogen-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: hydrogen
EOF

kubectl apply -f k8s/deployment.yaml

echo "Kubernetes HA deployment applied"
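
The HorizontalPodAutoscaler above scales on CPU (70%) and memory (80%). As a rough illustration of how it picks a replica count, the sketch below applies the formula from the Kubernetes documentation, desired = ceil(current × currentMetric / targetMetric), with the default 10% tolerance, and takes the largest result across metrics. It ignores stabilization windows and pod readiness, so it is a simplification, not the controller's full behavior.

```python
import math
from typing import Dict


def desired_replicas(current: int, utilization: float, target: float,
                     min_replicas: int = 3, max_replicas: int = 20,
                     tolerance: float = 0.1) -> int:
    """Replica count for one metric, clamped to the HPA min/max bounds."""
    ratio = utilization / target
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # within tolerance: no scaling for this metric
    else:
        desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


def desired_for_metrics(current: int, observed: Dict[str, float],
                        targets: Dict[str, float]) -> int:
    """With several metrics, the HPA uses the largest proposed replica count."""
    return max(desired_replicas(current, observed[name], targets[name])
               for name in targets)


targets = {"cpu": 70.0, "memory": 80.0}  # values from the manifest above
print(desired_for_metrics(3, {"cpu": 140.0, "memory": 60.0}, targets))
```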

CDN and Edge Caching

Configuring the CDN for performance and availability

# === CDN & Edge Caching Configuration ===

# 1. Cloudflare Configuration
cat > cloudflare_config.yaml << 'EOF'
cloudflare_settings:
  dns:
    - type: CNAME
      name: shop.example.com
      content: hydrogen-lb.example.com
      proxied: true  # Enable Cloudflare proxy

  page_rules:
    - url: "shop.example.com/static/*"
      settings:
        cache_level: "Cache Everything"
        edge_cache_ttl: 86400  # 24 hours
        browser_cache_ttl: 2592000  # 30 days

    - url: "shop.example.com/products/*"
      settings:
        cache_level: "Cache Everything"
        edge_cache_ttl: 3600  # 1 hour
        origin_cache_control: true

    - url: "shop.example.com/cart*"
      settings:
        cache_level: "Bypass"  # Never cache cart

  cache_rules:
    static_assets:
      match: "*.js OR *.css OR *.png OR *.jpg OR *.woff2"
      ttl: 2592000
      
    html_pages:
      match: "*.html"
      ttl: 300
      stale_while_revalidate: 86400

  workers:
    stale_while_revalidate: |
      addEventListener('fetch', event => {
        event.respondWith(handleRequest(event));
      });

      // Takes the FetchEvent (not just the request) so event.waitUntil() is in scope.
      async function handleRequest(event) {
        const request = event.request;
        // The Cache API only stores GET responses; pass everything else through.
        if (request.method !== 'GET') {
          return fetch(request);
        }

        const cache = caches.default;
        let response = await cache.match(request);

        if (response) {
          // Serve from cache, revalidate in background
          event.waitUntil(
            fetch(request).then(fresh => cache.put(request, fresh))
          );
          return response;
        }

        response = await fetch(request);
        event.waitUntil(cache.put(request, response.clone()));
        return response;
      }
EOF

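# 2. Nginx Caching (Origin)
# Assumes an `upstream hydrogen-upstream { ... }` block is defined elsewhere in the Nginx config
mkdir -p nginx
cat > nginx/caching.conf << 'EOF'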
proxy_cache_path /var/cache/nginx levels=1:2
    keys_zone=hydrogen:50m max_size=10g inactive=24h;

server {
    listen 80;
    server_name shop.example.com;

    # Static assets - long cache
    location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg|woff2)$ {
        proxy_pass http://hydrogen-upstream;
        proxy_cache hydrogen;
        proxy_cache_valid 200 30d;
        proxy_cache_use_stale error timeout updating http_500 http_502 http_503;
        add_header X-Cache-Status $upstream_cache_status;
        expires 30d;
    }

    # Product pages - short cache with stale-while-revalidate
    location /products/ {
        proxy_pass http://hydrogen-upstream;
        proxy_cache hydrogen;
        proxy_cache_valid 200 5m;
        proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
        proxy_cache_background_update on;
        add_header Cache-Control "public, max-age=300, stale-while-revalidate=86400";
    }

    # Cart/checkout - no cache
    location ~ ^/(cart|checkout|account) {
        proxy_pass http://hydrogen-upstream;
        proxy_no_cache 1;
        proxy_cache_bypass 1;
        add_header Cache-Control "private, no-store";
    }
}
EOF

echo "CDN and caching configured"
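
Both the Cloudflare Worker and the Nginx config above rely on stale-while-revalidate. The following is a minimal in-memory sketch of that pattern, using the 300-second max-age and 86,400-second stale window from the config. The `SwrCache` class is hypothetical, and the background refresh is modelled as a queue that the caller drains off the request path; a real server would use a worker thread or task for that.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Entry:
    value: str
    stored_at: float


class SwrCache:
    """Tiny stale-while-revalidate cache (illustrative only)."""

    def __init__(self, fetch: Callable[[str], str], max_age: float = 300.0,
                 swr: float = 86400.0, clock: Callable[[], float] = time.monotonic):
        self._fetch, self._max_age, self._swr, self._clock = fetch, max_age, swr, clock
        self._entries: Dict[str, Entry] = {}
        self._pending: List[str] = []  # keys waiting for a background refresh

    def get(self, key: str) -> Tuple[str, str]:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None:
            age = now - entry.stored_at
            if age <= self._max_age:
                return entry.value, "HIT"
            if age <= self._max_age + self._swr:
                # Stale but usable: answer immediately, refresh later.
                if key not in self._pending:
                    self._pending.append(key)
                return entry.value, "STALE"
        # Nothing cached, or too old to serve: fetch from origin synchronously.
        value = self._fetch(key)
        self._entries[key] = Entry(value, now)
        return value, "MISS"

    def revalidate_pending(self) -> int:
        """Refresh queued keys; returns how many refreshes were attempted."""
        keys, self._pending = self._pending, []
        for key in keys:
            try:
                self._entries[key] = Entry(self._fetch(key), self._clock())
            except Exception:
                pass  # origin is failing: keep serving the stale copy
        return len(keys)


cache = SwrCache(fetch=lambda path: f"rendered {path}")
print(cache.get("/products/xyz"))  # first request goes to the origin
print(cache.get("/products/xyz"))  # second request is served from cache
```

Keeping the stale copy when the origin fails is what turns caching into an availability feature: product pages stay up while the origin recovers.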

Health Checks and Auto-Recovery

Checking health and recovering automatically

#!/usr/bin/env python3
# health_check.py: Hydrogen Health Check System
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("health")

class HealthCheckSystem:
    def __init__(self):
        self.checks = {}
    
    def define_checks(self):
        return {
            "application": {
                "endpoint": "/health",
                "interval": "10s",
                "timeout": "5s",
                "checks": {
                    "http_status": "200 OK",
                    "response_time": "< 500ms",
                    "memory_usage": "< 80%",
                    "cpu_usage": "< 90%",
                },
            },
            "shopify_api": {
                "endpoint": "/health/shopify",
                "interval": "30s",
                "checks": {
                    "storefront_api": "reachable",
                    "api_latency": "< 200ms",
                    "rate_limit_remaining": "> 20%",
                },
            },
            "dependencies": {
                "endpoint": "/health/deps",
                "interval": "30s",
                "checks": {
                    "redis": "connected",
                    "cdn": "reachable",
                },
            },
        }
    
    def evaluate_health(self, results):
        """Evaluate overall health status"""
        statuses = {"healthy": 0, "degraded": 0, "unhealthy": 0}
        
        for check_name, check_result in results.items():
            if check_result["status"] == "pass":
                statuses["healthy"] += 1
            elif check_result["status"] == "warn":
                statuses["degraded"] += 1
            else:
                statuses["unhealthy"] += 1
        
        if statuses["unhealthy"] > 0:
            return {"overall": "UNHEALTHY", "action": "RESTART_POD"}
        elif statuses["degraded"] > 0:
            return {"overall": "DEGRADED", "action": "ALERT_TEAM"}
        else:
            return {"overall": "HEALTHY", "action": "NONE"}
    
    def recovery_actions(self):
        return {
            "pod_restart": {
                "trigger": "Health check fails 3 consecutive times",
                "action": "Kubernetes restarts pod (livenessProbe)",
                "escalation": "If restart fails 5 times, alert on-call",
            },
            "auto_scale_up": {
                "trigger": "CPU > 70% for 2 minutes",
                "action": "HPA adds new pods",
                "max_pods": 20,
            },
            "circuit_breaker": {
                "trigger": "Shopify API error rate > 50%",
                "action": "Serve cached content, queue requests",
                "recovery": "Retry after 30 seconds",
            },
            "failover": {
                "trigger": "Primary region health check fails",
                "action": "Route53/Cloudflare switches to secondary region",
                "rto": "< 60 seconds",
            },
        }

health = HealthCheckSystem()
checks = health.define_checks()
print("Health Check Configuration:")
for name, config in checks.items():
    print(f"  {name}: {config['endpoint']} (every {config['interval']})")

recovery = health.recovery_actions()
print("\nRecovery Actions:")
for name, info in recovery.items():
    print(f"  {name}: {info['action']}")
    print(f"    Trigger: {info['trigger']}")
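
The `circuit_breaker` entry above (open when the Shopify API error rate exceeds 50%, serve cached content, retry after 30 seconds) can be sketched as follows. This is a simplified illustration: the class, the window size of 10 calls, and the `fetch_live`/`fetch_cached` callables are assumptions, and the "queue requests" part is left out.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class CircuitBreaker:
    """Opens when the error rate over the last `window` calls exceeds the threshold."""

    def __init__(self, error_threshold: float = 0.5, window: int = 10,
                 retry_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._threshold = error_threshold
        self._retry_after = retry_after
        self._clock = clock
        self._results: Deque[bool] = deque(maxlen=window)
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """True when closed, or when open long enough to allow a trial call."""
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self._retry_after

    def record(self, success: bool) -> None:
        if success and self._opened_at is not None:
            # Trial call succeeded: close the breaker and forget old failures.
            self._opened_at = None
            self._results.clear()
            return
        self._results.append(success)
        failures = self._results.count(False)
        window_full = len(self._results) == self._results.maxlen
        if window_full and failures / len(self._results) > self._threshold:
            self._opened_at = self._clock()  # open, or re-open after a failed trial


def call_with_fallback(breaker: CircuitBreaker, fetch_live: Callable[[], str],
                       fetch_cached: Callable[[], str]) -> str:
    """Call the live API unless the breaker is open; fall back to cached content."""
    if not breaker.allow_request():
        return fetch_cached()
    try:
        result = fetch_live()
    except Exception:
        breaker.record(False)
        return fetch_cached()
    breaker.record(True)
    return result


breaker = CircuitBreaker()
print(call_with_fallback(breaker, lambda: "live product data", lambda: "cached product data"))
```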

Monitoring and Alerting

Tracking metrics and raising alerts

# === Monitoring & Alerting ===

# 1. Prometheus Metrics
mkdir -p monitoring  # make sure the target directory exists
cat > monitoring/prometheus.yml << 'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'hydrogen'
    kubernetes_sd_configs:
      - role: pod
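    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: hydrogen
        action: keep
      # Copy the pod label onto the target so queries such as up{app="hydrogen"} match
      - source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app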
EOF

# 2. Grafana Dashboard Queries
cat > monitoring/grafana_queries.yaml << 'EOF'
hydrogen_dashboard:
  availability:
    query: "sum(up{app='hydrogen'}) / count(up{app='hydrogen'}) * 100"
    title: "Availability %"
    threshold: 99.9

  request_rate:
    query: "sum(rate(http_requests_total{app='hydrogen'}[5m]))"
    title: "Requests/sec"

  error_rate:
    query: "sum(rate(http_requests_total{app='hydrogen',status=~'5..'}[5m])) / sum(rate(http_requests_total{app='hydrogen'}[5m])) * 100"
    title: "Error Rate %"
    alert_threshold: 1

  latency_p99:
    query: "histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket{app='hydrogen'}[5m])))"
    title: "P99 Latency"
    alert_threshold: "2s"

  pod_count:
    query: "sum(kube_pod_status_phase{pod=~'hydrogen.*',phase='Running'})"
    title: "Running Pods"

  cache_hit_rate:
    query: "sum(rate(cdn_cache_hits[5m])) / sum(rate(cdn_requests_total[5m])) * 100"
    title: "CDN Cache Hit Rate %"
    target: 90
EOF

# 3. Alert Rules
cat > monitoring/alerts.yaml << 'EOF'
groups:
  - name: hydrogen-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{app="hydrogen",status=~"5.."}[5m]))
          / sum(rate(http_requests_total{app="hydrogen"}[5m])) > 0.01
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Hydrogen error rate > 1%"

      - alert: HighLatency
        expr: |
          histogram_quantile(0.99, rate(http_request_duration_seconds_bucket{app="hydrogen"}[5m])) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Hydrogen P99 latency > 2s"

      - alert: PodCrashLooping
        expr: |
          rate(kube_pod_container_status_restarts_total{pod=~"hydrogen.*"}[15m]) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Hydrogen pod crash looping"

      - alert: LowAvailability
        expr: |
          sum(up{app="hydrogen"}) / count(up{app="hydrogen"}) < 0.5
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Less than 50% of Hydrogen pods are up"
EOF

echo "Monitoring and alerting configured"
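
To make the alert expressions concrete, the sketch below evaluates the `HighErrorRate` and `LowAvailability` conditions on plain numbers, mirroring the PromQL above (5xx rate divided by total rate above 0.01; share of pods up below 0.5). It deliberately ignores the `for:` duration, which in Prometheus requires the condition to hold continuously before the alert fires. The function name and inputs are hypothetical.

```python
from typing import List, Sequence


def firing_alerts(up_samples: Sequence[int], rate_5xx: float,
                  rate_total: float) -> List[str]:
    """Return the names of alerts whose conditions are currently true.

    up_samples: one 0/1 value per pod, like the `up` metric.
    rate_5xx / rate_total: requests per second over the last 5 minutes.
    """
    alerts: List[str] = []

    # HighErrorRate: sum(rate(5xx)) / sum(rate(all)) > 0.01
    if rate_total > 0 and rate_5xx / rate_total > 0.01:
        alerts.append("HighErrorRate")

    # LowAvailability: sum(up) / count(up) < 0.5
    if up_samples and sum(up_samples) / len(up_samples) < 0.5:
        alerts.append("LowAvailability")

    return alerts


# Three pods, one of them up, with 2% of requests failing
print(firing_alerts([1, 0, 0], rate_5xx=2.0, rate_total=100.0))
```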

FAQ: Frequently Asked Questions

Q: Does Shopify Oxygen have HA built in?

A: Yes. Shopify Oxygen is an edge deployment platform with HA built in. It deploys to multiple edge locations around the world automatically, and Shopify manages scaling, failover, and health checks for you, so there is no Kubernetes cluster or load balancer to configure. The uptime SLA you get depends on your Shopify plan. For most stores, Oxygen is sufficient. Self-hosting makes sense when you need to customize the infrastructure heavily, run custom middleware or backend services, or need full control; in those cases, self-host on Kubernetes or Cloudflare Workers.

Q: Do I need multi-region?

A: It depends on your requirements. If the store serves customers in a single region (for example, Thailand only), multi-AZ within one region (ap-southeast-1) is enough and targets 99.95% uptime. If the store serves customers worldwide, multi-region helps both HA and performance (lower latency for users close to a region). With a CDN and edge caching, a multi-region origin may be unnecessary, because the CDN already serves content from an edge location near the user. Recommendation: start with a single region plus CDN, and add multi-region when the cost of downtime is high enough to justify it.

Q: How do you handle cache invalidation when a product changes?

A: The Shopify Storefront API already sends cache headers, and Hydrogen uses the stale-while-revalidate pattern: serve cached content first, then revalidate in the background. For the CDN cache, use Shopify webhooks (products/update, inventory_levels/update) to trigger a CDN cache purge for only the URLs that changed, for example with the Cloudflare API purge by URL (here $CLOUDFLARE_API_TOKEN stands for an API token allowed to purge the cache):

curl -X POST "https://api.cloudflare.com/client/v4/zones/{zone}/purge_cache" -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" -H "Content-Type: application/json" -d '{"files":["https://shop.example.com/products/xyz"]}'

For critical data (price, stock), use a short cache TTL (5 minutes). For static content (images, CSS), use a long cache TTL (30 days) together with versioned URLs.
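
A minimal sketch of the webhook-to-purge flow described above: it turns a product webhook payload into a Cloudflare purge-by-URL request. It assumes the payload carries the product `handle`, that product pages live under `/products/<handle>` on `shop.example.com`, and that `zone_id` and `api_token` are supplied by you; the helper names are hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request
from typing import Iterable

CLOUDFLARE_API = "https://api.cloudflare.com/client/v4"


def product_url(handle: str, shop_base: str = "https://shop.example.com") -> str:
    """Storefront URL for a product handle (URL layout is an assumption)."""
    return f"{shop_base}/products/{handle}"


def build_purge_request(zone_id: str, api_token: str,
                        urls: Iterable[str]) -> urllib.request.Request:
    """Build a purge-by-URL request for the given zone."""
    body = json.dumps({"files": list(urls)}).encode("utf-8")
    return urllib.request.Request(
        f"{CLOUDFLARE_API}/zones/{zone_id}/purge_cache",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


def handle_product_webhook(payload: dict, zone_id: str,
                           api_token: str) -> urllib.request.Request:
    """Map a product update webhook payload to a purge request for its page."""
    return build_purge_request(zone_id, api_token, [product_url(payload["handle"])])


request = handle_product_webhook({"handle": "xyz"}, zone_id="ZONE_ID", api_token="TOKEN")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

In production the webhook handler should also verify the webhook signature before acting on the payload.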

Q: How does Hydrogen handle flash sales?

A: A flash sale is a very large traffic spike (10-100x normal). Prepare for it as follows: pre-scale pods before the sale starts (do not wait for auto-scaling); cache product pages aggressively at the CDN (to reduce origin load); put a queue system in front of checkout (to prevent overload); apply rate limiting to block bots; and let the Shopify backend handle inventory locking (nothing to worry about there). On Kubernetes, raise the HPA minReplicas before the sale and configure the cluster autoscaler to provision nodes in advance. Before the sale, test with load testing tools (k6, Artillery) that simulate the traffic pattern you expect.
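
As a rough aid for choosing the pre-scaled minReplicas, the sketch below estimates how many pods a spike needs. All inputs (baseline requests per second, per-pod capacity, CDN cache hit ratio, 20% headroom) are hypothetical and should come from your own load tests rather than from this example.

```python
import math


def replicas_for_flash_sale(baseline_rps: float, spike_multiplier: float,
                            rps_per_pod: float, cache_hit_ratio: float = 0.0,
                            headroom: float = 1.2) -> int:
    """Estimate pods needed at peak.

    Only cache misses reach the origin, so the CDN hit ratio directly
    reduces the number of pods required.
    """
    if rps_per_pod <= 0:
        raise ValueError("rps_per_pod must be positive")
    origin_rps = baseline_rps * spike_multiplier * (1 - cache_hit_ratio)
    return max(1, math.ceil(origin_rps * headroom / rps_per_pod))


# 50 rps baseline, 20x spike, 35 rps per pod (all hypothetical numbers)
print("no CDN caching:", replicas_for_flash_sale(50, 20, 35))
print("90% cache hits:", replicas_for_flash_sale(50, 20, 35, cache_hit_ratio=0.9))
```

If the estimate exceeds the HPA maxReplicas (20 in the manifest earlier), either raise the limit and node capacity ahead of the sale or increase the cache hit ratio.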
