
A/B Testing ML Home Lab Setup: Build an Experimentation Platform on Your Own Machine

2025-10-01 · อ. บอม — SiamCafe.net · 1,108 words

A/B Testing and Machine Learning: An Overview

A/B Testing is a controlled experiment that compares two or more versions (variants) of something to measure which variant performs better on a chosen metric. It replaces intuition with data-driven decisions and is used widely in web optimization, product development, and marketing campaigns.

Machine Learning extends A/B Testing in several ways: Multi-Armed Bandits (MAB) shift traffic toward better-performing variants while the experiment runs, reducing regret; Bayesian optimization finds optimal parameters faster than grid search; causal inference separates the true effect of a treatment from confounders; automated sample size calculation can use ML to predict effect size; and heterogeneous treatment effect models reveal when a treatment works differently for different user segments.

A Home Lab is a safe place to build and understand an A/B testing platform end to end: you learn the internals, experiment with algorithms, and prototype before deploying to production. Docker, Python, Redis, and PostgreSQL on a local machine are all you need.

Setting Up a Home Lab for A/B Testing

Set up the infrastructure for the A/B testing platform:

# === A/B Testing Home Lab Setup ===

# 1. Docker Compose Stack
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
  # PostgreSQL (experiment data)
  postgres:
    image: postgres:16-alpine
    ports:
      - "5432:5432"
    environment:
      POSTGRES_DB: ab_testing
      POSTGRES_USER: abtest
      POSTGRES_PASSWORD: password123
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql

  # Redis (feature flags & assignment cache)
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

  # Jupyter Lab (analysis)
  jupyter:
    image: jupyter/scipy-notebook:latest
    ports:
      - "8888:8888"
    volumes:
      - ./notebooks:/home/jovyan/work
    environment:
      JUPYTER_TOKEN: "abtest123"

  # GrowthBook (open-source A/B testing platform)
  growthbook:
    image: growthbook/growthbook:latest
    ports:
      - "3000:3000"
      - "3100:3100"
    environment:
      - MONGODB_URI=mongodb://mongo:27017/growthbook
      - APP_ORIGIN=http://localhost:3000
      - API_HOST=http://localhost:3100
    depends_on:
      - mongo

  mongo:
    image: mongo:7
    volumes:
      - mongodata:/data/db

volumes:
  pgdata:
  mongodata:
EOF

# 2. Database Schema
cat > init.sql << 'EOF'
CREATE TABLE experiments (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    description TEXT,
    status VARCHAR(20) DEFAULT 'draft',
    start_date TIMESTAMP,
    end_date TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE variants (
    id SERIAL PRIMARY KEY,
    experiment_id INT REFERENCES experiments(id),
    name VARCHAR(50) NOT NULL,
    weight FLOAT DEFAULT 0.5,
    is_control BOOLEAN DEFAULT FALSE
);

CREATE TABLE assignments (
    id SERIAL PRIMARY KEY,
    experiment_id INT REFERENCES experiments(id),
    variant_id INT REFERENCES variants(id),
    user_id VARCHAR(100) NOT NULL,
    assigned_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    experiment_id INT REFERENCES experiments(id),
    variant_id INT REFERENCES variants(id),
    user_id VARCHAR(100) NOT NULL,
    event_type VARCHAR(50) NOT NULL,
    value FLOAT,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_assignments_user ON assignments(user_id, experiment_id);
CREATE INDEX idx_events_experiment ON events(experiment_id, variant_id);
EOF

# 3. Start Stack
docker compose up -d
echo "Home lab ready at:"
echo "  Jupyter: http://localhost:8888 (token: abtest123)"
echo "  GrowthBook: http://localhost:3000"
echo "  PostgreSQL: localhost:5432"

Building an ML-Powered A/B Testing System

A Python A/B testing engine:

#!/usr/bin/env python3
# ab_engine.py - ML-Powered A/B Testing Engine
import hashlib
import logging
import math
import random

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("abtest")

class ABTestEngine:
    """A/B Testing Engine with ML capabilities"""
    
    def __init__(self):
        self.experiments = {}
        self.assignments = {}
        self.events = {}
    
    def create_experiment(self, name, variants, traffic_pct=100):
        """Create new experiment"""
        self.experiments[name] = {
            "name": name,
            "variants": variants,
            "traffic_pct": traffic_pct,
            "status": "running",
        }
        self.events[name] = {v: {"impressions": 0, "conversions": 0} for v in variants}
        return self.experiments[name]
    
    def assign_variant(self, experiment_name, user_id):
        """Deterministic variant assignment using hash"""
        exp = self.experiments.get(experiment_name)
        if not exp or exp["status"] != "running":
            return None
        
        # Check if already assigned
        key = f"{experiment_name}:{user_id}"
        if key in self.assignments:
            return self.assignments[key]
        
        # Deterministic hash-based assignment
        hash_val = int(hashlib.md5(key.encode()).hexdigest(), 16)
        
        # Traffic allocation
        if hash_val % 100 >= exp["traffic_pct"]:
            return None
        
        # Variant selection based on weights
        variants = exp["variants"]
        total_weight = sum(v.get("weight", 1) for v in variants.values())
        threshold = (hash_val % 1000) / 1000 * total_weight
        
        cumulative = 0
        selected = None
        for name, config in variants.items():
            cumulative += config.get("weight", 1)
            if threshold < cumulative:
                selected = name
                break
        
        self.assignments[key] = selected
        return selected
    
    def record_event(self, experiment_name, user_id, event_type, value=1):
        """Record conversion event"""
        variant = self.assignments.get(f"{experiment_name}:{user_id}")
        if not variant:
            return
        
        events = self.events[experiment_name][variant]
        if event_type == "impression":
            events["impressions"] += 1
        elif event_type == "conversion":
            events["conversions"] += 1
    
    def analyze(self, experiment_name):
        """Statistical analysis of experiment"""
        events = self.events.get(experiment_name, {})
        results = {}
        
        for variant, data in events.items():
            n = data["impressions"]
            x = data["conversions"]
            p = x / n if n > 0 else 0
            se = math.sqrt(p * (1 - p) / n) if n > 0 else 0
            
            results[variant] = {
                "impressions": n,
                "conversions": x,
                "conversion_rate": round(p * 100, 3),
                "std_error": round(se * 100, 3),
                "ci_95": [round((p - 1.96*se) * 100, 3), round((p + 1.96*se) * 100, 3)],
            }
        
        # Statistical significance between first two variants
        variants = list(results.keys())
        if len(variants) >= 2:
            a = results[variants[0]]
            b = results[variants[1]]
            p_a = a["conversions"] / a["impressions"] if a["impressions"] > 0 else 0
            p_b = b["conversions"] / b["impressions"] if b["impressions"] > 0 else 0
            n_a = a["impressions"]
            n_b = b["impressions"]
            
            if n_a > 0 and n_b > 0:
                p_pool = (a["conversions"] + b["conversions"]) / (n_a + n_b)
                se = math.sqrt(p_pool * (1 - p_pool) * (1/n_a + 1/n_b)) if p_pool > 0 and p_pool < 1 else 1
                z = (p_b - p_a) / se if se > 0 else 0
                
                results["significance"] = {
                    "z_score": round(z, 3),
                    "significant": abs(z) > 1.96,
                    "lift": round((p_b - p_a) / p_a * 100, 2) if p_a > 0 else 0,
                    "winner": variants[1] if z > 1.96 else variants[0] if z < -1.96 else "none",
                }
        
        return results

# Demo
engine = ABTestEngine()
engine.create_experiment("checkout_button", {
    "control": {"weight": 1, "description": "Blue button"},
    "treatment": {"weight": 1, "description": "Green button"},
})

# Simulate traffic
random.seed(42)
for i in range(2000):
    user = f"user_{i}"
    variant = engine.assign_variant("checkout_button", user)
    if variant:
        engine.record_event("checkout_button", user, "impression")
        cvr = 0.10 if variant == "control" else 0.13
        if random.random() < cvr:
            engine.record_event("checkout_button", user, "conversion")

results = engine.analyze("checkout_button")
print("Experiment: checkout_button")
for variant in ["control", "treatment"]:
    r = results[variant]
    print(f"  {variant}: CVR={r['conversion_rate']}%, CI={r['ci_95']}, n={r['impressions']}")
sig = results.get("significance", {})
print(f"  Lift: {sig.get('lift', 0)}%, Significant: {sig.get('significant', False)}, Winner: {sig.get('winner', 'N/A')}")
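The engine above uses a Wald confidence interval, which can misbehave for small samples or extreme rates (it may extend below 0% or above 100%). A Wilson score interval is a more robust drop-in; a minimal sketch (the `wilson_ci` helper is illustrative, not part of the engine):

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    if n == 0:
        return (0.0, 0.0)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (round((center - margin) * 100, 3), round((center + margin) * 100, 3))

# Small sample where the Wald interval would dip below 0%: 2 conversions in 20
lo, hi = wilson_ci(2, 20)
print(f"Wilson 95% CI: [{lo}%, {hi}%]")  # stays within [0, 100] even for tiny n
```

Swapping this into `analyze()` only changes the `ci_95` field; point estimates and the z-test are unaffected.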

Multi-Armed Bandit Algorithms

ML algorithms for adaptive testing:

#!/usr/bin/env python3
# bandits.py - Multi-Armed Bandit Algorithms
import logging
import math
import random

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("bandit")

class EpsilonGreedy:
    """Epsilon-Greedy Bandit"""
    def __init__(self, n_arms, epsilon=0.1):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms
    
    def select_arm(self):
        if random.random() < self.epsilon:
            return random.randint(0, self.n_arms - 1)
        return self.values.index(max(self.values))
    
    def update(self, arm, reward):
        self.counts[arm] += 1
        n = self.counts[arm]
        self.values[arm] = ((n - 1) * self.values[arm] + reward) / n

class ThompsonSampling:
    """Thompson Sampling (Bayesian Bandit)"""
    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.alpha = [1] * n_arms  # Successes + 1
        self.beta = [1] * n_arms   # Failures + 1
    
    def select_arm(self):
        samples = [random.betavariate(self.alpha[i], self.beta[i]) for i in range(self.n_arms)]
        return samples.index(max(samples))
    
    def update(self, arm, reward):
        if reward > 0:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1
    
    def get_probabilities(self):
        """Probability each arm is best"""
        n_sim = 10000
        wins = [0] * self.n_arms
        for _ in range(n_sim):
            samples = [random.betavariate(self.alpha[i], self.beta[i]) for i in range(self.n_arms)]
            wins[samples.index(max(samples))] += 1
        return [round(w / n_sim * 100, 1) for w in wins]

class UCB1:
    """Upper Confidence Bound"""
    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms
        self.total = 0
    
    def select_arm(self):
        for i in range(self.n_arms):
            if self.counts[i] == 0:
                return i
        
        ucb_values = [
            self.values[i] + math.sqrt(2 * math.log(self.total) / self.counts[i])
            for i in range(self.n_arms)
        ]
        return ucb_values.index(max(ucb_values))
    
    def update(self, arm, reward):
        self.counts[arm] += 1
        self.total += 1
        n = self.counts[arm]
        self.values[arm] = ((n - 1) * self.values[arm] + reward) / n

# Compare algorithms
true_rates = [0.10, 0.13, 0.08]  # True conversion rates
n_rounds = 5000

algorithms = {
    "Epsilon-Greedy": EpsilonGreedy(3, epsilon=0.1),
    "Thompson Sampling": ThompsonSampling(3),
    "UCB1": UCB1(3),
}

random.seed(42)
for name, algo in algorithms.items():
    total_reward = 0
    for _ in range(n_rounds):
        arm = algo.select_arm()
        reward = 1 if random.random() < true_rates[arm] else 0
        algo.update(arm, reward)
        total_reward += reward
    
    avg_reward = total_reward / n_rounds
    print(f"{name}: Total reward={total_reward}, Avg={avg_reward:.4f}")
    if hasattr(algo, 'counts'):
        print(f"  Arm pulls: {algo.counts}")
    if hasattr(algo, 'get_probabilities'):
        print(f"  Win probabilities: {algo.get_probabilities()}")
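Regret, mentioned earlier as the quantity bandits minimize, can be measured directly in a simulation like the one above: it is the expected reward given up by not always pulling the best arm. A self-contained sketch with the same true rates and an inline Beta-Bernoulli Thompson Sampling loop:

```python
import random

# Cumulative expected regret of Thompson Sampling over 5000 pulls.
true_rates = [0.10, 0.13, 0.08]
best_rate = max(true_rates)
n_arms = len(true_rates)
alpha = [1] * n_arms  # Beta prior successes + 1
beta = [1] * n_arms   # Beta prior failures + 1

random.seed(42)
regret = 0.0
for _ in range(5000):
    samples = [random.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
    arm = samples.index(max(samples))
    regret += best_rate - true_rates[arm]  # expected loss for this pull
    if random.random() < true_rates[arm]:
        alpha[arm] += 1
    else:
        beta[arm] += 1

print(f"Cumulative expected regret after 5000 pulls: {regret:.1f}")
# For comparison, a fixed 50/50 split between arms 0 and 1 would accrue
# 0.015 * 5000 = 75 expected regret over the same horizon.
```

The gap versus the fixed split is exactly the "opportunity cost" argument for bandits discussed in the FAQ below.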

Statistical Analysis Pipeline

A pipeline for statistically sound experiment analysis:

# === Statistical Analysis Pipeline ===

cat > analysis_pipeline.yaml << 'EOF'
ab_test_analysis:
  pre_experiment:
    sample_size:
      formula: "n = (Z_{α/2} + Z_β)² × (p₁(1−p₁) + p₂(1−p₂)) / (p₁ − p₂)²"
      parameters:
        alpha: 0.05
        power: 0.80
        baseline_cvr: 0.10
        minimum_detectable_effect: 0.02
        calculated_n_per_variant: 3842
      
    duration:
      daily_traffic: 1000
      variants: 2
      estimated_days: 8

  during_experiment:
    guardrails:
      - "Do not peek at results before sample size reached"
      - "Monitor for Sample Ratio Mismatch (SRM)"
      - "Check for novelty/primacy effects"
      - "Monitor key guardrail metrics (error rate, latency)"
    
    srm_check:
      description: "Chi-squared test for equal split"
      threshold: "p-value < 0.001 indicates SRM"
      action: "Stop experiment, investigate assignment bug"

  post_experiment:
    primary_analysis:
      - "Z-test for proportions (binary outcomes)"
      - "t-test for continuous outcomes"
      - "Mann-Whitney U for non-normal distributions"
      
    secondary_analysis:
      - "Segment analysis (by device, country, user type)"
      - "Heterogeneous Treatment Effects (CATE)"
      - "Regression adjustment for covariates"
      
    decision_framework:
      significant_positive: "Ship treatment"
      significant_negative: "Keep control"
      not_significant: "Need more data or accept null"
      
    multiple_testing:
      correction: "Bonferroni or Benjamini-Hochberg"
      note: "Adjust alpha when testing multiple metrics"
EOF

python3 -c "
import math

# Sample size calculator
def sample_size(baseline, mde, alpha=0.05, power=0.80):
    z_alpha = 1.96  # two-sided
    z_beta = 0.84   # 80% power
    p1 = baseline
    p2 = baseline + mde
    n = ((z_alpha + z_beta)**2 * (p1*(1-p1) + p2*(1-p2))) / (p2 - p1)**2
    return math.ceil(n)

# Examples
examples = [
    (0.10, 0.02, 'CVR 10% → 12% (20% lift)'),
    (0.10, 0.01, 'CVR 10% → 11% (10% lift)'),
    (0.05, 0.01, 'CVR 5% → 6% (20% lift)'),
    (0.02, 0.005, 'CVR 2% → 2.5% (25% lift)'),
]
print('Sample Size Calculator (per variant, 95% confidence, 80% power):')
for baseline, mde, desc in examples:
    n = sample_size(baseline, mde)
    print(f'  {desc}: n = {n:,}')
"

echo "Analysis pipeline configured"
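The SRM guardrail in the config above can be implemented with a one-degree-of-freedom chi-squared goodness-of-fit test; a standard-library sketch (the `srm_check` helper and the traffic numbers are illustrative, and 10.83 is the χ² critical value for p = 0.001 at df = 1):

```python
def srm_check(n_a, n_b, expected_ratio=0.5):
    """Chi-squared test for Sample Ratio Mismatch between two variants."""
    total = n_a + n_b
    exp_a = total * expected_ratio
    exp_b = total * (1 - expected_ratio)
    chi2 = (n_a - exp_a) ** 2 / exp_a + (n_b - exp_b) ** 2 / exp_b
    # chi2 > 10.83 corresponds to p < 0.001 at df = 1: stop and investigate
    return {"chi2": round(chi2, 2), "srm_detected": chi2 > 10.83}

print(srm_check(5000, 5010))  # small imbalance: no alarm
print(srm_check(5000, 5500))  # large imbalance: likely an assignment bug
```

An SRM alert almost always means a bug in assignment or logging, not a real effect, which is why the pipeline says to stop the experiment rather than analyze it.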

Monitoring and Dashboard

Track experiment status and results at a glance:

#!/usr/bin/env python3
# experiment_dashboard.py - Experiment Monitoring Dashboard
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("dashboard")

class ExperimentDashboard:
    def __init__(self):
        pass
    
    def overview(self):
        return {
            "active_experiments": 5,
            "completed_this_month": 8,
            "win_rate": "37.5% (3/8 had significant positive results)",
            "experiments": [
                {
                    "name": "checkout_green_button",
                    "status": "running",
                    "days_active": 7,
                    "traffic": 12500,
                    "control_cvr": 10.2,
                    "treatment_cvr": 12.8,
                    "lift": "+25.5%",
                    "significant": True,
                    "recommendation": "Ship treatment",
                },
                {
                    "name": "pricing_page_redesign",
                    "status": "running",
                    "days_active": 3,
                    "traffic": 4200,
                    "control_cvr": 5.1,
                    "treatment_cvr": 5.4,
                    "lift": "+5.9%",
                    "significant": False,
                    "recommendation": "Continue collecting data",
                },
                {
                    "name": "onboarding_flow_v2",
                    "status": "completed",
                    "days_active": 14,
                    "traffic": 28000,
                    "control_cvr": 22.3,
                    "treatment_cvr": 25.1,
                    "lift": "+12.6%",
                    "significant": True,
                    "recommendation": "Shipped to 100%",
                },
            ],
            "platform_health": {
                "assignment_accuracy": "99.98%",
                "srm_alerts": 0,
                "avg_analysis_latency": "< 1 min",
            },
        }

dashboard = ExperimentDashboard()
data = dashboard.overview()
print(f"Experiment Dashboard:")
print(f"  Active: {data['active_experiments']}, Completed: {data['completed_this_month']}")
print(f"  Win rate: {data['win_rate']}")

print(f"\nExperiments:")
for exp in data["experiments"]:
    status = "WINNER" if exp["significant"] and exp["status"] == "running" else exp["status"].upper()
    print(f"  [{status}] {exp['name']}: lift={exp['lift']}, sig={exp['significant']}")
    print(f"    → {exp['recommendation']}")

FAQ: Frequently Asked Questions

Q: How do A/B Testing and Multi-Armed Bandits differ, and when should you use each?

A: A/B Testing (fixed allocation) splits traffic evenly, e.g. 50/50, for the entire experiment. It is statistically valid once the planned sample size is reached; use it when you need rigorous statistical proof, when the exploration cost is acceptable, and when you have enough traffic. Multi-Armed Bandits (adaptive allocation) shift traffic toward the better-performing variant as data accumulates, lowering the opportunity cost (less traffic wasted on the losing variant); use them when you want to maximize revenue during the experiment, when traffic is limited, or when strict statistical rigor matters less. Rule of thumb: use A/B Testing for important product decisions (more rigorous) and Bandits for continuous optimization (ad serving, recommendations, pricing).

Q: How much traffic do you need before A/B Testing is worthwhile?

A: It depends on your baseline conversion rate and the minimum detectable effect (MDE). With CVR 10%, detecting a 20% lift (10% → 12%) needs roughly 3,850 users per variant; detecting a 10% lift (10% → 11%) needs roughly 14,750 per variant; with CVR 2%, detecting a 25% lift (2% → 2.5%) needs roughly 13,800 per variant (about 28,000 total across both). A website with 500 visitors/day would need 15+ days for even the easiest case. If traffic is scarce, consider Bayesian A/B testing (can conclude with smaller samples), Bandit algorithms, or raising the MDE (only chase big changes).
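The Bayesian A/B testing mentioned above can be sketched with Beta posteriors and Monte Carlo: instead of a p-value you get P(treatment beats control), which is often easier to act on. A minimal sketch assuming Beta(1, 1) priors (the function name and counts are illustrative):

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, n_sim=100_000, seed=42):
    """Monte Carlo estimate of P(p_B > p_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sim):
        # Posterior for each variant: Beta(1 + conversions, 1 + non-conversions)
        p_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        p_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if p_b > p_a:
            wins += 1
    return wins / n_sim

# Rates matching the earlier simulation: 10% vs 13% at 1,000 users each
print(f"P(treatment beats control) = {prob_b_beats_a(100, 1000, 130, 1000):.3f}")
```

A common decision rule is to ship when this probability exceeds a threshold such as 95%, which can trigger on smaller samples than a fixed-horizon z-test.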

Q: Is a Home Lab necessary, or should you just use a SaaS tool?

A: Popular SaaS options include GrowthBook (open source, free self-hosted), LaunchDarkly (enterprise feature flags plus experiments), Optimizely (web experimentation leader), VWO (visual A/B testing), and Google Optimize (discontinued; folded into GA4). A Home Lab is worth it for learning statistical concepts hands-on, prototyping custom algorithms (Bandits, CATE), avoiding lock-in to a SaaS, and skipping license costs. For production, start with GrowthBook (free, feature-rich) or a SaaS you are comfortable with, and treat the Home Lab primarily as a learning tool.

Q: Why is Thompson Sampling better than Epsilon-Greedy?

A: Epsilon-Greedy explores with a fixed probability ε (e.g. 10%) and otherwise exploits the best-known arm. It is easy to implement, but the explore/exploit ratio never adapts. Thompson Sampling (Bayesian) samples from the posterior distribution of each arm, so exploration shrinks naturally as uncertainty decreases; it adapts to the scenario and its regret typically converges faster. UCB1 adds an exploration bonus that grows with the logarithm of total pulls. In practice Thompson Sampling is the most popular choice for production: it is adaptive, has strong theoretical guarantees, and is simple to implement.
