Technology

ClickHouse Analytics for AR/VR Development: Real-Time Analytics for AR/VR

2025-09-21 · A. Bom, SiamCafe.net · 1,247 words

Why ClickHouse Is a Good Fit for AR/VR Analytics

ClickHouse is an open-source columnar database built for Online Analytical Processing (OLAP). It can run queries over billions of rows in well under a second, which makes it a strong fit for analytics workloads that aggregate large volumes of event data.
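A toy illustration of why the columnar layout pays off for aggregates (plain Python lists standing in for column files; the data is made up):

```python
# Row store: every record carries all fields, so an aggregate over one
# field still walks whole records.
row_store = [
    {"user_id": "u1", "fps": 72.0, "duration_ms": 350},
    {"user_id": "u2", "fps": 90.0, "duration_ms": 120},
    {"user_id": "u1", "fps": 60.0, "duration_ms": 820},
]

# Column store: one contiguous list per column, so avg(fps) reads only
# the fps column and never touches user_id or duration_ms.
col_store = {
    "user_id": ["u1", "u2", "u1"],
    "fps": [72.0, 90.0, 60.0],
    "duration_ms": [350, 120, 820],
}

avg_fps_rows = sum(r["fps"] for r in row_store) / len(row_store)
avg_fps_cols = sum(col_store["fps"]) / len(col_store["fps"])
assert avg_fps_rows == avg_fps_cols  # same answer, very different I/O
print(round(avg_fps_cols, 1))  # 74.0
```

The same principle is why `SELECT avg(fps) FROM performance` on ClickHouse can skip every column except `fps` entirely.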

AR/VR applications generate telemetry continuously:

- User interaction events (gaze tracking, hand gestures, controller inputs)
- Performance metrics (frame rate, latency, render time)
- Spatial data (position, rotation, scale)
- Session analytics (duration, engagement, drop-off)
- Device metrics (battery, thermal, memory)

All of this data needs to be analyzed in near real time to optimize the experience and troubleshoot issues.

What makes ClickHouse a good fit for AR/VR workloads:

- High insert rate (millions of rows/second)
- Fast aggregation queries over large tables
- Compression ratio that keeps storage cost down
- SQL-based, so it is easy to adopt
- Materialized Views for pre-aggregation

Setting Up ClickHouse for Real-Time Analytics

Set up a ClickHouse instance for AR/VR telemetry:

# === ClickHouse Setup for AR/VR Analytics ===

# 1. Install ClickHouse
sudo apt-get install -y apt-transport-https ca-certificates
curl -fsSL 'https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key' | sudo gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main" | sudo tee /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
sudo apt-get install -y clickhouse-server clickhouse-client
sudo systemctl start clickhouse-server

# 2. Docker Compose
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
  clickhouse:
    image: clickhouse/clickhouse-server:24.3
    ports:
      - "8123:8123"  # HTTP
      - "9000:9000"  # Native
    volumes:
      - clickhouse-data:/var/lib/clickhouse
      - ./config/users.xml:/etc/clickhouse-server/users.d/users.xml
    environment:
      CLICKHOUSE_DB: arvr_analytics
      CLICKHOUSE_USER: analytics
      CLICKHOUSE_PASSWORD: secure_password
    ulimits:
      nofile:
        soft: 262144
        hard: 262144

  grafana:
    image: grafana/grafana:10.4.0
    ports:
      - "3000:3000"
    environment:
      GF_INSTALL_PLUGINS: "grafana-clickhouse-datasource"
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  clickhouse-data:
  grafana-data:
EOF

# 3. Create Tables
cat > schema.sql << 'EOF'
CREATE DATABASE IF NOT EXISTS arvr_analytics;

-- User interaction events
CREATE TABLE arvr_analytics.events (
    event_id UUID DEFAULT generateUUIDv4(),
    session_id String,
    user_id String,
    device_type LowCardinality(String),
    event_type LowCardinality(String),
    event_name String,
    position_x Float32,
    position_y Float32,
    position_z Float32,
    rotation_x Float32,
    rotation_y Float32,
    rotation_z Float32,
    duration_ms UInt32,
    metadata String,
    created_at DateTime DEFAULT now()
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_at)
ORDER BY (user_id, session_id, created_at)
TTL created_at + INTERVAL 90 DAY;

-- Performance metrics
CREATE TABLE arvr_analytics.performance (
    session_id String,
    device_type LowCardinality(String),
    fps Float32,
    frame_time_ms Float32,
    gpu_usage_pct Float32,
    cpu_usage_pct Float32,
    memory_mb UInt32,
    battery_pct UInt8,
    thermal_state LowCardinality(String),
    network_latency_ms UInt16,
    created_at DateTime DEFAULT now()
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_at)
ORDER BY (session_id, created_at)
TTL created_at + INTERVAL 30 DAY;

-- Materialized View: Hourly aggregates
-- Note: SummingMergeTree would silently corrupt uniq()/avg() results
-- when parts merge (it sums every numeric column), so use
-- AggregatingMergeTree with -State combinators and read back with
-- countMerge()/uniqMerge()/avgMerge().
CREATE MATERIALIZED VIEW arvr_analytics.events_hourly_mv
ENGINE = AggregatingMergeTree()
ORDER BY (event_type, device_type, hour)
AS SELECT
    event_type,
    device_type,
    toStartOfHour(created_at) AS hour,
    countState() AS event_count,
    uniqState(user_id) AS unique_users,
    uniqState(session_id) AS unique_sessions,
    avgState(duration_ms) AS avg_duration_ms
FROM arvr_analytics.events
GROUP BY event_type, device_type, hour;
EOF

clickhouse-client --multiquery < schema.sql
echo "ClickHouse schema created"
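With the schema in place, one lightweight way to load events is ClickHouse's HTTP interface with the JSONEachRow input format: POST newline-delimited JSON to port 8123 together with an INSERT query. A sketch of building that payload in Python; the actual POST needs the running server from the compose file above, so it is left as a comment:

```python
import json

def to_jsoneachrow(rows):
    """Serialize rows as newline-delimited JSON, the shape that
    ClickHouse's JSONEachRow input format expects."""
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in rows)

rows = [
    {"session_id": "sess-00001", "user_id": "user-0001",
     "device_type": "Meta Quest 3", "event_type": "gaze_hit",
     "duration_ms": 350},
    {"session_id": "sess-00001", "user_id": "user-0001",
     "device_type": "Meta Quest 3", "event_type": "teleport",
     "duration_ms": 450},
]

query = "INSERT INTO arvr_analytics.events FORMAT JSONEachRow"
payload = to_jsoneachrow(rows)
print(payload.count("\n") + 1, "rows serialized")

# Against the live server from docker-compose above:
# import urllib.request, urllib.parse
# req = urllib.request.Request(
#     "http://localhost:8123/?query=" + urllib.parse.quote(query),
#     data=payload.encode(), method="POST")
# urllib.request.urlopen(req)
```

For higher throughput, the dedicated clickhouse-connect or native-protocol clients are the usual choice; the HTTP interface is handy for scripts and debugging.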

Building the AR/VR Analytics Pipeline

A Python pipeline to ingest and analyze AR/VR data:

#!/usr/bin/env python3
# arvr_pipeline.py -- AR/VR Analytics Pipeline for ClickHouse
import logging
import random

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("arvr")

class ARVRAnalyticsPipeline:
    """Analytics pipeline for AR/VR telemetry data"""
    
    def __init__(self):
        self.buffer = []
        self.batch_size = 1000
        self.stats = {"ingested": 0, "flushed": 0}
    
    def ingest_event(self, event):
        """Buffer event for batch insert"""
        self.buffer.append(event)
        self.stats["ingested"] += 1
        
        if len(self.buffer) >= self.batch_size:
            self.flush()
    
    def flush(self):
        """Flush buffer to ClickHouse"""
        if not self.buffer:
            return
        
        # In production: clickhouse_connect client.insert()
        count = len(self.buffer)
        self.stats["flushed"] += count
        logger.info(f"Flushed {count} events to ClickHouse")
        self.buffer = []
    
    def generate_sample_events(self, num_sessions=100):
        """Generate sample AR/VR telemetry"""
        event_types = ["gaze_hit", "hand_grab", "controller_click", "teleport", "menu_open", "object_interact"]
        devices = ["Meta Quest 3", "Apple Vision Pro", "PSVR 2", "HTC Vive Pro"]
        
        events = []
        for s in range(num_sessions):
            session_id = f"sess-{s:05d}"
            user_id = f"user-{s % 50:04d}"
            device = random.choice(devices)
            duration = random.randint(300, 3600)  # 5-60 min session
            
            num_events = duration // 2  # ~1 event per 2 seconds
            for e in range(num_events):
                events.append({
                    "session_id": session_id,
                    "user_id": user_id,
                    "device_type": device,
                    "event_type": random.choice(event_types),
                    "position_x": random.uniform(-10, 10),
                    "position_y": random.uniform(0, 3),
                    "position_z": random.uniform(-10, 10),
                    "duration_ms": random.randint(50, 2000),
                })
        
        return events
    
    def analyze_sessions(self, events):
        """Analyze session metrics"""
        sessions = {}
        for e in events:
            sid = e["session_id"]
            if sid not in sessions:
                sessions[sid] = {"events": 0, "devices": set(), "users": set(), "total_duration_ms": 0}
            sessions[sid]["events"] += 1
            sessions[sid]["devices"].add(e["device_type"])
            sessions[sid]["users"].add(e["user_id"])
            sessions[sid]["total_duration_ms"] += e["duration_ms"]
        
        total_sessions = len(sessions)
        avg_events = sum(s["events"] for s in sessions.values()) / total_sessions
        avg_duration = sum(s["total_duration_ms"] for s in sessions.values()) / total_sessions / 1000
        
        # Device breakdown
        device_counts = {}
        for s in sessions.values():
            for d in s["devices"]:
                device_counts[d] = device_counts.get(d, 0) + 1
        
        return {
            "total_sessions": total_sessions,
            "total_events": sum(s["events"] for s in sessions.values()),
            "avg_events_per_session": round(avg_events, 1),
            "avg_session_duration_sec": round(avg_duration, 1),
            "unique_users": len(set(e["user_id"] for e in events)),
            "device_breakdown": device_counts,
        }

# Demo
pipeline = ARVRAnalyticsPipeline()
events = pipeline.generate_sample_events(num_sessions=100)

# Ingest
for event in events:
    pipeline.ingest_event(event)
pipeline.flush()

# Analyze
analysis = pipeline.analyze_sessions(events)
print(f"AR/VR Analytics Summary:")
print(f"  Sessions: {analysis['total_sessions']}")
print(f"  Events: {analysis['total_events']:,}")
print(f"  Avg events/session: {analysis['avg_events_per_session']}")
print(f"  Avg duration: {analysis['avg_session_duration_sec']}s")
print(f"  Unique users: {analysis['unique_users']}")
print(f"\nDevice Breakdown:")
for device, count in sorted(analysis["device_breakdown"].items(), key=lambda x: x[1], reverse=True):
    print(f"  {device}: {count} sessions")
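One gap in the pipeline above: with a size-only trigger, a trickle of events can sit in the buffer indefinitely. A common refinement is to also flush after a maximum age. A hypothetical variant (the `TimedBuffer` name and `max_age_sec` parameter are illustrative, not part of the class above):

```python
import time

class TimedBuffer:
    """Flush when either the batch fills up or the oldest buffered
    event exceeds max_age_sec (sketch mirroring ARVRAnalyticsPipeline)."""

    def __init__(self, batch_size=1000, max_age_sec=5.0):
        self.buffer = []
        self.batch_size = batch_size
        self.max_age_sec = max_age_sec
        self.oldest = None
        self.flush_count = 0

    def ingest(self, event, now=None):
        now = time.monotonic() if now is None else now
        if not self.buffer:
            self.oldest = now          # timestamp of the oldest event
        self.buffer.append(event)
        if (len(self.buffer) >= self.batch_size
                or now - self.oldest >= self.max_age_sec):
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_count += 1      # in production: client.insert(...)
            self.buffer = []
            self.oldest = None

buf = TimedBuffer(batch_size=10, max_age_sec=5.0)
buf.ingest({"e": 1}, now=0.0)
buf.ingest({"e": 2}, now=1.0)   # still buffered: size 2, age 1s
buf.ingest({"e": 3}, now=6.0)   # age 6s >= 5s, flush despite small batch
print(buf.flush_count, len(buf.buffer))  # 1 0
```

A real deployment would drive the age check from a background timer rather than piggybacking on ingest, but the trade-off is the same: bounded latency versus bigger, more efficient batches.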

Query Optimization for Event Data

ClickHouse queries for AR/VR analytics:

# === ClickHouse Query Optimization ===

cat > queries.sql << 'EOF'
-- 1. Real-time session metrics (last 1 hour)
SELECT
    device_type,
    count(DISTINCT session_id) AS sessions,
    count(DISTINCT user_id) AS users,
    count() AS events,
    avg(duration_ms) AS avg_duration_ms,
    quantile(0.95)(duration_ms) AS p95_duration_ms
FROM arvr_analytics.events
WHERE created_at >= now() - INTERVAL 1 HOUR
GROUP BY device_type
ORDER BY sessions DESC;

-- 2. Heatmap data (spatial analysis)
SELECT
    round(position_x, 1) AS x,
    round(position_z, 1) AS z,
    count() AS density,
    avg(duration_ms) AS avg_stay_ms
FROM arvr_analytics.events
WHERE event_type = 'gaze_hit'
  AND created_at >= today()
GROUP BY x, z
HAVING density > 10
ORDER BY density DESC
LIMIT 100;

-- 3. Performance degradation detection
SELECT
    session_id,
    device_type,
    avg(fps) AS avg_fps,
    min(fps) AS min_fps,
    avg(frame_time_ms) AS avg_frame_time,
    max(gpu_usage_pct) AS max_gpu,
    countIf(fps < 30) AS low_fps_count,
    count() AS total_samples,
    round(countIf(fps < 30) / count() * 100, 2) AS low_fps_pct
FROM arvr_analytics.performance
WHERE created_at >= today()
GROUP BY session_id, device_type
HAVING low_fps_pct > 10
ORDER BY low_fps_pct DESC
LIMIT 50;

-- 4. User engagement funnel
SELECT
    event_type,
    uniq(user_id) AS users,
    count() AS events,
    round(uniq(user_id) / (SELECT uniq(user_id) FROM arvr_analytics.events WHERE created_at >= today()) * 100, 2) AS pct_of_total
FROM arvr_analytics.events
WHERE created_at >= today()
GROUP BY event_type
ORDER BY users DESC;

-- 5. Retention analysis (daily active users)
SELECT
    toDate(created_at) AS day,
    uniq(user_id) AS dau,
    uniq(session_id) AS sessions,
    count() AS events,
    avg(duration_ms) AS avg_event_duration
FROM arvr_analytics.events
WHERE created_at >= today() - INTERVAL 30 DAY
GROUP BY day
ORDER BY day;
EOF

echo "Optimized queries ready"
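Query 2's heatmap binning can be prototyped client-side to sanity-check the SQL: round positions to a 0.1-unit grid, count per cell, drop sparse cells. A minimal Python equivalent (threshold lowered so the tiny sample produces output):

```python
from collections import Counter

def heatmap(events, min_density=2):
    """Bin gaze positions into a 0.1-unit XZ grid, mirroring query 2:
    round to 1 decimal, count per cell, keep cells above a threshold."""
    cells = Counter(
        (round(e["position_x"], 1), round(e["position_z"], 1))
        for e in events if e["event_type"] == "gaze_hit"
    )
    return {cell: n for cell, n in cells.items() if n >= min_density}

events = [
    {"event_type": "gaze_hit", "position_x": 1.04, "position_z": -2.01},
    {"event_type": "gaze_hit", "position_x": 1.01, "position_z": -1.98},
    {"event_type": "gaze_hit", "position_x": 1.02, "position_z": -2.04},
    {"event_type": "teleport", "position_x": 1.0,  "position_z": -2.0},
    {"event_type": "gaze_hit", "position_x": 5.55, "position_z": 3.33},
]

hot = heatmap(events, min_density=2)
print(hot)  # {(1.0, -2.0): 3}
```

Three nearby gaze hits collapse into one hot cell; the teleport event and the lone distant gaze hit are filtered out, just as the `WHERE event_type = 'gaze_hit'` and `HAVING density > 10` clauses do server-side.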

Dashboard and Visualization

Build the analytics dashboard:

#!/usr/bin/env python3
# dashboard.py -- AR/VR Analytics Dashboard (sample data for the demo)

class ARVRDashboard:
    
    def overview(self):
        return {
            "realtime_24h": {
                "active_users": 1250,
                "active_sessions": 890,
                "events_per_second": 2500,
                "avg_session_duration_min": 18.5,
                "avg_fps": 72.3,
            },
            "device_performance": [
                {"device": "Meta Quest 3", "users": 520, "avg_fps": 72, "crash_rate": "0.5%", "satisfaction": 4.2},
                {"device": "Apple Vision Pro", "users": 280, "avg_fps": 90, "crash_rate": "0.2%", "satisfaction": 4.6},
                {"device": "PSVR 2", "users": 310, "avg_fps": 60, "crash_rate": "0.8%", "satisfaction": 3.9},
                {"device": "HTC Vive Pro", "users": 140, "avg_fps": 85, "crash_rate": "0.3%", "satisfaction": 4.1},
            ],
            "top_interactions": [
                {"event": "gaze_hit", "count": 125000, "avg_duration_ms": 350},
                {"event": "hand_grab", "count": 45000, "avg_duration_ms": 820},
                {"event": "controller_click", "count": 38000, "avg_duration_ms": 120},
                {"event": "teleport", "count": 22000, "avg_duration_ms": 450},
                {"event": "object_interact", "count": 18000, "avg_duration_ms": 1200},
            ],
            "performance_alerts": [
                {"alert": "FPS drops below 30 on Quest 3", "sessions": 12, "severity": "HIGH"},
                {"alert": "High thermal throttling on Vision Pro", "sessions": 5, "severity": "MEDIUM"},
                {"alert": "Memory pressure on PSVR 2", "sessions": 8, "severity": "MEDIUM"},
            ],
        }

dash = ARVRDashboard()
data = dash.overview()
rt = data["realtime_24h"]
print(f"AR/VR Analytics Dashboard (24h):")
print(f"  Active Users: {rt['active_users']:,}")
print(f"  Sessions: {rt['active_sessions']:,}")
print(f"  Events/sec: {rt['events_per_second']:,}")
print(f"  Avg Session: {rt['avg_session_duration_min']} min")

print(f"\nDevice Performance:")
for d in data["device_performance"]:
    print(f"  {d['device']}: {d['users']} users, {d['avg_fps']} FPS, crash={d['crash_rate']}")

print(f"\nAlerts:")
for a in data["performance_alerts"]:
    print(f"  [{a['severity']}] {a['alert']} ({a['sessions']} sessions)")

Scaling and Performance Tuning

Tune ClickHouse for high-volume AR/VR data:

# === ClickHouse Performance Tuning ===

cat > clickhouse_tuning.yaml << 'EOF'
clickhouse_optimization:
  table_design:
    engine: "MergeTree (default, best for most cases)"
    partition_by: "toYYYYMM(created_at) -> monthly partitions"
    order_by: "Columns used in WHERE/GROUP BY most often"
    ttl: "Auto-delete old data (90 days for events, 30 days for metrics)"
    codec: "LZ4 (fast) or ZSTD (better compression)"
    
  insert_optimization:
    batch_size: "10,000-100,000 rows per insert"
    async_insert: true
    buffer_table: "Use Buffer engine for micro-batching"
    kafka_engine: "Direct ingestion from Kafka"
    
  query_optimization:
    materialized_views: "Pre-aggregate common queries"
    projection: "Alternative sort orders for different query patterns"
    sampling: "Use SAMPLE for approximate queries on large datasets"
    prewhere: "Use PREWHERE for filtering before reading columns"
    
  cluster_scaling:
    replication:
      factor: 2
      engine: "ReplicatedMergeTree"
      zookeeper: "ClickHouse Keeper (built-in)"
    sharding:
      strategy: "By user_id hash for even distribution"
      resharding: "Online resharding (ClickHouse 24.1+)"
    
  hardware:
    cpu: "Many fast cores help; ClickHouse parallelizes a single query across threads"
    memory: "64-256 GB (for large aggregations)"
    storage: "NVMe SSD (IOPS matters more than throughput)"
    network: "10 Gbps+ for cluster communication"
EOF

python3 -c "
import yaml
with open('clickhouse_tuning.yaml') as f:
    data = yaml.safe_load(f)
opt = data['clickhouse_optimization']
print('ClickHouse Optimization Guide:')
print(f'  Engine: {opt[\"table_design\"][\"engine\"]}')
print(f'  Batch size: {opt[\"insert_optimization\"][\"batch_size\"]}')
print(f'  Async insert: {opt[\"insert_optimization\"][\"async_insert\"]}')
print(f'  Replication: {opt[\"cluster_scaling\"][\"replication\"][\"factor\"]}x')
print(f'  CPU: {opt[\"hardware\"][\"cpu\"]}')
"

echo "Performance tuning guide ready"
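The SAMPLE idea from the tuning guide, illustrated outside ClickHouse: estimate an aggregate from a fraction of the rows and accept a small error (deterministic here thanks to a fixed seed; the numbers are purely illustrative):

```python
import random

random.seed(42)
durations = [random.randint(50, 2000) for _ in range(100_000)]

# Full scan: exact average over all rows.
exact = sum(durations) / len(durations)

# 10% sample: approximate average from a tenth of the rows, the same
# trade ClickHouse's SAMPLE clause makes on a sampled table.
sample = random.sample(durations, len(durations) // 10)
approx = sum(sample) / len(sample)

err_pct = abs(approx - exact) / exact * 100
print(f"exact={exact:.1f} approx={approx:.1f} err={err_pct:.2f}%")
assert err_pct < 5  # close enough for a dashboard trend line
```

For dashboards that show trends rather than exact counts, trading a percent or two of accuracy for a 10x reduction in rows read is usually a good deal.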

FAQ

Q: ClickHouse vs BigQuery vs Snowflake: which should you choose for AR/VR analytics?

A: ClickHouse excels at real-time queries; you can self-host for free (open source) or use ClickHouse Cloud, and it is built for exactly the AR/VR telemetry profile of high-volume inserts plus real-time dashboards. BigQuery is serverless with no infrastructure to manage, but high-volume streaming inserts get expensive; it shines for ad-hoc analysis at very large scale and for teams already on GCP. Snowflake is strong at data sharing and multi-cluster warehousing, and suits complex analytics better than real-time work (near-real-time at best). For AR/VR analytics that need real time plus a high insert rate, choose ClickHouse; for serverless ad-hoc analysis, choose BigQuery.

Q: How much AR/VR telemetry data should you expect?

A: It depends on sampling rates and user count. Typical rates: gaze tracking at 30-90 Hz = 30-90 events/second/user; hand tracking at 60 Hz = 60 events/second/user; performance metrics at 1 Hz = 1 event/second/user; interaction events at 0.5-2 Hz (variable). Example: 1,000 concurrent users with gaze tracking at 30 Hz = 30,000 events/second = 108M events/hour = 2.6B events/day. ClickHouse handles this comfortably (insert rates of 1M+ rows/second per node), but you can also reduce volume with client-side aggregation (aggregate on the headset), sampling (keep every 5th frame instead of every frame), or edge processing (process on the device and send only summaries).
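The volume arithmetic above, checked in a few lines (numbers taken from the answer; the 5x reduction matches the keep-every-5th-frame sampling strategy):

```python
users = 1_000
gaze_hz = 30                      # gaze events per second per user

per_sec = users * gaze_hz         # 30,000 events/second
per_hour = per_sec * 3600         # 108,000,000 events/hour
per_day = per_hour * 24           # 2,592,000,000 events/day (~2.6B)
print(f"{per_sec:,}/s  {per_hour:,}/h  {per_day:,}/day")

# Keeping every 5th frame cuts the stored volume by 5x:
sampled_per_day = per_day // 5
print(f"sampled: {sampled_per_day:,}/day")  # 518,400,000/day
```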

Q: How do Materialized Views help in ClickHouse?

A: Materialized Views (MVs) are pre-computed aggregations that update automatically as data is inserted. Instead of running COUNT/AVG over billions of rows, you query an MV that already holds the aggregates. For example, an hourly events MV that aggregates events per hour per device lets a dashboard read thousands of rows instead of billions. The payoff: queries 100-1000x faster, lower CPU usage, near-instant dashboard loads. The trade-offs: extra storage (usually modest), MVs must be designed around your query patterns, and not every aggregation is supported exactly (approximate functions help). A good rule of thumb: create an MV for every dashboard query that runs frequently.
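A toy model of what an MV buys you: update running counters at insert time, then answer the dashboard query from the counters instead of rescanning the raw rows (plain Python, not actual ClickHouse semantics):

```python
from collections import defaultdict

raw_events = []                   # stands in for the big events table
hourly = defaultdict(int)         # stands in for the materialized view

def insert(event):
    """Every insert also updates the pre-aggregated counter, which is
    when ClickHouse populates an MV too: at insert time, not query time."""
    raw_events.append(event)
    hourly[(event["event_type"], event["hour"])] += 1

for h in range(3):
    for _ in range(1000):
        insert({"event_type": "gaze_hit", "hour": h})
    for _ in range(200):
        insert({"event_type": "teleport", "hour": h})

# Dashboard query: gaze_hit count in hour 1.
# Scanning raw rows touches every record ...
scan = sum(1 for e in raw_events
           if e["event_type"] == "gaze_hit" and e["hour"] == 1)
# ... while the "MV" answers from a single counter.
mv = hourly[("gaze_hit", 1)]
assert scan == mv == 1000
print(f"raw rows scanned: {len(raw_events):,}; MV lookups: 1")
```

At billions of rows the raw scan becomes the dominant cost, which is where the 100-1000x speedups come from.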

Q: ClickHouse Cloud or self-hosted: which should you pick?

A: Self-hosted is free (open source) and gives full control, but you manage servers, upgrades, backups, and monitoring yourself; it suits teams with DevOps/SRE capacity and existing infrastructure. ClickHouse Cloud is a managed service with auto-scaling, auto-backup, and built-in monitoring; you pay for usage (compute + storage) and never touch infrastructure, which suits teams that would rather spend their resources on the product than on the database. A rough cost comparison: self-hosted (3 c5.4xlarge nodes) runs about $1,500/month, while ClickHouse Cloud varies with utilization. For a startup or a first deployment, start with ClickHouse Cloud, and consider self-hosting once scale and team maturity justify it.
