ClickHouse Performance
ClickHouse is a column-oriented OLAP database built for analytics at scale. This guide covers performance tuning: the MergeTree engine, partitioning, materialized views, compression, and the vectorized SQL execution that lets ClickHouse aggregate billions of rows in milliseconds.
| Database | Type | Analytics Speed | Scale | Best For |
|---|---|---|---|---|
| ClickHouse | Column OLAP | Fastest | Petabyte | Analytics |
| PostgreSQL | Row OLTP | Slow (analytics) | Terabyte | Transactional |
| BigQuery | Cloud OLAP | Fast | Petabyte | GCP Analytics |
| Apache Druid | Column OLAP | Fast | Petabyte | Real-time |
| DuckDB | Embedded OLAP | Fast | 100GB | Local Analytics |
Table Design
# === ClickHouse Table Design ===
# MergeTree — Primary Table Engine
# CREATE TABLE events (
# event_date Date,
# event_time DateTime,
# user_id UInt64,
# event_type LowCardinality(String),
# page_url String,
# country LowCardinality(String),
# device LowCardinality(String),
# duration_ms UInt32,
# revenue Decimal(10, 2)
# )
# ENGINE = MergeTree()
# PARTITION BY toYYYYMM(event_date)
# ORDER BY (event_type, user_id, event_time)
# TTL event_date + INTERVAL 90 DAY
# SETTINGS index_granularity = 8192;
# Key Design Decisions:
# 1. PARTITION BY toYYYYMM — monthly partitions make old data easy to drop
# 2. ORDER BY — must match the most frequent query pattern
# 3. LowCardinality — for columns with few distinct values
# 4. TTL — expires old data automatically
# 5. index_granularity — the default 8192 suits most workloads
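Decisions 1 and 4 can be sketched in plain Python; a toy model of what `toYYYYMM()` and the 90-day TTL from the DDL above do (not ClickHouse internals, just the date arithmetic):

```python
from datetime import date, timedelta

def to_yyyymm(d: date) -> int:
    """Mirror of ClickHouse toYYYYMM(): date(2025, 1, 15) -> 202501."""
    return d.year * 100 + d.month

def is_expired(event_date: date, today: date, ttl_days: int = 90) -> bool:
    """The TTL rule from the DDL: rows older than 90 days are dropped by merges."""
    return event_date + timedelta(days=ttl_days) < today

print(to_yyyymm(date(2025, 1, 15)))                       # 202501
print(is_expired(date(2025, 1, 15), date(2025, 4, 20)))   # True
```

Because the partition key is coarse (one partition per month), dropping expired data only touches whole partitions instead of rewriting parts.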
# Materialized View — Pre-aggregate
# CREATE MATERIALIZED VIEW events_daily_mv
# ENGINE = SummingMergeTree()
# PARTITION BY toYYYYMM(event_date)
# ORDER BY (event_date, event_type, country)
# AS SELECT
# event_date,
# event_type,
# country,
# count() AS event_count,
# uniqExact(user_id) AS unique_users,
# sum(revenue) AS total_revenue,
# avg(duration_ms) AS avg_duration
# FROM events
# GROUP BY event_date, event_type, country;
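What `SummingMergeTree` maintains for `events_daily_mv` can be illustrated with a plain-Python rollup (a toy model with made-up rows; real merges run in the background and `uniqExact` keeps more elaborate state):

```python
from collections import defaultdict

# Toy event rows: (event_date, event_type, country, user_id, revenue)
events = [
    ("2025-01-15", "page_view", "US", 1, 0.0),
    ("2025-01-15", "page_view", "US", 2, 0.0),
    ("2025-01-15", "purchase",  "US", 1, 49.9),
    ("2025-01-15", "page_view", "DE", 3, 0.0),
]

rollup = defaultdict(lambda: {"event_count": 0, "users": set(), "total_revenue": 0.0})
for d, etype, country, uid, rev in events:
    key = (d, etype, country)      # the MV's GROUP BY / ORDER BY key
    agg = rollup[key]
    agg["event_count"] += 1
    agg["users"].add(uid)          # uniqExact keeps exact distinct state
    agg["total_revenue"] += rev

for key, agg in sorted(rollup.items()):
    print(key, agg["event_count"], len(agg["users"]), round(agg["total_revenue"], 2))
```

Dashboard queries then read the few pre-aggregated rows per (date, type, country) instead of re-scanning raw events.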
# Projection — Alternative Query Pattern
# ALTER TABLE events ADD PROJECTION events_by_country (
# SELECT
# country,
# event_type,
# count() AS cnt,
# sum(revenue) AS rev
# GROUP BY country, event_type
# );
# ALTER TABLE events MATERIALIZE PROJECTION events_by_country;
from dataclasses import dataclass

@dataclass
class TableConfig:
    table: str
    engine: str
    rows: str
    compressed_gb: float
    raw_gb: float
    ratio: float
    partition: str

tables = [
    TableConfig("events", "MergeTree", "5.2B", 85, 520, 6.1, "Monthly"),
    TableConfig("events_daily_mv", "SummingMergeTree", "120K", 0.5, 2, 4.0, "Monthly"),
    TableConfig("user_sessions", "MergeTree", "800M", 25, 180, 7.2, "Monthly"),
    TableConfig("error_logs", "MergeTree", "2.1B", 45, 350, 7.8, "Weekly"),
    TableConfig("metrics_1m", "MergeTree", "10B", 120, 800, 6.7, "Daily"),
]

print("=== ClickHouse Tables ===")
for t in tables:
    print(f"  [{t.table}] Engine: {t.engine}")
    print(f"    Rows: {t.rows} | Compressed: {t.compressed_gb}GB | Raw: {t.raw_gb}GB")
    print(f"    Compression: {t.ratio}x | Partition: {t.partition}")
Query Optimization
# === Query Performance Tuning ===
# Bad Query — wide scan
# SELECT count(DISTINCT user_id)
# FROM events
# WHERE event_date >= '2025-01-01'
# -- No filter on the ORDER BY key, so every row in the date range is read
# Good Query — Optimized
# SELECT uniqExact(user_id)
# FROM events
# WHERE event_date >= '2025-01-01'
# AND event_type = 'page_view'
# -- Uses ORDER BY index, partition pruning
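The payoff of partition pruning can be estimated with back-of-envelope arithmetic (assumed figures: the 5.2B-row events table spread evenly over 18 monthly partitions, with the date filter matching 3 of them):

```python
total_rows = 5_200_000_000       # events table size from the section above
months_of_data = 18              # assumed retention span (illustrative)
rows_per_month = total_rows // months_of_data

# WHERE event_date >= '2025-01-01' matching 3 monthly partitions:
scanned = 3 * rows_per_month
print(f"{scanned:,} of {total_rows:,} rows ({scanned / total_rows:.0%})")
```

The date filter alone removes over 80% of the table from the scan before the primary index narrows it further.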
# Use EXPLAIN to analyze
# EXPLAIN pipeline
# SELECT event_type, count()
# FROM events
# WHERE event_date = '2025-01-15'
# GROUP BY event_type;
# Performance Settings
# SET max_threads = 16;
# SET max_memory_usage = 10000000000; -- 10GB
# SET max_execution_time = 30;
# SET optimize_read_in_order = 1;
# SET allow_experimental_projection_optimization = 1;
# Common Optimization Patterns:
# 1. Use partition pruning — filter WHERE event_date so only matching partitions are read
# 2. Match ORDER BY to the query — ClickHouse can then use the primary index
# 3. PREWHERE instead of WHERE — fewer columns are read before filtering
# 4. Sampling — use SAMPLE for approximate queries
# 5. Materialized View — pre-compute aggregates
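Pattern 4 trades accuracy for scan cost: reading a fixed fraction of rows gives an estimate close to the exact answer. A toy model of `SAMPLE 0.1` in plain Python (synthetic data, not a ClickHouse client):

```python
import random

random.seed(7)
# Synthetic event stream: 200k rows, each a purchase with 5% probability
rows = ["purchase" if random.random() < 0.05 else "page_view"
        for _ in range(200_000)]

exact = sum(r == "purchase" for r in rows)

sample = rows[::10]                            # read only 10% of rows
estimate = 10 * sum(r == "purchase" for r in sample)

print(exact, estimate)   # estimate lands close, at a tenth of the scan cost
```

ClickHouse does this at the storage layer (the table must declare a SAMPLE BY key), so the unread 90% is never touched.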
@dataclass
class QueryBenchmark:
    query: str
    rows_scanned: str
    time_ms: int
    optimized: bool
    technique: str

benchmarks = [
    QueryBenchmark("Count events by type (1 day)", "15M", 45, True, "Partition pruning"),
    QueryBenchmark("Unique users (1 month)", "450M", 850, True, "uniqExact + index"),
    QueryBenchmark("Revenue by country (MV)", "30K", 5, True, "Materialized View"),
    QueryBenchmark("Top pages (1 week)", "105M", 320, True, "ORDER BY + LIMIT"),
    QueryBenchmark("Funnel analysis", "50M", 180, True, "windowFunnel()"),
    QueryBenchmark("Full table scan (bad)", "5.2B", 45000, False, "No optimization"),
    QueryBenchmark("Same with index (good)", "15M", 45, True, "Primary key match"),
]

print("\n=== Query Benchmarks ===")
for b in benchmarks:
    status = "FAST" if b.time_ms < 1000 else "SLOW"
    opt = "Optimized" if b.optimized else "Not optimized"
    print(f"  [{status}] {b.query}")
    print(f"    Rows: {b.rows_scanned} | Time: {b.time_ms}ms | {opt}")
    print(f"    Technique: {b.technique}")
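The last two benchmark rows quantify the payoff of a primary-key match directly:

```python
full_scan_ms, indexed_ms = 45_000, 45            # from the benchmark rows above
full_rows, indexed_rows = 5_200_000_000, 15_000_000

print(f"latency: {full_scan_ms // indexed_ms}x faster")          # 1000x
print(f"I/O:     {full_rows // indexed_rows}x fewer rows read")  # 346x
```

The same question answered two ways: the index version reads ~0.3% of the rows and finishes 1000x sooner.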
Production Setup
# === Production Configuration ===
# Docker Compose
# services:
# clickhouse:
# image: clickhouse/clickhouse-server:latest
# ports:
# - "8123:8123" # HTTP
# - "9000:9000" # Native
# volumes:
# - ./data:/var/lib/clickhouse
# - ./config:/etc/clickhouse-server/config.d
# ulimits:
# nofile:
# soft: 262144
# hard: 262144
# config.xml — Performance Settings
# (the XML snippet was garbled in extraction and the tag names are not
#  recoverable; the surviving values were 100, 0.8, and 161061273600 bytes,
#  i.e. 150 GiB)
# Monitoring — system tables
# SELECT * FROM system.query_log ORDER BY event_time DESC LIMIT 10;
# SELECT * FROM system.parts WHERE table = 'events';
# SELECT * FROM system.metrics;
# SELECT * FROM system.asynchronous_metrics;
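A slow-query check over `system.query_log` is easy to script. A minimal sketch that filters already-fetched rows; the driver/HTTP call is omitted, the sample rows are illustrative, and the field names mirror `system.query_log` columns:

```python
# Rows as fetched from system.query_log: (query, query_duration_ms, read_rows)
query_log = [
    ("SELECT count() FROM events WHERE event_date = today()", 45, 15_000_000),
    ("SELECT * FROM events", 45_000, 5_200_000_000),
    ("SELECT sum(revenue) FROM events_daily_mv", 5, 30_000),
]

SLOW_MS = 1_000
slow = [(q, ms) for q, ms, _rows in query_log if ms >= SLOW_MS]
for q, ms in slow:
    print(f"SLOW ({ms} ms): {q}")
```

Run on a schedule, this surfaces queries that should get a Materialized View, a better WHERE clause, or a schema change.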
production_config = {
    "Cluster": "3 nodes (ReplicatedMergeTree)",
    "CPU": "32 cores per node",
    "RAM": "128 GB per node",
    "Storage": "NVMe SSD 2TB per node",
    "Total Data": "520 GB compressed (3.2 TB raw)",
    "Total Rows": "18 Billion",
    "Daily Ingestion": "500M rows/day",
    "Peak QPS": "200 queries/sec",
    "p99 Latency": "850ms",
    "Uptime": "99.99%",
}

print("Production Cluster:")
for k, v in production_config.items():
    print(f"  {k}: {v}")
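Two of those figures cross-check with simple arithmetic:

```python
rows_per_day = 500_000_000
raw_gb, compressed_gb = 3.2 * 1024, 520       # 3.2 TB raw vs 520 GB compressed

print(f"sustained ingest: {rows_per_day / 86_400:,.0f} rows/sec")  # ~5,787
print(f"compression:      {raw_gb / compressed_gb:.1f}x")          # ~6.3x
```

500M rows/day is under 6k rows/sec sustained, well within a single node's insert capacity, and the 6.3x ratio is consistent with the per-table ratios listed earlier.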
tuning_checklist = [
    "Match ORDER BY to the most frequent query pattern",
    "PARTITION BY time so old data is easy to drop",
    "Use LowCardinality for String columns with < 10K distinct values",
    "Build Materialized Views for dashboard queries",
    "Set TTL to expire old data automatically and save storage",
    "Pick a compression CODEC per data type",
    "Set max_threads to match available CPU cores",
    "Monitor system.query_log for slow queries",
]

print("\n\nTuning Checklist:")
for i, c in enumerate(tuning_checklist, 1):
    print(f"  {i}. {c}")
Tips
- ORDER BY: the most important choice; it must match the query pattern
- MV: use Materialized Views for dashboard queries
- LowCardinality: use on every String column with repetitive values
- Partition: by month; avoid overly fine-grained partitions
- Monitor: check system.query_log for slow queries daily
What is ClickHouse?
An open-source, column-oriented OLAP database, originally built at Yandex for web analytics. It runs SQL over billions of rows in milliseconds, sustains real-time ingestion, compresses data heavily, and is a common backend for log/event analytics and BI dashboards.
Why is ClickHouse faster than other databases?
Column orientation means a query reads only the columns it touches; execution is vectorized over batches; LZ4/ZSTD compression shrinks data 5-10x; work parallelizes across CPU cores; and skip indexes plus materialized views avoid scanning at all. On analytical queries it is commonly 100-1000x faster than PostgreSQL.
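The compression advantage comes from layout: a column's values sit together on disk, and repetitive, sorted data compresses far better than interleaved rows. A rough stdlib demonstration, using zlib as a stand-in for LZ4/ZSTD:

```python
import zlib

# The same 10k-row country column, stored contiguously (columnar) vs
# interleaved with other fields (row-oriented)
countries = ["US"] * 5000 + ["DE"] * 3000 + ["TH"] * 2000
column_layout = ",".join(countries).encode()
row_layout = ",".join(f"{i},{c},{i * 7}" for i, c in enumerate(countries)).encode()

col_ratio = len(zlib.compress(column_layout)) / len(column_layout)
row_ratio = len(zlib.compress(row_layout)) / len(row_layout)
print(f"columnar: {col_ratio:.3f}  row-wise: {row_ratio:.3f}")
```

The columnar layout compresses to a tiny fraction of its raw size, which is why ClickHouse both stores less and reads less per query.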
How do you tune performance?
Design the primary key around the dominant query pattern, partition by time, align ORDER BY with WHERE clauses, use LowCardinality types, add Materialized Views and Projections, and adjust settings such as max_threads and max_memory_usage to the workload.
What is ClickHouse used for?
Server and application log analytics, event tracking and user-behavior analysis, real-time dashboards, time-series and IoT data, BI reporting, ad-tech, and A/B testing: any workload with large data volumes.
Summary
Tuned well, ClickHouse answers analytical queries over billions of rows in milliseconds: a column-oriented MergeTree schema, time-based partitioning, an ORDER BY matched to real queries, LowCardinality and compression to cut storage, and Materialized Views for pre-aggregation.
