Ceph Storage Cluster
Ceph is open-source, software-defined storage that serves block (RBD), file (CephFS), and object (RGW, S3-compatible) workloads from a single distributed, self-healing cluster. Data placement is computed by the CRUSH algorithm, and the Rook operator integrates Ceph with Kubernetes for scale-out enterprise deployments.
| Storage Type | Protocol | Use Case | Kubernetes |
|---|---|---|---|
| RBD (Block) | RADOS Block Device | VM Disk, Container Volume | StorageClass RBD |
| CephFS (File) | POSIX Filesystem | Shared Storage, NFS | StorageClass CephFS |
| RGW (Object) | S3 / Swift API | Backup, Media, Data Lake | ObjectBucketClaim |
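The table above can be kept handy as a lookup. A minimal sketch — the StorageClass names are the Rook example defaults and may differ in your cluster:

```python
# Storage types from the table above as a lookup dict.
# StorageClass names are Rook example defaults (assumption), not universal.
STORAGE_TYPES = {
    "RBD":    {"kind": "block",  "access": "ReadWriteOnce", "k8s": "rook-ceph-block"},
    "CephFS": {"kind": "file",   "access": "ReadWriteMany", "k8s": "rook-cephfs"},
    "RGW":    {"kind": "object", "access": "S3/Swift API",  "k8s": "ObjectBucketClaim"},
}

print(STORAGE_TYPES["CephFS"]["access"])  # ReadWriteMany
```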
Architecture & Deployment
# === Ceph Cluster Architecture ===
# Rook Operator Deployment (Kubernetes)
# kubectl create -f https://raw.githubusercontent.com/rook/rook/release-1.13/deploy/examples/crds.yaml
# kubectl create -f https://raw.githubusercontent.com/rook/rook/release-1.13/deploy/examples/common.yaml
# kubectl create -f https://raw.githubusercontent.com/rook/rook/release-1.13/deploy/examples/operator.yaml
# kubectl create -f cluster.yaml
# cephadm Bootstrap (Bare Metal)
# curl --silent --remote-name --location https://github.com/ceph/ceph/raw/quincy/src/cephadm/cephadm
# chmod +x cephadm
# ./cephadm bootstrap --mon-ip 192.168.1.10
# ceph orch host add node2 192.168.1.11
# ceph orch host add node3 192.168.1.12
# ceph orch apply osd --all-available-devices
from dataclasses import dataclass

@dataclass
class CephComponent:
    component: str
    role: str
    min_count: str
    resource: str
    tip: str

components = [
    CephComponent("MON (Monitor)",
                  "Maintains the cluster map and elects a leader",
                  "3 (must be an odd number)",
                  "CPU 1 core, RAM 2GB, SSD 50GB",
                  "Always spread monitors across different nodes"),
    CephComponent("OSD (Object Storage Daemon)",
                  "Stores the actual data on disk",
                  "3+ (more OSDs = more capacity)",
                  "CPU 1 core/OSD, RAM 4GB/OSD, SSD/HDD",
                  "Put WAL+DB on SSD, separate from HDD data"),
    CephComponent("MGR (Manager)",
                  "Runs the dashboard, Prometheus, and other modules",
                  "2 (active + standby)",
                  "CPU 1 core, RAM 1GB",
                  "Enable the dashboard module for a web UI"),
    CephComponent("MDS (Metadata Server)",
                  "Manages metadata for CephFS",
                  "2+ (if using CephFS)",
                  "CPU 2 core, RAM 4GB",
                  "Required for CephFS; unnecessary for RBD-only clusters"),
    CephComponent("RGW (RADOS Gateway)",
                  "S3/Swift API gateway",
                  "2+ (if using object storage)",
                  "CPU 2 core, RAM 4GB",
                  "Put a load balancer in front of RGW for HA"),
]

print("=== Ceph Components ===")
for c in components:
    print(f"\n [{c.component}] {c.role}")
    print(f"   Min: {c.min_count}")
    print(f"   Resource: {c.resource}")
    print(f"   Tip: {c.tip}")
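The OSD counts above feed directly into PG sizing. A minimal sketch of the classic rule of thumb (OSDs × ~100 target PGs per OSD, divided by replica size, rounded up to a power of two) — note that modern Ceph's pg_autoscaler normally manages this for you, so treat it as a sanity check rather than a prescription:

```python
# Classic pg_num heuristic: (OSDs * target_pgs_per_osd) / replica_size,
# rounded UP to the next power of two. The pg_autoscaler module usually
# handles this automatically in modern Ceph.
def suggest_pg_num(osd_count: int, replica_size: int = 3,
                   target_pgs_per_osd: int = 100) -> int:
    raw = (osd_count * target_pgs_per_osd) / replica_size
    pg = 1
    while pg < raw:          # round up to the next power of two
        pg *= 2
    return pg

print(suggest_pg_num(3))     # small 3-OSD lab cluster -> 128
print(suggest_pg_num(12))    # 12 OSDs, 3x replication -> 512
```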
Developer Workflow
# === Developer Workflow with Ceph ===
@dataclass
class DevWorkflow:
    task: str
    tool: str
    command: str
    verify: str

workflows = [
    DevWorkflow("Create a block volume (RBD)",
                "kubectl + StorageClass",
                "kubectl apply -f pvc-rbd.yaml\n# apiVersion: v1\n# kind: PersistentVolumeClaim\n# spec:\n#   storageClassName: rook-ceph-block\n#   resources:\n#     requests:\n#       storage: 10Gi",
                "kubectl get pvc (Status: Bound)"),
    DevWorkflow("Create a shared filesystem (CephFS)",
                "kubectl + StorageClass",
                "kubectl apply -f pvc-cephfs.yaml\n# storageClassName: rook-cephfs\n# accessModes: [ReadWriteMany]",
                "kubectl get pvc (Bound + RWX)"),
    DevWorkflow("Use the S3 API (RGW)",
                "boto3 / aws-cli",
                "# aws --endpoint-url http://rook-ceph-rgw:80 s3 ls\n# aws --endpoint-url http://rook-ceph-rgw:80 s3 cp file.txt s3://mybucket/",
                "aws s3 ls s3://mybucket/"),
    DevWorkflow("Monitor the cluster",
                "ceph CLI / Dashboard",
                "ceph status\nceph osd tree\nceph df\nceph health detail",
                "HEALTH_OK = healthy"),
    DevWorkflow("Benchmark performance",
                "rados bench",
                "rados bench -p testpool 30 write --no-cleanup\nrados bench -p testpool 30 seq",
                "Check bandwidth, IOPS, and latency"),
]

print("=== Developer Workflows ===")
for w in workflows:
    print(f"\n [{w.task}] Tool: {w.tool}")
    print(f"   Command: {w.command}")
    print(f"   Verify: {w.verify}")
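The PVC snippets in the workflows above can also be generated programmatically. A minimal sketch that builds an RBD PVC manifest as a plain dict (the StorageClass name is the Rook example default — an assumption; adjust for your cluster):

```python
import json

# Build a PersistentVolumeClaim manifest for an RBD-backed volume.
# "rook-ceph-block" is the Rook example StorageClass name (assumption).
def rbd_pvc(name: str, size_gi: int,
            storage_class: str = "rook-ceph-block") -> dict:
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],   # RBD is single-writer block
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }

print(json.dumps(rbd_pvc("data-pvc", 10), indent=2))
```

Pipe the output to `kubectl apply -f -` (Kubernetes accepts JSON as well as YAML).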
Troubleshooting
# === Common Issues & Fixes ===
@dataclass
class CephIssue:
    issue: str
    symptom: str
    diagnose: str
    fix: str

issues = [
    CephIssue("OSD Down",
              "HEALTH_WARN: 1 osds down",
              "ceph osd tree (find which OSD is down)\nsystemctl status ceph-osd@ID",
              "systemctl restart ceph-osd@ID\nCheck logs: journalctl -u ceph-osd@ID"),
    CephIssue("Cluster Near Full",
              "HEALTH_WARN: OSD nearfull (> 85%)",
              "ceph df (check usage)\nceph osd df tree",
              "Delete unneeded data or add OSDs\nceph osd set-nearfull-ratio 0.90"),
    CephIssue("PG Not Active+Clean",
              "HEALTH_WARN: Degraded PGs",
              "ceph pg stat\nceph pg dump_stuck",
              "Wait for automatic recovery\nIf it stalls: ceph pg repair PG_ID"),
    CephIssue("Slow OSD / High Latency",
              "Slow requests, client timeouts",
              "ceph osd perf\nsmartctl -a /dev/sdX",
              "Replace the slow disk\nMove WAL+DB to SSD\nCheck the network"),
    CephIssue("Clock Skew",
              "HEALTH_WARN: clock skew detected",
              "ceph time-sync-status",
              "Enable NTP on every node: timedatectl set-ntp true"),
    CephIssue("Mon Quorum Lost",
              "HEALTH_ERR: mon quorum lost",
              "ceph mon stat\nceph mon dump",
              "Restart the down monitors\nIf a majority is down: recover from a backup"),
]

print("=== Troubleshooting ===")
for i in issues:
    print(f"\n [{i.issue}] {i.symptom}")
    print(f"   Diagnose: {i.diagnose}")
    print(f"   Fix: {i.fix}")
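Most of the diagnosis above starts from `ceph status`, which can also emit JSON (`ceph status --format json`) for scripting. A minimal sketch that classifies the health section — the sample document below is a hand-written stand-in, not real cluster output:

```python
import json

# Summarize the "health" section of `ceph status --format json`:
# return the status string, plus the names of any active health checks.
def summarize_health(status_json: str) -> str:
    doc = json.loads(status_json)
    health = doc["health"]["status"]          # e.g. HEALTH_OK / HEALTH_WARN
    checks = list(doc["health"].get("checks", {}))
    return health if not checks else f"{health}: {', '.join(checks)}"

# Hand-written sample (assumption: mirrors the real JSON shape)
sample = '{"health": {"status": "HEALTH_WARN", "checks": {"OSD_DOWN": {}}}}'
print(summarize_health(sample))  # HEALTH_WARN: OSD_DOWN
```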
Tips
- Rook: the Rook operator on Kubernetes is the easiest way to run Ceph
- SSD: put WAL+DB on SSD, separate from HDD data, for a large speedup
- Network: separate the public network from the cluster (replication) network
- 3 nodes: at least 3 nodes for HA; never run 1-2 nodes in production
- Monitoring: enable the Prometheus + Grafana dashboards and watch them continuously
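A quick capacity rule of thumb follows from the tips above: with 3x replication, usable space is raw capacity divided by three, and you should plan to stay under the nearfull ratio (0.85 by default) to avoid HEALTH_WARN. A hedged sketch:

```python
# Rule-of-thumb usable capacity: raw / replica_size, discounted by the
# nearfull ratio (0.85 default) so the cluster keeps headroom.
def usable_tb(raw_tb: float, replica_size: int = 3,
              nearfull: float = 0.85) -> float:
    return round(raw_tb / replica_size * nearfull, 2)

print(usable_tb(36))                  # 36 TB raw, 3x replication -> 10.2
print(usable_tb(10, replica_size=2))  # 10 TB raw, 2x replication -> 4.25
```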
What is Ceph?
Open-source, software-defined storage offering RBD (block), CephFS (file), and RGW (S3/Swift object) from one distributed, self-healing cluster. The CRUSH algorithm computes data placement, Rook brings it to Kubernetes, and large-scale users include CERN and Red Hat.

What is the developer experience like?
On Kubernetes, the Rook operator plus StorageClasses make a volume just a PVC away; on bare metal, cephadm manages the daemons. S3 access works with boto3 or aws-cli, and the dashboard, Prometheus, and the CLI (`ceph status`) cover observability. The community is active and the tooling has improved a great deal.

How do I install it?
On Kubernetes: `kubectl apply` the Rook CRDs, common resources, and operator, then a CephCluster CR. On bare metal: `cephadm bootstrap`, then add hosts and OSDs with `ceph orch`. Plan for at least 3 nodes, SSD for WAL+DB, separate public and cluster networks, and about 4GB RAM per OSD.

How do I troubleshoot?
Start with `ceph status` (aim for HEALTH_OK), then `ceph osd tree`, `ceph df`, and `ceph health detail`; PGs should be active+clean. Common issues: OSD down, cluster near full, slow OSDs, clock skew (fix with NTP), and lost mon quorum.

Summary
Ceph delivers block (RBD), file (CephFS), and object (RGW/S3) storage from one CRUSH-placed, self-healing cluster. Deploy it with Rook on Kubernetes or cephadm on bare metal, monitor OSDs and PGs continuously, troubleshoot with the ceph CLI, and scale out to enterprise workloads.
