Elasticsearch ELK Stack: A Log Management System

Why Log Management Matters

I used to look after 200+ servers back when we had no centralized logging. Every time something broke, I had to SSH into each machine and grep the logs file by file. Sometimes the problem spanned several machines, so I had to correlate timestamps across logs from different places, and it took hours to pin down the root cause. Once we switched to the ELK Stack, everything changed: I type a query in Kibana and see logs from every machine at once, filterable by timestamp, service, and error level. Time to root cause dropped from hours to minutes.

The ELK Stack is one of the most widely used open-source toolsets for log management. It consists of Elasticsearch for storing and searching data, Logstash for receiving, transforming, and forwarding log data, and Kibana for visualization. Today it is usually called the Elastic Stack, because Beats has been added to the lineup.

ELK Stack vs Loki vs Graylog

Three open-source log management options come up most often. The ELK Stack is the most full-featured: its full-text search is very good and it can build complex visualizations, but it is resource-hungry. Grafana Loki is much lighter because it does not index the full log content, only labels. That makes storage cheaper, but searching log content is slower; it suits teams that already use Grafana. Graylog sits in the middle: it uses Elasticsearch as its backend but has a UI designed specifically for log analysis.

Data Flow Through the System

Data flows from left to right like this: Applications/Servers → Filebeat (lightweight shipper that collects logs from files) → Logstash (parse, transform, enrich) → Elasticsearch (store, index, search) → Kibana (visualize, dashboard, alert)

For production, I recommend adding Kafka or Redis as a buffer between Filebeat and Logstash to absorb spikes in log volume. If Logstash cannot keep up, logs wait in the buffer instead of being lost.
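
To make the buffering argument concrete, here is a toy simulation of a spike hitting a processor with fixed throughput. It is only a sketch: the event counts, the `simulate` function, and the idea of counting "dropped" events are illustrative assumptions, not a model of how Kafka, Redis, or Filebeat actually behave internally.

```python
def simulate(arrivals, capacity_per_tick, buffer_size=None):
    """Toy model of a shipper -> buffer -> processor chain.

    arrivals: number of events arriving in each tick.
    capacity_per_tick: events the processor can handle per tick.
    buffer_size: max events the buffer can hold (None = effectively unbounded).
    Returns (dropped_events, peak_backlog).
    """
    backlog = 0
    dropped = 0
    peak = 0
    for incoming in arrivals:
        backlog += incoming
        # Anything that does not fit in the buffer is lost
        if buffer_size is not None and backlog > buffer_size:
            dropped += backlog - buffer_size
            backlog = buffer_size
        # The processor drains what it can during this tick
        backlog -= min(backlog, capacity_per_tick)
        peak = max(peak, backlog)
    return dropped, peak


if __name__ == "__main__":
    spike = [100, 100, 500, 100, 100]  # hypothetical events per tick
    print("small buffer:", simulate(spike, capacity_per_tick=200, buffer_size=200))
    print("large buffer:", simulate(spike, capacity_per_tick=200))
```

With these made-up numbers, the small buffer loses 300 events during the spike, while the large one loses none and simply carries a backlog of up to 300 events until the processor catches up.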

Hardware Sizing

For a small setup (log volume < 50 GB/day)

Elasticsearch: 3 nodes, 16 GB RAM, 4 cores, SSD 500 GB each

Logstash: 1 node, 8 GB RAM, 4 cores

Kibana: 1 node, 4 GB RAM, 2 cores (or co-located with Logstash)

For a medium setup (log volume 50-200 GB/day)

Elasticsearch: 5 nodes (3 master-eligible + 2 data), 32 GB RAM, 8 cores, SSD 2 TB each

Logstash: 2 nodes, 16 GB RAM, 8 cores

Kibana: 1 node, 8 GB RAM, 4 cores

Kafka: 3 nodes (optional but recommended)

Rule of thumb: set the Elasticsearch JVM heap to half of the RAM, but no higher than 31 GB, and leave the other half to the OS filesystem cache.
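
The rule of thumb is easy to encode. The sketch below is only an illustration of the arithmetic; the function names are mine and the 2 GB minimum is an arbitrary guard, not an Elasticsearch requirement.

```python
def recommended_heap_gb(ram_gb: int, cap_gb: int = 31) -> int:
    """Half of the RAM, capped at cap_gb, in whole gigabytes."""
    if ram_gb < 2:
        raise ValueError("need at least 2 GB of RAM")
    return min(ram_gb // 2, cap_gb)


def jvm_heap_options(ram_gb: int) -> str:
    """Render the two lines of a jvm.options.d file (min heap equals max heap)."""
    heap = recommended_heap_gb(ram_gb)
    return f"-Xms{heap}g\n-Xmx{heap}g\n"


if __name__ == "__main__":
    for ram in (16, 32, 64, 128):
        print(f"{ram} GB RAM -> {recommended_heap_gb(ram)} GB heap")
```

For the sizing tiers above, this gives an 8 GB heap on the 16 GB nodes of the small setup and a 16 GB heap on the 32 GB nodes of the medium setup.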

Installing an Elasticsearch Cluster

I will walk through installing a 3-node Elasticsearch 8.x cluster on Ubuntu 22.04; three nodes is the recommended minimum for production.

Installing with the Package Manager

# Add the Elastic repository (run on every node, as root)
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | \
  gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg

echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | \
  tee /etc/apt/sources.list.d/elastic-8.x.list

apt update && apt install elasticsearch



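# Configure Elasticsearch: node 1 (master + data)
# On es-node-2 and es-node-3, change node.name and network.host to match
cat > /etc/elasticsearch/elasticsearch.yml << 'EOF'
cluster.name: siamcafe-logs
node.name: es-node-1
node.roles: [master, data, ingest]

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

network.host: 10.10.10.11
http.port: 9200
transport.port: 9300

discovery.seed_hosts:
  - 10.10.10.11:9300
  - 10.10.10.12:9300
  - 10.10.10.13:9300

# Only needed to bootstrap a brand-new cluster
cluster.initial_master_nodes:
  - es-node-1
  - es-node-2
  - es-node-3

xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: elastic-certificates.p12
# Assumption: TLS is also enabled on the HTTP layer, because the URLs used
# later in this article are https://; this reuses the transport certificate
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: elastic-certificates.p12
EOF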



# Set the JVM heap (half of the RAM, never above 31g)
# 16g assumes a node with 32 GB RAM; use 8g on a 16 GB node
cat > /etc/elasticsearch/jvm.options.d/heap.options << 'EOF'
-Xms16g
-Xmx16g
EOF



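# Create a certificate for the transport layer
# (if you set a password at the prompts, store it with elasticsearch-keystore
#  under xpack.security.transport.ssl.keystore.secure_password)
/usr/share/elasticsearch/bin/elasticsearch-certutil ca --out /tmp/elastic-stack-ca.p12
/usr/share/elasticsearch/bin/elasticsearch-certutil cert --ca /tmp/elastic-stack-ca.p12 \
  --out /etc/elasticsearch/elastic-certificates.p12

# Let the elasticsearch service user read the certificate
chown root:elasticsearch /etc/elasticsearch/elastic-certificates.p12
chmod 640 /etc/elasticsearch/elastic-certificates.p12

# Copy the certificate to the other nodes (fix ownership there as well)
scp /etc/elasticsearch/elastic-certificates.p12 es-node-2:/etc/elasticsearch/
scp /etc/elasticsearch/elastic-certificates.p12 es-node-3:/etc/elasticsearch/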



# Start Elasticsearch

systemctl enable --now elasticsearch



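# Set passwords for the built-in users
# (elasticsearch-setup-passwords is deprecated in 8.x; elasticsearch-reset-password replaces it)
/usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic
/usr/share/elasticsearch/bin/elasticsearch-reset-password -u kibana_system
# Record the generated passwords for elastic and kibana_system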



# Check cluster health (-k skips certificate verification; lab use only)
curl -u elastic:YOUR_PASSWORD -k "https://10.10.10.11:9200/_cluster/health?pretty"
# "status": "green" means every primary and replica shard is allocated
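
If you want to script this check, the JSON that `_cluster/health` returns can be inspected with a few lines of Python. This is a minimal sketch: `health_problems` is a helper name I made up, and the sample payload is hypothetical and trimmed to the three response fields the function reads (`status`, `number_of_nodes`, `unassigned_shards`).

```python
import json


def health_problems(health: dict, expected_nodes: int) -> list:
    """Return human-readable problems found in a _cluster/health response."""
    problems = []
    status = health.get("status", "unknown")
    if status != "green":
        problems.append(f"status is {status}, expected green")
    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        problems.append(f"only {nodes} of {expected_nodes} nodes joined")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        problems.append(f"{unassigned} unassigned shards")
    return problems


if __name__ == "__main__":
    # Hypothetical response body, trimmed to the fields used here
    sample = json.loads(
        '{"cluster_name": "siamcafe-logs", "status": "yellow", '
        '"number_of_nodes": 2, "unassigned_shards": 4}'
    )
    for problem in health_problems(sample, expected_nodes=3):
        print(problem)
```

An empty list means the cluster looks healthy for the expected node count; anything else is worth investigating before sending logs to it.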

Installing with Docker Compose

# docker-compose.yml for development/testing (security is disabled; do not use in production)
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
    ports:
      - "9200:9200"
    volumes:
      - es-data:/usr/share/elasticsearch/data
    ulimits:
      memlock:
        soft: -1
        hard: -1

  kibana:
    image: docker.elastic.co/kibana/kibana:8.13.0
    ports:
      - "5601:5601"
    environment:
      ELASTICSEARCH_HOSTS: '["http://elasticsearch:9200"]'
    depends_on:
      - elasticsearch

  logstash:
    image: docker.elastic.co/logstash/logstash:8.13.0
    volumes:
      - ./logstash/pipeline:/usr/share/logstash/pipeline
    ports:
      - "5044:5044"
    depends_on:
      - elasticsearch

volumes:
  es-data:

Configuring Logstash Pipelines

Logstash is a data processing pipeline that accepts many kinds of input, transforms them (filter), and sends the result to an output destination. I will show pipelines for the most common use cases.

Pipeline for Syslog

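# /etc/logstash/conf.d/syslog.conf
input {
  beats {
    port => 5044
    type => "syslog"   # default type; not applied if the event already has one
  }
}

filter {
  # Filebeat tags each input with fields.log_type; copy it into "type"
  # so events arriving on the same port can be told apart
  if [fields][log_type] {
    mutate { replace => { "type" => "%{[fields][log_type]}" } }
  }

  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:hostname} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:log_message}" }
    }
    date {
      # Syslog pads single-digit days with an extra space, hence "MMM  d"
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
    mutate {
      remove_field => [ "syslog_timestamp" ]
    }
  }
}

output {
  # With the default pipelines.yml, every file in conf.d is merged into one
  # pipeline, so guard each output by type
  if [type] == "syslog" {
    elasticsearch {
      hosts => ["https://10.10.10.11:9200"]
      user => "elastic"
      password => "YOUR_PASSWORD"
      ssl_certificate_verification => false   # lab only; trust a CA in production
      index => "syslog-%{+YYYY.MM.dd}"
    }
  }
}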
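
To see which fields the syslog grok pattern pulls out of a line, here is a rough Python equivalent. It is only an approximation for experimenting offline: the regular expression is my own simplification of `SYSLOGTIMESTAMP`, `SYSLOGHOST`, and `DATA`, not the actual grok definitions, and the sample lines are made up.

```python
import re

# Approximation of the grok pattern: timestamp, host, program, optional [pid], message
SYSLOG_RE = re.compile(
    r"^(?P<syslog_timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<program>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<log_message>.*)$"
)


def parse_syslog(line: str):
    """Return a dict of named fields, or None when the line does not match."""
    match = SYSLOG_RE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    samples = [
        "Jan  5 10:15:32 web01 sshd[1234]: Failed password for root from 203.0.113.7 port 22 ssh2",
        "Jan 15 03:00:01 db02 CRON: session opened",
    ]
    for sample in samples:
        print(parse_syslog(sample))
```

Lines that do not match return `None`; in Logstash the corresponding events would instead be tagged `_grokparsefailure`, which is worth watching for in Kibana.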

Pipeline for Nginx Access Logs

# /etc/logstash/conf.d/nginx.conf
input {
  beats {
    port => 5045
    type => "nginx-access"
  }
}

filter {
  if [type] == "nginx-access" {
    grok {
      match => { "message" => '%{IPORHOST:client_ip} - %{DATA:user} \[%{HTTPDATE:timestamp}\] "%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:http_version}" %{NUMBER:status:int} %{NUMBER:bytes:int} "%{DATA:referrer}" "%{DATA:user_agent}"' }
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    geoip {
      source => "client_ip"
      target => "geoip"
    }
    useragent {
      source => "user_agent"
      target => "ua"
    }
  }
}

output {
  # With the default pipelines.yml, every file in conf.d is merged into one
  # pipeline, so guard each output by type
  if [type] =~ /^nginx-/ {
    elasticsearch {
      hosts => ["https://10.10.10.11:9200"]
      user => "elastic"
      password => "YOUR_PASSWORD"
      ssl_certificate_verification => false   # lab only; trust a CA in production
      index => "nginx-%{+YYYY.MM.dd}"
    }
  }
}
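
The same kind of offline check works for the nginx access log pattern. Again this is a sketch under assumptions: the regular expression approximates the grok pattern for nginx's default combined log format, the sample line is invented, and `%b` in `strptime` expects English month abbreviations (the default C/English locale).

```python
import re
from datetime import datetime

# Approximation of the grok pattern for the default "combined" log format
NGINX_RE = re.compile(
    r'^(?P<client_ip>\S+) - (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<request>\S+) HTTP/(?P<http_version>[\d.]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)


def parse_nginx_access(line: str):
    """Return typed fields from one access-log line, or None if it does not match."""
    match = NGINX_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Mirror the :int casts in the grok pattern and the date filter
    fields["status"] = int(fields["status"])
    fields["bytes"] = int(fields["bytes"])
    fields["timestamp"] = datetime.strptime(fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    return fields


if __name__ == "__main__":
    sample = ('203.0.113.7 - - [10/Oct/2024:13:55:36 +0700] '
              '"GET /index.html?x=1 HTTP/1.1" 200 612 "-" "Mozilla/5.0"')
    print(parse_nginx_access(sample))
```

Note that this format carries no response time, so any query on request latency needs a custom nginx `log_format` and a matching pattern.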

Filebeat: Collecting Logs from Servers

Filebeat is a lightweight log shipper installed on each server whose logs you want to collect. It uses only about 20-50 MB of RAM, compared with 1-2 GB for Logstash.

Installing and Configuring Filebeat

# Install Filebeat
apt install filebeat

# Configure /etc/filebeat/filebeat.yml
filebeat.inputs:
  - type: filestream
    id: syslog
    paths:
      - /var/log/syslog
      - /var/log/auth.log
    fields:
      log_type: syslog

  - type: filestream
    id: nginx-access
    paths:
      - /var/log/nginx/access.log
    fields:
      log_type: nginx-access

  - type: filestream
    id: nginx-error
    paths:
      - /var/log/nginx/error.log
    fields:
      log_type: nginx-error

output.logstash:
  hosts: ["logstash.example.com:5044"]
  # Enable only after TLS is configured on the Logstash beats input
  # ssl.enabled: true

# Or ship directly to Elasticsearch (bypassing Logstash)
# output.elasticsearch:
#   hosts: ["https://es-node-1:9200"]
#   username: "elastic"
#   password: "YOUR_PASSWORD"

# Filebeat modules: use pre-built configs
# (in 8.x, also turn on the filesets you need in /etc/filebeat/modules.d/*.yml)
filebeat modules enable system nginx mysql

# Start
systemctl enable --now filebeat

# Verify
filebeat test config
filebeat test output

Filebeat for Docker Containers

# Collect logs from Docker containers automatically
# NOTE: the directory between the two slashes in the paths below is missing in
# the source; Filebeat's documented examples use ${data.docker.container.id} there
filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true
      hints.default_config:
        type: container
        paths:
          - /var/lib/docker/containers//*.log
      templates:
        - condition:
            contains:
              docker.container.image: "nginx"
          config:
            - module: nginx
              access:
                input:
                  type: container
                  paths:
                    - /var/lib/docker/containers//*.log

Kibana: Building Good-Looking Dashboards

Kibana is the visualization layer of the ELK Stack. It lets us build dashboards and analyze logs efficiently.

Creating a Data View (Index Pattern)

Open Kibana at http://kibana-host:5601

Go to Stack Management > Data Views

Create a new Data View:

Name: syslog-*

Index pattern: syslog-*

Timestamp field: @timestamp

Create a second one:

Name: nginx-*

Index pattern: nginx-*

Timestamp field: @timestamp

Dashboards I Use Regularly

I maintain three main dashboards. The first is a Security Dashboard showing failed SSH attempts, IPs attempting brute force (I also feed this data to Fail2ban), and top attacking IPs on a geo map. The second is a Web Traffic Dashboard showing requests per second, response time percentiles, HTTP error rates, top URLs, and user agents. The third is a System Health Dashboard showing disk usage warnings, OOM kills, service restarts, and CPU/memory alerts.

Frequently Used KQL (Kibana Query Language) Queries

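# Find error logs (assumes your events carry a "level" field)
level: "error" or level: "critical"

# Find a specific IP
client_ip: "203.150.x.x"

# Find 404 errors in nginx
status: 404 and type: "nginx-access"

# Find failed SSH logins
program: "sshd" and log_message: "Failed password"

# Find slow requests (response time > 5 seconds)
# Assumes a response_time field in milliseconds, which the nginx grok pattern above does not extract
response_time > 5000

# Combined query (the CIDR match works only if client_ip is mapped as type ip)
status >= 500 and not request: "/health" and client_ip: "10.0.0.0/8"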

Performance Tuning and Index Management

Elasticsearch is resource-hungry. Without careful tuning you will run into full disks, memory pressure, or slow searches.

Index Lifecycle Management (ILM)

# Create the ILM policy through the API
PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "1d",
            "max_primary_shard_size": "50gb"
          },
          "set_priority": { "priority": 100 }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 },
          "set_priority": { "priority": 50 }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": { "priority": 0 }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
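
To reason about what this policy does to an index over time, the phase thresholds can be written out as a small lookup. This is a simplified sketch: the `phase_for_age` helper is hypothetical, it only mirrors the `min_age` values of the policy above, and it ignores that ILM checks indices periodically rather than instantly. For an index that has rolled over, the age is counted from the rollover.

```python
# (phase name, min_age in days), mirroring the logs-policy example
PHASE_MIN_AGE_DAYS = [("hot", 0), ("warm", 7), ("cold", 30), ("delete", 90)]


def phase_for_age(days_since_rollover: float) -> str:
    """Return the last phase whose min_age the index has reached."""
    if days_since_rollover < 0:
        raise ValueError("age cannot be negative")
    current = PHASE_MIN_AGE_DAYS[0][0]
    for name, min_days in PHASE_MIN_AGE_DAYS:
        if days_since_rollover >= min_days:
            current = name
    return current


if __name__ == "__main__":
    for age in (0, 3, 7, 29, 30, 89, 90):
        print(f"day {age:>2}: {phase_for_age(age)}")
```

Reading the output by day makes it easy to confirm that the retention matches what you intended, for example that nothing is deleted before day 90.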

Shard Sizing

Rules of thumb

Primary shards should be 10-50 GB each
Keep the number of shards per node at or below 20 per GB of heap
With a 16 GB heap → at most 320 shards per node
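
These rules of thumb reduce to two small calculations. The sketch below is just that arithmetic; the function names and the choice of 50 GB as the default target shard size (the top of the 10-50 GB range) are my assumptions.

```python
import math


def max_shards_per_node(heap_gb: int, shards_per_gb_heap: int = 20) -> int:
    """Upper bound on shards per node from the 20-per-GB-of-heap rule."""
    return heap_gb * shards_per_gb_heap


def primary_shards_for(index_size_gb: float, target_shard_gb: float = 50) -> int:
    """Smallest number of primary shards that keeps each shard at or under the target size."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))


if __name__ == "__main__":
    print("16 GB heap ->", max_shards_per_node(16), "shards per node at most")
    for size in (30, 120, 200):
        print(f"{size} GB index -> {primary_shards_for(size)} primary shard(s)")
```

For example, a 200 GB daily index would need 4 primary shards to stay at or under 50 GB each, while a 30 GB index fits in one.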

Check shard allocation

GET _cat/shards?v&s=store:desc

Check index size

GET _cat/indices?v&s=store.size:desc&h=index,docs.count,store.size

JVM Tuning

# /etc/elasticsearch/jvm.options.d/gc.options
# Use G1GC for heaps > 4 GB (the default since ES 7.x)
-XX:+UseG1GC
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30

# Check JVM stats
GET _nodes/stats/jvm
# Look at heap usage, GC count, and GC time

The ELK Stack uses a lot of resources. Is there a lighter alternative?

If the budget is tight or the servers are small, I recommend Grafana Loki instead. It is much lighter because it does no full-text indexing. Promtail collects the logs (the equivalent of Filebeat) and you view them directly in Grafana. The downside is more limited search: if you need complex grep-style searches, ELK is still better.

Elasticsearch disks keep filling up. What should I do?

Set an ILM policy to delete old indices automatically, use rollover indices instead of daily indices, and set watermark alerts at 80% disk usage. I keep 30 days of retention for nginx logs and 90 days for security logs. If that is still not enough, reclaim some space on old indices with a force merge.
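
Before choosing a retention period, it helps to estimate how much disk it implies. The sketch below is a rough estimate only: it assumes the indexed size on disk equals the raw daily volume (in practice it can be larger or smaller depending on mappings and compression), and the function names are hypothetical.

```python
import math


def indexed_storage_gb(daily_gb: float, retention_days: int, replicas: int = 1) -> float:
    """Data kept on disk: daily volume x retention x (primary + replica copies)."""
    return daily_gb * retention_days * (1 + replicas)


def disk_capacity_gb(daily_gb: float, retention_days: int,
                     replicas: int = 1, watermark_pct: int = 80) -> int:
    """Total disk needed so the stored data stays at or under the watermark percentage."""
    stored = indexed_storage_gb(daily_gb, retention_days, replicas)
    return math.ceil(stored * 100 / watermark_pct)


if __name__ == "__main__":
    print("stored:", indexed_storage_gb(50, 30), "GB")
    print("disk needed:", disk_capacity_gb(50, 30), "GB")
```

Under these assumptions, 50 GB/day kept for 30 days with one replica comes to 3,000 GB stored, or about 3,750 GB of disk to stay under 80%. That is more than the 3 × 500 GB of the small setup above, so at the top of that tier something has to give: retention, replica count, or disk size.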

How do Filebeat and Logstash differ? Do I need both?

Filebeat is a lightweight shipper whose only job is to read log files and forward them. Logstash is a heavyweight processor that can parse, transform, and enrich data. If the log format is simple, Filebeat can ship directly to Elasticsearch and use an Elasticsearch ingest pipeline in place of Logstash. If the formats are complex and need several kinds of parsing, Logstash is easier.

Can the ELK Stack be used alongside other monitoring tools?

Yes. I use ELK for logs and Zabbix or Prometheus + Grafana for metrics. The two systems complement each other: Zabbix is strong at alerting on infrastructure metrics, while ELK is strong at log analysis and search.

Summary

The ELK Stack remains one of the most powerful log management solutions in the open-source world. It uses more resources than the alternatives, but its full-text search, data visualization, and complex queries are hard to match. For organizations with medium to high log volume that want thorough observability, the ELK Stack is a worthwhile investment. Plan hardware sizing and index management carefully and it will run smoothly.