What Is a Database Analyst, and What Does One Do?
A Database Analyst is a specialist who manages an organization's databases: design, installation, maintenance, performance optimization, and data security. The role overlaps heavily with the Database Administrator (DBA), and in many organizations it is the same job. It is a critical position in any organization that runs databases.
The core responsibilities of a Database Analyst include: Database Design (designing schemas, tables, relationships, and indexes), Performance Tuning (analyzing and optimizing query performance), Backup and Recovery (planning and testing backup/restore procedures), Security Management (managing user permissions, encryption, and audit logs), Capacity Planning (planning the growth of storage and compute resources), Data Migration (moving data between systems), and Monitoring (keeping watch over database health and availability).
A Database Analyst differs from a Data Analyst: the Database Analyst focuses on database infrastructure and operations, while the Data Analyst focuses on analyzing data for business insights. Some organizations combine both roles into one.
Skills and Tools You Need to Know
Key skills for a Database Analyst
# === Database Analyst Skill Stack ===
# 1. Database Systems
# ===================================
# Relational Databases (know at least one)
# - PostgreSQL (open source, enterprise-grade)
# - MySQL/MariaDB (web applications)
# - Microsoft SQL Server (enterprise)
# - Oracle Database (enterprise)
# NoSQL Databases (nice to have)
# - MongoDB (document store)
# - Redis (cache/key-value)
# - Cassandra (wide-column)
# - Elasticsearch (search engine)
# Cloud Databases
# - Amazon RDS/Aurora
# - Google Cloud SQL/BigQuery
# - Azure SQL Database
# 2. Essential Tools
# ===================================
# Install PostgreSQL
sudo apt-get update
sudo apt-get install -y postgresql postgresql-contrib
# Start service
sudo systemctl start postgresql
sudo systemctl enable postgresql
# Connect
sudo -u postgres psql
# Install monitoring tools
sudo apt-get install -y pgbadger # Log analyzer
pip install pgcli # Enhanced PostgreSQL CLI
# Install MySQL
sudo apt-get install -y mysql-server
sudo mysql_secure_installation
# 3. Version Control for Database
# ===================================
# Install Flyway (database migrations)
wget -qO- https://download.red-gate.com/maven/release/com/redgate/flyway/flyway-commandline/10.0.0/flyway-commandline-10.0.0-linux-x64.tar.gz | tar xvz
sudo ln -s $(pwd)/flyway-10.0.0/flyway /usr/local/bin/flyway
# Create migration
mkdir -p sql/migrations
cat > sql/migrations/V001__create_users.sql << 'SQL'
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(100) NOT NULL UNIQUE,
email VARCHAR(255) NOT NULL UNIQUE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_created ON users(created_at);
SQL
# Run migrations
flyway -url=jdbc:postgresql://localhost:5432/mydb -user=admin migrate
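Flyway only picks up files that match its `V<version>__<description>.sql` naming convention; a misnamed file is silently ignored. Below is a small sketch (a hypothetical helper, not part of Flyway) that sanity-checks migration filenames and catches duplicate versions before you run `flyway migrate`:

```python
# check_migrations.py - validate Flyway-style migration filenames.
# Hypothetical helper; Flyway itself does its own discovery.
import re

MIGRATION_RE = re.compile(r"^V(\d+)__[A-Za-z0-9_]+\.sql$")

def check_migrations(filenames):
    """Return sorted version numbers, raising on bad names or duplicates."""
    versions = []
    for name in filenames:
        m = MIGRATION_RE.match(name)
        if not m:
            raise ValueError(f"bad migration filename: {name}")
        versions.append(int(m.group(1)))
    if len(set(versions)) != len(versions):
        raise ValueError("duplicate migration versions")
    return sorted(versions)

print(check_migrations(["V001__create_users.sql", "V002__add_orders.sql"]))
```

Running a check like this in CI before the migrate step turns a silent skip into a loud failure.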
# 4. Monitoring Stack
# ===================================
# PostgreSQL: pg_stat_statements, pgBadger, Prometheus + postgres_exporter
# MySQL: Performance Schema, slow query log, Prometheus + mysqld_exporter
# General: Grafana dashboards, alerting
# Install postgres_exporter
wget https://github.com/prometheus-community/postgres_exporter/releases/download/v0.15.0/postgres_exporter-0.15.0.linux-amd64.tar.gz
tar xvf postgres_exporter-0.15.0.linux-amd64.tar.gz
export DATA_SOURCE_NAME="postgresql://postgres:password@localhost:5432/postgres?sslmode=disable"
./postgres_exporter &
echo "DBA tools installed"
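Once postgres_exporter is running, it serves plain-text Prometheus metrics on port 9187. A minimal sketch of parsing that exposition format (the sample lines below are illustrative, not actual exporter output):

```python
# Parse Prometheus plain-text exposition lines into {name: value}.
def parse_metrics(text):
    """Map each metric name (including labels) to its float value."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

sample = """\
# HELP pg_up Whether the last scrape of the database was successful
pg_up 1
pg_stat_database_numbackends{datname="mydb"} 17
"""
print(parse_metrics(sample))
```

In practice you would let Prometheus scrape the endpoint and build Grafana dashboards on top; parsing by hand like this is mainly useful for quick ad-hoc checks.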
SQL Queries for Database Analysis
SQL queries that Database Analysts use frequently
-- === Essential DBA Queries (PostgreSQL) ===
-- 1. Database Size Overview
SELECT
datname AS database_name,
pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
WHERE datistemplate = false
ORDER BY pg_database_size(datname) DESC;
-- 2. Table Sizes with Indexes
SELECT
schemaname || '.' || relname AS table_name,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
pg_size_pretty(pg_relation_size(relid)) AS data_size,
pg_size_pretty(pg_indexes_size(relid)) AS index_size,
n_live_tup AS row_count,
n_dead_tup AS dead_rows,
ROUND(n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100, 2) AS dead_pct
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 20;
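The `dead_pct` column in query 2 boils down to simple arithmetic; here is the same calculation as a plain function, handy when post-processing these rows in a script (thresholds are illustrative assumptions, not PostgreSQL defaults):

```python
# Mirror the dead_pct calculation from the table-size query.
def dead_pct(live, dead):
    """Dead rows as a percentage of live rows; None when there are no live rows."""
    if live == 0:
        return None  # matches NULLIF(n_live_tup, 0) in the SQL
    return round(dead / live * 100, 2)

def needs_vacuum(live, dead, threshold_pct=20.0):
    """Flag a table as a vacuum candidate above an assumed bloat threshold."""
    pct = dead_pct(live, dead)
    return pct is not None and pct > threshold_pct

print(dead_pct(100_000, 25_000))  # 25.0
```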
-- 3. Slow Queries (requires pg_stat_statements)
SELECT
LEFT(query, 100) AS query_preview,
calls,
ROUND(total_exec_time::numeric / 1000, 2) AS total_sec,
ROUND(mean_exec_time::numeric, 2) AS avg_ms,
ROUND(max_exec_time::numeric, 2) AS max_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- 4. Index Usage Analysis
SELECT
schemaname || '.' || relname AS table_name,
indexrelname AS index_name,
idx_scan AS times_used,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
CASE WHEN idx_scan = 0 THEN 'UNUSED — consider dropping'
WHEN idx_scan < 50 THEN 'rarely used'
ELSE 'active'
END AS status
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;
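The CASE expression in query 4 is a simple classification; expressed as a function it is easy to reuse when scripting over `pg_stat_user_indexes` rows (the 50-scan cutoff is the query's own arbitrary threshold):

```python
# Classify index usage the same way as the CASE expression in query 4.
def index_status(idx_scan):
    """Return a usage label for an index given its scan count."""
    if idx_scan == 0:
        return "UNUSED - consider dropping"
    if idx_scan < 50:
        return "rarely used"
    return "active"

print(index_status(0), "/", index_status(500))
```

Note that `idx_scan` counters reset on `pg_stat_reset()` or a stats crash, so "unused" should be judged over a representative observation window before dropping anything.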
-- 5. Missing Indexes (sequential scans on large tables)
SELECT
schemaname || '.' || relname AS table_name,
seq_scan,
seq_tup_read,
idx_scan,
CASE WHEN seq_scan > 0
THEN ROUND(seq_tup_read::numeric / seq_scan, 0)
ELSE 0
END AS avg_rows_per_seq_scan,
n_live_tup AS row_count
FROM pg_stat_user_tables
WHERE seq_scan > 100
AND n_live_tup > 10000
AND (idx_scan IS NULL OR idx_scan < seq_scan * 0.1)
ORDER BY seq_tup_read DESC
LIMIT 10;
-- 6. Active Connections
SELECT
pid,
usename,
application_name,
client_addr,
state,
query_start,
NOW() - query_start AS duration,
LEFT(query, 80) AS query_preview
FROM pg_stat_activity
WHERE state != 'idle'
AND pid != pg_backend_pid()
ORDER BY query_start;
-- 7. Lock Monitoring
SELECT
blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
LEFT(blocked_activity.query, 60) AS blocked_query,
LEFT(blocking_activity.query, 60) AS blocking_query
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.relation = blocked_locks.relation
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
-- 8. Cache Hit Ratio (should be > 99%)
SELECT
sum(heap_blks_read) AS heap_read,
sum(heap_blks_hit) AS heap_hit,
ROUND(sum(heap_blks_hit)::numeric /
NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) * 100, 2)
AS cache_hit_ratio_pct
FROM pg_statio_user_tables;
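Query 8's ratio is just hits divided by total block reads; a small sketch of the same arithmetic on raw counters:

```python
# Same cache-hit-ratio arithmetic as query 8, on raw block counters.
def cache_hit_ratio(heap_hit, heap_read):
    """Percentage of heap block requests served from shared_buffers."""
    total = heap_hit + heap_read
    if total == 0:
        return None  # matches NULLIF(..., 0) in the SQL
    return round(heap_hit / total * 100, 2)

print(cache_hit_ratio(995_000, 5_000))  # 99.5
```

A sustained ratio below ~99% on an OLTP workload usually means the working set no longer fits in `shared_buffers`.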
Performance Monitoring and Optimization
Monitoring and optimizing database performance
#!/usr/bin/env python3
# db_monitor.py - Database Performance Monitor
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("db_monitor")


class DatabaseMonitor:
    def __init__(self, connection_string):
        self.conn_str = connection_string
        self.history = []

    def get_metrics(self):
        """Collect database metrics (pseudo-code with psycopg2)."""
        # import psycopg2
        # conn = psycopg2.connect(self.conn_str)
        # cur = conn.cursor()
        metrics = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "connections": self._get_connections(),
            "cache_hit_ratio": self._get_cache_ratio(),
            "transactions": self._get_transaction_stats(),
            "replication_lag": self._get_replication_lag(),
            "table_bloat": self._get_bloat_estimate(),
        }
        self.history.append(metrics)
        return metrics

    def _get_connections(self):
        # Query: SELECT count(*) FROM pg_stat_activity
        return {
            "active": 15,
            "idle": 30,
            "idle_in_transaction": 2,
            "max_connections": 100,
            "utilization_pct": 47,
        }

    def _get_cache_ratio(self):
        # Query: cache hit ratio from pg_statio_user_tables
        return {"heap_hit_ratio": 99.5, "index_hit_ratio": 99.8}

    def _get_transaction_stats(self):
        return {
            "commits_per_sec": 150,
            "rollbacks_per_sec": 2,
            "deadlocks": 0,
        }

    def _get_replication_lag(self):
        return {"lag_bytes": 1024, "lag_seconds": 0.5}

    def _get_bloat_estimate(self):
        return [
            {"table": "orders", "bloat_pct": 15, "dead_rows": 50000},
            {"table": "logs", "bloat_pct": 45, "dead_rows": 500000},
        ]

    def analyze_performance(self, metrics):
        issues = []
        conn = metrics["connections"]
        if conn["utilization_pct"] > 80:
            issues.append(f"High connection usage: {conn['utilization_pct']}%")
        if conn["idle_in_transaction"] > 5:
            issues.append(f"Idle-in-transaction: {conn['idle_in_transaction']}")
        cache = metrics["cache_hit_ratio"]
        if cache["heap_hit_ratio"] < 99:
            issues.append(f"Low cache hit ratio: {cache['heap_hit_ratio']}%")
        tx = metrics["transactions"]
        if tx["deadlocks"] > 0:
            issues.append(f"Deadlocks detected: {tx['deadlocks']}")
        if tx["rollbacks_per_sec"] > tx["commits_per_sec"] * 0.05:
            issues.append("High rollback rate")
        repl = metrics["replication_lag"]
        if repl["lag_seconds"] > 10:
            issues.append(f"Replication lag: {repl['lag_seconds']}s")
        for table in metrics["table_bloat"]:
            if table["bloat_pct"] > 30:
                issues.append(f"Table bloat: {table['table']} at {table['bloat_pct']}%")
        return {
            "healthy": len(issues) == 0,
            "issues": issues,
            "recommendations": self._get_recommendations(issues),
        }

    def _get_recommendations(self, issues):
        recs = []
        for issue in issues:
            if "connection" in issue.lower():
                recs.append("Consider using a connection pooler (PgBouncer)")
            elif "cache" in issue.lower():
                recs.append("Increase shared_buffers or add RAM")
            elif "bloat" in issue.lower():
                recs.append("Run VACUUM ANALYZE on bloated tables")
            elif "replication" in issue.lower():
                recs.append("Check network and replica performance")
            elif "deadlock" in issue.lower():
                recs.append("Review transaction isolation and query order")
        return recs


monitor = DatabaseMonitor("postgresql://localhost:5432/mydb")
metrics = monitor.get_metrics()
analysis = monitor.analyze_performance(metrics)
print(json.dumps(analysis, indent=2))
Automation Scripts for DBAs
Automation scripts for routine DBA tasks
#!/bin/bash
# dba_automation.sh — Daily DBA Tasks
# === 1. Automated Backup ===
BACKUP_DIR="/backup/postgresql/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"
# Full backup with pg_dump
pg_dump -h localhost -U postgres -Fc mydb > "$BACKUP_DIR/mydb.dump"
# Verify backup
pg_restore --list "$BACKUP_DIR/mydb.dump" > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo "Backup verified: $BACKUP_DIR/mydb.dump"
else
echo "ERROR: Backup verification failed!"
fi
# Clean old backups (keep 7 days)
find /backup/postgresql -maxdepth 1 -mtime +7 -type d -exec rm -rf {} \;
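The `find -mtime +7` cleanup above can also be expressed against the directory names themselves, since the script stamps each backup directory with `$(date +%Y%m%d)`. A sketch of that retention logic (pure selection, no deletion):

```python
# Select backup directories older than the retention window, going by
# their YYYYMMDD names (the $(date +%Y%m%d) convention used above).
from datetime import date, datetime, timedelta

def dirs_to_delete(dir_names, today, keep_days=7):
    """Return the date-stamped directory names older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    stale = []
    for name in dir_names:
        try:
            d = datetime.strptime(name, "%Y%m%d").date()
        except ValueError:
            continue  # ignore names that are not date-stamped
        if d < cutoff:
            stale.append(name)
    return stale

print(dirs_to_delete(["20250101", "20250110"], date(2025, 1, 12)))
```

Selecting by name rather than mtime is more robust when backup files get touched after creation (e.g. by verification runs).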
# === 2. Automated VACUUM ===
psql -U postgres -d mydb -c "
SELECT schemaname || '.' || relname AS table_name,
n_dead_tup,
last_vacuum,
last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;" > /tmp/vacuum_candidates.txt
# Vacuum tables with high dead rows
psql -U postgres -d mydb -c "
DO \$\$
DECLARE
r RECORD;
BEGIN
FOR r IN SELECT schemaname, relname
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
LOOP
EXECUTE format('VACUUM ANALYZE %I.%I', r.schemaname, r.relname);
RAISE NOTICE 'Vacuumed: %.%', r.schemaname, r.relname;
END LOOP;
END \$\$;"
# === 3. Index Maintenance ===
# Reindex bloated indexes
psql -U postgres -d mydb -c "
SELECT indexrelname, pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE pg_relation_size(indexrelid) > 100 * 1024 * 1024
ORDER BY pg_relation_size(indexrelid) DESC;" > /tmp/large_indexes.txt
# === 4. Health Check Report ===
echo "=== Database Health Report $(date) ===" > /tmp/db_health.txt
# Connection count
psql -U postgres -c "
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;" >> /tmp/db_health.txt
# Database sizes
psql -U postgres -c "
SELECT datname, pg_size_pretty(pg_database_size(datname))
FROM pg_database
WHERE datistemplate = false
ORDER BY pg_database_size(datname) DESC;" >> /tmp/db_health.txt
# Replication status
psql -U postgres -c "
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn
FROM pg_stat_replication;" >> /tmp/db_health.txt
# Cache hit ratio
psql -U postgres -d mydb -c "
SELECT ROUND(sum(heap_blks_hit)::numeric /
NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) * 100, 2)
AS cache_hit_ratio
FROM pg_statio_user_tables;" >> /tmp/db_health.txt
echo "Health report: /tmp/db_health.txt"
# === 5. Crontab Setup ===
# Daily backup at 2 AM
# 0 2 * * * /opt/scripts/dba_automation.sh backup >> /var/log/dba_backup.log 2>&1
#
# Vacuum every 6 hours
# 0 */6 * * * /opt/scripts/dba_automation.sh vacuum >> /var/log/dba_vacuum.log 2>&1
#
# Health check every hour
# 0 * * * * /opt/scripts/dba_automation.sh health >> /var/log/dba_health.log 2>&1
echo "DBA automation complete"
Career Path and Certifications
Career paths and certifications for Database Analysts
# === Database Analyst Career Path ===
# 1. Entry Level: Junior DBA / Database Administrator
# ===================================
# Skills: SQL basics, backup/restore, user management
# Experience: 0-2 years
# Salary (Thailand): 25,000-45,000 THB/month
# Salary (US): $55,000-$75,000/year
# 2. Mid Level: Database Analyst / Senior DBA
# ===================================
# Skills: Performance tuning, replication, HA setup
# Experience: 2-5 years
# Salary (Thailand): 45,000-80,000 THB/month
# Salary (US): $75,000-$110,000/year
# 3. Senior Level: Lead DBA / Database Architect
# ===================================
# Skills: Architecture design, capacity planning, migration
# Experience: 5-10 years
# Salary (Thailand): 80,000-150,000 THB/month
# Salary (US): $110,000-$160,000/year
# 4. Expert Level: Principal DBA / Data Platform Engineer
# ===================================
# Skills: Multi-cloud, automation, team leadership
# Experience: 10+ years
# Salary (Thailand): 150,000+ THB/month
# Salary (US): $160,000+/year
# === Certifications ===
# PostgreSQL:
# - EDB PostgreSQL Associate Certification
# - EDB PostgreSQL Professional Certification
# MySQL:
# - Oracle MySQL Database Administrator Certified Professional
# AWS:
# - AWS Certified Database - Specialty (DBS-C01) (retired April 2024;
#   database content now folds into AWS Certified Data Engineer - Associate)
# Azure:
# - Microsoft Certified: Azure Database Administrator Associate (DP-300)
# Google Cloud:
# - Google Cloud Professional Data Engineer
# Oracle:
# - Oracle Database Administration Certified Professional
# === Learning Path ===
# Month 1-3: SQL mastery + PostgreSQL/MySQL basics
# Month 4-6: Performance tuning + monitoring
# Month 7-9: Replication + HA + backup strategies
# Month 10-12: Cloud databases + automation + first certification
# === Skills to Learn in 2025 ===
# - Database-as-Code (Terraform, Pulumi for DB)
# - Kubernetes operators (CloudNativePG, Percona)
# - Observability (OpenTelemetry for databases)
# - AI/ML integration (vector databases, pgvector)
# - Multi-cloud database management
echo "Career path documented"
FAQ: Frequently Asked Questions
Q: Does a Database Analyst need to write code?
A: SQL is the primary language (queries, stored procedures, triggers), plus scripting for automation such as Bash, Python, or PowerShell. You do not need to write application code, but you should understand the ORMs and query patterns that applications use. The more code you can write, the better: a modern DBA is expected to automate everything through Infrastructure-as-Code, CI/CD for database migrations, and monitoring scripts.
Q: Should I learn PostgreSQL or MySQL?
A: PostgreSQL is the stronger career bet: it is more feature-rich (JSON support, full-text search, CTEs, window functions, extensions), has better standards compliance, and is increasingly used in the enterprise. MySQL remains popular for web applications (WordPress, PHP frameworks) and cloud services (Amazon Aurora MySQL). Learning PostgreSQL first and adding MySQL later gives you the widest coverage.
Q: How does a Cloud DBA differ from an On-Premise DBA?
A: A Cloud DBA works mostly with managed services (RDS, Cloud SQL) and does not handle the OS, patching, or hardware, but must understand cloud pricing, networking, IAM, and multi-region setups. An On-Premise DBA is responsible for the full stack, from hardware and OS through the database software up to the application layer. The current trend is hybrid, so you need both, but cloud skills are in higher demand.
Q: What do Database Analysts earn in Thailand?
A: Roughly 25,000-35,000 THB/month for juniors (0-2 years), 45,000-80,000 THB/month for mid-level (3-5 years), and 80,000-150,000+ THB/month for seniors (5+ years). Factors that push salaries higher include cloud certifications (AWS/Azure), Oracle DBA experience, performance-tuning skills, experience with high-traffic systems, and leadership skills. International companies in Bangkok pay 20-50% more.
