Technology

What Is a Database Analyst? The DBA Career Path, Skills, and Tools

2025-12-09 · อ. บอม · SiamCafe.net · 1,652 words

What Is a Database Analyst, and What Do They Do?

A Database Analyst (commonly abbreviated DBA, a role overlapping heavily with Database Administrator) is the specialist responsible for an organization's databases: design, installation, maintenance, performance optimization, and data security. It is a key position in any organization that runs databases.

The core duties of a Database Analyst include:

- Database Design: designing schemas, tables, relationships, and indexes
- Performance Tuning: analyzing and optimizing query performance
- Backup and Recovery: planning and testing backup/restore procedures
- Security Management: managing user permissions, encryption, and audit logs
- Capacity Planning: forecasting growth of storage and compute resources
- Data Migration: moving data between systems
- Monitoring: watching database health and availability

A Database Analyst differs from a Data Analyst: the former focuses on database infrastructure and operations, while the latter analyzes data for business insights. In some organizations, the two roles are combined.

Skills and Tools to Know

Key skills for a Database Analyst

# === Database Analyst Skill Stack ===

# 1. Database Systems
# ===================================
# Relational Databases (know at least one)
# - PostgreSQL (open source, enterprise-grade)
# - MySQL/MariaDB (web applications)
# - Microsoft SQL Server (enterprise)
# - Oracle Database (enterprise)

# NoSQL Databases (nice to have)
# - MongoDB (document store)
# - Redis (cache/key-value)
# - Cassandra (wide-column)
# - Elasticsearch (search engine)

# Cloud Databases
# - Amazon RDS/Aurora
# - Google Cloud SQL/BigQuery
# - Azure SQL Database

# 2. Essential Tools
# ===================================
# Install PostgreSQL
sudo apt-get update
sudo apt-get install -y postgresql postgresql-contrib

# Start service
sudo systemctl start postgresql
sudo systemctl enable postgresql

# Connect
sudo -u postgres psql

# Install monitoring tools
sudo apt-get install -y pgbadger  # Log analyzer
pip install pgcli                  # Enhanced PostgreSQL CLI

# Install MySQL
sudo apt-get install -y mysql-server
mysql_secure_installation

# 3. Version Control for Database
# ===================================
# Install Flyway (database migrations)
wget -qO- https://download.red-gate.com/maven/release/com/redgate/flyway/flyway-commandline/10.0.0/flyway-commandline-10.0.0-linux-x64.tar.gz | tar xvz
sudo ln -s $(pwd)/flyway-10.0.0/flyway /usr/local/bin/flyway

# Create migration
mkdir -p sql/migrations
cat > sql/migrations/V001__create_users.sql << 'SQL'
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(100) NOT NULL UNIQUE,
    email VARCHAR(255) NOT NULL UNIQUE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_created ON users(created_at);
SQL

# Run migrations
flyway -url=jdbc:postgresql://localhost:5432/mydb -user=admin migrate

# 4. Monitoring Stack
# ===================================
# PostgreSQL: pg_stat_statements, pgBadger, Prometheus + postgres_exporter
# MySQL: Performance Schema, slow query log, Prometheus + mysqld_exporter
# General: Grafana dashboards, alerting

# Install postgres_exporter
wget https://github.com/prometheus-community/postgres_exporter/releases/download/v0.15.0/postgres_exporter-0.15.0.linux-amd64.tar.gz
tar xvf postgres_exporter-0.15.0.linux-amd64.tar.gz

export DATA_SOURCE_NAME="postgresql://postgres:password@localhost:5432/postgres?sslmode=disable"
./postgres_exporter &

echo "DBA tools installed"
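Flyway applies versioned migrations in version order (V001, V002, ...), not in the order files happen to be listed. A minimal Python sketch of that ordering rule, using hypothetical filenames; note that real Flyway also accepts dotted versions (e.g. V1.2), which this simple form does not handle:

```python
import re

def migration_sort_key(filename):
    """Extract the numeric version from a Flyway-style name like V001__create_users.sql."""
    m = re.match(r"V(\d+)__", filename)
    if m is None:
        raise ValueError(f"not a versioned migration: {filename}")
    return int(m.group(1))

# Hypothetical migration files, deliberately out of order
files = ["V010__add_orders.sql", "V002__add_index.sql", "V001__create_users.sql"]
ordered = sorted(files, key=migration_sort_key)
print(ordered)
# ['V001__create_users.sql', 'V002__add_index.sql', 'V010__add_orders.sql']
```

This is why zero-padded prefixes (V001 rather than V1) are a common team convention: they also sort correctly as plain strings.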

SQL Queries for Database Analysis

SQL queries a Database Analyst uses frequently

-- === Essential DBA Queries (PostgreSQL) ===

-- 1. Database Size Overview
SELECT
    datname AS database_name,
    pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
WHERE datistemplate = false
ORDER BY pg_database_size(datname) DESC;

-- 2. Table Sizes with Indexes
SELECT
    schemaname || '.' || relname AS table_name,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    pg_size_pretty(pg_relation_size(relid)) AS data_size,
    pg_size_pretty(pg_indexes_size(relid)) AS index_size,
    n_live_tup AS row_count,
    n_dead_tup AS dead_rows,
    ROUND(n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100, 2) AS dead_pct
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 20;

-- 3. Slow Queries (requires pg_stat_statements)
SELECT
    LEFT(query, 100) AS query_preview,
    calls,
    ROUND(total_exec_time::numeric / 1000, 2) AS total_sec,
    ROUND(mean_exec_time::numeric, 2) AS avg_ms,
    ROUND(max_exec_time::numeric, 2) AS max_ms,
    rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- 4. Index Usage Analysis
SELECT
    schemaname || '.' || relname AS table_name,
    indexrelname AS index_name,
    idx_scan AS times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    CASE WHEN idx_scan = 0 THEN 'UNUSED — consider dropping'
         WHEN idx_scan < 50 THEN 'rarely used'
         ELSE 'active'
    END AS status
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;

-- 5. Missing Indexes (sequential scans on large tables)
SELECT
    schemaname || '.' || relname AS table_name,
    seq_scan,
    seq_tup_read,
    idx_scan,
    CASE WHEN seq_scan > 0
        THEN ROUND(seq_tup_read::numeric / seq_scan, 0)
        ELSE 0
    END AS avg_rows_per_seq_scan,
    n_live_tup AS row_count
FROM pg_stat_user_tables
WHERE seq_scan > 100
  AND n_live_tup > 10000
  AND (idx_scan IS NULL OR idx_scan < seq_scan * 0.1)
ORDER BY seq_tup_read DESC
LIMIT 10;

-- 6. Active Connections
SELECT
    pid,
    usename,
    application_name,
    client_addr,
    state,
    query_start,
    NOW() - query_start AS duration,
    LEFT(query, 80) AS query_preview
FROM pg_stat_activity
WHERE state != 'idle'
  AND pid != pg_backend_pid()
ORDER BY query_start;

-- 7. Lock Monitoring
SELECT
    blocked_locks.pid AS blocked_pid,
    blocked_activity.usename AS blocked_user,
    blocking_locks.pid AS blocking_pid,
    blocking_activity.usename AS blocking_user,
    LEFT(blocked_activity.query, 60) AS blocked_query,
    LEFT(blocking_activity.query, 60) AS blocking_query
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
    ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
    ON blocking_locks.locktype = blocked_locks.locktype
    AND blocking_locks.relation = blocked_locks.relation
    AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
    ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;

-- 8. Cache Hit Ratio (should be > 99%)
SELECT
    sum(heap_blks_read) AS heap_read,
    sum(heap_blks_hit) AS heap_hit,
    ROUND(sum(heap_blks_hit)::numeric /
        NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) * 100, 2)
        AS cache_hit_ratio_pct
FROM pg_statio_user_tables;
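The ratios in queries 2 and 8 are easy to sanity-check in plain Python. The block and row counts below are made-up sample numbers, not real statistics:

```python
def dead_pct(n_dead_tup, n_live_tup):
    """Dead-row percentage as in query 2; None when there are no live rows."""
    if n_live_tup == 0:          # mirrors NULLIF(n_live_tup, 0)
        return None
    return round(n_dead_tup / n_live_tup * 100, 2)

def cache_hit_ratio(heap_blks_hit, heap_blks_read):
    """Buffer-cache hit ratio as in query 8; None when there is no I/O yet."""
    total = heap_blks_hit + heap_blks_read
    if total == 0:
        return None
    return round(heap_blks_hit / total * 100, 2)

print(dead_pct(50_000, 1_000_000))       # 5.0  -> this table is fine
print(cache_hit_ratio(995_000, 5_000))   # 99.5 -> at the healthy threshold
```

The `NULLIF` guards in the SQL exist for exactly the reason the `None` branches do here: a freshly created table would otherwise divide by zero.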

Performance Monitoring and Optimization

Monitoring and optimizing database performance

#!/usr/bin/env python3
# db_monitor.py — Database Performance Monitor
import json
import logging
from datetime import datetime

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("db_monitor")

class DatabaseMonitor:
    def __init__(self, connection_string):
        self.conn_str = connection_string
        self.history = []
    
    def get_metrics(self):
        """Collect database metrics (pseudo-code with psycopg2)"""
        # import psycopg2
        # conn = psycopg2.connect(self.conn_str)
        # cur = conn.cursor()
        
        metrics = {
            "timestamp": datetime.utcnow().isoformat(),
            "connections": self._get_connections(),
            "cache_hit_ratio": self._get_cache_ratio(),
            "transactions": self._get_transaction_stats(),
            "replication_lag": self._get_replication_lag(),
            "table_bloat": self._get_bloat_estimate(),
        }
        
        self.history.append(metrics)
        return metrics
    
    def _get_connections(self):
        # Query: SELECT count(*) FROM pg_stat_activity
        return {
            "active": 15,
            "idle": 30,
            "idle_in_transaction": 2,
            "max_connections": 100,
            "utilization_pct": 47,
        }
    
    def _get_cache_ratio(self):
        # Query: cache hit ratio from pg_statio_user_tables
        return {"heap_hit_ratio": 99.5, "index_hit_ratio": 99.8}
    
    def _get_transaction_stats(self):
        return {
            "commits_per_sec": 150,
            "rollbacks_per_sec": 2,
            "deadlocks": 0,
        }
    
    def _get_replication_lag(self):
        return {"lag_bytes": 1024, "lag_seconds": 0.5}
    
    def _get_bloat_estimate(self):
        return [
            {"table": "orders", "bloat_pct": 15, "dead_rows": 50000},
            {"table": "logs", "bloat_pct": 45, "dead_rows": 500000},
        ]
    
    def analyze_performance(self, metrics):
        issues = []
        
        conn = metrics["connections"]
        if conn["utilization_pct"] > 80:
            issues.append(f"High connection usage: {conn['utilization_pct']}%")
        if conn["idle_in_transaction"] > 5:
            issues.append(f"Idle-in-transaction: {conn['idle_in_transaction']}")
        
        cache = metrics["cache_hit_ratio"]
        if cache["heap_hit_ratio"] < 99:
            issues.append(f"Low cache hit ratio: {cache['heap_hit_ratio']}%")
        
        tx = metrics["transactions"]
        if tx["deadlocks"] > 0:
            issues.append(f"Deadlocks detected: {tx['deadlocks']}")
        if tx["rollbacks_per_sec"] > tx["commits_per_sec"] * 0.05:
            issues.append("High rollback rate")
        
        repl = metrics["replication_lag"]
        if repl["lag_seconds"] > 10:
            issues.append(f"Replication lag: {repl['lag_seconds']}s")
        
        for table in metrics["table_bloat"]:
            if table["bloat_pct"] > 30:
                issues.append(f"Table bloat: {table['table']} at {table['bloat_pct']}%")
        
        return {
            "healthy": len(issues) == 0,
            "issues": issues,
            "recommendations": self._get_recommendations(issues),
        }
    
    def _get_recommendations(self, issues):
        recs = []
        for issue in issues:
            if "connection" in issue.lower():
                recs.append("Consider using connection pooler (PgBouncer)")
            elif "cache" in issue.lower():
                recs.append("Increase shared_buffers or add RAM")
            elif "bloat" in issue.lower():
                recs.append("Run VACUUM ANALYZE on bloated tables")
            elif "replication" in issue.lower():
                recs.append("Check network and replica performance")
            elif "deadlock" in issue.lower():
                recs.append("Review transaction isolation and query order")
        return recs

monitor = DatabaseMonitor("postgresql://localhost:5432/mydb")
metrics = monitor.get_metrics()
analysis = monitor.analyze_performance(metrics)
print(json.dumps(analysis, indent=2))

Automation Scripts for DBAs

Automation scripts for routine DBA tasks

#!/bin/bash
# dba_automation.sh — Daily DBA Tasks

# === 1. Automated Backup ===
BACKUP_DIR="/backup/postgresql/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"

# Full backup with pg_dump
pg_dump -h localhost -U postgres -Fc mydb > "$BACKUP_DIR/mydb.dump"

# Verify backup
if pg_restore --list "$BACKUP_DIR/mydb.dump" > /dev/null 2>&1; then
    echo "Backup verified: $BACKUP_DIR/mydb.dump"
else
    echo "ERROR: Backup verification failed!"
fi

# Clean old backups (keep 7 days)
find /backup/postgresql -maxdepth 1 -mtime +7 -type d -exec rm -rf {} \;

# === 2. Automated VACUUM ===
psql -U postgres -d mydb -c "
SELECT schemaname || '.' || relname AS table_name,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;" > /tmp/vacuum_candidates.txt

# Vacuum tables with high dead rows.
# Note: VACUUM cannot run inside a transaction block, so a DO block won't work.
# Instead, generate the statements and feed them back to psql.
psql -U postgres -d mydb -Atc "
SELECT format('VACUUM ANALYZE %I.%I;', schemaname, relname)
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000;" | psql -U postgres -d mydb

# === 3. Index Maintenance ===
# Reindex bloated indexes
psql -U postgres -d mydb -c "
SELECT indexrelname, pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE pg_relation_size(indexrelid) > 100 * 1024 * 1024
ORDER BY pg_relation_size(indexrelid) DESC;" > /tmp/large_indexes.txt

# === 4. Health Check Report ===
echo "=== Database Health Report $(date) ===" > /tmp/db_health.txt

# Connection count
psql -U postgres -c "
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;" >> /tmp/db_health.txt

# Database sizes
psql -U postgres -c "
SELECT datname, pg_size_pretty(pg_database_size(datname))
FROM pg_database
WHERE datistemplate = false
ORDER BY pg_database_size(datname) DESC;" >> /tmp/db_health.txt

# Replication status
psql -U postgres -c "
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn
FROM pg_stat_replication;" >> /tmp/db_health.txt

# Cache hit ratio
psql -U postgres -d mydb -c "
SELECT ROUND(sum(heap_blks_hit)::numeric /
    NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) * 100, 2)
    AS cache_hit_ratio
FROM pg_statio_user_tables;" >> /tmp/db_health.txt

echo "Health report: /tmp/db_health.txt"

# === 5. Crontab Setup ===
# (assumes the script is extended to dispatch on a backup/vacuum/health argument)
# Daily backup at 2 AM
# 0 2 * * * /opt/scripts/dba_automation.sh backup >> /var/log/dba_backup.log 2>&1
#
# Vacuum every 6 hours
# 0 */6 * * * /opt/scripts/dba_automation.sh vacuum >> /var/log/dba_vacuum.log 2>&1
#
# Health check every hour
# 0 * * * * /opt/scripts/dba_automation.sh health >> /var/log/dba_health.log 2>&1

echo "DBA automation complete"
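The `find ... -mtime +7` retention rule in step 1 can be sketched in Python. The directory names here are hypothetical `YYYYMMDD` folders like the ones the backup step creates; note that `find -mtime` actually uses modification time rather than the name, so this is an approximation of the policy, not of the command:

```python
from datetime import date, timedelta

def expired_backups(dir_names, today, keep_days=7):
    """Return the YYYYMMDD-named directories older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in dir_names:
        d = date(int(name[:4]), int(name[4:6]), int(name[6:8]))
        if d < cutoff:
            expired.append(name)
    return expired

dirs = ["20250101", "20250105", "20250108"]
print(expired_backups(dirs, today=date(2025, 1, 10)))
# ['20250101']
```

Encoding the retention policy this way makes it unit-testable before it ever deletes a real backup, which is worth doing for any script that runs `rm -rf`.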

Career Path and Certifications

Career path and certifications for a Database Analyst

# === Database Analyst Career Path ===

# 1. Entry Level: Junior DBA / Database Administrator
# ===================================
# Skills: SQL basics, backup/restore, user management
# Experience: 0-2 years
# Salary (Thailand): 25,000-45,000 THB/month
# Salary (US): $55,000-$75,000/year

# 2. Mid Level: Database Analyst / Senior DBA
# ===================================
# Skills: Performance tuning, replication, HA setup
# Experience: 2-5 years
# Salary (Thailand): 45,000-80,000 THB/month
# Salary (US): $75,000-$110,000/year

# 3. Senior Level: Lead DBA / Database Architect
# ===================================
# Skills: Architecture design, capacity planning, migration
# Experience: 5-10 years
# Salary (Thailand): 80,000-150,000 THB/month
# Salary (US): $110,000-$160,000/year

# 4. Expert Level: Principal DBA / Data Platform Engineer
# ===================================
# Skills: Multi-cloud, automation, team leadership
# Experience: 10+ years
# Salary (Thailand): 150,000+ THB/month
# Salary (US): $160,000+/year

# === Certifications ===
# PostgreSQL:
# - EDB PostgreSQL Associate Certification
# - EDB PostgreSQL Professional Certification

# MySQL:
# - Oracle MySQL Database Administrator Certified Professional

# AWS:
# - AWS Certified Database - Specialty (DBS-C01) (retired by AWS in 2024;
#   consider AWS Certified Data Engineer - Associate instead)

# Azure:
# - Microsoft Certified: Azure Database Administrator Associate (DP-300)

# Google Cloud:
# - Google Cloud Professional Data Engineer

# Oracle:
# - Oracle Database Administration Certified Professional

# === Learning Path ===
# Month 1-3: SQL mastery + PostgreSQL/MySQL basics
# Month 4-6: Performance tuning + monitoring
# Month 7-9: Replication + HA + backup strategies
# Month 10-12: Cloud databases + automation + first certification

# === Skills to Learn in 2025 ===
# - Database-as-Code (Terraform, Pulumi for DB)
# - Kubernetes operators (CloudNativePG, Percona)
# - Observability (OpenTelemetry for databases)
# - AI/ML integration (vector databases, pgvector)
# - Multi-cloud database management

echo "Career path documented"

FAQ: Frequently Asked Questions

Q: Does a Database Analyst need to write code?

A: SQL is the core (queries, stored procedures, triggers), plus scripting for automation such as Bash, Python, or PowerShell. Writing application code is not required, but you must understand the ORMs and query patterns your applications use. The more code you can write the better, because the modern DBA automates everything: Infrastructure-as-Code, CI/CD for database migrations, monitoring scripts.

Q: Should I learn PostgreSQL or MySQL?

A: PostgreSQL is the stronger career bet: it is more feature-rich (JSON support, full-text search, CTEs, window functions, extensions), has better SQL-standards compliance, and is increasingly adopted in the enterprise. MySQL remains popular for web applications (WordPress, PHP frameworks) and cloud services (Amazon Aurora MySQL). Learn PostgreSQL first, then add MySQL for the widest coverage.
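Window functions, one of the PostgreSQL features mentioned above, are standard SQL and can be tried without installing any server: Python's built-in sqlite3 module supports them too (requires a Python build with SQLite 3.25 or later). A hypothetical salary table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    ("a", "eng", 100), ("b", "eng", 90), ("c", "ops", 80),
])

# RANK() over a per-department window, same syntax as PostgreSQL
rows = conn.execute("""
    SELECT name, dept,
           RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk
    FROM emp
    ORDER BY dept, rnk
""").fetchall()
print(rows)  # [('a', 'eng', 1), ('b', 'eng', 2), ('c', 'ops', 1)]
```

The same query runs unchanged on PostgreSQL, which is part of why window functions are worth learning early: the skill transfers across engines.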

Q: How do Cloud DBA and On-Premise DBA roles differ?

A: A Cloud DBA works mostly with managed services (RDS, Cloud SQL), so there is no OS, patching, or hardware to maintain, but you must understand cloud pricing, networking, IAM, and multi-region setups. An On-Premise DBA owns everything from hardware and OS through the database software up to the application layer. The current trend is hybrid, so you need both, but cloud skills are in higher demand.

Q: What do Database Analysts earn in Thailand?

A: Roughly 25,000-35,000 THB/month for juniors (0-2 years), 45,000-80,000 for mid-level (3-5 years), and 80,000-150,000+ for seniors (5+ years). Factors that raise salaries include cloud certifications (AWS/Azure), Oracle DBA experience, performance-tuning skills, experience with high-traffic systems, and leadership skills. International companies in Bangkok pay 20-50% more.
