What Is a Database Analyst, and What Does One Do?
A Database Analyst is a specialist who manages an organization's databases: design, installation, maintenance, performance optimization, and data security. The role overlaps heavily with the Database Administrator (DBA), and in many organizations it is the same job. It is a critical position in any organization that runs databases.
The core responsibilities of a Database Analyst include: Database Design (designing schemas, tables, relationships, and indexes), Performance Tuning (analyzing and optimizing query performance), Backup and Recovery (planning and testing backup/restore procedures), Security Management (managing user permissions, encryption, and audit logs), Capacity Planning (planning the growth of storage and compute resources), Data Migration (moving data between systems), and Monitoring (keeping watch over database health and availability).
A Database Analyst differs from a Data Analyst: the Database Analyst focuses on database infrastructure and operations, while the Data Analyst focuses on analyzing data for business insights. Some organizations combine both roles into one.
Skills and Tools You Need to Know
Key skills for a Database Analyst
# === Database Analyst Skill Stack ===
# 1. Database Systems
# ===================================
# Relational Databases (know at least one)
# - PostgreSQL (open source, enterprise-grade)
# - MySQL/MariaDB (web applications)
# - Microsoft SQL Server (enterprise)
# - Oracle Database (enterprise)
# NoSQL Databases (nice to have)
# - MongoDB (document store)
# - Redis (cache/key-value)
# - Cassandra (wide-column)
# - Elasticsearch (search engine)
# Cloud Databases
# - Amazon RDS/Aurora
# - Google Cloud SQL/BigQuery
# - Azure SQL Database
# 2. Essential Tools
# ===================================
# Install PostgreSQL
sudo apt-get update
sudo apt-get install -y postgresql postgresql-contrib
# Start service
sudo systemctl start postgresql
sudo systemctl enable postgresql
# Connect
sudo -u postgres psql
# Install monitoring tools
sudo apt-get install -y pgbadger # Log analyzer
pip install pgcli # Enhanced PostgreSQL CLI
# Install MySQL
sudo apt-get install -y mysql-server
sudo mysql_secure_installation
# 3. Version Control for Database
# ===================================
# Install Flyway (database migrations)
wget -qO- https://download.red-gate.com/maven/release/com/redgate/flyway/flyway-commandline/10.0.0/flyway-commandline-10.0.0-linux-x64.tar.gz | tar xvz
sudo ln -s $(pwd)/flyway-10.0.0/flyway /usr/local/bin/flyway
# Create migration
mkdir -p sql/migrations
cat > sql/migrations/V001__create_users.sql << 'SQL'
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(100) NOT NULL UNIQUE,
email VARCHAR(255) NOT NULL UNIQUE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_created ON users(created_at);
SQL
# Run migrations
flyway -url=jdbc:postgresql://localhost:5432/mydb -user=admin migrate
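Flyway only picks up files that match its `V<version>__<description>.sql` naming convention; a misnamed file is silently ignored. Below is a small sketch (a hypothetical helper, not part of Flyway) that sanity-checks migration filenames and catches duplicate versions before you run `flyway migrate`:

```python
# check_migrations.py - validate Flyway-style migration filenames.
# Hypothetical helper; Flyway itself does its own discovery.
import re

MIGRATION_RE = re.compile(r"^V(\d+)__[A-Za-z0-9_]+\.sql$")

def check_migrations(filenames):
    """Return sorted version numbers, raising on bad names or duplicates."""
    versions = []
    for name in filenames:
        m = MIGRATION_RE.match(name)
        if not m:
            raise ValueError(f"bad migration filename: {name}")
        versions.append(int(m.group(1)))
    if len(set(versions)) != len(versions):
        raise ValueError("duplicate migration versions")
    return sorted(versions)

print(check_migrations(["V001__create_users.sql", "V002__add_orders.sql"]))
```

Running a check like this in CI before the migrate step turns a silent skip into a loud failure.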
# 4. Monitoring Stack
# ===================================
# PostgreSQL: pg_stat_statements, pgBadger, Prometheus + postgres_exporter
# MySQL: Performance Schema, slow query log, Prometheus + mysqld_exporter
# General: Grafana dashboards, alerting
# Install postgres_exporter
wget https://github.com/prometheus-community/postgres_exporter/releases/download/v0.15.0/postgres_exporter-0.15.0.linux-amd64.tar.gz
tar xvf postgres_exporter-0.15.0.linux-amd64.tar.gz
export DATA_SOURCE_NAME="postgresql://postgres:password@localhost:5432/postgres?sslmode=disable"
./postgres_exporter &
echo "DBA tools installed"
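Once postgres_exporter is running, it serves plain-text Prometheus metrics on port 9187. A minimal sketch of parsing that exposition format (the sample lines below are illustrative, not actual exporter output):

```python
# Parse Prometheus plain-text exposition lines into {name: value}.
def parse_metrics(text):
    """Map each metric name (including labels) to its float value."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

sample = """\
# HELP pg_up Whether the last scrape of the database was successful
pg_up 1
pg_stat_database_numbackends{datname="mydb"} 17
"""
print(parse_metrics(sample))
```

In practice you would let Prometheus scrape the endpoint and build Grafana dashboards on top; parsing by hand like this is mainly useful for quick ad-hoc checks.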
SQL Queries for Database Analysis
SQL queries that Database Analysts use frequently
-- === Essential DBA Queries (PostgreSQL) ===
-- 1. Database Size Overview
SELECT
datname AS database_name,
pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
WHERE datistemplate = false
ORDER BY pg_database_size(datname) DESC;
-- 2. Table Sizes with Indexes
SELECT
schemaname || '.' || relname AS table_name,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
pg_size_pretty(pg_relation_size(relid)) AS data_size,
pg_size_pretty(pg_indexes_size(relid)) AS index_size,
n_live_tup AS row_count,
n_dead_tup AS dead_rows,
ROUND(n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100, 2) AS dead_pct
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 20;
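The `dead_pct` column in query 2 boils down to simple arithmetic; here is the same calculation as a plain function, handy when post-processing these rows in a script (thresholds are illustrative assumptions, not PostgreSQL defaults):

```python
# Mirror the dead_pct calculation from the table-size query.
def dead_pct(live, dead):
    """Dead rows as a percentage of live rows; None when there are no live rows."""
    if live == 0:
        return None  # matches NULLIF(n_live_tup, 0) in the SQL
    return round(dead / live * 100, 2)

def needs_vacuum(live, dead, threshold_pct=20.0):
    """Flag a table as a vacuum candidate above an assumed bloat threshold."""
    pct = dead_pct(live, dead)
    return pct is not None and pct > threshold_pct

print(dead_pct(100_000, 25_000))  # 25.0
```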
-- 3. Slow Queries (requires pg_stat_statements)
SELECT
LEFT(query, 100) AS query_preview,
calls,
ROUND(total_exec_time::numeric / 1000, 2) AS total_sec,
ROUND(mean_exec_time::numeric, 2) AS avg_ms,
ROUND(max_exec_time::numeric, 2) AS max_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- 4. Index Usage Analysis
SELECT
schemaname || '.' || relname AS table_name,
indexrelname AS index_name,
idx_scan AS times_used,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
CASE WHEN idx_scan = 0 THEN 'UNUSED — consider dropping'
WHEN idx_scan < 50 THEN 'rarely used'
ELSE 'active'
END AS status
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;
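The CASE expression in query 4 is a simple classification; expressed as a function it is easy to reuse when scripting over `pg_stat_user_indexes` rows (the 50-scan cutoff is the query's own arbitrary threshold):

```python
# Classify index usage the same way as the CASE expression in query 4.
def index_status(idx_scan):
    """Return a usage label for an index given its scan count."""
    if idx_scan == 0:
        return "UNUSED - consider dropping"
    if idx_scan < 50:
        return "rarely used"
    return "active"

print(index_status(0), "/", index_status(500))
```

Note that `idx_scan` counters reset on `pg_stat_reset()` or a stats crash, so "unused" should be judged over a representative observation window before dropping anything.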
-- 5. Missing Indexes (sequential scans on large tables)
SELECT
schemaname || '.' || relname AS table_name,
seq_scan,
seq_tup_read,
idx_scan,
CASE WHEN seq_scan > 0
THEN ROUND(seq_tup_read::numeric / seq_scan, 0)
ELSE 0
END AS avg_rows_per_seq_scan,
n_live_tup AS row_count
FROM pg_stat_user_tables
WHERE seq_scan > 100
AND n_live_tup > 10000
AND (idx_scan IS NULL OR idx_scan < seq_scan * 0.1)
ORDER BY seq_tup_read DESC
LIMIT 10;
-- 6. Active Connections
SELECT
pid,
usename,
application_name,
client_addr,
state,
query_start,
NOW() - query_start AS duration,
LEFT(query, 80) AS query_preview
FROM pg_stat_activity
WHERE state != 'idle'
AND pid != pg_backend_pid()
ORDER BY query_start;
-- 7. Lock Monitoring
SELECT
blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
LEFT(blocked_activity.query, 60) AS blocked_query,
LEFT(blocking_activity.query, 60) AS blocking_query
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.relation = blocked_locks.relation
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
-- 8. Cache Hit Ratio (should be > 99%)
SELECT
sum(heap_blks_read) AS heap_read,
sum(heap_blks_hit) AS heap_hit,
ROUND(sum(heap_blks_hit)::numeric /
NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) * 100, 2)
AS cache_hit_ratio_pct
FROM pg_statio_user_tables;
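Query 8's ratio is just hits divided by total block reads; a small sketch of the same arithmetic on raw counters:

```python
# Same cache-hit-ratio arithmetic as query 8, on raw block counters.
def cache_hit_ratio(heap_hit, heap_read):
    """Percentage of heap block requests served from shared_buffers."""
    total = heap_hit + heap_read
    if total == 0:
        return None  # matches NULLIF(..., 0) in the SQL
    return round(heap_hit / total * 100, 2)

print(cache_hit_ratio(995_000, 5_000))  # 99.5
```

A sustained ratio below ~99% on an OLTP workload usually means the working set no longer fits in `shared_buffers`.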
Performance Monitoring and Optimization
Monitoring and optimizing database performance
#!/usr/bin/env python3
# db_monitor.py - Database Performance Monitor
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("db_monitor")


class DatabaseMonitor:
    def __init__(self, connection_string):
        self.conn_str = connection_string
        self.history = []

    def get_metrics(self):
        """Collect database metrics (pseudo-code with psycopg2)."""
        # import psycopg2
        # conn = psycopg2.connect(self.conn_str)
        # cur = conn.cursor()
        metrics = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "connections": self._get_connections(),
            "cache_hit_ratio": self._get_cache_ratio(),
            "transactions": self._get_transaction_stats(),
            "replication_lag": self._get_replication_lag(),
            "table_bloat": self._get_bloat_estimate(),
        }
        self.history.append(metrics)
        return metrics

    def _get_connections(self):
        # Query: SELECT count(*) FROM pg_stat_activity
        return {
            "active": 15,
            "idle": 30,
            "idle_in_transaction": 2,
            "max_connections": 100,
            "utilization_pct": 47,
        }

    def _get_cache_ratio(self):
        # Query: cache hit ratio from pg_statio_user_tables
        return {"heap_hit_ratio": 99.5, "index_hit_ratio": 99.8}

    def _get_transaction_stats(self):
        return {
            "commits_per_sec": 150,
            "rollbacks_per_sec": 2,
            "deadlocks": 0,
        }

    def _get_replication_lag(self):
        return {"lag_bytes": 1024, "lag_seconds": 0.5}

    def _get_bloat_estimate(self):
        return [
            {"table": "orders", "bloat_pct": 15, "dead_rows": 50000},
            {"table": "logs", "bloat_pct": 45, "dead_rows": 500000},
        ]

    def analyze_performance(self, metrics):
        issues = []
        conn = metrics["connections"]
        if conn["utilization_pct"] > 80:
            issues.append(f"High connection usage: {conn['utilization_pct']}%")
        if conn["idle_in_transaction"] > 5:
            issues.append(f"Idle-in-transaction: {conn['idle_in_transaction']}")
        cache = metrics["cache_hit_ratio"]
        if cache["heap_hit_ratio"] < 99:
            issues.append(f"Low cache hit ratio: {cache['heap_hit_ratio']}%")
        tx = metrics["transactions"]
        if tx["deadlocks"] > 0:
            issues.append(f"Deadlocks detected: {tx['deadlocks']}")
        if tx["rollbacks_per_sec"] > tx["commits_per_sec"] * 0.05:
            issues.append("High rollback rate")
        repl = metrics["replication_lag"]
        if repl["lag_seconds"] > 10:
            issues.append(f"Replication lag: {repl['lag_seconds']}s")
        for table in metrics["table_bloat"]:
            if table["bloat_pct"] > 30:
                issues.append(f"Table bloat: {table['table']} at {table['bloat_pct']}%")
        return {
            "healthy": len(issues) == 0,
            "issues": issues,
            "recommendations": self._get_recommendations(issues),
        }

    def _get_recommendations(self, issues):
        recs = []
        for issue in issues:
            if "connection" in issue.lower():
                recs.append("Consider using a connection pooler (PgBouncer)")
            elif "cache" in issue.lower():
                recs.append("Increase shared_buffers or add RAM")
            elif "bloat" in issue.lower():
                recs.append("Run VACUUM ANALYZE on bloated tables")
            elif "replication" in issue.lower():
                recs.append("Check network and replica performance")
            elif "deadlock" in issue.lower():
                recs.append("Review transaction isolation and query order")
        return recs


monitor = DatabaseMonitor("postgresql://localhost:5432/mydb")
metrics = monitor.get_metrics()
analysis = monitor.analyze_performance(metrics)
print(json.dumps(analysis, indent=2))
Automation Scripts for DBAs
Automation scripts for routine DBA tasks
#!/bin/bash
# dba_automation.sh — Daily DBA Tasks
# === 1. Automated Backup ===
BACKUP_DIR="/backup/postgresql/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"
# Full backup with pg_dump
pg_dump -h localhost -U postgres -Fc mydb > "$BACKUP_DIR/mydb.dump"
# Verify backup
pg_restore --list "$BACKUP_DIR/mydb.dump" > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo "Backup verified: $BACKUP_DIR/mydb.dump"
else
echo "ERROR: Backup verification failed!"
fi
# Clean old backups (keep 7 days)
find /backup/postgresql -maxdepth 1 -mtime +7 -type d -exec rm -rf {} \;
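The `find -mtime +7` cleanup above can also be expressed against the directory names themselves, since the script stamps each backup directory with `$(date +%Y%m%d)`. A sketch of that retention logic (pure selection, no deletion):

```python
# Select backup directories older than the retention window, going by
# their YYYYMMDD names (the $(date +%Y%m%d) convention used above).
from datetime import date, datetime, timedelta

def dirs_to_delete(dir_names, today, keep_days=7):
    """Return the date-stamped directory names older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    stale = []
    for name in dir_names:
        try:
            d = datetime.strptime(name, "%Y%m%d").date()
        except ValueError:
            continue  # ignore names that are not date-stamped
        if d < cutoff:
            stale.append(name)
    return stale

print(dirs_to_delete(["20250101", "20250110"], date(2025, 1, 12)))
```

Selecting by name rather than mtime is more robust when backup files get touched after creation (e.g. by verification runs).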
# === 2. Automated VACUUM ===
psql -U postgres -d mydb -c "
SELECT schemaname || '.' || relname AS table_name,
n_dead_tup,
last_vacuum,
last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;" > /tmp/vacuum_candidates.txt
# Vacuum tables with high dead rows
psql -U postgres -d mydb -c "
DO \$\$
DECLARE
r RECORD;
BEGIN
FOR r IN SELECT schemaname, relname
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
LOOP
EXECUTE format('VACUUM ANALYZE %I.%I', r.schemaname, r.relname);
RAISE NOTICE 'Vacuumed: %.%', r.schemaname, r.relname;
END LOOP;
END \$\$;"
# === 3. Index Maintenance ===
# Reindex bloated indexes
psql -U postgres -d mydb -c "
SELECT indexrelname, pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE pg_relation_size(indexrelid) > 100 * 1024 * 1024
ORDER BY pg_relation_size(indexrelid) DESC;" > /tmp/large_indexes.txt
# === 4. Health Check Report ===
echo "=== Database Health Report $(date) ===" > /tmp/db_health.txt
# Connection count
psql -U postgres -c "
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;" >> /tmp/db_health.txt
# Database sizes
psql -U postgres -c "
SELECT datname, pg_size_pretty(pg_database_size(datname))
FROM pg_database
WHERE datistemplate = false
ORDER BY pg_database_size(datname) DESC;" >> /tmp/db_health.txt
# Replication status
psql -U postgres -c "
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn
FROM pg_stat_replication;" >> /tmp/db_health.txt
# Cache hit ratio
psql -U postgres -d mydb -c "
SELECT ROUND(sum(heap_blks_hit)::numeric /
NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) * 100, 2)
AS cache_hit_ratio
FROM pg_statio_user_tables;" >> /tmp/db_health.txt
echo "Health report: /tmp/db_health.txt"
# === 5. Crontab Setup ===
# Daily backup at 2 AM
# 0 2 * * * /opt/scripts/dba_automation.sh backup >> /var/log/dba_backup.log 2>&1
#
# Vacuum every 6 hours
# 0 */6 * * * /opt/scripts/dba_automation.sh vacuum >> /var/log/dba_vacuum.log 2>&1
#
# Health check every hour
# 0 * * * * /opt/scripts/dba_automation.sh health >> /var/log/dba_health.log 2>&1
echo "DBA automation complete"
Career Path and Certifications
Career paths and certifications for Database Analysts
# === Database Analyst Career Path ===
# 1. Entry Level: Junior DBA / Database Administrator
# ===================================
# Skills: SQL basics, backup/restore, user management
# Experience: 0-2 years
# Salary (Thailand): 25,000-45,000 THB/month
# Salary (US): $55,000-$75,000/year
# 2. Mid Level: Database Analyst / Senior DBA
# ===================================
# Skills: Performance tuning, replication, HA setup
# Experience: 2-5 years
# Salary (Thailand): 45,000-80,000 THB/month
# Salary (US): $75,000-$110,000/year
# 3. Senior Level: Lead DBA / Database Architect
# ===================================
# Skills: Architecture design, capacity planning, migration
# Experience: 5-10 years
# Salary (Thailand): 80,000-150,000 THB/month
# Salary (US): $110,000-$160,000/year
# 4. Expert Level: Principal DBA / Data Platform Engineer
# ===================================
# Skills: Multi-cloud, automation, team leadership
# Experience: 10+ years
# Salary (Thailand): 150,000+ THB/month
# Salary (US): $160,000+/year
# === Certifications ===
# PostgreSQL:
# - EDB PostgreSQL Associate Certification
# - EDB PostgreSQL Professional Certification
# MySQL:
# - Oracle MySQL Database Administrator Certified Professional
# AWS:
# - AWS Certified Database - Specialty (DBS-C01) (retired April 2024;
#   database content now folds into AWS Certified Data Engineer - Associate)
# Azure:
# - Microsoft Certified: Azure Database Administrator Associate (DP-300)
# Google Cloud:
# - Google Cloud Professional Data Engineer
# Oracle:
# - Oracle Database Administration Certified Professional
# === Learning Path ===
# Month 1-3: SQL mastery + PostgreSQL/MySQL basics
# Month 4-6: Performance tuning + monitoring
# Month 7-9: Replication + HA + backup strategies
# Month 10-12: Cloud databases + automation + first certification
# === Skills to Learn in 2025 ===
# - Database-as-Code (Terraform, Pulumi for DB)
# - Kubernetes operators (CloudNativePG, Percona)
# - Observability (OpenTelemetry for databases)
# - AI/ML integration (vector databases, pgvector)
# - Multi-cloud database management
echo "Career path documented"
FAQ: Frequently Asked Questions
Q: Does a Database Analyst need to write code?
A: SQL is the primary language (queries, stored procedures, triggers), plus scripting for automation such as Bash, Python, or PowerShell. You do not need to write application code, but you should understand the ORMs and query patterns that applications use. The more code you can write, the better: a modern DBA is expected to automate everything through Infrastructure-as-Code, CI/CD for database migrations, and monitoring scripts.
Q: Should I learn PostgreSQL or MySQL?
A: PostgreSQL is the stronger career bet: it is more feature-rich (JSON support, full-text search, CTEs, window functions, extensions), has better standards compliance, and is increasingly used in the enterprise. MySQL remains popular for web applications (WordPress, PHP frameworks) and cloud services (Amazon Aurora MySQL). Learning PostgreSQL first and adding MySQL later gives you the widest coverage.
Q: How does a Cloud DBA differ from an On-Premise DBA?
A: A Cloud DBA works mostly with managed services (RDS, Cloud SQL) and does not handle the OS, patching, or hardware, but must understand cloud pricing, networking, IAM, and multi-region setups. An On-Premise DBA is responsible for the full stack, from hardware and OS through the database software up to the application layer. The current trend is hybrid, so you need both, but cloud skills are in higher demand.
Q: What do Database Analysts earn in Thailand?
A: Roughly 25,000-35,000 THB/month for juniors (0-2 years), 45,000-80,000 THB/month for mid-level (3-5 years), and 80,000-150,000+ THB/month for seniors (5+ years). Factors that push salaries higher include cloud certifications (AWS/Azure), Oracle DBA experience, performance-tuning skills, experience with high-traffic systems, and leadership skills. International companies in Bangkok pay 20-50% more.
