
Pulumi IaC Post-mortem Analysis: Analyzing Infrastructure as Code Problems


This article covers Pulumi for Infrastructure as Code in Python and TypeScript, post-mortem analysis (root cause, blameless culture), drift detection, state management on AWS, Azure, GCP, and Kubernetes, and policy as code.

| IaC Tool | Language | State Backend | Learning Curve | Testing |
|---|---|---|---|---|
| Pulumi | Python/TS/Go/C# | Pulumi Cloud/S3 | Medium | Unit Test |
| Terraform | HCL | S3/Terraform Cloud | Low to Medium | Terratest |
| CDK (AWS) | Python/TS/Java | CloudFormation | Medium | Unit Test |
| Crossplane | YAML | Kubernetes | High | kubectl |

Pulumi Infrastructure Code

Pulumi Python infrastructure: create a project and install the AWS provider.

```shell
pulumi new python
pip install pulumi-aws
```

Then define the resources in `__main__.py`:

```python
import pulumi
import pulumi_aws as aws

# VPC
vpc = aws.ec2.Vpc(
    "main-vpc",
    cidr_block="10.0.0.0/16",
    enable_dns_hostnames=True,
    tags={"Name": "production-vpc", "Environment": "prod"},
)

# Public subnet
public_subnet = aws.ec2.Subnet(
    "public-subnet",
    vpc_id=vpc.id,
    cidr_block="10.0.1.0/24",
    availability_zone="ap-southeast-1a",
    map_public_ip_on_launch=True,
)

# Security group: HTTP/HTTPS in, all traffic out
web_sg = aws.ec2.SecurityGroup(
    "web-sg",
    vpc_id=vpc.id,
    ingress=[
        {"protocol": "tcp", "from_port": 80, "to_port": 80, "cidr_blocks": ["0.0.0.0/0"]},
        {"protocol": "tcp", "from_port": 443, "to_port": 443, "cidr_blocks": ["0.0.0.0/0"]},
    ],
    egress=[
        # protocol "-1" means all protocols
        {"protocol": "-1", "from_port": 0, "to_port": 0, "cidr_blocks": ["0.0.0.0/0"]},
    ],
)

# RDS PostgreSQL
db = aws.rds.Instance(
    "main-db",
    engine="postgres",
    engine_version="15",
    instance_class="db.t3.medium",
    allocated_storage=100,
    db_name="production",
    # "admin" is a reserved master username for RDS PostgreSQL
    username="dbadmin",
    # Set beforehand with: pulumi config set --secret db_password <value>
    password=pulumi.Config().require_secret("db_password"),
    skip_final_snapshot=False,
    # Required when skip_final_snapshot is False, otherwise deletion fails
    final_snapshot_identifier="main-db-final-snapshot",
    backup_retention_period=7,
    multi_az=True,
)

pulumi.export("vpc_id", vpc.id)
pulumi.export("db_endpoint", db.endpoint)
```
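A cheap pre-deploy sanity check is to confirm that every subnet CIDR sits inside the VPC CIDR and that subnets do not overlap, before `pulumi up` ever calls AWS. The sketch below uses only the standard `ipaddress` module; the helper name `check_subnets` is hypothetical and not part of Pulumi.

```python
import ipaddress

def check_subnets(vpc_cidr: str, subnet_cidrs: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the layout looks valid."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = [ipaddress.ip_network(c) for c in subnet_cidrs]
    problems = []
    # Every subnet must fall inside the VPC range
    for net in subnets:
        if not net.subnet_of(vpc):
            problems.append(f"{net} is outside VPC {vpc}")
    # Subnets in the same VPC must not overlap each other
    for i, a in enumerate(subnets):
        for b in subnets[i + 1:]:
            if a.overlaps(b):
                problems.append(f"{a} overlaps {b}")
    return problems

print(check_subnets("10.0.0.0/16", ["10.0.1.0/24"]))  # → []
print(check_subnets("10.0.0.0/16", ["10.1.1.0/24", "10.0.1.0/24", "10.0.1.128/25"]))
```

In a real project the CIDR strings would be shared between this check and the Pulumi program, so the test validates the same values that get deployed.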

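CLI commands:

  • pulumi preview: show the planned changes before deploying
  • pulumi up: deploy the infrastructure
  • pulumi refresh: detect drift by reading the real cloud resources and updating the stack state
  • pulumi destroy: delete every resource in the stack
  • pulumi stack ls: list stacks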
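Because `pulumi preview` is the safety net before `pulumi up`, some teams gate CI on its output. The sketch below assumes that the JSON printed by `pulumi preview --json` contains a `changeSummary` object mapping operation names (such as `create`, `update`, `delete`, `replace`) to counts; verify that shape against your Pulumi CLI version before relying on it. The function name `is_safe_to_apply` and the set of blocked operations are illustrative choices.

```python
import json

# Operations treated as destructive (an assumption; tune to team policy)
DESTRUCTIVE_OPS = {"delete", "replace"}

def is_safe_to_apply(preview_json: str) -> tuple[bool, dict]:
    """Return (safe, summary) for `pulumi preview --json` output (assumed shape)."""
    data = json.loads(preview_json)
    summary = data.get("changeSummary", {})
    destructive = {op: n for op, n in summary.items() if op in DESTRUCTIVE_OPS and n > 0}
    return (not destructive, summary)

sample = '{"changeSummary": {"same": 4, "update": 1, "delete": 1}}'
safe, summary = is_safe_to_apply(sample)
print(safe, summary)
```

A `False` result would then require a human approval step before `pulumi up` runs.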

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PulumiResource:
    name: str
    type: str      # Pulumi type token, e.g. aws:ec2/vpc:Vpc
    status: str
    provider: str

resources: List[PulumiResource] = [
    PulumiResource("main-vpc", "aws:ec2/vpc:Vpc", "created", "aws"),
    PulumiResource("public-subnet", "aws:ec2/subnet:Subnet", "created", "aws"),
    PulumiResource("web-sg", "aws:ec2/securityGroup:SecurityGroup", "created", "aws"),
    PulumiResource("main-db", "aws:rds/instance:Instance", "created", "aws"),
    PulumiResource("web-cluster", "aws:ecs/cluster:Cluster", "updated", "aws"),
    PulumiResource("api-service", "aws:ecs/service:Service", "updated", "aws"),
]

print("=== Pulumi Stack Resources ===")
for r in resources:
    print(f"  [{r.status}] {r.name} ({r.type})")
```

Post-mortem Template


## Incident: Database Outage due to IaC Drift

**Date:** 2024-03-15


**Duration:** 45 minutes

**Severity:** P1 - Critical

**Author:** Platform Team

### Timeline

  • 14:00 — Alert: Database connection errors
  • 14:05 — On-call acknowledges, starts investigation
  • 14:10 — Found: Security Group rules changed manually
  • 14:15 — Root cause identified: Manual SG change blocked DB port
  • 14:20 — pulumi refresh to detect full drift
  • 14:25 — pulumi up to restore correct state
  • 14:30 — Verified: Database connections restored
  • 14:45 — All services healthy, incident resolved

### Root Cause

An engineer manually modified the Security Group via the AWS Console to add a temporary rule and accidentally deleted the port 5432 (PostgreSQL) rule.

### Impact

  • 45 minutes downtime for all services using PostgreSQL
  • ~500 failed API requests
  • ~200 affected users

### Action Items

1. Enable an AWS Config rule to detect SG changes

2. Add Pulumi policies (CrossGuard) to block unsafe SG changes at deploy time, and restrict manual Console changes

3. Schedule drift detection every 15 minutes

4. Add a database connectivity check to the health checks
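At its core, drift detection for this incident is a comparison between the ingress rules declared in code and the rules that actually exist in AWS. The sketch below models each rule as a `(protocol, from_port, to_port, cidr)` tuple and reports rules that disappeared or appeared outside of Pulumi. The rule data is hypothetical (including the `10.0.0.0/16` source for port 5432 and the temporary port 22 rule); in practice `pulumi refresh` performs this comparison against the real resource.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    protocol: str
    from_port: int
    to_port: int
    cidr: str

def diff_rules(desired: set[Rule], actual: set[Rule]) -> dict[str, set[Rule]]:
    """Compare declared rules with live rules and classify the drift."""
    return {
        "missing": desired - actual,     # declared in code, gone from AWS
        "unexpected": actual - desired,  # present in AWS, not in code
    }

# Hypothetical: what the Pulumi program declares...
desired = {
    Rule("tcp", 443, 443, "0.0.0.0/0"),
    Rule("tcp", 5432, 5432, "10.0.0.0/16"),
}
# ...and what the Console edit left behind (5432 deleted, temporary rule added)
actual = {
    Rule("tcp", 443, 443, "0.0.0.0/0"),
    Rule("tcp", 22, 22, "203.0.113.10/32"),
}

drift = diff_rules(desired, actual)
for kind, rules in drift.items():
    for r in sorted(rules):
        print(f"{kind}: {r.protocol} {r.from_port}-{r.to_port} from {r.cidr}")
```

Both directions matter: the "missing" rule caused the outage, while the "unexpected" rule is a security exposure that would otherwise linger.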


```python
from dataclasses import dataclass

@dataclass
class PostMortem:
    incident: str
    date: str
    duration: str
    severity: str
    root_cause: str
    action_items: int
    status: str

incidents = [
    PostMortem("DB Outage (IaC Drift)", "2024-03-15", "45 min", "P1",
               "Manual SG change deleted DB port rule", 4, "Resolved"),
    PostMortem("SSL Cert Expired", "2024-02-20", "15 min", "P2",
               "Certificate renewal not in IaC", 3, "Resolved"),
    PostMortem("Wrong Instance Type", "2024-01-10", "2 hours", "P2",
               "Typo in Pulumi config: t3.micro instead of t3.large", 2, "Resolved"),
    PostMortem("State Lock Conflict", "2024-01-05", "30 min", "P3",
               "Two engineers ran pulumi up simultaneously", 3, "Resolved"),
]

print("\n=== Post-mortem Registry ===")
for pm in incidents:
    print(f"  [{pm.severity}] {pm.incident} ({pm.date})")
    print(f"    Duration: {pm.duration} | Root Cause: {pm.root_cause}")
    print(f"    Actions: {pm.action_items} | Status: {pm.status}")
```
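A registry becomes more useful once it is aggregated, for example total downtime per severity. The sketch below parses the free-form duration strings used in the registry (`"45 min"`, `"2 hours"`); the parser only understands those two units, which is an assumption about how durations are recorded, not a Pulumi feature.

```python
from collections import defaultdict

def to_minutes(duration: str) -> int:
    """Convert strings like '45 min' or '2 hours' into minutes."""
    value, unit = duration.split()
    if unit.startswith("min"):
        return int(value)
    if unit.startswith("hour"):
        return int(value) * 60
    raise ValueError(f"unknown unit: {unit!r}")

# (severity, duration) pairs taken from the registry above
registry = [("P1", "45 min"), ("P2", "15 min"), ("P2", "2 hours"), ("P3", "30 min")]

downtime = defaultdict(int)
for severity, duration in registry:
    downtime[severity] += to_minutes(duration)

for severity in sorted(downtime):
    print(f"{severity}: {downtime[severity]} min")
print("Total:", sum(downtime.values()), "min")  # → Total: 210 min
```

Storing durations as integer minutes from the start would remove the need for parsing altogether.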

Drift Detection and Prevention

Automated Drift Detection

GitHub Actions workflow, run every 15 minutes:

```yaml
name: Drift Detection

on:
  schedule:
    - cron: '*/15 * * * *'

jobs:
  detect-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pulumi/actions@v5
        with:
          command: refresh
          stack-name: production
          # Fail the job if refresh finds any difference (i.e. drift)
          expect-no-changes: true
        env:
          PULUMI_ACCESS_TOKEN: ${{ secrets.PULUMI_ACCESS_TOKEN }}
          # Assumption: AWS credentials are stored as repository secrets
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
```
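A drift check only compares configuration. The fourth action item in the post-mortem, a database connectivity check, would have caught the blocked port 5432 directly. The sketch below is a plain TCP reachability probe built on `socket.create_connection`; the function name `can_connect` and the 3-second timeout are illustrative, and the demo uses a throwaway local listener instead of a real database.

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

# Demo: a temporary local listener stands in for the database
with socket.socket() as server:
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen()
    port = server.getsockname()[1]
    print("open port:", can_connect("127.0.0.1", port))
print("closed port:", can_connect("127.0.0.1", port))
```

Against the stack above you would probe the RDS instance's `address` output on port 5432 from inside the VPC, since a security group rule only blocks traffic from the sources it covers.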

Pulumi Policy (CrossGuard)

```python
# Run with: pulumi preview --policy-pack <path-to-this-pack>
from pulumi_policy import (
    EnforcementLevel,
    PolicyPack,
    ResourceValidationPolicy,
)

PUBLIC_ACLS = {"public-read", "public-read-write"}

def no_public_s3(args, report_violation):
    # Pulumi type tokens use the module/resource form: aws:s3/bucket:Bucket
    if args.resource_type == "aws:s3/bucket:Bucket":
        acl = args.props.get("acl")
        if acl in PUBLIC_ACLS:
            report_violation("S3 buckets must not be public")

PolicyPack(
    "security-policies",
    policies=[
        ResourceValidationPolicy(
            name="no-public-s3",
            description="Prevent public S3 buckets",
            validate=no_public_s3,
            enforcement_level=EnforcementLevel.MANDATORY,
        ),
    ],
)
```
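CrossGuard validation functions are plain Python callables, so they can be unit-tested without running a deployment. The sketch below mirrors the S3 validation logic and drives it with a stub built from `types.SimpleNamespace` in place of Pulumi's real `ResourceValidationArgs`; the stub only mimics the two attributes the function reads (`resource_type` and `props`), and `run_policy` is a hypothetical test helper.

```python
from types import SimpleNamespace

PUBLIC_ACLS = {"public-read", "public-read-write"}

def no_public_s3(args, report_violation):
    # Mirrors the policy above
    if args.resource_type == "aws:s3/bucket:Bucket":
        if args.props.get("acl") in PUBLIC_ACLS:
            report_violation("S3 buckets must not be public")

def run_policy(resource_type: str, props: dict) -> list[str]:
    """Invoke the validator with a stub args object and collect violations."""
    violations: list[str] = []
    args = SimpleNamespace(resource_type=resource_type, props=props)
    no_public_s3(args, lambda message, urn=None: violations.append(message))
    return violations

print(run_policy("aws:s3/bucket:Bucket", {"acl": "public-read"}))
print(run_policy("aws:s3/bucket:Bucket", {"acl": "private"}))
```

Keeping the validation logic free of Pulumi imports is what makes this kind of fast, offline test possible.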

```python
prevention = {
    "Drift Detection": "Run pulumi refresh every 15 minutes; alert if drift is found",
    "Policy as Code": "CrossGuard blocks misconfigurations before deploy",
    "State Locking": "Lock the state to prevent concurrent updates",
    "Code Review": "Review every infrastructure change in a PR",
    "Testing": "Unit tests + integration tests before deploy",
    "Audit Log": "Record every change: who did what, and when",
    "Rollback Plan": "Have a rollback plan for every deploy",
    "No Manual Changes": "No edits via the Console; change infrastructure through code only",
}

print("Prevention Strategies:")
for strategy, desc in prevention.items():
    print(f"  [{strategy}]: {desc}")
```
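State locking is what turns "two engineers ran `pulumi up` simultaneously" into a clear error instead of a corrupted state. Pulumi Cloud handles locking for you; the sketch below only illustrates the idea with an exclusive lock file created via `os.O_CREAT | os.O_EXCL`, which fails atomically if the file already exists. The `StateLock` class and the lock-file name are hypothetical, not Pulumi's implementation.

```python
import os
import tempfile

class StateLock:
    """Minimal exclusive lock: only one holder can create the lock file."""

    def __init__(self, path: str):
        self.path = path
        self.fd = None

    def acquire(self, owner: str) -> bool:
        try:
            # O_EXCL makes creation fail if someone else already holds the lock
            self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        os.write(self.fd, owner.encode())  # record who holds the lock
        return True

    def release(self) -> None:
        if self.fd is not None:
            os.close(self.fd)
            os.remove(self.path)
            self.fd = None

lock_path = os.path.join(tempfile.mkdtemp(), "production-stack.lock")
alice, bob = StateLock(lock_path), StateLock(lock_path)
print("alice:", alice.acquire("alice"))  # first update takes the lock
print("bob:", bob.acquire("bob"))        # concurrent update is rejected
alice.release()
print("bob retry:", bob.acquire("bob"))  # succeeds once the lock is free
bob.release()
```

A production lock also needs a way to recover from a holder that crashed without releasing; Pulumi exposes this as `pulumi cancel` for Pulumi Cloud stacks.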

Tips

  • Preview: run pulumi preview before every pulumi up
  • Drift: check for drift automatically every 15 minutes
  • Blameless: keep post-mortems blameless; focus on the system, not on blaming people
  • Policy: use CrossGuard to prevent misconfiguration
  • Test: write unit tests for infrastructure code
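Unit tests can cover plain configuration too. The "Wrong Instance Type" incident (`t3.micro` deployed instead of `t3.large`) is the kind of typo a small allow-list check catches before `pulumi up`. The environment names and allowed instance types below are hypothetical; in a real project the values would come from the stack configuration, for example via `pulumi.Config`.

```python
# Hypothetical per-environment allow-lists
ALLOWED_INSTANCE_TYPES = {
    "dev": {"t3.micro", "t3.small"},
    "prod": {"t3.large", "t3.xlarge"},
}

def validate_instance_type(env: str, instance_type: str) -> None:
    """Raise ValueError if the instance type is not allowed for the environment."""
    allowed = ALLOWED_INSTANCE_TYPES.get(env)
    if allowed is None:
        raise ValueError(f"unknown environment: {env!r}")
    if instance_type not in allowed:
        raise ValueError(
            f"{instance_type!r} is not allowed in {env}; expected one of {sorted(allowed)}"
        )

validate_instance_type("prod", "t3.large")  # passes silently
try:
    validate_instance_type("prod", "t3.micro")  # the typo from the incident
except ValueError as err:
    print("Blocked:", err)
```

Running this check in CI before `pulumi preview` turns a two-hour incident into a failed pull request.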

What is Pulumi?

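Pulumi is an IaC platform that lets you define infrastructure in real programming languages such as Python, TypeScript, and Go, targeting AWS, Azure, GCP, and Kubernetes. It provides state management, change previews, policy as code, and unit testing.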

XM Legend · Forex trader & instructor for 13 years

Founder of SiamCafe since 1997 · Forex trader for more than 13 years, recognized as an XM Legend · Shares knowledge of Forex, IT, AI, and trading drawn from real experience in real markets