Semgrep SAST Internal Developer Platform

What is Semgrep SAST in an Internal Developer Platform?

Semgrep is an open-source Static Application Security Testing (SAST) tool that uses pattern matching (rules are written as patterns that look like the code they match) to find vulnerabilities, bugs, and code smells in source code. It supports more than 30 programming languages. An Internal Developer Platform (IDP) is a platform built inside an organization that gives developers self-service tools to build, deploy, and operate applications. Integrating Semgrep into the IDP makes security scanning an automatic part of the developer workflow, which reduces friction and increases security coverage across the whole organization.

Semgrep Architecture in the IDP

# semgrep_idp.py — Semgrep in Internal Developer Platform
import json

class SemgrepIDP:
    COMPONENTS = {
        "semgrep_cli": {
            "name": "Semgrep CLI",
            "role": "Local scanning for developers (pre-commit, IDE)",
            "install": "pip install semgrep / brew install semgrep",
        },
        "semgrep_ci": {
            "name": "Semgrep CI",
            "role": "Automated scanning ใน CI/CD pipeline",
            "install": "Docker image / GitHub Action / GitLab CI",
        },
        "semgrep_cloud": {
            "name": "Semgrep Cloud Platform",
            "role": "Dashboard, policy management, findings triage",
            "install": "semgrep.dev (SaaS)",
        },
        "custom_rules": {
            "name": "Custom Rules Repository",
            "role": "Organization-specific security rules",
            "install": "Git repo with .yaml rule files",
        },
        "idp_integration": {
            "name": "IDP Integration Layer",
            "role": "Connects Semgrep to the service catalog, scaffolding, and CI templates",
            "install": "Backstage plugin / custom API",
        },
    }

    ARCHITECTURE = """
    [Developer] → [IDE Plugin] → Semgrep scan locally
         ↓
    [Git Push] → [CI/CD Pipeline]
         ↓
    [Semgrep CI] → scan → [Semgrep Cloud]
         ↓                      ↓
    [PR Comment]          [Dashboard]
    (findings)            (org-wide view)
         ↓
    [IDP Portal (Backstage)]
    - Security score per service
    - Findings dashboard
    - Auto-remediation suggestions
    """

    def show_components(self):
        print("=== Semgrep IDP Components ===\n")
        for key, comp in self.COMPONENTS.items():
            print(f"[{comp['name']}]")
            print(f"  Role: {comp['role']}")
            print()

    def show_architecture(self):
        print("=== Architecture ===")
        print(self.ARCHITECTURE)

arch = SemgrepIDP()
arch.show_components()
arch.show_architecture()
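The IDP integration layer in the diagram needs findings in a machine-readable form. The following is a minimal sketch, assuming the Semgrep CLI is installed and on PATH: it runs `semgrep scan --json` and counts results by severity so a portal could show them per service. The helper names `run_semgrep` and `summarize` are hypothetical; the fields it reads (`results`, `extra.severity`) follow Semgrep's JSON output format.

```python
import json
import subprocess
from collections import Counter


def run_semgrep(target: str, configs: list[str]) -> dict:
    """Run the Semgrep CLI on `target` and return the parsed JSON report (hypothetical helper)."""
    cmd = ["semgrep", "scan", "--json"]
    for config in configs:
        cmd += ["--config", config]
    cmd.append(target)
    # check=False: Semgrep can exit non-zero (e.g. with --error); the JSON report is still on stdout.
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return json.loads(proc.stdout)


def summarize(report: dict) -> Counter:
    """Count findings per severity (ERROR / WARNING / INFO) for display in the IDP portal."""
    return Counter(r["extra"]["severity"] for r in report.get("results", []))


if __name__ == "__main__":
    report = run_semgrep(".", ["p/owasp-top-ten"])
    for severity, count in summarize(report).most_common():
        print(f"{severity}: {count}")
```

Keeping the parsing in a pure function (`summarize`) means the same code can process reports produced by CI instead of running Semgrep locally.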

Semgrep Rules & Configuration

# rules.py — Semgrep rules and configuration
import json

class SemgrepRules:
    RULE_EXAMPLE = """
# .semgrep/rules/sql-injection.yaml
rules:
  - id: python-sql-injection
    # pattern-either means "any of these"; 'patterns' would require all three to match at once
    pattern-either:
      - pattern: |
          cursor.execute($QUERY % ...)
      - pattern: |
          cursor.execute($QUERY.format(...))
      - pattern: |
          cursor.execute(f"...{$VAR}...")
    message: |
      SQL Injection detected. Use parameterized queries instead.
      Bad:  cursor.execute(f"SELECT * FROM users WHERE id={user_id}")
      Good: cursor.execute("SELECT * FROM users WHERE id=%s", (user_id,))
    languages: [python]
    severity: ERROR
    metadata:
      cwe: ["CWE-89"]
      owasp: ["A03:2021"]
      confidence: HIGH

  - id: hardcoded-secret
    patterns:
      - pattern: |
          $KEY = "..."
      - metavariable-regex:
          metavariable: $KEY
          regex: (password|secret|api_key|token|private_key)
    message: "Hardcoded secret detected in variable '$KEY'"
    languages: [python, javascript, typescript, java]
    severity: WARNING
    metadata:
      cwe: ["CWE-798"]
"""

    SEMGREP_CONFIG = """
# Registry rulesets are passed via --config on the CLI:
# semgrep scan --config p/python --config p/javascript \\
#   --config p/owasp-top-ten --config p/security-audit \\
#   --config .semgrep/rules/   # custom org rules
#
# .semgrepignore (gitignore syntax): paths to skip
tests/
vendor/
node_modules/
*.min.js
migrations/
"""

    RULE_CATEGORIES = {
        "security": {"name": "Security", "examples": ["SQL Injection", "XSS", "SSRF", "Path Traversal"], "severity": "ERROR"},
        "best_practices": {"name": "Best Practices", "examples": ["Error handling", "Input validation", "Logging"], "severity": "WARNING"},
        "performance": {"name": "Performance", "examples": ["N+1 queries", "Unnecessary loops", "Memory leaks"], "severity": "INFO"},
        "compliance": {"name": "Compliance", "examples": ["PII handling", "GDPR data retention", "Encryption"], "severity": "ERROR"},
    }

    def show_rule(self):
        print("=== Custom Rule Example ===")
        print(self.RULE_EXAMPLE)

    def show_config(self):
        print("\n=== Project Config ===")
        print(self.SEMGREP_CONFIG)

    def show_categories(self):
        print(f"\n=== Rule Categories ===")
        for key, cat in self.RULE_CATEGORIES.items():
            print(f"  [{cat['name']}] {cat['severity']}: {', '.join(cat['examples'][:3])}")

rules = SemgrepRules()
rules.show_rule()
rules.show_config()
rules.show_categories()
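A custom rules repository benefits from a cheap lint step before rules reach every service. The sketch below is a hypothetical checker for search-mode rules only (taint-mode rules use `pattern-sources` / `pattern-sinks` and would be flagged); it assumes the YAML has already been parsed into a dict, for example with PyYAML's `yaml.safe_load`. It checks the keys the rules above use and flags the common mistake of listing several bare `pattern` entries under `patterns` (AND) when `pattern-either` (OR) was meant. The accepted severities are only the three levels used in this article.

```python
# Hypothetical linter for a custom Semgrep rules repository (search-mode rules only).
VALID_SEVERITIES = {"ERROR", "WARNING", "INFO"}  # the levels used in this article
PATTERN_KEYS = {"pattern", "patterns", "pattern-either", "pattern-regex"}
REQUIRED_KEYS = {"id", "message", "languages", "severity"}


def validate_rule(rule: dict) -> list[str]:
    """Return the problems found in one rule; an empty list means none were found."""
    problems = []
    missing = REQUIRED_KEYS - set(rule)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if not PATTERN_KEYS & set(rule):
        problems.append("no top-level pattern key")
    if rule.get("severity") not in VALID_SEVERITIES:
        problems.append(f"unexpected severity: {rule.get('severity')!r}")
    # Heuristic: several bare 'pattern' entries under 'patterns' are ANDed together,
    # which is usually a mistake for an OR-style rule ('pattern-either').
    bare = [p for p in rule.get("patterns", []) if isinstance(p, dict) and "pattern" in p]
    if len(bare) > 1:
        problems.append("multiple 'pattern' entries under 'patterns' (AND); did you mean 'pattern-either'?")
    return problems


def validate_rules_file(doc: dict) -> dict[str, list[str]]:
    """Validate every rule in a parsed rules file ({"rules": [...]}), keyed by rule id."""
    return {rule.get("id", f"<rule #{i}>"): validate_rule(rule)
            for i, rule in enumerate(doc.get("rules", []))}
```

Note that a `patterns` block with one `pattern` plus a `metavariable-regex` (as in `hardcoded-secret`) passes, because only bare `pattern` entries are counted.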

CI/CD Pipeline Integration

# cicd.py — Semgrep in CI/CD
import json

class SemgrepCICD:
    GITHUB_ACTION = """
# .github/workflows/semgrep.yml
name: Semgrep Security Scan
on:
  pull_request: {}
  push:
    branches: [main, develop]

jobs:
  semgrep:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write   # needed by the upload-sarif step
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4
      
      - name: Run Semgrep
        env:
          SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}
        run: semgrep ci
        
      - name: Run with custom rules
        run: |
          semgrep scan \\
            --config p/owasp-top-ten \\
            --config .semgrep/ \\
            --sarif --output results.sarif
      
      - name: Upload SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif
"""

    GITLAB_CI = """
# .gitlab-ci.yml
semgrep:
  image: semgrep/semgrep
  stage: test
  script:
    - semgrep ci
  variables:
    SEMGREP_APP_TOKEN: $SEMGREP_APP_TOKEN
  rules:
    - if: $CI_MERGE_REQUEST_IID
    - if: $CI_COMMIT_BRANCH == "main"
"""

    IDP_TEMPLATE = """
# IDP CI Template (reusable across all services)
# backstage/templates/semgrep-scan.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: semgrep-enabled-service
  title: Service with Semgrep Security
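spec:
  type: service
  parameters:
    - title: Service settings
      required: [repoUrl]
      properties:
        semgrep_rules:
          title: Semgrep rulesets
          type: string
          default: p/owasp-top-ten   # example default
        repoUrl:
          title: Repository location
          type: string
          ui:field: RepoUrlPicker
  steps:
    - id: fetch
      action: fetch:template
      input:
        url: ./skeleton
        values:
          semgrep_rules: ${{ parameters.semgrep_rules }}
    - id: publish
      action: publish:github
      input:
        repoUrl: ${{ parameters.repoUrl }}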
"""

    def show_github(self):
        print("=== GitHub Actions ===")
        print(self.GITHUB_ACTION)

    def show_gitlab(self):
        print("\n=== GitLab CI ===")
        print(self.GITLAB_CI)

    def show_template(self):
        print("\n=== IDP Template ===")
        print(self.IDP_TEMPLATE)

cicd = SemgrepCICD()
cicd.show_github()
cicd.show_gitlab()
cicd.show_template()
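The GitHub Actions job above writes `results.sarif`. If a team wants its own severity gate on that file, a minimal sketch could look like the following. It assumes the SARIF 2.1.0 layout (`runs[].results[]` with `ruleId` and an optional `level`, rule defaults under `tool.driver.rules[].defaultConfiguration.level`) and uses a simplified version of SARIF's level resolution (lookup by `ruleId` only). `BLOCKING_LEVELS` and the `gate` function are hypothetical; how Semgrep maps its own severities to SARIF levels should be checked against your Semgrep version.

```python
import json
import sys
from collections import Counter

# Assumption: SARIF levels that should fail the build; adjust to team policy.
BLOCKING_LEVELS = {"error"}


def count_levels(sarif: dict) -> Counter:
    """Count results per SARIF level across all runs."""
    counts = Counter()
    for run in sarif.get("runs", []):
        rules = run.get("tool", {}).get("driver", {}).get("rules", [])
        default_levels = {
            r.get("id"): r.get("defaultConfiguration", {}).get("level", "warning") for r in rules
        }
        for result in run.get("results", []):
            # A result without 'level' inherits its rule's default level, else 'warning'.
            level = result.get("level") or default_levels.get(result.get("ruleId"), "warning")
            counts[level] += 1
    return counts


def gate(path: str = "results.sarif") -> int:
    """Print level counts and return a process exit code (1 if any blocking result)."""
    with open(path, encoding="utf-8") as fh:
        counts = count_levels(json.load(fh))
    print(dict(counts))
    return 1 if any(counts[level] for level in BLOCKING_LEVELS) else 0


if __name__ == "__main__":
    sys.exit(gate(*sys.argv[1:]))
```

Run it as an extra step after the scan (for example `python sarif_gate.py results.sarif`, a hypothetical file name); the non-zero exit code fails the job.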

IDP Dashboard & Metrics

# dashboard.py: security dashboard for the IDP
# All numbers below are randomly generated placeholders, not real scan data.
import random

class SecurityDashboard:
    def org_overview(self):
        print("=== Organization Security Overview ===\n")
        critical_high = random.randint(5, 30)
        medium = random.randint(10, 80)
        low_info = random.randint(20, 100)
        metrics = {
            "Total repositories scanned": random.randint(50, 200),
            "Rules enabled": random.randint(200, 500),
            # Open findings is the sum of the severity buckets, so the numbers agree.
            "Open findings": critical_high + medium + low_info,
            "Critical/High": critical_high,
            "Medium": medium,
            "Low/Info": low_info,
            "MTTR (avg)": f"{random.randint(12, 72)} hours",
            "Coverage": f"{random.randint(85, 99)}%",
        }
        for m, v in metrics.items():
            print(f"  {m}: {v}")

    def service_scores(self):
        print(f"\n=== Service Security Scores ===")
        services = [
            {"name": "user-service", "score": random.randint(80, 100), "findings": random.randint(0, 10)},
            {"name": "payment-api", "score": random.randint(70, 95), "findings": random.randint(2, 15)},
            {"name": "auth-service", "score": random.randint(85, 100), "findings": random.randint(0, 5)},
            {"name": "notification-svc", "score": random.randint(60, 90), "findings": random.randint(5, 20)},
            {"name": "admin-portal", "score": random.randint(50, 85), "findings": random.randint(10, 30)},
        ]
        for svc in sorted(services, key=lambda x: x["score"], reverse=True):
            bar = "█" * (svc["score"] // 5)
            print(f"  {svc['name']:<20} Score: {svc['score']:>3}/100 | Findings: {svc['findings']:>3} {bar}")

    def top_findings(self):
        print(f"\n=== Top Findings ===")
        findings = [
            {"rule": "hardcoded-secret", "count": random.randint(5, 20), "severity": "HIGH"},
            {"rule": "sql-injection", "count": random.randint(2, 10), "severity": "CRITICAL"},
            {"rule": "xss-reflected", "count": random.randint(3, 15), "severity": "HIGH"},
            {"rule": "insecure-deserialization", "count": random.randint(1, 5), "severity": "CRITICAL"},
            {"rule": "missing-auth-check", "count": random.randint(5, 25), "severity": "MEDIUM"},
        ]
        for f in sorted(findings, key=lambda x: x["count"], reverse=True):
            print(f"  [{f['severity']:>8}] {f['rule']:<30} × {f['count']}")

dash = SecurityDashboard()
dash.org_overview()
dash.service_scores()
dash.top_findings()
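The dashboard reports a per-service "security score" and an MTTR but does not say how they are computed. One possible definition, entirely an assumption for illustration: start at 100 and subtract a weighted penalty per open finding (clamped at 0), and compute MTTR as the mean time between a finding being opened and being fixed. The `WEIGHTS` values and function names are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical penalty per open finding, by severity.
WEIGHTS = {"CRITICAL": 15, "HIGH": 8, "MEDIUM": 3, "LOW": 1}


def security_score(open_findings: dict[str, int]) -> int:
    """100 minus the weighted count of open findings, never below 0."""
    penalty = sum(WEIGHTS.get(sev, 0) * n for sev, n in open_findings.items())
    return max(0, 100 - penalty)


def mttr_hours(fixed: list[tuple[datetime, datetime]]) -> float:
    """Mean time to remediate, in hours, over (opened_at, fixed_at) pairs; needs at least one pair."""
    return mean((closed - opened) / timedelta(hours=1) for opened, closed in fixed)


if __name__ == "__main__":
    print(security_score({"CRITICAL": 1, "HIGH": 2, "MEDIUM": 4}))  # 100 - (15 + 16 + 12) = 57
    t0 = datetime(2024, 1, 1, 9, 0)
    print(mttr_hours([(t0, t0 + timedelta(hours=12)), (t0, t0 + timedelta(hours=36))]))  # 24.0
```

A weighted penalty keeps one critical finding from being outweighed by many informational ones; the weights themselves are a policy choice for each organization.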

Developer Self-Service

# self_service.py — Developer self-service security
import json

class DeveloperSelfService:
    FEATURES = {
        "pre_commit": {
            "name": "Pre-commit Hook",
            "description": "Scans before each commit to keep secrets and vulnerabilities out of the repo",
            "setup": """
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/semgrep/semgrep
    rev: v1.60.0
    hooks:
      - id: semgrep
        args: ['--config', 'p/secrets', '--config', '.semgrep/', '--error']
""",
        },
        "ide_plugin": {
            "name": "IDE Integration",
            "description": "Real-time scanning while you write code",
            "setup": "VS Code: Semgrep extension | IntelliJ: Semgrep plugin",
        },
        "auto_fix": {
            "name": "Auto-fix Suggestions",
            "description": "Semgrep applies fixes automatically for rules that define a fix (fix: or fix-regex:)",
            "setup": "semgrep scan --autofix (apply fixes automatically)",
        },
        "ignore": {
            "name": "Triage & Ignore",
            "description": "Developers can triage findings (false positive, won't fix)",
            "setup": "# nosemgrep: rule-id (inline ignore) or triage in Semgrep Cloud",
        },
    }

    GOLDEN_PATH = """
    Developer Golden Path (IDP):

    1. Create a new service → IDP scaffold → Semgrep config included
    2. Write code → IDE plugin scans in real time
    3. git commit → pre-commit hook scans
    4. git push → CI pipeline scans (Semgrep CI)
    5. PR → Semgrep comments on findings
    6. Fix → re-scan → merge
    7. Deploy → scheduled scan of the production branch
    8. Dashboard → security score per service
    """

    def show_features(self):
        print("=== Developer Self-Service ===\n")
        for key, feature in self.FEATURES.items():
            print(f"[{feature['name']}]")
            print(f"  {feature['description']}")
            print()

    def show_golden_path(self):
        print("=== Golden Path ===")
        print(self.GOLDEN_PATH)

ss = DeveloperSelfService()
ss.show_features()
ss.show_golden_path()
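Because inline `nosemgrep` comments are easy to add, platform teams often want to audit them. The sketch below is a hypothetical helper that counts suppressions per rule id, assuming they are written as `nosemgrep` or `nosemgrep: rule-id[, rule-id]` inside a `#` or `//` comment. The regex is a heuristic (it can be fooled by `#` inside string literals), not Semgrep's own parser.

```python
import re
from collections import Counter
from pathlib import Path

# Heuristic: '#' or '//' comment containing 'nosemgrep', optionally followed by ': id1, id2'.
NOSEMGREP = re.compile(r"(?:#|//)\s*nosemgrep(?::\s*([\w.\-]+(?:\s*,\s*[\w.\-]+)*))?")


def suppressions_in_text(text: str) -> Counter:
    """Count suppressed rule ids in one file; a bare 'nosemgrep' counts as '<all rules>'."""
    counts = Counter()
    for line in text.splitlines():
        match = NOSEMGREP.search(line)
        if not match:
            continue
        if match.group(1):
            for rule_id in match.group(1).split(","):
                counts[rule_id.strip()] += 1
        else:
            counts["<all rules>"] += 1
    return counts


def audit(root: str, suffixes=(".py", ".js", ".ts", ".java")) -> Counter:
    """Sum suppressions over all matching source files under `root`."""
    total = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += suppressions_in_text(path.read_text(encoding="utf-8", errors="ignore"))
    return total


if __name__ == "__main__":
    for rule_id, count in audit(".").most_common():
        print(f"{count:>4}  {rule_id}")
```

Bare `nosemgrep` comments silence every rule on that line, so reporting them separately makes them easy to spot in review.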

FAQ - Frequently Asked Questions

Q: Semgrep or SonarQube: which is better for an IDP?

A: Semgrep is faster, its rules are easier to write (YAML), it is CI-native, and it is open source. SonarQube has a broader feature set (code quality plus security) and good IDE integration. Choose Semgrep if you are security-focused, need custom rules, and want something lightweight; choose SonarQube if you want code quality and security combined in one enterprise product. Many teams use both: Semgrep for security and SonarQube for code quality.

Q: Are custom rules hard to write?

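A: No, they are quite easy. Semgrep rules are readable patterns, so a new rule typically takes 5 to 10 minutes to write. Use semgrep.dev/playground to test rules, and use the community rules (4,000+) as examples. A common split is that the security team writes the rules and developers consume them, so developers do not have to write rules themselves.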

Q: Is Semgrep slow?

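A: No, it is fast. A medium-sized repository (about 100K lines) typically scans in 10 to 30 seconds, much faster than SonarQube, CodeQL, or Checkmarx, so it fits in a CI pipeline without noticeably slowing builds. Incremental (diff-aware) scans, which only report findings in changed code, are faster still.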
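For incremental scans, `semgrep ci` already limits pull-request results to findings introduced by the change. Outside that flow, the Semgrep CLI's `--baseline-commit` option reports only findings that are not present at a given commit. A minimal sketch, assuming a git checkout where `origin/main` is the target branch; the helper names are hypothetical.

```python
import subprocess


def diff_scan_command(base_ref: str, configs: list[str]) -> list[str]:
    """Build a Semgrep command that reports only findings introduced since base_ref."""
    cmd = ["semgrep", "scan", "--baseline-commit", base_ref, "--error"]
    for config in configs:
        cmd += ["--config", config]
    return cmd


def merge_base(target_branch: str = "origin/main") -> str:
    """Commit where HEAD forked from target_branch (requires a git checkout)."""
    out = subprocess.run(["git", "merge-base", "HEAD", target_branch],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()


if __name__ == "__main__":
    cmd = diff_scan_command(merge_base(), ["p/owasp-top-ten", ".semgrep/rules/"])
    # --error makes Semgrep exit non-zero when new findings exist, failing the job.
    raise SystemExit(subprocess.run(cmd).returncode)
```

Using the merge base rather than the tip of `main` avoids reporting findings that were added to `main` after the branch was created.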

Q: Is there a Backstage plugin for Semgrep?

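A: There is no official plugin at the time of writing, but one is straightforward to build: use the Semgrep API to fetch findings and display them on the Backstage entity page. Community plugins on GitHub can serve as examples, or you can publish a security report through the TechDocs integration.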
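As a starting point for such a plugin's backend, the sketch below fetches findings from the Semgrep Web API and groups them by repository and severity, which a Backstage entity page could then display. Several parts are assumptions to verify against the Semgrep API reference: the endpoint path `/api/v1/deployments/{deployment_slug}/findings`, bearer-token authentication, and the response fields `findings`, `repository.name`, and `severity`. The environment variable names are hypothetical, and only the first page of results is read (no pagination).

```python
import json
import os
import urllib.request
from collections import defaultdict

API_BASE = "https://semgrep.dev/api/v1"


def fetch_findings(deployment_slug: str, token: str) -> dict:
    """GET one page of findings for a deployment (endpoint and auth are assumptions)."""
    req = urllib.request.Request(
        f"{API_BASE}/deployments/{deployment_slug}/findings",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def per_repo_summary(payload: dict) -> dict[str, dict[str, int]]:
    """Group findings by repository name and severity (field names are assumptions)."""
    summary = defaultdict(lambda: defaultdict(int))
    for finding in payload.get("findings", []):
        repo = (finding.get("repository") or {}).get("name", "<unknown>")
        summary[repo][finding.get("severity", "unknown")] += 1
    return {repo: dict(sev) for repo, sev in summary.items()}


if __name__ == "__main__":
    data = fetch_findings(os.environ["SEMGREP_DEPLOYMENT_SLUG"], os.environ["SEMGREP_API_TOKEN"])
    print(json.dumps(per_repo_summary(data), indent=2))
```

In a Backstage setup this could sit behind the proxy, with the entity page matching repository names from the summary to catalog entities.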
