What Is Semgrep SAST in an Internal Developer Platform?
Semgrep is an open source Static Application Security Testing (SAST) tool that uses pattern matching to find vulnerabilities, bugs, and code smells in source code. It supports 30+ programming languages. An Internal Developer Platform (IDP) is a platform built inside an organization to give developers self-service tools to build, deploy, and operate applications. Integrating Semgrep into the IDP makes security scanning an automatic part of the developer workflow, which reduces friction and raises security coverage across the organization.
Semgrep Architecture in an IDP
# semgrep_idp.py: Semgrep in an Internal Developer Platform


class SemgrepIDP:
    """Catalog of the pieces involved in wiring Semgrep into an IDP."""

    COMPONENTS = {
        "semgrep_cli": {
            "name": "Semgrep CLI",
            "role": "Local scanning for developers (pre-commit, IDE)",
            "install": "pip install semgrep / brew install semgrep",
        },
        "semgrep_ci": {
            "name": "Semgrep CI",
            "role": "Automated scanning in the CI/CD pipeline",
            "install": "Docker image / GitHub Action / GitLab CI",
        },
        "semgrep_cloud": {
            "name": "Semgrep Cloud Platform",
            "role": "Dashboard, policy management, findings triage",
            "install": "semgrep.dev (SaaS)",
        },
        "custom_rules": {
            "name": "Custom Rules Repository",
            "role": "Organization-specific security rules",
            "install": "Git repo with .yaml rule files",
        },
        "idp_integration": {
            "name": "IDP Integration Layer",
            "role": "Connects Semgrep to the service catalog, scaffolding, and CI templates",
            "install": "Backstage plugin / custom API",
        },
    }

    ARCHITECTURE = """
[Developer] → [IDE Plugin] → Semgrep scan locally
     ↓
[Git Push] → [CI/CD Pipeline]
     ↓
[Semgrep CI] → scan → [Semgrep Cloud]
     ↓                     ↓
[PR Comment]          [Dashboard]
(findings)            (org-wide view)
     ↓
[IDP Portal (Backstage)]
  - Security score per service
  - Findings dashboard
  - Auto-remediation suggestions
"""

    def show_components(self):
        print("=== Semgrep IDP Components ===\n")
        for comp in self.COMPONENTS.values():
            print(f"[{comp['name']}]")
            print(f"  Role: {comp['role']}")
            print()

    def show_architecture(self):
        print("=== Architecture ===")
        print(self.ARCHITECTURE)


arch = SemgrepIDP()
arch.show_components()
arch.show_architecture()
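The architecture above has findings flowing from the scanner into dashboards. As a minimal sketch of that hand-off, the snippet below counts findings per severity in a report shaped like the JSON that `semgrep scan --json` emits (a top-level `results` list whose entries carry `check_id`, `path`, and `extra.severity`). The sample report and the `summarize_findings` helper are hypothetical and only illustrate the idea; a real integration would read the scanner's actual output.

```python
import json
from collections import Counter


def summarize_findings(report: dict) -> Counter:
    """Count findings per severity in a parsed Semgrep JSON report."""
    return Counter(
        result.get("extra", {}).get("severity", "UNKNOWN")
        for result in report.get("results", [])
    )


# Hypothetical sample, shaped like Semgrep's JSON output (not real scan data).
SAMPLE_REPORT = json.loads("""
{
  "results": [
    {"check_id": "python-sql-injection", "path": "app/db.py",
     "extra": {"severity": "ERROR", "message": "SQL Injection detected."}},
    {"check_id": "hardcoded-secret", "path": "app/settings.py",
     "extra": {"severity": "WARNING", "message": "Hardcoded secret detected."}},
    {"check_id": "python-sql-injection", "path": "app/reports.py",
     "extra": {"severity": "ERROR", "message": "SQL Injection detected."}}
  ]
}
""")

summary = summarize_findings(SAMPLE_REPORT)
for severity, count in summary.most_common():
    print(f"{severity}: {count}")
```

An IDP portal could store a summary like this per service and render it on the service's catalog page.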
Semgrep Rules & Configuration
# rules.py: Semgrep rules and configuration


class SemgrepRules:
    RULE_EXAMPLE = """
# .semgrep/rules/sql-injection.yaml
rules:
  - id: python-sql-injection
    # pattern-either = match ANY of these (a plain `patterns` list would require ALL)
    pattern-either:
      - pattern: cursor.execute($QUERY % ...)
      - pattern: cursor.execute($QUERY.format(...))
      - pattern: cursor.execute(f"...{$VAR}...")
    message: >
      SQL Injection detected. Use parameterized queries instead.
      Bad:  cursor.execute(f"SELECT * FROM users WHERE id={user_id}")
      Good: cursor.execute("SELECT * FROM users WHERE id=%s", (user_id,))
    languages: [python]
    severity: ERROR
    metadata:
      cwe: ["CWE-89"]
      owasp: ["A03:2021"]
      confidence: HIGH

  - id: hardcoded-secret
    patterns:
      - pattern: $KEY = "..."
      - metavariable-regex:
          metavariable: $KEY
          # metavariable-regex is left-anchored, so allow a prefix such as db_
          regex: (?i).*(password|secret|api_key|token|private_key)
    message: "Hardcoded secret detected in variable '$KEY'"
    languages: [python, javascript, typescript, java]
    severity: WARNING
    metadata:
      cwe: ["CWE-798"]
"""

    # Semgrep has no project file that lists registry packs; packs are selected
    # with --config flags and ignored paths live in .semgrepignore.
    SEMGREP_CONFIG = """
# Registry rule packs plus the organization's own rules
semgrep scan \\
  --config p/python \\
  --config p/javascript \\
  --config p/owasp-top-ten \\
  --config p/security-audit \\
  --config .semgrep/rules/

# .semgrepignore (gitignore-style patterns)
tests/
vendor/
node_modules/
*.min.js
migrations/
"""

    RULE_CATEGORIES = {
        "security": {"name": "Security", "examples": ["SQL Injection", "XSS", "SSRF", "Path Traversal"], "severity": "ERROR"},
        "best_practices": {"name": "Best Practices", "examples": ["Error handling", "Input validation", "Logging"], "severity": "WARNING"},
        "performance": {"name": "Performance", "examples": ["N+1 queries", "Unnecessary loops", "Memory leaks"], "severity": "INFO"},
        "compliance": {"name": "Compliance", "examples": ["PII handling", "GDPR data retention", "Encryption"], "severity": "ERROR"},
    }

    def show_rule(self):
        print("=== Custom Rule Example ===")
        print(self.RULE_EXAMPLE[:500])  # preview only

    def show_config(self):
        print("\n=== Project Config ===")
        print(self.SEMGREP_CONFIG[:400])

    def show_categories(self):
        print("\n=== Rule Categories ===")
        for cat in self.RULE_CATEGORIES.values():
            print(f"  [{cat['name']}] {cat['severity']}: {', '.join(cat['examples'][:3])}")


rules = SemgrepRules()
rules.show_rule()
rules.show_config()
rules.show_categories()
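To make the SQL injection rule's "Bad" and "Good" cases concrete, here is a small self-contained sketch using Python's built-in `sqlite3` module. The table, the helper names, and the payload are made up for illustration. Note that `sqlite3` uses `?` placeholders, whereas the rule message shows the `%s` style used by drivers such as psycopg2.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two illustrative users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    return conn


def find_user_unsafe(conn, user_id):
    # The f-string splices user input into the SQL text: the shape that the
    # cursor.execute(f"...{$VAR}...") pattern is meant to flag.
    cursor = conn.cursor()
    cursor.execute(f"SELECT name FROM users WHERE id = {user_id}")
    return cursor.fetchall()


def find_user_safe(conn, user_id):
    # Parameterized query: the driver passes user_id as data, never as SQL.
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM users WHERE id = ?", (user_id,))
    return cursor.fetchall()


conn = make_db()
payload = "1 OR 1=1"  # attacker-controlled input

unsafe_rows = find_user_unsafe(conn, payload)
safe_rows = find_user_safe(conn, payload)
print("unsafe query returned", len(unsafe_rows), "rows")
print("safe query returned", len(safe_rows), "rows")
```

With the injected payload, the unsafe version returns every row in the table, while the parameterized version treats the whole payload as a single value and matches nothing.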
CI/CD Pipeline Integration
# cicd.py: Semgrep in CI/CD


class SemgrepCICD:
    GITHUB_ACTION = """
# .github/workflows/semgrep.yml
name: Semgrep Security Scan
on:
  pull_request: {}
  push:
    branches: [main, develop]
jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4
      - name: Run Semgrep
        env:
          # Secret name assumed; the original expression was lost in extraction
          SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}
        run: semgrep ci
      - name: Run with custom rules
        run: |
          semgrep scan \\
            --config p/owasp-top-ten \\
            --config .semgrep/ \\
            --sarif --output results.sarif
      - name: Upload SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif
"""

    GITLAB_CI = """
# .gitlab-ci.yml
semgrep:
  image: semgrep/semgrep
  stage: test
  script:
    - semgrep ci
  variables:
    SEMGREP_APP_TOKEN: $SEMGREP_APP_TOKEN
  rules:
    - if: $CI_MERGE_REQUEST_IID
    - if: $CI_COMMIT_BRANCH == "main"
"""

    IDP_TEMPLATE = """
# IDP CI template (reusable across all services)
# backstage/templates/semgrep-scan.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: semgrep-enabled-service
  title: Service with Semgrep Security
spec:
  steps:
    - id: fetch
      action: fetch:template
      input:
        url: ./skeleton
        values:
          # Parameter name assumed; the original expression was lost in extraction
          semgrep_rules: ${{ parameters.semgrep_rules }}
    - id: publish
      action: publish:github
"""

    def show_github(self):
        print("=== GitHub Actions ===")
        print(self.GITHUB_ACTION[:500])  # preview only

    def show_gitlab(self):
        print("\n=== GitLab CI ===")
        print(self.GITLAB_CI[:300])

    def show_template(self):
        print("\n=== IDP Template ===")
        print(self.IDP_TEMPLATE[:400])


cicd = SemgrepCICD()
cicd.show_github()
cicd.show_gitlab()
cicd.show_template()
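A platform team that generates CI templates for many services may prefer to assemble the scan command in one place instead of hand-editing each pipeline. The sketch below builds the same `semgrep scan` invocation used in the GitHub Actions example, using only the flags that appear there (`--config`, `--sarif`, `--output`). The `build_scan_command` helper is hypothetical and not part of Semgrep.

```python
import shlex


def build_scan_command(configs, sarif_output=None):
    """Return the argv list for a `semgrep scan` run with the given rule sources."""
    cmd = ["semgrep", "scan"]
    for config in configs:
        cmd += ["--config", config]
    if sarif_output:
        cmd += ["--sarif", "--output", sarif_output]
    return cmd


command = build_scan_command(["p/owasp-top-ten", ".semgrep/"], "results.sarif")
print(shlex.join(command))
```

Returning a list rather than a string keeps the command safe to pass to `subprocess.run` without shell quoting issues.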
IDP Dashboard & Metrics
# dashboard.py: Security dashboard for the IDP
# All numbers below are randomly generated placeholders, not real scan data.
import random


class SecurityDashboard:
    def org_overview(self):
        print("=== Organization Security Overview ===\n")
        metrics = {
            "Total repositories scanned": random.randint(50, 200),
            "Rules enabled": random.randint(200, 500),
            "Open findings": random.randint(20, 200),
            "Critical/High": random.randint(5, 30),
            "Medium": random.randint(10, 80),
            "Low/Info": random.randint(20, 100),
            "MTTR (avg)": f"{random.randint(12, 72)} hours",
            "Coverage": f"{random.randint(85, 99)}%",
        }
        for m, v in metrics.items():
            print(f"  {m}: {v}")

    def service_scores(self):
        print("\n=== Service Security Scores ===")
        services = [
            {"name": "user-service", "score": random.randint(80, 100), "findings": random.randint(0, 10)},
            {"name": "payment-api", "score": random.randint(70, 95), "findings": random.randint(2, 15)},
            {"name": "auth-service", "score": random.randint(85, 100), "findings": random.randint(0, 5)},
            {"name": "notification-svc", "score": random.randint(60, 90), "findings": random.randint(5, 20)},
            {"name": "admin-portal", "score": random.randint(50, 85), "findings": random.randint(10, 30)},
        ]
        # Highest score first; one bar segment per 5 points
        for svc in sorted(services, key=lambda x: x["score"], reverse=True):
            bar = "█" * (svc["score"] // 5)
            print(f"  {svc['name']:<20} Score: {svc['score']:>3}/100 | Findings: {svc['findings']:>3} {bar}")

    def top_findings(self):
        print("\n=== Top Findings ===")
        findings = [
            {"rule": "hardcoded-secret", "count": random.randint(5, 20), "severity": "HIGH"},
            {"rule": "sql-injection", "count": random.randint(2, 10), "severity": "CRITICAL"},
            {"rule": "xss-reflected", "count": random.randint(3, 15), "severity": "HIGH"},
            {"rule": "insecure-deserialization", "count": random.randint(1, 5), "severity": "CRITICAL"},
            {"rule": "missing-auth-check", "count": random.randint(5, 25), "severity": "MEDIUM"},
        ]
        # Most frequent rule first
        for f in sorted(findings, key=lambda x: x["count"], reverse=True):
            print(f"  [{f['severity']:>8}] {f['rule']:<30} × {f['count']}")


dash = SecurityDashboard()
dash.org_overview()
dash.service_scores()
dash.top_findings()
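The dashboard above fills its scores with random numbers. One possible way to derive a per-service score from real finding counts is a weighted penalty, sketched below. The weights and the `security_score` function are assumptions chosen for illustration; they are not a Semgrep feature, and each organization would tune them to its own risk model.

```python
# Hypothetical penalty per open finding, by severity.
SEVERITY_WEIGHTS = {"CRITICAL": 15, "HIGH": 8, "MEDIUM": 3, "LOW": 1}


def security_score(counts: dict) -> int:
    """Map open-finding counts per severity to a 0-100 score (100 = no findings)."""
    penalty = sum(SEVERITY_WEIGHTS.get(sev, 0) * n for sev, n in counts.items())
    return max(0, 100 - penalty)


example_counts = {"CRITICAL": 1, "HIGH": 2, "MEDIUM": 4}
score = security_score(example_counts)  # 100 - (15 + 16 + 12)
print(f"payment-api score: {score}/100")
```

Clamping at zero keeps the score in a fixed range, so a service with many critical findings does not produce a negative value on the portal.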
Developer Self-Service
# self_service.py: Developer self-service security


class DeveloperSelfService:
    FEATURES = {
        "pre_commit": {
            "name": "Pre-commit Hook",
            "description": "Scans before each commit to keep secrets and vulnerabilities out of the repo",
            "setup": """
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/semgrep/semgrep
    rev: v1.60.0
    hooks:
      - id: semgrep
        args: ['--config', 'p/secrets', '--config', '.semgrep/', '--error']
""",
        },
        "ide_plugin": {
            "name": "IDE Integration",
            "description": "Real-time scanning while writing code",
            "setup": "VS Code: Semgrep extension | IntelliJ: Semgrep plugin",
        },
        "auto_fix": {
            "name": "Auto-fix Suggestions",
            "description": "Semgrep suggests fixes automatically",
            "setup": "semgrep scan --autofix (apply fixes automatically)",
        },
        "ignore": {
            "name": "Triage & Ignore",
            "description": "Developers can triage findings (false positive, won't fix)",
            "setup": "# nosemgrep: rule-id (inline ignore) or triage in Semgrep Cloud",
        },
    }

    GOLDEN_PATH = """
Developer Golden Path (IDP):
1. Create a new service → IDP scaffold → Semgrep config included
2. Write code → IDE plugin scans in real time
3. git commit → pre-commit hook scans
4. git push → CI pipeline scans (Semgrep CI)
5. PR → Semgrep comments on findings
6. Fix → re-scan → merge
7. Deploy → production scan (scheduled)
8. Dashboard → security score per service
"""

    def show_features(self):
        print("=== Developer Self-Service ===\n")
        for feature in self.FEATURES.values():
            print(f"[{feature['name']}]")
            print(f"  {feature['description']}")
            print()

    def show_golden_path(self):
        print("=== Golden Path ===")
        print(self.GOLDEN_PATH)


ss = DeveloperSelfService()
ss.show_features()
ss.show_golden_path()
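Inline `# nosemgrep: rule-id` comments make triage self-service, but a platform team may still want to see how often each rule is being suppressed. The sketch below counts such comments in a piece of source text. The regular expression, the `count_suppressions` helper, and the sample code are illustrative assumptions rather than Semgrep functionality.

```python
import re
from collections import Counter

# Matches "nosemgrep" optionally followed by ": rule-id".
NOSEMGREP = re.compile(r"nosemgrep(?::\s*([\w.\-]+))?")


def count_suppressions(source: str) -> Counter:
    """Count inline nosemgrep comments per rule id in the given source text."""
    counts = Counter()
    for line in source.splitlines():
        match = NOSEMGREP.search(line)
        if match:
            # A bare "nosemgrep" names no rule, so group it under one label.
            counts[match.group(1) or "<all rules>"] += 1
    return counts


# Hypothetical source file content.
SAMPLE_SOURCE = '''
password = "changeme"  # nosemgrep: hardcoded-secret
cursor.execute(query)  # nosemgrep
token = "abc"  # nosemgrep: hardcoded-secret
'''

suppressions = count_suppressions(SAMPLE_SOURCE)
for rule_id, count in suppressions.most_common():
    print(f"{rule_id}: {count}")
```

A rule with a high suppression count is a candidate for refinement, since it may be producing false positives.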
FAQ - Frequently Asked Questions
Q: Semgrep or SonarQube: which is better for an IDP?
A: Semgrep is faster, has simpler rule authoring (YAML), is CI-native, and is open source. SonarQube offers more features (code quality plus security) and good IDE integration. Choose Semgrep when the focus is security, custom rules are needed, and a lightweight tool is preferred. Choose SonarQube when code quality and security are wanted in one enterprise product. Many teams use both: Semgrep for security and SonarQube for code quality.
Q: Are custom rules hard to write?
A: No. Semgrep uses readable pattern matching, and a new rule can be written in 5-10 minutes. Use semgrep.dev/playground to test rules, and look at the community rules (4,000+ rules) as examples. The security team writes the rules and the dev teams consume them, so developers do not have to write their own.
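Q: Is Semgrep slow to scan?
A: No, it is fast. A mid-sized repo (100K lines) typically scans in 10-30 seconds, which is considerably faster than SonarQube, CodeQL, or Checkmarx. That makes it a good fit for CI pipelines because it does not slow the build noticeably. Incremental scans, which cover only the diff, are faster still.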
Q: Is there a Backstage plugin for Semgrep?
A: There is no official plugin yet, but one is easy to build: use the Semgrep API to pull findings and display them on the Backstage Entity page. Community plugins on GitHub can serve as examples. Alternatively, use the TechDocs integration to publish a security report.
