Semgrep SAST Internal Developer Platform
What Is the Semgrep SAST Internal Developer Platform?
Semgrep is an open source Static Application Security Testing (SAST) tool that uses pattern matching to find vulnerabilities, bugs, and code smells in source code, and it supports more than 30 programming languages. An Internal Developer Platform (IDP) is a platform built inside an organization to give developers self-service tools for building, deploying, and operating applications. Integrating Semgrep into the IDP makes security scanning an automatic part of the developer workflow, which reduces friction and extends security coverage across the organization.
Semgrep Architecture in the IDP
# semgrep_idp.py: Semgrep in an Internal Developer Platform


class SemgrepIDP:
    # Building blocks of a Semgrep-enabled IDP and how each one is installed.
    COMPONENTS = {
        "semgrep_cli": {
            "name": "Semgrep CLI",
            "role": "Local scanning for developers (pre-commit, IDE)",
            "install": "pip install semgrep / brew install semgrep",
        },
        "semgrep_ci": {
            "name": "Semgrep CI",
            "role": "Automated scanning in the CI/CD pipeline",
            "install": "Docker image / GitHub Action / GitLab CI",
        },
        "semgrep_cloud": {
            "name": "Semgrep Cloud Platform",
            "role": "Dashboard, policy management, findings triage",
            "install": "semgrep.dev (SaaS)",
        },
        "custom_rules": {
            "name": "Custom Rules Repository",
            "role": "Organization-specific security rules",
            "install": "Git repo with .yaml rule files",
        },
        "idp_integration": {
            "name": "IDP Integration Layer",
            "role": "Connects Semgrep to the service catalog, scaffolding, and CI templates",
            "install": "Backstage plugin / custom API",
        },
    }

    ARCHITECTURE = """
    [Developer] → [IDE Plugin] → Semgrep scan locally
         ↓
    [Git Push] → [CI/CD Pipeline]
         ↓
    [Semgrep CI] → scan → [Semgrep Cloud]
         ↓                      ↓
    [PR Comment]           [Dashboard]
    (findings)             (org-wide view)
                                ↓
                    [IDP Portal (Backstage)]
                    - Security score per service
                    - Findings dashboard
                    - Auto-remediation suggestions
    """

    def show_components(self):
        print("=== Semgrep IDP Components ===\n")
        for key, comp in self.COMPONENTS.items():
            print(f"[{comp['name']}]")
            print(f"  Role: {comp['role']}")
            print()

    def show_architecture(self):
        print("=== Architecture ===")
        print(self.ARCHITECTURE)


arch = SemgrepIDP()
arch.show_components()
arch.show_architecture()
Semgrep Rules & Configuration
# rules.py: Semgrep rules and configuration


class SemgrepRules:
    RULE_EXAMPLE = """
# .semgrep/rules/sql-injection.yaml
rules:
  - id: python-sql-injection
    # pattern-either matches when ANY listed pattern matches;
    # a plain `patterns:` list would require all of them at once.
    pattern-either:
      - pattern: |
          cursor.execute($QUERY % ...)
      - pattern: |
          cursor.execute($QUERY.format(...))
      - pattern: |
          cursor.execute(f"...{$VAR}...")
    message: >
      SQL Injection detected. Use parameterized queries instead.
      Bad:  cursor.execute(f"SELECT * FROM users WHERE id={user_id}")
      Good: cursor.execute("SELECT * FROM users WHERE id=%s", (user_id,))
    languages: [python]
    severity: ERROR
    metadata:
      cwe: ["CWE-89"]
      owasp: ["A03:2021"]
      confidence: HIGH
  - id: hardcoded-secret
    patterns:
      - pattern: |
          $KEY = "..."
      - metavariable-regex:
          metavariable: $KEY
          # leading .* lets names such as db_password match as well
          regex: (?i).*(password|secret|api_key|token|private_key)
    message: "Hardcoded secret detected in variable '$KEY'"
    languages: [python, javascript, typescript, java]
    severity: WARNING
    metadata:
      cwe: ["CWE-798"]
"""

    SEMGREP_CONFIG = """
# Registry rulesets are selected with --config on the command line
# (a .semgrep.yml file holds rule definitions, not a list of packs):
#   semgrep scan \\
#     --config p/python \\
#     --config p/javascript \\
#     --config p/owasp-top-ten \\
#     --config p/security-audit \\
#     --config .semgrep/          # custom org rules

# .semgrepignore: paths excluded from scanning (gitignore-style)
tests/
vendor/
node_modules/
*.min.js
migrations/
"""

    RULE_CATEGORIES = {
        "security": {"name": "Security", "examples": ["SQL Injection", "XSS", "SSRF", "Path Traversal"], "severity": "ERROR"},
        "best_practices": {"name": "Best Practices", "examples": ["Error handling", "Input validation", "Logging"], "severity": "WARNING"},
        "performance": {"name": "Performance", "examples": ["N+1 queries", "Unnecessary loops", "Memory leaks"], "severity": "INFO"},
        "compliance": {"name": "Compliance", "examples": ["PII handling", "GDPR data retention", "Encryption"], "severity": "ERROR"},
    }

    def show_rule(self):
        print("=== Custom Rule Example ===")
        print(self.RULE_EXAMPLE[:500])

    def show_config(self):
        print("\n=== Project Config ===")
        print(self.SEMGREP_CONFIG[:400])

    def show_categories(self):
        print("\n=== Rule Categories ===")
        for key, cat in self.RULE_CATEGORIES.items():
            print(f"  [{cat['name']}] {cat['severity']}: {', '.join(cat['examples'][:3])}")


rules = SemgrepRules()
rules.show_rule()
rules.show_config()
rules.show_categories()
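To make the intent of the `python-sql-injection` rule concrete, the sketch below reproduces the same three checks with Python's standard `ast` module. It is only an illustration of structural matching, not how Semgrep is implemented, and it is looser than the rule: it flags any `.execute(...)` call rather than only `cursor.execute(...)`. The function name `find_unsafe_execute_calls` and the `SAMPLE` snippet are made up for this example.

```python
import ast


def find_unsafe_execute_calls(source: str) -> list[tuple[int, str]]:
    """Return (line, reason) for execute() calls whose query is built from strings."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        is_execute_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "execute"
            and node.args
        )
        if not is_execute_call:
            continue
        query = node.args[0]
        if isinstance(query, ast.JoinedStr) and any(
            isinstance(part, ast.FormattedValue) for part in query.values
        ):
            hits.append((node.lineno, "f-string"))  # execute(f"...{x}...")
        elif isinstance(query, ast.BinOp) and isinstance(query.op, ast.Mod):
            hits.append((node.lineno, "% formatting"))  # execute(q % ...)
        elif (
            isinstance(query, ast.Call)
            and isinstance(query.func, ast.Attribute)
            and query.func.attr == "format"
        ):
            hits.append((node.lineno, "str.format"))  # execute(q.format(...))
    return sorted(hits)


# Hypothetical input: three unsafe calls followed by one parameterized query.
SAMPLE = '''
cursor.execute(f"SELECT * FROM users WHERE id={user_id}")
cursor.execute("SELECT * FROM users WHERE id=%s" % user_id)
cursor.execute("SELECT * FROM users WHERE id={}".format(user_id))
cursor.execute("SELECT * FROM users WHERE id=%s", (user_id,))
'''

for line, reason in find_unsafe_execute_calls(SAMPLE):
    print(f"line {line}: query built with {reason}")
```

The parameterized call on the last line is not reported, which is the behavior the rule's message recommends.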
CI/CD Pipeline Integration
# cicd.py: Semgrep in CI/CD


class SemgrepCICD:
    GITHUB_ACTION = """
# .github/workflows/semgrep.yml
name: Semgrep Security Scan
on:
  pull_request: {}
  push:
    branches: [main, develop]
jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4
      - name: Run Semgrep
        env:
          SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}
        run: semgrep ci
      - name: Run with custom rules
        run: |
          semgrep scan \\
            --config p/owasp-top-ten \\
            --config .semgrep/ \\
            --sarif --output results.sarif
      - name: Upload SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif
"""

    GITLAB_CI = """
# .gitlab-ci.yml
semgrep:
  image: semgrep/semgrep
  stage: test
  script:
    - semgrep ci
  variables:
    SEMGREP_APP_TOKEN: $SEMGREP_APP_TOKEN
  rules:
    - if: $CI_MERGE_REQUEST_IID
    - if: $CI_COMMIT_BRANCH == "main"
"""

    IDP_TEMPLATE = """
# IDP CI Template (reusable across all services)
# backstage/templates/semgrep-scan.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: semgrep-enabled-service
  title: Service with Semgrep Security
spec:
  steps:
    - id: fetch
      action: fetch:template
      input:
        url: ./skeleton
        values:
          # The original expression was lost; the parameter name is assumed.
          semgrep_rules: ${{ parameters.semgrep_rules }}
    - id: publish
      action: publish:github
"""

    def show_github(self):
        print("=== GitHub Actions ===")
        print(self.GITHUB_ACTION[:500])

    def show_gitlab(self):
        print("\n=== GitLab CI ===")
        print(self.GITLAB_CI[:300])

    def show_template(self):
        print("\n=== IDP Template ===")
        print(self.IDP_TEMPLATE[:400])


cicd = SemgrepCICD()
cicd.show_github()
cicd.show_gitlab()
cicd.show_template()
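A pipeline usually needs a rule for when findings should fail the build. The sketch below is one possible gate, assuming the scan was run with `semgrep scan --json` and that each entry under `results` carries `check_id`, `path`, `start.line`, and `extra.severity`. The policy (`BLOCKING_SEVERITIES`), the function names, and the sample report are hypothetical and only serve as illustration.

```python
import json

# Hypothetical policy: only ERROR-severity findings fail the build.
BLOCKING_SEVERITIES = frozenset({"ERROR"})


def blocking_findings(report, blocking=BLOCKING_SEVERITIES):
    """Return the findings whose severity is in the blocking set."""
    return [
        finding
        for finding in report.get("results", [])
        if finding.get("extra", {}).get("severity") in blocking
    ]


def gate_exit_code(report, blocking=BLOCKING_SEVERITIES):
    """Return 1 when at least one blocking finding exists, otherwise 0."""
    return 1 if blocking_findings(report, blocking) else 0


# Trimmed, made-up report used only to exercise the gate.
SAMPLE_REPORT = json.loads("""
{
  "results": [
    {"check_id": "python-sql-injection", "path": "app/db.py",
     "start": {"line": 42}, "extra": {"severity": "ERROR"}},
    {"check_id": "hardcoded-secret", "path": "app/settings.py",
     "start": {"line": 7}, "extra": {"severity": "WARNING"}}
  ],
  "errors": []
}
""")

for finding in blocking_findings(SAMPLE_REPORT):
    print(f"{finding['path']}:{finding['start']['line']} {finding['check_id']}")
print("exit code:", gate_exit_code(SAMPLE_REPORT))
```

Keeping the policy in one constant lets the platform team tighten it later, for example by adding `WARNING`, without touching each service's pipeline.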
IDP Dashboard & Metrics
# dashboard.py: Security dashboard for the IDP
import random


class SecurityDashboard:
    # All numbers below are random placeholders that stand in for real scan data.

    def org_overview(self):
        print("=== Organization Security Overview ===\n")
        metrics = {
            "Total repositories scanned": random.randint(50, 200),
            "Rules enabled": random.randint(200, 500),
            "Open findings": random.randint(20, 200),
            "Critical/High": random.randint(5, 30),
            "Medium": random.randint(10, 80),
            "Low/Info": random.randint(20, 100),
            "MTTR (avg)": f"{random.randint(12, 72)} hours",
            "Coverage": f"{random.randint(85, 99)}%",
        }
        for m, v in metrics.items():
            print(f"  {m}: {v}")

    def service_scores(self):
        print("\n=== Service Security Scores ===")
        services = [
            {"name": "user-service", "score": random.randint(80, 100), "findings": random.randint(0, 10)},
            {"name": "payment-api", "score": random.randint(70, 95), "findings": random.randint(2, 15)},
            {"name": "auth-service", "score": random.randint(85, 100), "findings": random.randint(0, 5)},
            {"name": "notification-svc", "score": random.randint(60, 90), "findings": random.randint(5, 20)},
            {"name": "admin-portal", "score": random.randint(50, 85), "findings": random.randint(10, 30)},
        ]
        # Highest score first; one bar segment per 5 points.
        for svc in sorted(services, key=lambda x: x["score"], reverse=True):
            bar = "█" * (svc["score"] // 5)
            print(f"  {svc['name']:<20} Score: {svc['score']:>3}/100 | Findings: {svc['findings']:>3} {bar}")

    def top_findings(self):
        print("\n=== Top Findings ===")
        findings = [
            {"rule": "hardcoded-secret", "count": random.randint(5, 20), "severity": "HIGH"},
            {"rule": "sql-injection", "count": random.randint(2, 10), "severity": "CRITICAL"},
            {"rule": "xss-reflected", "count": random.randint(3, 15), "severity": "HIGH"},
            {"rule": "insecure-deserialization", "count": random.randint(1, 5), "severity": "CRITICAL"},
            {"rule": "missing-auth-check", "count": random.randint(5, 25), "severity": "MEDIUM"},
        ]
        # Most frequent rule first.
        for f in sorted(findings, key=lambda x: x["count"], reverse=True):
            print(f"  [{f['severity']:>8}] {f['rule']:<30} × {f['count']}")


dash = SecurityDashboard()
dash.org_overview()
dash.service_scores()
dash.top_findings()
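The dashboard above fills its scores with random numbers. One way a real score could be derived is sketched here: start from 100 and subtract a weighted penalty per finding. The weights in `SEVERITY_WEIGHTS` and the function `service_score` are assumptions chosen for illustration, not a Semgrep or Backstage feature; the input uses the same `rule` / `count` / `severity` shape as `top_findings`.

```python
# Hypothetical penalty per finding, by severity label.
SEVERITY_WEIGHTS = {"CRITICAL": 15, "HIGH": 8, "MEDIUM": 3, "LOW": 1}


def service_score(findings):
    """Return a 0-100 score: 100 minus the weighted penalty, floored at 0."""
    penalty = sum(
        SEVERITY_WEIGHTS.get(finding["severity"], 0) * finding["count"]
        for finding in findings
    )
    return max(0, 100 - penalty)


# Made-up findings for a single service.
payment_api_findings = [
    {"rule": "sql-injection", "count": 2, "severity": "CRITICAL"},
    {"rule": "hardcoded-secret", "count": 3, "severity": "HIGH"},
]

print("payment-api score:", service_score(payment_api_findings))
```

With these weights the example service loses 30 points for the two critical findings and 24 for the three high ones. A linear penalty is easy to explain to teams, though an organization may prefer a formula that also accounts for repository size or finding age.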
Developer Self-Service
# self_service.py: Developer self-service security


class DeveloperSelfService:
    FEATURES = {
        "pre_commit": {
            "name": "Pre-commit Hook",
            "description": "Scan before commit to keep secrets and vulnerabilities out of the repo",
            "setup": """
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/semgrep/semgrep
    rev: v1.60.0
    hooks:
      - id: semgrep
        args: ['--config', 'p/secrets', '--config', '.semgrep/', '--error']
""",
        },
        "ide_plugin": {
            "name": "IDE Integration",
            "description": "Real-time scanning while writing code",
            "setup": "VS Code: Semgrep extension | IntelliJ: Semgrep plugin",
        },
        "auto_fix": {
            "name": "Auto-fix Suggestions",
            "description": "Semgrep suggests fixes automatically",
            "setup": "semgrep scan --autofix (apply fixes automatically)",
        },
        "ignore": {
            "name": "Triage & Ignore",
            "description": "Developers can triage findings (false positive, won't fix)",
            "setup": "# nosemgrep: rule-id (inline ignore) or triage in Semgrep Cloud",
        },
    }

    GOLDEN_PATH = """
    Developer Golden Path (IDP):
    1. Create a new service → IDP scaffold → Semgrep config included
    2. Write code → IDE plugin scans in real time
    3. git commit → pre-commit hook scans
    4. git push → CI pipeline scans (Semgrep CI)
    5. PR → Semgrep comments on findings
    6. Fix → re-scan → merge
    7. Deploy → production scan (scheduled)
    8. Dashboard → security score per service
    """

    def show_features(self):
        print("=== Developer Self-Service ===\n")
        for key, feature in self.FEATURES.items():
            print(f"[{feature['name']}]")
            print(f"  {feature['description']}")
            print()

    def show_golden_path(self):
        print("=== Golden Path ===")
        print(self.GOLDEN_PATH)


ss = DeveloperSelfService()
ss.show_features()
ss.show_golden_path()
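Inline ignores are convenient, but a platform team may want to see how often they are used. The sketch below lists every `nosemgrep` comment in a source text together with the rule IDs it names. It assumes comments of the form `# nosemgrep` or `# nosemgrep: rule-id` (also with `//`, and with comma-separated IDs); the regex, the function `audit_ignores`, and the sample text are hypothetical helpers, not part of Semgrep.

```python
import re

# Matches "# nosemgrep" or "// nosemgrep", optionally followed by ": id1, id2".
NOSEMGREP = re.compile(
    r"(?:#|//)\s*nosemgrep(?::\s*(?P<rules>[\w.-]+(?:\s*,\s*[\w.-]+)*))?"
)


def audit_ignores(source: str) -> list[tuple[int, list[str]]]:
    """Return (line_number, rule_ids) per ignore; an empty list means a blanket ignore."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = NOSEMGREP.search(line)
        if match:
            rules = match.group("rules") or ""
            rule_ids = [rule.strip() for rule in rules.split(",") if rule.strip()]
            hits.append((number, rule_ids))
    return hits


# Made-up source text with one targeted and one blanket ignore.
SAMPLE_SOURCE = '''\
password = "dummy"  # nosemgrep: hardcoded-secret
run_query(user_input)  # nosemgrep
safe_call()
'''

for number, rule_ids in audit_ignores(SAMPLE_SOURCE):
    scope = ", ".join(rule_ids) if rule_ids else "ALL RULES"
    print(f"line {number}: ignores {scope}")
```

Blanket ignores (no rule ID) are the ones most worth reviewing, since they silence every rule on that line.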
FAQ - Frequently Asked Questions
Q: Semgrep or SonarQube: which is better for an IDP?
A: Semgrep is faster, its rules are easier to write (YAML), and it is CI-native and open source. SonarQube has a broader feature set (code quality plus security) and good IDE integration. Choose Semgrep when the focus is security, custom rules are needed, and a lightweight tool is preferred. Choose SonarQube when code quality and security are wanted in one product for enterprise use. Many teams run both: Semgrep for security and SonarQube for code quality.
Q: Are custom rules hard to write?
A: No. Semgrep patterns look like the code they match, so a simple rule can often be written in 5-10 minutes. Test rules at semgrep.dev/playground and use the community rules (4,000+) as examples. A common split is for the security team to write the rules and the development teams to use them, so developers do not have to author rules themselves.
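Q: Is Semgrep slow to scan?
A: No, it is fast. A medium-sized repository (about 100K lines) typically scans in 10-30 seconds, although the exact time depends on the ruleset and the hardware. That is generally much faster than SonarQube, CodeQL, or Checkmarx, so it fits in a CI pipeline without slowing the build. Incremental scans that cover only the diff are faster still.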
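As a rough sketch of the incremental scan mentioned above, the helper below assembles a diff-aware command line using Semgrep's `--baseline-commit` option, which reports only findings that are not already present at the given commit. The function name `diff_scan_command` and the commit value `0a1b2c3` are placeholders for illustration; in practice the baseline would typically be the merge base with the target branch.

```python
import shlex


def diff_scan_command(baseline_commit: str, configs: list[str]) -> list[str]:
    """Build a semgrep command that reports only findings newer than the baseline."""
    command = ["semgrep", "scan", "--baseline-commit", baseline_commit, "--error"]
    for config in configs:
        command += ["--config", config]
    return command


# Placeholder commit hash; the configs are the ones used earlier in this article.
command = diff_scan_command("0a1b2c3", ["p/owasp-top-ten", ".semgrep/"])
print(shlex.join(command))
```

Returning the command as a list keeps it safe to pass to `subprocess.run` without shell quoting issues.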
Q: Is there a Backstage plugin for Semgrep?
A: There is no official plugin yet, but one is straightforward to build: use the Semgrep API to fetch findings and display them on the Backstage entity page. Community plugins on GitHub can serve as examples, or the TechDocs integration can be used to publish a security report.