
Semgrep SAST Open Source Contribution: Writing Security Rules and Contributing Back to the Community

2025-11-14 · อ. บอม, SiamCafe.net · 1,174 words

What Is Semgrep SAST?

Semgrep is an open source Static Application Security Testing (SAST) tool that uses pattern matching to find problems in source code, including security vulnerabilities, bugs, and code quality issues. It supports more than 30 languages, including Python, JavaScript, TypeScript, Java, Go, Ruby, and PHP.

Semgrep differs from other SAST tools in that it uses semantic pattern matching rather than regex alone. It understands the structure of the code, for example it can tell which variable holds user input, so it tends to produce fewer false positives than other tools. Rules are written in a syntax that looks like the code being scanned.
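To make the difference between text matching and structure-aware matching concrete, here is a minimal sketch. It does not show how Semgrep works internally; it uses Python's standard `ast` module as a stand-in, and the `SOURCE` snippet and both helper functions are made up for illustration.

```python
import ast
import re

# Hypothetical snippet: "eval(" appears in a comment, in a string, and in one real call.
SOURCE = '''
# eval(user_input) is mentioned only in this comment
note = "never call eval(x) on user data"
result = eval(user_input)
'''

def regex_hits(source: str) -> list[int]:
    """Line numbers where the text 'eval(' appears anywhere, comments and strings included."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if re.search(r"\beval\(", line)
    ]

def ast_hits(source: str) -> list[int]:
    """Line numbers of real calls to a function named eval, found on the syntax tree."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval"
        ):
            hits.append(node.lineno)
    return sorted(hits)

print("regex:", regex_hits(SOURCE))
print("ast:  ", ast_hits(SOURCE))
```

The regex flags the comment and the string as well as the call, while the tree-based check reports only the real call. That is the kind of noise a structure-aware matcher avoids.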

Open source contribution is an important part of the Semgrep ecosystem: writing custom rules to share with the community, reporting bugs, improving documentation, and contributing code fixes. It is a good way to learn security, build a portfolio, and give back to the community.

Installing and Using Semgrep

Set up Semgrep for security scanning

# === Semgrep Installation & Usage ===

# 1. Install Semgrep
pip install semgrep

# Or via Docker
docker pull semgrep/semgrep

# Or via Homebrew (macOS)
brew install semgrep

# 2. Verify Installation
semgrep --version

# 3. Run with Default Rules (Semgrep Registry)
# Scan current directory with recommended rules
semgrep --config auto .

# Scan with specific rulesets
semgrep --config p/python .
semgrep --config p/javascript .
semgrep --config p/owasp-top-ten .
semgrep --config p/security-audit .

# 4. Common Security Rulesets
cat > semgrep_rulesets.yaml << 'EOF'
recommended_rulesets:
  general:
    - "p/security-audit"
    - "p/owasp-top-ten"
    - "p/secrets"
    
  python:
    - "p/python"
    - "p/flask"
    - "p/django"
    - "p/bandit"
    
  javascript:
    - "p/javascript"
    - "p/react"
    - "p/nextjs"
    - "p/express"
    - "p/nodejs"
    
  golang:
    - "p/golang"
    - "p/gorilla"
    
  java:
    - "p/java"
    - "p/spring"
    
  infrastructure:
    - "p/terraform"
    - "p/dockerfile"
    - "p/kubernetes"
EOF

# 5. Output Formats
semgrep --config auto --json . > semgrep-results.json
semgrep --config auto --sarif . > semgrep-results.sarif
semgrep --config auto --output results.txt .

# 6. Scan Options
semgrep --config auto \
  --severity ERROR \
  --exclude "test/*" \
  --exclude "node_modules/*" \
  --max-target-bytes 1000000 \
  --timeout 30 \
  --jobs 4 \
  .

echo "Semgrep setup complete"
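Once a scan has written `semgrep-results.json`, a short script can turn it into a summary. The sketch below assumes the report has a top-level `results` list whose entries carry `check_id`, `path`, `start.line`, and `extra.severity`; the sample data and both function names are hypothetical, so verify the field names against the output of your Semgrep version.

```python
import json
from collections import Counter

# Hypothetical sample shaped like Semgrep's --json output (only the fields used here).
SAMPLE_REPORT = json.loads("""
{
  "results": [
    {"check_id": "python-sql-injection", "path": "app/db.py",
     "start": {"line": 12}, "extra": {"severity": "ERROR"}},
    {"check_id": "insecure-http-request", "path": "app/client.py",
     "start": {"line": 40}, "extra": {"severity": "WARNING"}},
    {"check_id": "dangerous-eval", "path": "app/util.py",
     "start": {"line": 7}, "extra": {"severity": "ERROR"}}
  ]
}
""")

def count_by_severity(report: dict) -> Counter:
    """Count findings per severity; findings without a severity count as UNKNOWN."""
    return Counter(
        result.get("extra", {}).get("severity", "UNKNOWN")
        for result in report.get("results", [])
    )

def format_findings(report: dict) -> list[str]:
    """One 'path:line [SEVERITY] rule-id' string per finding, sorted by path and line."""
    results = sorted(
        report.get("results", []),
        key=lambda r: (r["path"], r["start"]["line"]),
    )
    return [
        f"{r['path']}:{r['start']['line']} "
        f"[{r.get('extra', {}).get('severity', 'UNKNOWN')}] {r['check_id']}"
        for r in results
    ]

print(dict(count_by_severity(SAMPLE_REPORT)))
for line in format_findings(SAMPLE_REPORT):
    print(line)
```

To use it on a real scan, replace `SAMPLE_REPORT` with `json.load(open("semgrep-results.json"))`.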

Writing Custom Rules

Write your own Semgrep rules

# === Custom Semgrep Rules ===

cat > custom_rules.yaml << 'EOF'
rules:
  # 1. Detect SQL Injection in Python
  - id: python-sql-injection
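    # Match both %-formatting and f-strings passed straight to execute()
    pattern-either:
      - pattern: cursor.execute($QUERY % ...)
      - pattern: cursor.execute(f"...")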
    message: "Possible SQL injection. Use parameterized queries instead of string formatting."
    severity: ERROR
    languages: [python]
    metadata:
      cwe: "CWE-89: SQL Injection"
      owasp: "A03:2021 - Injection"
      fix: "cursor.execute('SELECT * FROM users WHERE id = %s', (user_id,))"

  # 2. Detect Hardcoded Secrets
  - id: hardcoded-api-key
    patterns:
      # Bind the string contents to $VALUE so the length check below can use it
      - pattern: |
          $KEY = "$VALUE"
      - metavariable-regex:
          metavariable: $KEY
          regex: ".*(api_key|apikey|secret|password|token|auth).*"
      - metavariable-regex:
          metavariable: $VALUE
          regex: ".{10,}"
    message: "Hardcoded secret detected. Use environment variables or secret manager."
    severity: ERROR
    languages: [python, javascript, typescript]
    metadata:
      cwe: "CWE-798: Hard-coded Credentials"

  # 3. Detect Insecure HTTP
  - id: insecure-http-request
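    patterns:
      # Any one of these calls may match (listing them directly under patterns would AND them)
      - pattern-either:
          - pattern: requests.get("$URL", ...)
          - pattern: requests.post("$URL", ...)
          - pattern: fetch("$URL", ...)
      # Only flag URLs that start with plain http://
      - metavariable-regex:
          metavariable: $URL
          regex: "^http://"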
    message: "Using HTTP instead of HTTPS. Use HTTPS for secure communication."
    severity: WARNING
    languages: [python, javascript]
    metadata:
      cwe: "CWE-319: Cleartext Transmission"

  # 4. Detect Missing Input Validation (Flask)
  - id: flask-no-input-validation
    pattern: |
      @app.route(...)
      def $FUNC(...):
          ...
          $X = request.args.get(...)
          ...
    # Remediation hint kept in metadata; an autofix here would overwrite the whole matched function
    metadata:
      fix: "Validate the value before use, e.g. raise werkzeug.exceptions.BadRequest when it is missing or not a string"
    message: "User input from request.args is used without validation."
    severity: WARNING
    languages: [python]

  # 5. Detect Eval Usage
  - id: dangerous-eval
    pattern: eval(...)
    message: "eval() is dangerous. Use ast.literal_eval() for safe evaluation."
    severity: ERROR
    languages: [python, javascript]
    metadata:
      cwe: "CWE-95: Eval Injection"
EOF

# Test custom rules
semgrep --config custom_rules.yaml .

# Scan only Python files under src/
semgrep --config custom_rules.yaml --include "*.py" src/

echo "Custom rules created"
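To see why the `python-sql-injection` rule above flags string formatting inside `cursor.execute()`, the following sketch runs both styles against an in-memory SQLite database. The table, the payload, and the function names are made up for illustration; only the standard `sqlite3` module is used.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    return conn

def find_user_unsafe(conn, user_id):
    """String formatting: the input becomes part of the SQL text (what the rule flags)."""
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM users WHERE id = %s" % user_id)
    return cursor.fetchall()

def find_user_safe(conn, user_id):
    """Parameterized query: the input is passed as data, never parsed as SQL."""
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM users WHERE id = ?", (user_id,))
    return cursor.fetchall()

conn = make_db()
payload = "1 OR 1=1"  # hypothetical attacker-controlled input

print("unsafe:", find_user_unsafe(conn, payload))  # the OR clause matches every row
print("safe:  ", find_user_safe(conn, payload))    # the payload is compared as a plain value
```

With the formatted query the payload rewrites the WHERE clause and every row comes back. With the parameterized query the same payload is only a value that matches nothing.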

Contributing to Open Source

How to contribute rules to the Semgrep community

#!/usr/bin/env python3
# contribution_guide.py: Semgrep Open Source Contribution Guide
import json
import logging
from typing import Dict, List

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("contrib")

class ContributionGuide:
    def __init__(self):
        self.steps = []
    
    def contribution_types(self):
        return {
            "write_rules": {
                "description": "Write new Semgrep rules",
                "difficulty": "Easy-Medium",
                "impact": "High: helps the community detect more vulnerabilities",
                "steps": [
                    "Learn the Semgrep rule syntax from the docs",
                    "Find a vulnerability pattern that no existing rule covers",
                    "Write the rule together with test cases",
                    "Test it against real-world code",
                    "Submit a PR to the semgrep-rules repository",
                ],
                "repo": "github.com/semgrep/semgrep-rules",
            },
            "improve_rules": {
                "description": "Improve existing rules",
                "difficulty": "Easy",
                "impact": "Medium: fewer false positives, more coverage",
                "steps": [
                    "Look for issues labeled 'false-positive' or 'enhancement'",
                    "Understand how the current rule behaves",
                    "Add test cases for edge cases",
                    "Refine the pattern so it is more precise",
                    "Submit a PR with an explanation",
                ],
            },
            "documentation": {
                "description": "Improve documentation",
                "difficulty": "Easy",
                "impact": "Medium: helps newcomers learn Semgrep",
                "steps": [
                    "Read the docs and look for gaps or errors",
                    "Add missing explanations or examples",
                    "Translate the docs (localization)",
                    "Submit a PR to the docs repo",
                ],
            },
            "core_development": {
                "description": "Contribute code to Semgrep core",
                "difficulty": "Hard",
                "impact": "Very High",
                "language": "OCaml (core engine)",
                "steps": [
                    "Study the Semgrep architecture",
                    "Look for good-first-issue labels",
                    "Set up a development environment",
                    "Write code + tests",
                    "Submit a PR that follows the contributing guidelines",
                ],
                "repo": "github.com/semgrep/semgrep",
            },
        }
    
    def rule_testing(self):
        return {
            "test_structure": """
# Rule file: rules/python/sql-injection.yaml
# Test file: rules/python/sql-injection.py

# Test cases:
# ruleid: python-sql-injection
cursor.execute("SELECT * FROM users WHERE id = %s" % user_id)

# ok: python-sql-injection  
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))

# ruleid: python-sql-injection
cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")

# ok: python-sql-injection
cursor.execute("SELECT * FROM users WHERE id = ?", [user_id])
            """,
            "run_tests": "semgrep --test rules/python/",
        }

guide = ContributionGuide()
types = guide.contribution_types()
print("Contribution Types:")
for ctype, info in types.items():
    print(f"\n  {ctype}: {info['description']}")
    print(f"    Difficulty: {info['difficulty']}, Impact: {info['impact']}")

testing = guide.rule_testing()
print(f"\nRun tests: {testing['run_tests']}")
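The `# ruleid:` and `# ok:` comments in the test file mark the line that follows them as one that must, or must not, be reported. As a rough illustration of that convention (not Semgrep's own test runner), the sketch below reads the annotations from a test file and lists the line numbers they point at. The regex and the function name are assumptions made for this example.

```python
import re

# Matches "# ruleid: some-rule-id" or "# ok: some-rule-id"
ANNOTATION = re.compile(r"#\s*(ruleid|ok):\s*([\w.-]+)")

# Two cases taken from the test file layout shown in the guide above.
TEST_FILE = '''\
# ruleid: python-sql-injection
cursor.execute("SELECT * FROM users WHERE id = %s" % user_id)

# ok: python-sql-injection
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
'''

def annotated_lines(test_source: str, rule_id: str) -> dict[str, list[int]]:
    """Return the line numbers that follow each 'ruleid' / 'ok' comment for one rule."""
    expected = {"ruleid": [], "ok": []}
    for lineno, line in enumerate(test_source.splitlines(), start=1):
        match = ANNOTATION.search(line)
        if match and match.group(2) == rule_id:
            # The annotation applies to the next line of code.
            expected[match.group(1)].append(lineno + 1)
    return expected

print(annotated_lines(TEST_FILE, "python-sql-injection"))
```

A listing like this is handy when reviewing a PR: every `ruleid` line should show up in the scan results and no `ok` line should.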

CI/CD Integration

Integrate Semgrep into the CI/CD pipeline

# === Semgrep CI/CD Integration ===

# 1. GitHub Actions
mkdir -p .github/workflows
cat > .github/workflows/semgrep.yml << 'EOF'
name: Semgrep Security Scan
on:
  pull_request: {}
  push:
    branches: [main, develop]

jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Full Scan (push to main)
        if: github.event_name == 'push'
        run: |
          semgrep ci \
            --config auto \
            --config p/security-audit \
            --config p/secrets \
            --sarif > semgrep.sarif
        env:
          SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}
      
      - name: Diff Scan (PR)
        if: github.event_name == 'pull_request'
        run: |
          semgrep ci \
            --config auto \
            --baseline-commit ${{ github.event.pull_request.base.sha }} \
            --sarif > semgrep.sarif
        env:
          SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}
      
      - name: Upload SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: semgrep.sarif
EOF

# 2. GitLab CI
cat > .gitlab-ci-semgrep.yml << 'EOF'
semgrep:
  stage: test
  image: semgrep/semgrep
  script:
    - semgrep ci --config auto --gitlab-sast > semgrep-report.json
  artifacts:
    reports:
      sast: semgrep-report.json
  rules:
    - if: $CI_MERGE_REQUEST_IID
    - if: $CI_COMMIT_BRANCH == "main"
EOF

# 3. Pre-commit Hook
cat > .pre-commit-config.yaml << 'EOF'
repos:
  - repo: https://github.com/semgrep/semgrep
    rev: v1.70.0
    hooks:
      - id: semgrep
        args: ['--config', 'auto', '--error', '--severity', 'ERROR']
EOF

pre-commit install

echo "CI/CD integration configured"
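A pipeline often needs its own rule for when a scan should block a merge. The sketch below is one possible gate over a Semgrep JSON report. It assumes each finding exposes `extra.severity` with the values INFO, WARNING, or ERROR; the ranking table, the `gate` function, and the sample report are hypothetical, and any other severity label is treated as the lowest rank.

```python
# Assumed ordering of severities; labels not listed here rank lowest.
SEVERITY_RANK = {"INFO": 0, "WARNING": 1, "ERROR": 2}

def gate(report: dict, fail_on: str = "ERROR") -> int:
    """Return 1 if any finding is at or above the fail_on severity, else 0."""
    threshold = SEVERITY_RANK[fail_on]
    blocking = [
        result
        for result in report.get("results", [])
        if SEVERITY_RANK.get(result.get("extra", {}).get("severity"), 0) >= threshold
    ]
    for result in blocking:
        print(f"BLOCKING {result['check_id']} at {result['path']}:{result['start']['line']}")
    return 1 if blocking else 0

# Hypothetical report with one WARNING and one ERROR finding.
sample = {
    "results": [
        {"check_id": "insecure-http-request", "path": "client.py",
         "start": {"line": 3}, "extra": {"severity": "WARNING"}},
        {"check_id": "dangerous-eval", "path": "util.py",
         "start": {"line": 9}, "extra": {"severity": "ERROR"}},
    ]
}

exit_code = gate(sample, fail_on="ERROR")
print("exit code:", exit_code)
```

In a real job the script would load the report file written by the scan step and pass the return value to `sys.exit()`, so the job fails only on findings at the chosen severity.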

Advanced Patterns and Performance

Advanced techniques for using Semgrep

#!/usr/bin/env python3
# advanced_semgrep.py: Advanced Semgrep Techniques
import json
import logging
from typing import Dict, List

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("advanced")

class AdvancedSemgrep:
    def __init__(self):
        pass
    
    def advanced_patterns(self):
        return {
            "taint_tracking": {
                "description": "Track user input (source) to dangerous function (sink)",
                "example": """
rules:
  - id: taint-sql-injection
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form.get(...)
    pattern-sinks:
      - pattern: cursor.execute($QUERY)
    message: "User input flows to SQL query without sanitization"
    severity: ERROR
    languages: [python]
                """,
                "use_case": "XSS, SQLi, Command Injection tracking",
            },
            "metavariable_comparison": {
                "description": "Compare metavariable values",
                "example": """
rules:
  - id: weak-crypto-key
    patterns:
      - pattern: $CIPHER.generate_key(bit_length=$BITS)
      - metavariable-comparison:
          metavariable: $BITS
          comparison: $BITS < 256
    message: "Crypto key length is less than 256 bits"
    severity: WARNING
    languages: [python]
                """,
            },
            "pattern_inside": {
                "description": "Match a pattern only inside (or outside) a specific context",
                "example": """
rules:
  - id: flask-debug-mode
    patterns:
      - pattern: app.run(..., debug=True, ...)
      - pattern-not-inside: |
          if __name__ == "__main__":
              ...
    message: "Debug mode enabled outside development guard"
    severity: WARNING
    languages: [python]
                """,
            },
        }
    
    def performance_tips(self):
        return {
            "parallel_scanning": {
                "command": "semgrep --jobs 8 --config auto .",
                "description": "Run the scan with parallel jobs to finish faster",
            },
            "targeted_scanning": {
                "command": "semgrep --config auto --include '*.py' --exclude 'test/*' src/",
                "description": "Scan only the files that matter",
            },
            "incremental_scanning": {
                "command": "semgrep ci --baseline-commit HEAD~1",
                "description": "Scan only the diff, up to 80%+ faster",
            },
            "rule_optimization": {
                "tips": [
                    "Put pattern-inside before the pattern to narrow the scope",
                    "Use focus-metavariable to reduce false positives",
                    "Avoid broad patterns like $X(...)",
                    "Test rule performance: semgrep --time --config rule.yaml .",
                ],
            },
        }
    
    def comparison_with_other_tools(self):
        return {
            "semgrep": {"type": "Pattern-based SAST", "speed": "Fast", "false_positives": "Low", "custom_rules": "Easy (YAML)", "price": "Free (OSS) / Paid (Cloud)"},
            "codeql": {"type": "Query-based SAST", "speed": "Slow", "false_positives": "Very Low", "custom_rules": "Hard (QL language)", "price": "Free for public repos"},
            "sonarqube": {"type": "Rule-based SAST", "speed": "Medium", "false_positives": "Medium", "custom_rules": "Medium (Java plugin)", "price": "Free (Community) / Paid"},
            "bandit": {"type": "Python-only SAST", "speed": "Very Fast", "false_positives": "Medium-High", "custom_rules": "Hard (Python)", "price": "Free"},
        }

adv = AdvancedSemgrep()
patterns = adv.advanced_patterns()
print("Advanced Patterns:")
for name, info in patterns.items():
    print(f"  {name}: {info['description']}")

perf = adv.performance_tips()
print("\nPerformance Tips:")
for tip, info in perf.items():
    if "command" in info:
        print(f"  {tip}: {info['command']}")

comp = adv.comparison_with_other_tools()
print("\nTool Comparison:")
for tool, info in comp.items():
    print(f"  {tool}: Speed={info['speed']}, FP={info['false_positives']}, Rules={info['custom_rules']}")

FAQ: Frequently Asked Questions

Q: How do Semgrep and CodeQL differ?

A: Semgrep uses a pattern matching syntax that looks like the code itself, so rules are easy and quick to write. It suits developers who scan daily, and a scan finishes in seconds to minutes. It is free as OSS, with a paid cloud tier. CodeQL uses its own query language (QL); queries are harder to write, but it supports deep data flow analysis. Scans are slower (minutes to hours), and it is free for public repos on GitHub. Choose Semgrep when you want speed, simplicity, easy rule writing, and straightforward CI/CD integration. Choose CodeQL when you need deep analysis, already use GitHub, and can accept longer scans. The two also work well together: Semgrep for fast feedback, CodeQL for deep analysis.

Q: How do I get started contributing to open source?

A: Start by using Semgrep yourself: write custom rules for your own project to learn the rule syntax. Then go to the Semgrep Rules repo (github.com/semgrep/semgrep-rules) and look for issues labeled "good first issue". Begin by improving existing rules, adding test cases, or fixing false positives. Read CONTRIBUTING.md before you start, then fork the repo, create a branch, write code + tests, and submit a PR. If you are new to open source, start with documentation fixes (typo, clarification) as your first PR to build a track record before moving on to feature/code contributions.

Q: How do Semgrep Free and Paid differ?

A: Semgrep OSS (Free) is a CLI tool with no scan limit. You can use community rules, write custom rules, integrate it with CI/CD, and scan locally. Semgrep Cloud (Paid) adds a dashboard for findings, triage and vulnerability tracking, team collaboration, supply chain security (SCA), advanced secrets detection, policy enforcement, and SSO/RBAC. For an individual developer or a small team, Semgrep OSS is enough. Teams of 10+ developers should consider the Cloud tier ($40/developer/month) for centralized management.

Q: How does taint tracking in Semgrep work?

A: Taint tracking follows data from a source (where untrusted data enters) to a sink (where using it is dangerous). A source is any point that receives user input, such as request.args.get(), request.body, or sys.argv. A sink is a function that is dangerous when it receives untrusted input, such as cursor.execute(), eval(), or os.system(). A sanitizer is a function that makes data safe, such as escape() or parameterize(). Semgrep raises an alert when data flows from a source to a sink without passing through a sanitizer. Set mode: taint in the rule and define pattern-sources, pattern-sinks, and pattern-sanitizers. It is a powerful feature that produces far fewer false positives than plain pattern matching.
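The source, sink, and sanitizer idea can be illustrated with a toy propagation loop. This is a deliberately simplified model, not Semgrep's engine: the program is a flat list of hypothetical `(target, function, arguments)` statements, and the three name sets are assumptions chosen to mirror the examples in the answer above.

```python
SOURCES = {"request.args.get"}   # calls that return untrusted data
SINKS = {"cursor.execute"}       # calls that must not receive untrusted data
SANITIZERS = {"escape"}          # calls whose result is considered safe

def find_tainted_sinks(statements) -> list[int]:
    """Return the 1-based statement numbers where tainted data reaches a sink."""
    tainted = set()
    alerts = []
    for lineno, (target, func, args) in enumerate(statements, start=1):
        uses_tainted = any(arg in tainted for arg in args)
        if func in SINKS and uses_tainted:
            alerts.append(lineno)
        if target is None:
            continue  # a bare call assigns nothing
        if func in SOURCES:
            tainted.add(target)
        elif func in SANITIZERS:
            tainted.discard(target)  # sanitized output is clean
        elif uses_tainted:
            tainted.add(target)      # taint propagates through other calls
        else:
            tainted.discard(target)  # reassigned from clean data
    return alerts

# Hypothetical program: one unsafe flow (statement 3) and one sanitized flow (statement 6).
PROGRAM = [
    ("user_id", "request.args.get", ["id"]),
    ("query", "concat", ["SELECT ...", "user_id"]),
    (None, "cursor.execute", ["query"]),
    ("clean", "escape", ["user_id"]),
    ("safe_query", "concat", ["SELECT ...", "clean"]),
    (None, "cursor.execute", ["safe_query"]),
]

print("alerts at statements:", find_tainted_sinks(PROGRAM))
```

Only the third statement is reported: the value passed to the second `cursor.execute` went through `escape` first, so the flow from source to sink is broken by a sanitizer.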
