What Is Semgrep SAST?
Semgrep is an open source Static Application Security Testing (SAST) tool that uses pattern matching to scan source code for security, bugs, and code quality issues. It supports more than 30 languages, including Python, JavaScript, TypeScript, Java, Go, Ruby, and PHP.
What sets Semgrep apart from other SAST tools is that it uses semantic pattern matching rather than regex. It understands the structure of the code, for example whether a variable holds user input, so it tends to produce fewer false positives than purely text-based tools. Rules are written in a syntax that looks like the code being scanned.
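To make the difference between text matching and structure-aware matching concrete, here is a minimal sketch. It uses Python's standard `ast` module as an analogy only; it is not how Semgrep is implemented, and the sample source and function names are made up for illustration.

```python
import ast
import re

# Hypothetical snippet: only the last line really calls eval().
SOURCE = '''
# eval(user_input) was removed in v2
note = "never call eval(x) here"
result = eval(user_input)
'''


def regex_hits(source: str) -> list[int]:
    """Line numbers where the raw text 'eval(' appears anywhere."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if re.search(r"\beval\(", line)
    ]


def ast_hits(source: str) -> list[int]:
    """Line numbers of actual call expressions to a name 'eval'."""
    return [
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "eval"
    ]


print(regex_hits(SOURCE))  # [2, 3, 4]: also flags the comment and the string
print(ast_hits(SOURCE))    # [4]: only the real call
```

The regex flags a comment and a string literal, while the syntax-tree walk reports only the real call. A Semgrep pattern such as `eval(...)` behaves like the second function in this respect.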
Open source contribution means taking part in the development of the Semgrep ecosystem, for example by writing custom rules for the community, reporting bugs, improving documentation, or contributing code fixes. It is a good way to learn about security, build a portfolio, and give back to the community.
Installing and Using Semgrep
Set up Semgrep for security scanning
# === Semgrep Installation & Usage ===
# 1. Install Semgrep
pip install semgrep
# Or via Docker
docker pull semgrep/semgrep
# Or via Homebrew (macOS)
brew install semgrep
# 2. Verify Installation
semgrep --version
# 3. Run with Default Rules (Semgrep Registry)
# Scan current directory with recommended rules
semgrep --config auto .
# Scan with specific rulesets
semgrep --config p/python .
semgrep --config p/javascript .
semgrep --config p/owasp-top-ten .
semgrep --config p/security-audit .
# 4. Common Security Rulesets
cat > semgrep_rulesets.yaml << 'EOF'
recommended_rulesets:
  general:
    - "p/security-audit"
    - "p/owasp-top-ten"
    - "p/secrets"
  python:
    - "p/python"
    - "p/flask"
    - "p/django"
    - "p/bandit"
  javascript:
    - "p/javascript"
    - "p/react"
    - "p/nextjs"
    - "p/express"
    - "p/nodejs"
  golang:
    - "p/golang"
    - "p/gorilla"
  java:
    - "p/java"
    - "p/spring"
  infrastructure:
    - "p/terraform"
    - "p/dockerfile"
    - "p/kubernetes"
EOF
# 5. Output Formats
semgrep --config auto --json . > semgrep-results.json
semgrep --config auto --sarif . > semgrep-results.sarif
semgrep --config auto --output results.txt .
# 6. Scan Options
semgrep --config auto \
--severity ERROR \
--exclude "test/*" \
--exclude "node_modules/*" \
--max-target-bytes 1000000 \
--timeout 30 \
--jobs 4 \
.
echo "Semgrep setup complete"
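The ruleset list above can be turned into a scan command programmatically. The following is a minimal sketch, assuming you want the "general" rulesets plus those for each stack in the project; `build_scan_command` is a hypothetical helper, not part of Semgrep, and the dictionary is a shortened copy of the YAML above.

```python
# Shortened copy of the recommended_rulesets mapping shown above.
RECOMMENDED_RULESETS = {
    "general": ["p/security-audit", "p/owasp-top-ten", "p/secrets"],
    "python": ["p/python", "p/flask", "p/django", "p/bandit"],
    "golang": ["p/golang", "p/gorilla"],
    "java": ["p/java", "p/spring"],
}


def build_scan_command(stacks, target="."):
    """Build a semgrep argv list: one --config flag per ruleset, no duplicates."""
    command = ["semgrep"]
    seen = set()
    for stack in ["general", *stacks]:
        for ruleset in RECOMMENDED_RULESETS[stack]:
            if ruleset not in seen:
                seen.add(ruleset)
                command += ["--config", ruleset]
    command.append(target)
    return command


print(" ".join(build_scan_command(["python"], "src/")))
```

The returned list can be passed to `subprocess.run(...)` as is; repeating `--config` is how the CLI combines several rulesets in one scan.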
Writing Custom Rules
Write your own Semgrep rules
# === Custom Semgrep Rules ===
cat > custom_rules.yaml << 'EOF'
rules:
  # 1. Detect SQL Injection in Python (% formatting and f-strings)
  - id: python-sql-injection
    pattern-either:
      - pattern: cursor.execute($QUERY % ...)
      - pattern: cursor.execute(f"...")
    message: "Possible SQL injection. Use parameterized queries instead of string formatting."
    severity: ERROR
    languages: [python]
    metadata:
      cwe: "CWE-89: SQL Injection"
      owasp: "A03:2021 - Injection"
    # Safe form (kept as a comment: a top-level `fix:` would overwrite every
    # match with this literal text):
    #   cursor.execute('SELECT * FROM users WHERE id = %s', (user_id,))
  # 2. Detect Hardcoded Secrets
  - id: hardcoded-api-key
    patterns:
      # Bind the string contents to $VALUE so the length check below applies
      - pattern: $KEY = "$VALUE"
      - metavariable-regex:
          metavariable: $KEY
          regex: ".*(api_key|apikey|secret|password|token|auth).*"
      - metavariable-regex:
          metavariable: $VALUE
          regex: ".{10,}"
    message: "Hardcoded secret detected. Use environment variables or secret manager."
    severity: ERROR
    languages: [python, javascript, typescript]
    metadata:
      cwe: "CWE-798: Hard-coded Credentials"
  # 3. Detect Insecure HTTP
  - id: insecure-http-request
    patterns:
      # pattern-either = OR; a plain `patterns` list would require all three at once
      - pattern-either:
          - pattern: requests.get("$URL", ...)
          - pattern: requests.post("$URL", ...)
          - pattern: fetch("$URL", ...)
      - metavariable-regex:
          metavariable: $URL
          regex: "http://.*"
    message: "Using HTTP instead of HTTPS. Use HTTPS for secure communication."
    severity: WARNING
    languages: [python, javascript]
    metadata:
      cwe: "CWE-319: Cleartext Transmission"
  # 4. Detect Missing Input Validation (Flask)
  - id: flask-no-input-validation
    pattern: |
      @app.route(...)
      def $FUNC(...):
        ...
        $X = request.args.get(...)
        ...
    # Suggested validation (kept as a comment: as an autofix it would replace
    # the whole matched function):
    #   from werkzeug.exceptions import BadRequest
    #   if not $X or not isinstance($X, str):
    #       raise BadRequest("Invalid input")
    message: "User input from request.args is read in a route handler. Validate it before use."
    severity: WARNING
    languages: [python]
  # 5. Detect Eval Usage
  - id: dangerous-eval
    pattern: eval(...)
    message: "eval() is dangerous. Use ast.literal_eval() for safe evaluation."
    severity: ERROR
    languages: [python, javascript]
    metadata:
      cwe: "CWE-95: Eval Injection"
EOF
# Test custom rules
semgrep --config custom_rules.yaml .
# Scan only Python files under src/
semgrep --config custom_rules.yaml --include "*.py" src/
echo "Custom rules created"
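To see why the `python-sql-injection` rule flags string formatting, it helps to run both forms against a real database. This is a small sketch using the standard `sqlite3` module with an in-memory table; the table, the data, and the two helper functions are invented for illustration. Note that `sqlite3` uses `?` placeholders, while the `%s` style shown in the rule belongs to other drivers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

# Attacker-controlled value that rewrites the WHERE clause when interpolated.
malicious_id = "1 OR 1=1"


def unsafe_lookup(connection, user_id):
    """String formatting: the input becomes part of the SQL text."""
    cursor = connection.cursor()
    cursor.execute("SELECT name FROM users WHERE id = %s" % user_id)
    return cursor.fetchall()


def safe_lookup(connection, user_id):
    """Parameterized query: the input is passed as data, never as SQL."""
    cursor = connection.cursor()
    cursor.execute("SELECT name FROM users WHERE id = ?", (user_id,))
    return cursor.fetchall()


print(len(unsafe_lookup(conn, malicious_id)))  # 2: every row leaks
print(safe_lookup(conn, malicious_id))         # []: no id equals that string
print(safe_lookup(conn, 1))                    # [('alice',)]
```

The first function is the shape the rule matches; the second is the shape its message recommends.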
Contributing to Open Source
How to contribute rules to the Semgrep community
#!/usr/bin/env python3
# contribution_guide.py: Semgrep Open Source Contribution Guide


class ContributionGuide:
    """Static reference data: ways to contribute and how rules are tested."""

    def contribution_types(self):
        return {
            "write_rules": {
                "description": "Write new Semgrep rules",
                "difficulty": "Easy-Medium",
                "impact": "High: helps the community detect more vulnerabilities",
                "steps": [
                    "Learn the Semgrep rule syntax from the docs",
                    "Find a vulnerability pattern that is not covered yet",
                    "Write the rule together with test cases",
                    "Test it against real-world code",
                    "Submit a PR to the semgrep-rules repository",
                ],
                "repo": "github.com/semgrep/semgrep-rules",
            },
            "improve_rules": {
                "description": "Improve existing rules",
                "difficulty": "Easy",
                "impact": "Medium: fewer false positives, better coverage",
                "steps": [
                    "Look for issues labeled 'false-positive' or 'enhancement'",
                    "Understand why the rule misbehaves",
                    "Add test cases for edge cases",
                    "Refine the pattern to be more precise",
                    "Submit a PR with an explanation",
                ],
            },
            "documentation": {
                "description": "Improve the documentation",
                "difficulty": "Easy",
                "impact": "Medium: helps newcomers learn Semgrep",
                "steps": [
                    "Read the docs and look for gaps or errors",
                    "Add examples or clearer explanations",
                    "Translate documentation (localization)",
                    "Submit a PR to the docs repo",
                ],
            },
            "core_development": {
                "description": "Contribute code to Semgrep core",
                "difficulty": "Hard",
                "impact": "Very High",
                "language": "OCaml (core engine)",
                "steps": [
                    "Study the Semgrep architecture",
                    "Look for good-first-issue labels",
                    "Setup development environment",
                    "Write code + tests",
                    "Submit a PR following the contributing guidelines",
                ],
                "repo": "github.com/semgrep/semgrep",
            },
        }

    def rule_testing(self):
        # A test file sits next to its rule; each annotation comment applies
        # to the line that follows it.
        return {
            "test_structure": """
# Rule file: rules/python/sql-injection.yaml
# Test file: rules/python/sql-injection.py

# Test cases:
# ruleid: python-sql-injection
cursor.execute("SELECT * FROM users WHERE id = %s" % user_id)

# ok: python-sql-injection
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))

# ruleid: python-sql-injection
cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")

# ok: python-sql-injection
cursor.execute("SELECT * FROM users WHERE id = ?", [user_id])
""",
            "run_tests": "semgrep --test rules/python/",
        }


guide = ContributionGuide()
types = guide.contribution_types()
print("Contribution Types:")
for ctype, info in types.items():
    print(f"\n  {ctype}: {info['description']}")
    print(f"  Difficulty: {info['difficulty']}, Impact: {info['impact']}")

testing = guide.rule_testing()
print(f"\nRun tests: {testing['run_tests']}")
CI/CD Integration
Integrate Semgrep into your CI/CD pipeline
# === Semgrep CI/CD Integration ===
# 1. GitHub Actions
cat > .github/workflows/semgrep.yml << 'EOF'
name: Semgrep Security Scan
on:
  pull_request: {}
  push:
    branches: [main, develop]
jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # the diff scan needs the base commit in history
      # To report to the Semgrep platform instead, set
      #   SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}
      # under `env:` and drop the --config flags; when logged in, `semgrep ci`
      # takes its rules from the platform.
      - name: Full Scan (push)
        if: github.event_name == 'push'
        run: |
          semgrep ci \
            --config auto \
            --config p/security-audit \
            --config p/secrets \
            --sarif > semgrep.sarif
      - name: Diff Scan (PR)
        if: github.event_name == 'pull_request'
        run: |
          semgrep ci \
            --config auto \
            --baseline-commit ${{ github.event.pull_request.base.sha }} \
            --sarif > semgrep.sarif
      - name: Upload SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: semgrep.sarif
EOF
# 2. GitLab CI
cat > .gitlab-ci-semgrep.yml << 'EOF'
semgrep:
  stage: test
  image: semgrep/semgrep
  script:
    # GitLab's sast report expects its own schema, so emit --gitlab-sast rather than --json
    - semgrep ci --config auto --gitlab-sast > semgrep-report.json
  artifacts:
    reports:
      sast: semgrep-report.json
  rules:
    - if: $CI_MERGE_REQUEST_IID
    - if: $CI_COMMIT_BRANCH == "main"
EOF
# 3. Pre-commit Hook
cat > .pre-commit-config.yaml << 'EOF'
repos:
  - repo: https://github.com/semgrep/semgrep
    rev: v1.70.0
    hooks:
      - id: semgrep
        args: ['--config', 'auto', '--error', '--severity', 'ERROR']
EOF
pre-commit install
echo "CI/CD integration configured"
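A pipeline often needs its own pass/fail decision on top of the scan output. The sketch below gates on the JSON report produced by `semgrep --json`, reading the `results`, `check_id`, `path`, `start.line`, and `extra.severity` fields. The severity ranking, the `gate` helper, and the sample report are assumptions made for this example; the sample is hand-written, not real scan output.

```python
# Assumed ordering for this sketch; labels not listed here rank lowest.
SEVERITY_RANK = {"INFO": 0, "WARNING": 1, "ERROR": 2}

# Hand-written sample shaped like `semgrep --json` output.
SAMPLE_REPORT = {
    "results": [
        {
            "check_id": "dangerous-eval",
            "path": "app.py",
            "start": {"line": 12},
            "extra": {"severity": "ERROR", "message": "eval() is dangerous."},
        },
        {
            "check_id": "insecure-http-request",
            "path": "client.py",
            "start": {"line": 7},
            "extra": {"severity": "WARNING", "message": "Using HTTP instead of HTTPS."},
        },
    ]
}


def blocking_findings(report, threshold="ERROR"):
    """Return the findings whose severity is at or above the threshold."""
    floor = SEVERITY_RANK[threshold]
    return [
        finding
        for finding in report.get("results", [])
        if SEVERITY_RANK.get(finding["extra"]["severity"], 0) >= floor
    ]


def gate(report, threshold="ERROR"):
    """Print blocking findings and return an exit code (1 = fail the build)."""
    blocked = blocking_findings(report, threshold)
    for finding in blocked:
        print(f'{finding["path"]}:{finding["start"]["line"]} {finding["check_id"]}')
    return 1 if blocked else 0


print(gate(SAMPLE_REPORT))
```

In a real job you would load `semgrep-results.json` with `json.load` and pass the return value of `gate` to `sys.exit`.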
Advanced Patterns and Performance
Advanced techniques for Semgrep
#!/usr/bin/env python3
# advanced_semgrep.py: Advanced Semgrep Techniques


class AdvancedSemgrep:
    """Static reference data: advanced rule features, tuning, tool comparison."""

    def advanced_patterns(self):
        return {
            "taint_tracking": {
                "description": "Track user input (source) to dangerous function (sink)",
                "example": """
rules:
  - id: taint-sql-injection
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form.get(...)
    pattern-sinks:
      - pattern: cursor.execute($QUERY)
    message: "User input flows to SQL query without sanitization"
    severity: ERROR
    languages: [python]
""",
                "use_case": "XSS, SQLi, Command Injection tracking",
            },
            "metavariable_comparison": {
                "description": "Compare metavariable values",
                # metavariable-comparison must sit under `patterns`. It compares
                # literal values, so the key size is matched as a numeric
                # argument ($BITS) instead of calling len() on a key.
                "example": """
rules:
  - id: weak-crypto-key
    patterns:
      - pattern: $CRYPTO.new($BITS, ...)
      - metavariable-comparison:
          metavariable: $BITS
          comparison: $BITS < 256
    message: "Crypto key length is less than 256 bits"
    severity: WARNING
    languages: [python]
""",
            },
            "pattern_inside": {
                "description": "Match pattern only inside (or outside) a specific context",
                "example": """
rules:
  - id: flask-debug-mode
    patterns:
      - pattern: app.run(..., debug=True, ...)
      - pattern-not-inside: |
          if __name__ == "__main__":
            ...
    message: "Debug mode enabled outside development guard"
    severity: WARNING
    languages: [python]
""",
            },
        }

    def performance_tips(self):
        return {
            "parallel_scanning": {
                "command": "semgrep --jobs 8 --config auto .",
                "description": "Run the scan with parallel jobs to speed it up",
            },
            "targeted_scanning": {
                "command": "semgrep --config auto --include '*.py' --exclude 'test/*' src/",
                "description": "Scan only the files you care about",
            },
            "incremental_scanning": {
                # --baseline-commit reports only findings introduced after that commit
                "command": "semgrep --config auto --baseline-commit $(git rev-parse HEAD~1) .",
                "description": "Scan only the diff; may cut scan time by 80%+",
            },
            "rule_optimization": {
                "tips": [
                    "Use pattern-inside ahead of the main pattern to narrow scope",
                    "Use focus-metavariable to reduce false positives",
                    "Avoid broad patterns like $X(...)",
                    "Test rule performance: semgrep --time --config rule.yaml .",
                ],
            },
        }

    def comparison_with_other_tools(self):
        return {
            "semgrep": {"type": "Pattern-based SAST", "speed": "Fast", "false_positives": "Low", "custom_rules": "Easy (YAML)", "price": "Free (OSS) / Paid (Cloud)"},
            "codeql": {"type": "Query-based SAST", "speed": "Slow", "false_positives": "Very Low", "custom_rules": "Hard (QL language)", "price": "Free for public repos"},
            "sonarqube": {"type": "Rule-based SAST", "speed": "Medium", "false_positives": "Medium", "custom_rules": "Medium (Java plugin)", "price": "Free (Community) / Paid"},
            "bandit": {"type": "Python-only SAST", "speed": "Very Fast", "false_positives": "Medium-High", "custom_rules": "Hard (Python)", "price": "Free"},
        }


adv = AdvancedSemgrep()
patterns = adv.advanced_patterns()
print("Advanced Patterns:")
for name, info in patterns.items():
    print(f"  {name}: {info['description']}")

perf = adv.performance_tips()
print("\nPerformance Tips:")
for tip, info in perf.items():
    if "command" in info:
        print(f"  {tip}: {info['command']}")

comp = adv.comparison_with_other_tools()
print("\nTool Comparison:")
for tool, info in comp.items():
    print(f"  {tool}: Speed={info['speed']}, FP={info['false_positives']}, Rules={info['custom_rules']}")
FAQ: Frequently Asked Questions
Q: How does Semgrep differ from CodeQL?
A: Semgrep uses a pattern matching syntax that looks like code, so rules are easy to write and scans are fast (seconds to minutes), which suits developers running daily scans. It is free as OSS, with a paid cloud tier. CodeQL uses its own query language (QL). Queries are more complex to write, but it can perform deep data flow analysis, and scans take longer (minutes to hours). It is free for public repos on GitHub. Choose Semgrep when you want speed, simplicity, easy rule writing, and easy CI/CD integration. Choose CodeQL when you want deep analysis, use GitHub, and can accept longer scan times. The two also work well together: Semgrep for fast feedback, CodeQL for deep analysis.
Q: How do I start contributing to open source?
A: Start by using Semgrep yourself: write custom rules for your own project to get comfortable with the rule syntax. Then look through the Semgrep Rules repo (github.com/semgrep/semgrep-rules) for issues labeled "good first issue", or try improving existing rules by adding test cases or fixing false positives. Read CONTRIBUTING.md before you begin, then fork the repo, create a branch, write code + tests, and submit a PR. If you are new to open source, start with documentation fixes (typo, clarification) as small PRs to build a track record before moving on to feature/code contributions.
Q: How do Semgrep Free and Paid differ?
A: Semgrep OSS (Free) is a CLI tool that scans your code, lets you use community rules, write custom rules, and integrate with CI/CD, and runs entirely as a local scan. Semgrep Cloud (Paid) adds a dashboard for managing findings, triage and tracking of vulnerabilities, team collaboration, supply chain security (SCA), advanced secrets detection, policy enforcement, and SSO/RBAC. For an individual developer or a small team, Semgrep OSS is enough. Organizations with 10+ developers may want the Cloud tier ($40/developer/month) for centralized management.
Q: How does taint tracking work in Semgrep?
A: Taint tracking follows the flow of data from a source (where it enters) to a sink (where it is used). A source is a place where user input comes in, such as request.args.get(), request.body, sys.argv. A sink is a function that is dangerous when it receives untrusted input, such as cursor.execute(), eval(), os.system(). A sanitizer is a function that makes data safe, such as escape(), parameterize(). Semgrep raises an alert when data flows from a source to a sink without passing through a sanitizer. To use it, set mode: taint in the rule and define pattern-sources, pattern-sinks, pattern-sanitizers. It is a powerful feature that reduces false positives considerably compared with plain pattern matching.
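The source, sink, and sanitizer idea can be sketched in a few lines. This is a toy model over straight-line statements, not Semgrep's engine: each statement is a hypothetical `(target, function, arguments)` tuple, and the three name sets are assumptions chosen to mirror the examples in the answer above.

```python
SOURCES = {"request.args.get"}   # calls that return untrusted data
SANITIZERS = {"escape"}          # calls whose result is considered safe
SINKS = {"cursor.execute"}       # calls that must not receive tainted data


def find_tainted_sinks(statements):
    """Return 1-based statement numbers where tainted data reaches a sink."""
    tainted = set()
    alerts = []
    for number, (target, func, args) in enumerate(statements, start=1):
        uses_tainted = any(arg in tainted for arg in args)
        if func in SINKS and uses_tainted:
            alerts.append(number)
        if target is None:
            continue
        if func in SOURCES:
            tainted.add(target)          # fresh untrusted input
        elif func in SANITIZERS:
            tainted.discard(target)      # sanitizer output is clean
        elif uses_tainted:
            tainted.add(target)          # taint propagates through other calls
        else:
            tainted.discard(target)      # reassigned from clean values
    return alerts


PROGRAM = [
    ("uid", "request.args.get", ["'id'"]),
    ("query", "concat", ["'SELECT ... WHERE id = '", "uid"]),
    (None, "cursor.execute", ["query"]),      # tainted: should alert
    ("clean", "escape", ["uid"]),
    ("query2", "concat", ["'SELECT ... WHERE id = '", "clean"]),
    (None, "cursor.execute", ["query2"]),     # sanitized: no alert
]

print(find_tainted_sinks(PROGRAM))  # [3]
```

Only the third statement is reported: the second query is built from the sanitizer's output, so no tainted value reaches the sink.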
