Betterleaks
https://github.com/betterleaks/betterleaks
# Betterleaks

Betterleaks is a secrets-scanning tool built on the legacy of Gitleaks, maintained by its original authors and sponsored by Aikido Security. It detects hardcoded credentials, API keys, and other sensitive values in git repositories, local filesystems, and piped input. The tool is designed as a drop-in evolution of Gitleaks, retaining full backwards compatibility with `.gitleaks.toml` config and `.gitleaksignore` files while introducing a significantly more powerful detection and filtering system.

The core of Betterleaks is its CEL (Common Expression Language) based configuration. Instead of static allowlists, every rule can carry `prefilter` and `filter` CEL expressions that evaluate metadata (file path, git author, commit message) and finding data (secret, match, entropy) to eliminate false positives dynamically. Secrets can also be actively validated against live APIs via `validate` CEL expressions that fire asynchronous HTTP requests, enabling real-time confirmation of whether a detected credential is still active. The scanner achieves high throughput through an Aho-Corasick keyword pre-filter trie, RE2 regex matching, BPE token-efficiency filtering, and parallelized git history scanning.

---

## CLI Commands

### `betterleaks git` — Scan a Git repository's full commit history

Traverses all commits via `git log` and scans each patch for secrets. Supports parallel workers, pre-commit mode, and staged-only scanning.

```bash
# Scan the current repo's full history with verbose output
betterleaks git . -v

# Scan a remote repo clone at high parallelism and emit a JSON report
betterleaks git /path/to/repo --git-workers=16 --report-path=findings.json --report-format=json

# Pre-commit hook: scan only staged changes, redact secrets in output
betterleaks git --pre-commit --staged --redact -v

# Scope to specific commits and generate SARIF output for CI
betterleaks git . --log-opts="--since=2024-01-01" -f sarif -r results.sarif

# Use a custom config, suppress banner, and set exit code 0 always
betterleaks git . -c /etc/betterleaks.toml --no-banner --exit-code=0

# Validate detected secrets live (only report valid ones)
betterleaks git . --validation --validation-status=valid --validation-workers=20 -v
```

---

### `betterleaks dir` — Scan files and directories (no git)

Scans plain files and directories without any git involvement. Accepts multiple paths; nested paths are deduplicated automatically.

```bash
# Scan a single directory
betterleaks dir /path/to/project -v

# Scan multiple independent paths
betterleaks dir /app/config /app/secrets -v

# Scan a specific file, output CSV
betterleaks dir /deploy/.env -f csv -r secrets.csv

# Skip large files, follow symlinks
betterleaks dir /srv --max-target-megabytes=5 --follow-symlinks -v

# Scan inside nested archives (e.g., zip inside tar)
betterleaks dir /backups --max-archive-depth=3 -v

# Redact 50% of each secret in log output
betterleaks dir . --redact=50 -v
```

---

### `betterleaks stdin` — Scan piped input

Reads content from standard input and scans it as a single file fragment. Useful for scanning command output, build artifacts, or log streams.

```bash
# Scan a file piped through stdin
cat .env.production | betterleaks stdin -v

# Scan the output of a command
env | betterleaks stdin -v

# Scan a file and write JSON results to stdout
cat app.log | betterleaks stdin -f json -r -

# Use a specific config and disable color
cat config.yaml | betterleaks stdin -c rules.toml --no-color -v
```

---

## Configuration File (`betterleaks.toml`)

### Config resolution order

Betterleaks resolves configuration from the following sources, in order of precedence:

```
1. --config / -c flag
2. BETTERLEAKS_CONFIG or GITLEAKS_CONFIG environment variable (file path)
3. BETTERLEAKS_CONFIG_TOML or GITLEAKS_CONFIG_TOML environment variable (inline TOML content)
4. .betterleaks.toml or .gitleaks.toml in the target directory
5. Built-in default config (embedded in the binary)
```

---

### Top-level config fields

The full set of top-level fields in a `betterleaks.toml`:

```toml
# Minimum binary version required to use this config
betterleaksMinVersion = "1.0.0"

# Minimum Gitleaks-format version (backwards compatibility)
minVersion = "8.0.0"

# Global prefilter: evaluated BEFORE any regex, has access to `attributes` only.
# Return true to SKIP the entire file/commit. Good for binary files or bot commits.
prefilter = '''
matchesAny(attributes[?"path"].orValue(""), [
  r"""(?i)\.(?:png|jpg|gif|svg|pdf|exe|bin)$""",
  r"""(?:^|/)node_modules(?:/.*)?$"""
]) || attributes[?"git.author_name"].orValue("") == "renovate[bot]"
'''

# Global filter: evaluated AFTER regex match, has access to `attributes` + `finding`.
# Return true to DISCARD the finding.
filter = '''
containsAny(finding["secret"], [
  "EXAMPLE", "CHANGEME", "YOUR_API_KEY_HERE", "REDACTED"
]) || (entropy(finding["secret"]) <= 2.5 && failsTokenEfficiency(finding["secret"]))
'''

# Inherit all default built-in rules and also load a remote base config
[extend]
useDefault = true
path = "https://raw.githubusercontent.com/example/configs/main/extra.toml"

# Detection rules (see below)
[[rules]]
# ...
```

---

### `[[rules]]` — Defining a detection rule

Each rule identifies a specific secret type. `keywords` are required for performance; the Aho-Corasick pre-filter only executes the `regex` when a keyword matches.
```toml
[[rules]]
id = "github-fine-grained-pat"
description = "GitHub Fine-Grained Personal Access Token"
keywords = ["github_pat_"]
regex = '''github_pat_\w{82}'''

# Rule-level filter: discards false positives for this rule only
filter = '''
(
  attributes[?"git.author_name"].orValue("").endsWith("[bot]")
  && attributes[?"path"].orValue("").startsWith("tests/fixtures/")
  && containsAny(finding["secret"], ["_MOCK_", "_TEST_"])
) || entropy(finding["secret"]) <= 3.0
'''

# Live validation against the GitHub API
validate = '''
cel.bind(r,
  http.get("https://api.github.com/user", {
    "Accept": "application/vnd.github+json",
    "Authorization": "token " + secret
  }),
  r.status == 200 && r.json.?login.orValue("") != "" ? {
    "result": "valid",
    "username": r.json.?login.orValue(""),
    "name": r.json.?name.orValue(""),
    "scopes": r.headers[?"x-oauth-scopes"].orValue("")
  } : r.status in [401, 403] ? {
    "result": "invalid",
    "reason": "Unauthorized"
  } : unknown(r)
)
'''
```

---

### `[[rules.required]]` — Composite (multi-part) rules

Require auxiliary findings to be present near a primary match before a finding is emitted. Both `withinLines` and `withinColumn` are optional proximity constraints.

```toml
# Primary rule: AWS Access Key ID
[[rules]]
id = "aws-credentials"
keywords = ["AKIA"]
regex = '''(?:A3T[A-Z0-9]|AKIA|AGPA|AIDA|AROA|AIPA|ANPA|ANVA|ASIA)[A-Z0-9]{16}'''

# The primary match is only valid if a Secret Access Key is found within 5 lines
[[rules.required]]
id = "aws-secret-key"
withinLines = 5

# Auxiliary rule (skipReport = true means it is only used as a component)
[[rules]]
id = "aws-secret-key"
keywords = ["secret", "key"]
regex = '''[A-Za-z0-9/+=]{40}'''
skipReport = true
```

---

## CEL Filter Bindings

### `prefilter` — File/commit-level skip expressions

Available inside `prefilter` only. Returns `true` to skip the entire resource before any regex runs.
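To make the "return `true` to skip" semantics concrete, here is a small Go approximation of a path-based skip check. It is a sketch under assumptions: the `matchesAny` function below borrows the name of the CEL binding but is a hypothetical re-implementation on the standard `regexp` package, and it says nothing about how Betterleaks evaluates the expression internally.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesAny reports whether s matches at least one of the given patterns.
// Hypothetical helper for illustration; patterns use Go's RE2-style syntax.
func matchesAny(s string, patterns []string) bool {
	for _, p := range patterns {
		if regexp.MustCompile(p).MatchString(s) {
			return true
		}
	}
	return false
}

func main() {
	skipPatterns := []string{
		`(?i)\.(png|jpg|gif|mp4|zip|tar\.gz|exe)$`, // binary/media files
		`(?:^|/)vendor/`,                           // vendored code
	}

	for _, path := range []string{
		"assets/logo.PNG",
		"src/vendor/lib.go",
		"src/main.go",
	} {
		fmt.Printf("%-20s skip=%v\n", path, matchesAny(path, skipPatterns))
	}
}
```

Only `src/main.go` survives this check; the other two paths would be skipped before any rule regex is tried. The TOML that follows shows the same idea written as a CEL `prefilter`.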
```toml
prefilter = '''
(
  // Skip binary/media files
  matchesAny(attributes[?"path"].orValue(""), [
    r"""(?i)\.(png|jpg|gif|mp4|zip|tar\.gz|exe)$"""
  ])
) || (
  // Skip the entire commit if authored by any known bot
  matchesAny(attributes[?"git.author_name"].orValue(""), [
    r"""(?i)\[bot\]$""",
    r"""^renovate$"""
  ])
) || (
  // Skip vendor and generated directories
  matchesAny(attributes[?"path"].orValue(""), [
    r"""(?:^|/)vendor/""",
    r"""(?:^|/)\.gen/"""
  ])
)
'''
```

---

### `filter` — Post-match finding discard expressions

Available at global level and per-rule. Returns `true` to discard the candidate finding. Has access to both `attributes` and `finding`.

```toml
filter = '''
(
  // Discard if authored by a CI bot AND the file is a test fixture AND secret is a placeholder
  attributes[?"git.author_name"].orValue("").endsWith("[bot]")
  && attributes[?"path"].orValue("").startsWith("tests/fixtures/")
  && containsAny(finding["secret"], ["_MOCK_", "_TEST_", "placeholder"])
) || (
  // Discard if it's a markdown or text file with instructional language on the same line
  matchesAny(attributes[?"path"].orValue(""), [r"""(?i)\.(md|txt|rst)$"""])
  && containsAny(finding["line"], ["Example:", "Replace this:", "YOUR_KEY_HERE"])
) || (
  // Discard low-entropy natural-language false positives
  entropy(finding["secret"]) <= 2.5 && failsTokenEfficiency(finding["secret"])
)
'''
```

**Available bindings:**

| Binding | Signature | Description |
|---|---|---|
| `attributes` | `map[string]string` | Metadata: `path`, `git.sha`, `git.author_name`, `git.author_email`, `git.date`, `git.message`, `git.remote_url`, `git.platform`, `fs.symlink` |
| `finding` | `map[string]string` | Keys: `secret`, `match`, `line`, `rule_id`, `description` |
| `matchesAny` | `(string, list<string>) → bool` | True if string matches any regex pattern in list (Aho-Corasick + RE2) |
| `containsAny` | `(string, list<string>) → bool` | True if string contains any substring in list (Aho-Corasick) |
| `entropy` | `(string) → double` | Shannon entropy in bits |
| `failsTokenEfficiency` | `(string) → bool` | True if string tokenizes like natural language (BPE cl100k_base) |

---

## CEL Validation Bindings

### `validate` — Live secret verification via HTTP

The `validate` expression must return a `map` with a `"result"` key. Valid statuses: `"valid"`, `"invalid"`, `"revoked"`, `"unknown"`, `"error"`. All additional keys are attached to the finding as metadata.

```toml
# Validate a Stripe secret key
validate = '''
cel.bind(r,
  http.get("https://api.stripe.com/v1/balance", {
    "Authorization": "Bearer " + secret
  }),
  r.status == 200 ? {
    "result": "valid"
  } : r.status in [401, 403] ? {
    "result": "invalid",
    "reason": "Unauthorized"
  } : unknown(r)
)
'''
```

**Available bindings:**

| Binding | Signature | Description |
|---|---|---|
| `secret` | `string` | The extracted secret value |
| `captures` | `map[string]string` | Named regex capture groups from the rule's regex |
| `http.get` | `(url: string, headers: map) → map` | GET request; returns `{status, json, body, headers}` |
| `http.post` | `(url, headers, body: string) → map` | POST request; same response map |
| `cel.bind` | `(name, value, expr)` | Binds a variable to avoid repeating sub-expressions |
| `unknown` | `(response: map) → map` | Returns `{"result": "unknown", "reason": "HTTP <N>"}` |
| `crypto.md5` | `(bytes) → bytes` | MD5 hash |
| `crypto.sha1` | `(bytes) → bytes` | SHA-1 hash |
| `crypto.hmac_sha256` | `(key: bytes, msg: bytes) → bytes` | HMAC-SHA256 |
| `hex.encode` | `(bytes) → string` | Lowercase hex encoding |
| `time.now_unix` | `() → string` | Current Unix timestamp as string |
| `aws.validate` | `(access_key_id, secret_access_key: string) → map` | SigV4-signed STS GetCallerIdentity call; returns `{status, arn, account, userid}` |

```toml
# AWS credential validation using the built-in SigV4 helper
[[rules]]
id = "aws-access-key"
keywords = ["AKIA"]
regex = '''(?P<key>(?:A3T[A-Z0-9]|AKIA|AGPA)[A-Z0-9]{16})'''
validate = '''
cel.bind(r,
  aws.validate(captures["key"], secret),
  r.status == 200 ? {
    "result": "valid",
    "arn": r[?"arn"].orValue(""),
    "account": r[?"account"].orValue(""),
    "userid": r[?"userid"].orValue("")
  } : r.status in [400, 403] ? {
    "result": "invalid",
    "reason": r[?"error_code"].orValue("InvalidClientTokenId")
  } : unknown(r)
)
'''

# Separate example (it would belong to a different rule):
# validate using a custom HMAC-signed POST request (generic API example)
validate = '''
cel.bind(ts, time.now_unix(),
  cel.bind(sig, hex.encode(crypto.hmac_sha256(bytes(secret), bytes("ts=" + ts))),
    cel.bind(r,
      http.post("https://api.example.com/v1/verify", {
        "Content-Type": "application/json",
        "X-Timestamp": ts,
        "X-Signature": sig
      }, "{\"token\": \"" + secret + "\"}"),
      r.status == 200 ? {"result": "valid"} : unknown(r)
    )
  )
)
'''
```

---

## `Detector` — Programmatic Go API

### `detect.NewDetectorContext` — Create a detector with full options

Compiles all CEL filter and validation programs from the config, sets up the validation worker pool, and builds the Aho-Corasick keyword trie. This is the primary constructor for programmatic use.
```go
package main

import (
	"context"
	"fmt"
	"strings"

	"github.com/betterleaks/betterleaks/config"
	"github.com/betterleaks/betterleaks/detect"
	"github.com/betterleaks/betterleaks/sources"
	"github.com/spf13/viper"
)

func main() {
	// Load config from TOML
	viper.SetConfigType("toml")
	if err := viper.ReadConfig(strings.NewReader(config.DefaultConfig)); err != nil {
		panic(err)
	}
	var vc config.ViperConfig
	if err := viper.Unmarshal(&vc); err != nil {
		panic(err)
	}
	cfg, err := vc.Translate()
	if err != nil {
		panic(err)
	}

	// Create detector with validation enabled
	valOpts := detect.ValidationOptions{
		Enabled:      true,
		Workers:      10,
		StatusFilter: "valid,revoked", // only emit valid or revoked findings
	}
	d := detect.NewDetectorContext(context.Background(), cfg, valOpts)

	// Scan a fragment (e.g., a file's contents)
	ctx := context.Background()
	src := &sources.Files{Path: "/path/to/scan", MaxFileSize: 5_000_000}
	for result := range d.Run(ctx, src) {
		if result.Err != nil {
			fmt.Println("error:", result.Err)
			continue
		}
		f := result.Finding
		fmt.Printf("[%s] %s at %s:%d (status=%s)\n",
			f.RuleID,
			f.Secret[:min(len(f.Secret), 12)]+"...",
			f.Attr("path"),
			f.StartLine,
			f.ValidationStatus)
	}
}

func min(a, b int) int {
	if a < b {
		return a
	}
	return b
}
```

---

### `detect.Detector.Run` — Streaming results iterator

`Run` returns a Go 1.23 `iter.Seq[Result]` that yields `Result{Finding, Err}` as the scan proceeds. Context cancellation stops the scan gracefully.
```go
ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
defer cancel()

src := &sources.Git{
	Cmd:        gitCmd,
	ShouldSkip: d.SkipFunc(), // applies the CEL prefilter
	Platform:   scm.NoPlatform,
	Sema:       d.Sema,
}

var findings []report.Finding
for result := range d.Run(ctx, src) {
	if result.Err != nil {
		log.Printf("scan error: %v", result.Err)
		continue
	}
	findings = append(findings, result.Finding)
	fmt.Printf("Found secret: rule=%s file=%s line=%d\n",
		result.Finding.RuleID,
		result.Finding.Attr(sources.AttrPath),
		result.Finding.StartLine,
	)
}
fmt.Printf("Total findings: %d\n", len(findings))
```

---

### `detect.Detector.DetectString` — Scan an in-memory string

Scans a raw string with no source metadata (no file path, no git attributes). Useful for unit tests or scanning generated content.

```go
d, _ := detect.NewDetectorDefaultConfig()

content := `
DB_PASSWORD=super_secret_password_abc123
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
`

findings := d.DetectString(content)
for _, f := range findings {
	fmt.Printf("Rule: %-30s Secret: %s\n", f.RuleID, f.Secret)
}
// Output:
// Rule: aws-access-key-id              Secret: AKIAIOSFODNN7EXAMPLE
// Rule: aws-secret-access-key          Secret: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```

---

### `detect.Detector.AddBaseline` — Suppress previously known findings

Loads a prior JSON report as a baseline; any finding that already appears in the baseline is silently suppressed. Use to focus CI scans on net-new secrets.

```go
// Step 1: generate baseline on main branch (shell command, not Go):
//   betterleaks git . -f json -r baseline.json

// Step 2: in a PR scan, suppress all baseline findings
d.AddBaseline("baseline.json", "/path/to/repo")

// Step 3: only new findings are emitted
for result := range d.Run(ctx, src) { ... }
```

```go
// Programmatic baseline usage
if err := d.AddBaseline("./baseline.json", "/repo"); err != nil {
	log.Fatalf("baseline load failed: %v", err)
}
```

---

### `detect.Detector.AddGitleaksIgnore` — Load a `.betterleaksignore` file

Loads fingerprint entries from a `.betterleaksignore` (or `.gitleaksignore`) file. Each line is a fingerprint in one of two formats:

```
# Global fingerprint: file:rule-id:start-line
config/database.yml:generic-api-key:42

# Commit fingerprint: commit-sha:file:rule-id:start-line
a1b2c3d4:config/database.yml:generic-api-key:42
```

```go
if err := d.AddGitleaksIgnore("./.betterleaksignore"); err != nil {
	log.Fatal(err)
}
```

---

## Report Formats

### JSON report (`-f json`)

The default structured output format. All `Finding` fields are serialized; validation metadata is included when `--validation` is active.

```bash
betterleaks git . -f json -r findings.json -v

# findings.json will contain a JSON array of Finding objects, e.g.:
# [
#   {
#     "RuleID": "github-pat",
#     "Description": "GitHub Personal Access Token",
#     "StartLine": 12,
#     "EndLine": 12,
#     "Match": "ghp_abc123...",
#     "Secret": "ghp_abc123...",
#     "Fingerprint": "a1b2c3:config/.env:github-pat:12",
#     "Attributes": {"path": "config/.env", "git.sha": "a1b2c3", ...},
#     "ValidationStatus": "valid",
#     "ValidationMeta": {"username": "octocat"}
#   }
# ]
```

---

### SARIF report (`-f sarif`)

Standard SARIF 2.1.0 format for integration with GitHub Advanced Security, VS Code, and other tools.

```bash
betterleaks git . -f sarif -r results.sarif

# Upload to GitHub code scanning:
# gh api repos/OWNER/REPO/code-scanning/sarifs \
#   -F commit_sha=$(git rev-parse HEAD) \
#   -F ref=refs/heads/main \
#   -F sarif=@results.sarif.b64
```

---

### CSV report (`-f csv`)

```bash
betterleaks dir /app -f csv -r secrets.csv

# RuleID,Description,StartLine,EndLine,Secret,File,Commit,Author,Email,Date,Fingerprint
# github-pat,GitHub PAT,5,5,ghp_xxx,src/config.go,,,,,src/config.go:github-pat:5
```

---

### Template report (`-f template`)

Use a Go `text/template` file to generate any custom output format.

```bash
betterleaks git . -f template --report-template=./my-template.tmpl -r report.txt
```

```go-template
{{/* my-template.tmpl */}}
{{range .}}FINDING: {{.RuleID}} in {{.File}} at line {{.StartLine}}
  Secret: {{.Secret}}
  Fingerprint: {{.Fingerprint}}
{{end}}
```

---

## Pre-commit Integration

### Native pre-commit hook (Go binary)

Add to `.pre-commit-config.yaml` to run Betterleaks on every `git commit`:

```yaml
repos:
  - repo: https://github.com/betterleaks/betterleaks
    rev: v1.0.0  # use the latest tag
    hooks:
      - id: betterleaks           # uses installed Go binary
      # - id: betterleaks-docker  # uses Docker image (no local install needed)
      # - id: betterleaks-system  # uses system-installed binary
```

Each hook runs:

```
betterleaks git --pre-commit --redact --staged --verbose
```

---

## Diagnostics / Profiling

### `--diagnostics` — CPU, memory, trace, and HTTP pprof

```bash
# CPU + memory profiles saved to /tmp/prof/
betterleaks git . --diagnostics=cpu,mem --diagnostics-dir=/tmp/prof

# Execution trace
betterleaks git . --diagnostics=trace --diagnostics-dir=/tmp/prof

# Live HTTP pprof server at http://localhost:6060/debug/pprof/
betterleaks git . --diagnostics=http

# Analyze CPU profile
go tool pprof /tmp/prof/cpu.pprof

# Analyze trace
go tool trace /tmp/prof/trace.out
```

---

## Global Flags Reference

The following flags are available on all subcommands:

```
  -c, --config string              Config file path (default: auto-discovered .betterleaks.toml)
      --exit-code int              Exit code when leaks are found (default 1)
  -r, --report-path string         Report output file path (use "-" for stdout)
  -f, --report-format string       Output format: json, csv, junit, sarif, template
      --report-template string     Template file for --report-format=template
  -b, --baseline-path string       Baseline JSON report; matching findings are suppressed
  -l, --log-level string           Log level: trace, debug, info, warn, error, fatal (default "info")
  -v, --verbose                    Print each finding as it is found
      --no-color                   Disable ANSI color in output
      --max-target-megabytes int   Skip files larger than this many MB
      --redact uint                Redact secrets (0=none, 1–99=partial %, 100=REDACTED)
      --enable-rule stringSlice    Only run specific rule IDs
  -i, --gitleaks-ignore-path       Path to .betterleaksignore file or directory
      --match-context string       Context around matches, e.g. "10L", "100C", "-2C,+4C"
      --max-decode-depth int       Recursive decode passes for base64/URL-encoded data (default 5)
      --max-archive-depth int      Scan inside nested archives up to N levels deep (default 0)
      --timeout int                Global timeout in seconds (default 0, no timeout)
      --regex-engine string        Regex engine: re2 (default) or stdlib
      --validation                 Enable live API validation of findings
      --validation-status string   Filter by status: valid, invalid, revoked, error, unknown, none
      --validation-timeout dur     Per-request HTTP timeout (default 10s)
      --validation-workers int     Concurrent validation workers (default 10)
      --validation-debug           Include raw HTTP request/response in finding metadata
      --ignore-gitleaks-allow      Ignore // betterleaks:allow and // gitleaks:allow comments
      --diagnostics string         Profiling: cpu, mem, trace, http (comma-separated)
      --diagnostics-dir string     Output directory for profiling files
```

---

Betterleaks covers the full spectrum of secrets-scanning use cases: automated CI/CD pipeline integration via the `betterleaks git` command with SARIF or JSON output, developer-side protection through pre-commit hooks that scan staged changes before they are committed, filesystem auditing with `betterleaks dir` for scanning deployment artifacts and configuration directories, and continuous pipe-based scanning of any command output through `betterleaks stdin`. In all modes, a shared global configuration allows teams to encode organization-specific exclusion rules (test fixtures, bot authors, known-safe placeholder strings) once in a `.betterleaks.toml` committed at the repository root.

The most powerful integration pattern is combining CEL-based filtering with live validation: rules that detect a specific credential type define both a `filter` to cut noise and a `validate` expression to confirm the secret is active before alerting. Findings from validated scans can be consumed programmatically via the `Detector.Run(ctx, src)` iterator in Go services, or reported in SARIF format to GitHub Advanced Security, enabling automatic PR annotations for newly introduced secrets. Baseline files (`--baseline-path`) allow teams to suppress pre-existing historical findings and focus developer attention only on secrets introduced since the baseline was captured.
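
When tuning the `entropy(finding["secret"]) <= 2.5` style thresholds used in the filter examples, it can help to compute Shannon entropy by hand. The Go sketch below is one plausible definition, measured in bits per byte over the byte frequencies of the string; whether Betterleaks counts bytes or runes, and the `shannonEntropy` name itself, are assumptions of this example.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the Shannon entropy of s in bits per byte:
// H = -sum(p_i * log2(p_i)) over the frequency p_i of each distinct byte.
func shannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	h := 0.0
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	// One, two, and four equally frequent symbols give 0, 1, and 2 bits.
	for _, s := range []string{"aaaaaaaa", "abababab", "abcdabcd"} {
		fmt.Printf("%-10s %.2f\n", s, shannonEntropy(s))
	}
}
```

Under this definition a string needs at least six distinct bytes to exceed 2.5 bits, since five evenly used symbols top out at about 2.32 bits. That is why short placeholder-like values tend to fall below such thresholds.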