### Copy Example Configuration
Source: https://github.com/lekssays/codebadger/blob/main/docs/configuration.md
Copy the example configuration file to `config.yaml`, then edit the copy to customize your settings.
```bash
cp config.example.yaml config.yaml
```
--------------------------------
### Local Setup and Dependency Installation
Source: https://github.com/lekssays/codebadger/blob/main/docs/installation.md
Installs Python dependencies within a virtual environment, builds and starts the Joern container using Docker Compose, copies the configuration file, and launches the MCP server.
```bash
# 1. Install Python dependencies (a venv is recommended)
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# 2. Build and start the Joern container
docker compose up -d
# 3. Create your config from the template
cp config.example.yaml config.yaml
# 4. Start the MCP server
python main.py
```
--------------------------------
### Set Up Development Environment
Source: https://github.com/lekssays/codebadger/blob/main/docs/contributing.md
Create and activate a virtual environment, install dependencies, and start the Docker Compose services for Joern.
```bash
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
docker compose up -d # Joern container, needed for integration tests
```
--------------------------------
### Example Codebase Analysis Session
Source: https://github.com/lekssays/codebadger/blob/main/docs/usage.md
A step-by-step example session demonstrating how to use Codebadger tools to analyze a codebase, from building a CPG to running CPGQL queries.
```text
# 1. Build a CPG (GitHub URL or local path; a sub-path keeps it small/fast)
generate_cpg(source="https://github.com/GNOME/libsoup", language="c")
-> { "codebase_hash": "ddf44eb0a10a85e6", "status": "generating" }
# 2. Wait for it
get_cpg_status(codebase_hash="ddf44eb0a10a85e6") -> { "status": "ready" }
# 3. Orient
get_codebase_summary(codebase_hash="ddf44eb0a10a85e6")
list_methods(codebase_hash="ddf44eb0a10a85e6", name_filter=".*parse.*")
# 4. Hunt
find_taint_flows(codebase_hash="ddf44eb0a10a85e6")
find_integer_overflow(codebase_hash="ddf44eb0a10a85e6")
# 5. Drill into a candidate
get_method_source(codebase_hash="ddf44eb0a10a85e6", method_name="soup_header_parse")
get_program_slice(codebase_hash="ddf44eb0a10a85e6", ...)
# 6. Escape hatch - raw CPGQL for anything the tools don't cover
run_cpgql_query(codebase_hash="ddf44eb0a10a85e6",
query="cpg.call.name(\"memcpy\").l")
```
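Step 2 of the session ("Wait for it") amounts to a polling loop. The following is a minimal sketch in Python, assuming the status lookup is wrapped in a zero-argument callable; `wait_until_ready` and its parameters are hypothetical names for illustration and are not part of Codebadger. The `"generating"` and `"ready"` values come from the session above.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it stops returning "generating".

    Hypothetical helper: get_status stands in for a get_cpg_status call.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status != "generating":
            return status  # "ready", or any other terminal status
        if time.monotonic() >= deadline:
            raise TimeoutError("CPG still generating after timeout")
        sleep(interval_s)


# Stand-in for the real tool call: "generating" twice, then "ready".
_fake_statuses = iter(["generating", "generating", "ready"])
final_status = wait_until_ready(lambda: next(_fake_statuses), sleep=lambda _: None)
print(final_status)
```

Injecting `sleep` keeps the loop testable without real delays; an agent would normally rely on the default `time.sleep`.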
--------------------------------
### Python Tool Registration Example
Source: https://github.com/lekssays/codebadger/blob/main/docs/custom-tools.md
Example of registering a custom Python tool using the @mcp.tool decorator. It defines parameters, their descriptions, types, and default values, and includes error handling.
```python
from typing import Annotated

from pydantic import Field

# Assumed to come from the surrounding tool module: mcp, services, logger,
# QueryLoader, _get_codebase, and _run_query.


@mcp.tool(
    description="""One-line summary shown in client listings.

    Args:
        codebase_hash: Hash returned by generate_cpg.
        my_param: What this controls (default "default").

    Returns:
        Text report with findings and locations.
    """,
    tags={"security", "CWE-NNN"},
)
def my_tool(
    codebase_hash: Annotated[str, Field(description="Codebase hash from generate_cpg")],
    my_param: Annotated[str, Field(description="Detection pattern")] = "default",
    max_results: Annotated[int, Field(description="Max findings", ge=1, le=500)] = 50,
) -> str:
    try:
        info = _get_codebase(services, codebase_hash)
        # Keyword names must match the {{placeholders}} in the query template.
        query = QueryLoader.load("my_tool", my_pattern=my_param, max_results=max_results)
        return _run_query(
            services, codebase_hash, info.cpg_path, query,
            timeout=60, tool_name="my_tool",
            cache_params={"my_param": my_param, "max_results": max_results},
        )
    except (ValueError, RuntimeError) as e:
        # Expected failures are returned to the client as text.
        return f"Error: {e}"
    except Exception as e:
        logger.error(f"my_tool: {e}", exc_info=True)
        return f"Internal Error: {e}"
```
--------------------------------
### Run Integration Tests
Source: https://github.com/lekssays/codebadger/blob/main/docs/contributing.md
Start the MCP server, run integration tests, and then stop the server.
```bash
python main.py & # start the server in the background
pytest tests/integration -q
pkill -f "python main.py" # stop it
```
--------------------------------
### Verify Prerequisites
Source: https://github.com/lekssays/codebadger/blob/main/docs/installation.md
Check that Docker, Docker Compose, and Python are installed and on your `PATH`, and confirm that the reported Python version is 3.10 or later.
```bash
docker --version && docker compose version && python --version
```
--------------------------------
### Run Jaeger with Environment Variables
Source: https://github.com/lekssays/codebadger/blob/main/docs/configuration.md
Start a local Jaeger instance and run the application with telemetry enabled via environment variables.
```bash
docker run -d --name jaeger -p 16686:16686 -p 4317:4317 jaegertracing/all-in-one:latest
OTEL_ENABLED=true python main.py
```
--------------------------------
### Docker Compose Up Commands
Source: https://github.com/lekssays/codebadger/blob/main/docs/deployment.md
Commands to start Codebadger services using Docker Compose. Profiles are used to selectively enable PostgreSQL and Redis.
```bash
docker compose up -d # Joern only (default)
docker compose --profile postgres up -d # + Postgres (host port 55432)
docker compose --profile redis up -d # + Redis (host port 56379)
docker compose --profile postgres --profile redis up -d # Joern + Postgres + Redis together
```
--------------------------------
### Typical Code Analysis Workflow
Source: https://github.com/lekssays/codebadger/blob/main/docs/available-tools.md
Illustrates a common workflow for code analysis, starting from CPG generation, exploring code, hunting for vulnerabilities, and confirming findings.
```mermaid
flowchart LR
    A[generate_cpg] --> B{get_cpg_status}
    B -- generating --> B
    B -- ready --> C["Explore<br/>list_methods<br/>get_method_source<br/>get_call_graph<br/>get_codebase_summary"]
    C --> D["Hunt<br/>find_taint_flows<br/>find_use_after_free<br/>find_integer_overflow<br/>get_program_slice"]
    D --> E{Promising?}
    E -- no --> C
    E -- yes --> F["Confirm<br/>get_variable_flow<br/>get_cfg<br/>run_cpgql_query"]
    F --> G[Build & validate PoC]
```
--------------------------------
### Configure Codebadger Server with Postgres and Redis
Source: https://github.com/lekssays/codebadger/blob/main/docs/deployment.md
Environment variables to configure the Codebadger server to connect to PostgreSQL for the database and Redis for coordination. The server creates the Postgres schema on first start.
```bash
DATABASE_URL=postgresql://codebadger:codebadger@localhost:55432/codebadger \
REDIS_URL=redis://localhost:56379/0 \
CPG_QUEUE_BACKEND=durable python main.py
```
--------------------------------
### Scala Query Template Example
Source: https://github.com/lekssays/codebadger/blob/main/docs/custom-tools.md
A Scala query template that finds calls matching a pattern, extracts their code and location, and formats the output. It uses double-brace syntax for runtime variable substitution.
```scala
{
  import io.shiftleft.codepropertygraph.generated.nodes._
  import io.shiftleft.semanticcpg.language._

  val myPattern = "{{my_pattern}}" // string - keep the quotes
  val maxResults = {{max_results}} // numeric - no quotes
  val output = new StringBuilder()

  val results = cpg.call.name(myPattern).take(maxResults).l
  if (results.isEmpty) output.append("No findings.\n")
  else results.zipWithIndex.foreach { case (c, i) =>
    output.append(s"--- Finding ${i + 1} ---\n")
    output.append(s"${c.location.filename}:${c.location.lineNumber.getOrElse(-1)} ${c.code}\n")
  }
  "\n" + output.toString() + ""
}
```
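The double-brace substitution can be illustrated with a short Python sketch. This is not the project's `QueryLoader`; `render_template` is a hypothetical stand-in that only shows the idea of replacing `{{name}}` placeholders with keyword arguments before the Scala text is sent to Joern.

```python
import re

# Matches {{name}} where name is made of word characters.
_PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_template(template: str, **params: object) -> str:
    """Replace each {{name}} with str(params[name]) (hypothetical helper)."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing template parameter: {name}")
        return str(params[name])

    return _PLACEHOLDER.sub(substitute, template)


template = 'val myPattern = "{{my_pattern}}"\nval maxResults = {{max_results}}'
rendered = render_template(template, my_pattern="memcpy", max_results=50)
print(rendered)
```

Because the value is pasted into a Scala string literal, a pattern containing a double quote or backslash would break the query; any real loader would need to escape or validate such input, which this sketch does not attempt.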
--------------------------------
### Recommend Host Sizing Configuration
Source: https://github.com/lekssays/codebadger/blob/main/docs/deployment.md
Scripts to recommend resource configurations for Codebadger deployment. These scripts help in autodetecting host capabilities or planning for different environments.
```bash
python scripts/recommend_config.py # autodetect this host
python scripts/recommend_config.py --compare config.yaml # flag risky drift
python scripts/recommend_config.py --worker-mode pool # values for pool mode
python scripts/recommend_config.py --mem 256 --cores 96 # plan another host
```
--------------------------------
### Enable Telemetry Configuration
Source: https://github.com/lekssays/codebadger/blob/main/docs/configuration.md
Configure telemetry settings in your YAML file to enable tracing.
```yaml
telemetry:
  enabled: true
  service_name: codebadger
  otlp_endpoint: http://localhost:4317
  otlp_protocol: grpc # or "http/protobuf"
```
--------------------------------
### Run PostgreSQL Integration Tests
Source: https://github.com/lekssays/codebadger/blob/main/docs/contributing.md
Run the PostgreSQL-specific tests by setting the `CODEBADGER_TEST_PG_DSN` environment variable to the test database DSN for the `pytest` invocation.
```bash
CODEBADGER_TEST_PG_DSN=postgresql://codebadger:codebadger@localhost:55432/codebadger pytest tests/test_postgres_db_manager.py -q
```
--------------------------------
### VS Code / GitHub Copilot MCP Client Configuration
Source: https://github.com/lekssays/codebadger/blob/main/docs/usage.md
Configuration for VS Code or GitHub Copilot to connect to Codebadger via HTTP. This JSON file should be placed in the user's VS Code configuration directory.
```json
{
  "servers": {
    "codebadger": { "url": "http://localhost:4242/mcp", "type": "http" }
  }
}
```
--------------------------------
### System Overview Diagram
Source: https://github.com/lekssays/codebadger/blob/main/docs/architecture.md
A Mermaid diagram illustrating the overall architecture of the Codebadger system, showing the interaction between the MCP client, server, tool layer, services, and Joern containers.
```mermaid
flowchart TB
    Client["MCP client<br/>Copilot / Claude / agent"] -->|HTTP /mcp| MCP[FastMCP server - main.py]
    subgraph tools[Tool layer - src/tools]
        MCP --> CT["core / code_browsing /<br/>taint_analysis / custom tools"]
    end
    subgraph svc[Services - src/services]
        CT --> QE["QueryExecutor<br/>per-CPG lock + cache"]
        CT --> CG[CPGGenerator]
        QE --> JM["JoernServerManager<br/>spawn / sleep / evict"]
        CG --> JM
        JM --> PM[PortManager]
        JM --> CO["Coordinator<br/>locks"]
    end
    JM -->|exec / containers| JC[(Joern container/s)]
    CG -->|build CPG| JC
    QE --> STORE[("Catalog + cache + findings + jobs<br/>SQLite or Postgres")]
    JM -.pool state.-> REDIS[(Redis - optional)]
    CO -.cross-process locks.-> REDIS
```
--------------------------------
### Memory-Aware Admission Flowchart
Source: https://github.com/lekssays/codebadger/blob/main/docs/architecture.md
A Mermaid flowchart depicting the memory-aware admission process, showing how spawn requests are handled, resources are planned, and eviction occurs when the budget is exceeded.
```mermaid
flowchart TD
    A[spawn request for CPG] --> B["plan tier from CPG .bin size<br/>→ heap + reservation"]
    B --> C{reserved + need ≤ budget?}
    C -- no --> D[evict global LRU victim] --> C
    C -- yes --> E{a port is free?}
    E -- no --> D
    E -- yes --> F[reserve + allocate port + start server]
    F --> G["RSS backstop: evict LRU<br/>if container RSS > threshold"]
```
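The admission loop in the flowchart can be modeled in a few lines of Python. This is a toy sketch under stated assumptions, not the `JoernServerManager` implementation: the `Admission` class, its megabyte units, and the port numbers are all hypothetical, and the RSS backstop is left out.

```python
from collections import OrderedDict


class Admission:
    """Toy model of the flowchart; every name and unit here is hypothetical."""

    def __init__(self, budget_mb: int, ports: list[int]) -> None:
        if not ports:
            raise ValueError("at least one port is required")
        self.budget_mb = budget_mb
        self.free_ports = list(ports)
        # codebase_hash -> (reserved_mb, port), least recently used first
        self.running: OrderedDict[str, tuple[int, int]] = OrderedDict()

    def reserved_mb(self) -> int:
        return sum(mb for mb, _ in self.running.values())

    def evict_lru(self) -> str:
        """Drop the least recently used server and release its port."""
        victim, (_, port) = self.running.popitem(last=False)
        self.free_ports.append(port)
        return victim

    def admit(self, codebase_hash: str, need_mb: int) -> list[str]:
        """Reserve memory and a port, evicting LRU servers until both fit."""
        if need_mb > self.budget_mb:
            raise MemoryError("request exceeds the whole budget")
        evicted = []
        while self.reserved_mb() + need_mb > self.budget_mb or not self.free_ports:
            evicted.append(self.evict_lru())
        self.running[codebase_hash] = (need_mb, self.free_ports.pop(0))
        return evicted


adm = Admission(budget_mb=8192, ports=[2000, 2001])
adm.admit("a", 4096)            # fits
adm.admit("b", 2048)            # fits; both ports are now taken
evicted = adm.admit("c", 4096)  # over budget and out of ports
print(evicted)
```

A fuller model would also move a server to the end of `running` whenever it serves a query, so that eviction order reflects use rather than admission order.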
--------------------------------
### CPG Server Lifecycle Diagram
Source: https://github.com/lekssays/codebadger/blob/main/docs/architecture.md
A Mermaid state diagram illustrating the different states of a CPG server, including generating, ready, sleeping, and failed states, and the transitions between them.
```mermaid
stateDiagram-v2
[*] --> generating: generate_cpg
generating --> ready: build + load OK
generating --> failed: build error / timeout
ready --> sleeping: idle / evicted (LRU or RSS)
sleeping --> ready: query auto-wakes (importCpg)
failed --> generating: retry
ready --> [*]: delete
sleeping --> [*]: delete
```
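The same lifecycle can be written as a transition table. The sketch below is illustrative only: the state names come from the diagram, while the event names (`build_ok`, `evict`, and so on) and the `deleted` terminal marker are hypothetical labels chosen for this example.

```python
# (current state, event) -> next state; event names are hypothetical.
TRANSITIONS = {
    ("generating", "build_ok"): "ready",
    ("generating", "build_failed"): "failed",
    ("ready", "evict"): "sleeping",      # idle timeout, LRU, or RSS eviction
    ("sleeping", "query"): "ready",      # a query auto-wakes the server
    ("failed", "retry"): "generating",
    ("ready", "delete"): "deleted",      # stands for the diagram's end state
    ("sleeping", "delete"): "deleted",
}


def next_state(state: str, event: str) -> str:
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None


state = "generating"
for event in ["build_ok", "evict", "query"]:
    state = next_state(state, event)
print(state)
```

Rejecting unknown pairs makes it explicit that, in the diagram, a `failed` CPG has only one outgoing edge: a retry.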
--------------------------------
### Shared vs Pool Deployment Modes
Source: https://github.com/lekssays/codebadger/blob/main/docs/deployment.md
Illustrates the difference between `shared` and `pool` modes for running Joern query servers. `shared` runs all servers in one container, while `pool` uses separate cgroup-capped containers for each query server.
```mermaid
flowchart LR
    subgraph shared[shared default]
        direction TB
        C1[codebadger-joern-server] --> P1["build + all query servers<br/>as processes in ONE container"]
    end
    subgraph pool[pool]
        direction TB
        C2["codebadger-joern-server<br/>builds only"] -.-> X[ ]
        C2 --> WK1["worker container<br/>cgroup-capped"]
        C2 --> WK2["worker container<br/>cgroup-capped"]
    end
```
--------------------------------
### Repository Layout Overview
Source: https://github.com/lekssays/codebadger/blob/main/docs/architecture.md
Provides a high-level overview of the Codebadger project's directory structure, indicating the purpose of key files and directories.
```text
main.py                      MCP server entry point, lifespan, health, status logger
config.yaml / src/defaults   configuration + centralized defaults
src/
  tools/                     core_tools, code_browsing_tools, taint_analysis_tools, custom_tools, queries/*.scala
  services/                  joern_server_manager, query_executor, cpg_generator, codebase_tracker,
                             coordination, pool_store, port_manager, git_manager
  utils/                     db_manager (SQLite), postgres_db_manager, postgres_job_store,
                             recommend, validators, cpgql_validator, cache_cleanup
scripts/                     recommend_config.py
tests/                       unit + integration suites
```
--------------------------------
### Concurrency and Injection Tools
Source: https://github.com/lekssays/codebadger/blob/main/docs/available-tools.md
`find_toctou` identifies Time-of-Check to Time-of-Use (TOCTOU) race conditions; `find_command_injection_sinks` identifies OS command injection vulnerabilities.
```text
find_toctou
```
```text
find_command_injection_sinks
```
--------------------------------
### Cleanup and Reset
Source: https://github.com/lekssays/codebadger/blob/main/docs/installation.md
Removes all generated codebases, CPGs, and state from Postgres/Redis, and stops the running Docker Compose services.
```bash
bash cleanup.sh # clears codebases, CPGs, and Postgres/Redis state
docker compose down # stop containers
```
--------------------------------
### Run Unit Tests
Source: https://github.com/lekssays/codebadger/blob/main/docs/contributing.md
Execute all unit tests using pytest.
```bash
pytest tests/ -q # unit tests
```
--------------------------------
### Query Flow with Auto-Wake Diagram
Source: https://github.com/lekssays/codebadger/blob/main/docs/architecture.md
A Mermaid sequence diagram detailing the flow of a query, including cache checks, server spawning, and query execution within the Codebadger system.
```mermaid
sequenceDiagram
participant C as Client
participant T as Tool
participant Q as QueryExecutor
participant M as JoernServerManager
participant J as Joern server (per CPG)
C->>T: run_cpgql_query(hash, query)
T->>Q: execute(hash, query)
Q->>Q: cache hit? → return
Q->>M: get_or_create_client(hash)
alt server sleeping / absent
M->>M: plan tier + make room (evict LRU)
M->>J: spawn + importCpg
end
M-->>Q: client
Q->>J: run query (per-CPG lock, timeout)
J-->>Q: codebadger_result text
Q->>Q: cache result
Q-->>C: structured result
```
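The cache check and per-CPG lock in the sequence diagram can be sketched as follows. This is a simplified illustration, not the project's `QueryExecutor`: the `CachingExecutor` class and the `run_on_server` callable are hypothetical, and server spawning, timeouts, and cache invalidation are omitted.

```python
import threading
from collections import defaultdict
from typing import Callable


class CachingExecutor:
    """Cache results per (codebase_hash, query); serialize queries per CPG."""

    def __init__(self, run_on_server: Callable[[str, str], str]) -> None:
        self._run_on_server = run_on_server  # hypothetical stand-in for Joern
        self._cache: dict[tuple[str, str], str] = {}
        self._locks: defaultdict[str, threading.Lock] = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()

    def _lock_for(self, codebase_hash: str) -> threading.Lock:
        # Guard the dict itself so two threads never create separate locks.
        with self._locks_guard:
            return self._locks[codebase_hash]

    def execute(self, codebase_hash: str, query: str) -> str:
        key = (codebase_hash, query)
        if key in self._cache:  # cache hit: return without touching the server
            return self._cache[key]
        with self._lock_for(codebase_hash):
            if key not in self._cache:  # another thread may have filled it
                self._cache[key] = self._run_on_server(codebase_hash, query)
            return self._cache[key]


calls = []


def fake_server(codebase_hash: str, query: str) -> str:
    calls.append(query)
    return f"result for {query}"


executor = CachingExecutor(fake_server)
first = executor.execute("ddf44eb0a10a85e6", 'cpg.call.name("memcpy").l')
second = executor.execute("ddf44eb0a10a85e6", 'cpg.call.name("memcpy").l')
print(first == second, len(calls))
```

Locking per codebase hash rather than globally lets queries against different CPGs proceed in parallel while keeping each Joern server single-tenant.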
--------------------------------
### Claude Desktop / Claude Code MCP Client Configuration
Source: https://github.com/lekssays/codebadger/blob/main/docs/usage.md
Configuration for Claude Desktop or Claude Code to connect to Codebadger via HTTP. This JSON file specifies the MCP server details.
```json
{
  "mcpServers": {
    "codebadger": { "url": "http://localhost:4242/mcp", "type": "http" }
  }
}
```
--------------------------------
### Researcher Workflow Diagram
Source: https://github.com/lekssays/codebadger/blob/main/docs/usage.md
A flowchart illustrating the researcher workflow for analyzing codebases using Codebadger, from generating a CPG to building and validating a Proof of Concept.
```mermaid
flowchart LR
    A["generate_cpg<br/>local path or GitHub URL"] --> B{get_cpg_status}
    B -- generating --> B
    B -- ready --> C["Explore<br/>list_methods, get_method_source,<br/>get_call_graph, get_codebase_summary"]
    C --> D["Hunt<br/>find_taint_flows, find_use_after_free,<br/>find_integer_overflow, get_program_slice"]
    D --> E{Promising flow?}
    E -- no --> C
    E -- yes --> F["Confirm<br/>get_variable_flow, get_cfg,<br/>run_cpgql_query"]
    F --> G[Build & validate PoC]
```
--------------------------------
### Scala Path Boundary Regex Helper
Source: https://github.com/lekssays/codebadger/blob/main/docs/custom-tools.md
A Scala helper function to create a regex that anchors a file path to a boundary, ensuring accurate file filtering.
```scala
def pathBoundaryRegex(f: String) = "(^|.*/)" + java.util.regex.Pattern.quote(f) + "$"
```
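To see what the boundary anchoring buys, here is a Python rendering of the same expression, with `re.escape` standing in for `java.util.regex.Pattern.quote`. The `matches_path` helper and the sample paths are hypothetical, and the sketch assumes the regex is applied to the whole path.

```python
import re


def path_boundary_regex(filename: str) -> str:
    """Python analogue of the Scala helper: match at path start or after '/'."""
    return "(^|.*/)" + re.escape(filename) + "$"


def matches_path(filename: str, path: str) -> bool:
    # fullmatch mirrors whole-string matching (an assumption of this sketch).
    return re.fullmatch(path_boundary_regex(filename), path) is not None


print(matches_path("parser.c", "src/parser.c"))    # True: preceded by '/'
print(matches_path("parser.c", "parser.c"))        # True: at the start
print(matches_path("parser.c", "src/myparser.c"))  # False: not at a boundary
print(matches_path("parser.c", "src/parserXc"))    # False: '.' is literal
```

Without the `(^|.*/)` prefix, a filter for `parser.c` would also match `myparser.c`; without quoting, the `.` would match any character.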
--------------------------------
### Memory Safety Tools
Source: https://github.com/lekssays/codebadger/blob/main/docs/available-tools.md
These tools detect common memory safety issues such as use-after-free, double-free, null pointer dereferences, heap and stack overflows, and uninitialized reads.
```text
find_use_after_free
```
```text
find_double_free
```
```text
find_null_pointer_deref
```
```text
find_heap_overflow
```
```text
find_stack_overflow
```
```text
find_uninitialized_reads
```
--------------------------------
### Arithmetic and Format String Tools
Source: https://github.com/lekssays/codebadger/blob/main/docs/available-tools.md
Detects integer overflows/underflows that can affect allocations or array indices, and format-string vulnerabilities where non-literals are used in printf-family functions.
```text
find_integer_overflow
```
```text
find_format_string_vulns
```
--------------------------------
### Verify Server Status
Source: https://github.com/lekssays/codebadger/blob/main/docs/installation.md
Checks if the Codebadger server is running by querying the health endpoint and lists the status of Docker Compose services.
```bash
curl -s http://localhost:4242/health | python -m json.tool
docker compose ps
```
--------------------------------
### Discover Fixed Vulnerabilities Tool
Source: https://github.com/lekssays/codebadger/blob/main/docs/available-tools.md
An optional reconnaissance tool that mines Git commit history to identify potential security fixes, providing hints about attack surfaces and past vulnerability patterns.
```text
discover_fixed_vulnerabilities
```
--------------------------------
### Docker Compose Down Commands
Source: https://github.com/lekssays/codebadger/blob/main/docs/deployment.md
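Commands to list, then stop and remove, the Codebadger services managed by Docker Compose. Include the same `--profile` flags used at startup so that the Postgres and Redis services are covered by the listing and the teardown.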
```bash
docker compose --profile postgres --profile redis ps
docker compose --profile postgres --profile redis down
```
--------------------------------
### Codebadger Citation
Source: https://github.com/lekssays/codebadger/blob/main/README.md
Citation details for the Codebadger paper "Bridging Code Property Graphs and Language Models for Program Analysis".
```bibtex
@inproceedings{lekssays2026bridging,
title={Bridging Code Property Graphs and Language Models for Program Analysis},
author={Lekssays, Ahmed},
booktitle={Proceedings of the 2026 IEEE/ACM 4th International Workshop on Software Vulnerability Management},
pages={33--40},
year={2026}
}
```