### Copy Example Configuration

Source: https://github.com/lekssays/codebadger/blob/main/docs/configuration.md

Start by copying the example configuration file, then customize your settings in the copy.

```bash
cp config.example.yaml config.yaml
```

--------------------------------

### Local Setup and Dependency Installation

Source: https://github.com/lekssays/codebadger/blob/main/docs/installation.md

Installs Python dependencies within a virtual environment, builds and starts the Joern container using Docker Compose, copies the configuration file, and launches the MCP server.

```bash
# 1. Install Python dependencies (a venv is recommended)
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# 2. Build and start the Joern container
docker compose up -d

# 3. Create your config from the template
cp config.example.yaml config.yaml

# 4. Start the MCP server
python main.py
```

--------------------------------

### Set Up Development Environment

Source: https://github.com/lekssays/codebadger/blob/main/docs/contributing.md

Create and activate a virtual environment, install dependencies, and start the Docker Compose services for Joern.

```bash
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
docker compose up -d   # Joern container, needed for integration tests
```

--------------------------------

### Example Codebase Analysis Session

Source: https://github.com/lekssays/codebadger/blob/main/docs/usage.md

A step-by-step example session demonstrating how to use Codebadger tools to analyze a codebase, from building a CPG to running CPGQL queries.

```text
# 1. Build a CPG (GitHub URL or local path; a sub-path keeps it small/fast)
generate_cpg(source="https://github.com/GNOME/libsoup", language="c")
  -> { "codebase_hash": "ddf44eb0a10a85e6", "status": "generating" }

# 2. Wait for it
get_cpg_status(codebase_hash="ddf44eb0a10a85e6")
  -> { "status": "ready" }

# 3. Orient
get_codebase_summary(codebase_hash="ddf44eb0a10a85e6")
list_methods(codebase_hash="ddf44eb0a10a85e6", name_filter=".*parse.*")

# 4. Hunt
find_taint_flows(codebase_hash="ddf44eb0a10a85e6")
find_integer_overflow(codebase_hash="ddf44eb0a10a85e6")

# 5. Drill into a candidate
get_method_source(codebase_hash="ddf44eb0a10a85e6", method_name="soup_header_parse")
get_program_slice(codebase_hash="ddf44eb0a10a85e6", ...)

# 6. Escape hatch - raw CPGQL for anything the tools don't cover
run_cpgql_query(codebase_hash="ddf44eb0a10a85e6", query="cpg.call.name(\"memcpy\").l")
```

--------------------------------

### Python Tool Registration Example

Source: https://github.com/lekssays/codebadger/blob/main/docs/custom-tools.md

Example of registering a custom Python tool using the @mcp.tool decorator. It defines parameters, their descriptions, types, and default values, and includes error handling.

```python
from typing import Annotated

from pydantic import Field

# Assumes mcp, services, logger, QueryLoader, _get_codebase and _run_query
# are already in scope where the tool is registered.


@mcp.tool(
    description="""One-line summary shown in client listings.

Args:
    codebase_hash: Hash returned by generate_cpg.
    my_param: What this controls (default "default").

Returns:
    Text report with findings and locations.
""",
    tags={"security", "CWE-NNN"},
)
def my_tool(
    codebase_hash: Annotated[str, Field(description="Codebase hash from generate_cpg")],
    my_param: Annotated[str, Field(description="Detection pattern")] = "default",
    max_results: Annotated[int, Field(description="Max findings", ge=1, le=500)] = 50,
) -> str:
    try:
        info = _get_codebase(services, codebase_hash)
        # Keyword names must match the {{placeholders}} in the query template.
        query = QueryLoader.load("my_tool", my_pattern=my_param, max_results=max_results)
        return _run_query(
            services, codebase_hash, info.cpg_path, query,
            timeout=60, tool_name="my_tool",
            cache_params={"my_param": my_param, "max_results": max_results},
        )
    except (ValueError, RuntimeError) as e:
        # Expected failures are reported to the client as plain text.
        return f"Error: {e}"
    except Exception as e:
        logger.error(f"my_tool: {e}", exc_info=True)
        return f"Internal Error: {e}"
```

--------------------------------

### Run Integration Tests

Source: https://github.com/lekssays/codebadger/blob/main/docs/contributing.md

Start the MCP server, run integration tests, and then stop the server.

```bash
python main.py &              # start the server in the background
pytest tests/integration -q
pkill -f "python main.py"     # stop it
```

--------------------------------

### Verify Prerequisites

Source: https://github.com/lekssays/codebadger/blob/main/docs/installation.md

Check that Docker, Docker Compose, and Python 3.10+ are installed and accessible in your environment.

```bash
docker --version && docker compose version && python --version
```

--------------------------------

### Run Jaeger with Environment Variables

Source: https://github.com/lekssays/codebadger/blob/main/docs/configuration.md

Start a local Jaeger instance and run the application with telemetry enabled via environment variables.

```bash
docker run -d --name jaeger -p 16686:16686 -p 4317:4317 jaegertracing/all-in-one:latest
OTEL_ENABLED=true python main.py
```

--------------------------------

### Docker Compose Up Commands

Source: https://github.com/lekssays/codebadger/blob/main/docs/deployment.md

Commands to start Codebadger services using Docker Compose. Profiles are used to selectively enable PostgreSQL and Redis.

```bash
docker compose up -d                                      # Joern only (default)
docker compose --profile postgres up -d                   # + Postgres (host port 55432)
docker compose --profile redis up -d                      # + Redis (host port 56379)
docker compose --profile postgres --profile redis up -d   # Joern + Postgres + Redis together
```

--------------------------------

### Typical Code Analysis Workflow

Source: https://github.com/lekssays/codebadger/blob/main/docs/available-tools.md

Illustrates a common workflow for code analysis: generate a CPG, explore the code, hunt for vulnerabilities, and confirm findings.

```mermaid
flowchart LR
    A[generate_cpg] --> B{get_cpg_status}
    B -- generating --> B
    B -- ready --> C[Explore<br/>list_methods get_method_source get_call_graph get_codebase_summary]
    C --> D[Hunt<br/>find_taint_flows find_use_after_free find_integer_overflow get_program_slice]
    D --> E{Promising?}
    E -- no --> C
    E -- yes --> F[Confirm<br/>get_variable_flow get_cfg run_cpgql_query]
    F --> G[Build & validate PoC]
```

--------------------------------

### Configure Codebadger Server with Postgres and Redis

Source: https://github.com/lekssays/codebadger/blob/main/docs/deployment.md

Environment variables to configure the Codebadger server to connect to PostgreSQL for the database and Redis for coordination. The server creates the Postgres schema on first start.

```bash
DATABASE_URL=postgresql://codebadger:codebadger@localhost:55432/codebadger \
REDIS_URL=redis://localhost:56379/0 \
CPG_QUEUE_BACKEND=durable python main.py
```

--------------------------------

### Scala Query Template Example

Source: https://github.com/lekssays/codebadger/blob/main/docs/custom-tools.md

A Scala query template that finds calls matching a pattern, extracts their code and location, and formats the output. It uses double-brace syntax for runtime variable substitution.

```scala
{
  import io.shiftleft.codepropertygraph.generated.nodes._
  import io.shiftleft.semanticcpg.language._

  val myPattern  = "{{my_pattern}}"   // string - keep the quotes
  val maxResults = {{max_results}}    // numeric - no quotes

  val output = new StringBuilder()
  val results = cpg.call.name(myPattern).take(maxResults).l

  if (results.isEmpty) output.append("No findings.\n")
  else results.zipWithIndex.foreach { case (c, i) =>
    output.append(s"--- Finding ${i + 1} ---\n")
    output.append(s"${c.location.filename}:${c.location.lineNumber.getOrElse(-1)} ${c.code}\n")
  }

  // Wrap the report in result tags so the server can extract it from the Joern output.
  "<codebadger_result>\n" + output.toString() + "</codebadger_result>"
}
```

--------------------------------

### Recommend Host Sizing Configuration

Source: https://github.com/lekssays/codebadger/blob/main/docs/deployment.md

Recommends resource settings for a Codebadger deployment. The script can autodetect the current host's capabilities or plan for a different environment.

```bash
python scripts/recommend_config.py                          # autodetect this host
python scripts/recommend_config.py --compare config.yaml    # flag risky drift
python scripts/recommend_config.py --worker-mode pool       # values for pool mode
python scripts/recommend_config.py --mem 256 --cores 96     # plan another host
```

--------------------------------

### Enable Telemetry Configuration

Source: https://github.com/lekssays/codebadger/blob/main/docs/configuration.md

Configure telemetry settings in your YAML file to enable tracing.

```yaml
telemetry:
  enabled: true
  service_name: codebadger
  otlp_endpoint: http://localhost:4317
  otlp_protocol: grpc   # or "http/protobuf"
```

--------------------------------

### Run PostgreSQL Integration Tests

Source: https://github.com/lekssays/codebadger/blob/main/docs/contributing.md

Run the PostgreSQL-specific tests by setting the `CODEBADGER_TEST_PG_DSN` environment variable for the pytest invocation.

```bash
CODEBADGER_TEST_PG_DSN=postgresql://codebadger:codebadger@localhost:55432/codebadger pytest tests/test_postgres_db_manager.py -q
```

--------------------------------

### VS Code / GitHub Copilot MCP Client Configuration

Source: https://github.com/lekssays/codebadger/blob/main/docs/usage.md

Configuration for VS Code or GitHub Copilot to connect to Codebadger via HTTP. This JSON file should be placed in the user's VS Code configuration directory.

```json
{
  "servers": {
    "codebadger": {
      "url": "http://localhost:4242/mcp",
      "type": "http"
    }
  }
}
```

--------------------------------

### System Overview Diagram

Source: https://github.com/lekssays/codebadger/blob/main/docs/architecture.md

A Mermaid diagram illustrating the overall architecture of the Codebadger system, showing the interaction between the MCP client, server, tool layer, services, and Joern containers.

```mermaid
flowchart TB
    Client[MCP client<br/>Copilot / Claude / agent] -->|HTTP /mcp| MCP[FastMCP server - main.py]

    subgraph tools[Tool layer - src/tools]
        MCP --> CT[core / code_browsing /<br/>taint_analysis / custom tools]
    end

    subgraph svc[Services - src/services]
        CT --> QE[QueryExecutor<br/>per-CPG lock + cache]
        CT --> CG[CPGGenerator]
        QE --> JM[JoernServerManager<br/>spawn / sleep / evict]
        CG --> JM
        JM --> PM[PortManager]
        JM --> CO[Coordinator<br/>locks]
    end

    JM -->|exec / containers| JC[(Joern container/s)]
    CG -->|build CPG| JC
    QE --> STORE[(Catalog + cache + findings + jobs<br/>SQLite or Postgres)]
    JM -.pool state.-> REDIS[(Redis - optional)]
    CO -.cross-process locks.-> REDIS
```

--------------------------------

### Memory-Aware Admission Flowchart

Source: https://github.com/lekssays/codebadger/blob/main/docs/architecture.md

A Mermaid flowchart depicting the memory-aware admission process, showing how spawn requests are handled, resources are planned, and eviction occurs when the budget is exceeded.

```mermaid
flowchart TD
    A[spawn request for CPG] --> B[plan tier from CPG .bin size<br/>→ heap + reservation]
    B --> C{reserved + need ≤ budget?}
    C -- no --> D[evict global LRU victim] --> C
    C -- yes --> E{a port is free?}
    E -- no --> D
    E -- yes --> F[reserve + allocate port + start server]
    F --> G[RSS backstop: evict LRU<br/>if container RSS > threshold]
```

--------------------------------

### CPG Server Lifecycle Diagram

Source: https://github.com/lekssays/codebadger/blob/main/docs/architecture.md

A Mermaid state diagram illustrating the different states of a CPG server, including generating, ready, sleeping, and failed states, and the transitions between them.

```mermaid
stateDiagram-v2
    [*] --> generating: generate_cpg
    generating --> ready: build + load OK
    generating --> failed: build error / timeout
    ready --> sleeping: idle / evicted (LRU or RSS)
    sleeping --> ready: query auto-wakes (importCpg)
    failed --> generating: retry
    ready --> [*]: delete
    sleeping --> [*]: delete
```

--------------------------------

### Shared vs Pool Deployment Modes

Source: https://github.com/lekssays/codebadger/blob/main/docs/deployment.md

Illustrates the difference between `shared` and `pool` modes for running Joern query servers. `shared` runs all servers in one container, while `pool` uses separate cgroup-capped containers for each query server.

```mermaid
flowchart LR
    subgraph shared[shared default]
        direction TB
        C1[codebadger-joern-server] --> P1[build + all query servers<br/>as processes in ONE container]
    end
    subgraph pool[pool]
        direction TB
        C2[codebadger-joern-server<br/>builds only] -.-> X[ ]
        C2 --> WK1[worker container<br/>cgroup-capped]
        C2 --> WK2[worker container<br/>cgroup-capped]
    end
```

--------------------------------

### Repository Layout Overview

Source: https://github.com/lekssays/codebadger/blob/main/docs/architecture.md

Provides a high-level overview of the Codebadger project's directory structure, indicating the purpose of key files and directories.

```text
main.py                      MCP server entry point, lifespan, health, status logger
config.yaml / src/defaults   configuration + centralized defaults
src/
  tools/      core_tools, code_browsing_tools, taint_analysis_tools, custom_tools, queries/*.scala
  services/   joern_server_manager, query_executor, cpg_generator, codebase_tracker, coordination, pool_store, port_manager, git_manager
  utils/      db_manager (SQLite), postgres_db_manager, postgres_job_store, recommend, validators, cpgql_validator, cache_cleanup
scripts/      recommend_config.py
tests/        unit + integration suites
```

--------------------------------

### Concurrency and Injection Tools

Source: https://github.com/lekssays/codebadger/blob/main/docs/available-tools.md

Identifies Time-of-Check to Time-of-Use (TOCTOU) race conditions and OS command injection vulnerabilities.

```text
find_toctou
```

```text
find_command_injection_sinks
```

--------------------------------

### Cleanup and Reset

Source: https://github.com/lekssays/codebadger/blob/main/docs/installation.md

Removes all generated codebases, CPGs, and state from Postgres/Redis, and stops the running Docker Compose services.

```bash
bash cleanup.sh        # clears codebases, CPGs, and Postgres/Redis state
docker compose down    # stop containers
```

--------------------------------

### Run Unit Tests

Source: https://github.com/lekssays/codebadger/blob/main/docs/contributing.md

Execute all unit tests using pytest.

```bash
pytest tests/ -q   # unit tests
```

--------------------------------

### Query Flow with Auto-Wake Diagram

Source: https://github.com/lekssays/codebadger/blob/main/docs/architecture.md

A Mermaid sequence diagram detailing the flow of a query, including cache checks, server spawning, and query execution within the Codebadger system.

```mermaid
sequenceDiagram
    participant C as Client
    participant T as Tool
    participant Q as QueryExecutor
    participant M as JoernServerManager
    participant J as Joern server (per CPG)
    C->>T: run_cpgql_query(hash, query)
    T->>Q: execute(hash, query)
    Q->>Q: cache hit? → return
    Q->>M: get_or_create_client(hash)
    alt server sleeping / absent
        M->>M: plan tier + make room (evict LRU)
        M->>J: spawn + importCpg
    end
    M-->>Q: client
    Q->>J: run query (per-CPG lock, timeout)
    J-->>Q: codebadger_result text
    Q->>Q: cache result
    Q-->>C: structured result
```

--------------------------------

### Claude Desktop / Claude Code MCP Client Configuration

Source: https://github.com/lekssays/codebadger/blob/main/docs/usage.md

Configuration for Claude Desktop or Claude Code to connect to Codebadger via HTTP. This JSON file specifies the MCP server details.

```json
{
  "mcpServers": {
    "codebadger": {
      "url": "http://localhost:4242/mcp",
      "type": "http"
    }
  }
}
```

--------------------------------

### Researcher Workflow Diagram

Source: https://github.com/lekssays/codebadger/blob/main/docs/usage.md

A flowchart illustrating the researcher workflow for analyzing codebases using Codebadger, from generating a CPG to building and validating a Proof of Concept.

```mermaid
flowchart LR
    A[generate_cpg<br/>local path or GitHub URL] --> B{get_cpg_status}
    B -- generating --> B
    B -- ready --> C[Explore<br/>list_methods, get_method_source,<br/>get_call_graph, get_codebase_summary]
    C --> D[Hunt<br/>find_taint_flows, find_use_after_free,<br/>find_integer_overflow, get_program_slice]
    D --> E{Promising flow?}
    E -- no --> C
    E -- yes --> F[Confirm<br/>get_variable_flow, get_cfg,<br/>run_cpgql_query]
    F --> G[Build & validate PoC]
```

--------------------------------

### Scala Path Boundary Regex Helper

Source: https://github.com/lekssays/codebadger/blob/main/docs/custom-tools.md

A Scala helper function to create a regex that anchors a file path to a boundary, ensuring accurate file filtering.

```scala
def pathBoundaryRegex(f: String) = "(^|.*/)" + java.util.regex.Pattern.quote(f) + "$"
```

--------------------------------

### Memory Safety Tools

Source: https://github.com/lekssays/codebadger/blob/main/docs/available-tools.md

These tools detect common memory safety issues such as use-after-free, double-free, null pointer dereferences, heap and stack overflows, and uninitialized reads.

```text
find_use_after_free
```

```text
find_double_free
```

```text
find_null_pointer_deref
```

```text
find_heap_overflow
```

```text
find_stack_overflow
```

```text
find_uninitialized_reads
```

--------------------------------

### Arithmetic and Format String Tools

Source: https://github.com/lekssays/codebadger/blob/main/docs/available-tools.md

Detects integer overflows/underflows that can affect allocations or array indices, and format-string vulnerabilities where non-literals are used in printf-family functions.

```text
find_integer_overflow
```

```text
find_format_string_vulns
```

--------------------------------

### Verify Server Status

Source: https://github.com/lekssays/codebadger/blob/main/docs/installation.md

Checks if the Codebadger server is running by querying the health endpoint and lists the status of Docker Compose services.

```bash
curl -s http://localhost:4242/health | python -m json.tool
docker compose ps
```

--------------------------------

### Discover Fixed Vulnerabilities Tool

Source: https://github.com/lekssays/codebadger/blob/main/docs/available-tools.md

An optional reconnaissance tool that mines Git commit history to identify potential security fixes, providing hints about attack surfaces and past vulnerability patterns.

```text
discover_fixed_vulnerabilities
```

--------------------------------

### Docker Compose Down Commands

Source: https://github.com/lekssays/codebadger/blob/main/docs/deployment.md

Commands to stop and remove Codebadger services managed by Docker Compose. Profiles are necessary for correct teardown.

```bash
docker compose --profile postgres --profile redis ps
docker compose --profile postgres --profile redis down
```

--------------------------------

### Codebadger Citation

Source: https://github.com/lekssays/codebadger/blob/main/README.md

Citation details for the codebadger paper 'Bridging Code Property Graphs and Language Models for Program Analysis'.

```bibtex
@inproceedings{lekssays2026bridging,
  title={Bridging Code Property Graphs and Language Models for Program Analysis},
  author={Lekssays, Ahmed},
  booktitle={Proceedings of the 2026 IEEE/ACM 4th International Workshop on Software Vulnerability Management},
  pages={33--40},
  year={2026}
}
```
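
--------------------------------

### Memory-Aware Admission Sketch (Illustrative)

The memory-aware admission flowchart describes a loop: while the reserved memory plus the new request exceeds the budget, or no port is free, evict the least recently used server and re-check. The Python below is a minimal sketch of that loop only, not Codebadger's implementation. The class `AdmissionPlanner`, its methods, and the megabyte units are hypothetical names chosen for illustration; the real logic lives in the JoernServerManager service and may differ (for example, the RSS backstop is not modeled here).

```python
from collections import OrderedDict


class AdmissionPlanner:
    """Toy model of the admission loop; all names here are hypothetical."""

    def __init__(self, budget_mb: int, ports: list[int]) -> None:
        self.budget_mb = budget_mb
        self.free_ports = list(ports)
        # codebase_hash -> (reserved_mb, port), least recently used first
        self.running: OrderedDict[str, tuple[int, int]] = OrderedDict()

    @property
    def reserved_mb(self) -> int:
        return sum(mb for mb, _ in self.running.values())

    def touch(self, codebase_hash: str) -> None:
        """Mark a running server as most recently used."""
        self.running.move_to_end(codebase_hash)

    def _evict_lru(self) -> str:
        # Drop the least recently used server and give its port back.
        victim, (_, port) = self.running.popitem(last=False)
        self.free_ports.append(port)
        return victim

    def admit(self, codebase_hash: str, need_mb: int) -> list[str]:
        """Reserve memory and a port; return the hashes evicted to make room."""
        if codebase_hash in self.running:
            self.touch(codebase_hash)
            return []
        if need_mb > self.budget_mb:
            raise ValueError("request can never fit in the budget")
        evicted = []
        # Same two checks as the flowchart: memory budget first, then a free port.
        while self.reserved_mb + need_mb > self.budget_mb or not self.free_ports:
            if not self.running:
                raise RuntimeError("no port available and nothing left to evict")
            evicted.append(self._evict_lru())
        self.running[codebase_hash] = (need_mb, self.free_ports.pop(0))
        return evicted


planner = AdmissionPlanner(budget_mb=8, ports=[9000, 9001])
planner.admit("a", 4)
planner.admit("b", 4)
planner.touch("a")              # "b" becomes the LRU victim
print(planner.admit("c", 4))    # evicts "b" to stay within the 8 MB budget
```

Keeping the running servers in an `OrderedDict` makes both the LRU lookup and the "touch on use" update constant-time, which is why this sketch uses it rather than sorting by timestamp.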