### Set up ProgramBench for development

Source: https://github.com/facebookresearch/programbench/blob/main/README.md

Clone the ProgramBench repository and install the development dependencies with `uv sync`. This prepares the project for local development and contribution.

```bash
git clone https://github.com/facebookresearch/programbench.git
cd programbench
uv sync
```

--------------------------------

### Install ProgramBench using uvx

Source: https://github.com/facebookresearch/programbench/blob/main/README.md

Run ProgramBench directly with uvx, without a global installation. This is a quick way to try the command-line interface.

```bash
uvx programbench --help
```

--------------------------------

### Install ProgramBench using pip

Source: https://github.com/facebookresearch/programbench/blob/main/README.md

Install ProgramBench globally or inside a virtual environment with pip, the standard Python package installer.

```bash
pip install programbench
```

--------------------------------

### Run mini-swe-agent baseline with uvx

Source: https://github.com/facebookresearch/programbench/blob/main/README.md

Run the mini-swe-agent baseline for ProgramBench with uvx. The command fetches mini-swe-agent into a temporary environment and invokes its `mini-extra programbench` subcommand; `--help` lists the available options.

```bash
uvx --from mini-swe-agent mini-extra programbench --help
```

--------------------------------

### Install ProgramBench into a project

Source: https://github.com/facebookresearch/programbench/blob/main/README.md

Install ProgramBench into your Python project's environment using uv's pip-compatible interface, so the package is managed per project rather than globally.

```bash
uv pip install programbench
```

--------------------------------

### Run mini-swe-agent baseline with pip

Source: https://github.com/facebookresearch/programbench/blob/main/README.md

Install the mini-swe-agent baseline with pip and run it for ProgramBench. This is an alternative to uvx for accessing the baseline tool.

```bash
pip install mini-swe-agent && mini-extra programbench --help
```

--------------------------------

### Evaluation Output Structure

Source: https://github.com/facebookresearch/programbench/blob/main/docs/README.md

Example of the directory structure after evaluation, showing the generated JSON evaluation files alongside the submission archives.

```text
my-amazing-agent-run
├── abishekvashok__cmatrix.5c082c6
│   ├── submission.tar.gz
│   └── abishekvashok__cmatrix.5c082c6.eval.json
├── agourlay__zip-password-finder.704700d
│   ├── submission.tar.gz
│   └── agourlay__zip-password-finder.704700d.eval.json
├── ajeetdsouza__zoxide.67ca1bc
│   ├── submission.tar.gz
│   └── ajeetdsouza__zoxide.67ca1bc.eval.json
├── ...
```

--------------------------------

### ProgramBench CLI Commands

Source: https://github.com/facebookresearch/programbench/blob/main/CLAUDE.md

Quick commands to manage dependencies, run the program, and execute tests using uv.

```bash
uv sync               # install deps
uv run programbench   # run the CLI
uv run pytest         # run tests
```

--------------------------------

### Pre-download Test Blobs

Source: https://github.com/facebookresearch/programbench/blob/main/docs/README.md

Commands to synchronize test blobs from HuggingFace. Use the first command for all instances or the second for a single instance.

```bash
uv run programbench blob sync
```

```bash
uv run programbench blob sync
```

--------------------------------

### Agent Submission Archive Format

Source: https://github.com/facebookresearch/programbench/blob/main/docs/README.md

Illustrates the required directory structure for agent submissions. Each task directory should contain a `submission.tar.gz` file.

```text
my-amazing-agent-run
├── abishekvashok__cmatrix.5c082c6
│   └── submission.tar.gz
├── agourlay__zip-password-finder.704700d
│   └── submission.tar.gz
├── ajeetdsouza__zoxide.67ca1bc
│   └── submission.tar.gz
├── alecthomas__chroma.8d04def
│   └── submission.tar.gz
├── ...
```

--------------------------------

### Run ProgramBench Evaluation

Source: https://github.com/facebookresearch/programbench/blob/main/docs/README.md

Command to evaluate an agent run with ProgramBench. It pulls the required Docker images automatically.

```bash
uv run programbench eval /path/to/my-amazing-agent-run
```

--------------------------------

### View Evaluation Summary

Source: https://github.com/facebookresearch/programbench/blob/main/docs/README.md

Command to display a summary of all evaluation outputs previously generated and stored in the agent run directory.

```bash
uv run programbench info /path/to/my-amazing-agent-run
```

--------------------------------

### Python Style: Avoid Guarding What Would Fail Anyway

Source: https://github.com/facebookresearch/programbench/blob/main/CLAUDE.md

Skip explicit checks for conditions that would raise an error anyway. The code stays shorter and the failure is still reported at the point where it occurs.

```python
# bad
input = input()
if not "=" in input:
    raise ValueError("Input must be of form a=b")
x, y = input.split("=")

# good
x, y = input().split("=")
```

--------------------------------

### Docker Image for Inference

Source: https://github.com/facebookresearch/programbench/blob/main/docs/README.md

Agent inference uses the Docker image tagged `task_cleanroom`. The image name is formed by replacing `__` in the task name with `_1776_`; the URL below appears to correspond to the task `ffmpeg__ffmpeg.360a402`.

```text
https://hub.docker.com/repository/docker/programbench/ffmpeg_1776_ffmpeg.360a402/tags/task_cleanroom/
```

--------------------------------

### ProgramBench Citation

Source: https://github.com/facebookresearch/programbench/blob/main/README.md

BibTeX entry for citing ProgramBench in academic work and publications.

```bibtex
@misc{yang2026programbenchlanguagemodelsrebuild,
  title={ProgramBench: Can Language Models Rebuild Programs From Scratch?},
  author={John Yang and Kilian Lieret and Jeffrey Ma and Parth Thakkar and Dmitrii Pedchenko and Sten Sootla and Emily McMilin and Pengcheng Yin and Rui Hou and Gabriel Synnaeve and Diyi Yang and Ofir Press},
  year={2026},
  eprint={2605.03546},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2605.03546},
}
```

--------------------------------

### Python Style: Pass Expressions Directly

Source: https://github.com/facebookresearch/programbench/blob/main/CLAUDE.md

Illustrates passing expressions directly to functions instead of initializing variables first.

```python
# bad
a = func()
Class(a)

# good
Class(func())
```

--------------------------------

### Test Style: Concise Assertions

Source: https://github.com/facebookresearch/programbench/blob/main/CLAUDE.md

Shows how to write concise assertions in tests by directly asserting the result of a function call.

```python
# bad
result = func()
assert result == b

# good
assert func() == b
```

--------------------------------

### ProgramBench JSON Output Structure

Source: https://github.com/facebookresearch/programbench/blob/main/docs/README.md

This JSON structure represents the output of a ProgramBench run, detailing test results, execution logs, and metadata. It is used to analyze the performance and correctness of code submissions.

```json
{
  "test_results": [
    {
      "name": "tests.test_foo.test_passes",
      "branch": "abc123def456",
      "status": "passed",
      "extra": { "time": 0.002 }
    },
    {
      "name": "tests.test_foo.test_fails",
      "branch": "abc123def456",
      "status": "failure",
      "extra": {
        "time": 0.008,
        "message": "AssertionError: expected 'X' but got 'Y'",
        "text": "executable_path=/workspace/build/foo ..."
      }
    }
  ],
  "error_code": null,
  "error_details": null,
  "log": [
    ...
    {
      "step": "results_read",
      "branch": "abc123def456",
      "command": "cat eval/results.xml",
      "wall_time": 0.071,
      "output": "...",
      "returncode": 0,
      "exception_info": ""
    }
  ],
  "solution_branch": "submission",
  "test_branches": ["abc123def456", "fedcba654321"],
  "test_branch_errors": {},
  "executable_hash": "980ff4f78ca130cedceaa42cec78431184827154fbc4ef95d2df5c8fee948186",
  "warnings": []
}
```
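As a minimal sketch of how the `test_results` array in the JSON output might be consumed, the following Python tallies results by `status` and lists the tests that did not pass. The helper names (`summarize_eval`, `failing_tests`, `summarize_run`) are hypothetical and not part of the ProgramBench CLI. The only fields assumed are `test_results`, `name`, and `status`, as they appear in the sample JSON, and the `*.eval.json` file naming follows the evaluation output tree. Treating every status other than `"passed"` as non-passing is also an assumption.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_eval(eval_data: dict) -> Counter:
    """Count test results by their "status" field."""
    return Counter(result["status"] for result in eval_data["test_results"])


def failing_tests(eval_data: dict) -> list[str]:
    """Return the names of tests whose status is anything other than "passed".

    Assumption: any non-"passed" status counts as not passing.
    """
    return [
        result["name"]
        for result in eval_data["test_results"]
        if result["status"] != "passed"
    ]


def summarize_run(run_dir: str) -> dict[str, Counter]:
    """Summarize every *.eval.json file found under a run directory.

    Assumption: evaluation files sit somewhere below run_dir and end in
    ".eval.json", as in the evaluation output tree.
    """
    return {
        path.name: summarize_eval(json.loads(path.read_text()))
        for path in sorted(Path(run_dir).rglob("*.eval.json"))
    }


if __name__ == "__main__":
    # Trimmed-down version of the sample JSON, keeping only the fields used here.
    sample = {
        "test_results": [
            {"name": "tests.test_foo.test_passes", "status": "passed"},
            {"name": "tests.test_foo.test_fails", "status": "failure"},
        ]
    }
    print(summarize_eval(sample))
    print(failing_tests(sample))
```

Calling `summarize_run("/path/to/my-amazing-agent-run")` would return one `Counter` per evaluation file. For the built-in summary, `uv run programbench info` remains the documented route.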