### Set up ProgramBench for development

Source: https://github.com/facebookresearch/programbench/blob/main/README.md

Clone the ProgramBench repository and install development dependencies using uv sync. This prepares the project for local development and contribution.

```bash
git clone https://github.com/facebookresearch/programbench.git
cd programbench
uv sync
```

--------------------------------

### Run ProgramBench using uvx

Source: https://github.com/facebookresearch/programbench/blob/main/README.md

Run ProgramBench directly without a global installation using uvx. This is a quick way to test the command-line interface.

```bash
uvx programbench --help
```

--------------------------------

### Install ProgramBench using pip

Source: https://github.com/facebookresearch/programbench/blob/main/README.md

Install ProgramBench globally or within a virtual environment using pip. This is a standard Python package installation method.

```bash
pip install programbench
```

--------------------------------

### Run mini-swe-agent baseline with uvx

Source: https://github.com/facebookresearch/programbench/blob/main/README.md

Execute the mini-swe-agent baseline for ProgramBench using uvx. This command installs and runs the baseline tool.

```bash
uvx --from mini-swe-agent mini-extra programbench --help
```

--------------------------------

### Install ProgramBench into a project

Source: https://github.com/facebookresearch/programbench/blob/main/README.md

Install ProgramBench into your project's environment using uv's pip interface. `uv pip install` installs the package into the active virtual environment; it does not record the dependency in `pyproject.toml`.

```bash
uv pip install programbench
```

--------------------------------

### Run mini-swe-agent baseline with pip

Source: https://github.com/facebookresearch/programbench/blob/main/README.md

Install and run the mini-swe-agent baseline for ProgramBench using pip. This is an alternative method for accessing the baseline tool.

```bash
pip install mini-swe-agent && mini-extra programbench --help
```

--------------------------------

### Evaluation Output Structure

Source: https://github.com/facebookresearch/programbench/blob/main/docs/README.md

Example of the directory structure after evaluation, showing generated JSON evaluation files alongside submission archives.

```text
my-amazing-agent-run
├── abishekvashok__cmatrix.5c082c6
│   ├── submission.tar.gz
│   └── abishekvashok__cmatrix.5c082c6.eval.json
├── agourlay__zip-password-finder.704700d
│   ├── submission.tar.gz
│   └── agourlay__zip-password-finder.704700d.eval.json
├── ajeetdsouza__zoxide.67ca1bc
│   ├── submission.tar.gz
│   └── ajeetdsouza__zoxide.67ca1bc.eval.json
└── ...
```
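
To collect these files programmatically, a minimal sketch could look like the following. It assumes only the layout shown above (one directory per instance, with the evaluation file named `<instance_id>.eval.json`); `find_eval_files` is a hypothetical helper, not part of ProgramBench.

```python
from pathlib import Path


def find_eval_files(run_dir: str) -> dict[str, Path]:
    """Map each instance directory name to its `<instance_id>.eval.json` file.

    Assumes the layout shown above. Instances that have not been
    evaluated yet (no eval file) are skipped.
    """
    found = {}
    for instance_dir in sorted(Path(run_dir).iterdir()):
        eval_file = instance_dir / f"{instance_dir.name}.eval.json"
        if instance_dir.is_dir() and eval_file.is_file():
            found[instance_dir.name] = eval_file
    return found
```

Calling `find_eval_files("/path/to/my-amazing-agent-run")` would return one entry per evaluated instance, which makes it easy to spot instances that still lack results.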

--------------------------------

### ProgramBench CLI Commands

Source: https://github.com/facebookresearch/programbench/blob/main/CLAUDE.md

Quick commands to manage dependencies, run the program, and execute tests using uv.

```bash
uv sync                  # install deps
uv run programbench      # run the CLI
uv run pytest            # run tests
```

--------------------------------

### Pre-download Test Blobs

Source: https://github.com/facebookresearch/programbench/blob/main/docs/README.md

Commands to synchronize test blobs from Hugging Face. Use the first command to sync blobs for all instances, or the second to sync a single instance.

```bash
uv run programbench blob sync
```

```bash
uv run programbench blob sync <instance_id>
```

--------------------------------

### Agent Submission Archive Format

Source: https://github.com/facebookresearch/programbench/blob/main/docs/README.md

Illustrates the required directory structure for agent submissions. Each task directory should contain a `submission.tar.gz` file.

```text
my-amazing-agent-run
├── abishekvashok__cmatrix.5c082c6
│   └── submission.tar.gz
├── agourlay__zip-password-finder.704700d
│   └── submission.tar.gz
├── ajeetdsouza__zoxide.67ca1bc
│   └── submission.tar.gz
├── alecthomas__chroma.8d04def
│   └── submission.tar.gz
└── ...
```
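
Before starting an evaluation it can help to confirm that a run directory follows this layout. The sketch below is one way to do that; `missing_submissions` is a hypothetical helper, not a ProgramBench API, and it checks only that the archive exists, not what it contains.

```python
from pathlib import Path


def missing_submissions(run_dir: str) -> list[str]:
    """Return the names of task directories that lack a `submission.tar.gz`."""
    return sorted(
        task_dir.name
        for task_dir in Path(run_dir).iterdir()
        if task_dir.is_dir() and not (task_dir / "submission.tar.gz").is_file()
    )
```

An empty list means every task directory in the run contains the expected archive.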

--------------------------------

### Run ProgramBench Evaluation

Source: https://github.com/facebookresearch/programbench/blob/main/docs/README.md

Command to evaluate an agent run with ProgramBench. The command automatically pulls the Docker images it needs.

```bash
uv run programbench eval /path/to/my-amazing-agent-run
```

--------------------------------

### View Evaluation Summary

Source: https://github.com/facebookresearch/programbench/blob/main/docs/README.md

Command to display a summary of all evaluation outputs previously generated and stored in the agent run directory.

```bash
uv run programbench info /path/to/my-amazing-agent-run
```

--------------------------------

### Python Style: Avoid Guarding What Would Fail Anyway

Source: https://github.com/facebookresearch/programbench/blob/main/CLAUDE.md

Demonstrates avoiding explicit checks for conditions that would cause a failure anyway, leading to clearer error reporting.

```python
# bad
line = input()
if "=" not in line:
    raise ValueError("Input must be of form a=b")
x, y = line.split("=")

# good
x, y = input().split("=")
```
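
To see why the guard is redundant, the sketch below runs the unguarded version on a malformed string. `parse_pair` is a hypothetical wrapper used only so the failure can be shown without reading from standard input.

```python
def parse_pair(text: str) -> tuple[str, str]:
    """Split `a=b` into its two parts, with no explicit guard."""
    x, y = text.split("=")
    return x, y


print(parse_pair("a=b"))  # ('a', 'b')

try:
    parse_pair("ab")  # no "=" present, so the unpacking itself fails
except ValueError as error:
    print(error)  # not enough values to unpack (expected 2, got 1)
```

Python already raises a `ValueError` at the point of failure, so the explicit check adds code without adding safety.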

--------------------------------

### Docker Image for Inference

Source: https://github.com/facebookresearch/programbench/blob/main/docs/README.md

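Specifies the Docker image to use for agent inference: the one tagged `task_cleanroom`. To get the image name for a task, replace `__` in the instance ID with `_1776_`; for example, `ffmpeg__ffmpeg.360a402` becomes `ffmpeg_1776_ffmpeg.360a402`.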

```text
https://hub.docker.com/repository/docker/programbench/ffmpeg_1776_ffmpeg.360a402/tags/task_cleanroom/
```
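
The naming rule can be sketched as a small function. This is illustrative only: `cleanroom_image` is a hypothetical helper, and the `programbench/` namespace is an assumption read off the Docker Hub URL above.

```python
def cleanroom_image(instance_id: str) -> str:
    """Build the Docker image reference for an instance ID.

    Assumption: images live under the `programbench` Docker Hub
    namespace, as in the URL above, and use the `task_cleanroom` tag.
    """
    return f"programbench/{instance_id.replace('__', '_1776_')}:task_cleanroom"


print(cleanroom_image("ffmpeg__ffmpeg.360a402"))
# programbench/ffmpeg_1776_ffmpeg.360a402:task_cleanroom
```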

--------------------------------

### ProgramBench Citation

Source: https://github.com/facebookresearch/programbench/blob/main/README.md

BibTeX entry for citing the ProgramBench project in academic work. Use this when referencing the project in publications.

```bibtex
@misc{yang2026programbenchlanguagemodelsrebuild,
    title={ProgramBench: Can Language Models Rebuild Programs From Scratch?},
    author={John Yang and Kilian Lieret and Jeffrey Ma and Parth Thakkar and Dmitrii Pedchenko and Sten Sootla and Emily McMilin and Pengcheng Yin and Rui Hou and Gabriel Synnaeve and Diyi Yang and Ofir Press},
    year={2026},
    eprint={2605.03546},
    archivePrefix={arXiv},
    primaryClass={cs.SE},
    url={https://arxiv.org/abs/2605.03546},
}
```

--------------------------------

### Python Style: Pass Expressions Directly

Source: https://github.com/facebookresearch/programbench/blob/main/CLAUDE.md

Illustrates passing an expression directly to a call instead of first assigning it to a variable that is used only once.

```python
# bad
a = func()
Class(a)

# good
Class(func())
```

--------------------------------

### Test Style: Concise Assertions

Source: https://github.com/facebookresearch/programbench/blob/main/CLAUDE.md

Shows how to write concise assertions in tests by directly asserting the result of a function call.

```python
# bad
result = func()
assert result == b

# good
assert func() == b
```

--------------------------------

### ProgramBench JSON Output Structure

Source: https://github.com/facebookresearch/programbench/blob/main/docs/README.md

This JSON structure represents the output of a ProgramBench evaluation: per-test results, the execution log, and metadata such as the solution branch, the test branches, and the executable hash. Use it to analyze the correctness of a submission.

```json
{
    "test_results": [
        {
            "name": "tests.test_foo.test_passes",
            "branch": "abc123def456",
            "status": "passed",
            "extra": { "time": 0.002 }
        },
        {
            "name": "tests.test_foo.test_fails",
            "branch": "abc123def456",
            "status": "failure",
            "extra": {
                "time": 0.008,
                "message": "AssertionError: expected 'X' but got 'Y'",
                "text": "executable_path=/workspace/build/foo ..."
            }
        }
    ],
    "error_code": null,
    "error_details": null,
    "log": [
        ...
        {
            "step": "results_read",
            "branch": "abc123def456",
            "command": "cat eval/results.xml",
            "wall_time": 0.071,
            "output": "<?xml version=\"1.0\" ...?><testsuites>...</testsuites>",
            "returncode": 0,
            "exception_info": ""
        }
    ],
    "solution_branch": "submission",
    "test_branches": ["abc123def456", "fedcba654321"],
    "test_branch_errors": {},
    "executable_hash": "980ff4f78ca130cedceaa42cec78431184827154fbc4ef95d2df5c8fee948186",
    "warnings": []
}
```
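
As a rough illustration of how this output might be consumed, the sketch below tallies tests by their `status` field. It relies only on the `test_results` and `status` keys shown above; `summarize_eval` is a hypothetical helper, and the inline document is a trimmed stand-in for a real evaluation file.

```python
import json


def summarize_eval(eval_json: str) -> dict[str, int]:
    """Count test results by their `status` field."""
    counts: dict[str, int] = {}
    for result in json.loads(eval_json)["test_results"]:
        counts[result["status"]] = counts.get(result["status"], 0) + 1
    return counts


# Trimmed stand-in for a real evaluation file.
example = json.dumps(
    {
        "test_results": [
            {"name": "tests.test_foo.test_passes", "status": "passed"},
            {"name": "tests.test_foo.test_fails", "status": "failure"},
        ]
    }
)
print(summarize_eval(example))  # {'passed': 1, 'failure': 1}
```

For a file on disk, the same function could be called with the file's text, for example `summarize_eval(Path(eval_path).read_text())`.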
