ProgramBench evaluates LM-based SWE-agents' ability to reverse-engineer black-box software systems....

Tokens: 2,901
Snippets: 44
Trust Score: 9.5
License: MIT
Update: 3 weeks ago