SWE-bench is a benchmark for evaluating large language models on real-world software issues...
