ScienceAgentBench is a benchmark for rigorously evaluating language agents on data-driven scientific...

Tokens: 6,366
Snippets: 72
Trust Score: 9.2
License: MIT
Update: 1 month ago