JudgeBench is a comprehensive benchmark for evaluating LLM-based judges, assessing how well language...

Tokens: 10,389
Snippets: 60
Trust Score: 4.1
Update: 1 month ago