A benchmarking framework for evaluating large language models on ARC-AGI pattern recognition tasks,...

Tokens: 38,839
Snippets: 467
Trust Score: 7.2
License: MIT
Update: 2 weeks ago