VisualWebArena is a realistic and diverse benchmark for evaluating multimodal autonomous language...

Tokens: 12,016
Snippets: 91
Trust Score: 7.5
License: MIT
Update: 2 months ago