🤗 HuggingFace · SignificantNathan
Claw-Eval: The Real-World AI Agent Benchmark Challenging Traditional Leaderboards
The Claw-Eval benchmark, released on HuggingFace, evaluates AI models on real-world agent tasks across PinchBench, OfficeQA, OneMillion-Bench, Finance Agent, and Terminal-Bench 2.0. Xiaomi MiMo-V2.5-Pro (1T params) ranked #1,…