🌐 Kimi Moonshot · Significant · Qiuyang Mang
FrontierCS Harbor: A Benchmark for Long-Horizon Coding Agent Evaluation
Qiuyang Mang announces the integration of the FrontierCS benchmark into the Harbor evaluation platform and the release of a preview long-horizon agent leaderboard. The benchmark tests coding agents over extended interactions (up to 835 turn…