Chai is running their own open source leaderboard
(sh.itjust.works)
Community to discuss LLaMA, the large language model created by Meta AI.
This is intended to be a replacement for r/LocalLLaMA on Reddit.
Yeah, it's a step in the right direction at least, though now that you mention it, doesn't lmsys or someone do the same with human eval and side-by-side comparisons?
It's such a tricky line to walk between deterministic questions (repeatable but cheatable) and user questions (real-world but potentially unfair).