submitted 6 days ago* (last edited 6 days ago) by BB84@mander.xyz to c/localllama@sh.itjust.works

Absolutely humongous model: a mixture of 256 experts, with 8 activated per token.
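For anyone wondering what "8 of 256 experts" looks like in practice, here is a rough sketch of top-k routing for a single token. It is only an illustration: the random router scores, the `route_token` helper, and the softmax over the selected experts are assumptions for the example, not DeepSeek's actual gating code. Only the 256 and 8 come from the post.

```python
import math
import random

NUM_EXPERTS = 256  # number of experts, per the post
TOP_K = 8          # experts activated at a time, per the post


def route_token(scores, top_k=TOP_K):
    """Hypothetical helper: pick the top_k experts and normalize their weights."""
    # Indices of the top_k highest router scores.
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax over only the chosen scores (an assumption; real gating may differ).
    peak = max(scores[i] for i in chosen)
    exps = {i: math.exp(scores[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


# Stand-in router scores; a real model computes these from the token's hidden state.
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route_token(scores)
print(len(weights), "of", NUM_EXPERTS, "experts active")
```

The point is that only the selected experts run for a given token, so the compute per token is a small fraction of what the total parameter count suggests, even though all experts still have to fit in memory.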

Aider leaderboard: The only model above 🐋 v3 here is ~~Open~~AI o1. DeepSeek is known to make amazing models and Aider rotates their benchmark over time, so it is unlikely that this is a train-on-benchmark situation.

Some more benchmarks: on Reddit.

xodoh74984@lemmy.world 5 points 4 days ago* (last edited 4 days ago)

For the user whose VRAM knob goes to 11

BB84@mander.xyz 3 points 4 days ago

Someone managed to run it on a cluster of Mac Minis lol https://blog.exolabs.net/day-2/

this post was submitted on 26 Dec 2024