Apple study exposes deep cracks in LLMs’ “reasoning” capabilities
(arstechnica.com)
And sometimes even really simple ones.
How many w's are in "Howard likes strawberries"? It would be awesome to know!
So I keep seeing people reference this... and I found it curious that LLMs would have problems with it. So I asked them... several of them...
Outside of this image... Codestral (my default) actually got it correct and didn't talk itself out of being correct... But that's no fun, so I asked 5 others at once.
What's sad is that Dolphin Mixtral is a 26.44GB model...
Gemma 2 is the 5.44GB variant
Gemma 2B is the 1.63GB variant
LLaVa Llama3 is the 5.55GB variant
Mistral is the 4.11GB variant
So I asked Codestral again, because why not! And this time it talked itself out of being correct...
Edit: fixed newline formatting.
LOL 😆😅! I totally made it up! And it worked! So maybe it's not just R's that it has trouble counting. It's any letter at all.
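For anyone who wants the ground truth to compare the model answers against, here's a quick sketch in plain Python. The helper name `count_letter` is just something I made up for illustration, and I'm assuming the count should be case-insensitive (so a capital "W" would count as a "w").

```python
def count_letter(text: str, letter: str) -> int:
    """Count occurrences of a single letter in text, ignoring case.

    Hypothetical helper for illustration; case-insensitivity is an assumption.
    """
    return text.lower().count(letter.lower())


sentence = "Howard likes strawberries"

# One "w" in "Howard" plus one in "strawberries"
print(count_letter(sentence, "w"))  # 2

# One "r" in "Howard" plus three in "strawberries"
print(count_letter(sentence, "r"))  # 4
```

So the answer to the made-up question is 2, which makes it easy to see which of the models above actually got it right.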