submitted 5 days ago* (last edited 5 days ago) by hendrik@palaver.p3x.de to c/localllama@sh.itjust.works

Seems Meta has been doing some research lately to replace the current tokenizers with new/different representations:

stsquad@lemmy.ml 2 points 5 days ago

Does this use the same attention architecture as models with traditional tokenisation? As far as I understood it, each token has a bunch of meaning associated with it, encoded in a vector.
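
For illustration, here's a minimal sketch (assuming a standard PyTorch-style setup, not code from either paper) of the "traditional" side of that question: each token id is looked up in an embedding table, and self-attention mixes those per-token vectors.

```python
import torch
import torch.nn as nn

# Illustrative sizes, not taken from any specific model.
vocab_size, d_model = 32000, 512
embed = nn.Embedding(vocab_size, d_model)   # one learned vector per token id
attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)

token_ids = torch.tensor([[17, 942, 5]])    # a short sequence of token ids
x = embed(token_ids)                        # (1, 3, 512): the per-token "meaning" vectors
out, _ = attn(x, x, x)                      # self-attention over those token vectors
```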

hendrik@palaver.p3x.de 2 points 5 days ago* (last edited 5 days ago)

Uh, I'm not sure. I haven't had time to read those papers yet. I suppose the Byte Latent Transformer does; it's still some kind of transformer architecture. With the Large Concept Models, I'm not so sure. They encode whole sentences, and the researchers explore something like three different (diffusion) architectures. The paper calls itself a "proof of feasibility", so it's more basic research into that approach than one single/specific model architecture.
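
To make the difference in input granularity concrete, here's a rough sketch (my own illustration, not code from either paper; the real BLT groups bytes into patches dynamically by entropy, and the LCM work uses a real sentence encoder such as SONAR rather than the stand-in below):

```python
import hashlib

text = "Meta is replacing tokenizers."

# Byte Latent Transformer style: the model consumes raw bytes grouped into
# patches (fixed-size here purely for illustration).
byte_ids = list(text.encode("utf-8"))
patch_size = 4
patches = [byte_ids[i:i + patch_size] for i in range(0, len(byte_ids), patch_size)]

# Large Concept Model style: the model consumes one embedding per sentence
# ("concept"); this hash-based vector is just a placeholder for a real
# sentence encoder.
def fake_sentence_embedding(sentence: str, dim: int = 8) -> list[float]:
    digest = hashlib.sha256(sentence.encode("utf-8")).digest()
    return [b / 255 for b in digest[:dim]]

sentences = [s for s in text.split(". ") if s]
concept_vectors = [fake_sentence_embedding(s) for s in sentences]

print(len(byte_ids), "bytes ->", len(patches), "byte patches")
print(len(sentences), "sentence(s) ->", len(concept_vectors), "concept vector(s)")
```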
