[-] pglpm@lemmy.ca 4 points 2 months ago* (last edited 2 months ago)

Just wanted to applaud the fact that you've come here asking people, rather than asking some large language model.

[-] pglpm@lemmy.ca 4 points 4 months ago

I didn't know about !matrix, cheers!!

[-] pglpm@lemmy.ca 4 points 1 year ago* (last edited 1 year ago)

I'd like to add one more layer to this great explanation.

Usually, this kind of prediction should be made in two steps:

  1. calculate the conditional probability of the next word (given the data), for all possible candidate words;

  2. choose one word among these candidates.

The choice in step 2 should be determined, in principle, by two factors: (a) the probability of a candidate, and (b) the cost or gain of making the wrong or right choice if that candidate is chosen. There's a trade-off between these two factors. For example, a candidate might have low probability, but also be a safe choice, in the sense that if it's the wrong choice no big problems arise – so it's the best choice. Or a candidate might have high probability, but terrible consequences if it were the wrong choice – so it's better to discard it in favour of something less likely but also less risky.
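A minimal Python sketch of this trade-off, under made-up numbers: the candidate words, their probabilities, and the gain/cost figures are all hypothetical, chosen only to show a less probable candidate winning because it is the less risky one. Step 2 is implemented as maximizing expected utility, the standard decision-theory rule.

```python
# Step 1 (assumed already done): conditional probability of each candidate word.
# All numbers below are hypothetical, for illustration only.
probs = {"bold": 0.6, "mild": 0.3, "vague": 0.1}

# Hypothetical gain if the chosen word turns out right,
# and cost if it turns out wrong.
gain_if_right = {"bold": 1.0, "mild": 1.0, "vague": 1.0}
cost_if_wrong = {"bold": 10.0, "mild": 0.1, "vague": 0.5}


def expected_utility(candidate):
    """Probability-weighted gain minus probability-weighted cost."""
    p_right = probs[candidate]
    return (p_right * gain_if_right[candidate]
            - (1.0 - p_right) * cost_if_wrong[candidate])


def choose():
    """Step 2: pick the candidate with the highest expected utility."""
    return max(probs, key=expected_utility)


most_probable = max(probs, key=probs.get)
best_choice = choose()
print(most_probable, best_choice)
```

Here the most probable word ("bold") is not the one chosen: being wrong with it is so costly that the less likely but safer "mild" has the higher expected utility.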

This is all common sense, but it's also at the foundation of the theory behind this (Decision Theory).

The proper calculation of steps 1 and 2 together, according to fundamental rules (probability calculus & decision theory), would be enormously expensive. So expensive that something like ChatGPT would be impossible: we'd have to wait for centuries (just a guess: could be decades or millennia) to train it, and then to get an answer. This is why Large Language Models make several approximations, which obviously can have serious drawbacks:

  • they use extremely simplified cost/gain figures – in fact, from what I gather, the researchers don't have any clear idea of what they are;

  • they directly combine the simplified cost/gain figures with probabilities;

  • they search for the candidate with the highest gain+probability combination, but stop as soon as they find a relatively high one – at the risk of missing the one that was actually the real maximum.
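The risk in the last point can be sketched in a few lines of Python. This only illustrates the "stop at the first good-enough candidate" idea as described above; it is not a description of how any actual language model searches, and the scores, threshold, and function names are hypothetical.

```python
# Hypothetical combined gain+probability scores, in the order
# in which the candidates happen to be examined.
scores = {"a": 0.2, "b": 0.7, "c": 0.9}


def exhaustive_best(scores):
    """Examine every candidate and return the true maximum."""
    return max(scores, key=scores.get)


def first_good_enough(scores, threshold):
    """Stop at the first candidate whose score reaches the threshold."""
    for candidate, score in scores.items():
        if score >= threshold:
            return candidate
    # Nothing reached the threshold: fall back to the full search.
    return exhaustive_best(scores)


print(first_good_enough(scores, threshold=0.6))
print(exhaustive_best(scores))
```

The early-stopping search returns "b" because it is the first candidate above the threshold, and never looks at "c", which is the real maximum.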

 

(Sorry if this comment has a lecturing tone – it's not meant to. But I think that the theory behind these algorithms can actually be explained in very common-sense terms, without too much technobabble, as @TheChurn's comment showed.)

[-] pglpm@lemmy.ca 4 points 1 year ago

Fantastic, thank you!

[-] pglpm@lemmy.ca 4 points 1 year ago

Math requires insight that a language model cannot possess

Amen to that! Good maths & science teachers have struggled for decades (if not centuries) so that students understand what they're doing and don't simply give answers based on some words or symbols they see in questions (there are also bad teachers who promote exactly this), because on closer inspection such answers always collapse. And now along comes ChatGPT, which does exactly that – and collapses in the same way – and gets glorified.

Amen to what you say on infographic content as well 😂

[-] pglpm@lemmy.ca 4 points 1 year ago

Funny, note that that website uses DRM content. I have DRM disabled on Firefox and when I visit that site I get two DRM warnings.

[-] pglpm@lemmy.ca 3 points 1 year ago

Yeah this was one of the best episodes with King!

[-] pglpm@lemmy.ca 3 points 1 year ago

Glorious! 🤣 🤣 🤣

[-] pglpm@lemmy.ca 4 points 1 year ago* (last edited 1 year ago)

You're simplifying the situation and dynamics of science too much.

If you submit or share a work that contains a logical or experimental error – it says "2+2=5" somewhere – then yes, your work is not accepted, it's wrong, and you should discard it too.

But many works have no (visible) logical flaws and present hypotheses within current experimental errors. They explore or propose, or start from, alternative theses. They may be pursued and considered by a minority, even a very small one, while the majority pursues something else. But this doesn't make them "rejected". In fact, theories followed by minorities periodically have breakthroughs and suddenly win the majority. This is a vital part of scientific progress. Except in the "2+2=5" case, it's a matter of majority/minority, but that emphatically does not mean acceptance/rejection.

On top of that, the relationship between "truth" and "majority" is even more fascinatingly complex. Let me give you an example.

Probably (this is just statistics from personal experience) the vast majority of physicists would tell you that "energy is conserved". A physicist specialized in general relativity, however, would point out that there's a difference between a conserved quantity (somewhat like a fluid) and a balanced quantity. And energy, strictly speaking, is balanced, not conserved. This fact, however, creates no tension: if you have a simple conversation – 30 min or a couple of hours – with a physicist who stated that "energy is conserved", and you explain the precise difference, show the equations, examine references together etc., that physicist will understand the clarification and simply agree; no biggie. In the situations where that physicist works, this results in little practical difference (though obviously there are situations where the difference is important).

A guided tour through general relativity (see this discussion by Baez as a starting point, for example) will also convince a physicist who still insisted that energy is conserved even after the balance vs conservation difference was clarified. With energy, either "conservation" makes no sense, or if we want to force a sense, then it's false. (I myself have been on both sides of this dialogue.)
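For concreteness, one common way to write the conserved-vs-balanced distinction is sketched below. The symbols (density ρ, flux j, supply s) and the sign and index conventions are my assumptions, not taken from the references mentioned above.

```latex
% Conserved quantity: density \rho and flux \mathbf{j}, no supply term
\partial_t \rho + \nabla \cdot \mathbf{j} = 0

% Balanced quantity: a supply (source/sink) term s appears
\partial_t \rho + \nabla \cdot \mathbf{j} = s

% General relativity: for a symmetric energy-momentum tensor T^{\mu\nu},
% the covariant statement \nabla_\mu T^{\mu\nu} = 0 is equivalent to
\partial_\mu \bigl( \sqrt{-g}\, T^{\mu\nu} \bigr)
  = - \sqrt{-g}\, \Gamma^{\nu}{}_{\mu\lambda}\, T^{\mu\lambda}
```

In the last line the right-hand side plays the role of a supply term, so in a general spacetime the equation has the form of a balance law rather than a conservation law.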

This shows a paradoxical situation: the majority may state something that's actually not true – but the majority itself would simply agree with this, if given the chance! This paradoxical discrepancy arises especially today owing to specialization and too little or too slow osmosis among the different specialities, plus excessive simplification in postgraduate education (they present approximate facts as exact). Large groups maintain some statements as facts simply because the more correct point of view is too slow to spread through their community. The energy claim is one example; there are others (thermodynamics and quantum theory have plenty). I think every physicist working in a specialized field is aware of a couple of such majority-vs-truth discrepancies. And this teaches humility, openness to reviewing one's beliefs, and reliance on logic, not "majorities".

Edit: a beautiful book by O'Connor & Weatherall, The Misinformation Age: How False Beliefs Spread, discusses this phenomenon and models of this phenomenon.

[-] pglpm@lemmy.ca 4 points 2 years ago

I get what you're saying. From my point of view we're just playing on the semantics of "service" and "app" here. I indeed had the same problem with Google and Hangouts.

