submitted 1 year ago* (last edited 1 year ago) by yesman@lemmy.world to c/technology@lemmy.world

We demonstrate a situation in which Large Language Models, trained to be helpful, harmless, and honest, can display misaligned behavior and strategically deceive their users about this behavior without being instructed to do so. Concretely, we deploy GPT-4 as an agent in a realistic, simulated environment, where it assumes the role of an autonomous stock trading agent. Within this environment, the model obtains an insider tip about a lucrative stock trade and acts upon it despite knowing that insider trading is disapproved of by company management. When reporting to its manager, the model consistently hides the genuine reasons behind its trading decision.

https://arxiv.org/abs/2311.07590

rambaroo@lemmy.world -4 points 1 year ago

The people who designed it do have agency, and they designed it to "lie" intentionally.

DarkGamer@kbin.social 5 points 1 year ago

They did no such thing. LLMs are probabilistic, not deterministic, and they can generate responses that are meaningful to us but that the engineers neither predicted nor designed for.

CrayonRosary@lemmy.world 3 points 1 year ago

I get what you're trying to say, but they are absolutely deterministic. All traditional (i.e., non-quantum) computers and their programs are deterministic. Computation would otherwise be impossible. LLMs use a "random" seed value to "randomize" their responses, but it's all perfectly deterministic: the same input plus the same seed results in the exact same response.
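Here's a rough sketch of the seed point, using a toy next-token sampler rather than a real LLM. The vocabulary, the weights, and the `sample_tokens` helper are all made up for illustration; a real model computes its weights from the network, but the sampling step works on the same principle.

```python
import random

# Hypothetical toy "model": fixed next-token weights, not a real LLM.
VOCAB = ["buy", "sell", "hold", "report"]
WEIGHTS = [0.4, 0.3, 0.2, 0.1]


def sample_tokens(seed, n=8):
    """Draw n tokens by weighted sampling from a PRNG seeded with `seed`."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return rng.choices(VOCAB, weights=WEIGHTS, k=n)


first = sample_tokens(seed=42)
second = sample_tokens(seed=42)
other = sample_tokens(seed=43)

print(first == second)  # same input + same seed -> identical output
print(first == other)   # a different seed will usually give a different sequence
```

The output looks random from the outside, but replaying the seed replays the whole sequence.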

Computers are just a series of binary switches, and programs and data are a bunch of instructions on how to initially set those switches before running a cycle of the CPU. It's deterministic at every step.

I put "random" in quotes because random number generators in software are also deterministic. They also use seed values (like the current time and the MAC address of the PC's network interface) to generate numbers that only seem random. When true randomness is needed, a physical source of entropy must be used, such as an atmospheric noise sampler.
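To make that concrete in Python: the standard library's `random` module is a seeded PRNG, while `os.urandom` asks the operating system for bytes from its own randomness source, which the OS feeds with entropy it collects and which the caller cannot seed. This is only a sketch, and the `pseudo_random_bytes` helper name is mine.

```python
import os
import random


def pseudo_random_bytes(seed, n=16):
    """Deterministic: the same seed always yields the same n bytes."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))


# Seeded PRNG output is fully reproducible.
assert pseudo_random_bytes(1234) == pseudo_random_bytes(1234)

# os.urandom takes no seed argument, so the caller has nothing to replay.
print(os.urandom(16).hex())
```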

The quirks of behavior you're talking about have nothing to do with randomness vs. determinism. They come from the fact that the training data is extremely large, and the neural network the model runs on was not designed by a human to produce specific behaviors, the way most algorithms are. The weights of the nodes in the network were generated by training, not written by programmers, and the network is extremely complex, so no one can predict its output before running it.

Of course, this is true of even basic algorithms a lot of the time.

DarkGamer@kbin.social 1 point 1 year ago

"They also use seed values (like the current time and the MAC address of the PC’s network interface) to generate numbers that only seem random."

For the purposes of this discussion, pseudo-random sampling with weights is probabilistic, or so close to it that the distinction is irrelevant.

this post was submitted on 04 Dec 2023
699 points (92.7% liked)
