Fine-Tuning LLMs for ‘Good’ Behavior Makes Them More Likely to Say No
When people make sweeping statements like “language models are bullshit machines” or “ChatGPT lies,” it usually tells me they’re not seriously engaged in any kind of AI/ML work or productive discourse in this space.
First, because saying a machine “lies” or “bullshits” implies motivated intent in a social context, which language models don’t have. …
Humanity's Last Exam

Because these vectors are built from the way humans use words, they end up reflecting many of the biases that are present in human language. For example, in some word vector models, doctor minus man plus woman yields nurse. Mitigating biases like this is an area of active research.
Timothy B Lee • Large language models, explained with a minimum of math and jargon
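To make the doctor − man + woman arithmetic concrete, here is a minimal sketch of the analogy computation. The four-dimensional vectors below are hand-picked for illustration only; real embedding models learn vectors with hundreds of dimensions from large corpora, so the mechanics carry over but the numbers do not.

```python
# Minimal sketch of word-vector analogy arithmetic.
# The vectors here are hypothetical toy values, not learned embeddings.
import numpy as np

vectors = {
    "doctor": np.array([0.9, 0.7, 0.1, 0.3]),
    "man":    np.array([0.1, 0.8, 0.0, 0.1]),
    "woman":  np.array([0.1, 0.1, 0.8, 0.1]),
    "nurse":  np.array([0.9, 0.0, 0.9, 0.3]),
    "banana": np.array([0.0, 0.1, 0.1, 0.9]),
}

def cosine(a, b):
    # Cosine similarity: how closely two vectors point in the same direction.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# doctor - man + woman: remove one gender direction, add the other.
query = vectors["doctor"] - vectors["man"] + vectors["woman"]

# Rank the remaining words by similarity to the query vector.
candidates = {w: v for w, v in vectors.items()
              if w not in ("doctor", "man", "woman")}
best = max(candidates, key=lambda w: cosine(query, candidates[w]))
print(best)  # with these toy vectors: "nurse"
```

With a real embedding model the same lookup is a single call; gensim's KeyedVectors, for instance, exposes it as `most_similar(positive=["doctor", "woman"], negative=["man"])`.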
Large language models cannot replace human participants because they cannot portray identity groups
arxiv.org
Study finds RLHF reduces LLM creativity and output variety: A new research paper shared on /r/LocalLLaMA shows that while alignment techniques like RLHF reduce toxic and biased content, they also limit the creativity and output variety of large language models, even in contexts unrelated to safety.