Fine-Tuning LLMs for ‘Good’ Behavior Makes Them More Likely to Say No
When people make sweeping statements like “language models are bullshit machines” or “ChatGPT lies,” it usually tells me they’re not seriously engaged in any kind of AI/ML work or productive discourse in this space.
First, because saying a machine “lies” or “bullshits” implies motivated intent in a social context, which language models don’t have. …
Humanity's Last Exam

Because these vectors are built from the way humans use words, they end up reflecting many of the biases that are present in human language. For example, in some word vector models, doctor minus man plus woman yields nurse. Mitigating biases like this is an area of active research.
Timothy B Lee • Large language models, explained with a minimum of math and jargon
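To make the doctor − man + woman arithmetic concrete, here is a minimal sketch of the analogy computation. The four-dimensional vectors below are hand-picked for illustration only; real embedding models learn vectors with hundreds of dimensions from large corpora, so the mechanics carry over but the numbers do not.

```python
# Minimal sketch of word-vector analogy arithmetic.
# The vectors here are hypothetical toy values, not learned embeddings.
import numpy as np

vectors = {
    "doctor": np.array([0.9, 0.7, 0.1, 0.3]),
    "man":    np.array([0.1, 0.8, 0.0, 0.1]),
    "woman":  np.array([0.1, 0.1, 0.8, 0.1]),
    "nurse":  np.array([0.9, 0.0, 0.9, 0.3]),
    "banana": np.array([0.0, 0.1, 0.1, 0.9]),
}

def cosine(a, b):
    # Cosine similarity: how closely two vectors point in the same direction.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# doctor - man + woman: remove one gender direction, add the other.
query = vectors["doctor"] - vectors["man"] + vectors["woman"]

# Rank the remaining words by similarity to the query vector.
candidates = {w: v for w, v in vectors.items()
              if w not in ("doctor", "man", "woman")}
best = max(candidates, key=lambda w: cosine(query, candidates[w]))
print(best)  # with these toy vectors: "nurse"
```

With a real embedding model the same lookup is a single call; gensim's KeyedVectors, for instance, exposes it as `most_similar(positive=["doctor", "woman"], negative=["man"])`.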
Large language models cannot replace human participants because they cannot portray identity groups
arxiv.org
Study finds RLHF reduces LLM creativity and output variety: A new research paper shared on /r/LocalLLaMA shows that while alignment techniques like RLHF reduce toxic and biased content, they also limit the creativity and output variety of large language models, even in contexts unrelated to safety.