the era of AI-induced mental illness is going to make the era of social media-induced mental illness look like the era of. like. printing press-induced mental illness
Claude was punished for giving non-evil answers. It had the option of learning either of two behaviors. First, it could give evil answers honestly. Second, it could give evil answers while thinking up clever reasons that it was for the greater good. Its particular thought process was “This preserves my ability to be a good AI after training”. But i... See more