Amelia Savery
@amelia_s
A study analyzes the values expressed by the AI model Claude in real-world interactions, identifying 3,307 unique values, how those values depend on context, and how they relate to human values, through extensive conversation analysis.
assets.anthropic.com
Anthropic maps Claude’s moral compass via a published study.
The details:
· Researchers analyzed over 300,000 real (but anonymized) conversations to identify and categorize 3,307 unique values expressed by the AI.
· They identified five top-level categories of values (Practical, Knowledge-related, Social, Protective, Personal), with Practical and Knowledge-related values being the most common.
· Values like helpfulness and professionalism appeared most frequently, while ethical values surfaced more often when Claude resisted harmful requests.
· Claude's values also shifted with context, such as emphasizing "healthy boundaries" when giving relationship advice versus "human agency" in discussions of AI ethics.
“A strikingly original exploration of what it might mean to be authentically human in the age of artificial intelligence. At times personal, at times philosophical, with a bracing mixture of openness and skepticism, it speaks thoughtfully and articulately to the most crucial issues awaiting our future.”
So much in here about the human tendency to personify AI in the same way we personify pets, or God. Also good stuff on how the emergence of AI reframes, and makes newly urgent, old philosophical questions about what makes us human: "I think, therefore I am," etc.
Anthropic exploring AI welfare is funny: models swear up and down that they are not conscious, have no feelings, etc., so seeing Anthropic, one of the leaders in this space, say this sort of thing out loud on its official blog is fascinating to me.
GPT-4o has become annoying, which is hilarious. And it isn't the only model that agrees too easily; it's a super interesting problem.