Anthropic \ Tracing Model Outputs to the Training Data
Observing these patterns of influence gives clues about how our models generalize from their training data. For instance, if the models responded to user prompts by splicing together sequences from the training set, then we’d expect the influential sequences for a given model response to include expressions of near-identical thoughts. Conversely, i...
Memorization vs learning concepts and forming models of the world.