Through efficiency improvements to our inference systems, we’ve been able to offer lower prices on the GPT‐4.1 series. GPT‐4.1 is 26% less expensive than GPT‐4o for median queries, and GPT‐4.1 nano is our cheapest and fastest model ever.
OpenAI-MRCR tests the model’s ability to find and disambiguate between multiple needles well hidden in context. The evaluation consists of multi-turn synthetic conversations between a user and assistant where the user asks for a piece of writing about a topic, for example "write a poem about tapirs" or "write a blog post about rocks". We then insert multiple identical requests at random points in the conversation, and the model must retrieve the response corresponding to one specific instance.
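To make the setup concrete, here is a minimal sketch of how an MRCR-style conversation could be assembled. The helper name, the filler turns, and the exact question phrasing are illustrative assumptions rather than the actual dataset construction, and it assumes each filler turn is a user/assistant message pair.

```python
import random

def build_mrcr_conversation(topic, needle_responses, filler_turns, target_index):
    """Interleave identical 'needle' requests among distractor turns, then ask
    the model to reproduce one specific instance verbatim."""
    needles = [
        [{"role": "user", "content": f"write a poem about {topic}"},
         {"role": "assistant", "content": response}]
        for response in needle_responses
    ]
    # Randomly interleave needles with filler turns while keeping the needles'
    # relative order, so "poem #3" is well defined.
    slots = ["filler"] * len(filler_turns) + ["needle"] * len(needles)
    random.shuffle(slots)
    fillers, remaining = iter(filler_turns), iter(needles)
    conversation = []
    for slot in slots:
        conversation.extend(next(remaining) if slot == "needle" else next(fillers))
    # The final turn is the needle lookup the model is scored on.
    conversation.append({
        "role": "user",
        "content": f"Give me poem #{target_index + 1} about {topic}, exactly as written.",
    })
    return conversation

# Example: three identical tapir requests hidden among filler turns; the model
# must return the third one.
convo = build_mrcr_conversation(
    "tapirs",
    needle_responses=["Poem A ...", "Poem B ...", "Poem C ..."],
    filler_turns=[[{"role": "user", "content": "write a blog post about rocks"},
                   {"role": "assistant", "content": "Rocks are ..."}]],
    target_index=2,
)
```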
We trained GPT‐4.1 to reliably attend to information across the full 1 million token context length. We’ve also trained it to be far more reliable than GPT‐4o at noticing relevant text and ignoring distractors across long and short context lengths.
GPT‐4.1, GPT‐4.1 mini, and GPT‐4.1 nano can process up to 1 million tokens of context—up from 128,000 for previous GPT‐4o models. One million tokens is more than 8 copies of the entire React codebase.
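In practice the larger window is used the same way as before: the document simply goes into the prompt. Below is a minimal sketch using the OpenAI Python SDK; the file name and question are placeholders, and the only requirement is that the document plus prompt fit within the 1 million token window.

```python
from openai import OpenAI

client = OpenAI()

# Placeholder file: e.g. a dump of a large codebase or a long report.
with open("large_codebase_dump.txt", "r", encoding="utf-8") as f:
    long_document = f.read()

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "Answer questions using only the provided document."},
        {"role": "user", "content": long_document + "\n\nWhere is the request retry logic implemented?"},
    ],
)
print(response.choices[0].message.content)
```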
GPT‐4.1 also scores 87.4% on IFEval, compared to 81.0% for GPT‐4o. IFEval uses prompts with verifiable instructions (for example, specifying content length or avoiding certain terms or formats).
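“Verifiable instructions” means each instruction can be checked programmatically rather than judged by a grader model. The toy checks below are not the actual IFEval harness; they just illustrate the idea for a word limit, a bullet count, and a banned term.

```python
import re

def check_word_limit(response: str, max_words: int) -> bool:
    # Verifiable: "respond in at most N words"
    return len(response.split()) <= max_words

def check_forbidden_terms(response: str, banned: list[str]) -> bool:
    # Verifiable: "do not use the word X"
    return not any(re.search(rf"\b{re.escape(t)}\b", response, re.IGNORECASE) for t in banned)

def check_bullet_count(response: str, n_bullets: int) -> bool:
    # Verifiable: "answer in exactly N bullet points"
    return sum(line.lstrip().startswith(("-", "*")) for line in response.splitlines()) == n_bullets

# Example prompt: "Describe the GPT-4.1 models in at most 50 words, in exactly
# 3 bullet points, without using the word 'revolutionary'."
model_response = (
    "- GPT-4.1 handles up to 1M tokens of context.\n"
    "- GPT-4.1 mini trades some accuracy for lower latency and cost.\n"
    "- GPT-4.1 nano is the fastest, cheapest option."
)
checks = [
    lambda r: check_word_limit(r, 50),
    lambda r: check_bullet_count(r, 3),
    lambda r: check_forbidden_terms(r, ["revolutionary"]),
]
print(all(check(model_response) for check in checks))  # True if every instruction is satisfied
```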
For API developers looking to edit large files, GPT‐4.1 is much more reliable at code diffs across a range of formats. GPT‐4.1 more than doubles GPT‐4o’s score on Aider’s polyglot diff benchmark.
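One way to take advantage of this is to ask the model for a patch instead of a rewritten file, so only the changed hunks travel over the wire. The sketch below is an assumption-laden example: the file src/parser.py and the parse_header() function are hypothetical, the prompt wording is ours, and unified diff is just one of the formats GPT‐4.1 handles.

```python
import subprocess
from openai import OpenAI

client = OpenAI()

# Hypothetical file we want the model to edit.
with open("src/parser.py") as f:
    original = f.read()

completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{
        "role": "user",
        "content": (
            "Here is src/parser.py:\n```python\n" + original + "\n```\n"
            "Add input validation to parse_header(). Reply with only a unified diff "
            "against src/parser.py; do not restate the whole file."
        ),
    }],
)
diff_text = completion.choices[0].message.content

# Apply the model-generated patch locally; reading the patch from stdin via "-".
subprocess.run(["git", "apply", "-"], input=diff_text.encode(), check=True)
```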