#ai

Through efficiency improvements to our inference systems, we’ve been able to offer lower prices on the GPT‐4.1 series.GPT‐4.1 is 26% less expensive than GPT‐4o for median queries, and GPT‐4.1 nano is our cheapest and fastest model ever

Introducing GPT-4.1 in the API

We trained GPT‐4.1 to reliably attend to information across the full 1 million context length. We’ve also trained it to be far more reliable than GPT‐4o at noticing relevant text, and ignoring distractors across long and short context lengths

Introducing GPT-4.1 in the API

GPT‐4.1, GPT‐4.1 mini, and GPT‐4.1 nano can process up to 1 million tokens of context—up from 128,000 for previous GPT‐4o models. 1 million tokens is more than 8 copies of the entire React codebase

Introducing GPT-4.1 in the API

GPT‐4.1 also scores 87.4% on IFEval, compared to 81.0% for GPT‐4o. IFEval uses prompts with verifiable instructions (for example, specifying content length or avoiding certain terms or formats).

Introducing GPT-4.1 in the API

One core capability of Large Language Models (LLMs) is to follow natural language instructions. However, the evaluation of such abilities is not standardized: Human evaluations are expensive, slow, and not objectively reproducible, while LLM-based auto-evaluation is potentially biased or limited by the ability of the evaluator LLM. To overcome

Instruction-Following Evaluation for Large Language Models

In IFEval

⁠ , models must generate answers that comply with various instructions.

Introducing GPT-4.1 in the API

We also recommend using Predicted Outputs⁠(opens in a new window) to reduce latency of full file rewrites.

Introducing GPT-4.1 in the API

For API developers looking to edit large files, GPT‐4.1 is much more reliable at code diffs across a range of formats. GPT‐4.1 more than doubles GPT‐4o’s score on Aider’s polyglot diff benchmark

⁠, and even beats GPT‐4.5 by 8%abs

Introducing GPT-4.1 in the API

deprecating GPT‐4.5 Preview in the API, as GPT‐4.1 offers improved or similar performance on many key capabilities at much lower cost and latency