#ai
We trained GPT‐4.1 to reliably attend to information across the full 1 million token context length. We've also trained it to be far more reliable than GPT‐4o at noticing relevant text and ignoring distractors across long and short context lengths.
Introducing GPT-4.1 in the API
We also recommend using Predicted Outputs to reduce the latency of full file rewrites.
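As a rough sketch, a full-file rewrite with Predicted Outputs might look like the following. The `prediction` parameter and model name reflect the API, but the file name, prompt wording, and edit request are illustrative only.

```python
# Sketch: using Predicted Outputs to speed up a full-file rewrite.
# The existing file contents are passed as the prediction, so the model
# only spends latency on the tokens it actually changes.
from openai import OpenAI

client = OpenAI()

with open("app.py") as f:  # illustrative file name
    original_code = f.read()

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "Rewrite the file per the user's request. Output only code."},
        {"role": "user", "content": f"Rename the class User to Account.\n\n{original_code}"},
    ],
    prediction={"type": "content", "content": original_code},
)

print(response.choices[0].message.content)
```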
GPT‐4.1 also scores 87.4% on IFEval, compared to 81.0% for GPT‐4o. IFEval uses prompts with verifiable instructions (for example, specifying content length or avoiding certain terms or formats).
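Because IFEval's instructions are verifiable, compliance can be checked with simple programmatic rules rather than a human or LLM judge. A minimal sketch of two such checks follows; the word-count threshold and banned terms are made up for illustration.

```python
# Sketch: programmatic checks in the spirit of IFEval's verifiable instructions.
# The specific constraints below (word count, forbidden terms) are illustrative only.
def meets_length_instruction(answer: str, min_words: int = 400) -> bool:
    """Check an instruction like 'write in more than 400 words'."""
    return len(answer.split()) > min_words

def avoids_forbidden_terms(answer: str, forbidden: tuple[str, ...] = ("basically", "very")) -> bool:
    """Check an instruction like 'do not use the following words'."""
    lowered = answer.lower()
    return not any(term in lowered for term in forbidden)

answer = "..."  # a model response to be scored
passed = meets_length_instruction(answer) and avoids_forbidden_terms(answer)
```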
For API developers looking to edit large files, GPT‐4.1 is much more reliable at code diffs across a range of formats. GPT‐4.1 more than doubles GPT‐4o's score on Aider's polyglot diff benchmark, and even beats GPT‐4.5 by 8% absolute.
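Diff-style editing means the model emits only the changed hunks rather than rewriting the whole file. As a rough illustration (not Aider's exact format), a search/replace style edit might be applied like this; the edit payload and file name are placeholders.

```python
# Sketch: applying a search/replace style edit, loosely in the spirit of
# diff-based editing formats such as Aider's. Not the exact format any tool emits.
edit = {
    "path": "calculator.py",
    "search": "def add(a, b):\n    return a - b\n",   # buggy original block
    "replace": "def add(a, b):\n    return a + b\n",  # corrected block
}

with open(edit["path"]) as f:
    source = f.read()

if edit["search"] not in source:
    raise ValueError("search block not found; edit cannot be applied cleanly")

with open(edit["path"], "w") as f:
    f.write(source.replace(edit["search"], edit["replace"], 1))
```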
Through efficiency improvements to our inference systems, we've been able to offer lower prices on the GPT‐4.1 series. GPT‐4.1 is 26% less expensive than GPT‐4o for median queries, and GPT‐4.1 nano is our cheapest and fastest model ever.
In IFEval, models must generate answers that comply with various instructions.
For tasks that demand low latency, GPT‐4.1 nano is our fastest and cheapest model available. It delivers exceptional performance at a small size with its 1 million token context window, and scores 80.1% on MMLU, 50.3% on GPQA, and 9.8% on Aider polyglot coding—even higher than GPT‐4o mini. It’s ideal for tasks like classification or autocompletion.
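A low-latency classification call routed to GPT‐4.1 nano might look roughly like the sketch below; the label set and prompt wording are placeholders rather than a prescribed recipe.

```python
# Sketch: a simple classification task using GPT-4.1 nano.
# Labels and prompt are illustrative assumptions, not part of the API itself.
from openai import OpenAI

client = OpenAI()

def classify_ticket(text: str) -> str:
    """Return one of a few support categories for an incoming ticket."""
    response = client.chat.completions.create(
        model="gpt-4.1-nano",
        messages=[
            {"role": "system",
             "content": "Classify the ticket as exactly one of: billing, bug, feature_request, other."},
            {"role": "user", "content": text},
        ],
        max_tokens=5,
    )
    return response.choices[0].message.content.strip()

print(classify_ticket("I was charged twice for my subscription this month."))
```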
We will also begin deprecating GPT‐4.5 Preview in the API, as GPT‐4.1 offers improved or similar performance on many key capabilities at much lower cost and latency.
One core capability of Large Language Models (LLMs) is to follow natural language instructions. However, the evaluation of such abilities is not standardized: human evaluations are expensive, slow, and not objectively reproducible, while LLM-based auto-evaluation is potentially biased or limited by the ability of the evaluator LLM. To overcome these issues, IFEval evaluates models on a set of verifiable instructions whose compliance can be checked programmatically.