OpenAI sidesteps Nvidia with unusually fast coding model on plate-sized chips

But 1,000 tokens per second is actually modest by Cerebras standards. The company has measured 2,100 tokens per second on Llama 3.1 70B and reported 3,000 tokens per second on OpenAI's own open-weight gpt-oss-120B model, suggesting that Codex-Spark's comparatively lower speed reflects the overhead of a larger or more complex model.
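To put those throughput figures in perspective, here is a rough back-of-the-envelope sketch of how long a single completion would take at each quoted rate. The 2,000-token response size is a hypothetical figure chosen purely for illustration, not something reported for any of these models:

```python
# Illustrative arithmetic: time to generate a fixed-size completion
# at the throughput figures quoted above.
SPEEDS_TOKENS_PER_SEC = {
    "Codex-Spark": 1_000,
    "Llama 3.1 70B (Cerebras)": 2_100,
    "gpt-oss-120B (Cerebras)": 3_000,
}

COMPLETION_TOKENS = 2_000  # hypothetical size of one coding response

for model, tps in SPEEDS_TOKENS_PER_SEC.items():
    seconds = COMPLETION_TOKENS / tps
    print(f"{model}: {seconds:.2f} s")
```

Even at the "slow" 1,000 tokens per second, such a response would finish in about two seconds, which is why the difference between these rates mostly matters for rapid, repeated iteration rather than any single request.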

AI coding agents have had a breakout year, with tools like OpenAI's Codex and Anthropic's Claude Code reaching a new level of usefulness for rapidly building prototypes, interfaces, and boilerplate code. OpenAI, Google, and Anthropic have all been racing to ship more capable coding agents, and latency has become what separates the winners; a model that codes faster lets a developer iterate faster.

With fierce competition from Anthropic, OpenAI has been iterating on its Codex line at a rapid pace, releasing GPT-5.2 in December after CEO Sam Altman issued an internal "code red" memo about competitive pressure from Google, then shipping GPT-5.3-Codex just days ago.

Diversifying away from Nvidia

Spark's deeper hardware story may be more consequential than its benchmark scores. The model runs on Cerebras' Wafer Scale Engine 3, a chip the size of a dinner plate that Cerebras has built its business around since at least 2022. OpenAI and Cerebras announced their partnership in January, and Codex-Spark is the first product to come out of it.

OpenAI has spent the past year systematically reducing its dependence on Nvidia. The company signed a massive multi-year deal with AMD in October 2025, struck a $38 billion cloud computing agreement with Amazon in November, and has been designing its own custom AI chip for eventual fabrication by TSMC.

Meanwhile, a planned $100 billion infrastructure deal with Nvidia has fizzled so far, though Nvidia has since committed to a $20 billion investment. Reuters reported that OpenAI grew unhappy with the speed of some Nvidia chips for inference tasks, which is exactly the kind of workload OpenAI designed Codex-Spark for.

Whichever chip is under the hood, speed matters, though it may come at the cost of accuracy. For developers who spend their days inside a code editor waiting for AI suggestions, 1,000 tokens per second may feel less like carefully piloting a jigsaw and more like running a rip saw. Just watch what you're cutting.
