OpenAI pushes AI agent capabilities with new developer API

Developers using the Responses API can access the same models that power ChatGPT Search: GPT-4o search and GPT-4o mini search. These models can browse the web to answer questions and cite sources in their responses.

This is notable because OpenAI says the added web search ability dramatically improves the factual accuracy of its AI models. On OpenAI’s SimpleQA benchmark, which aims to measure confabulation rate, GPT-4o search scored 90 percent and GPT-4o mini search achieved 88 percent. Both substantially outperformed the larger GPT-4.5 model without search, which scored 63 percent.

Despite these improvements, the technology still has significant limitations. Aside from issues with CUA properly navigating websites, the improved search capability does not completely solve the problem of AI confabulations, with GPT-4o search still making factual errors 10 percent of the time.

Alongside the Responses API, OpenAI released the open source Agents SDK, giving developers free tools to integrate models with internal systems, implement safeguards, and monitor agent activities. This toolkit follows OpenAI’s earlier release of Swarm, a framework for orchestrating multiple agents.

These are still early days in the AI agent field, and things will likely improve rapidly. For now, however, the AI agent movement remains vulnerable to unrealistic claims. That was demonstrated earlier this week when users discovered that Chinese startup Butterfly Effect’s Manus AI agent platform failed to deliver on many of its promises, highlighting the persistent gap between promotional claims and practical functionality in this emerging technology category.
