Researchers Release Open-Source AI Search Agent That Outperforms GPT-5.4
University researchers unveiled Harness-1, a 20-billion-parameter open-source AI search agent that scored 73% on information recall tasks, surpassing GPT-5.4's 70.9%.

Researchers from the University of Illinois at Urbana-Champaign, UC Berkeley, and vector database platform Chroma have released Harness-1, an open-source AI search agent that outperforms several proprietary models on information retrieval tasks.
The 20-billion-parameter model, built on OpenAI's gpt-oss-20B architecture, achieved a 73% average score on recall accuracy tests, ahead of GPT-5.4's 70.9% and 11.4 percentage points above the next-best open-source competitor. The researchers evaluated the models across eight complex search benchmarks spanning web searches, financial filings, patent databases, and multi-step reasoning tasks.
Harness-1's key innovation lies in how it manages search state. Rather than forcing the AI model to hold all search history and context in its own memory, the system uses what the researchers call a "state-externalizing harness": a structured environment that handles routine bookkeeping such as maintaining document pools, evidence links, and verification records. This lets the model focus on semantic reasoning while the external system handles the organizational work.
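To make the idea concrete, here is a minimal Python sketch of what externalized search state could look like. It is an illustration only, not the Harness-1 implementation: the class `SearchState` and its methods are hypothetical names, and the only details taken from the article are the three kinds of bookkeeping (document pool, evidence links, verification records).

```python
from dataclasses import dataclass, field


@dataclass
class SearchState:
    """Hypothetical external store for a search agent's bookkeeping.

    Not the Harness-1 API; all names here are assumptions for illustration.
    """
    # doc_id -> text of each document retrieved so far (the document pool)
    documents: dict[str, str] = field(default_factory=dict)
    # claim -> ids of the documents cited as evidence for it
    evidence: dict[str, set[str]] = field(default_factory=dict)
    # claim -> outcome of a verification pass, once one has run
    verified: dict[str, bool] = field(default_factory=dict)

    def add_documents(self, docs: dict[str, str]) -> int:
        """Add unseen documents to the pool and return how many were new."""
        new_ids = [doc_id for doc_id in docs if doc_id not in self.documents]
        for doc_id in new_ids:
            self.documents[doc_id] = docs[doc_id]
        return len(new_ids)

    def link_evidence(self, claim: str, doc_id: str) -> None:
        """Record that a pooled document supports a claim."""
        if doc_id not in self.documents:
            raise KeyError(f"unknown document: {doc_id}")
        self.evidence.setdefault(claim, set()).add(doc_id)

    def record_verification(self, claim: str, ok: bool) -> None:
        """Store the result of checking a claim against its evidence."""
        self.verified[claim] = ok

    def summary(self) -> str:
        """Compact status text handed to the model instead of full history."""
        labels = {True: "verified", False: "rejected"}
        lines = [f"documents in pool: {len(self.documents)}"]
        for claim in sorted(self.evidence):
            status = labels.get(self.verified.get(claim), "unverified")
            count = len(self.evidence[claim])
            lines.append(f"- {claim}: {count} source(s), {status}")
        return "\n".join(lines)


state = SearchState()
state.add_documents({"d1": "10-K filing text", "d2": "patent abstract"})
state.link_evidence("revenue grew in 2023", "d1")
state.record_verification("revenue grew in 2023", True)
print(state.summary())
```

In a design like this, the model would see only the short summary on each step, while the full documents and links stay in the external store and can be looked up on demand.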
Training required far less data than competing models: 899 supervised fine-tuning trajectories and 3,453 reinforcement learning queries. By comparison, other open-source models used datasets ranging from 17,200 to more than 221,000 training items and still scored lower.
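The size of that gap can be checked with a few lines of Python. This is a rough comparison that assumes one fine-tuning trajectory and one reinforcement learning query each count as a single training item, which the article implies but does not state; the variable names are illustrative.

```python
# Figures reported for Harness-1
sft_trajectories = 899
rl_queries = 3_453
harness_total = sft_trajectories + rl_queries

# Reported range for other open-source models; the upper figure is
# "over 221,000", so the ratio computed from it is a lower bound.
competitor_min = 17_200
competitor_max = 221_000

ratio_min = competitor_min / harness_total
ratio_max = competitor_max / harness_total

print(f"Harness-1 training items: {harness_total}")
print(f"Smallest competing dataset: {ratio_min:.1f}x larger")
print(f"Largest competing dataset: at least {ratio_max:.1f}x larger")
```

Under that assumption, Harness-1 used 4,352 training items, about a quarter of the smallest competing dataset and roughly one fiftieth of the largest.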
The model and its code are available under the Apache 2.0 license, which permits commercial use and modification. The researchers say this opens the door to enterprise applications in which AI systems must search large document repositories without the high computational cost typically associated with expanding context windows.
The research was conducted using Tinker, a distributed AI training platform developed by Thinking Machines, demonstrating the system's practical viability for real-world deployment scenarios.