AI benchmarking organization criticized for waiting to reveal OpenAI funding

An organization that develops mathematical benchmarks for AI didn't disclose that it had received funding from OpenAI until relatively recently, prompting accusations of impropriety from some members of the AI community.

Epoch AI, a nonprofit organization primarily funded by Open Philanthropy, a research and grantmaking foundation, announced on December 20 that OpenAI had supported the creation of FrontierMath. FrontierMath, a test of expert-level problems designed to measure an AI's mathematical abilities, was one of the benchmarks OpenAI used to demonstrate its upcoming flagship AI, o3.

In a post on the LessWrong forum, an Epoch AI contractor with the username "Meemi" says that many contributors to the FrontierMath benchmark weren't informed of OpenAI's involvement until it was made public.

"Communication about this was opaque," Meemi wrote. "In my opinion, Epoch AI should have disclosed OpenAI funding, and contractors should have transparent information about the potential use of their work for capabilities when deciding whether to work on a benchmark."

On social media, some users expressed concern that the secrecy could undermine FrontierMath's status as an objective benchmark. In addition to funding FrontierMath, OpenAI had visibility into many of the problems and solutions in the benchmark — a fact that Epoch AI didn't disclose before o3's announcement on December 20.

In another post, a user identified as Hong made similar claims.

"Six mathematicians who contributed significantly to the FrontierMath benchmark confirmed (to me) that they were unaware that OpenAI would have exclusive access to this benchmark (and that others would not)," Hong said. "Most say they are not sure whether they would have contributed had they known."

In a response to Meemi's post, Tamay Besiroglu, deputy director of Epoch AI and one of the organization's co-founders, claimed that FrontierMath's integrity had not been compromised, but admitted that Epoch AI "made a mistake" by not being more transparent.

"We weren't allowed to disclose the partnership until around the launch of o3, and in hindsight we should have negotiated harder for the ability to be transparent with the benchmark contributors as quickly as possible," Besiroglu wrote. "Our mathematicians deserve to know who might have access to their work. Even though we were contractually limited in what we could say, we should have made transparency with our contributors a non-negotiable part of our agreement with OpenAI."

Besiroglu added that while OpenAI has access to FrontierMath, it has made a "verbal agreement" with Epoch AI not to use FrontierMath's problem set to train its AI. (Training an AI on FrontierMath would be akin to teaching to the test.) Epoch AI also maintains a "separate holdout set" that serves as an additional safeguard for independently verifying FrontierMath benchmark results, Besiroglu said.

"OpenAI has … fully supported our decision to maintain a separate, unseen holdout set," Besiroglu wrote.

But Elliot Glazer, Epoch AI's lead mathematician, muddied the situation by noting in a post on Reddit that Epoch AI had been unable to independently verify OpenAI's FrontierMath o3 results.

"My personal opinion is that (OpenAI's) score is legitimate (i.e., they didn't train on the dataset) and that they have no incentive to lie about internal benchmarking performance," Glazer said. "However, we can't vouch for them until our independent evaluation is complete."

The saga is another example of the challenge of developing rigorous empirical benchmarks to evaluate AI — and of securing the necessary resources for benchmark development without creating the appearance of conflicts of interest.
