Epoch AI Launches FrontierMath AI Benchmark to Test AI Model Capabilities

Epoch AI, a California-based research institute, launched a new artificial intelligence (AI) benchmark last week. Called FrontierMath, the benchmark tests large language models (LLMs) on their ability to reason about and solve mathematical problems. The AI firm says existing math benchmarks are of limited use because of data contamination and because AI models already score very high on them. Epoch AI claims that even leading LLMs scored below 2% on the new benchmark.

Epoch AI launches FrontierMath benchmark

In a post, Epoch AI claimed that the benchmark’s questions would take even mathematicians hours to solve. It cited the limitations of existing benchmarks such as GSM8K and MATH, on which AI models generally score high, as the reason for developing the new benchmark.

The company claimed that the high scores obtained by LLMs were largely due to data contamination. This means the benchmark questions had already appeared in the data fed to the AI models, allowing them to solve those questions easily.

FrontierMath addresses the problem by including new, unique problems that have not been published anywhere, thereby mitigating the risks associated with data contamination. Additionally, the test covers a wide range of questions, including computationally intensive problems in number theory, real analysis, and algebraic geometry, as well as topics such as Zermelo–Fraenkel set theory. The AI firm says all questions are “guess-proof,” meaning they can’t be answered by chance without solid reasoning.

Epoch AI pointed out that, to measure AI aptitude, benchmarks should be built around creative problem solving in which the AI must sustain its reasoning across multiple steps. Notably, many industry veterans believe that existing benchmarks are not sufficient to properly measure how advanced an AI model is.

Responding to the new benchmark in a post, Noam Brown, an OpenAI researcher who was behind the company’s o1 model, praised it and said: “I love seeing a new benchmark with such low success rates for frontier models.”
