
Omni Calculator Publishes ORCA V3 Research Report on AI Model Performance in Quantitative Reasoning

By: Newsfile
ⓘ This article is third-party content and does not represent the views of this site. We make no guarantees regarding its accuracy or completeness.

Kraków, Poland--(Newsfile Corp. - May 21, 2026) - Omni Calculator announced the publication of the third iteration of its Omni Research on Calculation in AI (ORCA) Benchmark, an independent benchmarking initiative designed to evaluate the mathematical reasoning and stability of publicly available Large Language Models (LLMs).

The ORCA V3 report evaluates the performance of several AI models across real-world quantitative tasks and introduces updated findings related to model accuracy, logical consistency, and calculation stability. The benchmark focuses on assessing AI systems using quantitative problems where outputs can be objectively verified.

The ORCA framework evaluates LLMs across 500 quantitative problems spanning seven categories: Biology & Chemistry, Engineering & Construction, Finance & Economics, Health & Sports, Math & Conversions, Physics, and Statistics & Probability. According to Omni Calculator, the benchmark uses verified answer keys from the company's library of more than 3,800 calculators to assess model outputs.

The report states that the benchmark follows a zero-shot evaluation methodology, in which models are tested on their first response attempt without additional prompting or retries. Omni Calculator noted that the benchmark is conducted through publicly accessible interfaces to reflect the experience of general users.
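As a rough illustration of how first-attempt answers could be scored against a verified answer key, the Python sketch below counts a response as correct when it falls within a relative tolerance of the reference value. The release does not describe ORCA's actual scoring rules, so the 1% tolerance, the function names, and the data layout are all assumptions made for this example.

```python
import math


def is_correct(model_answer: float, reference: float, rel_tol: float = 0.01) -> bool:
    """Return True when the answer is within rel_tol of the reference value.

    The 1% default tolerance is an assumption for this sketch, not an ORCA rule.
    """
    return math.isclose(model_answer, reference, rel_tol=rel_tol)


def accuracy(first_responses: dict[str, float],
             answer_key: dict[str, float],
             rel_tol: float = 0.01) -> float:
    """Share of problems answered correctly on the first attempt (zero-shot).

    A problem with no recorded response counts as incorrect; no retries
    or follow-up prompts are considered.
    """
    if not answer_key:
        raise ValueError("answer_key must contain at least one problem")
    correct = sum(
        1
        for problem_id, reference in answer_key.items()
        if problem_id in first_responses
        and is_correct(first_responses[problem_id], reference, rel_tol)
    )
    return correct / len(answer_key)


# Hypothetical data: four problems, three first-attempt answers.
answer_key = {"p1": 100.0, "p2": 2.5, "p3": 40.0, "p4": 7.0}
first_responses = {"p1": 100.4, "p2": 2.5, "p3": 44.0}
score = accuracy(first_responses, answer_key)
```

In this made-up example, two of the four problems are within tolerance, so the score is 0.5. The unanswered problem lowers the score rather than being excluded.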

A key component of the ORCA project is the "Instability Metric," which measures how frequently models generate different answers when presented with the same prompt multiple times. According to the report, the metric is intended to evaluate consistency in applications involving finance, engineering, and other quantitative domains.
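The release does not give the formula behind the Instability Metric. One plausible reading, sketched below in Python, is the share of problems for which repeated runs of the same prompt do not all yield the same answer. The function names, the tolerance used to decide whether two answers match, and the sample data are assumptions for illustration only.

```python
import math


def is_unstable(answers: list[float], rel_tol: float = 1e-9) -> bool:
    """Return True if any repeated answer differs from the first one."""
    first = answers[0]
    return any(not math.isclose(a, first, rel_tol=rel_tol) for a in answers[1:])


def instability_rate(runs: dict[str, list[float]]) -> float:
    """Share of problems whose repeated runs disagree with each other.

    `runs` maps a problem id to the answers collected from sending the
    identical prompt several times. Problems with a single run cannot
    show disagreement and are counted as stable.
    """
    if not runs:
        raise ValueError("runs must contain at least one problem")
    unstable = sum(
        1 for answers in runs.values()
        if len(answers) > 1 and is_unstable(answers)
    )
    return unstable / len(runs)


# Hypothetical data: four problems, each prompted two or three times.
runs = {
    "p1": [10.0, 10.0, 10.0],
    "p2": [3.2, 3.5, 3.2],
    "p3": [1.0, 1.0],
    "p4": [5.0, 6.0],
}
rate = instability_rate(runs)
```

Here two of the four problems produce conflicting answers, giving a rate of 0.5. Note that this measure is independent of correctness: a model that repeats the same wrong answer every time would count as stable.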

The ORCA V3 report includes findings related to ChatGPT 5.3, Claude Sonnet 4.6, and Grok 4.20. According to Omni Calculator, Grok 4.20 achieved a reported 70.4% math accuracy score and a 33.1% instability score in the benchmark evaluation. The report also states that Claude Sonnet 4.6 achieved a 53.2% math accuracy score, while ChatGPT 5.3 recorded a 48.4% score in the benchmark's quantitative testing.

The report also discusses "Regression Risk," a trend identified in prior ORCA evaluations in which newer AI model versions may produce lower performance on certain quantitative tasks than earlier versions. According to Omni Calculator, this variability may affect the reliability of automated workflows and repeated calculations.

Omni Calculator stated that the ORCA initiative was developed to provide additional transparency into AI model performance in mathematical and logical reasoning tasks and to support evaluation methods focused on real-world quantitative use cases.

The full ORCA V3 report, titled "Is Claude Really the Best?", is available on the Omni Calculator website.

About the ORCA Benchmark

The ORCA Benchmark is an independent AI benchmarking initiative developed by Omni Calculator to evaluate the mathematical reasoning and logical stability of Large Language Models using quantitative testing scenarios. The benchmark is currently in its third iteration.

About Omni Calculator

Omni Calculator is a technology company based in Kraków, Poland. The company operates a library of more than 3,800 professional-grade calculators and develops benchmarking initiatives focused on quantitative AI evaluation.

Media Contact

Contact: Agata Flak
Email: content.partnerships@omnicalculator.com
Telephone: +48 722 354 132
Company: Omni Calculator
Website: https://www.omnicalculator.com

###

To view the source version of this press release, please visit https://www.newsfilecorp.com/release/298195
