How weak is OpenAI o1?

Beck Moulton

OpenAI o1 results on ARC-AGI Pub

ARC Prize Testing and Explanation of OpenAI's New o1 Models

In the past 24 hours, OpenAI released its new o1-preview and o1-mini models, which are specially trained to emulate reasoning. Before giving a final answer, these models are given extra time to generate and refine reasoning tokens.

Hundreds of people have asked how o1 performs on the ARC Prize. So we tested it using the same baseline testing harness we used to evaluate Claude 3.5 Sonnet, GPT-4o, and Gemini 1.5. The results are as follows:

Is o1 a new paradigm on the path to AGI? Will it scale up? And how do we explain the large gap between o1's impressive scores on IOI, AIME, and many other benchmarks and its merely average score on ARC-AGI?


Focused on back-end development and sharing hands-on technical experience. Buy me a coffee if you appreciate my work: https://www.buymeacoffee.com/BeckMoulton