How weak is OpenAI o1?
OpenAI o1 results on ARC-AGI Pub
ARC Prize Testing and Analysis of OpenAI’s New o1 Model
In the past 24 hours, OpenAI released two new models: o1-preview and o1-mini. These models have been specially trained to simulate reasoning: before producing a final answer, they spend extra time generating and refining reasoning tokens.
Hundreds of people have asked how o1 performs on the ARC Prize. So we tested it with the same baseline testing harness we used to evaluate Claude 3.5 Sonnet, GPT-4o, and Gemini 1.5. The results are as follows:
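For context on what such a harness measures: ARC-AGI tasks are JSON objects of colored grids, and scoring is an exact match on the full predicted output grid. Below is a minimal sketch under that assumption; the toy task and the placeholder `solver` (which just mirrors each row) are illustrative inventions, not the actual baseline harness or a real model call.

```python
# A toy task in the public ARC-AGI JSON format: grids are lists of lists of ints
# (cell values 0-9 are colors). Real tasks come from the ARC-AGI-Pub task files.
TASK = {
    "train": [{"input": [[1, 0], [0, 1]], "output": [[0, 1], [1, 0]]}],
    "test": [{"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]}],
}

def solver(task, grid):
    # Placeholder solver: mirror each row. A real harness would instead prompt
    # a model (e.g. o1-preview) with the train pairs and parse its predicted grid.
    return [list(reversed(row)) for row in grid]

def score(task, solve):
    # ARC scoring is all-or-nothing: the predicted grid must match exactly.
    correct = sum(
        solve(task, pair["input"]) == pair["output"] for pair in task["test"]
    )
    return correct / len(task["test"])

print(score(TASK, solver))  # → 1.0 on this toy task
```

A model's overall ARC-AGI score is just this exact-match rate averaged over the full task set, which is why partial progress on a grid earns no credit.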
Is o1 a new paradigm toward AGI? Will it scale up? o1’s merely average score on ARC-AGI stands in sharp contrast to its impressive results on IOI, AIME, and many other benchmarks. How can this be explained?