OpenAI o1 Technology Series 1: Overall Framework, Utilizing the Test-Time Scaling Law to Enhance Logical Reasoning Ability

Beck Moulton
15 min read · 2 days ago

The o1 model that OpenAI released a few days ago, with its markedly improved logical reasoning ability, has sparked heated discussion about the training methods behind it. I won't restate o1's introduction or its demo outputs here; you can read them on OpenAI's official website (it is very short and easy to read, since all the secrets are kept hidden). If you have been digging around online lately for how o1 was trained, you have almost certainly run into the following hot topics:

  • Test/Inference-time scaling law: improving the model's reasoning ability by spending more compute at the inference stage (see the minimal sketch after this list)
  • Post-training: improving the model's reasoning ability through training stages that come after pre-training
  • PRM/ORM: Process/Outcome-based Reward Model
  • CoT: Chain of Thought
  • Reinforcement learning, self-play, and MCTS (Monte Carlo Tree Search)

and so on.
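To make the first bullet concrete, here is a minimal, hypothetical sketch of one common form of test-time scaling: best-of-N sampling. The generator and reward model below are stand-in stubs I made up for illustration; OpenAI has not disclosed o1's actual mechanism, so this is just a sketch of the general idea that more inference-time compute (more samples plus a scorer) can yield better answers.

```python
import random

# Hypothetical stand-ins: in a real setup these would call an actual LLM
# and a trained reward model (e.g., a PRM/ORM). Names are illustrative only.
def generate_candidate(prompt: str) -> str:
    """Sample one candidate answer from the model (stubbed here)."""
    return f"candidate answer #{random.randint(0, 9999)} for: {prompt}"

def score_candidate(prompt: str, answer: str) -> float:
    """Score an answer with a reward model (stubbed here)."""
    return random.random()

def best_of_n(prompt: str, n: int = 16) -> str:
    """Test-time scaling in its simplest form: spend more compute at
    inference (n samples instead of 1) and keep the highest-scoring answer."""
    candidates = [generate_candidate(prompt) for _ in range(n)]
    scores = [score_candidate(prompt, c) for c in candidates]
    return candidates[scores.index(max(scores))]

if __name__ == "__main__":
    # Doubling n doubles inference compute; with a good reward model, the
    # chance that a correct sample exists (and gets selected) goes up with n.
    print(best_of_n("What is 17 * 24?", n=16))
```

This is only one point in the design space; the same "more compute at inference" idea also shows up as longer chains of thought, self-consistency voting, or search procedures such as MCTS.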

When these terms appear one by one in front of us, it is hard to string them together into a coherent picture. Worse, we often don't even know the principles behind the individual terms: what is the test/inference-time scaling law? What does it mean to spend compute at the inference stage? Why does spending compute at the inference stage lead to better results? What is…
