Let’s start with how these AIs are trained.
ChatGPT goes through pre-training, learning from an enormous amount of text scraped from the internet; the model then goes through supervised fine-tuning so it follows instructions better, and finally reinforcement learning from human feedback (RLHF). Basically, it learns what we like and don't like, improving its responses across all kinds of queries.
DeepSeek R1, on the other hand, took a different route.
DeepSeek skipped the supervised fine-tuning phase. Instead, it relied on rule-based reinforcement learning, using a method they call Group Relative Policy Optimization (GRPO). Think of it as DeepSeek learning to reason through trial and error, without much hand-holding.
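To make the group-relative idea more concrete, here's a minimal, hypothetical sketch: several answers are sampled for the same prompt, each gets a scalar reward, and each answer is then scored relative to its own group rather than against a separately trained value model. The function and reward values below are illustrative, not DeepSeek's actual code.

```python
import statistics

def group_relative_advantages(rewards):
    """Score each sampled answer relative to its group (the core GRPO idea).

    rewards: scalar rewards for a group of answers sampled from the SAME prompt.
    Returns one advantage per answer: positive if it beat the group average,
    negative if it fell below it.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

# Example: four sampled answers to one prompt, scored by a rule-based checker
rewards = [1.0, 0.0, 1.0, 0.0]  # 1 = correct answer, 0 = incorrect
print(group_relative_advantages(rewards))
```

Answers that beat their group's average get a positive advantage and are reinforced; answers below the average are discouraged.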
Both ChatGPT’s o1 & o3 models use Chain of Thought which was first introduced by OpenAI in September 2024. It took just a few months for the open source community to catch up and Deepseek utilizes the same method in r1.
The Chain of Thought process involves the large language model exploring and explaining its reasoning step by step, just as you would solve a math problem or think through a tough question out loud.
Instead of jumping straight to the answer, the model generates intermediate reasoning, breaking the problem down into smaller, logical steps, which makes it easier to see how it reached the final result.
Here is a simplified example for demonstration purposes:
Without Chain of Thought:
What’s 12 times 14?
Answer: 168.
With Chain of Thought:
What’s 12 times 14?
Answer: 12 times 14 can be broken into 12 times 10, which is 120, and 12 times 4, which is 48. Adding them together gives 168.
By working through a thinking process, the large language model becomes more transparent and accurate, especially in complex tasks like coding, math, or logical reasoning.
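If you want to see the difference yourself, here's a rough sketch using the official openai Python client against an OpenAI-compatible endpoint. The model name is a placeholder, and the prompt wording is just one way to nudge a model into reasoning step by step.

```python
from openai import OpenAI  # assumes the official `openai` Python package is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The same question asked two ways: directly, and with an explicit
# request to reason step by step (classic chain-of-thought prompting).
prompts = [
    "What's 12 times 14? Answer with just the number.",
    "What's 12 times 14? Think through it step by step, then give the answer.",
]

for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; swap in whatever model you have access to
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)
    print("---")
```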
The catch with chain of thought is that, until now, it has almost felt like a step backwards: ChatGPT's 4o model stayed popular even after the o1 release, because for general text-based tasks the chain-of-thought o1 model was slower and produced more convoluted outputs.
There's no doubt that this technology is a step forward, and DeepSeek's R1, along with OpenAI's yet-to-be-released o3 model, shows off its capabilities in a way that o1 didn't.
ChatGPT is proprietary, meaning you'll need to pay for access. ChatGPT Plus costs around $20 a month, but it's been reported that exceptionally complex tasks on the upcoming o3 model can cost up to $2,000 per query.
DeepSeek, on the other hand, open-sourced R1 under the MIT license, making it free for anyone to use, whether for commercial projects, academic research, or just tinkering around.
Open sourcing AI models and code in general is important because it promotes transparency, collaboration, and innovation. It allows researchers and developers worldwide to study, improve, and adapt the models.
It provides a level playing field, which prevents a single government-influenced entity from consolidating power through technological superiority, something I've spoken about in previous videos.
The reason DeepSeek was able to open-source the model is that they built it for just $6 million. That might sound like a lot, until you put it into perspective: OpenAI has raised $17.9 billion in total funding over multiple rounds.
Trump has just set out a plan to allocate $500 billion, yes, half a trillion dollars, to AI compute initiatives in the US.
How did China's DeepSeek create a model that can outperform the US industry giants with so little funding, and then give it away for free?
As discussed at the start, DeepSeek-R1 was trained using pure reinforcement learning without the typical reliance on supervised fine-tuning. This approach allowed the model to develop reasoning capabilities through trial and error, reducing the need for extensive, curated datasets that are often costly to compile and label.
The reinforcement learning method rewarded the model for correct decisions and penalized errors, letting it learn directly from its own actions, much like humans learn through experience. This approach was described as significantly more cost-effective.
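As a toy illustration of what "rule-based" means here, the sketch below scores a completion with deterministic checks instead of a human rater or a learned reward model. The tag format and point values are made up for demonstration and are not DeepSeek's actual reward rules.

```python
import re

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Toy rule-based reward: no human rater, just deterministic checks.

    +0.2 if the model wrapped its reasoning in <think>...</think> tags
    (a format reward), plus +1.0 if the final answer matches the reference
    (an accuracy reward). Illustrative only.
    """
    reward = 0.0
    if re.search(r"<think>.*?</think>", completion, flags=re.DOTALL):
        reward += 0.2  # format reward
    # treat whatever follows the closing tag (or the whole text) as the answer
    answer_part = completion.split("</think>")[-1].strip()
    if reference_answer in answer_part:
        reward += 1.0  # accuracy reward
    return reward

print(rule_based_reward("<think>12*14 = 120 + 48</think> 168", "168"))  # 1.2
```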
DeepSeek also used relatively few GPUs, 2,048 H800s for about two months, compared to the massive data-center clusters its competitors rely on, demonstrating that with the right approach significant performance can be achieved with less hardware. This efficiency in resource utilization was fundamental to keeping compute costs down.
The R1 model also employs a Mixture-of-Experts architecture in which only a subset of parameters (37 billion out of 671 billion) is activated per token, reducing computational cost during both training and inference.
This was a strategic choice that made the model not only performant but also cost-efficient, as fewer computational resources were needed for each query or training step.
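Here's a tiny NumPy sketch of the routing idea: a gate scores every expert for the current token, only the top-k experts actually run, and their outputs are blended. The sizes and expert count are made up for illustration and are nothing like R1's real configuration.

```python
import numpy as np

def moe_layer(x, experts, gate_weights, k=2):
    """Toy Mixture-of-Experts forward pass for a single token.

    Only the top-k experts (by gate score) run, so most parameters stay
    idle for this token; that's where the compute savings come from.
    """
    scores = x @ gate_weights                      # one score per expert
    top_k = np.argsort(scores)[-k:]                # indices of the k best experts
    weights = np.exp(scores[top_k])
    weights /= weights.sum()                       # softmax over the chosen experts
    return sum(w * experts[i](x) for w, i in zip(weights, top_k))

rng = np.random.default_rng(0)
d, n_experts = 16, 8                               # illustrative sizes only
experts = [lambda x, W=rng.standard_normal((d, d)): x @ W for _ in range(n_experts)]
gate_weights = rng.standard_normal((d, n_experts))
token = rng.standard_normal(d)
print(moe_layer(token, experts, gate_weights).shape)  # (16,)
```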
Let's talk numbers, because benchmarks don't lie. Well, mostly: some benchmarks are funded by OpenAI, so there may be a conflict of interest, and the bottom line is that whenever an AI company releases a new model, it cherry-picks the benchmarks its model excels at.
ChatGPT performs well across a broad range of tasks, from general knowledge to creative writing. It's the standard against which every other model is compared.
DeepSeek R1, on the other hand, crushes deterministic reasoning benchmarks, scoring 71% on the AIME math benchmark and an astonishing 97.3% on MATH-500. This is the kind of performance that turns heads.
Its release has been a shock and awe moment for the AI industry and highlights what a small talented team can achieve.
DeepSeek's open-source philosophy challenges the proprietary systems that have dominated the market, and this competition is going to push innovation to new heights. Maybe we'll even see GPT-5 some time this year.
In the meantime, if you want to try DeepSeek for yourself, you can find instructions on how to set it up and run it locally in this video. Enjoy.