We struggled over the decision to write this piece. On the one hand, Deepseek has gotten everyone’s attention and is definitely important. On the other hand, an awful lot has already been written on the subject in a very short time frame.
First and foremost, we think the stock market has gotten far ahead of the reality in its reaction to Deepseek. Everyone is rushing to judgment despite no one having all the facts.
From what we can tell, and what we know about Deepseek, they have an immensely talented pool of AI software engineers. They have advanced the state of the art for AI models in many important ways. But their model does not break the status quo; it simply moves along trajectories we have seen coming for some time.
The first reaction we heard to Deepseek was their claim to have trained their model at a fraction of the cost of other foundational models. Here we are fairly skeptical. We believe that Deepseek had access to deeper pools of GPUs than many headlines claimed. Moreover, they had access to others' foundational models. While we believe they have improved training time, we do not think those improvements reduce GPU requirements by the orders of magnitude that the most alarmist accounts would have it.
More impressive are their gains on inference, where they have leapfrogged OpenAI's chain-of-thought reasoning models. Again, technically impressive, but also firmly within the cost curves we have been seeing in inference for years. The cost of inference is falling rapidly, and Deepseek has simply accelerated those declines.
We are seeking a middle path here. Deepseek has made important advances and demonstrated strong potential. That being said, their advances are not going to lead to a catastrophic decline in the amount of compute needed to achieve useful AI. Time for a collective deep breath.