DeepSeek and Forced Limitations
DeepSeek is fascinating for many reasons.
That it appeared out of nowhere and benchmarks so close to what OpenAI's o1 model is capable of - the supposed current state of the art in generative AI models.
That I can download a distilled version of it and run it locally, on my Mac Studio, and that performance is perfectly reasonable. I do not need an Nvidia GPU, a third-party API, or my own personal nuclear power station.
But what fascinates me the most is that it was created using significantly less compute power during the training stage. Based on the figures I have seen, around $5 million worth, versus the $100 million OpenAI are spending.
Now, there is the possibility that this is not true. It would certainly be in the Chinese government's interest to claim that it had been done more cheaply, and such a claim would be a good way to tank the US stock market the way it has.
It is also possible that they used training data generated by other AI systems, and that the cost does not take into account the true cost of the synthetic data they used.
But let us assume that the costs are true. One of the reasons they were forced to do it this way is that they do not have access to the latest Nvidia GPUs, due to export bans put in place by the Biden administration. They only had access to older hardware.
These forced limitations meant their developers had to come up with novel solutions, many of which are now being seen by others thanks to the code being open-sourced.
When given a problem to solve, a developer will use every resource at their command to complete the task. If you work for a well-funded startup or a giant corporation with unlimited money like Google, there is no shortage of compute power available to you. Are you incentivised to make your code run faster, or incentivised to just build another power station?
They should invite game developers to come and help them optimise further. Not modern-day ones, but anyone who wrote games more than 30 years ago. Nobody made hardware sing more than those who were forced to work within the constraints of the Spectrum, the Commodore and even early PlayStation hardware.
Constraints breed creativity.