DeepSeek R1's $5.6M is like your $5 plate of Hokkien Mee
When DeepSeek announced they trained their V3 language model for just $5.6M, many were impressed, including me. From an engineer's perspective, after reading the paper, it was an "of course, why not" moment; but investors and many riding the AI hype wagon were caught off guard.
That $5.6M figure is just the final training run cost. Many became fixated on that number. But any engineer who has built AI models and systems knows that the true cost is a lot more. The $5.6M does not account for the weeks, months and years spent building the know-how, team and infrastructure required to get to that final run.
It’s like saying you can run a marathon in 2 hours without mentioning the years of training that went into it. Or the hawker who spent only 5 minutes frying that plate of Hokkien Mee, discounting the hours he (or she) spent earlier preparing the ingredients and stock for the day, and the weeks and years spent learning, experimenting and perfecting the craft of turning out that plate in minutes.

The Hidden Costs Nobody Talks About
My team and I have built many of Singapore’s high performance computing (HPC) infrastructures using Linux and open-source tools like NPACI Rocks, and, more importantly, contributed back to Rocks development in the 2000s. So, coming from an engineering and infrastructure point of view, here are the areas where DeepSeek had to spend large amounts of resources and money:
- Research & Development – Years of experimentation and refinement of their models and techniques even before R1 came out.
- Data/Compute Infrastructure – DeepSeek has made significant investments here. They run their own data centers for their hedge fund business, which also means they already had the experience of fine-tuning their infrastructure to run efficiently and robustly.
- Engineering Talent – Our experience building our own AI engineering team shows that quality AI talent does not come cheap. DeepSeek’s strong engineering team is a major investment.
The Innovation Factor
What interests me most is how DeepSeek innovated under constraints. Their engineering innovation included:
- A “mixture of experts” (MoE) architecture: MoE has been around for a while, but DeepSeek applied it at a larger scale, and by activating only a small subset of experts for each token they could run more efficiently with a lower memory footprint (a minimal routing sketch follows this list).
- Used FP8 low-precision processing: This is again a well-known technique, but I suspect DeepSeek applied it more aggressively here. The basic idea is simple: use FP8 when you are far from the optimum, and as you get closer switch to higher precision, FP16 and then FP32, so that you do not overshoot. (You may want to dig out your ‘O’ or ‘A’ level textbooks and read up on gradient descent and step size; a toy illustration follows this list.)
- Optimized their hardware programming: There is some hard-core, low-level, assembly-style work here, optimizing code below the CUDA layer to achieve faster, lower-latency performance. This is what hedge fund engineers do all the time to build ultra-low-latency, high-performance systems that execute trades in sub-second timeframes.
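To make the MoE point concrete, here is a minimal, illustrative routing sketch in plain NumPy. The sizes, weights and the `moe_forward` helper are all made up for illustration; real systems like DeepSeek-V3 add load balancing, batching and many other refinements. The core idea, though, is the same: each token only pays for the few experts the gate selects.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, purely illustrative (not DeepSeek's actual configuration).
d_model, d_hidden, n_experts, top_k = 16, 32, 8, 2

# A tiny two-layer MLP per expert, plus one gating matrix.
W_gate = rng.standard_normal((d_model, n_experts)) * 0.1
experts = [
    (rng.standard_normal((d_model, d_hidden)) * 0.1,
     rng.standard_normal((d_hidden, d_model)) * 0.1)
    for _ in range(n_experts)
]

def moe_forward(x):
    """Route one token through only its top-k experts."""
    logits = x @ W_gate
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    chosen = np.argsort(probs)[-top_k:]           # the k experts the gate picks
    weights = probs[chosen] / probs[chosen].sum() # renormalise over the chosen experts
    out = np.zeros_like(x)
    for w, idx in zip(weights, chosen):
        W1, W2 = experts[idx]                     # the other experts are never computed
        out += w * (np.maximum(x @ W1, 0.0) @ W2)
    return out

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)  # (16,): only 2 of the 8 experts did any work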
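And here is a toy illustration of the coarse-to-fine intuition behind the low-precision point. This is not DeepSeek's actual FP8 recipe; it simply simulates fewer mantissa bits (FP8 formats carry 2 or 3, FP16 carries 10, FP32 carries 23) and shows that coarse gradients are good enough while you are still far from the optimum:

```python
import numpy as np

def quantize(x, mantissa_bits):
    """Crudely simulate a low-precision float by keeping only a few mantissa bits."""
    if x == 0.0:
        return 0.0
    exponent = np.floor(np.log2(abs(x)))
    step = 2.0 ** (exponent - mantissa_bits)
    return float(np.round(x / step) * step)

# Minimise f(w) = (w - 3)^2 with plain gradient descent.
w, lr = 10.0, 0.1
for _ in range(60):
    grad = 2.0 * (w - 3.0)
    # Coarse "FP8-like" gradients while far from the optimum,
    # more mantissa bits ("FP16/FP32-like") as we close in.
    bits = 3 if abs(grad) > 1.0 else (10 if abs(grad) > 0.1 else 23)
    w -= lr * quantize(grad, bits)

print(round(w, 4))  # converges to ~3.0 despite the coarse early steps
```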
The last point in that list, optimizing below the CUDA layer, is particularly important, and one of the reasons many research labs were “caught off guard” is that most of them operate at the Python/PyTorch level. Apart from the very big labs, which I am sure use similar techniques but do not disclose them (unlike DeepSeek), most deep-learning researchers and engineers focus on writing code with libraries and conducting research at this pythonic level. Many university graduates today can only program in Python; they cannot code in C/C++, and definitely not in assembly.
Only high-performance computing coders and hedge fund engineers, who must extract every ounce of performance from their hardware, have traditionally resorted to low-level programming to achieve the necessary efficiency. In contrast, Big Tech companies and many AI startups, often flush with cash, tend to take the easier route by simply purchasing more GPUs and servers, adhering to the scaling law. Unfortunately, many of these research labs and start-ups remain unaware that leveraging advanced software programming tools, techniques, or lower-level coding could yield a 5-10X improvement in hardware performance. This oversight highlights a critical gap between resource availability and optimization potential in the tech industry.
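Here is a small, everyday analogue of that 5-10X claim, at the Python level rather than the below-CUDA level DeepSeek worked at: the same dot product on the same hardware, first as an interpreted loop and then through an optimised BLAS call. The exact speed-up varies by machine, but it is typically two to three orders of magnitude.

```python
import time
import numpy as np

n = 2_000_000
a = np.random.rand(n)
b = np.random.rand(n)

# Naive interpreted loop: one multiply-add at a time.
t0 = time.perf_counter()
total = 0.0
for i in range(n):
    total += a[i] * b[i]
t_loop = time.perf_counter() - t0

# The same dot product through an optimised BLAS routine.
t0 = time.perf_counter()
total_blas = float(a @ b)
t_blas = time.perf_counter() - t0

print(f"loop: {t_loop:.3f}s  BLAS: {t_blas:.4f}s  speed-up: {t_loop / t_blas:.0f}x")
```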
The picture below shows a benchmark of open-source R versus a commercial R distribution, Revolution Analytics[1] R (acquired by Microsoft in 2015). The benchmark uses the well-known airline arrival dataset, and RevoR was roughly 1,378X to 20,000X faster than open-source R.

I recall clearly that, many times when we were building these HPC clusters for the universities in the 2000s, I told the professors and researchers to allocate budget for commercial tools and compilers from the likes of Intel, PGI or Revolution R, and to get higher-performance networking gear like InfiniBand or Myrinet, so as to squeeze 20%-30% more performance out of their clusters. Most ignored our advice and just wanted more servers and GPUs.
What This Means For Singapore
As Singapore continues its journey to become an AI-First nation, DeepSeek’s story offers valuable lessons:
- Focus on optimization and efficiency: do not just buy more and more GPUs (this is also better for sustainability).
- Build indigenous talent capabilities, drawing on both STEM and non-STEM backgrounds, to do AI.
- Embrace open source. See From Supercomputers to LLMs: How Open Source and Ingenuity Democratize Cutting-Edge Technology
Looking Ahead
While the $5.6M headline is impressive, the bigger story is about innovation under constraints. As we’ve learned through our AIAP and 100E programmes, successful AI adoption requires more than just technology – it needs the right combination of talent, infrastructure and methodology.
For organizations embarking on their AI journey, I recommend looking beyond the headlines. Focus on building sustainable capabilities, just as we’re doing through programmes like the AI Apprenticeship Programme (AIAP).
What are your thoughts on DeepSeek’s approach? I’d love to hear your perspectives on sustainable (technical and talent) AI development.
—
[1] I was the General Manager of Revolution Analytics for Asia until the acquisition by Microsoft in 2015.