Engineers at Facebook's parent company Meta have revealed how they were able to free up memory using a software solution called Transparent Memory Offloading (TMO).
It's now part of the Linux kernel and, simply put, it automatically offloads data to other storage tiers (for example, Samsung's CXL memory expander) that are less expensive and more efficient than memory.
The savings are significant: TMO has been running on millions of Facebook servers for over a year, saving up to almost a third of memory per server. While this would likely be insignificant across dozens or even hundreds of servers, Facebook's sheer scale makes it a different matter.
Analysis: Facebook's gigantic appetite for RAM
The world's largest social network has almost three billion monthly active users and millions of servers in 21 locations around the world. If each server had an average of 128 GB of RAM, that would represent 256 million GB (or 256 PB) of RAM, which, at an average cost of €4 per GB (DDR4 ECC RAM), represents approximately €1 billion in memory. That assumes Facebook has at least two million servers (the Facebook blog cited "millions of servers" as early as July 2018), and the actual number is likely to be much higher.
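The back-of-the-envelope estimate above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic: the server count, the average RAM per server and the price per GB are the article's assumptions rather than figures published by Meta, and the function name `fleet_memory_estimate` is made up for illustration.

```python
# Assumed inputs, taken from the estimate in the text (not official Meta figures).
SERVERS = 2_000_000        # lower bound implied by "millions of servers"
RAM_PER_SERVER_GB = 128    # assumed average RAM per server
COST_PER_GB_EUR = 4        # assumed price of DDR4 ECC RAM per GB

GB_PER_PB = 1_000_000      # decimal units: 1 PB = 1,000,000 GB


def fleet_memory_estimate(servers, ram_gb, cost_per_gb):
    """Return (total GB, total PB, total cost in EUR) for a hypothetical fleet."""
    total_gb = servers * ram_gb
    total_pb = total_gb / GB_PER_PB
    total_cost = total_gb * cost_per_gb
    return total_gb, total_pb, total_cost


total_gb, total_pb, total_cost_eur = fleet_memory_estimate(
    SERVERS, RAM_PER_SERVER_GB, COST_PER_GB_EUR
)

print(f"Total RAM: {total_gb:,} GB ({total_pb:,.0f} PB)")
print(f"Estimated cost: EUR {total_cost_eur:,}")
```

With these inputs the total comes to 256 million GB, or 256 PB, and just over €1 billion. Changing `SERVERS` shows how quickly the bill grows if the real fleet is larger than two million machines.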
Figures presented by the team that worked on TMO showed that memory accounts for a third of the bill of materials (BOM) of Meta's servers, with compressed RAM and SSD accounting for less than 11%. More worryingly, the cost of RAM (as a percentage of total infrastructure) has more than doubled since Facebook launched its first generation of servers (it's currently on its fourth).
The adoption of TMO has some drawbacks, most notably performance degradation. But the gains in power and memory savings far outweigh the drawbacks, and future iterations combined with hardware improvements (for example, faster SSDs or CXL devices) will provide further mitigation.