Nvidia has developed a way to turn 2D photos into 3D scenes


AI researchers at Nvidia have come up with a way to turn a handful of 2D images into a 3D scene almost instantly, by coupling extremely fast neural network training with rapid rendering.

Known as inverse rendering, the process leverages AI to approximate the behavior of light in the real world to transform 2D images shot from different angles into 3D scenes.

Nvidia researchers applied their new approach to a popular technology called Neural Radiance Fields, or NeRF for short. The result, which the company has dubbed Instant NeRF, is the fastest NeRF technique to date, in some cases more than 1,000 times faster than prior methods. The neural model takes just a few seconds to train on a few dozen still photos, though it also requires data on the camera angles from which they were taken.

Nvidia's VP of Graphics Research David Luebke provided additional information on the difference between NeRF and Instant NeRF in a blog post, saying:

“While traditional 3D representations such as polygon meshes are similar to vector images, NeRFs are like bitmaps: They densely capture the way light radiates from an object or into a scene. In this sense, Instant NeRF could be as important to 3D as digital cameras and JPEG compression have been to 2D photography, dramatically increasing the speed, ease, and scope of 3D capture and sharing.”

Possible use cases

Using neural networks, NeRFs can render realistic 3D scenes from an input collection of 2D images. The most interesting part, however, is how those neural networks can fill in the gaps between the 2D images, even when the objects or people in them are partially blocked by obstacles.
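At its core, a NeRF is a small neural network that maps a 3D point (and viewing direction) to a color and a volume density; an image is produced by sampling that function along each camera ray and compositing the results. The sketch below illustrates that volume-rendering loop with a toy, hand-written stand-in for the trained network (`toy_radiance_field` is an invented placeholder, not Nvidia's model):

```python
import numpy as np

def toy_radiance_field(xyz, view_dir):
    """Stand-in for NeRF's trained MLP: maps a 3D point and a viewing
    direction to an RGB color and a volume density (sigma).
    A real NeRF learns this mapping from the 2D input photos."""
    rgb = 0.5 + 0.5 * np.sin(xyz)          # fake RGB values in [0, 1]
    sigma = np.exp(-np.linalg.norm(xyz))   # fake density, highest near origin
    return rgb, sigma

def render_ray(origin, direction, n_samples=64, near=0.0, far=4.0):
    """Classic NeRF volume rendering: sample points along a camera ray,
    query the radiance field, and alpha-composite colors front to back."""
    ts = np.linspace(near, far, n_samples)
    delta = ts[1] - ts[0]
    color = np.zeros(3)
    transmittance = 1.0                    # fraction of light still unblocked
    for t in ts:
        point = origin + t * direction
        rgb, sigma = toy_radiance_field(point, direction)
        alpha = 1.0 - np.exp(-sigma * delta)
        color += transmittance * alpha * rgb
        transmittance *= 1.0 - alpha
    return color

pixel = render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]))
```

Training a real NeRF amounts to adjusting the network so that rendering rays through every input photo's camera reproduces that photo's pixels, which is what lets the model interpolate views the cameras never captured.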

Typically, creating a 3D scene using traditional methods can take a few to several hours, depending on the complexity and resolution of the scene. By introducing AI into the picture, even early NeRF models were able to generate sharp, artifact-free scenes within minutes, though only after being trained for several hours.

Nvidia's Instant NeRF reduces the required training and rendering time by orders of magnitude using a company-developed technique called multi-resolution hash grid encoding, optimized to run efficiently on Nvidia GPUs. The model the company unveiled at GTC 2022 is built on the Nvidia CUDA Toolkit and the Tiny CUDA Neural Networks library, and it can be trained and run on a single Nvidia GPU, though graphics cards with Nvidia Tensor Cores handle the job even faster.
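The speedup comes largely from that input encoding: instead of feeding raw coordinates to a large network, each 3D point is looked up in several small trainable hash tables at different grid resolutions, and the interpolated features feed a much tinier network. Below is a minimal, illustrative sketch of that lookup; the table sizes, level count, and hash primes here are simplified assumptions for clarity, not Nvidia's production configuration:

```python
import numpy as np

# Illustrative parameters (real implementations use more levels and
# larger tables; these values are assumptions chosen for brevity).
N_LEVELS = 4                  # number of grid resolutions
TABLE_SIZE = 2 ** 14          # entries per hash table
FEATURES = 2                  # feature dimensions per entry
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

rng = np.random.default_rng(0)
# In training these tables are learned; here they are random placeholders.
tables = rng.normal(scale=1e-4, size=(N_LEVELS, TABLE_SIZE, FEATURES))

def hash_grid_encode(xyz):
    """Multi-resolution hash encoding: at each resolution level, hash the
    8 corners of the grid cell containing xyz into a small feature table,
    then trilinearly interpolate the corner features. The per-level
    features are concatenated into one compact vector for a tiny MLP."""
    feats = []
    for level in range(N_LEVELS):
        res = 16 * 2 ** level                      # finer grid each level
        scaled = np.asarray(xyz) * res
        base = np.floor(scaled).astype(np.uint64)  # cell's lowest corner
        frac = scaled - base                       # position inside the cell
        acc = np.zeros(FEATURES)
        for corner in range(8):
            offset = np.array([(corner >> d) & 1 for d in range(3)],
                              dtype=np.uint64)
            idx = base + offset
            # spatial hash: XOR of coordinate-times-prime, modulo table size
            h = int(np.bitwise_xor.reduce(idx * PRIMES) % TABLE_SIZE)
            # trilinear interpolation weight for this corner
            w = np.prod(np.where(offset == 1, frac, 1.0 - frac))
            acc += w * tables[level, h]
        feats.append(acc)
    return np.concatenate(feats)  # shape: (N_LEVELS * FEATURES,)

encoding = hash_grid_encode([0.3, 0.6, 0.9])
```

Because the expensive spatial detail lives in the lookup tables rather than in network weights, the remaining MLP can be tiny, which is what lets training shrink from hours to seconds on a single GPU.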

In the future, Instant NeRF technology could be used to rapidly create avatars or scenes for virtual worlds, capture video conference participants and their environments in 3D, or reconstruct scenes for 3D digital maps. Alternatively, the technology could also be used to train autonomous robots and cars to better understand the size and shape of real-world objects by capturing 2D images or video footage of them. At the same time, the architecture and entertainment industries can use Instant NeRF to rapidly generate digital representations of real environments that creators can modify and expand.

Nvidia researchers are also exploring how their new input encoding technique could be used to speed up various AI challenges, such as reinforcement learning, language translation, and general-purpose deep learning algorithms.