Nvidia's text-to-video technology could take your GIF game to the next level

Now that ChatGPT and Midjourney are pretty much mainstream, the next big AI race is text-to-video generators, and Nvidia just showed off some amazing demos of the technology that could soon take your GIFs to a new level.

A new research paper and microsite (opens in a new tab) from Nvidia's Toronto AI Lab, titled "High-Resolution Video Synthesis with Latent Diffusion Models," gives us a glimpse of the incredible video creation tools that are ready to join the growing list of the best AI art generators.

Latent Diffusion Models (or LDMs) are a type of AI that can generate video without the need for massive computing power. Nvidia says its technology does this by building on the work of text-to-image generators, in this case Stable Diffusion, and adding a "temporal dimension to the latent space diffusion model."
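To picture what "adding a temporal dimension" means in practice, here's a minimal PyTorch sketch. This is our own illustration, not Nvidia's code: the class names, layer choices, and tensor shapes are all assumptions for demonstration. The idea it shows is the general one behind video LDMs: a pretrained image layer runs on each frame independently, and a newly added temporal layer then mixes information across frames.

```python
# Illustrative sketch only (not Nvidia's implementation): a spatial layer
# applied per frame, interleaved with a temporal layer that mixes frames.
import torch
import torch.nn as nn

class TemporalMixer(nn.Module):
    """Mixes latent features across the new time dimension with a 1D conv."""
    def __init__(self, channels: int):
        super().__init__()
        # Convolve along the frame axis only; the spatial layout is untouched.
        self.conv = nn.Conv1d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width)
        b, c, t, h, w = x.shape
        # Fold spatial positions into the batch so the conv sees (b*h*w, c, t).
        x = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, c, t)
        x = self.conv(x)
        return x.reshape(b, h, w, c, t).permute(0, 3, 4, 1, 2)

class VideoBlock(nn.Module):
    """A per-frame spatial layer followed by the new temporal layer."""
    def __init__(self, channels: int):
        super().__init__()
        # Stand-in for a pretrained Stable Diffusion spatial block.
        self.spatial = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.temporal = TemporalMixer(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, t, h, w = x.shape
        # Run the image layer independently on each frame...
        frames = self.spatial(x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w))
        x = frames.reshape(b, t, c, h, w).permute(0, 2, 1, 3, 4)
        # ...then let the temporal layer exchange information between frames.
        return self.temporal(x)

latents = torch.randn(1, 4, 16, 32, 64)  # (batch, channels, 16 frames, H, W)
print(VideoBlock(4)(latents).shape)      # torch.Size([1, 4, 16, 32, 64])
```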

A GIF of a stormtrooper vacuuming the beach

(Image credit: Nvidia)

In other words, its generative AI can realistically animate still images and upscale them using super-resolution techniques. This means it can output short 4.7-second videos at 1280x2048 resolution, or longer videos at a lower 512x1024 resolution (used for its driving-scene footage).
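For a sense of how that two-stage cascade fits together, here's a toy Python sketch. Again, this is not Nvidia's actual pipeline: both functions are placeholders we made up, and the bilinear upsample stands in for a learned super-resolution network purely to demonstrate the shape change.

```python
# Toy sketch of a generate-then-upscale cascade (placeholder functions only).
import torch
import torch.nn.functional as F

def toy_low_res_generator(prompt: str, frames: int = 16) -> torch.Tensor:
    # Stand-in for the video diffusion model: returns (frames, 3, 128, 256).
    return torch.rand(frames, 3, 128, 256)

def super_resolve(frame_batch: torch.Tensor, scale: int = 4) -> torch.Tensor:
    # Stand-in for a learned super-resolution model; bilinear upsampling
    # here just shows the resolution jump (128x256 -> 512x1024).
    return F.interpolate(frame_batch, scale_factor=scale,
                         mode="bilinear", align_corners=False)

clip = toy_low_res_generator("a stormtrooper is vacuuming the beach")
hi_res = super_resolve(clip)
print(hi_res.shape)  # torch.Size([16, 3, 512, 1024])
```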

Our immediate thought upon seeing the early demos (like the ones above and below) is how much this could push our GIF game forward. Granted, there are bigger ramifications, like the democratization of video creation and the prospect of automatic movie adaptations, but at this point text-to-GIF seems like the most exciting use case.

A teddy bear playing an electric guitar.

(Image credit: Nvidia)

Simple prompts like "a stormtrooper is vacuuming the beach" and "a teddy bear is playing electric guitar, HD, 4K" produce quite usable results, though naturally there are artifacts and some morphing in the creations.

Right now, this makes text-to-video technology, like Nvidia's new demos, best suited to thumbnails and GIFs. But given the rapid improvements seen in Nvidia's AI generation of longer scenes (opens in a new tab), we probably won't have to wait long for longer text-to-video clips to land in stock libraries and beyond.

Analysis: The Next Frontier of Generative AI

The sun peeks through the window of a New York loft

(Image credit: Runway)

Nvidia is not the first company to show off an AI text-to-video generator. We recently saw the debut of Google's Phenaki (opens in a new tab), which revealed its potential for 20-second clips based on longer prompts. Its demos also include a clip, albeit a rougher one, that lasts over two minutes.

Startup Runway, which helped create the Stable Diffusion text-to-image generator, also unveiled its Gen-2 AI video model (opens in a new tab) last month. In addition to responding to prompts like "afternoon sun peeking through a New York loft window" (result above), it lets you provide a still image on which to base the generated video, and lets you request styles to apply to your videos, too.

The latter was also a topic of recent Adobe Firefly demos, which showed how AI would make video editing easier. In programs like Adobe Premiere Rush, you'll soon be able to type the time of day or season you want to see in your video, and Adobe's AI will do the rest.

Recent demos from Nvidia, Google, and Runway show that full text-to-video generation is still in a hazy state, often producing strange, dreamy, or distorted results. But for now that's fine for our GIF game, and rapid improvements that will make the technology suitable for longer videos are surely on the way.