AI startup Runway has announced a new generative diffusion model that it says lets users create short AI videos from text, images, and video clips.
Dubbed Gen-2, the new system enables users to “realistically and consistently synthesize” videos, either by applying the composition and style of an image or text prompt to the structure of a source video (video-to-video), or by simply using text prompts (text-to-video).
It is a significant development for text-to-video technology. Runway, which offers a web-based, machine-learning-powered video editor, unveiled its first AI video editing model, Gen-1, in February. That software can make new videos using data from existing videos.
For example, Gen-1 could turn a realistically filmed scene into an animated render while still retaining the “proportions and movements of the original scene.” Users could also edit footage by isolating subjects in a clip and modifying them with simple text instructions.
Gen-2 creates entirely new videos
Runway helped create the popular Stable Diffusion AI image generator last year. The company was founded by artists who aimed “to bring the unlimited creative potential of AI to everyone, everywhere with anything to say,” according to its website.
Work on the multimodal Gen-2 system has reportedly been ongoing since last September. The new text-to-video model is designed to be a major improvement on its predecessor, building on Gen-1’s features to create relatively high-resolution videos.
“Deep neural networks for image and video synthesis are becoming increasingly precise, realistic, and controllable. We have gone from blurry low-resolution images to both highly realistic and aesthetic imagery allowing for the rise of synthetic media,” said Runway.
Unlike Gen-1, the latest iteration can generate completely new video content from scratch using only text inputs. On the downside, the AI video clips showcased by Runway so far are still very short, just three seconds long, and somewhat shaky.
Runway generated the short video scene below with the prompt, “Aerial drone footage of a mountain range”. There is no audio, but the firm says it is working on it.
The company was also able to make the three-second video below based on the simple prompt, “a close up of an eye.”
Gen-2 is being released to a limited number of users via a waitlist on the Runway Discord. The company is hoping creatives and storytellers will use the tool once it becomes publicly available in the weeks to come. Others are already trying it out with fascinating results.
Text-to-video to change the world
Runway is not the only firm making inroads with text-to-video. Meta and Google have both made significant advances, with video clips that are reportedly much “longer and cohesive.”
Microsoft Germany CTO Andreas Braun recently teased the idea that the company’s GPT-4-powered Bing search engine could let users create video, music, and images from text. But OpenAI, the creator of GPT-4, has cautioned that text-to-video is still in development.
Observers say Runway’s Gen-2 could revolutionize content creation and filmmaking.
“My thoughts on this are that this is just the beginning, this is a new crazy tool that we will have at our disposal, as the models get better and better the expected output will get better,” said developer Linus Ekenstam.
Gen-2 is based on research on generative diffusion models by Runway research scientist Patrick Esser, published on arXiv, Cornell University’s preprint server, in February. Simply put, diffusion models learn to generate images, and now video, by starting from random noise and progressively removing it until a coherent output emerges.
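To make that intuition concrete, here is a minimal, purely illustrative sketch of a simplified DDPM-style denoising loop. It is not Runway’s implementation: the linear noise schedule and the `predict_noise` stand-in for a trained neural network are assumptions made for the example.

```python
import numpy as np

def toy_reverse_diffusion(predict_noise, shape=(8, 8), steps=50, rng=None):
    """Illustrative reverse-diffusion loop: start from pure noise and
    repeatedly subtract the noise the model predicts, step by step.
    `predict_noise(x, t)` stands in for a trained neural network."""
    rng = rng or np.random.default_rng(0)
    # Linear noise schedule (real models tune this carefully).
    betas = np.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    for t in reversed(range(steps)):
        eps_hat = predict_noise(x, t)  # model's guess of the noise present in x
        # DDPM-style estimate of the previous, slightly less noisy sample.
        x = (x - (betas[t] / np.sqrt(1.0 - alpha_bars[t])) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:  # add a little fresh noise at every step except the last
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

# A dummy "network" that predicts zero noise, just so the loop runs end to end.
result = toy_reverse_diffusion(lambda x, t: np.zeros_like(x))
print(result.shape)
```

In a real text-to-video model the network would be conditioned on the prompt (and, for video, on neighbouring frames), but the backbone idea is this same iterative denoising.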
This article originally appeared on MetaNews.