
Meta is entering the AI video war with the powerful Movie Gen, set to launch on Instagram in 2025

Meta founder and CEO Mark Zuckerberg, who built the company on the back of its popular social network Facebook, ended this week on a strong note: he posted a video on his personal Instagram (a social network Facebook acquired in 2012) of himself working out on a leg press machine at the gym.

Over the course of the video, however, the leg press transforms into a neon cyberpunk version, an ancient Roman version, and even a flaming gold version.

As it turns out, Zuck was doing more than just exercising: he was using the video to announce Movie Gen, Meta's new family of generative multimodal AI models that can create both video and audio from text prompts, allowing users to customize their own videos by adding special effects, props, and costumes, and to change selected elements simply through text guidance, as Zuck did in his video.

The models appear to be extremely powerful, allowing users to alter only selected elements of a video clip rather than “reshooting” or regenerating the whole thing, similar to Pika's spot editing on older models, but with longer clip generation and integrated sound.

Meta's tests, described in a technical paper on the model family released today, show that it outperforms leading competitors in the field, including Runway Gen-3, Luma Dream Machine, OpenAI's Sora, and Kling 1.5, in many audience rankings of various attributes such as consistency and “naturalness” of motion.

Meta has positioned Movie Gen as a tool both for everyday users looking to improve their digital storytelling and for professional video artists and editors, even Hollywood filmmakers.

Movie Gen represents Meta's latest advancement in generative AI technology, combining video and audio capabilities into a single system.

Specifically, Movie Gen consists of four models:

1. Movie Gen Video – a 30B-parameter text-to-video generation model

2. Movie Gen Audio – a 13B-parameter video-to-audio generation model

3. Personalized Movie Gen Video – a version of Movie Gen Video post-trained to generate personalized videos based on a person's face

4. Movie Gen Edit – a model with a novel post-training method for precise video editing

These models enable the creation of realistic, personalized HD videos of up to 16 seconds at 16 FPS, along with 48 kHz audio, and provide video editing capabilities.

Designed to handle tasks ranging from personalized video creation to sophisticated video editing and high-quality audio generation, Movie Gen leverages powerful AI models to expand users' creative options.

Key features of the Movie Gen suite include:

Video generation: Movie Gen allows users to produce high-resolution (HD) videos simply by entering text prompts. These videos can be rendered at 1080p, are up to 16 seconds long, and are powered by a 30-billion-parameter transformer model. The AI's ability to parse detailed prompts allows it to handle various aspects of video creation, including camera movement, object interactions, and environmental physics.

Personalized videos: Movie Gen offers an exciting personalization feature that lets users upload an image of themselves or someone else to be featured in AI-generated videos. The model can adapt to different prompts while preserving the person's identity, making it useful for creating customized content.

Precise video editing: The Movie Gen suite also includes advanced video editing features that allow users to change specific elements within a video. The model can make localized changes, such as to objects or colors, as well as global changes, such as swapping out the background, all based on simple text instructions.

Audio generation: In addition to its video capabilities, Movie Gen includes an audio generation model with 13 billion parameters. This feature enables the creation of sound effects, ambient music, and synchronized audio that works seamlessly with visual content. Users can create Foley sounds (sound effects that replicate and amplify real-world noises, such as the rustling of fabric and the echo of footsteps), instrumental music, and other audio elements up to 45 seconds in length. Meta posted a sample video of Foley sounds below (turn up the volume to hear it):

Trained on 100 million videos and a billion images

Movie Gen is the latest advancement in Meta's ongoing AI research efforts. To train the models, Meta says it relied on “internet-scale image, video and audio data,” specifically 100 million videos and 1 billion images, from which, according to the technical paper, the models “learned about the visual world by ‘watching’ videos.”

However, Meta did not specify whether the data was licensed, in the public domain, or simply scraped, as many other AI model makers have done – a practice that has drawn criticism from artists and creators such as YouTuber Marques Brownlee (MKBHD) – and, in the case of AI video model provider Runway, a class-action lawsuit from creators alleging copyright infringement (which is still before the courts). It is therefore safe to assume that Meta will immediately face criticism over its data sources.

Leaving aside the legal and ethical questions surrounding training, Meta clearly positions the Movie Gen creation process as novel, combining typical diffusion-model training (commonly used in video and audio AI) with large language model (LLM) training and a new technique called “flow matching,” which relies on modeling changes in the distribution of a dataset over time.

At each step, the model learns to predict the velocity at which samples should “move” toward the target distribution. Flow matching differs from standard diffusion-based models in key ways (a minimal code sketch follows the list below):

Zero terminal signal-to-noise ratio (SNR): Unlike traditional diffusion models, which require specially tuned noise schedules to reach zero terminal SNR, flow matching inherently guarantees zero terminal SNR without additional adjustments. This provides robustness to the choice of noise schedule and contributes to more consistent, higher-quality video outputs.

Efficiency in training and inference: Flow matching proves more efficient than diffusion models in both training and inference. It offers flexibility in the type of noise schedule used and demonstrates improved performance across a range of model sizes. The approach has also shown better agreement with human evaluation results.
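To make the “predict the velocity” idea concrete, here is a minimal sketch of one flow-matching training step in the common linear-interpolant formulation. It illustrates the general technique only, not Meta's actual training code; the network `v_theta` and the function name are hypothetical stand-ins.

```python
import torch

def flow_matching_loss(v_theta, x1):
    """One flow-matching training step on a batch of real samples x1.

    x1: data samples (e.g., video latents), shape (B, ...).
    v_theta: a network that predicts velocity given (noisy sample, timestep).
    """
    b = x1.shape[0]
    x0 = torch.randn_like(x1)  # noise drawn from the prior
    # One random timestep per sample, shaped to broadcast over x1.
    t = torch.rand(b, *([1] * (x1.dim() - 1)), device=x1.device)
    xt = (1 - t) * x0 + t * x1      # point on the straight noise-to-data path
    target_velocity = x1 - x0       # constant velocity along that path
    pred_velocity = v_theta(xt, t)  # model predicts how xt should "move"
    return torch.mean((pred_velocity - target_velocity) ** 2)
```

Note that at t = 1 the interpolated sample equals the clean data exactly, with no residual noise; that is the zero terminal SNR property mentioned above, which standard diffusion noise schedules need extra adjustments to achieve.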

The Movie Gen system's training process focuses on maximizing flexibility and quality in both video and audio generation. It relies on two main models, each with extensive training and fine-tuning procedures:

Movie Gen Video model: This model has 30 billion parameters and starts with basic text-to-image generation before progressing to text-to-video, producing clips up to 16 seconds long in HD quality. The training process draws on a large dataset of videos and images, allowing the model to understand complex visual concepts such as motion, interactions, and camera dynamics. To improve the model's capabilities, Meta fine-tuned it on a curated set of high-quality videos with text captions, which improved the realism and precision of its output. The team further expanded the model's flexibility by training it to handle personalized content and editing commands.

Movie Gen Audio model: With 13 billion parameters, this model generates high-quality audio synchronized with the visual elements of a video. The training set included over one million hours of audio, allowing the model to learn both the physical and psychological connections between sound and image. Meta improved the model through supervised fine-tuning on selected high-quality audio-and-text pairs, a process that helped it produce realistic ambient sounds, synchronized sound effects, and mood-matched background music for a variety of video scenes.

Movie Gen follows previous projects such as Make-A-Scene and the Llama Image models, which focused on producing high-quality images and animations.

This release marks the third major milestone in Meta's generative AI journey and underscores the company's commitment to pushing the boundaries of media creation tools.

Launching on Insta in 2025

Set to debut on Instagram in 2025, Movie Gen is poised to make advanced video creation more accessible to the platform's wide range of users.

While the models are currently in the research phase, Meta is optimistic that Movie Gen will enable users to produce compelling content with ease.

As the product evolves, Meta intends to work with developers and filmmakers to refine Movie Gen's features and ensure it meets users' needs.

Meta's long-term vision for Movie Gen reflects a broader goal of democratizing access to sophisticated video editing tools. While the suite offers significant potential, Meta acknowledges that generative AI tools like Movie Gen are intended to enhance, not replace, the work of professional artists and animators.

As Meta prepares to launch Movie Gen, the company remains focused on refining the technology and addressing its current limitations. Further optimizations are planned to improve inference time and expand the model's capabilities. Meta has also hinted at possible future applications, such as creating custom animated greetings or short films based entirely on user input.

The release of Movie Gen could usher in a new era for content creation on Meta's platforms, with Instagram users among the first to experience the tool. As the technology advances, Movie Gen could become a crucial part of Meta's ecosystem and of the creator ecosystem, for professionals and indie producers alike.
