
Google researchers introduce “VLOGGER,” an AI that can bring still images to life

Google researchers have developed a new artificial intelligence system that can create lifelike videos of people talking, gesturing and moving from just a single still image. The technology, called VLOGGER, relies on advanced machine learning models to synthesize strikingly realistic footage, opening up a range of potential applications while raising concerns about deepfakes and misinformation.

Described in a research paper titled “VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis,” the AI model takes a photo of a person and an audio clip as input, and then outputs a video that matches the audio, showing the person speaking the words and making corresponding facial expressions, head movements and hand gestures. The videos are not perfect and contain some artifacts, but they represent a significant advance in the ability to animate still images.

VLOGGER generates photorealistic videos of speaking and gesturing avatars from a single image. (Source: enriccorona.github.io)

A breakthrough in synthesizing talking heads

The researchers, led by Enric Corona at Google Research, used a type of machine learning model called a diffusion model to achieve the novel result. Diffusion models have recently demonstrated remarkable performance in generating highly realistic images from text descriptions. By extending them to video and training on a vast new dataset, the team was able to create an AI system that brings photos to life in a highly convincing way.

“In contrast to previous work, our method does not require training for each person, does not rely on face detection and cropping, generates the complete image (not just the face or the lips), and considers a broad spectrum of scenarios (e.g. visible torso or diverse subject identities) that are critical to correctly synthesize humans who communicate,” the authors wrote.

A key enabler was the curation of a huge new dataset called MENTOR, containing over 800,000 distinct identities and 2,200 hours of video, an order of magnitude larger than what was previously available. This allowed VLOGGER to learn to generate videos of people of varied ethnicities, ages, clothing, poses and surroundings without bias.

Possible applications and social implications

The technology opens up a number of compelling use cases. The paper demonstrates VLOGGER's ability to automatically dub videos into other languages by simply swapping out the audio track, to seamlessly edit and fill in missing frames in a video, and to create full videos of a person from a single photo.

One could imagine actors licensing detailed 3D models of themselves that could be used to generate new performances. The technology could also be used to create photorealistic avatars for virtual reality and gaming. And it could enable the development of AI-powered virtual assistants and chatbots that are more engaging and expressive.

Google sees VLOGGER as a step toward “embodied conversational agents” that can interact naturally with people through speech, gestures and eye contact. “VLOGGER can be used as a standalone solution for presentations, education, narration, low-bandwidth online communication, and as an interface for purely text-based human-computer interaction,” the authors write.

However, the technology also has the potential for misuse, such as in the creation of deepfakes, synthetic media in which a person in a video is replaced with the likeness of another person. As these AI-generated videos become more realistic and easier to create, they could exacerbate the challenges of misinformation and digital fakery.

A new frontier in AI research

While VLOGGER is impressive, it still has limitations. The generated videos are relatively short and have a static background. The people do not move around a 3D environment. And their behaviors and speech patterns, while realistic, are not yet indistinguishable from those of real people.

Nevertheless, VLOGGER represents a significant advance. “We evaluate VLOGGER against three different benchmarks and show that the proposed model outperforms other state-of-the-art methods in terms of image quality, identity preservation, and temporal consistency,” the authors reported.

With further advances, this type of AI-generated media is likely to become ubiquitous. We may soon live in a world where it is difficult to tell whether the person talking to us in a video is real or generated by a computer program.

VLOGGER offers a first glimpse of this future. It is a powerful demonstration of the rapid advances in artificial intelligence and a sign of the growing challenges we face in distinguishing between what is real and what is fake.
