Meta releases V-JEPA, a predictive vision model

February 27, 2024


Meta has released V-JEPA, a predictive vision model that represents the next step toward Meta Chief AI Scientist Yann LeCun's vision of advanced machine intelligence (AMI).

For AI-powered machines to interact with objects in the physical world, they must be trained, but traditional methods are very inefficient. They rely on thousands of video examples combined with pre-trained image encoders, text, or human annotations for a machine to learn a single concept, let alone multiple skills.

V-JEPA, which stands for Video Joint Embedding Predictive Architecture, is a vision model designed to learn these concepts more efficiently.

LeCun said: “V-JEPA is a step toward a more grounded understanding of the world so machines can achieve more generalized reasoning and planning.”

V-JEPA learns how objects in the physical world interact in much the same way toddlers do. An important part of how we learn is filling in gaps to predict missing information. When a person walks behind a screen and comes out the other side, our brain fills in the gap with an understanding of what happened behind the screen.

V-JEPA is a non-generative model that learns by predicting missing or masked parts of a video. Generative models can recreate a masked piece of video pixel by pixel, but V-JEPA doesn't do this.

Instead, it compares abstract representations of unlabeled images rather than the pixels themselves. V-JEPA is shown a video with a large portion hidden and just enough footage to provide some context. The model is then asked to provide an abstract description of what is happening in the hidden region.
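The difference between predicting pixels and predicting representations can be illustrated in a few lines. What follows is a minimal toy sketch, not Meta's implementation. Several pieces are assumptions made for the example. The encoder is a fixed random linear map, and that single encoder plays both the context and target roles. `mean_predictor` is a hypothetical placeholder for the learned predictor. An L1 distance is used for simplicity. All function names are illustrative.

```python
import random


def make_linear_encoder(in_dim, out_dim, seed=0):
    """Hypothetical stand-in encoder: a fixed random linear map from raw
    patch values to an abstract embedding (real V-JEPA uses transformers)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(patch):
        # One dot product per embedding dimension.
        return [sum(w * x for w, x in zip(row, patch)) for row in weights]

    return encode


def l1_distance(a, b):
    # Mean absolute difference between two equal-length vectors.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mean_predictor(context, position):
    # Hypothetical placeholder for the learned predictor: ignores `position`
    # and guesses the average of the visible embeddings.
    dim = len(context[0])
    return [sum(c[d] for c in context) / len(context) for d in range(dim)]


def latent_prediction_loss(patches, masked, encode, predict):
    """JEPA-style objective: the error is measured between predicted and
    target *embeddings* of the hidden patches, never between pixels."""
    visible = [p for i, p in enumerate(patches) if i not in masked]
    if not visible:
        raise ValueError("at least one patch must stay visible as context")
    context = [encode(p) for p in visible]  # encoder sees only the visible patches
    losses = []
    for i in masked:
        target = encode(patches[i])  # abstract description of the hidden patch
        guess = predict(context, i)  # fill the gap in representation space
        losses.append(l1_distance(guess, target))
    return sum(losses) / len(losses)
```

A generative model would instead decode its guess back into pixel values and compare it against `patches[i]` directly. The usual motivation for a latent objective is that it does not force the model to reproduce unpredictable pixel-level detail.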

According to Meta, V-JEPA was not trained for one specific skill. Instead, it was pre-trained with self-supervision on a series of videos and learned a number of things about how the world works.

Today we’re releasing V-JEPA, a method for teaching machines to understand and model the physical world by watching videos. This work is another important step towards @ylecun’s outlined vision of AI models that use a learned understanding of the world to plan, reason and… pic.twitter.com/5i6uNeFwJp

— AI at Meta (@AIatMeta) February 15, 2024

Frozen evaluations

Meta's research paper explains that one of the key reasons V-JEPA is so much more efficient than other vision learning models is how well it performs in “frozen evaluations.”

After the encoder and predictor have undergone self-supervised learning on large amounts of unlabeled data, they need no further training to learn a new skill. The pre-trained model stays frozen.

Previously, fine-tuning a model for a new skill meant updating the parameters, or weights, of the entire model. For V-JEPA to learn a new task, it needs only a small amount of labeled data. Only a small set of task-specific parameters is optimized on top of the frozen backbone.
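A frozen evaluation can be illustrated with a linear probe, which is the simplest kind of task-specific head. This is a sketch under stated assumptions. `frozen_backbone` is a hypothetical function with hand-written features that stands in for the pre-trained V-JEPA encoder. `train_probe` fits only a four-weight logistic-regression head with plain SGD. The probe type, features and data are illustrative choices, not the setup from Meta's paper.

```python
import math


def frozen_backbone(x):
    """Hypothetical stand-in for a frozen, pre-trained encoder. It is a plain
    function with no trainable state, so nothing below can update it."""
    x0, x1 = x
    return [x0, x1, x0 * x1, 1.0]  # last entry acts as a bias feature


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(examples, features, epochs=200, lr=0.5):
    """Fit a logistic-regression probe on frozen features with SGD.
    The only task-specific parameters are the probe `weights`."""
    dim = len(features(examples[0][0]))
    weights = [0.0] * dim
    for _ in range(epochs):
        for x, y in examples:
            f = features(x)  # backbone is only evaluated, never trained
            p = sigmoid(sum(w * v for w, v in zip(weights, f)))
            grad = p - y  # derivative of log-loss w.r.t. the logit
            weights = [w - lr * grad * v for w, v in zip(weights, f)]
    return weights


def predict(weights, features, x):
    # Classify as 1 when the probe's logit is positive.
    return 1 if sum(w * v for w, v in zip(weights, features(x))) > 0 else 0
```

Suppose you call `train_probe(labeled, frozen_backbone)` on a handful of labeled points such as `((1, 1), 1)` and `((1, -1), 0)`. It learns just four numbers. Gradients only ever touch the head, so the cost of adapting to a new task scales with the size of the head rather than the backbone. In a real system the backbone would have far more parameters than the head.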

V-JEPA's ability to learn new tasks efficiently holds promise for the development of embodied AI. It could be key to making machines contextually aware of their physical environment and able to handle planning and sequential decision-making tasks.
