Liquid AI has released LFM2-VL, a new generation of vision foundation models designed for efficient deployment across a wide range of hardware, from smartphones and laptops to wearables and embedded systems.
The models promise low-latency performance, strong accuracy, and flexibility for real-world applications.
LFM2-VL builds on the company's existing LFM2 architecture, extending it to multimodal processing with support for both text and image inputs at variable resolutions.
According to Liquid AI, the models deliver up to twice the GPU inference speed of comparable vision-language models while maintaining competitive performance on common benchmarks.
“Efficiency is our product,” wrote Liquid AI co-founder and CEO Ramin Hasani in a post on X announcing the new model family.
Two variants for different needs
The release includes two model sizes:
- LFM2-VL-450M - A highly efficient model with fewer than half a billion parameters, aimed at heavily resource-constrained environments.
- LFM2-VL-1.6B - A more capable model that is still light enough for single-GPU and on-device deployment.
Both variants process images at native resolutions up to 512 × 512 pixels, avoiding distortion and unnecessary upscaling.
For larger images, the system applies non-overlapping patching and adds a thumbnail view for global context, so that the model can capture both fine details and the broader scene.
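To make the splitting scheme concrete, here is a minimal illustrative sketch in Python; this is not Liquid AI's preprocessing code. The 512-pixel patch size comes from the announcement, while the thumbnail size and the function itself are assumptions:

```python
from PIL import Image

PATCH = 512          # native resolution handled without resizing (per the announcement)
THUMB = (256, 256)   # thumbnail size is an assumption, for illustration only

def split_image(path):
    """Split a large image into non-overlapping 512x512 patches plus a thumbnail."""
    img = Image.open(path).convert("RGB")
    w, h = img.size
    if w <= PATCH and h <= PATCH:
        return [img], None  # small images are processed at native resolution

    patches = []
    for top in range(0, h, PATCH):
        for left in range(0, w, PATCH):
            patches.append(img.crop((left, top,
                                     min(left + PATCH, w),
                                     min(top + PATCH, h))))
    thumbnail = img.resize(THUMB)  # low-resolution view for global context
    return patches, thumbnail
```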
Background on Liquid AI
Liquid AI was founded by former researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) to build AI architectures that go beyond the widely used transformer model.
The company's flagship innovation, the Liquid Foundation Models (LFMs), is based on principles from dynamical systems, signal processing, and numerical linear algebra, yielding general-purpose AI models that can handle text, video, audio, time series, and other sequential data.
In contrast to standard architectures, Liquid's approach aims to deliver competitive or superior performance with significantly fewer compute resources, enabling real-time adaptability during inference while keeping memory requirements low. This makes LFMs well suited to both large-scale applications and resource-constrained edge deployments.
In July 2025, the company expanded its platform strategy with the launch of the Liquid Edge AI Platform (LEAP), a cross-platform SDK that makes it easier for developers to run small language models directly on mobile and embedded devices.
LEAP offers OS-agnostic support for iOS and Android, integration with both Liquid's own models and other open-source SLMs, and a built-in library of models as small as 300 MB, compact enough for modern phones with minimal RAM.
Its companion app, Apollo, lets developers test models entirely offline, in keeping with Liquid AI's emphasis on privacy-preserving, low-latency AI. Together, LEAP and Apollo reflect the company's commitment to decentralizing AI execution, reducing dependence on cloud infrastructure, and enabling developers to build optimized, task-specific models for real-world environments.
Speed/quality trade-offs and technical design
LFM2-VL uses a modular architecture combining a language model backbone, a SigLIP2 NaFlex vision encoder, and a multimodal projector.
The projector is a two-layer MLP connector with pixel unshuffle, which reduces the number of image tokens and improves throughput.
Users can tune parameters such as the maximum number of image tokens or patches, trading off speed and quality depending on the deployment scenario. Training used roughly 100 billion multimodal tokens, drawn from open datasets and in-house synthetic data.
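To illustrate what a pixel-unshuffle connector does, the following PyTorch sketch shows a two-layer MLP projector that merges neighboring image tokens before projecting them into the language model's embedding space. The dimensions, unshuffle factor, and class name are illustrative assumptions, not Liquid AI's actual implementation:

```python
import torch
import torch.nn as nn

class PixelUnshuffleProjector(nn.Module):
    """Illustrative two-layer MLP connector with pixel unshuffle.

    Pixel unshuffle groups each r x r block of neighboring image tokens into a
    single token with r*r times the channels, cutting the token count by r^2
    before projecting into the language model's embedding space.
    """

    def __init__(self, vision_dim=1152, text_dim=2048, factor=2):
        super().__init__()
        self.factor = factor
        self.mlp = nn.Sequential(
            nn.Linear(vision_dim * factor * factor, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, vision_tokens, grid_h, grid_w):
        # vision_tokens: (batch, grid_h * grid_w, vision_dim)
        b, _, d = vision_tokens.shape
        f = self.factor
        x = vision_tokens.view(b, grid_h, grid_w, d)
        # Space-to-depth: merge each f x f neighborhood of tokens into one token.
        x = x.view(b, grid_h // f, f, grid_w // f, f, d)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(
            b, (grid_h // f) * (grid_w // f), f * f * d)
        return self.mlp(x)  # fewer, higher-dimensional tokens for the LLM

# Example: a 32 x 32 token grid (1024 tokens) becomes 256 projected tokens.
proj = PixelUnshuffleProjector()
tokens = torch.randn(1, 32 * 32, 1152)
print(proj(tokens, 32, 32).shape)  # torch.Size([1, 256, 2048])
```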
Performance and benchmarks
The models achieve competitive results across a range of benchmarks. LFM2-VL-1.6B scores well on RealWorldQA (65.23), InfoVQA (58.68), and OCRBench (742), and maintains solid results on multimodal reasoning tasks.
In inference tests, LFM2-VL achieved the fastest GPU processing times in its class when evaluated on a standard workload of a 1024 × 1024 image and a short text prompt.

License and availability
The LFM2-VL models are now available on Hugging Face, along with example fine-tuning code in Colab. They are compatible with Hugging Face Transformers and TRL.
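For readers who want to try the models, a minimal inference sketch with Hugging Face Transformers could look like the following; the repository name, model class, and chat-template format are assumptions here and should be checked against the official model card and Colab notebook:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "LiquidAI/LFM2-VL-1.6B"  # assumed Hugging Face repository name

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a chat-style prompt with one image and one question.
image = Image.open("photo.jpg")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in one sentence."},
    ]},
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```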
The models are released under a custom “LFM1.0 license”. Liquid AI has described the license as based on Apache 2.0 principles, but the full text has not yet been published.
The company has stated that commercial use is permitted under certain conditions, with terms differing for companies above and below 10 million US dollars in annual revenue.
With LFM2-VL, Liquid AI aims to make multimodal AI more accessible for both high-performance and resource-constrained deployments without sacrificing capability.

