In its latest push to redefine the AI landscape, Google announced Gemini 2.0 Flash Thinking, a multimodal reasoning model capable of tackling complex problems quickly and transparently.
In a post on the social network X, Google CEO Sundar Pichai wrote: “Our most sophisticated model yet :)”
And in its developer documentation, Google explains: “Thinking Mode is capable of stronger reasoning capabilities in its responses than the base Gemini 2.0 Flash model,” which was previously Google's latest and best model and was released just eight days earlier.
The new model supports only 32,000 tokens of input (roughly 50-60 pages of text) and can produce up to 8,000 tokens per output response. In a side panel on Google AI Studio, the company claims it is best for “multimodal understanding, reasoning” and “coding.”
Full details about the model's training process, architecture, licensing, and pricing have yet to be released. Currently, Google AI Studio does not display a per-token price.
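To see how a 32,000-token limit translates into a page count, here is a rough back-of-the-envelope sketch. The words-per-token and words-per-page ratios are common rules of thumb assumed for illustration, not figures from Google, and the function names are hypothetical; actual token counts depend on the model's tokenizer.

```python
# Rough heuristics for illustration only; these are NOT Google's tokenizer figures.
WORDS_PER_TOKEN = 0.75  # assumption: common rule of thumb for English text
WORDS_PER_PAGE = 450    # assumption: a typical single-spaced page

INPUT_LIMIT_TOKENS = 32_000  # input limit reported in the article
OUTPUT_LIMIT_TOKENS = 8_000  # output limit reported in the article


def tokens_to_pages(tokens: int) -> float:
    """Convert a token count to an approximate page count."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE


def fits_input_limit(word_count: int) -> bool:
    """Estimate whether a text of `word_count` words fits the input limit."""
    estimated_tokens = word_count / WORDS_PER_TOKEN
    return estimated_tokens <= INPUT_LIMIT_TOKENS


if __name__ == "__main__":
    print(f"Input limit:  ~{tokens_to_pages(INPUT_LIMIT_TOKENS):.0f} pages")
    print(f"Output limit: ~{tokens_to_pages(OUTPUT_LIMIT_TOKENS):.0f} pages")
    print("20,000-word document fits:", fits_input_limit(20_000))
```

Under these assumed ratios the input limit lands inside the 50-60 page range quoted above; a denser or sparser page layout would shift the estimate.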
More accessible and transparent reasoning
Unlike OpenAI's competing reasoning models o1 and o1 mini, Gemini 2.0 Flash Thinking lets users view its step-by-step reasoning through a drop-down menu, providing clearer and more transparent insight into how the model reaches its conclusions.
By allowing users to see how decisions are made, Gemini 2.0 Flash Thinking addresses long-standing concerns about AI functioning as a “black box” and puts this model, whose licensing terms are still unclear, on a more even footing with competing open-source models.
My early, simple tests of the model showed that it accurately and quickly (within one to a few seconds) answered some questions that have been notoriously difficult for other AI models, such as counting the number of Rs in the word “strawberry.” (See screenshot above).
In another test, comparing two decimal numbers (9.9 and 9.11), the model systematically broke the problem down into smaller steps, from analyzing the whole numbers to comparing the decimal places.
These results are backed by the independent third-party evaluator LM Arena, which named Gemini 2.0 Flash Thinking the top-performing model across all LLM categories.
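For reference, both test questions have simple deterministic answers. The sketch below is not how the model reasons internally; it only mirrors the kind of stepwise breakdown described (whole numbers first, then decimal places) so the expected answers can be checked. The function names are hypothetical, and the comparison assumes well-formed, non-negative decimal strings.

```python
from decimal import Decimal


def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a letter in a word."""
    return word.lower().count(letter.lower())


def larger_decimal(a: str, b: str) -> str:
    """Return the larger of two non-negative decimal strings, step by step."""
    whole_a, _, frac_a = a.partition(".")
    whole_b, _, frac_b = b.partition(".")

    # Step 1: compare the whole-number parts.
    if int(whole_a) != int(whole_b):
        return a if int(whole_a) > int(whole_b) else b

    # Step 2: pad the fractional parts so digits align by place value,
    # e.g. "9" (from 9.9) becomes "90" to line up with "11" (from 9.11).
    width = max(len(frac_a), len(frac_b))
    padded_a = frac_a.ljust(width, "0")
    padded_b = frac_b.ljust(width, "0")

    # Step 3: equal-length digit strings compare correctly as strings.
    return a if padded_a >= padded_b else b


if __name__ == "__main__":
    print(count_letter("strawberry", "r"))  # 3
    print(larger_decimal("9.9", "9.11"))    # 9.9
    # Cross-check against Python's exact decimal arithmetic.
    assert Decimal(larger_decimal("9.9", "9.11")) == max(Decimal("9.9"), Decimal("9.11"))
```

The padding step is where the common mistake arises: treating the fractional parts as the integers 9 and 11 makes 9.11 look larger, while aligning them as 90 and 11 hundredths gives the correct result.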
Native support for image uploads and analysis
In a further improvement over the competing OpenAI o1 family, Gemini 2.0 Flash Thinking is designed to process images from the start.
o1 began as a text-only model, but has since been expanded to include analysis of image and file uploads. Both models can currently return only text.
Gemini 2.0 Flash Thinking also does not currently support connecting to Google Search or integration with other Google apps and external third-party tools, according to the developer documentation.
Gemini 2.0 Flash Thinking's multimodal capability expands its potential use cases and enables it to handle scenarios that mix different data types.
For example, in one test, the model solved a puzzle that required analysis of both text and visual elements, demonstrating its versatility in integrating and reasoning across formats.
Developers can use these features through Google AI Studio and Vertex AI, where the model is available for experimentation.
As the AI landscape becomes increasingly competitive, Gemini 2.0 Flash Thinking could mark the start of a new era for problem-solving models. Its ability to process diverse data types, show its reasoning, and deliver scalable performance makes it a serious contender in the reasoning AI market, competing with OpenAI's o1 family and beyond.