Google is attempting to make waves with Gemini, its flagship suite of generative AI models, apps, and services. But what is Gemini? How can you use it? And how does it stack up against other generative AI tools such as OpenAI’s ChatGPT, Meta’s Llama, and Microsoft’s Copilot?
To make it easier to keep up with the latest Gemini developments, we’ve put together this handy guide, which we’ll keep updated as new Gemini models, features, and news about Google’s plans for Gemini are released.
What is Gemini?
Gemini is Google’s long-promised, next-gen generative AI model family. Developed by Google’s AI research labs DeepMind and Google Research, it comes in four flavors:
- Gemini Ultra, a very large model.
- Gemini Pro, a large model – though smaller than Ultra. The latest version, Gemini 2.0 Pro Experimental, is Google’s flagship.
- Gemini Flash, a speedier, “distilled” version of Pro. It also comes in a slightly smaller and faster version, called Gemini Flash-Lite, and a version with reasoning capabilities, called Gemini Flash Thinking Experimental.
- Gemini Nano, two small models: Nano-1 and the slightly more capable Nano-2, which is meant to run offline.
All Gemini models were trained to be natively multimodal — that is, able to work with and analyze more than just text. Google says they were pre-trained and fine-tuned on a variety of public, proprietary, and licensed audio, images, and videos; a set of codebases; and text in many languages.
This sets Gemini apart from models such as Google’s own LaMDA, which was trained exclusively on text data. LaMDA can’t understand or generate anything beyond text (e.g., essays, emails, and so on), but that isn’t necessarily the case with Gemini models.
We’ll note here that the ethics and legality of training models on public data, in some cases without the data owners’ knowledge or consent, are murky. Google has an AI indemnification policy to shield certain Google Cloud customers from lawsuits should they face them, but this policy contains carve-outs. Proceed with caution — particularly if you intend to use Gemini commercially.
What’s the difference between the Gemini apps and Gemini models?
Gemini is separate and distinct from the Gemini apps on the web and mobile (formerly Bard).
The Gemini apps are clients that connect to various Gemini models and layer a chatbot-like interface on top. Think of them as front ends for Google’s generative AI, analogous to ChatGPT and Anthropic’s Claude family of apps.
Gemini on the web lives here. On Android, the Gemini app replaces the existing Google Assistant app. And on iOS, the Google and Google Search apps serve as that platform’s Gemini clients.
On Android, it also recently became possible to bring up the Gemini overlay on top of any app to ask questions about what’s on the screen (e.g., a YouTube video). Just press and hold a supported smartphone’s power button or say, “Hey Google”; you’ll see the overlay pop up.
Gemini apps can accept images as well as voice commands and text — including files like PDFs and, soon, videos, either uploaded or imported from Google Drive — and generate images. As you’d expect, conversations with Gemini apps on mobile carry over to Gemini on the web and vice versa if you’re signed in to the same Google Account in both places.
Gemini Advanced
The Gemini apps aren’t the only way to recruit Gemini models’ assistance with tasks. Slowly but surely, Gemini-imbued features are making their way into staple Google apps and services like Gmail and Google Docs.
To take advantage of most of these, you’ll need the Google One AI Premium Plan. Technically part of Google One, the AI Premium Plan costs $20 per month and provides access to Gemini in Google Workspace apps like Docs, Maps, Slides, Sheets, Drive, and Meet. It also enables what Google calls Gemini Advanced, which brings the company’s more sophisticated Gemini models to the Gemini apps.
Gemini Advanced users get extras here and there, too, like priority access to new features, the ability to run and edit Python code directly in Gemini, and a larger “context window.” Gemini Advanced can remember the content of — and reason across — roughly 750,000 words in a conversation (or 1,500 pages of documents). That’s compared with the 24,000 words (or 48 pages) the vanilla Gemini app can handle.

Gemini Advanced also gives users access to Google’s Deep Research feature, which uses “advanced reasoning” and “long context capabilities” to generate research briefs. After you prompt the chatbot, it creates a multi-step research plan, asks you to approve it, and then takes a few minutes to search the web and generate an extensive report based on your query. It’s meant to answer more complex questions such as, “Can you help me redesign my kitchen?”
Google also offers Gemini Advanced users a memory feature, which allows the chatbot to use your old conversations with Gemini as context for your current conversation. Gemini Advanced users also get increased usage for NotebookLM, the company’s product that turns PDFs into AI-generated podcasts.
Gemini Advanced users also get access to Google’s experimental version of Gemini 2.0 Pro, the company’s flagship model that’s optimized for difficult coding and math problems.
Another Gemini Advanced exclusive is trip planning in Google Search, which creates custom travel itineraries from prompts. Taking into account things like flight times (from emails in a user’s Gmail inbox), meal preferences, and information about local attractions (from Google Search and Maps data), as well as the distances between those attractions, Gemini will generate an itinerary that updates automatically to reflect any changes.
Gemini across Google services is also available to corporate customers through two plans, Gemini Business (an add-on for Google Workspace) and Gemini Enterprise. Gemini Business costs as little as $6 per user per month, while Gemini Enterprise — which adds meeting note-taking and translated captions as well as document classification and labeling — is generally more expensive, but is priced based on a business’s needs. (Both plans require an annual commitment.)
In Gmail, Gemini lives in a side panel that can write emails and summarize message threads. You’ll find the same panel in Docs, where it helps you write and refine your content and brainstorm new ideas. Gemini in Slides generates slides and custom images. And Gemini in Google Sheets tracks and organizes data, creating tables and formulas.
Google’s AI chatbot recently came to Maps, where Gemini can summarize reviews about coffee shops or offer recommendations about how to spend a day visiting a foreign city.
Gemini’s reach extends to Drive as well, where it can summarize files and folders and give quick facts about a project. In Meet, meanwhile, Gemini translates captions into additional languages.

Gemini recently came to Google’s Chrome browser in the form of an AI writing tool. You can use it to write something completely new or rewrite existing text; Google says it’ll consider the web page you’re on to make recommendations.
Elsewhere, you’ll find hints of Gemini in Google’s database products, cloud security tools, and app development platforms (including Firebase and Project IDX), as well as in apps like Google Photos (where Gemini handles natural language search queries), YouTube (where it helps brainstorm video ideas), and the NotebookLM note-taking assistant.
Code Assist (formerly Duet AI for Developers), Google’s suite of AI-powered assistance tools for code completion and generation, is offloading heavy computational lifting to Gemini. So are Google’s security products underpinned by Gemini, like Gemini in Threat Intelligence, which can analyze large portions of potentially malicious code and let users perform natural language searches for ongoing threats or indicators of compromise.
Gemini extensions and Gems
Announced at Google I/O 2024, Gems are custom chatbots powered by Gemini models that Gemini Advanced users can create. Gems can be generated from natural language descriptions — for example, “You’re my running coach. Give me a daily running plan” — and shared with others or kept private.
Gems are available on desktop and mobile in 150 countries and most languages. Eventually, they’ll be able to tap an expanded set of integrations with Google services, including Google Calendar, Tasks, Keep, and YouTube Music, to complete custom tasks.

Speaking of integrations, the Gemini apps on the web and mobile can tap into Google services via what Google calls “Gemini extensions.” Gemini today integrates with Google Drive, Gmail, and YouTube to respond to queries such as “Could you summarize my last three emails?” Later this year, Gemini will be able to take additional actions with Google Calendar, Keep, Tasks, YouTube Music and Utilities, the Android-exclusive apps that control on-device features like timers and alarms, media controls, the flashlight, volume, Wi-Fi, Bluetooth, and so on.
Gemini Live in-depth voice chats
An experience called Gemini Live allows users to have “in-depth” voice chats with Gemini. It’s available in the Gemini apps on mobile and the Pixel Buds Pro 2, where it can be accessed even when your phone’s locked.
With Gemini Live enabled, you can interrupt Gemini while the chatbot’s speaking (in one of several new voices) to ask a clarifying question, and it’ll adapt to your speech patterns in real time. At some point, Gemini is supposed to gain visual understanding, allowing it to see and respond to your surroundings, either via photos or video captured by your smartphone’s cameras.

Live is also designed to serve as a virtual coach of sorts, helping you rehearse for events, brainstorm ideas, and so on. For instance, Live can suggest which skills to highlight in an upcoming job or internship interview, and it can give public speaking advice.
You can read our review of Gemini Live here. Spoiler alert: We think the feature has a ways to go before it’s super useful — but it’s early days, admittedly.
Image generation via Imagen 3
Gemini users can generate artwork and images using Google’s built-in Imagen 3 model.
Google says that Imagen 3 can more accurately understand the text prompts that it translates into images versus its predecessor, Imagen 2, and is more “creative and detailed” in its generations. In addition, the model produces fewer artifacts and visual errors (at least according to Google), and is the best Imagen model yet for rendering text.

Back in February 2024, Google was forced to pause Gemini’s ability to generate images of people after users complained of historical inaccuracies. But in August, the company reintroduced people generation for certain users, specifically English-language users signed up for one of Google’s paid Gemini plans (e.g., Gemini Advanced), as part of a pilot program.
Gemini for teens
In June, Google introduced a teen-focused Gemini experience, allowing students to sign up via their Google Workspace for Education school accounts.
The teen-focused Gemini has “additional policies and safeguards,” including a tailored onboarding process and an “AI literacy guide” to (as Google phrases it) “help teens use AI responsibly.” Otherwise, it’s nearly identical to the standard Gemini experience, down to the “double check” feature that looks across the web to see whether Gemini’s responses are accurate.
Gemini in smart home devices
A growing number of Google-made devices tap Gemini for enhanced functionality, from the Google TV Streamer to the Pixel 9 and 9 Pro to the latest Nest Learning Thermostat.
On the Google TV Streamer, Gemini uses your preferences to curate content suggestions across your subscriptions and summarize reviews and even whole seasons of TV.

On the latest Nest thermostat (as well as Nest speakers, cameras, and smart displays), Gemini will soon bolster Google Assistant’s conversational and analytic capabilities.
Subscribers to Google’s Nest Aware plan will get a preview later this year of new Gemini-powered experiences like AI descriptions for Nest camera footage, natural language video search, and recommended automations. Nest cameras will understand what’s happening in real-time video feeds (e.g., when a dog’s digging in the garden), while the companion Google Home app will surface videos and create device automations given a description (e.g., “Did the kids leave their bikes in the driveway?,” “Have my Nest thermostat turn on the heating when I get home from work every Tuesday”).

Also later this year, Google Assistant will get a few upgrades on Nest-branded and other smart home devices to make conversations feel more natural. Improved voices are on the way, along with the ability to ask follow-up questions and “(more) easily go back and forth.”
What can the Gemini models do?
Because Gemini models are multimodal, they can perform a range of multimodal tasks, from transcribing speech to captioning images and videos in real time. Many of these capabilities have reached the product stage (as alluded to in the previous section), and Google is promising much more in the not-too-distant future.
Of course, it’s a bit hard to take the company at its word. Google seriously underdelivered with the original Bard launch. More recently, it ruffled feathers with a video purporting to show Gemini’s capabilities that turned out to be more or less aspirational — not live.
Also, Google offers no fix for some of the underlying problems with generative AI tech today, like its encoded biases and tendency to make things up (i.e., hallucinate). Neither do its rivals, but it’s something to keep in mind when considering using or paying for Gemini.
Assuming for the purposes of this article that Google is being truthful with its recent claims, here’s what the different tiers of Gemini can do now and what they’ll be able to do once they reach their full potential:
What you can do with Gemini Ultra
Google says that Gemini Ultra — thanks to its multimodality — can be used to help with things like physics homework, solving problems step by step on a worksheet, and pointing out possible mistakes in already filled-in answers.
However, we haven’t seen much of Gemini Ultra in recent months. The model doesn’t appear in the Gemini app and isn’t listed on the Gemini API’s pricing page. That said, Google may yet bring Gemini Ultra back to the forefront of its offerings in the future.
Ultra can also be applied to tasks such as identifying scientific papers relevant to a problem, Google says. The model can extract information from several papers, for instance, and update a chart from one of them by generating the formulas needed to re-create the chart with more timely data.
Gemini Ultra technically supports image generation. But that capability hasn’t made its way into the productized version of the model yet — perhaps because the mechanism is more complex than how apps such as ChatGPT generate images. Rather than feed prompts to an image generator (like DALL-E 3, in ChatGPT’s case), Gemini outputs images “natively,” without an intermediary step.
Ultra is available as an API through Vertex AI, Google’s fully managed AI dev platform, and AI Studio, Google’s web-based tool for app and platform developers.
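For a sense of what calling a Gemini model through these developer platforms looks like in practice, here’s a minimal sketch using Google’s generative AI Python SDK (the google-generativeai package). The API key placeholder, model name, and prompt are illustrative rather than an official example.

```python
# Minimal sketch of calling a Gemini model via Google's generative AI Python SDK.
# The API key, model name, and prompt are placeholders for illustration only.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key created in Google AI Studio

# Swap in whichever Gemini model your account has access to.
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    "Identify the key formulas in this methods section and explain each in one sentence."
)
print(response.text)
```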
Gemini Pro’s capabilities
Google says that its latest Pro model, Gemini 2.0 Pro, is its best model yet for coding performance and complex prompts. It’s currently available as an experimental version, meaning it can have unexpected issues.
Gemini 2.0 Pro outperforms its predecessor, Gemini 1.5 Pro, on benchmarks measuring coding, reasoning, math, and factual accuracy. The model can take in roughly 1.4 million words, two hours of video, or 22 hours of audio, and can reason across or answer questions about that data.
However, Gemini 1.5 Pro still powers Google’s Deep Research feature.
Gemini 2.0 Pro works alongside a feature called code execution, released in June alongside Gemini 1.5 Pro, which aims to reduce bugs in the code the model generates by iteratively refining that code over several steps. (Code execution is also supported by Gemini Flash.)
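As a rough illustration of how that feature is switched on, the Gemini API’s Python SDK lets developers pass a code-execution tool when constructing a model. This is a hedged sketch rather than Google’s canonical example; the model name and prompt are placeholders.

```python
# Hedged sketch: enabling the Gemini API's code-execution tool so the model can
# write, run, and iteratively refine Python while answering. Names are illustrative.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel(
    "gemini-1.5-pro",
    tools="code_execution",  # lets the model execute the code it generates
)
result = model.generate_content(
    "Write and run code to compute the sum of the first 50 prime numbers."
)
print(result.text)  # response includes the generated code, its output, and an explanation
```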
Within Vertex AI, developers can customize Gemini Pro for specific contexts and use cases via a fine-tuning or “grounding” process. For example, Pro (along with other Gemini models) can be instructed to use data from third-party providers like Moody’s, Thomson Reuters, ZoomInfo, and MSCI, or to source information from corporate datasets or Google Search instead of its wider knowledge bank. Gemini Pro can also be connected to external, third-party APIs to perform particular actions, like automating a back-office workflow.
AI Studio offers templates for creating structured chat prompts with Pro. Developers can control the model’s creative range and provide examples to give tone and style instructions — and also tune Pro’s safety settings.
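Those knobs map fairly directly onto the Python SDK: temperature governs the model’s creative range, a system instruction carries tone and style guidance, and safety settings can be tightened or relaxed per category. The sketch below is illustrative; the specific values, and the choice of the google-generativeai package rather than AI Studio’s UI, are assumptions.

```python
# Hedged sketch of tuning creative range, tone/style, and safety settings
# through the Gemini Python SDK; all values are illustrative.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel(
    "gemini-1.5-pro",
    system_instruction="You write in a concise, formal tone for enterprise customers.",
    generation_config=genai.GenerationConfig(
        temperature=0.2,        # low temperature narrows the "creative range"
        max_output_tokens=512,
    ),
    safety_settings={"HARASSMENT": "BLOCK_ONLY_HIGH"},  # per-category threshold
)
print(model.generate_content("Draft a short product update announcement.").text)
```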
Vertex AI Agent Builder lets people build Gemini-powered “agents” within Vertex AI. For example, a company could create an agent that analyzes previous marketing campaigns to understand a brand style and then applies that knowledge to help generate new ideas consistent with the style.
Gemini Flash is lighter but packs a punch
Google calls Gemini 2.0 Flash its AI model for the agentic era. The model can natively generate images and audio, in addition to text, and can use tools like Google Search and interact with external APIs.
The 2.0 Flash model is faster than Gemini’s previous generation of models and even outperforms some of the larger Gemini 1.5 models on benchmarks measuring coding and image analysis. You can try Gemini 2.0 Flash in the Gemini web or mobile app, and through Google’s AI developer platforms.
In December, Google released a “thinking” version of Gemini 2.0 Flash that’s capable of “reasoning,” in which the model takes a few seconds to work backwards through a problem before it gives an answer.
In February, Google made Gemini 2.0 Flash Thinking available in the Gemini app. The same month, Google also released a smaller version called Gemini 2.0 Flash-Lite. The company says this model outperforms its Gemini 1.5 Flash model while running at the same price and speed.
An offshoot of Gemini Pro that’s small and efficient, built for narrow, high-frequency generative AI workloads, Flash is multimodal like Gemini Pro, meaning it can analyze audio, video, images, and text (though it can only generate text). Google says that Flash is particularly well suited to tasks like summarization and chat apps, plus image and video captioning and data extraction from long documents and tables.
Devs using Flash and Pro can optionally leverage context caching, which lets them store large amounts of information (e.g., a knowledge base or database of research papers) in a cache that Gemini models can access quickly and relatively cheaply. Context caching carries an additional fee on top of other Gemini model usage fees, however.
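Here’s a hedged sketch of what context caching looks like with the Gemini API’s Python SDK: a large document is uploaded and cached once, then queried repeatedly without resending its full contents each time. The file name, model version, and TTL are illustrative assumptions, not values from Google’s documentation.

```python
# Hedged sketch of context caching: upload a large document once, cache it,
# then run cheaper follow-up queries against the cached contents.
import datetime

import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")

# Hypothetical corpus of research papers bundled into one PDF.
papers = genai.upload_file("research_papers.pdf")

cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",  # caching requires a pinned model version
    contents=[papers],
    ttl=datetime.timedelta(hours=1),      # how long the cache stays available (billed separately)
)

model = genai.GenerativeModel.from_cached_content(cached_content=cache)
print(model.generate_content("What methods do the papers in this collection share?").text)
```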
Gemini Nano can run on your phone
Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and it’s efficient enough to run directly on (some) devices instead of sending the task to a server somewhere. So far, Nano powers a couple of features on the Pixel 8 Pro, Pixel 8, Pixel 9 Pro, Pixel 9, and Samsung Galaxy S24, including Summarize in Recorder and Smart Reply in Gboard.
The Recorder app, which lets users push a button to record and transcribe audio, includes a Gemini-powered summary of recorded conversations, interviews, presentations, and other audio snippets. Users get summaries even if they don’t have a signal or Wi-Fi connection — and in a nod to privacy, no data leaves their phone in the process.

Nano is also in Gboard, Google’s keyboard replacement. There, it powers a feature called Smart Reply, which helps suggest the next thing you’ll want to say when having a conversation in a messaging app such as WhatsApp.
In the Google Messages app on supported devices, Nano drives Magic Compose, which can craft messages in styles like “excited,” “formal,” and “lyrical.”
Google says that a future version of Android will tap Nano to alert users to potential scams during calls. The new weather app on Pixel phones uses Gemini Nano to generate tailored weather reports. And TalkBack, Google’s accessibility service, employs Nano to create aural descriptions of objects for low-vision and blind users.
How much do the Gemini models cost?
Gemini 1.5 Pro, 1.5 Flash, 2.0 Flash, and 2.0 Flash-Lite are available through Google’s Gemini API for building apps and services — all with free options. But the free options impose usage limits and omit certain features, like context caching and batching.
Gemini models are otherwise pay-as-you-go. Here’s the base pricing — not including add-ons like context caching — as of September 2024:
- Gemini 1.5 Pro: $1.25 per 1 million input tokens (for prompts up to 128K tokens) or $2.50 per 1 million input tokens (for prompts longer than 128K tokens); $5 per 1 million output tokens (for prompts up to 128K tokens) or $10 per 1 million output tokens (for prompts longer than 128K tokens)
- Gemini 1.5 Flash: 7.5 cents per 1 million input tokens (for prompts up to 128K tokens), 15 cents per 1 million input tokens (for prompts longer than 128K tokens), 30 cents per 1 million output tokens (for prompts up to 128K tokens), 60 cents per 1 million output tokens (for prompts longer than 128K tokens)
- Gemini 2.0 Flash: 10 cents per 1 million input tokens and 40 cents per 1 million output tokens. For audio specifically, input costs 70 cents per 1 million tokens, while output stays at 40 cents per 1 million tokens.
- Gemini 2.0 Flash-Lite: 7.5 cents per 1 million input tokens, 30 cents per 1 million output tokens.
Tokens are subdivided bits of raw data, like the syllables “fan,” “tas,” and “tic” in the word “fantastic”; 1 million tokens is equivalent to about 700,000 words. Input refers to tokens fed into the model, while output refers to tokens that the model generates.
2.0 Pro pricing has yet to be announced, and Nano is still in early access.
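Putting those list prices together, a small back-of-the-envelope calculator makes the per-request economics concrete. This sketch covers only the base (short-prompt) rates quoted above and ignores add-ons like context caching; the model keys and example numbers are illustrative.

```python
# Back-of-the-envelope cost estimate from the base list prices above
# (dollars per 1 million tokens, short-prompt tier, as of September 2024).
PRICE_PER_MILLION = {                      # (input $, output $)
    "gemini-1.5-pro":        (1.25, 5.00),
    "gemini-1.5-flash":      (0.075, 0.30),
    "gemini-2.0-flash":      (0.10, 0.40),
    "gemini-2.0-flash-lite": (0.075, 0.30),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough dollar cost of one request at the base rates (no caching, no long-prompt tier)."""
    in_price, out_price = PRICE_PER_MILLION[model]
    return (input_tokens / 1_000_000) * in_price + (output_tokens / 1_000_000) * out_price

# Example: a 100,000-token prompt with a 2,000-token answer on Gemini 1.5 Pro.
print(f"${estimate_cost('gemini-1.5-pro', 100_000, 2_000):.4f}")  # about $0.135
```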
What’s the latest on Project Astra?
Project Astra is Google DeepMind’s effort to create AI-powered apps and “agents” for real-time, multimodal understanding. In demos, Google has shown how the AI model can simultaneously process live video and audio. Google released an app version of Project Astra to a small number of trusted testers in December but has no plans for a broader release right now.
The company would like to put Project Astra in a pair of smart glasses. Google also gave a prototype of glasses with Project Astra and augmented reality capabilities to a few trusted testers in December. However, there’s no clear product at this point, and it’s unclear when Google would actually release something like it.
Project Astra remains just that, a project, and not a product. Still, the demos of Astra reveal what Google would like its AI products to do in the future.
Is Gemini coming to the iPhone?
It might.
Apple has said that it’s in talks to put Gemini and other third-party models to use for a number of features in its Apple Intelligence suite. Following a keynote presentation at WWDC 2024, Apple SVP Craig Federighi confirmed plans to work with models including Gemini, but he didn’t disclose any additional details.

