Google’s trying to make waves with Gemini, its flagship suite of generative AI models, apps, and services. But what’s Gemini? How can you use it? And how does it stack up to other generative AI tools such as OpenAI’s ChatGPT, Meta’s Llama, and Microsoft’s Copilot?
To make it easier to keep up with the latest Gemini developments, we’ve put together this handy guide, which we’ll keep updated as new Gemini models, features, and news about Google’s plans for Gemini are released.
What is Gemini?
Gemini is Google’s long-promised, next-gen generative AI model family. Developed by Google’s AI research labs DeepMind and Google Research, it comes in four flavors:
- Gemini Ultra
- Gemini Pro
- Gemini Flash, a speedier, “distilled” version of Pro. It also comes in a slightly smaller and faster version, called Gemini Flash-8B.
- Gemini Nano, two small models: Nano-1 and the slightly more capable Nano-2, which is meant to run offline
All Gemini models were trained to be natively multimodal, that is, able to work with and analyze more than just text. Google says they were pre-trained and fine-tuned on a variety of public, proprietary, and licensed audio, images, and videos; a set of codebases; and text in different languages.
This sets Gemini apart from models such as Google’s own LaMDA, which was trained exclusively on text data. LaMDA can’t understand or generate anything beyond text (e.g., essays, emails, and so on), but that isn’t necessarily the case with Gemini models.
We’ll note here that the ethics and legality of training models on public data, in some cases without the data owners’ knowledge or consent, are murky. Google has an AI indemnification policy to shield certain Google Cloud customers from lawsuits should they face them, but this policy contains carve-outs. Proceed with caution, particularly if you’re intending on using Gemini commercially.
What’s the difference between the Gemini apps and Gemini models?
Gemini is separate and distinct from the Gemini apps on the web and mobile (formerly Bard).
The Gemini apps are clients that connect with various Gemini models and layer a chatbot-like interface on top. Think of them as front ends for Google’s generative AI, analogous to ChatGPT and Anthropic’s Claude family of apps.
Gemini on the web lives here. On Android, the Gemini app replaces the existing Google Assistant app. And on iOS, the Google and Google Search apps serve as that platform’s Gemini clients.
On Android, it also recently became possible to bring up the Gemini overlay on top of any app to ask questions about what’s on the screen (e.g., a YouTube video). Just press and hold a supported smartphone’s power button or say, “Hey Google”; you’ll see the overlay pop up.
Gemini apps can accept images as well as voice commands and text (including files like PDFs and, soon, videos, either uploaded or imported from Google Drive) and generate images. As you’d expect, conversations with Gemini apps on mobile carry over to Gemini on the web and vice versa if you’re signed in to the same Google Account in both places.
Gemini Advanced
The Gemini apps aren’t the only means of recruiting Gemini models’ assistance with tasks. Slowly but surely, Gemini-imbued features are making their way into staple Google apps and services like Gmail and Google Docs.
To take advantage of most of these, you’ll need the Google One AI Premium Plan. Technically a part of Google One, the AI Premium Plan costs $20 and provides access to Gemini in Google Workspace apps like Docs, Maps, Slides, Sheets, Drive, and Meet. It also enables what Google calls Gemini Advanced, which brings the company’s more sophisticated Gemini models to the Gemini apps.
Gemini Advanced users get extras here and there, too, like priority access to new features, the ability to run and edit Python code directly in Gemini, and a larger “context window.” Gemini Advanced can remember the content of (and reason across) roughly 750,000 words in a conversation (or 1,500 pages of documents). That’s compared to the 24,000 words (or 48 pages) the vanilla Gemini app can handle.
Gemini Advanced also gives users access to Google’s new Deep Research feature, which uses “advanced reasoning” and “long context capabilities” to generate research briefs. After you prompt the chatbot, it creates a multi-step research plan and asks you to approve it; Gemini then takes a few minutes to search the web and generate an extensive report based on your query. It’s meant to answer more complex questions such as, “Can you help me redesign my kitchen?”
Google also offers Gemini Advanced users a memory feature that allows the chatbot to use your old conversations with Gemini as context for your current conversation.
Another Gemini Advanced exclusive is trip planning in Google Search, which creates custom travel itineraries from prompts. Taking into account things like flight times (from emails in a user’s Gmail inbox), meal preferences, and information about local attractions (from Google Search and Maps data), as well as the distances between those attractions, Gemini will generate an itinerary that updates automatically to reflect any changes.
Gemini across Google services is also available to corporate customers through two plans, Gemini Business (an add-on for Google Workspace) and Gemini Enterprise. Gemini Business costs as little as $6 per user per month, while Gemini Enterprise (which adds meeting note-taking and translated captions as well as document classification and labeling) is generally more expensive, but is priced based on a business’s needs. (Both plans require an annual commitment.)
In Gmail, Gemini lives in a side panel that can write emails and summarize message threads. You’ll find the same panel in Docs, where it helps you write and refine your content and brainstorm new ideas. Gemini in Slides generates slides and custom images. And Gemini in Google Sheets tracks and organizes data, creating tables and formulas.
Google’s AI chatbot recently came to Maps, where Gemini can summarize reviews about coffee shops or offer recommendations about how to spend a day visiting a foreign city.
Gemini’s reach extends to Drive as well, where it can summarize files and folders and give quick facts about a project. In Meet, meanwhile, Gemini translates captions into additional languages.
Gemini recently came to Google’s Chrome browser in the form of an AI writing tool. You can use it to write something completely new or rewrite existing text; Google says it’ll consider the web page you’re on to make recommendations.
Elsewhere, you’ll find hints of Gemini in Google’s database products, cloud security tools, and app development platforms (including Firebase and Project IDX), as well as in apps like Google Photos (where Gemini handles natural language search queries), YouTube (where it helps brainstorm video ideas), and the NotebookLM note-taking assistant.
Code Assist (formerly Duet AI for Developers), Google’s suite of AI-powered assistance tools for code completion and generation, is offloading heavy computational lifting to Gemini. So are Google’s security products underpinned by Gemini, like Gemini in Threat Intelligence, which can analyze large portions of potentially malicious code and let users perform natural language searches for ongoing threats or indicators of compromise.
Gemini extensions and Gems
Gems, announced at Google I/O 2024, are custom chatbots powered by Gemini models that Gemini Advanced users can create. Gems can be generated from natural language descriptions (for example, “You’re my running coach. Give me a daily running plan”) and shared with others or kept private.
Gems are available on desktop and mobile in 150 countries and most languages. Eventually, they’ll be able to tap an expanded set of integrations with Google services, including Google Calendar, Tasks, Keep, and YouTube Music, to complete custom tasks.
Speaking of integrations, the Gemini apps on the web and mobile can tap into Google services via what Google calls “Gemini extensions.” Gemini today integrates with Google Drive, Gmail, and YouTube to answer queries such as “Could you summarize my last three emails?” Later this year, Gemini will be able to take additional actions with Google Calendar, Keep, Tasks, YouTube Music and Utilities, the Android-exclusive apps that control on-device features like timers and alarms, media controls, the flashlight, volume, Wi-Fi, Bluetooth, and so on.
Gemini Live in-depth voice chats
An experience called Gemini Live allows users to have “in-depth” voice chats with Gemini. It’s available in the Gemini apps on mobile and the Pixel Buds Pro 2, where it can be accessed even when your phone’s locked.
With Gemini Live enabled, you can interrupt Gemini while the chatbot’s speaking (in one of several new voices) to ask a clarifying question, and it’ll adapt to your speech patterns in real time. At some point, Gemini is supposed to gain visual understanding, allowing it to see and respond to your surroundings, either via photos or video captured by your smartphone’s camera.
Live is also designed to serve as a virtual coach of sorts, helping you rehearse for events, brainstorm ideas, and so on. For instance, Live can suggest which skills to highlight in an upcoming job or internship interview, and it can give public speaking advice.
You can read our review of Gemini Live here. Spoiler alert: We think the feature has a ways to go before it’s super useful, but it’s early days, admittedly.
Image generation via Imagen 3
Gemini users can generate artwork and images using Google’s built-in Imagen 3 model.
Google says that Imagen 3 can more accurately understand the text prompts that it translates into images versus its predecessor, Imagen 2, and is more “creative and detailed” in its generations. In addition, the model produces fewer artifacts and visual errors (at least according to Google), and is the best Imagen model yet for rendering text.
Back in February, Google was forced to pause Gemini’s ability to generate images of people after users complained of historical inaccuracies. But in August, the company reintroduced people generation for certain users, specifically English-language users signed up for one of Google’s paid Gemini plans (e.g., Gemini Advanced) as part of a pilot program.
Gemini for teens
In June, Google introduced a teen-focused Gemini experience, allowing students to sign up via their Google Workspace for Education school accounts.
The teen-focused Gemini has “additional policies and safeguards,” including a tailored onboarding process and an “AI literacy guide” to (as Google phrases it) “help teens use AI responsibly.” Otherwise, it’s nearly identical to the standard Gemini experience, down to the “double check” feature that looks across the web to see if Gemini’s responses are accurate.
Gemini in smart home devices
A growing number of Google-made devices tap Gemini for enhanced functionality, from the Google TV Streamer to the Pixel 9 and 9 Pro to the newest Nest Learning Thermostat.
On the Google TV Streamer, Gemini uses your preferences to curate content suggestions across your subscriptions and summarize reviews and even whole seasons of TV.
On the latest Nest thermostat (as well as Nest speakers, cameras, and smart displays), Gemini will soon bolster Google Assistant’s conversational and analytic capabilities.
Subscribers to Google’s Nest Aware plan later this year will get a preview of new Gemini-powered experiences like AI descriptions for Nest camera footage, natural language video search and recommended automations. Nest cameras will understand what’s happening in real-time video feeds (e.g., when a dog’s digging in the garden), while the companion Google Home app will surface videos and create device automations given a description (e.g., “Did the kids leave their bikes in the driveway?,” “Have my Nest thermostat turn on the heating when I get home from work every Tuesday”).
Also later this year, Google Assistant will get a few upgrades on Nest-branded and other smart home devices to make conversations feel more natural. Improved voices are on the way, in addition to the ability to ask follow-up questions and “(more) easily go back and forth.”
What can the Gemini models do?
Because Gemini models are multimodal, they can perform a range of multimodal tasks, from transcribing speech to captioning images and videos in real time. Many of these capabilities have reached the product stage (as alluded to in the previous section), and Google is promising much more in the not-too-distant future.
Of course, it’s a bit hard to take the company at its word. Google seriously underdelivered with the original Bard launch. More recently, it ruffled feathers with a video purporting to show Gemini’s capabilities that was more or less aspirational, not live.
Also, Google offers no fix for some of the underlying problems with generative AI tech today, like its encoded biases and tendency to make things up (i.e., hallucinate). Neither do its rivals, but it’s something to keep in mind when considering using or paying for Gemini.
Assuming for the purposes of this article that Google is being truthful with its recent claims, here’s what the different tiers of Gemini can do now and what they’ll be able to do once they reach their full potential:
What you can do with Gemini Ultra
Google says that Gemini Ultra, thanks to its multimodality, can be used to help with things like physics homework, solving problems step-by-step on a worksheet, and pointing out possible mistakes in already filled-in answers.
Ultra can also be applied to tasks such as identifying scientific papers relevant to a problem, Google says. The model can extract information from several papers, for instance, and update a chart from one by generating the formulas necessary to re-create the chart with more timely data.
Gemini Ultra technically supports image generation. But that capability hasn’t made its way into the productized version of the model yet, perhaps because the mechanism is more complex than how apps such as ChatGPT generate images. Rather than feed prompts to an image generator (like DALL-E 3, in ChatGPT’s case), Gemini outputs images “natively,” without an intermediary step.
Ultra is available as an API through Vertex AI, Google’s fully managed AI dev platform, and AI Studio, Google’s web-based tool for app and platform developers.
Gemini Pro’s capabilities
Google says that Gemini Pro is an improvement over LaMDA in its reasoning, planning, and understanding capabilities. The latest version, Gemini 1.5 Pro — which powers the Gemini apps for Gemini Advanced subscribers — exceeds even Ultra’s performance in some areas.
Gemini 1.5 Pro is improved in a number of areas compared with its predecessor, Gemini 1.0 Pro, perhaps most obviously in the amount of data that it can process. Gemini 1.5 Pro can take in up to 1.4 million words, two hours of video, or 22 hours of audio and can reason across or answer questions about that data (more or less).
Gemini 1.5 Pro became generally available on Vertex AI and AI Studio in June alongside a feature called code execution, which aims to reduce bugs in code that the model generates by iteratively refining that code over several steps. (Code execution also supports Gemini Flash.)
Within Vertex AI, developers can customize Gemini Pro to specific contexts and use cases via a fine-tuning or “grounding” process. For example, Pro (along with other Gemini models) can be instructed to use data from third-party providers like Moody’s, Thomson Reuters, ZoomInfo and MSCI, or source information from corporate datasets or Google Search instead of its wider knowledge bank. Gemini Pro can also be connected to external, third-party APIs to perform particular actions, like automating a back-office workflow.
AI Studio offers templates for creating structured chat prompts with Pro. Developers can control the model’s creative range and provide examples to give tone and style instructions, and also tune Pro’s safety settings.
Vertex AI Agent Builder lets people build Gemini-powered “agents” within Vertex AI. For example, a company could create an agent that analyzes previous marketing campaigns to understand a brand style and then apply that knowledge to help generate new ideas consistent with the style.
Gemini Flash is lighter but packs a punch
While the first version of Gemini Flash was made for less demanding workloads, the newest version, 2.0 Flash, is now Google’s flagship AI model. Google calls Gemini 2.0 Flash its AI model for the agentic era. The model can natively generate images and audio, in addition to text, and can use tools like Google Search and interact with external APIs.
The 2.0 Flash model is faster than Gemini’s previous generation of models and even outperforms some of the larger Gemini 1.5 models on benchmarks measuring coding and image analysis. You can try an experimental version of 2.0 Flash in the web version of Gemini or through Google’s AI developer platforms, and a production version of the model should land in January.
An offshoot of Gemini Pro that’s small and efficient, built for narrow, high-frequency generative AI workloads, Flash is multimodal like Gemini Pro, meaning it can analyze audio, video, images, and text (but it can only generate text). Google says that Flash is particularly well-suited for tasks like summarization and chat apps, plus image and video captioning and data extraction from long documents and tables.
Devs using Flash and Pro can optionally leverage context caching, which lets them store large amounts of data (e.g., a knowledge base or database of research papers) in a cache that Gemini models can quickly and relatively cheaply access. Context caching is an additional fee on top of other Gemini model usage fees, however.
Gemini Nano can run on your phone
Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and it’s efficient enough to run directly on (some) devices instead of sending the task to a server somewhere. So far, Nano powers a couple of features on the Pixel 8 Pro, Pixel 8, Pixel 9 Pro, Pixel 9 and Samsung Galaxy S24, including Summarize in Recorder and Smart Reply in Gboard.
The Recorder app, which lets users push a button to record and transcribe audio, includes a Gemini-powered summary of recorded conversations, interviews, presentations, and other audio snippets. Users get summaries even if they don’t have a signal or Wi-Fi connection, and in a nod to privacy, no data leaves their phone in the process.
Nano is also in Gboard, Google’s keyboard replacement. There, it powers a feature called Smart Reply, which helps to suggest the next thing you’ll want to say when having a conversation in a messaging app such as WhatsApp.
In the Google Messages app on supported devices, Nano drives Magic Compose, which can craft messages in styles like “excited,” “formal,” and “lyrical.”
Google says that a future version of Android will tap Nano to alert users to potential scams during calls. The new weather app on Pixel phones uses Gemini Nano to generate tailored weather reports. And TalkBack, Google’s accessibility service, employs Nano to create aural descriptions of objects for low-vision and blind users.
How much do the Gemini models cost?
Gemini 1.0 Pro (the first version of Gemini Pro), 1.5 Pro, and Flash are available through Google’s Gemini API for building apps and services, all with free options. But the free options impose usage limits and leave out certain features, like context caching and batching.
Gemini models are otherwise pay-as-you-go. Here’s the base pricing, not including add-ons like context caching, as of September 2024:
- Gemini 1.0 Pro: 50 cents per 1 million input tokens, $1.50 per 1 million output tokens
- Gemini 1.5 Pro: $1.25 per 1 million input tokens (for prompts up to 128K tokens) or $2.50 per 1 million input tokens (for prompts longer than 128K tokens); $5 per 1 million output tokens (for prompts up to 128K tokens) or $10 per 1 million output tokens (for prompts longer than 128K tokens)
- Gemini 1.5 Flash: 7.5 cents per 1 million input tokens (for prompts up to 128K tokens), 15 cents per 1 million input tokens (for prompts longer than 128K tokens), 30 cents per 1 million output tokens (for prompts up to 128K tokens), 60 cents per 1 million output tokens (for prompts longer than 128K tokens)
- Gemini 1.5 Flash-8B: 3.75 cents per 1 million input tokens (for prompts up to 128K tokens), 7.5 cents per 1 million input tokens (for prompts longer than 128K tokens), 15 cents per 1 million output tokens (for prompts up to 128K tokens), 30 cents per 1 million output tokens (for prompts longer than 128K tokens)
Tokens are subdivided bits of raw data, like the syllables “fan,” “tas,” and “tic” in the word “fantastic”; 1 million tokens is equivalent to about 700,000 words. “Input” refers to tokens fed into the model, while “output” refers to tokens that the model generates.
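To make the tiered pricing concrete, here is a rough Python sketch that estimates the cost of a single request from the September 2024 base rates listed above. It is an illustration under stated assumptions, not an official calculator: the model keys are hypothetical labels rather than real API model IDs, “128K” is assumed to mean 128,000 tokens, and the word-to-token conversion uses only the ballpark figure of 700,000 words per 1 million tokens.

```python
from dataclasses import dataclass

# Assumption: "128K" in the price list means 128,000 tokens.
LONG_PROMPT_THRESHOLD = 128_000


@dataclass(frozen=True)
class Rate:
    """Dollars per 1 million tokens, for short and long prompts."""
    input_short: float
    input_long: float
    output_short: float
    output_long: float


# Base rates as of September 2024, copied from the list above.
# The dictionary keys are hypothetical labels, not official model IDs.
RATES = {
    "gemini-1.0-pro": Rate(0.50, 0.50, 1.50, 1.50),
    "gemini-1.5-pro": Rate(1.25, 2.50, 5.00, 10.00),
    "gemini-1.5-flash": Rate(0.075, 0.15, 0.30, 0.60),
    "gemini-1.5-flash-8b": Rate(0.0375, 0.075, 0.15, 0.30),
}


def words_to_tokens(words: int) -> int:
    """Rough estimate: about 700,000 words per 1 million tokens."""
    return words * 1_000_000 // 700_000


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request, excluding add-ons."""
    rate = RATES[model]
    # The prompt length decides which tier applies to input and output.
    long_prompt = input_tokens > LONG_PROMPT_THRESHOLD
    in_rate = rate.input_long if long_prompt else rate.input_short
    out_rate = rate.output_long if long_prompt else rate.output_short
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    cost = estimate_cost("gemini-1.5-pro", 100_000, 2_000)
    print(f"Estimated cost: ${cost:.3f}")
```

Under these assumptions, a 100,000-token prompt with a 2,000-token response on Gemini 1.5 Pro comes to about $0.135, while the same response to a 200,000-token prompt falls into the higher tier and costs about $0.52. Add-ons such as context caching are not modeled.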
Ultra and 2.0 Flash pricing has yet to be announced, and Nano is still in early access.
What’s the latest on Project Astra?
Project Astra is Google DeepMind’s effort to create AI-powered apps and “agents” for real-time, multimodal understanding. In demos, Google has shown how the AI model can simultaneously process live video and audio. Google released an app version of Project Astra to a small number of trusted testers in December but has no plans for a broader release right now.
The company would like to put Project Astra in a pair of smart glasses. Google also gave a prototype of some glasses with Project Astra and augmented reality capabilities to a few trusted testers in December. However, there’s not a clear product at this time, and it’s unclear when Google would actually release something like this.
Project Astra is still just that, a project, and not a product. However, the demos of Astra reveal what Google would like its AI products to do in the future.
Is Gemini coming to the iPhone?
It might.
Apple has said that it’s in talks to put Gemini and other third-party models to use for a number of features in its Apple Intelligence suite. Following a keynote presentation at WWDC 2024, Apple SVP Craig Federighi confirmed plans to work with models, including Gemini, but he didn’t divulge any additional details.