Google Gemini: Everything you need to know about the generative AI models

Google’s trying to make waves with Gemini, its flagship suite of generative AI models, apps, and services. But what’s Gemini? How can you use it? And how does it stack up against other generative AI tools such as OpenAI’s ChatGPT, Meta’s Llama, and Microsoft’s Copilot?

To make it easier to keep up with the latest Gemini developments, we’ve put together this handy guide, which we’ll keep updated as new Gemini models, features, and news about Google’s plans for Gemini are released.

What is Gemini?

Gemini is Google’s long-promised, next-gen generative AI model family. Developed by Google’s AI research labs DeepMind and Google Research, it comes in four flavors:

  • Gemini Ultra
  • Gemini Pro
  • Gemini Flash, a speedier, “distilled” version of Pro
  • Gemini Nano, two small models: Nano-1 and the slightly more capable Nano-2, which is meant to run offline

All Gemini models were trained to be natively multimodal, that is, able to work with and analyze more than just text. Google says they were pre-trained and fine-tuned on a variety of public, proprietary, and licensed audio, images, and videos; a set of codebases; and text in different languages.

This sets Gemini apart from models such as Google’s own LaMDA, which was trained exclusively on text data. LaMDA can’t understand or generate anything beyond text (e.g., essays, emails, and so on), but that isn’t necessarily the case with Gemini models.

We’ll note here that the ethics and legality of training models on public data, in some cases without the data owners’ knowledge or consent, are murky. Google has an AI indemnification policy to shield certain Google Cloud customers from lawsuits should they face them, but this policy contains carve-outs. Proceed with caution, particularly if you’re intending to use Gemini commercially.

What’s the difference between the Gemini apps and Gemini models?

Gemini is separate and distinct from the Gemini apps on the web and mobile (formerly Bard).

The Gemini apps are clients that connect to various Gemini models and layer a chatbot-like interface on top. Think of them as front ends for Google’s generative AI, analogous to ChatGPT and Anthropic’s Claude family of apps.

Gemini on the web lives here. On Android, the Gemini app replaces the existing Google Assistant app. And on iOS, the Google and Google Search apps serve as that platform’s Gemini clients.

On Android, it also recently became possible to bring up the Gemini overlay on top of any app to ask questions about what’s on the screen (e.g., a YouTube video). Just press and hold a supported smartphone’s power button or say, “Hey Google”; you’ll see the overlay pop up.

Gemini apps can accept images as well as voice commands and text (including files like PDFs and, soon, videos, either uploaded or imported from Google Drive) and generate images. As you’d expect, conversations with Gemini apps on mobile carry over to Gemini on the web and vice versa if you’re signed in to the same Google Account in both places.

Gemini Advanced

The Gemini apps aren’t the only means of recruiting Gemini models’ assistance with tasks. Slowly but surely, Gemini-imbued features are making their way into staple Google apps and services like Gmail and Google Docs.

To take advantage of most of these, you’ll need the Google One AI Premium Plan. Technically a part of Google One, the AI Premium Plan costs $20 and provides access to Gemini in Google Workspace apps like Docs, Slides, Sheets, and Meet. It also enables what Google calls Gemini Advanced, which brings the company’s more sophisticated Gemini models to the Gemini apps.

Gemini Advanced users get extras here and there, too, like priority access to new features, the ability to run and edit Python code directly in Gemini, and a larger “context window.” Gemini Advanced can remember the content of, and reason across, roughly 750,000 words in a conversation (or 1,500 pages of documents). That’s compared to the 24,000 words (or 48 pages) the vanilla Gemini app can handle.
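
The word and page counts quoted above imply the same page size for both tiers. A minimal sketch of that arithmetic follows; the dictionary labels and function name are illustrative, not anything Google publishes, and the only inputs are the figures cited in this article.

```python
# Word and page counts quoted in the article for the two context windows.
# The labels are illustrative names chosen for this sketch.
CONTEXT_WINDOWS = {
    "Gemini Advanced": {"words": 750_000, "pages": 1_500},
    "Standard Gemini app": {"words": 24_000, "pages": 48},
}


def words_per_page(words: int, pages: int) -> float:
    """Return the average page size implied by a words/pages pair."""
    return words / pages


# Implied page size for each tier.
page_sizes = {
    name: words_per_page(w["words"], w["pages"])
    for name, w in CONTEXT_WINDOWS.items()
}

# How many times larger the Advanced window is, measured in words.
size_ratio = (
    CONTEXT_WINDOWS["Gemini Advanced"]["words"]
    / CONTEXT_WINDOWS["Standard Gemini app"]["words"]
)

if __name__ == "__main__":
    for name, size in page_sizes.items():
        print(f"{name}: {size:.0f} words per page")
    print(f"Advanced window is {size_ratio:.2f}x larger")
```

Both pairs work out to 500 words per page, so the page figures are simply the word counts restated, and the Advanced window is a little over 31 times the size of the standard one.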

Another Gemini Advanced exclusive is trip planning in Google Search, which creates custom travel itineraries from prompts. Taking into account things like flight times (from emails in a user’s Gmail inbox), meal preferences, and information about local attractions (from Google Search and Maps data), as well as the distances between those attractions, Gemini will generate an itinerary that updates automatically to reflect any changes.

Gemini across Google services is also available to corporate customers through two plans, Gemini Business (an add-on for Google Workspace) and Gemini Enterprise. Gemini Business costs as little as $20 per user per month, and Gemini Enterprise, which adds meeting note-taking and translated captions as well as document classification and labeling, is priced at $30 and up per user per month. (Both plans require an annual commitment.)

In Gmail, Gemini lives in a side panel that can write emails and summarize message threads. You’ll find the same panel in Docs, where it helps you write and refine your content and brainstorm new ideas. Gemini in Slides generates slides and custom images. And Gemini in Google Sheets tracks and organizes data, creating tables and formulas.

Gemini’s reach extends to Drive as well, where it can summarize files and give quick facts about a project. In Meet, meanwhile, Gemini translates captions into additional languages.

Gemini recently came to Google’s Chrome browser in the form of an AI writing tool. You can use it to write something completely new or rewrite existing text; Google says it’ll consider the web page you’re on to make recommendations.

Elsewhere, you’ll find hints of Gemini in Google’s database products, cloud security tools, and app development platforms (including Firebase and Project IDX), as well as in apps like Google Photos (where Gemini handles natural language search queries), YouTube (where it helps brainstorm video ideas), and the NotebookLM note-taking assistant.

Code Assist (formerly Duet AI for Developers), Google’s suite of AI-powered assistance tools for code completion and generation, is offloading heavy computational lifting to Gemini. So are Google’s security products underpinned by Gemini, like Gemini in Threat Intelligence, which can analyze large portions of potentially malicious code and let users perform natural language searches for ongoing threats or indicators of compromise.

Gemini extensions and Gems

Gems, announced at Google I/O 2024, are custom chatbots powered by Gemini models that Gemini Advanced users can create. Gems can be generated from natural language descriptions (for example, “You’re my running coach. Give me a daily running plan”) and shared with others or kept private.

Gems are available on desktop and mobile in 150 countries and most languages. Eventually, they’ll be able to tap an expanded set of integrations with Google services, including Google Calendar, Tasks, Keep, and YouTube Music, to complete custom tasks.

Speaking of integrations, the Gemini apps on the web and mobile can tap into Google services via what Google calls “Gemini extensions.” Gemini today integrates with Google Drive, Gmail, and YouTube to respond to queries such as “Could you summarize my last three emails?” Later this year, Gemini will be able to take additional actions with Google Calendar, Keep, Tasks, YouTube Music and Utilities, the Android-exclusive apps that control on-device features like timers and alarms, media controls, the flashlight, volume, Wi-Fi, Bluetooth, and so on.

Gemini Live in-depth voice chats

A new experience called Gemini Live, exclusive to Gemini Advanced subscribers, allows users to have “in-depth” voice chats with Gemini. It’s available in the Gemini apps on mobile and the Pixel Buds Pro 2, where it can be accessed even when your phone’s locked.

With Gemini Live enabled, you can interrupt Gemini while the chatbot’s speaking (in one of several new voices) to ask a clarifying question, and it’ll adapt to your speech patterns in real time. And sometime later this year, Gemini will be able to see and respond to your surroundings, either via photos or video captured by your smartphone’s camera.

Live is also designed to serve as a virtual coach of sorts, helping you rehearse for events, brainstorm ideas, and so on. For instance, Live can suggest which skills to highlight in an upcoming job or internship interview, and it can give public speaking advice.

You can read our review of Gemini Live here. Spoiler alert: We think the feature has a ways to go before it’s super useful, but it’s early days, admittedly.

Image generation via Imagen 3

Gemini users can generate artwork and pictures using Google’s built-in Imagen 3 model.

Google says that Imagen 3 can more accurately understand the text prompts that it translates into images than its predecessor, Imagen 2, and is more “creative and detailed” in its generations. In addition, the model produces fewer artifacts and visual errors (at least according to Google), and is the best Imagen model yet for rendering text.

[Image: A sample from Imagen 3. Image Credits: Google]

Back in February, Google was forced to pause Gemini’s ability to generate images of people after users complained of historical inaccuracies. But in August, the company reintroduced people generation for certain users, specifically English-language users signed up for one of Google’s paid Gemini plans (e.g., Gemini Advanced) as part of a pilot program.

Gemini for teens

In June, Google introduced a teen-focused Gemini experience, allowing students to sign up via their Google Workspace for Education school accounts.

The teen-focused Gemini has “additional policies and safeguards,” including a tailored onboarding process and an “AI literacy guide” to (as Google phrases it) “help teens use AI responsibly.” Otherwise, it’s nearly identical to the standard Gemini experience, down to the “double check” feature that looks across the web to see if Gemini’s responses are accurate.

Gemini in smart home devices

A growing number of Google-made devices tap Gemini for enhanced functionality, from the Google TV Streamer to the Pixel 9 and 9 Pro to the newest Nest Learning Thermostat.

On the Google TV Streamer, Gemini uses your preferences to curate content suggestions across your subscriptions and summarize reviews and even whole seasons of TV.

On the latest Nest thermostat (as well as Nest speakers, cameras, and smart displays), Gemini will soon bolster Google Assistant’s conversational and analytic capabilities.

Subscribers to Google’s Nest Aware plan will get a preview later this year of new Gemini-powered experiences like AI descriptions for Nest camera footage, natural language video search, and recommended automations. Nest cameras will understand what’s happening in real-time video feeds (e.g., when a dog’s digging in the garden), while the companion Google Home app will surface videos and create device automations given a description (e.g., “Did the kids leave their bikes in the driveway?,” “Have my Nest thermostat turn on the heating when I get home from work every Tuesday”).

[Image: Gemini will soon be able to summarize security camera footage from Nest devices. Image Credits: Google]

Also later this year, Google Assistant will get a few upgrades on Nest-branded and other smart home devices to make conversations feel more natural. Improved voices are on the way, in addition to the ability to ask follow-up questions and “(more) easily go back and forth.”

What can the Gemini models do?

Because Gemini models are multimodal, they can perform a range of multimodal tasks, from transcribing speech to captioning images and videos in real time. Many of these capabilities have reached the product stage (as alluded to in the previous section), and Google is promising much more in the not-too-distant future.

Of course, it’s a bit hard to take the company at its word. Google seriously underdelivered with the original Bard launch. More recently, it ruffled feathers with a video purporting to show Gemini’s capabilities that was more or less aspirational, not live.

Also, Google offers no fix for some of the underlying problems with generative AI tech today, like its encoded biases and tendency to make things up (i.e., hallucinate). Neither do its rivals, but it’s something to keep in mind when considering using or paying for Gemini.

Assuming for the purposes of this article that Google is being truthful with its recent claims, here’s what the different tiers of Gemini can do now and what they’ll be able to do once they reach their full potential:

What you can do with Gemini Ultra

Google says that Gemini Ultra, thanks to its multimodality, can be used to help with things like physics homework, solving problems step-by-step on a worksheet, and pointing out possible mistakes in already filled-in answers.

Ultra can also be applied to tasks such as identifying scientific papers relevant to a problem, Google says. The model can extract information from several papers, for instance, and update a chart from one by generating the formulas necessary to re-create the chart with more timely data.

Gemini Ultra technically supports image generation. But that capability hasn’t made its way into the productized version of the model yet, perhaps because the mechanism is more complex than how apps such as ChatGPT generate images. Rather than feed prompts to an image generator (like DALL-E 3, in ChatGPT’s case), Gemini outputs images “natively,” without an intermediary step.

Ultra is available as an API through Vertex AI, Google’s fully managed AI dev platform, and AI Studio, Google’s web-based tool for app and platform developers.

Gemini Pro’s capabilities

Google says that Gemini Pro is an improvement over LaMDA in its reasoning, planning, and understanding capabilities. The latest version, Gemini 1.5 Pro — which powers the Gemini apps for Gemini Advanced subscribers — exceeds even Ultra’s performance in some areas.

Gemini 1.5 Pro is improved in a number of areas compared with its predecessor, Gemini 1.0 Pro, perhaps most obviously in the amount of data that it can process. Gemini 1.5 Pro can take in up to 1.4 million words, two hours of video, or 22 hours of audio and can reason across or answer questions about that data (more or less).

Gemini 1.5 Pro became generally available on Vertex AI and AI Studio in June alongside a feature called code execution, which aims to reduce bugs in code that the model generates by iteratively refining that code over several steps. (Code execution also supports Gemini Flash.)

Within Vertex AI, developers can customize Gemini Pro to specific contexts and use cases via a fine-tuning or “grounding” process. For example, Pro (along with other Gemini models) can be instructed to use data from third-party providers like Moody’s, Thomson Reuters, ZoomInfo and MSCI, or source information from corporate datasets or Google Search instead of its wider knowledge bank. Gemini Pro can also be connected to external, third-party APIs to perform particular actions, like automating a back-office workflow.

AI Studio offers templates for creating structured chat prompts with Pro. Developers can control the model’s creative range and provide examples to give tone and style instructions, and also tune Pro’s safety settings.

Vertex AI Agent Builder lets people build Gemini-powered “agents” within Vertex AI. For example, a company could create an agent that analyzes previous marketing campaigns to understand a brand style and then applies that knowledge to help generate new ideas consistent with the style.

Gemini Flash is for less demanding work

For less demanding applications, there’s Gemini Flash. The newest version is 1.5 Flash; Gemini app users subscribed to Gemini Advanced get access to this.

An offshoot of Gemini Pro that’s small and efficient, built for narrow, high-frequency generative AI workloads, Flash is multimodal like Gemini Pro, meaning it can analyze audio, video, images, and text (but it can only generate text). Google says that Flash is particularly well-suited for tasks like summarization and chat apps, plus image and video captioning and data extraction from long documents and tables.

Devs using Flash and Pro can optionally leverage context caching, which lets them store large amounts of information (e.g., a knowledge base or database of research papers) in a cache that Gemini models can quickly and relatively cheaply access. Context caching carries an additional fee on top of other Gemini model usage fees, however.

Gemini Nano can run on your phone

Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and it’s efficient enough to run directly on (some) devices instead of sending the task to a server somewhere. So far, Nano powers a couple of features on the Pixel 8 Pro, Pixel 8, Pixel 9 Pro, Pixel 9 and Samsung Galaxy S24, including Summarize in Recorder and Smart Reply in Gboard.

The Recorder app, which lets users push a button to record and transcribe audio, includes a Gemini-powered summary of recorded conversations, interviews, presentations, and other audio snippets. Users get summaries even if they don’t have a signal or Wi-Fi connection, and in a nod to privacy, no data leaves their phone in the process.

Nano is also in Gboard, Google’s keyboard replacement. There, it powers a feature called Smart Reply, which helps to suggest the next thing you’ll want to say when having a conversation in a messaging app such as WhatsApp.

In the Google Messages app on supported devices, Nano drives Magic Compose, which can craft messages in styles like “excited,” “formal,” and “lyrical.”

Google says that a future version of Android will tap Nano to alert users to potential scams during calls. The new weather app on Pixel phones uses Gemini Nano to generate tailored weather reports. And TalkBack, Google’s accessibility service, employs Nano to create aural descriptions of objects for low-vision and blind users.

How much do the Gemini models cost?

Gemini 1.0 Pro (the first version of Gemini Pro), 1.5 Pro, and Flash are available through Google’s Gemini API for building apps and services, all with free options. But the free options impose usage limits and leave out certain features, like context caching and batching.

Gemini models are otherwise pay-as-you-go. Here’s the base pricing, not including add-ons like context caching, as of September 2024:

  • Gemini 1.0 Pro: 50 cents per 1 million input tokens, $1.50 per 1 million output tokens
  • Gemini 1.5 Pro: $3.50 per 1 million input tokens (for prompts up to 128K tokens) or $7 per 1 million input tokens (for prompts longer than 128K tokens); $10.50 per 1 million output tokens (for prompts up to 128K tokens) or $21.00 per 1 million output tokens (for prompts longer than 128K tokens)
  • Gemini 1.5 Flash: 7.5 cents per 1 million input tokens (for prompts up to 128K tokens) or 15 cents per 1 million input tokens (for prompts longer than 128K tokens); 30 cents per 1 million output tokens (for prompts up to 128K tokens) or 60 cents per 1 million output tokens (for prompts longer than 128K tokens)
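
To see how the tiered rates above combine for a single request, here is a minimal sketch in Python. It is not Google’s billing logic: the model keys, the function name, and the reading of “128K” as exactly 128,000 tokens are all assumptions made for illustration, and the rates are only the September 2024 figures listed above.

```python
from decimal import Decimal

# Assumption: "128K" is read as 128,000 tokens; the real cutoff may differ.
TIER_THRESHOLD_TOKENS = 128_000

# USD per 1 million tokens as (short-prompt rate, long-prompt rate).
# The model keys are illustrative labels for this sketch.
PRICES_USD_PER_MILLION = {
    "gemini-1.0-pro": {
        "input": (Decimal("0.50"), Decimal("0.50")),
        "output": (Decimal("1.50"), Decimal("1.50")),
    },
    "gemini-1.5-pro": {
        "input": (Decimal("3.50"), Decimal("7.00")),
        "output": (Decimal("10.50"), Decimal("21.00")),
    },
    "gemini-1.5-flash": {
        "input": (Decimal("0.075"), Decimal("0.15")),
        "output": (Decimal("0.30"), Decimal("0.60")),
    },
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the base cost in USD of one request, excluding add-ons."""
    rates = PRICES_USD_PER_MILLION[model]
    # The prompt (input) length decides which tier applies to both rates.
    tier = 0 if input_tokens <= TIER_THRESHOLD_TOKENS else 1
    total = (
        Decimal(input_tokens) * rates["input"][tier]
        + Decimal(output_tokens) * rates["output"][tier]
    )
    return total / Decimal(1_000_000)


if __name__ == "__main__":
    print(estimate_cost("gemini-1.5-pro", 100_000, 1_000))
    print(estimate_cost("gemini-1.5-pro", 200_000, 1_000))
```

Under these assumptions, a 100,000-token prompt with a 1,000-token reply on Gemini 1.5 Pro costs about 36 cents, while a 200,000-token prompt with the same reply costs about $1.42, because the longer prompt moves both the input and output rates into the higher tier.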

Tokens are subdivided bits of raw data, like the syllables “fan,” “tas,” and “tic” in the word “fantastic”; 1 million tokens is equivalent to about 700,000 words. “Input” refers to tokens fed into the model, while “output” refers to tokens that the model generates.
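
The rule of thumb above can be turned into a rough converter. This is only a sketch based on the article’s ratio of about 700,000 words per 1 million tokens; real token counts depend on the tokenizer and the text, and the function names are illustrative.

```python
# Rule of thumb from the article: 1 million tokens is about 700,000 words.
WORDS_PER_MILLION_TOKENS = 700_000


def words_to_tokens(words: int) -> int:
    """Approximate how many tokens a given number of words occupies."""
    return round(words * 1_000_000 / WORDS_PER_MILLION_TOKENS)


def tokens_to_words(tokens: int) -> int:
    """Approximate how many words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_MILLION_TOKENS / 1_000_000)


if __name__ == "__main__":
    # The 1.4 million words cited earlier for Gemini 1.5 Pro.
    print(words_to_tokens(1_400_000))
    # The 128K-token pricing threshold, read here as 128,000 tokens.
    print(tokens_to_words(128_000))
```

By this estimate, the 1.4 million words that Gemini 1.5 Pro can take in correspond to roughly 2 million tokens, and a 128,000-token prompt holds roughly 89,600 words.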

Ultra pricing has yet to be announced, and Nano is still in early access.

Is Gemini coming to the iPhone?

It might. 

Apple has said that it’s in talks to put Gemini and other third-party models to use for a number of features in its Apple Intelligence suite. Following a keynote presentation at WWDC 2024, Apple SVP Craig Federighi confirmed plans to work with models, including Gemini, but he didn’t divulge any additional details.
