
Google says it has fixed Gemini's person generation feature

Back in February, Google stopped its AI-powered chatbot Gemini from generating images of people after users complained about historical inaccuracies. For example, when asked to depict a “Roman legion,” Gemini would show an anachronistic group of soldiers of various races, while “Zulu warriors” would be rendered as stereotypically Black.

Google CEO Sundar Pichai apologized, and Demis Hassabis, co-founder of Google's AI research division DeepMind, said a fix should arrive “very shortly,” within the next few weeks. In the end, however, it took much, much longer (although some Google employees work 120-hour weeks!). But in the next few days, Gemini will be able to generate images that include people again.

Well… kind of.

Only certain users, specifically those subscribed to one of Google's paid Gemini plans (Gemini Advanced, Business, or Enterprise), will have access to Gemini's person generation feature as part of an English-only early access test.

Google declined to say when the test would be expanded to the free Gemini tier and other languages.

“With Gemini Advanced, our users get priority access to our newest features,” a Google spokesperson told TechCrunch. “This allows us to gather valuable feedback and deliver a highly anticipated feature to our premium subscribers first.”

So what fixes has Google implemented for person generation? According to the company, Imagen 3, the latest image generation model built into Gemini, includes mitigations to make the images of people that Gemini creates “fairer.” For example, Imagen 3 was trained with AI-generated captions that are designed to “improve the variability and variety of concepts related to images in [its] training data,” according to a technical paper shared with TechCrunch. And the model's training data was filtered for “safety” and “reviewed for fairness issues,” Google claims.

We asked for more details on Imagen 3's training data, but the spokesperson would only say that the model was trained on “a large dataset consisting of images, text, and associated annotations.”

“We have significantly reduced the potential for undesirable responses through extensive internal and external red-teaming testing, and we work with independent experts to ensure continuous improvement,” the spokesperson continued. “Our focus has been to thoroughly test people generation before turning it back on.”

Imagen 3 and Gems

There is better news, however: all Gemini users will get Imagen 3 within the week, minus the person generation feature for those who aren't subscribed to one of Gemini's premium tiers.

Google says Imagen 3 understands the text prompts it translates into images more accurately than its predecessor, Imagen 2, and is “more creative and detailed” in its generations. Additionally, the model produces fewer artifacts and errors, Google claims, and is the best Imagen model yet for rendering text.

An example from Google's Imagen 3.
Photo credits: Google

To address concerns about the potential for deepfakes, Imagen 3 will use SynthID, an approach developed by DeepMind to apply invisible, cryptographic watermarks to various forms of AI-generated media. Google had already announced that Imagen 3 would use SynthID, so this isn't a big surprise. But I would note that the contrast between how Google handles image generation in Gemini and in other products, like its Pixel Studio, is a bit curious.

Another example from Imagen 3.
Photo credits: Google

Alongside Imagen 3, Google is introducing Gems for Gemini, but only for Gemini Advanced, Business, and Enterprise users. Like OpenAI's GPTs, Gems are customized versions of Gemini that can act as “experts” on specific topics (e.g. vegetarian cooking).

Here's how Google describes them in a blog post: “With Gems, you can assemble a team of experts to help you think through a difficult project, brainstorm ideas for an upcoming event, or write the right headline for a social media post. Your Gem can also remember a detailed set of instructions to help you save time on tedious, repetitive, or difficult tasks.”

To create a Gem, users write instructions, give it a name, and off they go.

Gems are available on desktop and mobile in 150 countries and “most languages,” according to Google (but aren't yet supported in Gemini Live). There are several examples at launch, including a “learning coach,” a “career guide,” a “brainstormer,” and a “coding partner.”

Gemini Gems
Photo credits: Google

We asked Google whether there are any plans to let users publish and use each other's Gems, similar to how GPTs can be shared through OpenAI's GPT Store. The answer was essentially “no.”

“Right now, we're focused on learning how people will use Gems for creativity and productivity,” the spokesperson said. “There's no further information right now.”
