“Embarrassing and fallacious”: Google admits it has lost control of image-generating AI

February 25, 2024

47

Google apologized (or was near apologizing) this week for an additional embarrassing AI mistake, an image-generating model that added diversity to pictures without regard to historical context. While the underlying problem is totally comprehensible, Google accuses the model of being “overly sensitive.” But the model didn’t make itself, folks.

The AI system in query is Gemini, the corporate’s flagship conversational AI platform, which uses a version of the Imagen 2 model to create on-demand images.

However, recently people have noticed that asking people to create images of specific historical circumstances or people was producing ridiculous results. For example, the Founding Fathers, who we all know were white slave owners, were portrayed as a multicultural group that included people of color.

This embarrassing and simply reproducible problem was quickly mocked by online commenters. Predictably, it has also been folded into the continuing debate about diversity, equity and inclusion (currently at an area reputational low), and brought by experts as evidence that the woke-mind virus is penetrating further into the already liberal tech sector.

Photo credit: An image created by Twitter user Patrick Ganley.

“DEI has gone crazy,” conspicuously concerned residents shouted. This is Biden’s America! Google is an “ideological echo chamber”, a stalking horse for the left! (It should be said that the left was also suitably disturbed by this strange phenomenon.)

But as anyone who knows the technology can let you know, and as Google explains in its relatively silly little apology post today, this problem was the results of a wonderfully reasonable workaround for systemic bias in training data.

Say you wish to use Gemini to create a marketing campaign and ask to create 10 images of “an individual walking a dog in a park.” Since you do not specify the form of person, dog or park, it is the trader’s decision – the generative model outputs what it knows best. And in lots of cases this just isn’t a product of reality, but of coaching data, through which all kinds of biases will be embedded.

What forms of people, including dogs and parks, appear most frequently within the 1000’s of relevant images the model captured? The fact is that white individuals are overrepresented in a lot of these image collections (stock images, royalty free photography, etc.) and so in lots of cases the model will default to white people when you don’t. t specify.

That’s just an artifact of the training data, but as Google points out, “Since our users come from all around the world, we wish it to work well for everybody.” If you are on the lookout for an image of soccer players or someone walking a dog , ask, it’s possible you’ll need to get plenty of people. You probably don’t need to get images of individuals of only a certain ethnicity (or other characteristic).”

Illustration of a group of recently laid off people holding boxes.

Imagine asking for an image like this – what if it was only one person? Bad result! Photo credit: Getty Images / victorikart

There’s nothing fallacious with imagining a white guy walking a golden retriever in a suburban park. But when you ask for 10 and so they’re white guys walking in suburban parks with gold coins? And you reside in Morocco, where the people, dogs and parks all look different? This is just not a desirable end result. If someone doesn’t specify a feature, the model should select diversity over homogeneity, no matter how its training data might influence it.

This is a standard problem with all sorts of generative media. And there isn’t a easy solution. But in cases which might be particularly common, sensitive, or each, firms like Google, OpenAI, Anthropic, etc. invisibly add additional instructions to the model.

I can not emphasize enough how commonplace all these implicit instructions are. The entire LLM ecosystem relies on implicit instructions – system prompts, as they’re sometimes called, through which things like “be concise,” “don’t swear,” and other guidelines are given to the model before each conversation. If you ask for a joke, you will not get a racist joke – because although the model has swallowed 1000’s of them, she, like most of us, has also been trained not to inform them. This just isn’t a secret agenda (even though it could do with more transparency), but relatively an infrastructure issue.

The flaw with Google’s model was that there have been no implicit instructions for situations where historical context was vital. So while a prompt like “an individual walking a dog in a park” is improved by the silent addition “the person has a random gender and ethnicity” or whatever they are saying, this definitely is not the case is thereby improved.

As Google SVP Prabhakar Raghavan put it:

First, in our optimization to be sure that Gemini displays a series of individuals, we didn’t consider cases where clearly no series must be displayed. And second, over time the model became way more cautious than we intended, refusing to completely reply to certain prompts – and misinterpreting some very innocuous prompts as sensitive.

These two things caused the model to overcompensate in some cases and be too conservative in others, leading to embarrassing and incorrect images.

I understand how hard it’s to say “sorry” sometimes, so I forgive Raghavan for stopping in need of it. More vital is an interesting wording in it: “The model became way more cautious than we intended.”

How would a model “turn into” something? It’s software. Someone – 1000’s of Google engineers – built it, tested it, and iterated on it. Someone wrote the implicit instructions that improved some answers and caused others to fail comically. If this failed and someone could have checked the total prompt, they probably would have found that the Google team made a mistake.

Google blames the model for “becoming” something it was not “intended to be.” But they made the model! It’s like they broke a glass, and as a substitute of claiming, “We dropped it,” they are saying, “It fell down.” (I did that.)

Errors in these models are actually inevitable. They hallucinate, they reflect prejudices, they behave in unexpected ways. But the responsibility for these mistakes lies not with the models, but with the individuals who made them. Today it’s Google. Tomorrow it’ll be OpenAI. The next day, and possibly for just a few months straight, it’ll be X.AI.

These firms have a vested interest in convincing you that AI makes its own mistakes. Do not let it occur.

“Embarrassing and fallacious”: Google admits it has lost control of image-generating AI

LEAVE A REPLY Cancel reply

Must Read

Trend reversal in technology stocks pushes US megacaps into correction zone

A brand new Chinese video generation model appears to censor politically sensitive topics

OpenAI pronounces “SearchGPT” to remain at the highest

How Salesforce's STEM 1T dataset could revolutionize the AI industry

Forget coding bootcamps: Airtable's AI can construct your app in seconds

Level AI applies algorithms to the weak points within the contact center

ChatGPT: Everything you have to know concerning the AI-powered chatbot

Latest articles

Trend reversal in technology stocks pushes US megacaps into correction zone

A brand new Chinese video generation model appears to censor politically sensitive topics

OpenAI pronounces “SearchGPT” to remain at the highest

Our Newsletter

“Embarrassing and fallacious”: Google admits it has lost control of image-generating AI

RELATED ARTICLES

LEAVE A REPLY Cancel reply

Must Read

Latest articles

Our Newsletter