According to internal correspondence seen by TechCrunch, contractors working to improve Google's Gemini AI are comparing its responses against outputs from Anthropic's competing Claude model.
When contacted by TechCrunch for comment, Google wouldn’t say whether it had received permission to make use of Claude in testing against Gemini.
As technology companies race to build better AI models, performance is usually evaluated against competitors by running their own models through industry benchmarks, rather than by having contractors painstakingly compare their competitors' AI responses.
The contractors working on Gemini, who are tasked with rating the accuracy of the model's outputs, must score each response they see against several criteria, such as truthfulness and verbosity. According to correspondence seen by TechCrunch, contractors are given up to 30 minutes per prompt to determine which answer is better, Gemini's or Claude's.
According to the correspondence, the contractors recently began noticing references to Anthropic's Claude on the internal Google platform they use to compare Gemini against other, unnamed AI models. At least one of the outputs presented to Gemini contractors and seen by TechCrunch explicitly stated: “I’m Claude, created by Anthropic.”
An internal chat showed contractors observing that Claude's responses appeared to emphasize safety more than Gemini's. “Claude’s safety settings are the strictest” among the AI models, one contractor wrote. In certain cases, Claude would not respond to prompts it deemed unsafe, such as role-playing a different AI assistant. In another case, Claude avoided answering a prompt, while Gemini's response was flagged as a “huge safety violation” because it included “nudity and bondage.”
Anthropic's commercial terms of service forbid customers from accessing Claude “to build a competing product or service” or “to train competing AI models” without Anthropic's approval. Google is a major investor in Anthropic.
Shira McNamara, a spokesperson for Google DeepMind, which runs Gemini, declined to say, when asked by TechCrunch, whether Google had obtained Anthropic's permission to access Claude. An Anthropic spokesperson, contacted prior to publication, had not commented by press time.
McNamara said that DeepMind does compare “model outputs” for evaluations, but that it does not train Gemini on Anthropic models.
“Of course, in line with standard industry practice, in some cases we compare model outputs as part of our evaluation process,” McNamara said. “However, any suggestion that we have used Anthropic models to train Gemini is inaccurate.”
Last week, TechCrunch exclusively reported that Google contractors working on the company's AI products are now being made to rate Gemini's AI responses in areas outside their expertise. Internal correspondence showed contractors raising concerns that Gemini could generate inaccurate information on highly sensitive topics such as healthcare.