AI startup Mistral has launched a new API for content moderation.
The API, the same one that powers moderation in Mistral's chatbot platform Le Chat, can be tailored to specific applications and safety standards, Mistral says. It relies on a fine-tuned model (Ministral 8B) trained to classify text in a range of languages, including English, French, and German, into one of nine categories: sexual, hate and discrimination, violence and threats, dangerous and criminal content, self-harm, health, financial, law, and personally identifiable information (PII).
The moderation API can be applied to either raw text or conversational text, Mistral says.
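For developers, that distinction plays out as two modes of the same call: submit a string (or a conversation) and get back per-category results. The Python sketch below illustrates roughly what this might look like. The endpoint paths, the `mistral-moderation-latest` model name, and the response fields named in the comments are assumptions drawn from Mistral's public documentation at launch, not details reported in this article.

```python
import os
import requests

API_KEY = os.environ["MISTRAL_API_KEY"]
BASE_URL = "https://api.mistral.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Mode 1 (assumed endpoint): score raw text against the nine policy categories.
resp = requests.post(
    f"{BASE_URL}/moderations",
    headers=HEADERS,
    json={
        "model": "mistral-moderation-latest",  # assumed model alias
        "input": ["Some user-submitted text to screen."],
    },
)
resp.raise_for_status()
# Each result is assumed to carry per-category flags and confidence scores,
# e.g. "hate_and_discrimination", "selfharm", "pii", ...
print(resp.json()["results"][0])

# Mode 2 (assumed endpoint): score the final turn of a conversation, letting
# the classifier see the preceding messages as context.
resp = requests.post(
    f"{BASE_URL}/chat/moderations",
    headers=HEADERS,
    json={
        "model": "mistral-moderation-latest",
        "input": [[
            {"role": "user", "content": "I need advice on my finances."},
            {"role": "assistant", "content": "You should put everything into one stock."},
        ]],
    },
)
resp.raise_for_status()
print(resp.json()["results"][0])
```

The conversational mode is what lets the classifier catch the model-generated harms Mistral mentions below, such as unqualified advice, which may only look problematic in the context of the surrounding exchange.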
"In recent months, we have seen growing enthusiasm across the industry and research community for new AI-based moderation systems that can help make moderation more scalable and robust across applications," Mistral wrote in a blog post. "Our content moderation classifier leverages the most relevant policy categories for effective guardrails and introduces a pragmatic approach to model safety by addressing model-generated harms such as unqualified advice and personal data."
AI-powered moderation systems are useful in theory. But they are also vulnerable to the same biases and technical flaws that plague other AI systems.
For example, some models trained to detect toxicity view phrases in African American Vernacular English (AAVE), the informal grammar used by some Black Americans, as disproportionately "toxic." Social media posts about people with disabilities are also often rated as more negative or toxic by commonly used public sentiment and toxicity detection models, studies have found.
Mistral claims that its moderation model is highly accurate, but admits it is still a work in progress. Notably, the company has not compared its API's performance to that of other popular moderation APIs, such as Jigsaw's Perspective API and OpenAI's Moderation API.
"We're working with our customers to develop and share scalable, lightweight, and customizable moderation tools," the company said, "and will continue to engage with the research community to contribute safety advances to the broader field."