
Patronus AI secures $17 million to combat AI hallucinations and copyright infringement, and to speed up enterprise adoption

As companies struggle to implement generative AI, concerns about the accuracy and security of large language models (LLMs) threaten to jeopardize widespread enterprise adoption. Jumping into the fray is Patronus AI, a San Francisco startup that just raised $17 million in Series A funding to automatically detect costly and potentially dangerous LLM errors at scale.

The round, which brings Patronus AI’s total funding to $20 million, was led by Glenn Solomon at Notable Capital, with participation from Lightspeed Venture Partners, former DoorDash executive Gokul Rajaram, Factorial Capital, Datadog and several other unnamed technology executives.

Founded by former Meta machine learning (ML) experts Anand Kannappan and Rebecca Qian, Patronus AI has developed a first-of-its-kind automated evaluation platform that promises to identify errors such as hallucinations, copyright infringement and security breaches in LLM outputs. Using proprietary AI, the system evaluates model performance, stress-tests models against adversarial examples and enables detailed benchmarking, all without the manual effort most companies rely on today.

“Our product is really good at detecting a wide variety of errors,” Kannappan, CEO of Patronus AI, said in an interview with VentureBeat. “That includes things like hallucinations, copyright and security risks, and a lot of company-specific capabilities related to brand style and tone.”

The emergence of powerful LLMs like OpenAI's GPT-4o and Meta's Llama 3 has sparked an arms race in Silicon Valley to harness the technology's generative capabilities. But as hype cycles have accelerated, so have spectacular model failures, from the news site CNET publishing incorrect AI-generated articles to a drug-research startup retracting work based on LLM-hallucinated molecules.

These public missteps only scratch the surface of broader problems endemic to the current generation of LLMs, Patronus AI claims. The company’s previously published research, including the “CopyrightCatcher” API released three months ago and the “FinanceBench” benchmark introduced six months ago, reveals startling deficiencies in the ability of leading models to accurately answer fact-based questions.

FinanceBench and CopyrightCatcher: Groundbreaking research from Patronus AI reveals shortcomings in LLMs

For its “FinanceBench” benchmark, Patronus tasked models like GPT-4 with answering financial questions based on public SEC filings. Surprisingly, even when given an entire annual report, the best-performing model answered only 19% of questions correctly. A separate experiment using Patronus' new “CopyrightCatcher” API found that open-source LLMs reproduce copyrighted text verbatim in 44% of outputs.
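To make that kind of evaluation concrete, here is a minimal sketch in Python of how a fact-based benchmark along these lines can be scored: pose questions, compare each model answer against a reference answer, and report the share answered correctly. The helper names and toy data are illustrative assumptions only, not Patronus AI's actual API or methodology.

def normalize(text):
    # Lowercase and collapse whitespace so superficially different answers still match.
    return " ".join(text.lower().split())

def score_accuracy(examples, answer_fn):
    # examples: list of (question, reference_answer) pairs.
    # answer_fn: callable returning the model's answer for a question.
    # Returns the fraction of questions answered correctly by exact match.
    correct = sum(
        1 for question, reference in examples
        if normalize(answer_fn(question)) == normalize(reference)
    )
    return correct / len(examples)

# Toy stand-ins for questions drawn from SEC filings; a real harness would call an LLM here.
examples = [
    ("What was ExampleCorp's FY2022 revenue?", "$4.2 billion"),
    ("How many shares were outstanding at year end?", "310 million"),
]

def placeholder_model(question):
    # Placeholder for an actual LLM call; returns a fixed string so the script runs offline.
    return "$4.2 billion"

print(f"Accuracy: {score_accuracy(examples, placeholder_model):.0%}")

In practice, exact-match scoring is only the simplest option; evaluating free-form answers reliably is precisely the harder problem Patronus says it addresses with trained evaluation models.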

“Even state-of-the-art models were hallucinating, getting only about 90% of answers correct in finance,” explained Qian, who serves as CTO. “Our research showed that open-source models had over 20% unsafe answers in many high-priority areas. And copyright infringement is a huge risk – major publishers, media companies or anyone using LLMs should be worried.”

While a handful of other startups, such as Credo AI, Weights & Biases and Robust Intelligence, also develop tools for LLM evaluation, Patronus believes its research-driven approach, leveraging the founders' extensive expertise, sets the company apart. The core technology relies on training dedicated evaluation models that reliably uncover edge cases where a given LLM is likely to fail.

“No other company currently has research and technology as comprehensive as ours,” said Kannappan. “What is really unique is our research-first approach – in the form of training evaluation models, developing new alignment techniques and publishing research papers.”

This strategy has already found traction with several Fortune 500 companies in industries such as automotive, education, finance and software, which are using Patronus AI to “safely deploy LLMs within their organizations,” according to the startup, which declined to name specific customers. With the fresh capital, Patronus plans to expand its research, development and sales teams while developing additional industry benchmarks.

If Patronus achieves its vision, rigorous automated evaluation of LLMs could become standard practice for companies seeking to use the technology, just as security audits paved the way for widespread cloud adoption. Qian sees a future where testing models with Patronus is as commonplace as unit testing code.

“Our platform is domain-agnostic, and the evaluation technology we develop can be extended to any domain, be it law, healthcare or others,” she said. “We want to enable companies across all industries to harness the power of LLMs while having the peace of mind that the models are safe and tailored to their specific use cases.”

Nevertheless, definitively validating an LLM's performance remains an open challenge, given the black-box nature of foundation models and the virtually infinite space of possible outputs. By advancing the state of the art in AI evaluation, Patronus aims to speed up the path to responsible use in practice.

“Automated measurement of LLM performance is really difficult, simply because there is such a wide scope of possible behavior, as these models are generative in nature,” admitted Kannappan. “But through a research-driven approach, we are able to detect errors in a very reliable and scalable way, which is fundamentally impossible with manual testing.”
