
Has this stealth startup finally cracked the code to enterprise AI agent reliability? Meet AUI's Apollo-1

For more than a decade, conversational AI has promised human-like assistants that can do more than just chat. But even as large language models (LLMs) like ChatGPT, Gemini and Claude learn to reason, explain and program, a critical category of interaction remains largely unsolved – reliably completing tasks for humans outside the chat window.

Even the best AI models score only around the 30th percentile on Terminal-Bench Hard, a third-party benchmark designed to evaluate how AI agents perform on various terminal-based tasks – well below the reliability most companies and users require. Task-specific benchmarks don't fare much better. TAU-Bench Airline, which measures the reliability of AI agents when searching for and booking flights on behalf of a user, shows a success rate of only 56% for the top-performing model (Claude 3.7 Sonnet) – meaning the agent fails almost half the time.

New York City-based Augmented Intelligence (AUI) Inc., co-founded by Ohad Elhelo and Ori Cohen, believes it has finally found an answer that can raise the reliability of AI agents to a level where most companies can trust them to carry out their instructions.

The company's new foundation model, known as Apollo-1 – currently in preview with early testers but nearing general release – is based on a principle it calls stateful neurosymbolic reasoning.

It is a hybrid architecture, one long championed even by LLM skeptics like Gary Marcus, designed to ensure consistent, compliant results in every customer interaction.

“Conversational AI is basically two halves,” Elhelo said in a recent interview with VentureBeat. “The first half – the open dialogue – is handled beautifully by LLMs. They are designed for creative or exploratory use cases. The other half is task-oriented dialogue, where there is always a specific goal behind the conversation. This half has remained unsolved because it requires certainty.”

AUI defines certainty as the difference between an agent that is “likely” to perform a task and one that almost “always” does.

On TAU-Bench Airline, for example, Apollo-1 achieves a striking 92.5% success rate, leaving all current competitors far behind – according to benchmarks shared with VentureBeat and published on AUI's website.

Elhelo gave simple examples: a bank that must enforce identity verification for refunds over $200, or an airline that must always offer an upgrade to business class before economy class.

“These are not preferences,” he said. “These are requirements. And no purely generative approach can provide that kind of behavioral certainty.”

AUI and its work on reliability was previously reported by subscription news outlet The Information, but has not yet been covered comprehensively in publicly available media.

From pattern matching to predictable action

The team argues that transformer models cannot clear this bar by design. Large language models generate plausible text, not guaranteed behavior. “If you tell an LLM to always offer insurance before payment, that will happen – usually,” Elhelo said. “Configure Apollo-1 with this rule, and it will – every time.”

This distinction, he said, comes from the architecture itself. Transformers predict the next token in a sequence. Apollo-1, by contrast, predicts the next action in a conversation, working from what AUI calls a typed symbolic state.

Cohen explained the concept in more technical terms. “Neurosymbolic means we merge the two dominant paradigms,” he said. “The symbolic layer gives you structure – it knows what an intent, an entity and a parameter is – while the neural layer gives you language competence. The neurosymbolic reasoner sits in between. It's a different kind of brain for dialogue.”

While transformers treat every output as text generation, Apollo-1 performs closed-loop reasoning: an encoder translates natural language into a symbolic state, a state machine maintains that state, a decision engine determines the next action, a planner executes it, and a decoder converts the result back into language. “The process is iterative,” Cohen said. “It runs in a loop until the task is completed. So you get determinism instead of probability.”

A basic model for task execution

Unlike traditional chatbots or custom automation systems, Apollo-1 is intended to serve as a foundation model for task-oriented dialogue – a single, domain-independent system that can be configured for banking, travel, retail or insurance through what AUI calls a system prompt.

“The system prompt isn't a configuration file,” Elhelo said. “It's a behavioral contract. You define exactly how your agent must behave in critical situations, and Apollo-1 guarantees that those behaviors will be enforced.”

Organizations can use the prompt to encode symbolic slots – intents, parameters and policies – as well as tool boundaries and stateful rules.

For example, a food delivery app might dictate: “When an allergy is mentioned, always inform the restaurant,” while a telecommunications provider might specify: “Stop service after three failed payment attempts.” In both cases the behavior is executed deterministically, not statistically.
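To make the distinction concrete, here is a minimal sketch of how such rules behave when encoded symbolically rather than left to a generative model. The rule format is invented for illustration; AUI has not published its actual system-prompt schema:

```python
# Hypothetical sketch of deterministic policy rules like the two examples
# above. The schema is invented for illustration, not AUI's real format.

POLICIES = [
    # (condition on the symbolic state, mandatory action)
    (lambda s: s.get("allergy_mentioned", False), "notify_restaurant"),
    (lambda s: s.get("failed_payments", 0) >= 3, "stop_service"),
]

def required_actions(state: dict) -> list[str]:
    """Return every action the policies mandate for this state.

    Because rules are evaluated symbolically, the same state always
    yields the same actions -- deterministic, not statistical.
    """
    return [action for condition, action in POLICIES if condition(state)]

print(required_actions({"allergy_mentioned": True}))  # ['notify_restaurant']
print(required_actions({"failed_payments": 3}))       # ['stop_service']
```

A generative model might follow such rules most of the time; a symbolic rule table cannot skip them, which is the behavioral guarantee the article describes.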

Eight years of development

AUI's journey to Apollo-1 began in 2017, when the team started encoding millions of real-world task-oriented conversations conducted by a 60,000-strong workforce of human agents.

This work resulted in a symbolic language that separates procedural knowledge – steps, constraints and procedures – from descriptive knowledge such as entities and attributes.

“The insight was that task-oriented dialogue has universal procedural patterns,” said Elhelo. “Food delivery, claims processing and order management all have similar structures. Once you model this explicitly, you can compute it deterministically.”

From there, the company developed the neurosymbolic reasoner – a system that uses symbolic state to decide what happens next, rather than guessing through token prediction.

Benchmarks suggest that the architecture makes a measurable difference.

In AUI's own evaluations, Apollo-1 achieved more than 90 percent task completion on the τ-Bench Airline benchmark, compared to 60 percent for Claude 4.

It completed 83 percent of live booking chats on Google Flights, compared to 22 percent for Gemini 2.5 Flash, and 91 percent of retail scenarios on Amazon, compared to 17 percent for Rufus.

“These are not incremental improvements,” Cohen said. “These are order-of-magnitude differences in reliability.”

A complement, not a competitor

AUI presents Apollo-1 not as a replacement for large language models, but as their essential counterpart. In Elhelo's words: “Transformers optimize for creative probability. Apollo-1 optimizes for behavioral certainty. Together they form the full spectrum of conversational AI.”

The model is already running in limited pilots with undisclosed Fortune 500 companies across several sectors, including finance, travel and retail.

AUI has also confirmed a strategic partnership with Google and plans general availability in November 2025, when it will open APIs, release full documentation, and add voice and image features. Interested customers and partners can sign up for more information via a form on AUI's website.

Until then, the company is keeping details under wraps. Asked what's next, Elhelo smiled. “Let's just say we're preparing an announcement,” he said. “Soon.”

On the way to conversations that have an impact

For all its technical sophistication, Apollo-1's premise is simple: build AI that companies can trust to act, not just talk. “Our mission is to democratize access to working AI,” Cohen said near the end of the interview.

Whether Apollo-1 will become the new standard for task-oriented dialogue remains to be seen. But if AUI's architecture works as promised, the long-standing gap between chatbots that sound human and agents that reliably do human work could finally close.
