Many companies are hesitant to overhaul their technology stack and start over from scratch. Not Notion. For version 3.0 of its productivity software (released in September), the company didn't hesitate to rebuild from the ground up. It recognized a real need to support agentic AI at the enterprise level. Whereas traditional AI-powered workflows require explicit, step-by-step instructions and few-shot examples, AI agents built on advanced reasoning models can discover what tools are available to them, understand how to use them, and plan next steps. “Rather than attempting to retrofit what we built, we wanted to use the strengths of reasoning models,” Sarah Sachs, head of AI modeling at Notion, told EnterpriseBeat. “We rebuilt a brand-new architecture, because the workflows are different from those of the agents.”
Re-architecting to allow models to work autonomously
Notion has been adopted by 94% of the Forbes AI 50, has 100 million total users, and counts OpenAI, Cursor, Figma, Ramp and Vercel among its customers. In a rapidly evolving AI landscape, the company recognized the need to move beyond simpler, task-based workflows to goal-oriented reasoning systems that let agents autonomously select, orchestrate and execute tools in networked environments.
Sachs noted that reasoning models have quickly become “much better” at learning to use tools and following chain-of-thought (CoT) instructions. This allows them to be “much more independent” and to make multiple decisions within an agent workflow. “We have rebuilt our AI system to accommodate this,” she said.

From a technical perspective, this meant replacing rigid, prompt-based flows with a unified orchestration model, Sachs explained. This core model is supported by modular subagents that search Notion and the web, query and populate databases, and edit content. Each agent uses tools contextually; for example, it can decide whether it needs to search Notion itself or another platform such as Slack. The model performs successive searches until the relevant information is found. It can then, for instance, convert notes into recommendations, create follow-up messages, track tasks, and find and update knowledge bases.

With Notion 2.0, the team focused on making the AI perform specific tasks, which required them to “think exhaustively” about how to control the model, Sachs noted. With version 3.0, by contrast, users can assign tasks to agents, and agents can actually take actions and perform multiple tasks at the same time. “We redesigned it to be self-selecting in terms of tools, rather than just a few shots that explicitly dictate how to go through all of those different scenarios,” Sachs explained. The goal is to ensure everything is connected to the AI and that “your Notion agent can do everything you can do.”
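The orchestrator-plus-subagents pattern described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not Notion's actual implementation: the tool names, the `choose_tool` heuristic, and the `has_enough_context` stub all stand in for decisions a reasoning model would make itself.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tool registry: the orchestrator picks tools from their
# descriptions rather than following a scripted, few-shot flow.
@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]

def fake_notion_search(q: str) -> str:
    return f"notion results for {q!r}"

def fake_slack_search(q: str) -> str:
    return f"slack results for {q!r}"

TOOLS = [
    Tool("search_notion", "Search pages in the Notion workspace", fake_notion_search),
    Tool("search_slack", "Search messages in connected Slack channels", fake_slack_search),
]

def choose_tool(query: str) -> Tool:
    # Stand-in for the model's self-selection: a real system would pass
    # the tool descriptions to the model and let it decide contextually.
    return TOOLS[1] if "slack" in query.lower() else TOOLS[0]

def has_enough_context(transcript: list[str]) -> bool:
    # Stub for the model judging whether its searches have surfaced
    # the relevant information yet.
    return len(transcript) >= 1

def run_agent(query: str, max_steps: int = 3) -> list[str]:
    # Successive tool calls until the agent decides it has enough context.
    transcript: list[str] = []
    for _ in range(max_steps):
        tool = choose_tool(query)
        transcript.append(tool.run(query))
        if has_enough_context(transcript):
            break
    return transcript
```

The point of the structure is that adding a new subagent means registering one more `Tool`, not rewriting a prompt that enumerates every scenario.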
Bifurcation to isolate hallucinations
Notion's “better, faster, cheaper” philosophy drives a continuous iteration cycle that balances latency and accuracy through fine-tuned vector embeddings and Elasticsearch optimization. Sachs' team uses a rigorous evaluation framework that combines deterministic testing, linguistic optimization, human-annotated data, and LLM-as-judge model-based assessment to identify discrepancies and inaccuracies. “By splitting the assessment, we can see where the issues are coming from, and that helps us isolate unnecessary hallucinations,” Sachs explained. Furthermore, simplifying the architecture itself means it is easier to make changes as models and techniques evolve. “We optimize latency and parallel thinking as much as possible,” which results in “much better accuracy,” Sachs noted. The models draw on data from the web and Notion's connected workspace. Ultimately, Sachs reported, the investment in transforming its architecture has already paid Notion back in performance and a faster rate of change. She added: “We are completely open to rebuilding it when the next breakthrough happens, if we have to.”
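The “splitting the assessment” idea can be made concrete: score retrieval deterministically and generation with a judge, and keep the two scores separate so a bad answer can be traced to the right stage. This is a toy sketch under assumptions, not Notion's framework; `keyword_judge` is a crude stand-in for an LLM judge.

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    retrieval_ok: bool   # deterministic: did search surface the gold doc?
    judge_score: float   # model-based: is the answer grounded in sources?

def eval_case(gold_doc: str, retrieved: list[str], answer: str,
              judge) -> EvalResult:
    # Scoring retrieval and generation separately shows whether a bad
    # answer came from search (missing context) or from the model
    # hallucinating despite good context.
    return EvalResult(
        retrieval_ok=gold_doc in retrieved,
        judge_score=judge(answer, retrieved),
    )

def keyword_judge(answer: str, retrieved: list[str]) -> float:
    # Stand-in for an LLM judge: fraction of answer tokens that appear
    # anywhere in the retrieved text.
    context = " ".join(retrieved).lower()
    tokens = answer.lower().split()
    return sum(t in context for t in tokens) / max(len(tokens), 1)

result = eval_case("Q3 roadmap", ["Q3 roadmap", "budget memo"],
                   "Q3 roadmap ships in October", keyword_judge)
# result.retrieval_ok is True here, so a low judge_score would point
# at the generation step, not at retrieval.
```

A hallucination with `retrieval_ok=True` is the “unnecessary” kind Sachs describes: the context was there and the model ignored it.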
Understanding contextual latency
When building and fine-tuning models, it's important to understand that latency is subjective: AI must deliver the most relevant information, not necessarily the most information at the expense of speed. “You'd be surprised at how different customers are in their willingness to wait for things and not wait for things,” Sachs said. That makes for an interesting experiment: how slowly can you go before people give up on the model?

For example, in a purely navigational search, users are not as patient; they want near-instant answers. “When you ask, 'What is two plus two?', you don't want to wait for your agent to search across Slack and JIRA,” Sachs emphasized. But the longer the time given, the more exhaustive a reasoning agent can be. Notion, for instance, can work independently across hundreds of websites, files and other materials for 20 minutes. In those cases, users are more willing to wait, Sachs explained; they let the model run in the background while they take care of other tasks. “It's a product question,” Sachs said. “How do we set user expectations for the interface? How do we set user expectations around latency?”
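One way to operationalize “latency is contextual” is a per-intent latency budget that decides whether a request runs inline or in the background. The intent labels, thresholds, and the word-count classifier below are illustrative assumptions, not Notion's actual routing logic.

```python
# Hypothetical per-use-case latency budgets, in seconds.
BUDGETS_SECONDS = {
    "navigational": 2,      # "what is 2+2" -- users won't wait
    "workspace_qa": 30,     # a few tool calls across Notion and Slack
    "deep_research": 1200,  # up to ~20 minutes, run in the background
}

def classify_intent(query: str) -> str:
    # Crude stand-in for a model-based intent classifier.
    if len(query.split()) <= 4:
        return "navigational"
    if "research" in query.lower():
        return "deep_research"
    return "workspace_qa"

def dispatch(query: str) -> dict:
    intent = classify_intent(query)
    budget = BUDGETS_SECONDS[intent]
    # Long budgets move to a background job so the user keeps working
    # while the agent runs.
    return {"intent": intent, "budget_s": budget, "background": budget > 60}
```

The product question Sachs raises then becomes explicit in code: the budget table is where user expectations around latency get encoded.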
Notion is its own biggest user
Notion understands the importance of using its own product; in fact, its employees are some of its biggest power users. Sachs explained that teams run active sandboxes that generate training and evaluation data, as well as a “really active” thumbs-up, thumbs-down user feedback loop. Users aren't afraid to say what they think should be improved or what features they would like to see. Sachs emphasized that when a user thumbs-downs an interaction, they explicitly give a human annotator permission to investigate that interaction in a way that de-anonymizes it as much as possible. “We as a company use our own tool all day, every day, and so we get really quick feedback loops,” Sachs said. “We really dogfood our own product.”

However, because it's their own product they're building, Sachs noted, they're aware they may be wearing rose-colored glasses when it comes to quality and functionality. To compensate, Notion relies on “very AI-savvy” design partners who get early access to new features and provide vital feedback. Sachs emphasized that this is just as important as internal prototyping. “We're all about experimenting out in the open. I think you get a lot more comprehensive feedback,” Sachs said. “Because at the end of the day, if we just look at how Notion uses Notion, we're not really providing the best experience for our customers.”

Equally important, continuous internal testing lets teams gauge progress and ensure models don't regress (that is, accuracy and performance don't degrade over time). “Everything you do stays true,” Sachs explained. “You know your latency is bounded.”
Many companies make the mistake of focusing too heavily on backward-looking evals; this makes it difficult for them to understand how and where they're improving, Sachs emphasized. Notion treats one set of evaluations as a “litmus test” for development and forward-looking progress, and a separate set for observability and regression safety. “I think a big mistake a lot of companies make is mixing the two,” Sachs said. “We use them for both purposes; we think about them very differently.”
Lessons learned from Notion’s journey
For enterprises, Notion can serve as a blueprint for responsibly and dynamically operationalizing agentic AI in a connected, empowered enterprise workspace. Sachs' insights for other technology leaders:
- Don't be afraid to rebuild when fundamental capabilities change; Notion completely redesigned its architecture to align with reasoning-based models.
- Treat latency as contextual: optimize per use case, not across the board.
- Ground outputs in trusted, curated company data to ensure accuracy and trust. She advised: “Be prepared to make difficult decisions. Be prepared to be on the forefront of your development, so to speak, to create the best possible product for your customers.”

