
Scaling smarter: How enterprise IT teams can right-size their compute for AI

AI pilots rarely start with a deep discussion of infrastructure and hardware. But seasoned scalers warn that deploying high-value production workloads won’t end happily without strategic, ongoing focus on a key enterprise-grade foundation.

Good news: There’s growing recognition among enterprises of the pivotal role infrastructure plays in enabling and expanding generative, agentic and other intelligent applications that drive revenue, cost reduction and efficiency gains.

According to IDC, organizations in 2025 have boosted spending on compute and storage hardware infrastructure for AI deployments by 97% compared with the same period a year earlier. Researchers predict global investment in the space will surge from $150 billion today to $200 billion by 2028.

But the competitive edge “doesn’t go to those who spend the most,” John Thompson, best-selling AI author and head of the gen AI advisory practice at The Hackett Group, said in an interview with VentureBeat, “but to those who scale most intelligently.”

Ignore infrastructure and hardware at your own peril

Other experts agree, saying the chances are slim to none that enterprises can expand and industrialize AI workloads without careful planning and right-sizing of the finely orchestrated mesh of processors and accelerators, as well as upgraded power and cooling systems. These purpose-built hardware components provide the speed, availability, flexibility and scalability required to handle unprecedented data volume, movement and velocity from edge to on-prem to cloud.


Study after study identifies infrastructure-related issues, such as performance bottlenecks, mismatched hardware and poor legacy integration, alongside data problems, as major pilot killers. Exploding interest and investment in agentic AI further raise the technological, competitive and financial stakes.

Among tech companies, a bellwether for the entire industry, nearly 50% have agentic AI projects underway; the rest plan to have them going within 24 months. They are allocating half or more of their current AI budgets to agentic AI, and many plan further increases this year. (A good thing, because these complex autonomous systems require costly, scarce GPUs and TPUs to operate independently and in real time across multiple platforms.)

From their experience with pilots, technology and business leaders now understand that the demanding requirements of AI workloads — high-speed processing, networking, storage, orchestration and immense electrical power — are unlike anything they’ve ever built at scale. 

For many enterprises, the pressing question is: “Are we ready to do this?” The honest answer: Probably not without careful, ongoing analysis, planning and non-trivial IT upgrades.

They’ve scaled the AI mountain: Listen up

Like snowflakes and children, we’re reminded, AI projects are similar yet unique. Demands differ wildly across AI functions and types (training versus inference, machine learning versus reinforcement learning). So, too, do business goals, budgets, technology debt, vendor lock-in and available skills and capabilities vary widely.

Predictably, then, there’s no single “best” approach. Depending on circumstances, you’ll scale AI infrastructure up or vertically (upgrading existing hardware for more power), out or horizontally (adding more machines to handle increased loads) or hybrid (both).

Nonetheless, these early-chapter mindsets, principles, recommendations, practices, real-life examples and cost-saving hacks will help keep your efforts aimed and moving in the right direction.

It’s a sprawling challenge, with plenty of layers: data, software, networking, security and storage. We’ll keep the focus high-level and include links to helpful, related drill-downs, such as those above.

Modernize your vision of AI infrastructure  

The biggest mindset shift is adopting a new conception of AI: not as a standalone or siloed app, but as a foundational capability or platform embedded across business processes, workflows and tools.

To make this happen, infrastructure must balance two important roles: providing a stable, secure and compliant enterprise foundation, while making it easy to quickly and reliably field purpose-built AI workloads and applications, often with tailored hardware optimized for specific domains like natural language processing (NLP) and reinforcement learning.

In essence, it’s a major role reversal, said Deb Golden, Deloitte’s chief innovation officer. “AI should be treated like an operating system, with infrastructure that adapts to it, not the other way around.”

She continued: “The future isn’t just about sophisticated models and algorithms. Hardware is no longer passive. (From now on), infrastructure is fundamentally about orchestrating intelligent hardware as the operating system for AI.”

To operate this way at scale and without waste requires a “fluid fabric,” Golden’s term for dynamic allocation that adapts in real time across every platform, from individual silicon chips up to entire workloads. The benefits can be huge: Her team found this approach can cut costs by 30 to 40% and latency by 15 to 20%. “If your AI isn’t breathing with the workload, it’s suffocating.”

It’s a demanding challenge. Such AI infrastructure must be multi-tier, cloud-native, open, real-time, dynamic, flexible and modular. It must also be highly and intelligently orchestrated across edge and mobile devices, on-premises data centers, AI PCs and workstations, and hybrid and public cloud environments.

What sounds like buzzword bingo represents a new epoch in the ongoing evolution, redefinition and optimization of enterprise IT infrastructure for AI. The major elements are familiar: hybrid environments and a fast-growing universe of increasingly specialized cloud-based services, frameworks and platforms.

In this new chapter, embracing architectural modularity is key for long-term success, said Ken Englund, EY Americas technology growth leader. “Your ability to integrate different tools, agents, solutions and platforms will be critical. Modularity creates flexibility in your frameworks and architectures.”

Decoupling system components helps future-proof in several ways, including vendor and technology agnosticism, plug-and-play model enhancement and continuous innovation and scalability.

Infrastructure investment for scaling AI must balance prudence and power  

Enterprise technology teams looking to expand their use of enterprise AI face an updated Goldilocks challenge: finding the “just right” investment levels in new, modern infrastructure and hardware that can handle the fast-growing, shifting demands of distributed, everywhere AI.

Under-invest or stick with current processing capabilities? You’re looking at show-stopping performance bottlenecks and subpar business outcomes that can tank entire projects (and careers).

Over-invest in shiny new AI infrastructure? Say hello to massive capital and ongoing operating expenditures, idle resources and operational complexity that no one needs.

Even more than in other IT efforts, seasoned scalers agree that simply throwing processing power at problems isn’t a winning strategy. Yet it remains a temptation, even when not fully intentional.

“Jobs with minimal AI needs often get routed to expensive GPU or TPU infrastructure,” said Mine Bayrak Ozmen, a transformation veteran who has led enterprise AI deployments at Fortune 500 companies and a center of AI excellence for a major global consultancy.

Ironically, said Ozmen, also co-founder of AI platform company Riernio, “it’s simply because AI-centric design decisions have overtaken more classical organizational principles.” Unfortunately, the long-term cost inefficiencies of such deployments can get masked by deep discounts from hardware vendors, she said.

Right-size AI infrastructure with proper scoping and distribution, not raw power

What, then, should guide strategic and tactical decisions? One thing that shouldn’t, experts agreed, is a piece of paradoxically misguided reasoning: Because infrastructure for AI must deliver ultra-high performance, more powerful processors and hardware must be better.

“AI scaling is not about brute-force compute,” said Hackett’s Thompson, who has led numerous large global AI projects and is the author of a book published in February. He and others emphasize that the goal is having the right hardware in the right place at the right time, not the biggest and baddest everywhere.

According to Ozmen, successful scalers employ “a right-size for right-executing approach.” That means “optimizing workload placement (inference vs. training), managing context locality, and leveraging policy-driven orchestration to reduce redundancy, improve observability and drive sustained growth.”
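Ozmen didn’t prescribe a particular stack, but the core of the idea fits in a few lines: a placement policy keyed on workload type, so jobs with minimal AI needs never default to scarce accelerators. Below is a minimal Python sketch; the tier names, job fields and thresholds are hypothetical illustrations, not anyone’s production rules.

```python
from dataclasses import dataclass

@dataclass
class Job:
    kind: str             # "inference" or "training"
    est_gpu_hours: float  # rough accelerator demand per run
    data_region: str      # where the job's data lives

def place(job: Job) -> tuple[str, str]:
    """Route a job to the cheapest tier that can satisfy it, instead of
    letting everything default to expensive GPU/TPU infrastructure."""
    if job.kind == "inference" and job.est_gpu_hours < 0.1:
        tier = "cpu-pool"        # minimal AI needs: keep off accelerators entirely
    elif job.kind == "inference":
        tier = "shared-gpu"      # bursty inference shares accelerators
    elif job.est_gpu_hours < 100:
        tier = "dedicated-gpu"   # modest training runs
    else:
        tier = "tpu-pod"         # only sustained large training earns the top tier
    # Context locality: schedule in the region where the data already sits.
    return tier, job.data_region

print(place(Job("inference", 0.02, "us-east")))  # ('cpu-pool', 'us-east')
```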

Sometimes the analysis and decision are back-of-a-napkin easy. “A generative AI system serving 200 employees might run just fine on a single server,” Thompson said. But it’s a whole different case for more complex initiatives.

Take an AI-enabled core enterprise system for hundreds of thousands of users worldwide, requiring cloud-native failover and serious scaling capabilities. In these cases, Thompson said, right-sizing infrastructure demands disciplined, rigorous scoping, distribution and scaling exercises. Anything else is foolhardy malpractice.

Surprisingly, such basic IT planning discipline can get skipped. It’s often companies, eager to gain a competitive advantage, that try to speed things up by aiming outsized infrastructure budgets at a key AI project.

New Hackett research challenges some basic assumptions about what is really needed in infrastructure for scaling AI, providing additional reasons to conduct rigorous upfront analysis.

Thompson’s own real-world experience is instructive. Building an AI customer support system with over 300,000 users, his team soon realized it was “more important to have global coverage than massive capability in any single location.” Accordingly, infrastructure is located across the U.S., Europe and the Asia-Pacific region; users are dynamically routed worldwide.

The practical takeaway? “Put fences around things. Is it 300,000 users or 200? Scope dictates infrastructure,” he said.

The right hardware in the right place for the right job

A modern multi-tiered AI infrastructure strategy relies on versatile processors and accelerators that can be optimized for different roles across the continuum. For helpful insights on selecting processors, see Going Beyond GPUs.


Sourcing infrastructure for AI scaling: cloud services for most

You’ve got a fresh picture of what AI scaling infrastructure can and should be, a good idea about the investment sweet spot and scope, and what’s needed where. Now it’s time for procurement.

As noted in VentureBeat’s last special issue, for most enterprises the most effective strategy will be to continue using cloud-based infrastructure and equipment to scale AI production.

Surveys of large organizations show most have transitioned from custom on-premises data centers to public cloud platforms and pre-built AI solutions. For many, this represents a next-step continuation of ongoing modernization, one that sidesteps big upfront capital outlays and talent scrambles while providing critical flexibility for quickly changing requirements.

Over the next three years, Gartner predicts, 50% of cloud compute resources will be dedicated to AI workloads, up from less than 10% today. Some enterprises are also upgrading on-premises data centers with accelerated compute, faster memory and high-bandwidth networking.

The good news: Amazon (AWS), Microsoft, Google and a booming universe of specialty providers continue to invest staggering sums in end-to-end offerings built and optimized for AI, including full-stack infrastructure, platforms, processing (including GPU cloud providers and HPC), storage (hyperscalers plus Dell, HPE, Hitachi Vantara), frameworks and myriad other managed services.

Especially for organizations wanting to dip their toes quickly, said Wyatt Mayham, lead AI consultant at Northwest AI Consulting, cloud services offer a great, low-hassle choice.

In an organization already running Microsoft, for example, “Azure OpenAI is a natural extension (that) requires little architecture to get running safely and compliantly,” he said. “It avoids the complexity of spinning up custom LLM infrastructure, while still giving companies the security and control they need. It’s a great quick-win use case.”
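As a rough sketch of why the lift is small: with an Azure OpenAI resource and a deployed model in place, the service is reachable through the standard OpenAI Python SDK. The endpoint, environment variable and deployment name below are placeholders, not a prescribed setup.

```python
import os
from openai import AzureOpenAI  # pip install openai

# Placeholders: point these at your own Azure OpenAI resource and deployment.
client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="your-gpt-4o-deployment",  # the deployment name, not the base model name
    messages=[{"role": "user", "content": "Summarize this incident report: ..."}],
)
print(response.choices[0].message.content)
```

Identity, networking and compliance ride on the Azure controls the organization already runs, which is the “little architecture” point.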

However, the bounty of options available to technology decision-makers has another side. Selecting the right services can be daunting, especially as more enterprises opt for multi-cloud approaches that span multiple providers. Issues of compatibility, consistent security, liabilities, service levels and onsite resource requirements can quickly become entangled in a complex web, slowing development and deployment.

To simplify things, organizations may decide to stick with a primary provider or two. Here, as in pre-AI cloud hosting, the danger of vendor lock-in looms (although open standards preserve the possibility of choice). Hanging over all this is the specter of past and recent attempts to migrate infrastructure to paid cloud services, only to find, with horror, that costs far surpass the original expectations.

All this explains why experts say that the IT 101 discipline of knowing as clearly as possible what performance and capability are needed – at the edge, on-premises, in cloud applications, everywhere – is essential before starting procurement.

Take a fresh look at on-premises

Conventional wisdom suggests that handling infrastructure internally is primarily reserved for deep-pocketed enterprises and heavily regulated industries. However, in this new AI chapter, key in-house elements are being re-evaluated, often as part of a hybrid right-sizing strategy.

Take Microblink, which provides AI-powered document scanning and identity verification services to clients worldwide. Using Google Cloud Platform (GCP) to support high-throughput ML workloads and data-intensive applications, the company quickly ran into issues with cost and scalability, said Filip Suste, engineering manager of platform teams. “GPU availability was limited, unpredictable and expensive,” he noted.

To address these problems, Suste’s teams made a strategic shift, moving compute workloads and supporting infrastructure on-premises. A key piece of the shift to hybrid was a high-performance, cloud-native object storage system from MinIO.
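Part of what makes this kind of repatriation tractable is that MinIO speaks the S3 API, so data-pipeline code changes very little when storage moves on-prem. A minimal sketch using the MinIO Python SDK, with the endpoint and credentials as placeholders (this illustrates the pattern, not Microblink’s actual setup):

```python
from minio import Minio  # pip install minio

# Placeholder endpoint and credentials for an on-prem MinIO deployment.
client = Minio(
    "minio.internal.example.com:9000",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
    secure=True,
)

bucket = "training-data"
if not client.bucket_exists(bucket):
    client.make_bucket(bucket)

# Upload a local dataset shard; ML workloads read it back via the same S3-style API.
client.fput_object(bucket, "shards/batch-0001.parquet", "/data/batch-0001.parquet")
for obj in client.list_objects(bucket, prefix="shards/", recursive=True):
    print(obj.object_name, obj.size)
```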

For Microblink, taking key infrastructure back in-house paid off. Doing so cut related costs by 62%, reduced idle capacity and improved training efficiency, the company said. Crucially, it also regained control over AI infrastructure, thereby improving customer security.

Consider a specialty AI platform 

Makino, a Japanese manufacturer of computer-controlled machining centers operating in 40 countries, faced a classic skills-gap problem. Less experienced engineers could take up to 30 hours to complete repairs that more seasoned employees could do in eight.

To close the gap and improve customer service, leadership decided to turn 20 years of maintenance data into immediately accessible expertise. The fastest and most cost-effective solution, they concluded, was to integrate an existing service-management system with a specialized AI platform for service professionals from Aquant.

The company says taking the simple technology path produced great results. Instead of laboriously evaluating different infrastructure scenarios, resources were focused on standardizing the lexicon and developing processes and procedures, explained Ken Creech, Makino’s director of customer support.

Remote resolution of problems has increased by 15%, solution times have decreased, and customers now have self-service access to the system, Creech said. “Now, our engineers ask a plain-language question, and the AI hunts down the answer quickly. It’s a big wow factor.”

Adopt mindful cost-avoidance hacks

At Albertsons, one of the nation’s largest food and drug chains, IT teams employ several simple but effective tactics to optimize AI infrastructure without adding new hardware, said Chandrakanth Puligundla, tech lead for data analysis, engineering and governance.

Gravity mapping, for example, shows where data is stored and how it moves, whether on edge devices, internal systems or multi-cloud systems. This knowledge not only reduces egress costs and latency, Puligundla explained, but also guides more informed decisions about where to allocate computing resources.
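Albertsons’ internal tooling isn’t public, but the underlying idea can be illustrated in a few lines: enumerate data routes, weight them by per-GB egress fees, and the expensive paths make themselves obvious. The routes and rates below are invented purely for illustration.

```python
# Hypothetical data-gravity map: (source, destination, TB moved per month).
routes = [
    ("store-edge", "cloud-a", 12.0),
    ("cloud-a", "cloud-b", 8.0),    # cross-cloud hops are often the costly ones
    ("cloud-a", "on-prem", 3.5),
]

# Illustrative egress rates in $/GB; real rates vary by provider, tier and region.
egress_per_gb = {"store-edge": 0.00, "cloud-a": 0.09, "cloud-b": 0.08}

# Rank routes by monthly egress cost to see where compute should move toward data.
for src, dst, tb in sorted(routes, key=lambda r: r[2] * egress_per_gb[r[0]], reverse=True):
    cost = tb * 1_000 * egress_per_gb[src]  # TB -> GB
    print(f"{src} -> {dst}: {tb} TB/mo, ~${cost:,.0f}/mo egress")
```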

Similarly, he said, using specialized AI tools for language processing or image identification takes less space, often delivering better performance and economy than adding or updating more expensive servers and general-purpose computers.

Another cost-avoidance hack: tracking watts per inference or training hour. Looking beyond speed and cost to energy-efficiency metrics prioritizes sustainable performance, which is essential for increasingly power-thirsty AI models and hardware.
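The metric itself is simple arithmetic once you can sample power draw (from a PDU, IPMI or nvidia-smi, for instance) and count requests. A minimal sketch, with figures that are hypothetical and chosen only to show the comparison:

```python
def wh_per_inference(avg_power_watts: float, inferences_per_hour: float) -> float:
    """Energy per request in watt-hours: average draw over an hour
    divided by the number of requests served in that hour."""
    return avg_power_watts / inferences_per_hour

# Hypothetical comparison: a large general-purpose GPU server vs. a smaller accelerator.
print(wh_per_inference(700.0, 90_000))  # ~0.0078 Wh/inference
print(wh_per_inference(150.0, 40_000))  # ~0.0038 Wh/inference: wins on energy per request
```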

Puligundla concluded: “We can really increase efficiency through this sort of mindful preparation.”

Write your own ending

The success of AI pilots has brought tens of millions of companies to the next phase of their journeys: deploying generative AI and LLMs, agents and other intelligent applications with high business value into wider production.

The new AI chapter promises rich rewards for enterprises that strategically assemble infrastructure and hardware that balances performance, cost, flexibility and scalability across edge computing, on-premises systems and cloud environments.

In the coming months, scaling options will expand further as industry investments continue to pour into hyperscale data centers, edge chips and hardware (AMD, Qualcomm, Huawei), cloud-based full-stack AI infrastructure like Canonical and Guru, context-aware memory, secure on-prem plug-and-play devices like Lemony, and much more.

How well IT and business leaders plan and choose infrastructure for expansion will determine the heroes of company stories and the unfortunates doomed to pilot purgatory or AI damnation.
