As demand for AI solutions increases, competition for the massive infrastructure required to run AI models is becoming ever more fierce. This affects the complete AI chain, from computing and storage capacity in data centres, through processing power in chips, to the energy needed to run and cool equipment.
When implementing an AI strategy, companies have to look at all these points to find the best fit for their needs. This is harder than it sounds. A business's decision on how to deploy AI is very different from choosing a static technology stack to be rolled out across an entire organisation in the same way.
Businesses have yet to grasp that a successful AI strategy is "no longer a tech decision made in a tech department about hardware", says Mackenzie Howe, co-founder of Atheni, an AI strategy consultancy. As a result, she says, nearly three-quarters of AI rollouts deliver no return on investment.
Department heads unaccustomed to making tech decisions will have to learn to understand technology. "They are used to being told 'Here's your stack'," Howe says, but leaders now need to be more involved. They must know enough to make informed decisions.
While most businesses still formulate their strategies centrally, decisions on the specifics of AI need to be devolved because each department will have different needs and priorities. For instance, legal teams will emphasise security and compliance, but these may not be the first consideration for the marketing department.
"If they want to leverage AI properly — which means going after best-in-class tools and much more tailored approaches — best in class for one function looks like a different best in class for a different function," Howe says. Not only will the choice of AI application differ between departments and teams, but so might the hardware solution.
One phrase you will hear as you delve into artificial intelligence is "AI compute". This is a term for all the computational resources required for an AI system to perform its tasks. The AI compute required in a particular setting will depend on the complexity of the system and the volume of data being handled.
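As a rough illustration of what drives those requirements, the sketch below applies the commonly quoted approximation that training a large model takes about six floating-point operations per parameter per training token. The figures, function names and utilisation rate are illustrative assumptions, not vendor guidance.

```python
# Back-of-envelope estimate of AI compute for training, assuming the common
# heuristic of ~6 FLOPs per parameter per training token (an approximation,
# not an exact figure for any particular model).

def training_flops(parameters: float, tokens: float) -> float:
    """Rough total floating-point operations to train a dense model."""
    return 6.0 * parameters * tokens

def gpu_days(total_flops: float, gpu_flops_per_sec: float, utilisation: float = 0.4) -> float:
    """Convert total FLOPs into GPU-days at an assumed sustained utilisation."""
    seconds = total_flops / (gpu_flops_per_sec * utilisation)
    return seconds / 86_400

if __name__ == "__main__":
    # Hypothetical example: a 7bn-parameter model trained on 1tn tokens,
    # on accelerators that sustain roughly 300 teraFLOPs each.
    flops = training_flops(7e9, 1e12)
    print(f"Total compute: {flops:.2e} FLOPs")
    print(f"Single-accelerator days: {gpu_days(flops, 300e12):,.0f}")
```

Dividing the result by the number of accelerators available gives a feel for wall-clock time, which is one way to turn "AI compute" into budgets and timescales.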
The decision flow: what are you trying to solve?
Although this report will focus on AI hardware decisions, companies should keep in mind the first rule of investing in a technology: identify the problem you need to solve first. Avoiding AI is no longer an option, but simply adopting it because it is there will not transform a business.
Matt Dietz, the AI and security leader at Cisco, says his first question to clients is: what process and challenge are you trying to solve? "Instead of trying to implement AI for the sake of implementing AI . . . is there something that you are trying to drive efficiency in by using AI?" he says.
Companies must understand where AI will add the most value, Dietz says, whether that is enhancing customer interactions or making these feasible 24/7. Is the aim to give staff access to AI co-pilots to simplify their jobs, or is it to ensure consistent adherence to rules on compliance?
"When you identify an operational challenge you are trying to solve, it is easier to attach a return on investment to implementing AI," Dietz says. This is especially important if you are trying to bring leadership on board and the initial investment seems high.
Companies must address further considerations. Understanding how much "AI compute" is required — in the initial phases as well as how demand might grow — will help with decisions on how and where to invest. "An individual leveraging a chatbot doesn't have much of a network performance effect. An entire department leveraging the chatbot certainly does," Dietz says.
Infrastructure is therefore key: specifically, having the right infrastructure for the problem you are trying to solve. "You can have an unbelievably intelligent AI model that does some really amazing things, but if the hardware and the infrastructure isn't set up to support that then you are setting yourself up for failure," Dietz says.
He stresses that flexibility around providers, fungible hardware and capacity is important. Companies should "scale as the need grows" once the model and its efficiencies are proven.
The data server dilemma: which path to take?
When it comes to data servers and their locations, companies can choose between owning infrastructure on site, or leasing or owning it off site. Scale, flexibility and security are all considerations.
While on-premises data centres are more secure, they can be costly both to set up and run, and not all data centres are optimised for AI. The technology must be scalable, with high-speed storage and low-latency networking. The energy to run and cool the hardware should be as inexpensive as possible and ideally sourced from renewables, given the huge demand.
Space-constrained enterprises with distinct requirements tend to lease capacity from a co-location provider, whose data centre hosts servers belonging to different users. Customers either install their own servers or lease a "bare metal" (dedicated) server from the co-location centre. This option gives a company more control over performance and security and is ideal for businesses that need custom AI hardware, for instance clusters of high-density graphics processing units (GPUs) as used in model training, deep learning or simulations.
Another possibility is to use prefabricated and pre-engineered modules, or modular data centres. These suit companies with remote facilities that need data stored close at hand or that otherwise do not have access to the resources for a mainstream connection. This route can reduce latency and reliance on costly data transfers to centralised locations.
Given factors such as scalability and speed of deployment, as well as the ability to equip new modules with the latest technology, modular data centres are increasingly relied upon by the cloud hyperscalers, such as Microsoft, Google and Amazon, to enable faster expansion. The modular market was valued at $30bn in 2024 and is expected to reach $81bn by 2031, according to a 2025 report by The Insight Partners.
Modular data centres are only a segment of the larger market. Estimates for the value of data centres worldwide in 2025 range from $270bn to $386bn, with projections for compound annual growth rates of 10 per cent into the early 2030s, when the market is projected to be worth more than $1tn.
Much of the demand is driven by the growth of AI and its higher resource requirements. McKinsey predicts that demand for data centre capacity could more than triple by 2030, with AI accounting for 70 per cent of that.
While the US has the most data centres, other countries are fast building their own. Cooler climates and plentiful renewable energy, as in Canada and northern Europe, can confer an advantage, but countries in the Middle East and south-east Asia increasingly see having data centres close by as a geopolitical necessity. Access to funding and research is also a factor. Scotland is the latest emerging European data centre hub.

Choose the cloud . . .
Companies that cannot afford or do not wish to invest in their own hardware can opt to use cloud services, which can be scaled more easily. These provide access to any or all of the components necessary to deploy AI, from GPU clusters that execute vast numbers of calculations concurrently, through to storage and networking.
While the hyperscalers grab the headlines because of their investments and size — they have some 40 per cent of the market — they are not the only option. Niche cloud operators can provide tailored solutions for AI workloads: CoreWeave and Lambda, for instance, specialise in AI and GPU cloud computing.
Companies may prefer smaller providers for a first foray into AI, not least because they can be easier to navigate while offering room to grow. Digital Ocean boasts of its simplicity while being optimised for developers; Kamatera offers cloud services run out of its own data centres in the US, Emea and Asia, with proximity to customers minimising latency; OVHcloud is strong in Europe, offering cloud and co-location services with an option for customers to be hosted exclusively in the EU.
Many of the smaller cloud companies do not have their own data centres and lease the infrastructure from larger groups. In effect this means that a customer is leasing from a leaser, which is worth taking into consideration in a world fighting for capacity. That said, such businesses may also have the chance to switch to newer data centre facilities. These could have the advantage of being built primarily for AI and designed to accommodate the technology's greater compute load and energy requirements.
. . . or plump for a hybrid solution
Another solution is to combine proprietary equipment with cloud or virtual off-site services. These can be hosted by the same data centre provider, many of which offer ready-made hybrid services with hyperscalers or the option to mix and match different network and cloud providers.
For instance, Equinix supports Amazon Web Services with a connection between on-premises networks and cloud services through AWS Direct Connect; the Equinix Fabric ecosystem provides a choice between cloud, networking, infrastructure and application providers; Digital Realty can connect clients to 500 cloud service providers, meaning its customers are not limited to using large players.
There are different approaches to the hybrid route, too. Each has its benefits:
- This can offer better connectivity between proprietary and third-party facilities, with direct access to some larger cloud operators.
- This solution gives the owner more control, with increased security, customisation options and compliance. If a company already has on-premises equipment it may be easier to integrate cloud services over time. Drawbacks can include latency problems or compatibility and network constraints when integrating cloud services. There is also the prohibitive cost of running a data centre in house.
- This is a simple option for those who seek customisation and scale. With servers managed by the data centre provider, it requires less customer input, but this comes with less control, including over security.
In all cases, whenever a customer relies on a third party to handle some server needs, it gains the benefit of being able to access innovations in data centre operations without a huge investment.
Arti Garg, the chief technologist at Aveva, points to the huge amount of innovation happening in data centres. "It's significant and it's everything from power to cooling to early fault detection (and) error handling," she says.
Garg adds that a hybrid approach is especially helpful for facilities with limited compute capacity that depend on AI for critical operations, such as power generation. "They have to think how AI might be leveraged in fault detection (so) that if they lose connectivity to the cloud they can still continue with operations," she says.
Using modular data centres is one way to achieve this. Aggregating data in the cloud also gives operators a "fleet-level view" of operations across sites, or can provide backup.
In an uncertain world, sovereignty is important
Another consideration when assessing data centre options is the need to comply with a home country's rules on data. "Data sovereignty" can dictate the jurisdiction in which data is stored as well as how it is accessed and secured. Companies might be bound to use facilities located only in countries that comply with those laws, a condition sometimes known as data residency compliance.
Having data centre servers closer to users is increasingly important. With technology borders going up between China and the US, many industries must look at where their servers are based for regulatory, security and geopolitical reasons.
In addition to sovereignty, Garg of Aveva says: "There is also the question of tenancy of the data. Does it reside in a tenant that a customer controls (or) do we host data for the customer?" With AI and the regulations surrounding it changing so rapidly, such questions are common.
Edge computing can bring extra resilience
One way to get around this is by computing "at the edge". This places computing centres closer to the data source, so improving processing speeds.
Edge computing not only reduces bandwidth-heavy data transmission, it also cuts latency, allowing for faster responses and real-time decision-making. This is crucial for autonomous vehicles, industrial automation and AI-powered surveillance. Decentralisation spreads computing over many points, which will help in the event of an outage.
As with modular data centres, edge computing is useful for operators who need greater resilience, for instance those with remote facilities in adverse conditions such as oil rigs. Garg says: "More advanced AI techniques have the ability to support people in these jobs . . . if the operation only has a phone or a tablet and we want to make sure that any solution is resilient to loss of connectivity . . . what is the solution that can run in power and compute-constrained environments?"
Some of the resilience of edge computing comes from exploring smaller or more efficient models and using technologies deployed in the mobile phone sector.
While such operations might demand edge computing out of necessity, it is a complementary approach to cloud computing rather than a substitute. Cloud is better suited to larger AI compute burdens such as model training, deep learning and big data analytics. It provides high computational power, scalability and centralised data storage.
Given the constraints of edge in terms of capacity — but its benefits in speed and access — most companies will probably find that a hybrid approach works best for them.
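In software terms, that hybrid split often comes down to a simple routing rule: keep work on a small local model when connectivity is lost or the response-time budget is tight, and send heavier work to the cloud. The sketch below is a generic illustration with hypothetical function names and thresholds, not any provider's API.

```python
# Hypothetical hybrid edge/cloud routing: prefer the cloud model for heavy
# requests, but stay operational on a small local model if connectivity is
# lost or the response-time budget is very tight.

from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    latency_budget_ms: int   # how quickly the caller needs an answer

def cloud_available() -> bool:
    # Placeholder connectivity check; a real system would probe its provider.
    return True

def run_local_model(req: Request) -> str:
    return f"[edge model] answer to: {req.prompt}"

def run_cloud_model(req: Request) -> str:
    return f"[cloud model] answer to: {req.prompt}"

def route(req: Request) -> str:
    # Tight latency budgets or loss of connectivity keep work at the edge.
    if req.latency_budget_ms < 100 or not cloud_available():
        return run_local_model(req)
    return run_cloud_model(req)

if __name__ == "__main__":
    print(route(Request("summarise today's sensor readings", latency_budget_ms=50)))
    print(route(Request("retrain the anomaly model on last month's data", latency_budget_ms=5000)))
```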
Chips with everything — CPUs, GPUs, TPUs: an explainer
Chips for AI applications are developing rapidly. The examples below give a flavour of those being deployed, from training to operation. Different chips excel in different parts of the chain, although the lines are blurring as companies offer more efficient options tailored to specific tasks.
GPUs, or graphics processing units, offer the parallel processing power required for AI model training, and are best applied to complex computations of the kind required for deep learning.
Nvidia, whose chips were designed for gaming graphics, is the market leader, but others have invested heavily to try to catch up. Dietz of Cisco says: "The market is rapidly evolving. We are seeing growing diversity among GPU providers contributing to the AI ecosystem — and that's a good thing. Competition always breeds innovation."
AWS uses high-performance GPU clusters based on chips from Nvidia and AMD, but it also runs its own AI-specific accelerators. Trainium, optimised for model training, and Inferentia, used by trained models to make predictions, were designed by AWS subsidiary Annapurna. Microsoft Azure has also developed corresponding chips, including the Azure Maia 100 for training and an Arm-based CPU for cloud operations.
CPUs, or central processing units, are the chips once used more commonly in personal computers. In the AI context, they handle lighter or localised execution tasks, such as operations in edge devices or in the inference phase of the AI process.
Nvidia, AWS and Intel all have custom CPUs designed for networking, and all major tech players have produced some form of chip to compete in edge devices. Google's Edge TPU, Nvidia's Jetson and Intel's Movidius all boost AI model performance in compact devices. CPUs such as Azure's Cobalt CPU can also be optimised for cloud-based AI workloads with faster processing, lower latency and better scalability.

Many CPUs use design elements from Arm, the British chip designer bought by SoftBank in 2016, on whose designs nearly all mobile devices rely. Arm says its compute platform “delivers unmatched performance, scalability, and efficiency”.
TPUs, or tensor processing units, are a further specialisation. Designed by Google in 2015 to speed up the inference phase, these chips are optimised for high-speed parallel processing, making them more efficient for large-scale workloads than GPUs. While not necessarily sharing the same architecture, competing AI-dedicated designs include AI accelerators such as AWS's Trainium.
Breakthroughs are continually occurring as researchers try to improve efficiency and speed and reduce energy usage. Neuromorphic chips, which mimic brain-like computations, can run operations in edge devices with lower power requirements. Stanford University in California, as well as companies including Intel, IBM and Innatera, have developed versions, each with different benefits. Researchers at Princeton University in New Jersey are also working on a low-power AI chip based on a different approach to computation.
High-bandwidth memory helps but it is not a perfect solution
Memory capacity plays a critical role in AI operation and is struggling to keep up with the broader infrastructure, giving rise to the so-called memory wall problem. According to techedgeai.com, in the past two years AI compute power has grown by 750 per cent and speeds have increased threefold, while dynamic random-access memory (Dram) bandwidth has grown by just 1.6 times.
AI systems require massive memory resources, ranging from hundreds of gigabytes to terabytes and above. Memory is particularly significant in the training phase for large models, which demand high-capacity memory to process and store data sets while simultaneously adjusting parameters and running computations. Local memory efficiency is also crucial for AI inference, where rapid access to data is vital for real-time decision-making.
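To see why training is so much more memory-hungry than inference, the sketch below uses commonly cited rules of thumb: roughly 2 bytes per parameter to serve a model in half precision, and roughly 16 bytes per parameter during mixed-precision training once gradients and optimiser state are included. The multipliers vary by framework and are assumptions here.

```python
# Rough memory-footprint estimates for a model of a given size, using
# commonly quoted approximations: ~2 bytes/parameter for half-precision
# inference weights, ~16 bytes/parameter for mixed-precision training
# (weights + gradients + optimiser state). Real figures vary by framework.

GIB = 1024 ** 3

def inference_memory_gib(parameters: float, bytes_per_param: float = 2.0) -> float:
    return parameters * bytes_per_param / GIB

def training_memory_gib(parameters: float, bytes_per_param: float = 16.0) -> float:
    return parameters * bytes_per_param / GIB

if __name__ == "__main__":
    for size in (7e9, 70e9):   # hypothetical 7bn and 70bn parameter models
        print(f"{size/1e9:.0f}bn params: "
              f"~{inference_memory_gib(size):.0f} GiB to serve, "
              f"~{training_memory_gib(size):.0f} GiB to train "
              f"(before activations and data batches)")
```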
High-bandwidth memory helps to alleviate this bottleneck. While built on evolved Dram technology, high-bandwidth memory introduces architectural advances. It can be packaged into the same chipset as the core GPU to provide lower latency, and it is stacked more densely than conventional Dram, reducing data travel time and improving latency. It is not a perfect solution, however, as stacking can create more heat, among other constraints.
Everyone needs to consider compatibility and flexibility
Although models continue to develop and proliferate, the good news is that "the ability to switch between models is pretty easy as long as you have the GPU power — and some don't even require GPUs, they can run off CPUs," Dietz says.
Hardware compatibility does not commit users to any given model. Having said that, change can be harder for companies tied to chips developed by service providers. Keeping your options open can minimise the risk of being "locked in".
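One common way to keep those options open is a thin abstraction layer in the application code, so that swapping a model or provider is a configuration change rather than a rewrite. The sketch below is generic and the class names are hypothetical, not a particular vendor's SDK.

```python
# A thin provider-agnostic layer: application code depends only on the
# ModelBackend interface, so switching cloud provider or model is a matter
# of registering a different backend rather than rewriting callers.

from abc import ABC, abstractmethod

class ModelBackend(ABC):
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

class ProviderA(ModelBackend):          # hypothetical hosted model
    def generate(self, prompt: str) -> str:
        return f"[provider A] {prompt}"

class LocalCPUModel(ModelBackend):      # e.g. a small model that runs off CPUs
    def generate(self, prompt: str) -> str:
        return f"[local CPU model] {prompt}"

BACKENDS = {
    "provider_a": ProviderA(),
    "local": LocalCPUModel(),
}

def answer(prompt: str, backend: str = "provider_a") -> str:
    return BACKENDS[backend].generate(prompt)

if __name__ == "__main__":
    print(answer("draft a compliance summary"))            # default provider
    print(answer("draft a compliance summary", "local"))   # swapped with one argument
```

The same pattern also makes it easier to trial a smaller CPU-only model alongside a hosted one before committing spend.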
This can be a problem with the more dominant players. The UK regulator Ofcom referred the UK cloud market to the Competition and Markets Authority because of the dominance of three of the hyperscalers and the difficulty of switching providers. Ofcom's objections included high fees for transferring data out, technical barriers to portability and committed spend discounts, which reduced costs but tied users to one cloud provider.
Placing business with various suppliers offsets the risk of any one supplier having technical or capacity constraints, but this can create side-effects. Problems may include incompatibility between providers, latency when transferring and synchronising data, security risks and costs. Companies need to consider these and mitigate the risks. Whichever route is taken, any company planning to use AI should make portability of data and service a primary consideration in planning.
Flexibility is critical internally, too, given how quickly AI tools and services are evolving. Howe of Atheni says: "A lot of what we're seeing is that companies' internal processes aren't designed for this sort of pace of change. Their budgeting, their governance, their risk management . . . it's all built for that much more stable, predictable kind of technology investment, not rapidly evolving AI capabilities."
This presents a particular problem for companies with complex or glacial procurement procedures: months-long approval processes hamper the ability to utilise the latest technology.
Garg says: "The agility needs to be in the openness to AI developments, keeping abreast of what's happening and then at the same time making informed — as best you can — decisions around when to adopt something, when to be a little bit more mindful, when to seek advice and who to seek advice from."
Industry challenges: trying to keep pace with demand
While individual companies might have modest demands, one issue for industry as a whole is that the present demand for AI compute and the corresponding infrastructure is huge. Off-site data centres will require massive investment to keep pace with demand. If this falls behind, companies without their own capacity could be left fighting for access.
McKinsey says that, by 2030, data centres will need $6.7tn more capital to keep pace with demand, with those equipped to provide AI processing needing $5.2tn, although this assumes no further breakthroughs and no tail-off in demand.
The seemingly insatiable demand for capacity has led to an arms race between the biggest players. This has further increased their dominance and given the impression that only the hyperscalers have the capital to offer flexibility on scale.

Sustainability: how to get the most from the power supply
Power is a major problem for AI operations. In April 2025 the International Energy Agency released a report dedicated to the sector. The IEA believes that grid constraints could delay one-fifth of the data centre capacity planned to be built by 2030. Amazon and Microsoft cited power infrastructure or inflated lease prices as the cause of recent withdrawals from planned expansion. They denied reports of overcapacity.
Not only do data centres require considerable energy for computation, they draw an enormous amount of energy to run and cool equipment. The power requirements of AI data centres are 10 times those of a standard technology rack, according to Soben, the global construction consultancy that is now part of Accenture.
This demand is pushing data centre operators to come up with their own solutions for power while they wait for the infrastructure to catch up. In the short term some operators are looking at "power skids" to increase the voltage drawn off a local network. Others are planning long term and considering installing their own small modular reactors, as used in nuclear submarines and aircraft carriers.
Another approach is to reduce demand by making cooling systems more efficient. Newer centres have turned to liquid cooling: not only do liquids have higher thermal conductivity than air, the systems can be enhanced with more efficient fluids. Algorithms pre-emptively adjust the circulation of liquid through cold plates attached to processors (direct-to-chip cooling). Reuse of waste water makes such solutions seem green, although data centres continue to face objections in locations such as Virginia as they compete for scarce water resources.
The DeepSeek effect: smaller might be better for some
While companies continue to throw large amounts of money at capacity, the development of DeepSeek in China has raised questions such as "do we need as much compute if DeepSeek can achieve it with so much less?".
The Chinese model is cheaper for businesses to develop and run. It was developed despite restrictions on exports of top-end chips from the US to China. DeepSeek is free to use and open source — and it is also able to verify its own thinking, which makes it far more powerful as a "reasoning model" than assistants that pump out unverified answers.
Now that DeepSeek has shown the power and efficiency of smaller models, this should add impetus to a rethink around capacity. Not all operations need the largest model available to achieve their goals: smaller models that are less greedy for compute and power can be more efficient at a given job.
Dietz says: "A lot of businesses were really cautious about adopting AI because . . . before (DeepSeek) came out, the perception was that AI was for those who had the financial means and infrastructure means."
DeepSeek showed that users could leverage different capabilities and fine-tune models and still get "the same, if not better, results", making AI far more accessible to those without access to vast amounts of energy and compute.
Definitions
Training: teaching a model how to perform a given task.
The inference phase: the process by which an AI model draws conclusions from new data based on the information used in its training.
Latency: the time delay between an AI model receiving an input and generating an output.
Edge computing: processing on a local device. This reduces latency so is crucial for systems that require a real-time response, such as autonomous cars, but it cannot deal with high-volume data processing.
Hyperscalers: providers of huge data centre capacity, such as Amazon's AWS, Microsoft's Azure, Google Cloud and Oracle Cloud. They offer off-site cloud services with everything from compute power and pre-built AI models through to storage and networking, either all together or on a modular basis.
AI compute: the hardware resources that run AI applications, algorithms and workloads, typically involving servers, CPUs, GPUs or other specialised chips.
Co-location: the use of data centres that rent space where businesses can keep their servers.
Data residency: the location where data is physically stored on a server.
Data sovereignty: the concept that data is subject to the laws and regulations of the country where it was gathered. Many countries have rules about how data is gathered, controlled, stored and accessed. Where the data resides is increasingly a factor if a country feels that its security or use might be at risk.

