
Runware leverages custom hardware and advanced orchestration for fast AI inference

Sometimes a demo is all you need to understand a product. That’s the case with Runware. If you head over to Runware’s website, type a prompt and hit Enter to generate an image, you’ll be surprised by how quickly Runware creates it for you: it takes less than a second.

Runware is a newcomer to the AI inference, or generative AI, startup landscape. The company builds its own servers and optimizes the software layer on those servers to remove bottlenecks and improve inference speeds for image generation models. The startup has already secured $3 million in funding from Andreessen Horowitz’s Speedrun, LakeStar’s Halo II and Lunar Ventures.

The company isn’t trying to reinvent the wheel. It just wants to make it spin faster. Behind the scenes, Runware builds its own servers with as many GPUs as possible on the same motherboard. It has its own custom cooling system and manages its own data centers.

When it comes to running AI models on its servers, Runware has optimized the orchestration layer with BIOS and operating system optimizations to improve cold start times. It has developed proprietary algorithms that allocate inference workloads.

The demo alone is impressive. Now the company wants to put all that research and development work to use and turn it into a business.

Unlike many GPU hosting companies, Runware doesn’t rent out its GPUs based on GPU time. Instead, it believes companies should be encouraged to increase their workloads. That’s why Runware offers an image generation API with a flat cost per API call, built on popular AI models from Flux and Stable Diffusion.
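To make the pricing model concrete, here is a minimal sketch of what a per-call image generation request looks like from the client’s side. The endpoint, field names and model identifier below are placeholders for illustration, not Runware’s documented API; the point is simply that the billable unit is the call (the generated image), not the seconds of GPU time behind it.

```python
import requests

# Hypothetical endpoint and payload, for illustration only.
API_URL = "https://api.example-inference-provider.test/v1/images"
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "flux.1-schnell",  # assumed model identifier
    "prompt": "a lighthouse at dusk, photorealistic",
    "width": 1024,
    "height": 1024,
    "steps": 4,
}

# Under per-call pricing, this single request has a fixed price,
# regardless of how long the GPU actually spends on it.
response = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()
print(response.json()["imageURL"])  # assumed response field
```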

“If you look at Together AI, Replicate, Hugging Face – all of them – they sell compute based on GPU time,” co-founder and CEO Flaviu Radulescu told TechCrunch. “If you compare the time it takes us to generate an image against them, and then you compare the pricing, you will see that we are much cheaper and much faster.”

“It’s going to be impossible for them to match this performance,” he added. “Especially with a cloud provider, where you have to run in a virtualized environment, which adds extra delays.”

As Runware controls the entire inference pipeline and optimizes both its hardware and software, the company hopes to use GPUs from multiple vendors in the near future. This has been an important goal for several startups, as Nvidia is the clear leader in the GPU space, which means Nvidia GPUs tend to be quite expensive.

“Right now, we use only Nvidia GPUs. But this should be an abstraction at the software layer,” Radulescu said. “We can switch a model in and out of GPU memory very, very fast, which allows us to put multiple customers on the same GPUs.

“So we are not like our competitors, who simply load a model into the GPU and then the GPU performs a very specific type of task. In our case, we developed this software solution that allows us to switch a model in and out of GPU memory as we perform inference.”
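Runware hasn’t published how this hot-swapping works. But the general idea, keeping model weights resident in host memory and moving them into VRAM only when a request needs them, can be sketched in a few lines. The Python sketch below uses Hugging Face diffusers as a stand-in; the model IDs, the evict-then-load policy and the GpuModelPool class are illustrative assumptions, not Runware’s proprietary scheduler.

```python
import torch
from diffusers import AutoPipelineForText2Image

class GpuModelPool:
    """Serve several image models on one GPU by swapping them in and out of VRAM."""

    def __init__(self, model_ids, device="cuda"):
        self.device = device
        # Keep every pipeline resident in host RAM, ready to be moved to VRAM.
        self.pipelines = {
            mid: AutoPipelineForText2Image.from_pretrained(mid, torch_dtype=torch.float16)
            for mid in model_ids
        }
        self.active = None  # model id currently occupying GPU memory

    def _activate(self, model_id):
        if self.active == model_id:
            return
        if self.active is not None:
            # Evict the previous model from VRAM back to host memory.
            self.pipelines[self.active].to("cpu")
            torch.cuda.empty_cache()
        self.pipelines[model_id].to(self.device)
        self.active = model_id

    def generate(self, model_id, prompt):
        self._activate(model_id)
        return self.pipelines[model_id](prompt, num_inference_steps=4).images[0]

# Requests for different customers' models share one GPU.
pool = GpuModelPool(["stabilityai/sd-turbo", "stabilityai/stable-diffusion-2-1"])
image = pool.generate("stabilityai/sd-turbo", "a lighthouse at dusk")
```

In a production system the hard part is making the swap fast enough, for example with pinned host memory and transfers overlapped with compute, that customers sharing a GPU never notice it; that transfer speed is precisely what Radulescu claims Runware’s stack optimizes.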

If AMD and other GPU vendors can create compatibility layers that work with typical AI workloads, Runware would be well positioned to build a hybrid cloud that relies on GPUs from multiple vendors. And that would certainly help if it wants to remain cheaper than the competition when it comes to AI inference.
