
How much does it cost to develop conversational AI?

More than 40% of marketing, sales, and customer support organizations have introduced generative AI, second only to IT and cybersecurity. Of all generative AI technologies, conversational AI will spread fastest in these sectors because of its ability to bridge current communication gaps between businesses and customers.

Yet many marketing leaders I've spoken to are at a crossroads when it comes to implementing this technology. They don't know which of the available large language models (LLMs) to choose, or whether to go open source or closed source. They're afraid of spending too much money on a new and unexplored technology.

Companies can, of course, buy ready-made conversational AI tools off the shelf, but if they want the technology to become a core part of their business, they can develop it in-house.

To help lower the fear factor for those who decide to build one, I'd like to share some of the internal research my team and I conducted in our own search for the best LLM to build our conversational AI on. We spent some time looking at the different LLM providers and estimating how much you'd need to spend on each, depending on the inherent costs and the volume of usage you expect from your target market.

We decided to compare GPT-4o (OpenAI) and Llama 3 (Meta). These are two of the main LLMs most companies will evaluate, and we consider them among the highest-quality models on the market. They also let us compare a closed-source LLM (GPT) with an open-source LLM (Llama).

How do you calculate the LLM cost for a conversational AI?

The two most significant financial considerations when selecting an LLM are the setup costs and any processing fees.

The setup cost covers everything needed to get the LLM ready for your end goal, including development and operational costs. The processing cost is the actual cost of each conversation once your tool is live.

When it comes to setup, the cost-benefit ratio depends on what you are using the LLM for and how often you'll use it. If you need to deploy your product as quickly as possible, you may be happy to pay a premium for a model that requires little to no setup, like GPT-4o. Llama 3 can take weeks to set up, and in that time you could already have optimized a GPT product for the market.

However, if you serve a large number of clients or want more control over your LLM, you may want to accept the higher initial setup costs in exchange for greater advantages later.

When it comes to the cost of conversation processing, we'll look at token usage, as this provides the most direct comparison. LLMs like GPT-4o and Llama 3 use a basic unit called a "token": a chunk of text that these models process as input and output. There is no universal standard for how tokens are defined across LLMs; some tokenize per word, per subword, per character, or other variations.

Because of all these factors, a direct comparison of the LLMs is difficult. However, we've achieved this by simplifying the inherent costs of each model as much as possible.

We've found that while GPT-4o is cheaper in terms of upfront cost, Llama 3 proves to be substantially cheaper over time. Let's explore why, starting with setup considerations.

What are the basic costs of each LLM?

Before we can look at the cost per conversation for each LLM, we need to know how much it will cost us to get there.

GPT-4o is a closed-source model hosted by OpenAI, so all you need to do is set up your tool to ping GPT's infrastructure and data libraries via a simple API call. Setup is minimal.
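A rough sketch of what that API call involves, assuming OpenAI's chat completions endpoint; the system prompt and message content are illustrative placeholders, and actually sending the request would require an API key:

```python
import json

# Build the request body for OpenAI's chat completions endpoint
# (POST https://api.openai.com/v1/chat/completions). Here we only
# assemble and inspect the payload rather than sending it.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful support assistant."},
        {"role": "user", "content": "Where is my order?"},
    ],
    "max_tokens": 256,
}

body = json.dumps(payload)
print(body[:40])
```

There is no infrastructure to manage on your side: the model weights, serving hardware, and scaling are all OpenAI's problem, which is exactly why setup is minimal.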

Llama 3, on the other hand, is an open-source model that must be hosted on your own private servers or with cloud infrastructure providers. Your company can download the model weights for free; then it's up to you to find a host.

Hosting costs are an important consideration here. Unless you buy your own servers, which is relatively unusual, you'll need to pay a cloud provider to use their infrastructure, and each provider will structure its pricing differently.

Most hosting providers will "rent" you an instance and charge for compute capacity by the hour or second. AWS's ml.g5.12xlarge instance, for example, charges by server uptime. Others may bundle usage into packages and charge flat annual or monthly fees based on various factors, such as your storage needs.
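The trade-off between hourly instance rental and per-token pricing comes down to volume. As an illustration only, with an assumed hourly instance rate and an assumed per-conversation cost (neither is a quoted price), you can estimate the monthly conversation volume at which a dedicated instance starts to pay off:

```python
# Rough break-even between renting a dedicated GPU instance by the hour
# and paying per token on a serverless platform. Both figures below are
# illustrative assumptions, not quoted provider prices.
HOURLY_INSTANCE_RATE = 7.00    # assumed $/hour for a GPU instance
HOURS_PER_MONTH = 730          # running 24/7
COST_PER_CONVERSATION = 0.08   # assumed per-token cost of one conversation

monthly_instance_cost = HOURLY_INSTANCE_RATE * HOURS_PER_MONTH
break_even = monthly_instance_cost / COST_PER_CONVERSATION

print(f"Dedicated instance: ${monthly_instance_cost:,.0f}/month")
print(f"Break-even: ~{break_even:,.0f} conversations/month")
```

Below that volume, per-token billing wins because you aren't paying for idle server time; above it, the flat-rate instance becomes cheaper per conversation.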

Amazon Bedrock, on the other hand, charges based on the number of tokens processed, so it could prove a cost-effective solution for your business even at low usage volumes. Bedrock is a managed, serverless platform from AWS that simplifies LLM deployment by handling the underlying infrastructure.

In addition to the direct costs, getting your conversational AI up and running on Llama 3 also requires investing a lot more time and money in operations, including the initial selection and setup of a server or serverless option, and ongoing maintenance. You'll also need to spend extra money developing things like error-logging tools and system alerts for any issues that may arise with the LLM servers.

The key factors to consider when calculating the basic cost-benefit ratio include time to deployment, level of product usage (if you have millions of conversations per month, the setup costs are quickly outweighed by the ultimate savings), and the level of control you need over your product and data (open-source models work best here).

How much does a conversation cost on the major LLMs?

Now we can examine the base cost of each conversation.

For our modeling we used the heuristic: 1,000 words = 7,515 characters = 1,870 tokens.

We assumed that the average consumer conversation between AI and human includes 16 messages in total. This corresponds to an input of 29,920 tokens and an output of 470 tokens, for a total of 30,390 tokens. (The input is much higher due to prompt rules and logic.)
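The heuristic and conversation model above can be sketched as a small calculation:

```python
# Heuristic from the article: 1,000 words ~ 7,515 characters ~ 1,870 tokens,
# i.e. roughly 1.87 tokens per word.
TOKENS_PER_WORD = 1870 / 1000

def words_to_tokens(words: int) -> int:
    """Estimate token count from word count using the article's heuristic."""
    return round(words * TOKENS_PER_WORD)

# The modeled 16-message benchmark conversation.
input_tokens = 29_920   # inflated by prompt rules and logic
output_tokens = 470
total_tokens = input_tokens + output_tokens

print(words_to_tokens(1_000))  # 1870
print(total_tokens)            # 30390
```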

On GPT-4o, the price per 1,000 input tokens is $0.005 and per 1,000 output tokens is $0.015, so the benchmark conversation costs roughly $0.16.

GPT-4o            Number of tokens   Price per 1,000 tokens   Cost
Input tokens      29,920             $0.00500                 $0.14960
Output tokens     470                $0.01500                 $0.00705
Total cost per conversation                                   $0.15665

For Llama 3-70B on AWS Bedrock, the price per 1,000 input tokens is $0.00265 and per 1,000 output tokens is $0.00350, so the benchmark conversation costs roughly $0.08.

Llama 3-70B       Number of tokens   Price per 1,000 tokens   Cost
Input tokens      29,920             $0.00265                 $0.07929
Output tokens     470                $0.00350                 $0.00165
Total cost per conversation                                   $0.08093

In summary, once both models are fully set up, the cost of a conversation running on Llama 3 is almost 50% lower than the equivalent conversation running on GPT-4o. However, any server costs would need to be added to the calculation for Llama 3.
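The two tables reduce to one simple calculation, using the per-1,000-token prices quoted above for the modeled 29,920-input / 470-output conversation:

```python
def conversation_cost(input_tokens: int, output_tokens: int,
                      in_price: float, out_price: float) -> float:
    """Cost of one conversation; prices are USD per 1,000 tokens."""
    return input_tokens / 1000 * in_price + output_tokens / 1000 * out_price

gpt4o  = conversation_cost(29_920, 470, 0.005, 0.015)     # $0.15665
llama3 = conversation_cost(29_920, 470, 0.00265, 0.0035)  # $0.08093

print(f"GPT-4o:  ${gpt4o:.5f} per conversation")
print(f"Llama 3: ${llama3:.5f} per conversation")
print(f"Llama 3 is {1 - llama3 / gpt4o:.0%} cheaper per conversation")
```

This per-conversation gap is what compounds at scale: at a million conversations a month, the roughly $0.076 difference is about $76,000 in monthly savings, before Llama 3's hosting costs are deducted.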

Note that this is only a snapshot of the total cost of each LLM. Many other variables come into play when customizing the product to your own needs, such as whether you use a multi-prompt or single-prompt approach.

For companies that want to use conversational AI as a core service but not as a fundamental part of their brand, the investment in developing AI in-house may simply not be worth it compared with the quality they can achieve with off-the-shelf products.

Whichever path you choose, integrating conversational AI can be incredibly useful. Just make sure you're always guided by what makes sense for the context of your business and the needs of your customers.
