Anthropic has introduced prompt caching on its API, which remembers the context between API calls and allows developers to avoid resending repetitive prompts.
The prompt caching feature is available in public beta on Claude 3.5 Sonnet and Claude 3 Haiku, with support for the largest Claude model, Opus, coming soon.
Prompt caching, described in this 2023 paper, allows users to retain frequently used context across their sessions. Because the model remembers these prompts, users can add additional background information without increasing costs. This is useful when someone wants to send a large amount of context in a single prompt and then refer back to it across multiple conversations with the model. It also lets developers and other users better tune model responses.
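For developers, the feature surfaces as a flag on blocks of prompt content. Below is a minimal sketch of how a large, reusable context block might be marked for caching, assuming the Anthropic Python SDK and the beta cache_control parameter and header described in Anthropic's documentation (names may change while the feature is in beta):

```python
# Sketch: mark a large, reusable context block for prompt caching so that
# subsequent calls repeating it read from the cache instead of paying the
# full base input-token price. Assumes the Anthropic Python SDK and the
# prompt-caching beta header from Anthropic's docs.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

knowledge_base = open("knowledge_base.txt").read()  # large, frequently reused context

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    system=[
        {
            "type": "text",
            "text": knowledge_base,
            # Marks this block as cacheable for later calls.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the key policies."}],
)
print(response.content[0].text)
```

On a second call that repeats the same cached block, only the new user message is billed at the base rate; the cached portion is billed at the much lower cache-read price discussed below.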
Anthropic said early users have “seen significant speed and cost improvements with prompt caching for a variety of use cases – from incorporating a full knowledge base to 100-shot examples to incorporating every conversation turn into their prompt.”
Potential use cases include reducing cost and latency for long instructions and uploaded documents in conversational agents, faster code autocompletion, supplying multiple instructions to agentic search tools, and embedding entire documents in a prompt, according to the company.
Cached prompt pricing
One advantage of cached prompts is a lower price per token. According to Anthropic, using cached prompts is “significantly cheaper” than the base input token price.
For Claude 3.5 Sonnet, writing a prompt to the cache costs $3.75 per 1 million tokens (MTok), but reading a cached prompt costs $0.30 per MTok. The base input price for Claude 3.5 Sonnet is $3/MTok, so by paying a bit more up front you can expect roughly 10x savings each subsequent time you use the cached prompt.
Claude 3 Haiku users pay $0.30/MTok for caching and $0.03/MTok when using saved prompts.
While prompt caching is not yet available for Claude 3 Opus, Anthropic has already published its pricing: writing to the cache will cost $18.75/MTok, while reading from a cached prompt will cost $1.50/MTok.
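To see how those figures translate into savings, here is a rough cost sketch. The Claude 3.5 Sonnet numbers are the ones quoted above; the base input prices for Haiku and Opus are not quoted in this article, so the values used here are assumptions based on Anthropic's published rates at the time:

```python
# Rough cost sketch: sending the same prompt repeatedly, with and without
# prompt caching, using $/MTok figures. Base input prices for Haiku ($0.25)
# and Opus ($15) are assumptions, not quoted in the article above.
PRICES = {
    # model: (base input, cache write, cache read), all in $ per 1M tokens
    "claude-3.5-sonnet": (3.00, 3.75, 0.30),
    "claude-3-haiku":    (0.25, 0.30, 0.03),
    "claude-3-opus":     (15.00, 18.75, 1.50),
}

def prompt_cost(model: str, prompt_mtok: float, calls: int) -> tuple[float, float]:
    """Return (uncached, cached) cost of sending the same prompt `calls` times."""
    base, write, read = PRICES[model]
    uncached = base * prompt_mtok * calls
    cached = write * prompt_mtok + read * prompt_mtok * (calls - 1)
    return uncached, cached

# A 100K-token context sent 10 times with Claude 3.5 Sonnet:
print(prompt_cost("claude-3.5-sonnet", prompt_mtok=0.1, calls=10))
# -> approximately (3.0, 0.645): about 4-5x cheaper overall, approaching the
#    10x per-call saving as the number of reuses grows.
```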
However, as AI influencer Simon Willison noted on X, Anthropic's cache only has a lifespan of 5 minutes and is refreshed each time it’s used.
Of course, this is not the first time Anthropic has competed with other AI platforms on price. Before the release of the Claude 3 model family, Anthropic lowered its token prices.
The company is currently in a race to the bottom against competitors like Google and OpenAI to offer low-cost options to third-party developers building on its platform.
Frequently requested feature
Other platforms offer a version of prompt caching. Lamina, an LLM inference system, uses KV caching to reduce GPU costs. A cursory look at the OpenAI developer forums or GitHub turns up plenty of questions about prompt caching.
Prompt caching is not the same as large language model memory. For example, OpenAI's GPT-4o offers a memory feature in which the model remembers preferences or details. However, it does not store the actual prompts and responses the way prompt caching does.