It is well known that different model families can use different tokenizers. However, the extent to which the process of tokenization itself varies across these tokenizers has received only limited analysis. Do all tokenizers produce the same number of tokens for a given input text? If not, how different are the generated tokens? How significant are the differences?
In this article, we examine these questions and explore the practical implications of tokenization variability. We present a comparative analysis of two frontier model families: OpenAI's ChatGPT and Anthropic's Claude. Although their advertised per-token prices are highly competitive, experiments show that Anthropic models can be 20–30% more expensive than GPT models.
API pricing: Claude 3.5 Sonnet vs. GPT-4o
As of June 2024, the pricing structure for these two advanced frontier models is highly competitive. Anthropic's Claude 3.5 Sonnet and OpenAI's GPT-4o charge similar rates for output tokens, while Claude 3.5 Sonnet offers 40% lower rates for input tokens.
The hidden "tokenizer inefficiency"
Despite the Anthropic model's lower input-token rate, we observed that the total cost of running our experiments (on a fixed set of input prompts) was lower with GPT-4o than with Claude 3.5 Sonnet.
Why?
Anthropic's tokenizer tends to break the same input into more tokens than OpenAI's tokenizer does. This means that, for identical prompts, Anthropic models produce considerably more tokens than their OpenAI counterparts. While the per-token cost of Claude 3.5 Sonnet's input may be lower, the increased tokenization can offset these savings, leading to higher total costs in practical applications.
These hidden costs stem from the way Anthropic's tokenizer encodes information, often using more tokens to represent the same content. This inflation of the token count has a significant impact on both costs and context-window utilization.
Domain-dependent tokenization efficiency
Different types of domain content are tokenized differently by Anthropic's tokenizer, leading to different token counts compared to OpenAI models. The AI research community has noted similar tokenization differences here. We tested our findings across three popular domains: English articles, code (Python), and math.
| Domain | GPT tokens | Claude tokens | % token overhead |
| --- | --- | --- | --- |
| English articles | 77 | 89 | ~16% |
| Code (Python) | 60 | 78 | ~30% |
| Math | 114 | 138 | ~21% |
When comparing Claude 3.5 Sonnet with GPT-4o, the degree of tokenizer inefficiency varies significantly across content domains. For English articles, Claude's tokenizer produces roughly 16% more tokens than GPT-4o for the same input text. This overhead rises for structured or technical content: for mathematical equations the overhead is 21%, and for Python code Claude generates 30% more tokens.
This variation arises because some content types, such as technical documents and code, often contain patterns and symbols that Anthropic's tokenizer fragments into smaller pieces, resulting in a higher token count. Natural-language content, by contrast, tends to exhibit a lower token overhead.
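As a rough illustration, the sketch below counts tokens for the same input on both sides: OpenAI's open-source tiktoken library for GPT-4o, and the Anthropic SDK's token-counting endpoint for Claude. The model names and sample snippet are illustrative placeholders, and the endpoint's availability is discussed later in this article.

```python
import tiktoken
import anthropic

text = "def greet(name):\n    return f'Hello, {name}!'"  # sample Python snippet

# GPT-4o side: tiktoken ships the o200k_base encoding used by GPT-4o.
enc = tiktoken.encoding_for_model("gpt-4o")
gpt_tokens = len(enc.encode(text))

# Claude side: count tokens via the SDK without invoking the model itself.
# Note: the returned count covers the whole request, including message scaffolding.
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
count = client.messages.count_tokens(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": text}],
)
claude_tokens = count.input_tokens

overhead = (claude_tokens - gpt_tokens) / gpt_tokens * 100
print(f"GPT-4o: {gpt_tokens} tokens | Claude: {claude_tokens} tokens "
      f"({overhead:.0f}% overhead)")
```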
Other practical implications of tokenizer inefficiency
Beyond the direct impact on costs, tokenizer inefficiency also indirectly affects context-window utilization. While Anthropic models advertise a larger context window of 200K tokens, compared to OpenAI's 128K tokens, their verbosity means the effectively usable token space may be smaller. The "advertised" context-window size can therefore differ, slightly or substantially, from the "effective" context-window size.
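A back-of-the-envelope way to quantify this: dividing the advertised window by the token-overhead factor gives a rough "GPT-equivalent" capacity. A minimal sketch, using the domain overheads from the table above (a simplification that ignores per-request scaffolding overhead):

```python
def effective_window(advertised_tokens: int, overhead: float) -> int:
    """Rough context capacity in GPT-equivalent tokens, given a token-count overhead."""
    return int(advertised_tokens / (1 + overhead))

# Claude's advertised 200K window, discounted by the measured overheads:
for domain, overhead in [("English", 0.16), ("Math", 0.21), ("Python", 0.30)]:
    print(f"{domain}: ~{effective_window(200_000, overhead):,} tokens")
# English: ~172,413 | Math: ~165,289 | Python: ~153,846 -- still above
# GPT-4o's 128K, but noticeably below the advertised 200K.
```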
Implementation of tokenizers
GPT models use Byte Pair Encoding (BPE), which repeatedly merges frequently co-occurring character pairs into tokens. In particular, the latest GPT models use the open-source o200k_base tokenizer. The actual token mappings used by GPT-4o (in the tiktoken tokenizer) can be found here.
```python
{
    # reasoning
    "o1-xxx": "o200k_base",
    "o3-xxx": "o200k_base",
    # chat
    "chatgpt-4o-": "o200k_base",
    "gpt-4o-xxx": "o200k_base",         # e.g., gpt-4o-2024-05-13
    "gpt-4-xxx": "cl100k_base",         # e.g., gpt-4-0314, etc., plus gpt-4-32k
    "gpt-3.5-turbo-xxx": "cl100k_base", # e.g., gpt-3.5-turbo-0301, -0401, etc.
}
```
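For reference, here is a minimal tiktoken usage sketch (the sample string is arbitrary):

```python
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # the encoding used by GPT-4o
tokens = enc.encode("Tokenization varies across model families.")
print(len(tokens), tokens[:5])  # token count and the first few token IDs
print(enc.decode(tokens))       # round-trips back to the original string
```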
Unfortunately, not much can be said about Anthropic's tokenizer, since it is not available as directly and easily as GPT's. Anthropic published its Token Counting API in December 2024. However, it was soon discontinued in later versions in 2025.
One report states that "Anthropic uses a unique tokenizer with only 65,000 token variations, compared to OpenAI's 100,261 token variations for GPT-4." This Colab notebook contains Python code to analyze the tokenization differences between GPT and Claude models. Another tool that interfaces with some common, publicly available tokenizers confirms our findings.
The ability to proactively estimate token counts (without calling the actual model API) and to budget costs accordingly is of crucial importance for AI enterprises.
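For example, a simple cost estimator can combine such proactive token counts with per-token prices. This is a sketch under stated assumptions: the prices below are illustrative placeholders (check each provider's current pricing page), and the inflated Claude counts reflect the ~30% code-domain overhead measured above.

```python
# Illustrative per-million-token prices in USD -- substitute current rates.
PRICES = {
    "gpt-4o":            {"input": 5.00, "output": 15.00},
    "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated request cost in USD from pre-computed token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# The same request, with Claude's token counts inflated by ~30%:
print(estimate_cost("gpt-4o", 1_000, 500))             # $0.01250
print(estimate_cost("claude-3-5-sonnet", 1_300, 650))  # $0.01365 -- pricier overall
```

Even with a 40% lower input rate, the inflated token counts can push the total cost of the Claude request above that of the GPT-4o request, which mirrors what we observed in our experiments.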
Key Takeaways
- Anthropic's competitive pricing comes with hidden costs:
While Anthropic's Claude 3.5 Sonnet offers 40% lower input-token costs than OpenAI's GPT-4o, this apparent cost advantage can be misleading because of differences in how input text is tokenized.
- Hidden "tokenizer inefficiency":
Anthropic models are inherently more verbose. For businesses that process large volumes of text, understanding this discrepancy is crucial when evaluating the true cost of deploying models.
- Domain-dependent tokenizer inefficiency:
When choosing between OpenAI and Anthropic models, evaluate the nature of your input text. For natural-language tasks the cost difference may be minimal, but technical or structured domains can lead to significantly higher costs with Anthropic models.
- Effective context window:
Due to the verbosity of Anthropic's tokenizer, its larger advertised 200K context window may offer less effective usable space than OpenAI's 128K, creating a potential gap between advertised and actual context-window sizes.
Anthropic did not respond to VentureBeat's requests for comment by press time. We will update the story if they respond.