Tokenomics 101: Navigating the Nuances of LLM Product Pricing
A Product Manager's Guide to Mastering the Art of Cost Estimation in the Age of Generative AI
Hi everyone, we are back with another quick bite from ScaleDown.
Have you rolled up your sleeves to add Generative AI to your next product? Keep reading to find out whether it's worth the money, and how much money that is.
This month, we want to discuss money, specifically how much your LLM products might cost. What are the things you need to consider, and what might you be overlooking?
Navigating the Nuances of LLM Product Pricing
When it comes to integrating Large Language Models (LLMs) like GPT-4 into production applications, understanding the pricing model is crucial for engineers and product managers. It's not just about the cost per token; many factors can influence the overall price of implementing GenAI technology. Let’s dive in and clear up some common misconceptions.
What Contributes to the Cost of LLM Applications?
Input and Output Tokens
Tokens represent pieces of text, which can be words or parts of words, that LLMs use to process and generate responses. Both the input (the text you send to the model) and the output (the text the model generates) count towards the total number of tokens used.
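To build intuition for how text maps to tokens, here is a minimal sketch. The 4-characters-per-token figure is a common rough heuristic for English, not the model's real tokenizer; for exact counts you would use the provider's tokenizer (e.g. OpenAI's tiktoken library).

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text.

    This is a heuristic only; exact counts require the provider's
    actual tokenizer.
    """
    return max(1, len(text) // 4)

prompt = "Summarize the customer's complaint in one sentence."
response = "The customer's order arrived late and was damaged in transit."

# Both the input you send and the output the model generates are billed.
total = estimate_tokens(prompt) + estimate_tokens(response)
```

Even this tiny exchange consumes a few dozen tokens, and that is before any system prompt or context is attached.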
The Problems with Token-Based Pricing
Token-based pricing seems straightforward, but it can be deceptively complex. You might fall into the trap of underestimating the number of tokens required for a task, only considering the primary input and output text.
In reality, though, taking a GenAI app to production means also budgeting for tokens consumed by parts of the system that are not immediately evident: the system prompt, context, evaluation, guardrails, and regeneration, among others. As we have shown previously, these overhead tokens can ramp up quickly and, in some cases, can exceed 9x the number of tokens required to perform the original task.
So what are some of these overhead tokens that you should take into account and how much should you budget for them?
What People Forget to Include in Token-Based Pricing
Context Tokens
A common oversight is failing to account for tokens needed for context, such as previous conversation history or additional data that helps the model understand the prompt better. These context tokens ensure more accurate outputs and increase the token count. In the case of a chatbot, you should consider the typical chat lengths and how long each query and response is in the chat. This will give you a good starting point to estimate the number of context tokens.
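A starting estimate for context tokens can be sketched from your chat statistics. The figures below (tokens per word, turns per chat, words per turn) are illustrative assumptions; you should replace them with measurements from your own chat logs.

```python
# Back-of-the-envelope context budget for a chatbot.
# All numbers here are illustrative assumptions, not measured values.
TOKENS_PER_WORD = 1.3        # rough ratio for English text

avg_words_per_turn = 40      # one query or one response
avg_prior_turns = 6          # queries + responses already in the thread

# Every new query re-sends the prior conversation as context,
# so context tokens scale with how deep the thread already is.
context_tokens = int(avg_prior_turns * avg_words_per_turn * TOKENS_PER_WORD)
print(context_tokens)  # 312
```

Note that because the whole history is re-sent on every turn, context cost grows roughly quadratically over the life of a conversation, not linearly.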
Metadata and System Prompts
User metadata and system prompts, including instructions or examples given to the LLM for few-shot learning, add to the token count. These are essential for the model to perform tasks accurately and should not be overlooked.
System prompts can be extensive, especially when setting up complex tasks. System prompts also include multiple sample input-output pairs when doing few-shot prompting. These are often overlooked but can quickly add up, more than tripling the initial token estimate.
Metadata includes any additional context regarding the query. For instance, the current date and time, the name of the user and their details and so on.
Evaluation Tokens
Ensuring the LLM's outputs meet quality standards is non-negotiable. However, we rarely see proper guardrails and output evaluations in production GenAI applications. Implementing evaluation metrics and guardrails to prevent off-topic or harmful content adds layers of token usage that must be accounted for in the initial cost analysis.
Unfortunately, they consume a significant number of tokens. In our in-depth look at RAGAs, a popular library for evaluating RAG applications, we found there to be a token overhead of about 9x compared to the actual RAG query and answer pair. Read more about our findings below:
Death by RAG Evals
Real-world Applications
Deploying LLM apps to production is messy. You have no idea how your users will interact with your app. You should account for regeneration, error handling, prompt attacks and people using the app maliciously. These further increase the token estimates.
Estimating the Price of a GenAI Chatbot
Consider a hypothetical customer service chatbot that analyzes and responds to customer queries. For simplicity, we’ll assume each interaction is a dialogue comprising an initial customer query and a response from the AI.
Estimating Tokens for Context
Firstly, we must account for the context. If the AI needs to review previous interactions to understand the current query better, the token count will increase. A typical support chat thread will contain 200-250 words or about 500 tokens.
Estimating Tokens for System Prompts
Next, we need to estimate the tokens for the system prompt and few-shot examples. These will consume at least 1000 tokens. However, a good rule of thumb is to allocate at least 4x the number of your input and output tokens (considering at least 3 few-shot examples).
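The 4x rule of thumb from above can be sanity-checked in one line, using the 100-input / 200-output allocation this example settles on:

```python
# Rule of thumb: budget at least 4x your input + output tokens
# for the system prompt and few-shot examples.
input_tokens, output_tokens = 100, 200
system_prompt_budget = 4 * (input_tokens + output_tokens)
print(system_prompt_budget)  # 1200 -- comfortably above the 1000-token floor
```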
Estimating Tokens for Input and Output
For the input (customer's query) and the bot’s response, let's allocate 100 and 200 tokens respectively.
Total Tokens per Interaction
Adding these up (500 context + 1,000 system prompt + 100 input + 200 output), we have a total of 1,800 tokens per interaction.
For a business with 10 million interactions yearly, this would sum up to 18 billion tokens.
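The per-interaction budget and annual total above can be sketched as:

```python
# Token budget per interaction, using the estimates from this section.
context_tokens = 500         # prior chat history re-sent with each query
system_prompt_tokens = 1000  # instructions + few-shot examples
input_tokens = 100           # the customer's query
output_tokens = 200          # the bot's reply

tokens_per_interaction = (context_tokens + system_prompt_tokens
                          + input_tokens + output_tokens)
interactions_per_year = 10_000_000

annual_tokens = tokens_per_interaction * interactions_per_year
print(tokens_per_interaction)  # 1800
print(annual_tokens)           # 18_000_000_000 -> 18 billion
```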
Cost Calculation
Currently, GPT-3.5 costs $0.0005 per 1K input tokens and $0.0015 per 1K output tokens. GPT-4 Turbo costs $0.01 per 1K input tokens and $0.03 per 1K output tokens.
Since our application is input-token-heavy, we will simplify by pricing all tokens at the input rate.
That works out to $0.50 per 1 million tokens for GPT-3.5, so our annual cost would be 18,000 (millions of tokens) x $0.50 = $9,000.
For GPT-4 Turbo, however, the cost is $10 per 1 million tokens, so the annual cost becomes 18,000 x $10 = $180,000.
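The same calculation, written out (input-token rates only, per the simplification above):

```python
# Annual cost at the input-token rates quoted above (USD per 1K tokens).
annual_tokens = 18_000_000_000
price_per_1k = {"gpt-3.5": 0.0005, "gpt-4-turbo": 0.01}

for model, rate in price_per_1k.items():
    annual_cost = annual_tokens / 1000 * rate
    print(f"{model}: ${annual_cost:,.0f}")
# gpt-3.5: $9,000
# gpt-4-turbo: $180,000
```

Note that model prices change frequently, so treat the rates as a snapshot rather than a constant.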
At the GPT-4 level of costs, you can hire 2 or more people to analyse the chatbot outputs manually! Or even build your own custom model to do the analysis!
Don’t forget the overheads!
These calculations do not yet consider the evaluation of outputs, guardrails, or any sophisticated analysis that may require even more tokens.
Accounting for these, you can expect to pay at least another 9x the value we calculated in the last section. That equates to an extra $81,000 for GPT-3.5 and $1.62M for GPT-4 Turbo!
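Applying that ~9x overhead multiplier (taken from the RAGAs evaluation findings above) to the base figures gives the full picture:

```python
# Base annual costs from the previous section (input-rate simplification).
base_cost = {"gpt-3.5": 9_000, "gpt-4-turbo": 180_000}
OVERHEAD_MULTIPLIER = 9  # evaluation, guardrails, regeneration, etc.

for model, cost in base_cost.items():
    overhead = cost * OVERHEAD_MULTIPLIER
    print(f"{model}: ${overhead:,} extra, ${cost + overhead:,} total")
# gpt-3.5: $81,000 extra, $90,000 total
# gpt-4-turbo: $1,620,000 extra, $1,800,000 total
```

The 9x figure is a lower bound from one evaluation setup; your own overhead depends on how many guardrail and evaluation passes each interaction triggers.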
Unfortunately for GenAI apps, the devil is in the details—or in this case, the tokens. Token costs can be like icebergs—what you see is just the tip, and there’s a whole mountain of expenses lurking beneath the surface.