How to estimate the cost of LLM usage

Niha Chogle

Nov 07, 2024

Large Language Models (LLMs) such as OpenAI's GPT series, Google's Bard, and Meta's LLaMA can significantly enhance your applications, providing everything from automated content generation to advanced problem-solving capabilities. But, as with any service, these tools come at a cost. Understanding how to estimate the cost of LLM usage is crucial for businesses, developers, and anyone interested in integrating AI into their projects.

Notable LLMs and their parameter counts

Here’s an overview of some of the most well-known LLMs, their architectures and parameter counts, and how pricing tends to vary with model size and architecture.

1. GPT-4 (OpenAI)

Parameters: Not officially disclosed. Unofficial estimates put the total at about 1.77 trillion (a Mixture of Experts architecture with 8 experts of roughly 222B parameters each).

Pricing: GPT-4 is a highly sophisticated model, and its operating costs are higher than those of smaller models like GPT-3. OpenAI publishes per-token prices for GPT-4: at launch, the 8K-context model was listed at $0.03 per 1k input tokens and $0.06 per 1k output tokens, far above the roughly $0.002 per 1k tokens charged for GPT-3.5-turbo. Prices change often, so check the provider's current pricing page before budgeting.
Pricing Insight: Larger models like GPT-4 often have higher per-token costs due to the computational resources needed for inference, including memory and processing power.

2. GPT-3 (OpenAI)

Parameters: 175 billion

Pricing: GPT-3.5-turbo, the GPT-3 successor that powers ChatGPT, is available at $0.002 per 1k tokens on OpenAI’s API platform.
Pricing Insight: While GPT-3 is still quite large and capable, it is priced lower than GPT-4 because it has fewer parameters and requires less computation per token.

3. PaLM (Google)

Parameters: 540 billion (for PaLM-540B)

Pricing: Exact pricing for PaLM-540B is not publicly available. It is reasonable to assume that a model of this size would be priced higher than GPT-3 models, perhaps around $0.01 per 1k tokens or more, depending on usage.
Pricing Insight: As with GPT-4, large-scale models like PaLM are typically more expensive to run due to the resources required to support models of that size.

4. GPT-Neo (EleutherAI)

Parameters: 1.3B to 2.7B (for GPT-Neo models)

Pricing: Since EleutherAI's models are open-source, they are free to use for anyone with the computing resources to run them. For hosted API access, inference platforms such as Hugging Face may charge based on the model's size and the hardware it runs on.
Pricing Insight: Open-source models like GPT-Neo have no direct token cost, but if you're using a commercial cloud service to run the model, you'll still incur infrastructure costs.

5. LaMDA (Google)

Parameters: 137 billion

Pricing: While specific pricing isn’t disclosed, Google’s LaMDA model is likely priced similarly to other models of this scale, in the range of $0.01 to $0.03 per 1k tokens depending on usage.
Pricing Insight: Large conversational models like LaMDA, which specialise in dialogue, will often have higher pricing than general-purpose models, as they require more training data and optimization for realistic conversations.

6. BLOOM (BigScience)

Parameters: 176 billion

Pricing: BLOOM is open-source and available for free, but commercial cloud services might charge infrastructure costs for running it. These costs could range from $0.005 to $0.02 per 1k tokens depending on the service provider.
Pricing Insight: Open-source models like BLOOM are generally free to use, but hosting or running them on cloud platforms comes with added costs for computational resources.
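
To compare a self-hosted open-source model with API pricing, the infrastructure cost can be converted into an effective price per 1k tokens. The following is a minimal Python sketch of that conversion. The function name, the hourly instance price, and the throughput figure are hypothetical placeholders, not measurements of any real model or provider.

```python
def self_hosted_cost_per_1k_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Effective cost per 1,000 tokens when paying for compute by the hour.

    Assumes the machine is busy the whole time; idle time raises the real cost.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1000


# Hypothetical figures: a $4.00/hour GPU instance generating 200 tokens/second.
cost = self_hosted_cost_per_1k_tokens(hourly_cost_usd=4.00, tokens_per_second=200)
print(f"${cost:.4f} per 1k tokens")
```

Because the result depends heavily on utilisation, a lightly used self-hosted model can end up costing more per token than a paid API.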

What affects the cost of LLM usage?

To estimate the cost effectively, you need to understand the key factors that contribute to the cost structure of LLM usage:

1. Token Usage

Understanding tokens in simple terms

In the world of Large Language Models (LLMs) like GPT-3 and GPT-4, a token is a basic unit of text that the model understands and processes. Tokens can be as small as a single character or as large as an entire word, depending on the language and how the text is broken down. To put it simply, a token is a piece of text, and the model "reads" these pieces to understand and generate language.

What is a token?

Token = A chunk of text.

A token can be a word, part of a word, or even punctuation. For example, "Hello!" might be counted as two tokens: "Hello" and "!".

LLMs process and generate text by working with tokens. The more tokens a model processes, the more it costs, because costs are based on token usage.

For example, the word "hello" is typically one token, while the phrase "I am learning about LLM costs" would likely be several tokens. Providers count tokens in both directions:

Input Tokens: These are the tokens you send to the model (the prompt).

Output Tokens: These are the tokens the model generates in response.

Simple example: Counting tokens

Let's say you're interacting with a language model and you want to understand how tokens are counted. Here’s an example:

Input: "Hello, how are you today?"

Output: "I'm doing well, thank you!"

INPUT:

This sentence has 5 words, but how many tokens does it contain? When broken down into tokens by the model, it might look like this:

"Hello" → 1 token

"," (comma) → 1 token

"how" → 1 token

"are" → 1 token

"you" → 1 token

"today" → 1 token

"?" (question mark) → 1 token

So in total, the sentence has 7 tokens.

OUTPUT:

This sentence also has 5 words. When broken down into tokens by the model, it might look like this:

"I'm" → 1 token

"doing" → 1 token

"well" → 1 token

"," → 1 token

"thank" → 1 token

"you" → 1 token

"!" → 1 token

This makes 7 tokens for the output.

Total Token Count:

7 tokens (input) + 7 tokens (output) = 14 tokens.
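
The hand count above can be reproduced with a few lines of Python. This sketch splits text into words and punctuation marks, which mirrors the simplified counting used in this example. It is an illustration only: real models use subword tokenizers, so their counts will usually differ, and the `simple_tokenize` helper is a hypothetical name, not part of any provider's API.

```python
import re

# A word (optionally with a contraction such as "I'm") or a single
# punctuation mark. Simplified on purpose; not a real model tokenizer.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def simple_tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


prompt = "Hello, how are you today?"
reply = "I'm doing well, thank you!"

input_tokens = simple_tokenize(prompt)
output_tokens = simple_tokenize(reply)

print(input_tokens)
print(output_tokens)
print(len(input_tokens) + len(output_tokens))  # 7 + 7 = 14
```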

Why do tokens matter?

Costs: Language models like GPT-3 or GPT-4 charge based on the number of tokens you use. The more tokens you send in a prompt and receive in the response, the higher the cost.

Efficiency: Understanding how tokens work helps you optimize your prompts. Shorter prompts and more focused requests can reduce token usage and save money.

Key points to remember:

1 Token ≠ 1 Word: Sometimes, 1 word can be multiple tokens. For example, "unhappiness" might be broken down into "un", "happiness" — 2 tokens.
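Punctuation is a Token: Punctuation marks (., !, ?, etc.) usually count as tokens of their own.
Spaces Matter: Whitespace affects tokenization. Many tokenizers attach the space before a word to that word's token rather than counting it separately, so the same word with and without a leading space can map to different tokens.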

2. Model type and size

Different models have different pricing. Larger, more powerful models (like GPT-4) typically cost more than smaller models (like GPT-3). The more complex the model, the higher the cost per token.

3. Model usage frequency

How often you interact with the model also impacts cost. Frequent calls (requests) to the API will accumulate more charges, especially if each call requires a large amount of processing.

Key metrics you need to calculate costs

Tokens per request

The first step is to estimate how many tokens your requests will involve. This is usually based on:

Prompt length (number of tokens you send).

Response length (number of tokens the model generates).

Many LLM providers offer tokenizer tools that count the tokens in a given text. As a general rule of thumb for English, one token is roughly four characters or three-quarters of a word, and shorter, simpler prompts consume fewer tokens.

Example:

1 Request = 100 tokens (input) + 150 tokens (output) = 250 tokens total.
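
When no tokenizer is at hand, the tokens per request can be approximated from the prompt text. The sketch below assumes a rough figure of four characters per token, a commonly quoted rule of thumb for English text that is an assumption here rather than an exact value. The function names are hypothetical, and a provider's own tokenizer should be used when accuracy matters.

```python
import math

# Assumed rule of thumb for English text; varies by model and language.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_request_tokens(prompt: str, expected_output_tokens: int) -> int:
    """Estimated input tokens plus the output tokens you expect back."""
    return estimate_tokens(prompt) + expected_output_tokens


prompt = "Summarise the following customer review in two sentences: ..."
print(estimate_request_tokens(prompt, expected_output_tokens=150))
```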

Monthly usage estimate

Consider how many requests you'll be making monthly. Multiply the number of tokens per request by the cost per token, then multiply by the number of requests you'll make each month.

Example calculation:

- Tokens per request: 250 tokens.

- Cost per token (GPT-3): $0.002 per 1,000 tokens.

- Monthly requests: 10,000 requests.

Cost per month = 250 tokens x 10,000 requests x ($0.002 / 1,000)

= $5 per month.
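
The same calculation can be written as a small Python function, which makes it easy to try different volumes and prices. This is a minimal sketch of the formula above; the function and parameter names are chosen for illustration.

```python
def monthly_cost(tokens_per_request: int, requests_per_month: int,
                 price_per_1k_tokens: float) -> float:
    """Monthly cost = total tokens / 1,000 x price per 1,000 tokens."""
    total_tokens = tokens_per_request * requests_per_month
    return total_tokens / 1000 * price_per_1k_tokens


cost = monthly_cost(tokens_per_request=250, requests_per_month=10_000,
                    price_per_1k_tokens=0.002)
print(f"${cost:.2f} per month")  # $5.00 per month
```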

Example of LLM cost estimation

Scenario: A content generation app

Imagine you're building an AI-powered content generation app using GPT-3. Your users submit queries, and the model generates content. Here’s how to estimate the costs:

Average tokens per request: 300 tokens (input + output).

Cost per token: $0.002 per 1,000 tokens (GPT-3).

Requests per day: 1,000 requests.

Days of usage per month: 30 days.

Monthly tokens usage:

300 tokens x 1,000 requests x 30 days = 9,000,000 tokens per month.

Monthly cost:

9,000,000 tokens x ($0.002 / 1,000) = $18 per month.
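
Some providers price input and output tokens separately, so it can help to model the scenario with the two counts kept apart. The Python sketch below does this for the content generation app. The 100/200 split of the 300 tokens per request is an assumption made for illustration, and the class and its fields are hypothetical names. With the same $0.002 rate applied to both directions it reproduces the figures above.

```python
from dataclasses import dataclass


@dataclass
class UsageScenario:
    input_tokens_per_request: int
    output_tokens_per_request: int
    requests_per_day: int
    days_per_month: int = 30

    def monthly_tokens(self) -> tuple[int, int]:
        """Return (input tokens, output tokens) used per month."""
        requests = self.requests_per_day * self.days_per_month
        return (self.input_tokens_per_request * requests,
                self.output_tokens_per_request * requests)

    def monthly_cost(self, input_price_per_1k: float,
                     output_price_per_1k: float) -> float:
        """Monthly cost with separate prices for input and output tokens."""
        input_tokens, output_tokens = self.monthly_tokens()
        return (input_tokens / 1000 * input_price_per_1k
                + output_tokens / 1000 * output_price_per_1k)


# Assumed split: 100 input tokens + 200 output tokens = 300 per request.
app = UsageScenario(input_tokens_per_request=100,
                    output_tokens_per_request=200,
                    requests_per_day=1_000)
print(sum(app.monthly_tokens()))                 # 9000000
print(f"${app.monthly_cost(0.002, 0.002):.2f}")  # $18.00
```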

Other cost considerations

1. Additional features & services

Some LLM providers offer additional services like fine-tuning the model, real-time collaboration tools, or specific integrations with cloud services. These can increase costs but may also improve the efficiency of the model for your use case.

2. Free Tier

Many LLM providers offer free trials or free-tier usage with limited tokens. If you're just getting started, make sure to check for any free tier offerings that could help you explore the service without incurring costs immediately.

3. Volume Discounts

Large-scale usage often comes with discounts. If you’re planning to use an LLM for high-volume tasks, you might be eligible for a pricing discount based on your usage.
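
Free tiers and volume discounts can be folded into the estimate as adjustments to the billable token count and the final price. The sketch below is one hypothetical way to model them in Python. The free allowance, discount threshold, and discount rate are invented for illustration, and real providers structure these offers differently.

```python
from typing import Optional


def billable_cost(total_tokens: int, price_per_1k: float,
                  free_tokens: int = 0,
                  discount_threshold: Optional[int] = None,
                  discount_rate: float = 0.0) -> float:
    """Cost after subtracting a free allowance and applying a flat discount.

    The discount applies to the whole bill once billable tokens reach
    the threshold. This is a simplification, not any provider's scheme.
    """
    billable = max(0, total_tokens - free_tokens)
    cost = billable / 1000 * price_per_1k
    if discount_threshold is not None and billable >= discount_threshold:
        cost *= 1 - discount_rate
    return cost


# Hypothetical offer: 1M free tokens, 10% off at 5M billable tokens or more.
cost = billable_cost(total_tokens=9_000_000, price_per_1k=0.002,
                     free_tokens=1_000_000,
                     discount_threshold=5_000_000, discount_rate=0.10)
print(f"${cost:.2f}")
```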

Recommendations to minimise LLM usage costs

While it's impossible to eliminate costs entirely, there are some smart ways to optimize and reduce them:

Optimise Prompts

Reduce the length of your prompts and the model’s responses. Be concise and specific to avoid unnecessary token consumption.

Batch Requests

Instead of making many small requests, try to batch them into fewer, more efficient requests. This avoids resending the same instructions or context with every call, so fewer tokens are spent on repeated overhead.
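
One way to see where batching saves tokens: if every request repeats the same instructions, that overhead is paid once per request rather than once per item. The Python sketch below compares input tokens with and without batching. All numbers and names are hypothetical, and it ignores output tokens, which batching does not reduce.

```python
import math


def input_tokens_needed(num_items: int, tokens_per_item: int,
                        instruction_tokens: int, batch_size: int = 1) -> int:
    """Total input tokens when shared instructions are sent once per request."""
    requests = math.ceil(num_items / batch_size)
    return requests * instruction_tokens + num_items * tokens_per_item


# Hypothetical workload: 1,000 items of 50 tokens, 200 tokens of instructions.
unbatched = input_tokens_needed(1_000, 50, 200, batch_size=1)
batched = input_tokens_needed(1_000, 50, 200, batch_size=10)
print(unbatched)  # 250000
print(batched)    # 70000
```

Larger batches still have to fit within the model's context window, so the batch size has a practical upper limit.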

Use lower-cost models for basic tasks

For tasks that don’t require cutting-edge performance, use smaller models like GPT-3 instead of the more expensive GPT-4.

Monitor usage regularly

Track your usage closely. Most providers offer dashboards where you can see your token usage and costs in real time. Adjust your strategy accordingly.
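
Alongside a provider's dashboard, usage can also be tracked inside your own application. The following is a minimal Python sketch of such a tracker. The `UsageTracker` class, the price, and the budget are hypothetical, and in practice the token counts would come from whatever usage figures your provider reports for each request.

```python
class UsageTracker:
    """Accumulates token usage and compares the running cost to a budget."""

    def __init__(self, price_per_1k_tokens: float, monthly_budget_usd: float):
        self.price_per_1k_tokens = price_per_1k_tokens
        self.monthly_budget_usd = monthly_budget_usd
        self.total_tokens = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add one request's input and output tokens to the running total."""
        self.total_tokens += input_tokens + output_tokens

    @property
    def cost(self) -> float:
        return self.total_tokens / 1000 * self.price_per_1k_tokens

    def over_budget(self) -> bool:
        return self.cost > self.monthly_budget_usd


tracker = UsageTracker(price_per_1k_tokens=0.002, monthly_budget_usd=10.0)
tracker.record(input_tokens=100, output_tokens=150)
print(tracker.total_tokens, tracker.over_budget())
```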

Conclusion

Estimating the cost of LLM usage can seem daunting at first, but by breaking it down into manageable steps, you can understand the various factors influencing the price. By considering token usage, the type of model you choose, and how frequently you make requests, you can predict and manage your LLM costs more effectively. Whether you're a developer, a business owner, or just someone experimenting with AI, knowing how to estimate and control these costs will help you get the most value from your investment in large language models.