# Understanding Tokens

### What Are Tokens?

Tokens are the basic units that AI models use to process text. When you send a message to an AI model, your text is broken down into tokens before the model can read and respond to it. A token can be as short as a single character or as long as a full word, depending on the language and context. As a rule of thumb, one token is roughly four characters of English text, or about three-quarters of a word. For example, the sentence "Hello, how are you today?" breaks down into roughly 7 tokens.
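The four-characters-per-token rule of thumb can be sketched as a quick estimator. This is only a heuristic for budgeting purposes; real tokenizers (such as OpenAI's `tiktoken` library) give exact, model-specific counts and will differ slightly:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4-characters-per-token heuristic.

    Real tokenizers are model-specific and exact; this is only a
    ballpark figure for planning how much text fits in a request.
    """
    return max(1, round(len(text) / 4))

# The heuristic gives 6 for the example sentence; the true count
# depends on the model's tokenizer (roughly 7 here).
print(estimate_tokens("Hello, how are you today?"))
```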

***

Every AI model has a **context window** — the maximum number of tokens it can process in a single interaction. This context window includes both the input (your message, any system instructions, and conversation history) and the output (the model's response). Think of it like a whiteboard with a fixed amount of space: everything the model reads and writes has to fit on that whiteboard.

**Token usage matters because AI providers charge based on the number of tokens consumed**, and each model has hard limits on how many tokens it can handle at once.

***

### Token-Related Errors

Because Verapath is platform-agnostic and supports multiple AI providers, the error messages you see related to tokens will come directly from the underlying provider (OpenAI, Anthropic, Google, etc.) rather than from Verapath itself. Below are the most common token-related errors you may encounter.

#### 1. Context Length Exceeded

This is the most common token error. It occurs when the total size of your input (your message plus any conversation history, attached documents, or system instructions) exceeds the model's maximum context window.

**What it looks like:**

* **OpenAI:** `"This model's maximum context length is 128000 tokens. However, your messages resulted in 135000 tokens. Please reduce the length of the messages."`
* **Anthropic (Claude):** Returns an HTTP 400 error indicating the request exceeds the model's context limit.

**What to do:** Reduce the size of your input. This typically means shortening your message, starting a new conversation to clear the history, or removing attached documents. If you are working with large documents, consider breaking them into smaller sections.
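Breaking a large document into smaller sections can be automated. The sketch below splits on paragraph boundaries so each chunk stays under a rough token budget, using the same 4-characters-per-token heuristic described earlier (a single paragraph longer than the budget would still exceed it and would need finer splitting):

```python
def split_into_chunks(text: str, max_tokens: int,
                      chars_per_token: int = 4) -> list[str]:
    """Split text into chunks that each fit under a rough token budget.

    Splits on blank-line paragraph boundaries; uses the
    ~4-characters-per-token heuristic rather than a real tokenizer.
    """
    max_chars = max_tokens * chars_per_token
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate        # paragraph still fits in this chunk
        else:
            if current:
                chunks.append(current)  # close the full chunk
            current = para              # start a new chunk
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be submitted as a separate request, optionally with a short note telling the model which section of the document it is reading.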

#### 2. Rate Limit Exceeded

This error occurs when too many requests or too many tokens are sent to the AI provider within a short period of time. Providers enforce rate limits to ensure fair usage across all their customers.

**What it looks like:**

* **OpenAI:** HTTP 429 error — `"Rate limit reached for [model] on tokens per min (TPM). Limit: [X], Requested: [Y]. Please try again in [Z]s."`
* **Anthropic (Claude):** HTTP 429 error — `"This request would exceed your organization's rate limit of [X] input tokens per minute."`

**What to do:** Wait a moment and try again. If you are processing large volumes of data, consider spacing out your requests. This is not an error with your content — it simply means the provider needs a brief pause before handling more requests.
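If you are calling a provider's API directly, "wait and try again" is usually implemented as exponential backoff. A minimal sketch, where `RateLimitError` is a stand-in for whatever 429-style exception your provider's SDK actually raises:

```python
import time


class RateLimitError(Exception):
    """Stand-in for a provider's HTTP 429 rate-limit error."""


def call_with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Retry fn with exponential backoff when it signals a rate limit.

    Waits base_delay, then 2x, 4x, ... between attempts; re-raises
    after the final attempt fails.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Real SDKs often build this in (and the provider's error may include a suggested wait time), so check your client library before rolling your own.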

#### 3. Output Truncated (Max Tokens)

This occurs when the model's response is cut off because it hit the maximum output token limit before it could finish. The response you receive will be incomplete.

**What it looks like:**

* **OpenAI:** The response will have a `finish_reason` of `"length"` instead of `"stop"`, indicating the output was cut short.
* **Anthropic (Claude):** The response will have a `stop_reason` of `"max_tokens"`, indicating the model reached its output limit before completing its answer.

**What to do:** If your response appears to be cut off mid-sentence or is clearly incomplete, try asking the model to continue its response, or ask a more specific question that requires a shorter answer. You can also request that the model summarize or condense its response.
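Truncation can also be detected programmatically by inspecting the finish field named above for each provider. The response dicts below are simplified sketches of the real payloads, but the field names (`finish_reason`, `stop_reason`) match the documented OpenAI and Anthropic response formats:

```python
def was_truncated(response: dict, provider: str) -> bool:
    """Check whether a response was cut off at the output-token limit.

    `response` is a simplified dict sketch of the provider's payload.
    """
    if provider == "openai":
        return response["choices"][0]["finish_reason"] == "length"
    if provider == "anthropic":
        return response["stop_reason"] == "max_tokens"
    raise ValueError(f"unknown provider: {provider}")

# A complete OpenAI response ends with finish_reason "stop":
print(was_truncated({"choices": [{"finish_reason": "stop"}]}, "openai"))
# An Anthropic response cut off at the limit reports "max_tokens":
print(was_truncated({"stop_reason": "max_tokens"}, "anthropic"))
```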

***

### Tips for Managing Token Usage

* **Start fresh conversations** when switching topics. Long conversation histories accumulate tokens and can push you toward context limits.
* **Be specific in your questions.** Concise, targeted prompts use fewer tokens and tend to produce more focused responses.
* **Break large documents into sections** rather than submitting them all at once.
* **If a response seems incomplete**, ask the model to continue or rephrase your question to get a more concise answer.
