QueryPal

Tokens are the smallest units of meaning in a language and are used by language models to understand the structure and meaning of a text. In Natural Language Processing (NLP), tokens are typically created by segmenting a sentence or a document into individual words or other meaningful units, such as phrases or named entities. This tokenization process allows Large Language Models (LLMs) to process and analyze large volumes of text data, making it possible to perform tasks such as language translation, text classification, sentiment analysis, and more.

Generally, you won’t tokenize on your own, but you should be aware that there are always many ways to tokenize a single sentence! Suppose you want to tokenize “Let’s tokenize! Isn’t this easy?”

What are token limits?

The token limit is the maximum number of tokens that can be used in the prompt and the completion of the model. Most LLMs have token limits, which refer to the maximum number of tokens that the model can process at once. The token limit is determined by the architecture of the model. Have you heard of GPT4 or GPT3.5? Each model has a different token limit, even though they all originate from OpenAI.

You can take a look here for token limits based on each model: https://platform.openai.com/docs/models/

As you can see, GPT4 has a higher token limit than GPT3.

As you can imagine if you have a long document or a lot of text to summarize or to Q/A on top of, this can pose a problem. We have two mitigation strategies to help.

Summarization:

Let’s say you want to summarize all Slack conversations that occurred in the last week. If you have a chatty team, it’s definitely over the token limit. You can overcome this by splitting up, or chunking, the conversation and summarizing each chunk. You can even create a summary on top of them all.

Tips:

Think carefully about how you want to chunk your text. It will vary based on the type of text, but you should keep related content together in the same chunk. It could be time-based, user-based, or just a simple window-based chunking.

Summarization prompts: Consider what is important to you, so the LLM can retain that information in your summaries. Do you want to retain information about the user? Do you want to make sure any links or code-shared are preserved and not just summarized?

This approach is generally simple but may not work for all use cases because it does lead to information loss.

Build a Knowledgebase:

To summarize and ask questions for a large corpus of text, another strategy is creating and then querying a knowledge base built on your data. You can store each data point (this can be a conversation, a paragraph in a book, etc.) as an embedding. In simple terms, an embedding is a representation of text strings, that can help us measure relatedness. You can create these embeddings using openAI APIs.

Learn more about embeddings here: https://platform.openai.com/docs/guides/embeddings/what-are-embeddings

You can store these embeddings in a vector database like Pinecone, ElasticSearch, etc. Now based on a question, you can retrieve the top N similar embeddings based on similarity score like cosine and use that as context to ask your question.

The prompt can be as simple as :

Tips:

Keep the data points not too long, so you don’t go over the context limit.
Most of the vector databases have the query retrieval process implemented, so you don’t have to worry about the ranking.
Along with the embeddings, you can store metadata like timestamps, users involved, etc. so that you can filter the embeddings and then perform a similarity search.

We’ve learned these techniques while building QueryPal, a Slackbot that can help with all your DevOps needs. An ever-present issue in DevOps is ensuring that the on-call engineer has all the necessary context to solve their problem- a case tailor-made for building a knowledge base.

We hope you find these tips helpful. If you’re interested in learning more about how you can build products using LLMs follow us on Twitter or Linkedin, or just reach out to us at llm@querypal.com.

What are tokens?

What are token limits?

Summarization:

Build a Knowledgebase: