Embeddings - Frequently Asked Questions
FAQ for the new and improved embedding models

How can I tell how many tokens a string will have before I try to embed it?

For V2 embedding models, as of Dec 2022, there is not yet a way to split a string into tokens. The only way to get total token counts is to submit an API request.

  • If the request succeeds, you can extract the number of tokens from the response: `response["usage"]["total_tokens"]`

  • If the request fails for having too many tokens, you can extract the number of tokens from the error message: `This model's maximum context length is 8191 tokens, however you requested 10000 tokens (10000 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.`
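Both cases can be handled in code. The following is a minimal sketch, assuming the response has already been parsed into a dictionary and that the error message follows the wording quoted above. The helper names `tokens_from_response` and `tokens_from_error` are hypothetical, and the regular expression would need adjusting if the message format changes.

```python
import re
from typing import Optional


def tokens_from_response(response: dict) -> int:
    """Read the token count reported by a successful embeddings response."""
    return response["usage"]["total_tokens"]


# Assumption: the error text contains "... you requested N tokens ...",
# as in the example message quoted above.
_REQUESTED_TOKENS = re.compile(r"you requested (\d+) tokens")


def tokens_from_error(message: str) -> Optional[int]:
    """Extract the requested token count from a context-length error message.

    Returns None when the message does not match the assumed wording.
    """
    match = _REQUESTED_TOKENS.search(message)
    return int(match.group(1)) if match else None
```

Returning `None` for an unrecognized message lets the caller distinguish a context-length failure from other errors instead of raising on an unexpected format.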

For V1 embedding models, which are based on GPT-2/GPT-3 tokenization, you can count tokens in a few ways:

How can I retrieve K nearest embedding vectors quickly?

For searching over many vectors quickly, we recommend using a vector database.

Vector database options include:

  • Pinecone, a fully managed vector database

  • Weaviate, an open-source vector search engine

  • Faiss, an open-source library for vector similarity search by Facebook
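For small collections, or as a baseline to compare a vector database against, an exact brute-force search may be enough. The sketch below is an illustration only and is not how any of the products above work internally: it scores every stored vector against the query, so its cost grows linearly with the collection size. It assumes all vectors are normalized to length 1, and the function names `dot` and `k_nearest` are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def k_nearest(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (index, similarity) pairs for the k most similar vectors, best first.

    Assumption: every vector has length 1, so the dot product equals
    cosine similarity.
    """
    scored = ((i, dot(query, v)) for i, v in enumerate(vectors))
    # nlargest keeps only k candidates in memory while scanning all vectors.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

For example, `k_nearest(query, stored_vectors, 5)` returns the indices and similarity scores of the five closest stored vectors. Once the collection is large enough that scanning every vector per query is too slow, an indexed or approximate search such as the options listed above becomes the better fit.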

Which distance function should I use?

We recommend cosine similarity. The choice of distance function typically doesn’t matter much.

OpenAI embeddings are normalized to length 1, which means that:

  • Cosine similarity can be computed slightly faster using just a dot product

  • Cosine similarity and Euclidean distance will result in identical rankings
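Both points follow from the fact that, for vectors a and b of length 1, cosine similarity reduces to the dot product a · b, and the squared Euclidean distance is ||a - b||^2 = 2 - 2(a · b). Distance therefore shrinks exactly as similarity grows. The sketch below checks this numerically on small made-up vectors; the helper names are hypothetical, and the toy vectors stand in for real embeddings.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """General cosine similarity, valid for vectors of any length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def rank_by_cosine(query, candidates) -> List[int]:
    """Candidate indices ordered from most to least similar."""
    return sorted(range(len(candidates)),
                  key=lambda i: -cosine_similarity(query, candidates[i]))


def rank_by_euclidean(query, candidates) -> List[int]:
    """Candidate indices ordered from nearest to farthest."""
    return sorted(range(len(candidates)),
                  key=lambda i: math.dist(query, candidates[i]))


# Toy data: unit-length vectors standing in for embeddings.
query = normalize([1.0, 2.0, 3.0])
candidates = [normalize(v) for v in ([3.0, 2.0, 1.0], [1.0, 2.0, 2.5], [-1.0, 0.5, 0.0])]

same_ranking = rank_by_cosine(query, candidates) == rank_by_euclidean(query, candidates)
```

Here `same_ranking` evaluates to `True`. For unit vectors, `dot` can be used in place of `cosine_similarity`, which skips the two norm computations.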

