Embeddings - Frequently Asked Questions

FAQ for the new and improved embedding models


On January 25, 2024, we released two new embedding models: text-embedding-3-small and text-embedding-3-large. These are our newest and most performant embedding models, with lower costs, stronger multilingual performance, and a new parameter for shortening embeddings.

What's different about the latest embeddings models?

Our latest v3 models provide stronger performance on common benchmarks at a reduced price. You can read more about the performance improvements in the announcement blog post and developer documentation.

How can I tell how many tokens a string will have before I try to embed it?

You can use OpenAI's tiktoken package to count how many tokens a string will encode to before sending it to the API. Learn more in our embeddings developer guide.

How can I retrieve K nearest embedding vectors quickly?

For searching over many vectors quickly, we recommend using a vector database.
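For smaller collections, a brute-force search is often enough before reaching for a vector database. A minimal sketch with NumPy, assuming the embeddings are unit-length (as OpenAI embeddings are), where `k_nearest` is an illustrative helper name:

```python
import numpy as np

def k_nearest(query: np.ndarray, vectors: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k rows of `vectors` most similar to `query`.

    For unit-length embeddings, the dot product equals cosine similarity,
    so a single matrix-vector product scores every candidate at once.
    """
    scores = vectors @ query            # shape: (num_vectors,)
    return np.argsort(-scores)[:k]      # indices of the k highest scores
```

A dedicated vector database becomes worthwhile once the collection is too large to scan exhaustively on every query.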

Which distance function should I use?

We recommend cosine similarity. The choice of distance function typically doesn’t matter much.

OpenAI embeddings are normalized to length 1, which means that:

  • Cosine similarity can be computed slightly faster using just a dot product

  • Cosine similarity and Euclidean distance produce identical rankings
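Both points above follow from the identity ||a − b||² = 2 − 2(a · b) for unit vectors, so Euclidean distance is a monotone function of cosine similarity. A small sketch with simulated unit-length vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate two unit-length (normalized) embeddings.
a = rng.normal(size=8)
a /= np.linalg.norm(a)
b = rng.normal(size=8)
b /= np.linalg.norm(b)

cosine = a @ b                        # dot product == cosine similarity here
euclidean_sq = np.sum((a - b) ** 2)   # squared Euclidean distance

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b), so rankings agree.
assert np.isclose(euclidean_sq, 2 - 2 * cosine)
```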
