On January 25, 2024 we released two new embedding models: text-embedding-3-small and text-embedding-3-large. These are our newest and most performant embedding models, with lower costs, higher multilingual performance, and a new parameter for shortening embeddings.
What's different about the latest embeddings models?
Our latest v3 models provide stronger performance on common benchmarks at a reduced price. You can read more about the performance improvements in the announcement blog post and developer documentation.
How can I tell how many tokens a string will have before I try to embed it?
You can use OpenAI's tiktoken package to check how many tokens a string will have. Learn more in our embeddings developer guide.
How can I retrieve K nearest embedding vectors quickly?
For searching over many vectors quickly, we recommend using a vector database.
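For smaller collections, an exact search in memory is often enough before reaching for a vector database. A minimal NumPy sketch (function and variable names are illustrative), relying on the fact that OpenAI embeddings are unit-normalized so a dot product gives cosine similarity:

```python
import numpy as np

def k_nearest(query: np.ndarray, vectors: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k rows of `vectors` most similar to `query`.

    Assumes `query` and the rows of `vectors` are unit-normalized
    (as OpenAI embeddings are), so the dot product equals cosine similarity.
    """
    scores = vectors @ query           # cosine similarities, shape (n,)
    return np.argsort(-scores)[:k]     # indices of the top-k scores

# Toy usage with random unit vectors standing in for embeddings
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 8))
db /= np.linalg.norm(db, axis=1, keepdims=True)
q = db[42]                             # query identical to row 42
print(k_nearest(q, db, k=3))           # row 42 should rank first
```

This brute-force search is O(n) per query; a vector database adds approximate-nearest-neighbor indexing so large collections can be searched in sublinear time.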
Which distance function should I use?
We recommend cosine similarity. The choice of distance function typically doesn’t matter much.
OpenAI embeddings are normalized to length 1, which means that:
- Cosine similarity can be computed slightly faster using just a dot product
- Cosine similarity and Euclidean distance will produce identical rankings
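Both properties can be verified directly; a quick NumPy sketch using random unit vectors in place of real embeddings:

```python
import numpy as np

# Random unit vectors standing in for normalized embeddings
rng = np.random.default_rng(1)
emb = rng.normal(size=(10, 4))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
query = emb[0]

dot = emb @ query
cos = dot / (np.linalg.norm(emb, axis=1) * np.linalg.norm(query))  # full cosine formula
euclid = np.linalg.norm(emb - query, axis=1)

# For unit vectors the dot product *is* the cosine similarity...
assert np.allclose(dot, cos)
# ...and sorting by descending similarity matches sorting by ascending distance
assert (np.argsort(-dot) == np.argsort(euclid)).all()
```

The ranking equivalence follows from the identity ||a − b||² = 2 − 2(a·b) for unit vectors: Euclidean distance is a monotone decreasing function of the dot product, so it orders candidates identically.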