Search requests are billed on the total number of tokens in the documents you provide, plus the tokens in the query and the tokens needed to instruct the model on how to perform the operation. The API also uses one reference document of its own to generate a response, so the document count used for billing is one more than the number of documents you supply. These tokens are billed at the per-engine rates outlined at the top of the Pricing page.

You may provide a file containing the documents to search over, or you can explicitly specify documents in your request. Providing a file makes search faster and more cost-effective when the number of documents you'd like to search over is greater than max_rerank. In this scenario, costs are largely based on the number of documents reranked (controlled by max_rerank) and the total length of those documents. If you pass documents in your request instead, costs are based on the total length of all of those documents.
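As a rough illustration of why a file can be cheaper, the sketch below compares the two cases. It is not the official estimator. It assumes that the token formula given later in this article applies to whichever documents end up being reranked, and, because the API decides which documents those are, it takes the longest max_rerank documents to get an upper bound. The function names are hypothetical.

```python
PER_DOCUMENT_OVERHEAD = 14  # per-document overhead from the formula in this article


def search_tokens(doc_token_counts, query_tokens):
    """Token total for a set of documents, counting the API's reference document."""
    billed_documents = len(doc_token_counts) + 1
    return sum(doc_token_counts) + billed_documents * (PER_DOCUMENT_OVERHEAD + query_tokens)


def file_search_upper_bound(doc_token_counts, query_tokens, max_rerank):
    """Assumption: only reranked documents are billed; use the longest ones as a worst case."""
    longest_first = sorted(doc_token_counts, reverse=True)
    return search_tokens(longest_first[:max_rerank], query_tokens)


doc_token_counts = [12, 34, 22, 33, 78]
inline_cost = search_tokens(doc_token_counts, query_tokens=8)
file_cost = file_search_upper_bound(doc_token_counts, query_tokens=8, max_rerank=2)
print(inline_cost, file_cost)
```

When max_rerank is at least the number of documents, both estimates come out the same, which matches the note above that a file only helps once you have more documents than max_rerank.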

Below you'll find the formula for calculating overall token consumption. The 14 represents the additional tokens the API uses per document to accomplish the Semantic Search task, and the added 1 accounts for the reference document that the API supplies:

Number of tokens in all of your documents
+ (Number of documents + 1) * 14
+ (Number of documents + 1) * Number of tokens in your query
----------------------
= Total tokens

As an example, if you had 5 documents (plus one added by the API) with token lengths of 12, 34, 22, 33, and 78 (179 total) and your query was 8 tokens, the total tokens consumed would be: 179 + (6 * 14) + (6 * 8) = 311.
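The same calculation can be written as a short Python sketch. This is an illustration of the formula above rather than the official estimator; the function and constant names are made up for this example, and it assumes you already know the token count of each document and of the query.

```python
PER_DOCUMENT_OVERHEAD = 14  # extra tokens the API uses per document (from the formula above)


def estimate_search_tokens(document_token_counts, query_tokens):
    """Estimate total tokens for one search request using the formula above."""
    # The API adds one reference document of its own.
    billed_documents = len(document_token_counts) + 1
    return (
        sum(document_token_counts)
        + billed_documents * PER_DOCUMENT_OVERHEAD
        + billed_documents * query_tokens
    )


# The worked example: 5 documents and an 8-token query.
total = estimate_search_tokens([12, 34, 22, 33, 78], query_tokens=8)
print(total)  # 311
```

Note that the query is counted once per billed document, so a longer query raises the cost in proportion to the number of documents searched.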

You may use the Search token estimator or see the code from the Python estimator to further understand search token usage.
