Search requests are billed based on the total number of tokens in the documents you provide, plus the tokens in the query and the tokens needed to instruct the model on how to perform the operation. The API also uses a reference document to generate a response, adding 1 to the total document count. These tokens are billed at the per-engine rates outlined at the top of the Pricing page.
You may provide a file containing the documents to search over, or you can explicitly specify documents in your request. Providing a file makes search faster and more cost-effective when the number of documents you'd like to search over is greater than max_rerank. In this scenario, costs are largely based on the number of documents reranked (controlled by max_rerank) and the total length of those documents. If you pass documents in your request instead, costs are based on the total length of all those documents.
Below you'll find the formula for calculating overall token consumption. The 14 represents the additional tokens the API uses per document to accomplish the Semantic Search task, and the added 1 is a reference document:
Number of tokens in all of your documents
+ (Number of documents + 1) * 14
+ (Number of documents + 1) * Number of tokens in your query
= Total tokens
As an example, if you had 5 documents (plus one added by the API) with token lengths of 12, 34, 22, 33, 78 (179 total) and your query was 8 tokens, the total tokens consumed would be:
179 + (6 * 14) + (6 * 8) = 311
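The formula above can be turned into a small estimator. This is a minimal sketch, not part of the API: the function name search_total_tokens and its parameter names are hypothetical, and it assumes you have already counted the tokens in each document and in the query with a suitable tokenizer.

```python
def search_total_tokens(document_token_counts, query_tokens, per_document_overhead=14):
    """Estimate the tokens consumed by one search request.

    document_token_counts: token length of each document you supply.
    query_tokens: token length of the query.
    per_document_overhead: the 14 extra tokens used per document.
    """
    # The API adds one reference document to the documents you supply.
    billed_documents = len(document_token_counts) + 1
    return (
        sum(document_token_counts)
        + billed_documents * per_document_overhead
        + billed_documents * query_tokens
    )


# Reproduces the worked example: 179 + (6 * 14) + (6 * 8)
print(search_total_tokens([12, 34, 22, 33, 78], query_tokens=8))  # 311
```

Multiplying the result by the per-token rate for your engine gives an approximate cost for the request.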