How can I improve latencies around the text generation models?

Written by Shay Atwood
Updated over a week ago

The latency of a completion request is primarily determined by two factors: the model used and the number of tokens generated. Please read our updated documentation, Guidance on improving latencies, for detailed recommendations.
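As a rough sketch of how these two levers show up in practice (assuming the OpenAI Python SDK, openai >= 1.0, an API key in the environment, and an illustrative model name), you can select a smaller model and cap the number of generated tokens directly on the request:

```python
# Minimal sketch, not a definitive recommendation: model name and prompt are illustrative.
# Assumes the OpenAI Python SDK (openai >= 1.0) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # smaller models generally start and finish responding sooner
    messages=[{"role": "user", "content": "Summarize our refund policy in one sentence."}],
    max_tokens=50,        # capping generated tokens bounds the time spent generating
)

print(response.choices[0].message.content)
```

The documentation linked above covers further options, such as streaming responses and reducing the number of completion requests per task.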