Introduction
Since releasing the Answers endpoint in beta last year, we’ve developed new methods that achieve better results for this task. As a result, we’ll be removing the Answers endpoint from our documentation and shutting off access to it for all organizations on December 3, 2022. New accounts created after June 3rd will not have access to this endpoint.
We strongly encourage developers to switch over to newer techniques which produce better results, outlined below.
Current documentation
As a quick review, here are the high-level steps of the current Answers endpoint:
1. Search the provided or uploaded documents for those most relevant to the question.
2. Rank the results and add as many documents as fit in the prompt.
3. Request a completion built from those documents and the question, and return it as the answer.
Options
All of these options are also outlined here
Option 1: Transition to Embeddings-based search (recommended)
We believe that most use cases will be better served by moving the underlying search system to a vector-based embedding search. The major reason is that the current system uses a bigram filter to narrow down the scope of candidates, whereas our embeddings system has much more contextual awareness. In general, using embeddings will also be considerably lower cost in the long run. If you’re not familiar with embeddings, you can learn more by visiting our guide to embeddings.
If you’re using a small dataset (<10,000 documents), consider using the techniques described in that guide to find the best documents to construct a prompt similar to this. Then, you can just submit that prompt to our Completions endpoint.
If you have a larger dataset, consider using a vector search engine like Pinecone or Weaviate to power that search.
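As a rough sketch of the embeddings-based approach, the core ranking step is a cosine-similarity comparison between a query embedding and document embeddings. In practice the vectors would come from the Embeddings endpoint; the toy vectors below are purely illustrative so the example stays self-contained.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_embedding, doc_embeddings):
    """Return document indices sorted by similarity to the query, best first.

    In a real system these embeddings would come from the Embeddings
    endpoint; here they are plain lists of floats.
    """
    scores = [cosine_similarity(query_embedding, d) for d in doc_embeddings]
    return sorted(range(len(doc_embeddings)), key=lambda i: scores[i], reverse=True)

# Toy example: the query vector is closest to document 1.
query = [1.0, 0.0]
docs = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
ranking = rank_documents(query, docs)
```

The top-ranked documents can then be placed into a prompt and submitted to the Completions endpoint as described above.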
Option 2: Reimplement existing functionality
If you’d like to recreate the functionality of the Answers endpoint, here’s how we did it. There is also a script that replicates most of this functionality.
At a high level, there are two main ways you can use the Answers endpoint: you can source the data from an uploaded file, or send it in with the request.
If you’re using the `documents` parameter
There’s only one stage if you provide the documents in the Answers API call, since no file upload is needed. Here are roughly the steps we used:
1. Construct the prompt with this format.
2. Gather all of the provided documents. If they fit in the prompt, just use all of them.
3. Do an OpenAI search (note that this is also being deprecated and has a transition guide) where the documents are the user-provided documents and the query is the question from the request. Rank the documents by score.
4. In order of score, attempt to add the scored documents until you run out of space in the context.
5. Request a completion with the provided parameters (`logit_bias`, `n`, `stop`, etc.).
Throughout all of this, you’ll need to check that the prompt’s length doesn’t exceed the model’s token limit. To count the tokens in a prompt, we recommend GPT2TokenizerFast from the transformers library: https://huggingface.co/docs/transformers/model_doc/gpt2#transformers.GPT2TokenizerFast.
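The "add documents until you run out of space" step can be sketched as below. The `count_tokens` helper is a hypothetical stand-in (a whitespace split) so the example runs on its own; in practice you would replace it with the GPT2TokenizerFast tokenizer recommended above.

```python
def count_tokens(text):
    # Hypothetical stand-in for a real tokenizer such as GPT2TokenizerFast;
    # a whitespace split badly underestimates real token counts.
    return len(text.split())

def pack_documents(scored_docs, base_prompt, max_tokens):
    """Add documents in score order until the prompt would exceed max_tokens.

    scored_docs: list of (score, text) pairs.
    Returns the list of document texts that fit within the budget.
    """
    used = count_tokens(base_prompt)
    chosen = []
    for score, text in sorted(scored_docs, key=lambda pair: pair[0], reverse=True):
        cost = count_tokens(text)
        if used + cost > max_tokens:
            continue  # this document doesn't fit; try a smaller one
        used += cost
        chosen.append(text)
    return chosen
```

For example, with a 5-token budget and a 1-token base prompt, a 3-token top document and a 1-token third document fit, while a 2-token middle document is skipped.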
If you’re using the `file` parameter
Step 1: Upload a JSONL file
Behind the scenes, we upload new files meant for answers to an Elasticsearch cluster. Each line of the JSONL file is then submitted as a document.
If you uploaded the file with the purpose “answers,” we additionally split the documents on newlines and upload each of those chunks as separate documents to ensure that we can search across and reference the highest number of relevant text sections in the file.
Each line requires a “text” field and an optional “metadata” field.
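For illustration, a file in this format could be produced like so (the file name and contents here are made up):

```python
import json

# Each line of the JSONL file is a standalone JSON object with a required
# "text" field and an optional "metadata" field.
rows = [
    {"text": "Our return policy lasts 30 days.", "metadata": {"source": "faq"}},
    {"text": "Shipping is free on orders over $50."},
]

with open("answers.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```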
These are the Elasticsearch mappings for our index (the `document` property holds each line’s “text” field; `metadata` holds its “metadata” field):

```json
{
  "properties": {
    "document": {"type": "text", "analyzer": "standard_bigram_analyzer"},
    "metadata": {"type": "object", "enabled": false}
  }
}
```

And the analysis settings:

```json
{
  "analysis": {
    "analyzer": {
      "standard_bigram_analyzer": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["lowercase", "english_stop", "shingle"]
      }
    },
    "filter": {"english_stop": {"type": "stop", "stopwords": "_english_"}}
  }
}
```
After that, we performed standard Elasticsearch search calls and used `max_rerank` to determine the number of documents to return from Elasticsearch.
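As a sketch, the body of such a search call might look like the following. Building it as a plain dict keeps the example self-contained; in practice you would pass it to an Elasticsearch client’s search call against your index (the query text here is hypothetical).

```python
def build_search_request(query, max_rerank):
    """Build an Elasticsearch request body that matches the "document"
    field from the mappings above and caps the hit count at max_rerank."""
    return {
        "query": {"match": {"document": query}},
        "size": max_rerank,
    }

request = build_search_request("how do refunds work", max_rerank=200)
```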
Step 2: Search
Here are roughly the steps we used. Our end goal is to create a Completions request with the format shown in the Completion Prompt section below. It will look very similar to the `documents`-parameter flow above.
From there, our steps are:
1. Start with the `experimental_alternative_question` or, if that’s not provided, what’s in the `question` field. Call that the query.
2. Query Elasticsearch for `max_rerank` documents, with the query as the search parameter.
3. Take those documents and do an OpenAI search on them, where the entries from Elasticsearch are the docs and the query is the query you used above. Use the score from the search to rank the documents.
4. In order of score, attempt to add Elasticsearch documents until you run out of space in the prompt.
5. Request an OpenAI completion with the provided parameters (`logit_bias`, `n`, `stop`, etc.). Return that answer to the user.
Completion Prompt
===
Context: {{ provided examples_context }}
===
Q: example 1 question
A: example 1 answer
---
Q: example 2 question
A: example 2 answer
(and so on for all examples provided in the request)
===
Context: {{ what we return from Elasticsearch }}
===
Q: {{ user provided question }}
A:
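The Completion Prompt above can be assembled with a helper like this; a minimal sketch whose field names mirror the template, with made-up example values:

```python
def build_prompt(examples_context, examples, documents, question):
    """Assemble the completion prompt from the template above.

    examples: list of (question, answer) pairs provided in the request.
    documents: list of document texts returned from the search step.
    """
    parts = ["===", f"Context: {examples_context}", "==="]
    # Q/A examples, separated by "---" as in the template.
    qa_blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append("\n---\n".join(qa_blocks))
    # Retrieved documents, then the user's question with a trailing "A:".
    parts += ["===", "Context: " + " ".join(documents), "===", f"Q: {question}", "A:"]
    return "\n".join(parts)

prompt = build_prompt(
    "Returns are accepted within 30 days.",
    [("How long do I have to return an item?", "30 days.")],
    ["Shipping is free on orders over $50."],
    "Is shipping free?",
)
```

The resulting string ends with `A:`, leaving the model to continue with the answer.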