Optimizing File Uploads in ChatGPT Enterprise
Updated over 2 months ago

ChatGPT Enterprise allows you to upload files in several ways:

This guide explains how ChatGPT Enterprise handles files based on their type, number, and size, and discusses strategies for improving outputs based on file requirements.

Summary

ChatGPT Enterprise handles each file type differently: it extracts text from documents such as PDFs, presentations, and Word files; analyzes structured data from spreadsheets using Python code; and describes image files using GPT-Vision. Understanding which file type triggers which workflow is key to getting the expected result.

For text-based documents, ChatGPT Enterprise includes as much relevant text as possible directly alongside the prompt and uses a search system to access additional information. This works well for answering specific questions, but it can struggle with complex tasks such as summarizing very large documents or comparing multiple large files. To improve accuracy, upload only the most relevant documents and keep each prompt to a single question rather than a complex request (e.g., summarization, file comparison, or several questions in one prompt).

Handling based on file type

ChatGPT Enterprise processes files in three main ways: text extraction, code analysis, and image interpretation. The file type determines which workflow ChatGPT Enterprise follows.

| | Text-Based Retrieval | Code Interpreter | Image Processing |
| --- | --- | --- | --- |
| File Type Examples | pdf, pptx, docx, txt, md, json, xml | csv, xls, xlsx* | jpg, png |
| Behavior | Extracts the text from the file. Some of the text is pasted (“stuffed”) directly into the context window; some text is stored for search. | Passes the file to Python for processing. | Uses GPT-Vision to describe the image as text (note that this model has limitations on what it can “see”). |

*Note: Code Interpreter can operate on any file type, but ChatGPT Enterprise most commonly defaults to it for the file types above.

For text-only files, image files, or clearly structured data files (e.g., an Excel table of transactions), these divisions represent the best possible behavior.
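The default routing described above can be written out as a small lookup. This is only an illustrative sketch: the function name `default_workflow`, the workflow labels, and the `"unknown"` fallback are hypothetical, and the extension lists simply repeat the examples given in this guide rather than an exhaustive or official mapping.

```python
from pathlib import Path

# Extension lists copied from the examples in this guide (not exhaustive).
DEFAULT_WORKFLOWS = {
    "text_retrieval": {".pdf", ".pptx", ".docx", ".txt", ".md", ".json", ".xml"},
    "code_interpreter": {".csv", ".xls", ".xlsx"},
    "image_processing": {".jpg", ".png"},
}


def default_workflow(filename: str) -> str:
    """Return the workflow this guide associates with the file's extension."""
    suffix = Path(filename).suffix.lower()
    for workflow, suffixes in DEFAULT_WORKFLOWS.items():
        if suffix in suffixes:
            return workflow
    # Hypothetical fallback label for extensions this guide does not mention.
    return "unknown"
```

A lookup like this is mainly useful as a pre-upload check: if a file lands in a workflow that does not match the task, convert the file or adjust the prompt before uploading.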

There are some gray areas that are less obvious, for example:

  • Images within documents are not processed. To include them, upload the images as separate files.

  • ChatGPT Enterprise will always use Code Interpreter to interact with spreadsheets, even if the document contains a large amount of text. For example, if you ask ChatGPT Enterprise to translate a CSV file with 10 rows of text, it will attempt to translate the file using a Python library, which is less accurate than allowing the model to generate a translation directly. To mitigate this, try exporting the spreadsheet to a text-based format (PDF, for example).

  • Similarly, if you upload a structured transactional table contained in a JSON file, ChatGPT Enterprise will interpret this file as plain text. If you want to analyze the data contained in a JSON file, instruct the model to use Code Interpreter in your prompt.
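To make the JSON case concrete, the following is a rough sketch of the kind of Python analysis that becomes possible once the file is treated as data rather than as plain text. The code Code Interpreter actually generates will vary; the function name `total_by_category` and the `category` and `amount` fields are hypothetical and stand in for whatever your transactional table contains.

```python
import json
from collections import defaultdict


def total_by_category(json_text: str) -> dict[str, float]:
    """Sum the 'amount' field per 'category' in a JSON array of transactions."""
    transactions = json.loads(json_text)
    totals: defaultdict[str, float] = defaultdict(float)
    for row in transactions:
        # 'category' and 'amount' are assumed field names for illustration.
        totals[row["category"]] += row["amount"]
    return dict(totals)
```

An aggregate like this is computed exactly by Python, whereas a model reading the same JSON as plain text would have to add the numbers up itself.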

Handling files based on size

ChatGPT Enterprise uses models with a maximum context window of 128k tokens (roughly 200 pages of text). However, not all tokens are used to incorporate the text from uploaded files. The number of “stuffed” tokens varies by usage type.

ChatGPT Enterprise "stuffs" some amount of text into the context window, and the remaining text is sent to a private search index (a "vector store", which is a type of database designed to efficiently store and retrieve large amounts of text). When you ask a question, ChatGPT Enterprise combines the stuffed text with relevant chunks retrieved from the private search index.

If you upload a single document, ChatGPT Enterprise includes text starting from the beginning until it reaches its limit. If you upload multiple documents, ChatGPT Enterprise includes some or all of each document. All text from the documents is also sent to a private search index.
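The retrieval half of this process can be illustrated with a toy example. The sketch below is not how ChatGPT Enterprise is implemented: it assumes simple word-count vectors and cosine similarity in place of an embedding-based vector store, and the names `chunk`, `retrieve`, and the default chunk size are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 40) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def vectorize(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    query = vectorize(question)
    ranked = sorted(chunks, key=lambda c: cosine(query, vectorize(c)), reverse=True)
    return ranked[:k]
```

The point of the sketch is that a search returns only the few chunks closest to the question, which suits a narrow question but leaves most of a long document out of view.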

Context stuffing logic

This feature is under active development. As such, the following details are subject to change without notice.

ChatGPT Enterprise can process up to 110k tokens from uploaded documents in the context window. If you upload one or more documents with a combined total of less than 110k tokens, the full content will be included.

For a single document exceeding 110k tokens, only the first 110k tokens will be included, starting from the beginning. The remainder will only be sent to the private search index.

If multiple documents are uploaded and their combined total exceeds 110k tokens, ChatGPT Enterprise uses a two-step process to balance document representation:

  1. Extract up to 55k tokens, divided evenly among the uploaded documents.

    • For example, if 10 documents are uploaded, 5.5k tokens are extracted from the beginning of each.

  2. For documents not fully represented in the first step, allocate the remaining 55k tokens proportionally based on the tokens left in each document.

    • For example, if Document A has 10k tokens remaining and Document B has 90k tokens remaining, an additional 5.5k tokens are extracted from Document A ( (10k / 100k) * 55k ), and an additional 49.5k tokens are extracted from Document B ( (90k / 100k) * 55k ).

Any text not extracted in these two steps is available only through the private search index.
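The allocation rules above can be worked through in code. This is a minimal sketch based only on the description in this section, not the actual implementation: the function name `stuffed_tokens` is hypothetical, a combined total of exactly 110k tokens is assumed to be included in full, any part of the even share that a short document does not use is assumed not to be redistributed, and shares are rounded down to whole tokens.

```python
CONTEXT_BUDGET = 110_000           # tokens of uploaded text that fit in the context window
HALF_BUDGET = CONTEXT_BUDGET // 2  # 55k tokens for each of the two steps


def stuffed_tokens(doc_tokens: list[int]) -> list[int]:
    """Estimate how many tokens from the start of each document are stuffed."""
    if not doc_tokens:
        return []
    # Everything fits: the full content of every document is included.
    if sum(doc_tokens) <= CONTEXT_BUDGET:
        return list(doc_tokens)

    # Step 1: split the first 55k tokens evenly among the documents.
    even_share = HALF_BUDGET // len(doc_tokens)
    first_pass = [min(n, even_share) for n in doc_tokens]

    # Step 2: split the other 55k in proportion to the tokens left in each document.
    remaining = [n - taken for n, taken in zip(doc_tokens, first_pass)]
    total_remaining = sum(remaining)
    second_pass = [min(r, r * HALF_BUDGET // total_remaining) for r in remaining]

    return [a + b for a, b in zip(first_pass, second_pass)]
```

For example, `stuffed_tokens([37_500, 117_500])` reproduces the Document A and Document B case: each document contributes 27.5k tokens in step 1, leaving 10k and 90k, and step 2 adds 5.5k and 49.5k, for totals of 33k and 77k. A single oversized document receives 55k in each step, which matches the 110k limit stated earlier.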

You can estimate the number of tokens in a document by copying the document's text into the OpenAI Tokenizer.
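When pasting text into the tokenizer is impractical, a rough offline estimate is often enough to tell whether a set of documents is near the limit. The sketch below assumes the common rule of thumb of about four characters per token for English text, which is only an approximation and is less reliable for code or other languages; the function names are hypothetical.

```python
import math

CHARS_PER_TOKEN = 4               # rule of thumb for English text, an approximation
CONTEXT_STUFFING_LIMIT = 110_000  # limit described in this guide, subject to change


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count of a piece of English text."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def likely_fits_in_context(texts: list[str]) -> bool:
    """Guess whether the combined documents stay under the stuffing limit."""
    return sum(estimate_tokens(t) for t in texts) < CONTEXT_STUFFING_LIMIT
```

Treat results close to the limit as inconclusive and confirm them with the tokenizer.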

Advantages

ChatGPT Enterprise’s file upload strategy is highly effective if you need an answer to a single question buried within many long documents. ChatGPT Enterprise can run a search to find the most relevant chunks of up to 40M tokens across 20 documents to pull back the most relevant answers to your question.

Due to this architecture, performance in answering a single question should not degrade, even if you add more documents or much longer documents.

Examples of questions that ChatGPT Enterprise should handle well:

  • For HR: “What is the HR policy for early retirement?”

  • For coding: “What is the text_xyz function doing?”

Limitations and mitigation strategies

Limitations: ChatGPT Enterprise will struggle to answer more complex questions across documents totaling more than 110k tokens:

  • Multi-part questions: For example, “What are the HR policies for every country?” or “List all of the functions.” The combination of text stuffing and a single RAG search likely won’t be specific enough to get results from every document – and ChatGPT Enterprise may answer the question incorrectly.

  • Summarizing / comparing documents: For requests like “Summarize all 20 documents,” “Compare these 5 HR docs,” or “Summarize the purpose of all these code files,” a single RAG search may not bring in enough of the relevant information to summarize and compare multiple documents.

Strategies to mitigate:

  • Remember that responses may vary depending on the number and size of documents you upload.

  • Generally, loading in fewer, focused documents will lead to higher accuracy.

  • Turn multi-question topics into single questions:

    • If you need to know every state’s HR policies, ask them one-by-one.

    • If you need to summarize many documents, ask for one document at a time. If that document is many hundreds of pages, consider breaking it down into smaller components.

      • You can then ask ChatGPT Enterprise to write a “summary of summaries” by giving it the individual summaries rather than the entire documents.

    • If you have a CSV of an RFP (each line is a different question), ask those questions one-by-one instead of just loading the CSV and requesting a single response.

  • Find ways to audit the model’s responses. Example GPT instructions are below:

# Context 

You are an expert at understanding documents. The user is going to attach a document and ask a question. They need to be able to connect your answer back to the exact part of the text where you grabbed your answer from.

# Instructions

1. Answer the user's question based on their attached document using the exact format provided below

# Format

- Question: { repeat user's question }
- Answer: { provide an answer to user's question }
- Source:
  - Section Number: { provide section number where you pulled in the answer }
  - Section Title: { provide section title where you pulled in the answer }
  - Exact Text: { provide the exact text where you pulled the answer from }

# Rules

- Give answers that are clear and concise
- Only provide information provided in the document
- If you cannot find the answer in the document, simply reply "No information found."
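Because these instructions ask for the exact source text, a response can be spot-checked automatically. The following is a minimal sketch, assuming you have the document's text available locally and that the quoted passage fits on a single line after the "Exact Text:" label; the function names are hypothetical, and a passing check only shows that the quote exists in the document, not that the answer is correct.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line wrapping does not affect matching."""
    return " ".join(text.split()).lower()


def exact_text_is_grounded(response: str, document: str) -> bool:
    """Check that the response's 'Exact Text' quote appears in the document."""
    match = re.search(r"Exact Text:\s*(.+)", response)
    if not match:
        # No quote to verify, e.g. the reply was "No information found."
        return False
    quoted = match.group(1).strip().strip('"“”')
    return bool(quoted) and normalize(quoted) in normalize(document)
```

A failed check is a signal to reread the cited section by hand, since the model may have paraphrased or misattributed the passage.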

