Thank you for your interest in OpenAI! This article will cover our most frequently asked questions about Codex.
The Codex models are descendants of our GPT-3 models that can understand and generate code. Their training data contains both natural language and billions of lines of public code from GitHub. They are most capable in Python and proficient in over a dozen languages, including JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript, SQL, and even Shell.
Getting Started
There are a number of ways to get started. Once you have access to our Codex models, you can try out the sandbox here, or use the models via the Playground or the API.
Check out the introduction of our documentation to get an overview of how the API works and how you can interface with the text models in different ways. You may also want to take a look at some examples to get a sense of how other developers are using the API.
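As an illustrative sketch only (not a required setup), here is roughly how a request to a Codex model might look using the pre-1.0 openai.Completion interface of the OpenAI Python library; the prompt, temperature, and stop sequence below are example values you would adapt to your own use case:

import openai

openai.api_key = "YOUR_API_KEY"  # replace with your own API key

# Ask the model to complete a Python function from a natural-language docstring.
response = openai.Completion.create(
    model="code-davinci-002",
    prompt='def is_palindrome(s):\n    """Return True if s reads the same forwards and backwards."""\n',
    max_tokens=64,
    temperature=0,
    stop=["\ndef "],  # stop before the model starts writing a new function
)

print(response["choices"][0]["text"])

In practice, lower temperatures tend to work well for code, since they make the completion more deterministic.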
Licensing
As of May 2022, the Codex models are offered as a free trial. As we learn about use, we'll look to offer pricing to enable a broad set of applications. During this trial, you're welcome to go live with your application as long as it follows our usage policies. We welcome any feedback on these models while in early use and look forward to engaging with the community.
Key Features
We currently offer two Codex models: code-davinci-002 and code-cushman-001.
code-davinci-002 is our most capable Codex model. It is particularly good at translating natural language to code, and in addition to completing code, it also supports inserting completions within existing code. Additionally, code-davinci-002 supports requests of up to 4,000 tokens, while code-cushman-001 supports requests of up to 2,048 tokens.
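As a rough sketch of the insert capability (again assuming the pre-1.0 openai.Completion interface; the function shown is purely illustrative), you pass both a prompt and a suffix, and the model fills in the code between them:

import openai

openai.api_key = "YOUR_API_KEY"  # replace with your own API key

# code-davinci-002 can insert code between a prompt and a suffix.
response = openai.Completion.create(
    model="code-davinci-002",
    prompt="def average(numbers):\n",
    suffix="\n    return total / len(numbers)\n",
    max_tokens=32,
    temperature=0,
)

print(response["choices"][0]["text"])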
Unfortunately, we don't offer the ability to fine-tune Codex models at this time.
Training Data
OpenAI cares deeply about developers and is committed to respecting their rights. Our hope is that Codex will lower barriers to entry and increase opportunities for beginner programmers, make expert programmers more productive, and create new code-generation tools.
The Codex models were trained on tens of millions of public repositories, which were used as training data for research purposes in the design of Codex. We believe this is an instance of transformative fair use.
The code-davinci-002 model was trained on data up to June 2021 and our code-cushman-001 was trained on data up to January 2021.
For more information, please see our other article here: https://help.openai.com/en/articles/5480054-understanding-codex-training-data-and-outputs