What is the C2PA standard and what does it enable?
C2PA is an open technical standard that allows publishers, companies, and others to embed metadata in media for verifying its origin and related information. C2PA isn’t just for AI-generated images: the same standard is also being adopted by camera manufacturers, news organizations, and others to certify the source and history (or provenance) of media content.
What is OpenAI's implementation of C2PA?
Images generated with ChatGPT on the web and with our API serving the DALL·E 3 model will now include C2PA metadata. This change will also roll out to all mobile users by February 12th. People can use sites like Content Credentials Verify to check whether an image was generated by the underlying DALL·E 3 model through OpenAI’s tools. Unless the metadata has been removed, the check should indicate that the image was generated through our API or ChatGPT.
Metadata like C2PA is not a silver bullet to address issues of provenance. It can easily be removed either accidentally or intentionally. For example, most social media platforms today remove metadata from uploaded images, and actions like taking a screenshot can also remove it. Therefore, an image lacking this metadata may or may not have been generated with ChatGPT or our API.
We believe that adopting these methods for establishing provenance and encouraging users to recognize these signals are key to increasing the trustworthiness of digital information.
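To illustrate why missing metadata is inconclusive, here is a minimal Python sketch that checks whether a PNG file carries a C2PA manifest store at all. It assumes the PNG embedding described in the C2PA specification, where the manifest store sits in a chunk of type `caBX`. The function names and the `make_chunk` helper are hypothetical and exist only for this example. The check detects presence only; it does not validate signatures the way a tool like Content Credentials Verify does.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
C2PA_PNG_CHUNK = "caBX"  # assumption: chunk type used for C2PA data in PNG files


def png_chunk_types(data: bytes) -> list[str]:
    """Return the chunk types found in PNG bytes, in file order."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    types = []
    offset = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte length, 4-byte type, payload, 4-byte CRC.
    while offset + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[offset:offset + 8])
        types.append(ctype.decode("latin-1"))
        offset += 8 + length + 4
    return types


def has_c2pa_chunk(data: bytes) -> bool:
    """True if a C2PA chunk is present. Presence only, no verification."""
    return C2PA_PNG_CHUNK in png_chunk_types(data)


def make_chunk(ctype: bytes, payload: bytes) -> bytes:
    """Hypothetical helper: build one PNG chunk for the demo below."""
    crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


# Synthetic byte strings standing in for real image files.
header = make_chunk(b"IHDR", b"\x00" * 13)
end = make_chunk(b"IEND", b"")
with_metadata = PNG_SIGNATURE + header + make_chunk(b"caBX", b"demo") + end
stripped = PNG_SIGNATURE + header + end  # e.g. after a screenshot or re-upload
```

Here `has_c2pa_chunk(stripped)` returns `False` even though the two byte strings stand for the same picture, so a negative result says nothing about how the image was made.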
Does text or voice generated by ChatGPT or OpenAI's API contain C2PA metadata?
Currently, only images generated with ChatGPT or our API serving the DALL·E 3 model will contain the C2PA metadata.
How does C2PA impact the file size?
Below are some illustrative examples of how the image sizes might change with the addition of this data:
3.1 MB → 3.2 MB for PNG through API (3% increase)
287 KB → 302 KB for WebP through API (5% increase)
287 KB → 381 KB for WebP through ChatGPT (32% increase)
The added metadata should have a negligible effect on latency and will not affect the quality of the image generation.
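The percentages in the examples above follow from the before and after sizes. A short Python sketch of that arithmetic is below; the function name `percent_increase` is only illustrative, and the listed figures appear to be rounded or truncated to whole percentages.

```python
def percent_increase(before: float, after: float) -> float:
    """Relative size increase, in percent, from `before` to `after`."""
    return (after - before) / before * 100


# (label, size before, size after), taken from the examples above.
examples = [
    ("PNG through API", 3.1, 3.2),
    ("WebP through API", 287, 302),
    ("WebP through ChatGPT", 287, 381),
]

for label, before, after in examples:
    print(f"{label}: {percent_increase(before, after):.1f}%")
# PNG through API: 3.2%
# WebP through API: 5.2%
# WebP through ChatGPT: 32.8%
```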
What metadata will be embedded in generated images?
You can see examples below:
Images generated via the API
Images produced through the API will contain a signature indicating they were generated by the underlying DALL·E 3 model.
Images generated via ChatGPT
Images produced within ChatGPT will contain an additional manifest, indicating the content was created using ChatGPT. This creates a dual-provenance lineage, as shown below.
Part 1 (Second metadata manifest signaling the image was surfaced in ChatGPT)
Part 2 (Initial metadata manifest signaling the image was created using DALL·E 3)
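The two-part lineage can be pictured as a chain in which the newer manifest points back to the earlier one. The Python sketch below is a deliberately simplified, hypothetical model of that idea. The `Manifest` class and its `generator` and `parent` fields are assumptions made for illustration and do not reflect the actual C2PA manifest schema or the fields OpenAI writes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Manifest:
    """Hypothetical, simplified stand-in for a C2PA manifest."""
    generator: str                        # tool that added this manifest
    parent: Optional["Manifest"] = None   # earlier manifest it builds on


def lineage(active: Manifest) -> list[str]:
    """Walk from the newest manifest back to the original one."""
    chain = []
    current: Optional[Manifest] = active
    while current is not None:
        chain.append(current.generator)
        current = current.parent
    return chain


# Initial manifest: the image was created using DALL·E 3.
dalle = Manifest(generator="DALL·E 3")
# Second manifest: the image was surfaced in ChatGPT.
chatgpt = Manifest(generator="ChatGPT", parent=dalle)

chain = lineage(chatgpt)  # ['ChatGPT', 'DALL·E 3']
```

In this model an image produced directly through the API would carry only the first manifest, so its lineage would have a single entry.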