Greg Brockman, President of OpenAI, posted from his X account what appears to be the first public image created using the company's new GPT-4o model.
As you can see in the image below, it is quite convincingly photorealistic, showing a person wearing a black t-shirt with an OpenAI logo and writing chalk text on a blackboard that reads: "Transfer between modalities. Suppose we directly model P(text, pixels, sound) with one big autoregressive transformer. What are the pros and cons?"
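For readers wondering what the chalkboard question means: modeling P(text, pixels, sound) "directly" with one autoregressive transformer amounts to flattening all three modalities into a single token sequence and predicting each token from the ones before it via the chain rule. Here is a minimal sketch of that factorization with a toy stand-in for the transformer; all names are illustrative, not OpenAI's code:

```python
import math

def sequence_log_prob(tokens, next_token_probs):
    """Chain-rule log-likelihood of a mixed-modality token sequence.

    tokens: token ids; text, pixel, and audio tokens share one
            vocabulary in this toy setup.
    next_token_probs: callable(prefix) -> dict mapping each candidate
            next token id to its probability (a stand-in for a
            transformer's softmax output).
    """
    log_prob = 0.0
    for t in range(len(tokens)):
        probs = next_token_probs(tokens[:t])  # condition on everything so far
        log_prob += math.log(probs[tokens[t]])
    return log_prob

# Toy "model": uniform over a 4-token vocabulary, so a 3-token
# sequence scores 3 * log(0.25).
uniform = lambda prefix: {token_id: 0.25 for token_id in range(4)}
print(sequence_log_prob([0, 3, 2], uniform))
```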
The new GPT-4o model, introduced on Monday, improves on the previous GPT-4 model family (GPT-4, GPT-4 Vision, and GPT-4 Turbo) by being faster, cheaper, and retaining more information from inputs such as audio and images.
This is possible because GPT-4o takes a different approach than OpenAI's previous GPT-4-class LLMs. While those chained together several separate models and converted other media such as audio and images to text and back, the new GPT-4o was trained on multimodal tokens from the start, allowing it to analyze and interpret images and audio directly, without first converting them into text.
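The difference is easiest to see side by side. Below is a rough sketch of the two designs for a voice reply; every function parameter here is a hypothetical placeholder, not a real OpenAI API:

```python
# Chained approach: separate models, with text as the go-between.
# Non-textual information (tone of voice, laughter, background
# sound) is discarded at each conversion step.
def chained_voice_reply(audio_in, transcribe, text_llm, synthesize):
    text_in = transcribe(audio_in)   # speech-to-text model
    text_out = text_llm(text_in)     # text-only LLM
    return synthesize(text_out)      # text-to-speech model

# GPT-4o-style approach: one model over a shared token stream, so
# audio tokens go in and audio tokens come out with no text detour.
def native_voice_reply(audio_in, tokenize_audio, multimodal_llm, detokenize_audio):
    tokens_in = tokenize_audio(audio_in)
    tokens_out = multimodal_llm(tokens_in)
    return detokenize_audio(tokens_out)
```

Besides preserving information, collapsing three model calls into one also removes two conversion steps per reply, which is consistent with the speed gains described above.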
Judging by the image above, the new approach represents a noticeable improvement over OpenAI's last image generation model, DALL-E 3, which was released in September 2023. I ran the same prompt through DALL-E 3 in ChatGPT, and here is the result.
As you can see, the image Brockman created with GPT-4o is significantly better in quality, photorealism, and text generation accuracy.
However, GPT-4o's native image generation capabilities are not yet publicly available, as Brockman alluded to in his X post: "The team is working hard to bring this to the world."