
GPT-4o image generation is finally here


OpenAI recently released an interesting new feature. The tech giant has developed an image generation tool that not only creates lifelike images, but also interprets prompts very accurately.

So from now on, image generation in ChatGPT is done by the GPT-4o model by default rather than by DALL-E. The new model has also been integrated into the Sora video generator. The descriptions and demonstration images are quite promising, so I wanted to try out what this tool can do. Update: image generation with the new model is no longer available in the free version, so I have not tested the service yet. I will only highlight a few official images.

According to OpenAI, a good image generator should not only be able to create spectacular photorealistic, fantasy, and surreal images, but also be useful and functional.

When creating the new image generation model, attention was paid to ensuring that the text appearing in the generated images is accurate. The model not only renders text correctly, but also lets us specify the exact arrangement of the text, so we can write poems or create beautifully illustrated recipes or infographics.

The model also supports iterative editing, so after an image has been generated, we can easily modify it by entering a short follow-up prompt.

Furthermore, this model can easily create comics and even user interfaces (UI) for games with full visuals and consistent icons.

This latter feature could be a game changer, as it will now be possible to create UIs and infographics with precise text much faster and more efficiently.

Sample images

Unfortunately, access to the new image generator has now been disabled in the Free version, so I’ll only show some of the official images:

Example #1 – Infographic

Image credit: OpenAI

Correct captions and convincing depiction.

Example #2 – Reflections and text

Image credit: OpenAI

The reflections look quite accurate. Many AI image generators have trouble representing reflections properly.

Example #3 – UI design

Image credit: OpenAI

We can create visual designs for computer games, even with consistent icons.

Example #4 – Transparent background

Image credit: OpenAI

Example #5 – Illustration for a science experiment

Image credit: OpenAI

Availability

The GPT-4o image generator is now available on Plus, Pro, and Team subscriptions. Due to overwhelming demand, the ability to generate images with the new model was disabled in the free version to prevent server overload.

The service will soon be available in Enterprise and Edu versions.

Performance

This model is a bit slower than DALL-E, as it interprets and processes our prompt more precisely. It can take more than a minute to render an image.

Advanced prompting

Since there is no UI element to set parameters (such as aspect ratio, colors used, image style) during image generation, all of this can be specified simply in text form. You can even specify the hexadecimal code of specific colors in the prompt. If you want to create a photorealistic image, you can specify the type of camera or lens or even the style of the photo (example: polaroid photo).
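To illustrate the idea, here is a minimal sketch of a helper that assembles such a text prompt from a few optional settings. The function name `build_image_prompt`, its parameters, and the wording of the labels ("Style:", "Color palette:", and so on) are my own assumptions for illustration; they are not part of any OpenAI tool, and the model does not require this exact phrasing.

```python
import re

# Matches a 6-digit hexadecimal color code such as "#1B3A5C".
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def build_image_prompt(subject, aspect_ratio=None, hex_colors=(), style=None, camera=None):
    """Compose a plain-text image prompt (hypothetical helper, not an official API)."""
    # Reject malformed color codes early so they never reach the prompt.
    for color in hex_colors:
        if not HEX_COLOR.fullmatch(color):
            raise ValueError(f"not a 6-digit hex color: {color!r}")

    # Start with the subject, then append each setting that was provided.
    parts = [subject.strip().rstrip(".")]
    if style:
        parts.append(f"Style: {style}")
    if camera:
        parts.append(f"Shot on: {camera}")
    if hex_colors:
        parts.append("Color palette: " + ", ".join(hex_colors))
    if aspect_ratio:
        parts.append(f"Aspect ratio: {aspect_ratio}")
    return ". ".join(parts) + "."


prompt = build_image_prompt(
    "A lighthouse on a rocky coast at dusk",
    aspect_ratio="16:9",
    hex_colors=["#1B3A5C", "#F2A65A"],
    style="polaroid photo",
    camera="35mm lens",
)
print(prompt)
```

The resulting string can be pasted into ChatGPT as-is. Keeping the settings in one place like this simply makes it easier to reuse the same palette or aspect ratio across several images.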

Limitations

The model does not always work flawlessly. There may be problems with rendering multilingual text, and hallucinations may occur. Longer passages of small text may be rendered incorrectly. The model may also struggle to depict everything accurately if we want to display more than 10-20 small icons in an image. We can see examples of this in the official article.

Summary

Based on the sample images on the official website, I can say that the GPT-4o image generator can be a very useful tool for creating illustrations, infographics, UI elements, and icons.

Of course, we have to be careful if we want to use the images for commercial purposes. If we have the image made in the style of a specific artist, it can lead to legal problems. This applies to any AI image generator.


