TechOpenAI's new GPT-4o dazzles with real-time sound and image analysis

OpenAI's new GPT‑4o dazzles with real-time sound and image analysis

OpenAI has unveiled its latest achievement—the GPT-4o model, which can analyze sound, image, and text in real time. Surprisingly, the model demonstrates an extraordinary speed of reaction to received sound signals.

Images source: © Unsplash

Dominik Adamczyk

14 May 2024 08:04

Artificial intelligence enthusiasts eagerly awaited the OpenAI Spring Update - a presentation by the creators of ChatGPT. The anticipation before the event was boosted by loud industry announcements about the potential unveiling of a new AI technology-based internet search engine. However, this time, the spotlight was on the latest model.

GPT-4o operates in real-time

OpenAI introduced the GPT-4o model, facilitating more natural interactions. According to the company's declarations, GPT-4o responds to sound signals in barely 230 milliseconds, which means an average response time of 320 milliseconds for a reply. This speed is comparable to the response time during a conversation with a human. Performance-wise, the model is on par with GPT-4 Turbo for analyzing English text and even outperforming it in other languages.

OpenAI claims that its new GPT-4o model has also significantly improved the interpretation of images and sounds compared to existing models. So, what can this new tool do? One of my highlights was a demonstration in which GPT-4o was asked to start counting from one to ten.

GPT-4o's response to commands to change the pace was instantaneous, happening in real time. Another fascinating instance was when GPT-4o acted as a Spanish language teacher, analyzing objects it saw through the camera.

When can we expect access to GPT-4o? OpenAI has announced that text and graphic functions of the GPT-4o model are already available today in ChatGPT. The new model can be accessed in the free version, and subscription users can enjoy up to five times the message limits. OpenAI also plans to release a new version of the GPT-4o voice mode in an alpha version for ChatGPT Plus users in the upcoming weeks.

Remember, OpenAI isn't just about ChatGPT. The forthcoming Sora model will enable users to create videos, an advancement many creators keenly anticipate.