ChatGPT gets real-time video and screen-sharing analysis, rivaling Gemini 2.0



OpenAI has finally added long-awaited video and screen-sharing capabilities to Advanced Voice Mode, giving users new ways to interact with the chatbot.

Both capabilities are now available on the iOS and Android mobile apps for ChatGPT Team, Plus, and Pro users, and will roll out to ChatGPT Enterprise and Edu subscribers in January. However, users in the European Union, Switzerland, Iceland, Norway, and Liechtenstein will not be able to access Advanced Voice Mode.

OpenAI first teased the feature in May, when the company unveiled GPT-4o and discussed teaching ChatGPT to “watch” a game and explain what’s happening. Advanced Voice Mode was introduced to users in September.

Credit: OpenAI

Users can start a video session via a new button on the Advanced Voice Mode screen.

OpenAI’s video mode feels like a FaceTime call, because ChatGPT responds in real time to what users show it on camera. It can see the user’s surroundings, recognize objects, and even remember people who introduce themselves. In an OpenAI demo during the company’s “12 Days of Shipmas” event, ChatGPT used the video feature to help prepare coffee: it watched the coffee-making process, said when to put the filter in, and critiqued the result.

It is also very similar to Google’s recently announced Project Astra, where users can open a video chat and Gemini 2.0 will answer questions about what it sees, such as identifying a statue on a London street. In many ways, these features are more advanced versions of what AI devices like the Humane AI Pin and Rabbit R1 were marketed to do: ask an AI-powered voice assistant to answer questions about what it sees through a camera.

Screen sharing

The new screen-sharing feature brings ChatGPT out of its own app and into the rest of the phone.

For screen sharing, users tap the three-dot menu and then navigate outside the ChatGPT app; they can open other apps on their phone and ask ChatGPT questions about what is on screen. In the demo, OpenAI researchers turned on screen sharing, opened the Messages app, and asked ChatGPT to help respond to an image sent via text message.

However, the screen-sharing feature in Advanced Voice Mode bears similarities to recently released features from Microsoft and Google.

Last week, Microsoft released a preview version of Copilot Vision, which allows Pro subscribers to open a Copilot chat while browsing a web page. Copilot Vision can look at pictures on a store’s website or even help play the map-guessing game GeoGuessr. Google’s Project Astra can also read browsers in the same way.

Both Google and OpenAI have released screen-sharing AI chat features on phones to target a consumer base that may use ChatGPT or Gemini more on the go. But these kinds of features could also point to a way for organizations to collaborate more with AI agents, since the agent can see what the person is looking at on screen. They can be an introduction to computer-use models, such as Anthropic’s Computer Use, where the AI model not only looks at the screen but actively opens tabs and programs for the user.
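For a rough sense of what that looks like under the hood, here is a minimal sketch of a computer-use request using Anthropic’s public Python SDK (a beta as of late 2024). The model name, screen dimensions, and prompt below are placeholder assumptions, and the loop that would actually execute the returned clicks and keystrokes on a real desktop is omitted.

```python
# Minimal sketch of an Anthropic computer-use request (beta, late 2024).
# Model name, screen size, and prompt are illustrative assumptions;
# the host-side loop that executes the returned actions is omitted.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[
        {
            "type": "computer_20241022",   # built-in computer-use tool
            "name": "computer",
            "display_width_px": 1280,      # assumed screen dimensions
            "display_height_px": 800,
        }
    ],
    messages=[
        {"role": "user", "content": "Open a browser tab and search for today's weather."}
    ],
    betas=["computer-use-2024-10-22"],     # opt into the computer-use beta
)

# The response contains tool_use blocks (e.g., screenshot, left_click,
# type) that a host program would carry out and report back on, in a loop.
print(response.content)
```

In practice, a host program runs this request in a loop, performing each requested action and sending fresh screenshots back until the task is done; that agent loop is what separates a screen-aware assistant from one that merely watches.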

Ho ho ho, ask Santa a question

In a bid to lighten the mood, OpenAI has also introduced a “Santa Mode” in Advanced Voice Mode. The new preset voice sounds a lot like the jolly old man in the red suit.

Unlike the new video features, which are limited to specific tiers, “Santa Mode” is available to anyone with access to Advanced Voice Mode on the mobile apps, the web version of ChatGPT, and the Windows and macOS apps, through early January.

However, chats with Santa will not be saved to your chat history and will not affect ChatGPT’s memory.

Even OpenAI is feeling the Christmas spirit.


