The addition of voice and image capabilities to ChatGPT is sure to kick off a new round of interest in incorporating conversational user interfaces into HR technology.
Last week, OpenAI began to roll out voice and image-sharing capabilities that allow users to interact with the generative AI in a simpler, more conversational way and share images as part of their queries. Almost immediately, analysts and experts began their comparisons with Apple’s Siri and Amazon’s Alexa, most of them favorable.
“While the system is just reading back a ChatGPT text response, this isn’t the robotic, staid text-to-speech systems we’ve grown up with,” wrote Joanna Stern in The Wall Street Journal.
The move is sure to get the attention of HCM technology vendors, many of whom have been exploring the use of voice interfaces for several years. Oracle, Ceridian, IBM and others have incorporated various levels of spoken-word interaction into their systems, providing the option of asking products to “show me the employee file of Jack Doe in customer service” or helping employees keep abreast of their work schedules by asking “when’s my next shift?”
Opening the UI
Together, ChatGPT’s voice and image capabilities will offer “a more intuitive type of interface,” OpenAI said. That will allow users “to have a voice conversation or show ChatGPT what you’re talking about.” As an example, the company described taking a photo of a math-homework problem, circling the items of interest and sharing the resulting image with ChatGPT. The system can then share hints for solving the problem.
The voice capability is powered by a new text-to-speech model that can generate “human-like audio” from text and a few seconds of sample speech. For images, ChatGPT’s mobile apps will offer a drawing tool that lets users focus on specific areas of a picture. The AI then applies its language reasoning skills to interpret image types including photographs, screenshots, and documents containing both text and pictures.
Personalized Natural Language
ChatGPT’s new capabilities are several steps removed from the basic voice commands available until now. Already, users have progressed from playing a particular song by Bruce Springsteen to drafting marketing copy or writing code. The technology can help organizations eliminate repetitive work, vendors say, offer personalized responses to employee questions, improve the onboarding experience and provide real-time performance data.
The market potential is big. According to Grand View Research, the global market for voice-driven user interfaces will grow from $24 billion in 2023 to $92 billion in 2030, a CAGR of 21.3%. As AI and natural language processing technologies improve, voice interfaces will become more accurate and more context-aware, the firm predicted. Voice interfaces are also already widely available to consumers, an indicator that developers of business solutions are sure to incorporate similar features into their own products.
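For readers who want to sanity-check the growth figure, the compound annual growth rate can be recomputed from the two endpoints. The following is a minimal sketch, not Grand View Research’s method: the function name `cagr` is hypothetical, and it assumes seven compounding years (2023 to 2030) and the rounded dollar amounts quoted above.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.21 means 21%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Rounded figures quoted in the article, in billions of US dollars.
# Assumption: growth compounds over the 7 years from 2023 to 2030.
rate = cagr(24.0, 92.0, 2030 - 2023)
print(f"Implied CAGR: {rate:.1%}")
```

With these rounded inputs the result comes out at roughly 21%, in line with the reported 21.3%. The small gap is what one would expect if the firm worked from unrounded market estimates.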
The voice and image capabilities will be available to ChatGPT Plus and Enterprise users over the next two weeks, the company said. Voice will be available on iOS and Android devices, while images will be available on all platforms.