Google’s Gemini Hints at More Robust Use of Generative AI


Google’s Gemini, the new AI product launched last week, is positioned as a leap into AI’s next generation, able to work with a variety of media types across a variety of products. Its capabilities could power solutions that go beyond content generation, debugging and summarizing. Some of these potential applications would directly impact HR’s work.

Not everyone believes Gemini will rock the world, however. Critics noted that most of the demos presented during the rollout were pre-recorded, and that the most impressive of them were staged. Others said Gemini answers simple questions with incorrect information.

Still, Gemini suggests many of generative AI’s promised capabilities are nearing reality.

Because it’s natively multimodal, meaning it was pre-trained on multiple types of data from the start, Gemini can generalize, understand and operate across different types of information, including text, code, audio, images and video, Google said. That differs from OpenAI’s approach, which relies on separate products for images (DALL-E) and voice (Whisper).
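As a rough illustration of what a single multimodal model looks like in practice, the sketch below sends text and an image in one request using Google’s google-generativeai Python SDK. The model name follows Google’s December 2023 documentation; the API key, prompt and file name are placeholders invented for the example.

import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # key issued by Google AI Studio

# One model accepts mixed inputs; no separate image product is needed.
model = genai.GenerativeModel("gemini-pro-vision")
photo = Image.open("whiteboard_sketch.jpg")  # placeholder image file

# Text and image travel in the same request.
response = model.generate_content(
    ["Describe what this diagram shows and suggest a caption.", photo]
)
print(response.text)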

Gemini AI at Work

To get a sense of how technologies like Gemini can be applied, keep an eye on photo-library products such as Google Photos and Apple Photos, which are constantly looking for new ways to organize, simplify and customize users’ collections with tools that can identify individuals or compile scrapbooks.

According to CNBC, Google is discussing a product that uses Gemini to examine biographical information and user photos in order to describe images in ways that go beyond date and format. The product would be able to organize that information into time periods, such as college years, time living in a particular city or time as a parent.

“We trawl through your photos, looking at their tags and locations to identify a meaningful moment,” said an internal presentation seen by the network. “When we step back and understand your life in its entirety, your overarching story becomes clear.” The model can even “infer” events such as births and identify the child’s parents.
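Google hasn’t published how such a feature would work, but the core idea reported here, grouping photo metadata into contiguous life periods, can be sketched in a few lines of Python. Everything below is a hypothetical illustration, not Google’s implementation.

from dataclasses import dataclass
from datetime import date
from itertools import groupby

@dataclass
class PhotoMeta:
    taken: date  # when the photo was taken
    city: str    # location tag

def chapters(photos: list[PhotoMeta]) -> list[tuple[str, date, date]]:
    """Split a date-ordered photo stream into runs that share a location."""
    ordered = sorted(photos, key=lambda p: p.taken)
    result = []
    for city, group in groupby(ordered, key=lambda p: p.city):
        run = list(group)
        result.append((city, run[0].taken, run[-1].taken))
    return result

photos = [
    PhotoMeta(date(2015, 9, 1), "Ann Arbor"),  # e.g., college years
    PhotoMeta(date(2018, 5, 20), "Ann Arbor"),
    PhotoMeta(date(2019, 3, 2), "Chicago"),    # time living in a city
]
for city, start, end in chapters(photos):
    print(f"{city}: {start} to {end}")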

Google told CNBC the presentation was only “an early internal exploration” and that it would “take the time needed” to ensure any such features “were helpful to people, and designed to protect users’ privacy and safety as our top priority.”

Other applications might enable digital assistants to put information in context, support decision-making across information channels, autogenerate data visualizations, customize tutoring or coaching applications to individual users, and power learning tools that incorporate text, images, speech and even touch.

Early Days

Gemini 1.0 is offered in three sizes: Gemini Ultra, coming soon for highly complex tasks; Gemini Pro, for scaling across a wide range of tasks; and Gemini Nano, for on-device tasks. Its approach, Google said, allows Gemini to “think more carefully before answering difficult questions, leading to significant improvements over just using its first impression.”

Initially, Gemini has been incorporated into Google products such as Bard and Pixel 8 Pro. On December 13, developers and enterprises will be able to access Gemini Pro via the Gemini API in Google AI Studio or Google Cloud Vertex AI. Gemini will be integrated into Google’s search engine at some point in the future, along with its advertising services and Chrome.
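For developers planning to try that access, a minimal text-only call through the Gemini API might look like the sketch below. The SDK and model name follow Google’s announced AI Studio offering; the API key and prompt are placeholders.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")      # key from Google AI Studio
model = genai.GenerativeModel("gemini-pro")  # text-oriented model tier

response = model.generate_content(
    "Summarize this week's team standup notes in three bullet points."
)
print(response.text)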

Image: Bridge construction (Wikimedia Commons)
