GenAI-Powered Agents Bring Promise Of Digital Assistants Back To Life
Summary:
- OpenAI took the wraps off its new GPT-4o model and the accompanying update to ChatGPT that makes it possible to speak with ChatGPT in more realistic ways.
- At its I/O developer event, Google unveiled a huge range of updates to its Gemini model and showed an impressive set of implementations of it.
- Demos from both companies leveraged similar technologies that they and many other companies are clearly developing in parallel.
- The timing and conceptual similarity of what OpenAI and Google demonstrated make it clear that we're a lot closer to having functional digital assistants than most people realize.
Remember when we thought Siri, Alexa, and the Google Assistant were going to be really helpful? Yeah, me too…
Fast forward more than a decade to today, however, and we're starting to see some incredibly impressive demos of just how far personal digital assistants have progressed. The possibilities look both genuinely compelling and, at times, a little eerie.
On Monday, OpenAI took the wraps off its new GPT-4o model and the accompanying update to ChatGPT that makes it possible to not only speak with ChatGPT but do so in some eerily realistic ways. The new model lets you interrupt it for a somewhat more natural conversation flow and responds with more personality and emotion than we've heard from other digital assistants. Through the updated ChatGPT apps for iOS and Android, the model can also see and understand the world via a smartphone camera. For example, OpenAI showed a homework helper-style demo in which the model walked a student through a simple math problem viewed through the phone's camera.
Then, on Tuesday at its I/O developer event, Google (NASDAQ:GOOG, NASDAQ:GOOGL) unveiled a huge range of updates to its Gemini model and showed an impressive set of implementations of it, including, coincidentally, a conceptually similar homework helper function that can be called up from within Android itself. Google also demonstrated Gemini-powered AI summaries for Search, more sophisticated applications of Gemini in Google Workspace, and a new text-to-video model called Veo that's akin to OpenAI's recently introduced Sora.
Demos from both companies leveraged similar technologies that they and many other companies are clearly developing in parallel. More importantly, they highlighted that the core capabilities needed to create the kind of intelligent personal assistants we all hoped Siri, Alexa, and Google Assistant would become are nearly within reach.
First is the increasingly wide support for multi-modal models that are capable of taking in audio, video, images, and more sophisticated forms of text input and then drawing connections among them. It's these connections, in particular, that made the demos seem fairly magical, because they were some of the best examples yet of devices imitating how we as human beings perceive the world around us. To put it simply, they finally demonstrated how our smart devices could actually be "smart."
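For readers curious about what "multi-modal" means in practice, here is a minimal sketch of a request that pairs an image with a text question in a single prompt, using OpenAI's Python SDK. The image URL and the question are placeholders invented for illustration, and this is only one way such an interaction can be wired up, not how either company's demos were actually built.

```python
# A minimal multi-modal request: one prompt combining text and an image,
# sent to GPT-4o via OpenAI's Python SDK (v1.x).
# The image URL and question are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What problem is shown in this photo, and what's the first step to solve it?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/homework-page.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The key point is that the text and the image travel together in one message, so the model can reason across both at once rather than handling each modality in isolation.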
The other thing that started to become apparent is the growing sophistication of agents that can understand the context and environment in which they operate and then seemingly reason through actions to take on our behalf. Google's Project Astra demonstration, in particular, offered a very impressive example of how contextual intelligence combined with reasoning, personal/local knowledge, and memory could create an interaction that made the AI assistant feel "real."
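Neither company has published implementation details, but the mechanical core of most agents today is a tool-calling loop: the model either answers directly or requests an action, the application executes it, and the result is fed back in. A minimal sketch of that loop, assuming OpenAI's function-calling API and a hypothetical get_calendar tool (the tool, its data, and the prompt are invented for illustration), looks like this:

```python
import json
from openai import OpenAI

client = OpenAI()

# One hypothetical tool the agent may call; a real assistant would expose
# calendars, files, messages, and much more.
tools = [{
    "type": "function",
    "function": {
        "name": "get_calendar",
        "description": "Return today's calendar events for the user.",
        "parameters": {"type": "object", "properties": {}},
    },
}]

def get_calendar() -> str:
    # Stubbed data standing in for real personal context.
    return json.dumps([{"time": "10:00", "event": "Standup"}])

messages = [{"role": "user", "content": "Do I have time for a walk this morning?"}]

# The basic agent loop: let the model answer or request a tool call,
# execute any requested calls, feed the results back, and repeat.
while True:
    resp = client.chat.completions.create(
        model="gpt-4o", messages=messages, tools=tools
    )
    msg = resp.choices[0].message
    if not msg.tool_calls:
        print(msg.content)  # the model answered directly
        break
    messages.append(msg)  # keep the tool request in the conversation
    for call in msg.tool_calls:
        if call.function.name == "get_calendar":
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": get_calendar(),
            })
```

What separates a demo like Astra from this skeleton is everything layered on top: persistent memory, live audio and video context, and far richer tools. But the loop itself is why access to personal data matters so much, a point the paragraphs below return to.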
Right now, the definitions of what an AI-powered agent is and what it can do aren't very consistent across the industry, so it's tough to make generalized comments about their advancement. Nevertheless, the timing and conceptual similarity of what OpenAI and Google demonstrated make it clear that we're a lot closer to having functional digital assistants than I believe most people realize. Even from the demos, it's easy to tell that the technology is far from perfect and there's still plenty of work to be done. However, the capabilities they did show and the possibilities they implied strongly suggest we are getting tantalizingly close to having capabilities in our devices that were in the realm of science fiction only a few years back.
As great as the potential applications may be, however, there remains the problem of convincing people that these kinds of GenAI-powered capabilities are worth using on a regular basis. After the initial hype over ChatGPT began to fade toward the end of last year, adoption of the technology has been more modest than some anticipated. To be clear, many consumers and large numbers of organizations are still working out how GenAI can change their everyday personal and professional lives, but the transition has clearly slowed. What remains to be seen is whether these kinds of digital assistant applications can become the trigger that makes large numbers of people willing to start using GenAI-powered features. Equally important is whether they can start changing people's lives in the ways that some have predicted Generative AI could.
Of course, part of the problem is that, as with any other technology designed to customize experiences and information for each person in their own unique way, people have to be willing to give these products and the companies behind them deeper access to their lives than ever before if they want the full benefit. Like it or not, the only way to get an effective digital assistant is to grant it unfettered access to your files, communications, work habits, contacts, and much more. In an era when some people are growing more concerned and suspicious about the huge impact that tech companies and tech products already have, that could be a tough sell.
Plus, here in the US, much will depend on the capabilities that both Microsoft (NASDAQ:MSFT) and Apple (NASDAQ:AAPL) unveil at their respective developer conferences over the next few weeks. Given the iPhone's dominant share of the US smartphone market, in particular, the GenAI-powered capabilities that Apple chooses to enable (whether developed in-house or licensed from OpenAI or Google, as the company is rumored to be considering) will have a huge impact on what people consider acceptable and/or important. Call it Siri's revenge or whatever you like, but there's no doubt that any digital assistant and/or agent technologies Apple announces for the next version of iOS will have an outsized influence on how people view these technological advancements, particularly in the near term.
Ultimately, the question also boils down to how willing people are to become even more attached to their digital devices and the applications and services they enable. Given the enormous and still growing amount of time we already spend with them, this may be a foregone conclusion, but there is still the question of whether people will perceive some of these digital assistant/digital agent capabilities as going too far. Only one thing is certain: it's definitely going to be an interesting trend to watch.
Disclaimer: Some of the author’s clients are vendors in the tech industry.
Disclosure: None.
Editor’s Note: The summary bullets for this article were chosen by Seeking Alpha editors.