geeky NEWS: Navigating the New Age of Cutting-Edge Technology in AI, Robotics, Space, and the Latest Tech Gadgets
As a passionate tech blogger and vlogger, I specialize in four exciting areas: AI, robotics, space, and the latest gadgets. Drawing on my extensive experience working at tech giants like Google and Qualcomm, I bring a unique perspective to my coverage. My portfolio combines critical analysis and infectious enthusiasm to keep tech enthusiasts informed and excited about the future of technology innovation.
The Rise of AI Agents: Interview with Google CEO Sundar Pichai
Updated: May 18, 2024 02:38
The world of AI is undergoing a significant transformation as we move from the era of chatbots to a new generation of AI agents. This week, both Google and OpenAI unveiled their latest AI assistants, which are capable of emoting, reasoning, making jokes, and even remembering where you left your glasses. These advancements mark a huge leap forward from the AI we've seen over the last 18 months and an even bigger leap from the Alexa and Google Assistant we're used to, which are slow and unbearable to actually talk to.
OpenAI's GPT-4o AI Assistant
OpenAI kicked off this new race with a demo of its GPT-4o AI assistant, showcasing its ability to help with math problems, coding, and storytelling.
After the event, OpenAI CEO Sam Altman posted just one word on X/Twitter: "her." He has previously said the 2013 sci-fi film "Her" is his favorite movie, and the assistant's voice in the demo was a fascinating echo of the character Scarlett Johansson brought to life in it. The film tells the story of a man who finds companionship in an advanced AI assistant, a narrative these new assistants intriguingly mirror. It's as if we've stepped into a scene from the movie itself!
The next day, Google answered back with its own demonstration of Project Astra, which boasts similar capabilities. Google CEO Sundar Pichai described Astra as having "agentic capabilities": the ability to take in the real world in front of you, process it continuously, and answer intelligently. This is a major departure from chatbots designed for simple interactions. The following Project Astra demo shows two continuous takes: one with the prototype running on a Google Pixel phone and another on a prototype glasses device:
In a CNBC interview, Google CEO Sundar Pichai called this one of the most exciting moments he has seen in his life:
In Search, we announced multi-step reasoning: you can write very, very complex queries. Behind the scenes we are breaking it into multiple parts and composing that answer for you, so these are all agentic directions. It's very early days; we're going to be able to do a lot more. I think that's what makes this moment one of the most exciting I've seen in my life.
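The multi-step reasoning Pichai describes sounds like a decompose-then-compose pattern: split a complex query into simpler parts, answer each part, then stitch the results together. Here's a toy sketch of that idea in Python. The prompts and function names are my own assumptions, not Google's API, and `llm` is a placeholder for any text-generation call:

```python
# A toy illustration of decompose-then-compose "multi-step reasoning".
# `llm` is a placeholder for a large language model call, not a real API.

def llm(prompt: str) -> str:
    """Placeholder for a call to a text-generation model."""
    raise NotImplementedError

def answer_complex_query(query: str) -> str:
    # 1. Break the complex query into simpler sub-questions.
    plan = llm(f"Split this question into independent sub-questions, "
               f"one per line:\n{query}")
    sub_questions = [line for line in plan.splitlines() if line.strip()]

    # 2. Answer each part separately (a real system might also hit Search
    #    or other tools here).
    partial_answers = [llm(f"Answer concisely: {q}") for q in sub_questions]

    # 3. Compose the parts into one coherent answer.
    joined = "\n".join(f"- {q}: {a}"
                       for q, a in zip(sub_questions, partial_answers))
    return llm(f"Using these findings, write one answer to '{query}':\n{joined}")
```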
In the interview, Pichai also addressed the common question about the glasses used in the demo. Here's what he said:
We built Gemini to be multimodal because we see use cases like that. Project Astra shines when you have a form factor like glasses, so we are working on prototypes. Through Android, you know, we've always had plans to work on AR with multiple partners, and so over time they'll bring products based on it as well.
Key Advancements: Speed, Emotion Detection, and Interactivity
One of the key advancements in these new AI agents is their real-time responsiveness. Unlike earlier chatbots, which had a 2-3 second lag before responding, these new models can respond in an average of 320 milliseconds, similar to human conversational response time. Users can also interrupt the model as it's speaking, mimicking real-life conversation.
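One plausible way to support interruption ("barge-in") is to stream the reply in short chunks and check for user speech between chunks. The sketch below illustrates that pattern; the function names and chunk sizes are assumptions, not either company's actual implementation:

```python
import threading

# Illustrative stubs only -- not OpenAI's or Google's actual streaming API.
user_spoke = threading.Event()   # a microphone thread would set this on speech

def synthesize_chunks(text):
    """Yield short audio chunks for the reply (stub: fake 'chunks' of text)."""
    yield from (text[i:i + 8] for i in range(0, len(text), 8))

def play(chunk):
    """Play one ~50 ms audio chunk (stub)."""
    ...

def speak_interruptibly(reply: str):
    """Stream the reply, but stop the moment the user starts talking."""
    for chunk in synthesize_chunks(reply):
        if user_spoke.is_set():      # barge-in detected
            user_spoke.clear()
            return                   # abandon the rest of the reply
        play(chunk)                  # short chunks keep reaction latency low

speak_interruptibly("Sure, the answer to your question is...")
```

Because each chunk is short, the gap between the user starting to talk and the assistant going quiet stays small, which is what makes the exchange feel like a real conversation.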
Another significant development is the ability of these AI agents to detect emotion and even be as emotional as the user asks them to be. OpenAI's model can amplify drama in storytelling and provide feedback on breathing exercises.
While these advancements are impressive, it's important to note that the demonstrations by both Google and OpenAI had their limitations. Google's showcase of Project Astra was pre-recorded and only 2 minutes long, while OpenAI's live demo had some glitches. However, these are still early days, and both companies are working to improve and scale up their AI agents.
Sundar Pichai expects a wide rollout of Project Astra sometime in the next year, paced by quality in much the same way Google Lens was. Meanwhile, OpenAI's GPT-4o is already available to many paying subscribers and is set to roll out gradually to free users in the coming weeks, with the voice feature becoming available for free later this summer.
Ethical Concerns and Privacy Risks
As we move towards more advanced AI agents, there are ethical concerns to consider. Comparisons to the movie "Her" rarely account for how the story ends: with the AI leaving and the protagonist left to confront his own messy human relationships. As users interact with AI agents in more vulnerable ways, there are real risks of manipulation and weaponization.
Privacy is another major concern. With AI agents like Astra recording everything around you, even remembering where you left your glasses, the potential for hackers to exploit this data is significant, especially in corporate office settings. Project Astra's ability to identify objects like glasses, apples, and even car license plates raises questions about how this data could be used. Could it be subpoenaed in a criminal investigation?
By cross-referencing location data with object recognition, Google could potentially track your movements and activities in concerning ways. While doing AI processing locally on devices is preferable for privacy, many of Google's newly announced AI features rely on cloud processing, meaning your data is being sent to Google's servers.
For Google's AI push to succeed, it will need to prioritize privacy as much as innovation. Some key steps Google should take (sketched in code after this list):
Clear disclosure - Provide plain language explanations of what data is collected, how it's used, and how it's protected. Don't hide behind legalese.
Opt-in consent - Make new AI features strictly opt-in. No automatic enrollment in data collection.
Decoupling - Allow users to enable/disable specific AI features without losing access to core product functionality.
Local processing - Whenever possible, do AI processing on-device to limit data sharing.
Quick fixes - If privacy issues are uncovered, pause the affected features, fix the issues promptly, and push updates to all users before re-enabling.
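As a toy illustration of opt-in consent, decoupling, and local-first processing, here's a sketch of what such a settings model could look like. It is entirely hypothetical and does not reflect Google's real configuration system:

```python
from dataclasses import dataclass, field

# A toy model of opt-in, decoupled, local-first AI feature settings.

@dataclass
class AIFeature:
    name: str
    enabled: bool = False          # opt-in: everything starts OFF
    local_only: bool = True        # prefer on-device processing by default

@dataclass
class PrivacySettings:
    features: dict = field(default_factory=dict)

    def enable(self, name: str, allow_cloud: bool = False):
        """User explicitly turns on a single feature (no bundled enrollment)."""
        self.features[name] = AIFeature(name, enabled=True,
                                        local_only=not allow_cloud)

    def disable(self, name: str):
        """Disabling one feature never touches the others or the core product."""
        if name in self.features:
            self.features[name].enabled = False

settings = PrivacySettings()
settings.enable("object_memory")                        # on-device only
settings.enable("multi_step_search", allow_cloud=True)  # user chose cloud
settings.disable("object_memory")                       # everything else keeps working
```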
The Embrace of "Move Fast and Break Things" in AI Development
The development of these AI agents also marks a shift in the approach to AI development. Previously, generative AI was thought to be too risky and consequential to deploy quickly. However, there seems to be a new embrace of the "move fast and break things" mentality.
This shift is evident in the departure from OpenAI of Ilya Sutskever, who was known for sounding the alarm on AI safety and pushing back against the drive to develop AI quickly. Even Google, which has promised to balance boldness and responsibility, is moving faster in the development of its AI agents. Here is how Sutskever announced his departure on X:
After almost a decade, I have made the decision to leave OpenAI. The company’s trajectory has been nothing short of miraculous, and I’m confident that OpenAI will build AGI that is both safe and beneficial under the leadership of @sama, @gdb, @miramurati and now, under the…
As the race to develop more advanced AI agents continues, it's clear that we are just scratching the surface of what's possible. The demonstrations by Google and OpenAI this week provide a glimpse into a future where AI agents are an integral part of our daily lives, assisting us with tasks, providing emotional support, and even remembering where we left our glasses.
The rise of AI agents marks a new era in AI, and it will be fascinating to see how this technology evolves in the coming years. As Google CEO Sundar Pichai said, "We are working at the cutting edge of technology and bringing it as fast to our products as possible." The future is here, and it's more intelligent than ever before.