Google, OpenAI, and Meta Could Revolutionize Smart Glasses
Updated: May 18 2024 18:38
Google's recent demo of Project Astra, an early preview of a future AI-powered universal assistant, has reignited interest in the potential of smart glasses. Near the end of the demo, a Google employee seamlessly continues conversing with the assistant while wearing a pair of glasses, hinting at the possibility of a game-changing feature for the next generation of Google Glass.
Sundar Pichai: Most Exciting I've Seen in My Life
One of the key focuses of Project Astra is achieving human-like proficiency in processing and understanding multiple modalities of information. Astra is designed to seamlessly interpret and reason across text, images, audio, video, and structured data. By learning from diverse data types and the complex relationships between them, Astra can engage in advanced cross-modal understanding and generation.
For instance, Astra can analyze an image and generate a detailed, coherent description or narrative about its contents. It can then engage in a conversation about the image, answering follow-up questions that require a deep semantic understanding. Astra is also capable of the inverse process - generating realistic images from natural language descriptions by grasping the subtle nuances and details conveyed in the text.
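Project Astra itself is not publicly available, but the interaction pattern it demonstrates (send an image, get a grounded description, then ask follow-up questions in the same conversation) can already be approximated with Google's public Gemini API. Below is a minimal sketch assuming the google-generativeai Python SDK and a made-up local image file; it illustrates the cross-modal pattern, not Astra's actual implementation.

```python
# Sketch of the describe-then-discuss pattern using the public Gemini API.
# "kitchen.jpg" and the prompts are placeholders for illustration only.
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

img = PIL.Image.open("kitchen.jpg")

# Step 1: generate a detailed description of the image.
description = model.generate_content([img, "Describe this scene in detail."])
print(description.text)

# Step 2: ask follow-up questions that need deeper semantic understanding,
# in a multi-turn chat that keeps the visual context.
chat = model.start_chat()
chat.send_message([img, "What objects are on the counter?"])
reply = chat.send_message("Which of them could be a safety hazard, and why?")
print(reply.text)
```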
In a recent CNBC interview, Google CEO Sundar Pichai called this one of the most exciting moments he has seen in his life:
In Search, we announced multi-step reasoning: you can write very, very complex queries. Behind the scenes we are breaking it into multiple parts and composing that answer for you, so these are all agentic directions. It's very early days; we're going to be able to do a lot more. I think that's what makes this moment one of the most exciting I've seen in my life.
In the interview, Pichai also addressed the common question about the glasses used in the demo. Here's what he said:
We built Gemini to be multimodal because we see use cases like that. Project Astra shines when you have a form factor like glasses, so we are working on prototypes. Through Android, you know, we've always had plans to work on AR with multiple partners, and so over time they'll bring products based on it as well.
OpenAI and Be My Eyes
Be My Eyes is a Danish startup that has been creating technology for the blind and low vision community since 2012. Their app connects people who are blind or have low vision with volunteers who can help them with hundreds of daily life tasks, such as identifying products or navigating an airport. With over 250 million people worldwide who are blind or have low vision, Be My Eyes has been making a significant impact on improving accessibility.
With the new visual input capability of GPT-4o (currently in research preview), Be My Eyes has begun developing a GPT-4o-powered Virtual Volunteer™ within their app. This virtual assistant can generate the same level of context and understanding as a human volunteer, opening up a world of possibilities for the blind and low vision community.
What sets GPT-4o apart from other language and machine learning models is its ability to engage in conversation and its advanced analytical prowess. Jesper Hvirring Henriksen, CTO of Be My Eyes, explains:
Basic image recognition applications only tell you what's in front of you. They can't have a discussion to understand if the noodles have the right kind of ingredients or if the object on the ground isn't just a ball, but a tripping hazard—and communicate that.
This conversational aspect allows users to ask follow-up questions and receive more detailed, usable information nearly instantly. For example, if a user sends an image of the contents of their fridge, GPT-4o not only recognizes and names the items but can also suggest recipes based on those ingredients.
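To make the fridge example concrete, here is a minimal sketch of that conversational flow using the OpenAI Python SDK with GPT-4o. The image URL and prompts are placeholders, and this shows the general API pattern rather than how the Be My Eyes app is actually built.

```python
# Sketch: describe an image, then ask a follow-up in the same conversation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What food items do you see in this fridge?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/fridge.jpg"}},
        ],
    }
]
first = client.chat.completions.create(model="gpt-4o", messages=messages)
print(first.choices[0].message.content)

# Follow-up question: the model keeps the visual context from the earlier
# turn and can reason over it, e.g. to suggest recipes.
messages.append({"role": "assistant", "content": first.choices[0].message.content})
messages.append({"role": "user", "content": "Suggest a quick dinner recipe using only those ingredients."})
second = client.chat.completions.create(model="gpt-4o", messages=messages)
print(second.choices[0].message.content)
```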
Ray-Ban Meta Smart Glasses
The Ray-Ban Meta Smart Glasses, launched last fall, have recently received a significant upgrade with the addition of multimodal AI. This new feature allows the glasses to process various types of information, such as photos, audio, and text, making them a more versatile and useful wearable device. Despite some limitations and quirks, the Meta glasses offer a glimpse into the future of AI-powered gadgets and their potential to seamlessly integrate into our daily lives.
Multimodal AI enables the Meta glasses to understand and respond to a wide range of user queries. By simply saying, "Hey Meta, look and..." followed by a specific command, users can ask the glasses to identify plants, read signs in different languages, write Instagram captions, or provide information about landmarks and monuments.
The glasses capture a picture, process the information in the cloud, and deliver the answer through the built-in speakers. While the possibilities are not endless, the AI's capabilities are impressive and constantly evolving. Here are some examples of what you can do with the multimodal AI function:
Ask about what you see yourself: "Hey Meta, look and describe what I'm seeing."
Understand text: "Hey Meta, look and translate this text into English."
Get gardening tips: "Hey Meta, look and tell me how much water these flowers need."
Express yourself: "Hey Meta, look and write a funny Instagram caption about this dog."
Meta's Vision for the Next Era of Computing
In a presentation at Advertising Week Europe in London, Derya Matras, Meta's VP for the UK, Northern Europe, Middle East, and Africa, declared that "phones had their time for the last few decades" and that "the next form factor is going to be smart glasses." This statement underscores Meta's vision for the future of computing, where smart glasses and virtual reality headsets will play a pivotal role in how we interact with online platforms.
One of the key developments in Meta's AI journey is the integration of multimodal capabilities into its LLM, Llama 3. By enabling the AI to react not just to text but also to audio and visual information, it opens up a world of possibilities, particularly in the realm of smart glasses. With the integration of Llama 3, Meta's smart glasses will allow users to simply press a button and ask the AI for help with various tasks, such as picking out an outfit based on the clothes the AI can see through the glasses' cameras. As Matras put it, "It's seamless, like you have an assistant with you, always."
Matras highlighted the immense possibilities for accessibility, where smart glasses could provide users with a set of AI eyes that can describe physical environments and provide localized information. Similar to OpenAI's work with Be My Eyes, this could be a game-changer for individuals with visual impairments, enabling them to navigate the world with greater ease and independence.
The Time Is Right for Smart Glasses: This Could Be the iPhone Moment
Back in 2013, when Google Glass first debuted, the technology needed to make smart glasses truly useful simply wasn't there. At Google I/O 2012, the Project Glass team and Google founder Sergey Brin took product demoing to a new level. They talked about working with some of the world's top athletes, combined skydiving and mountain biking, and shared the experience -- through their eyes -- with the world. See the Google Glass announcement below:
In June 2020, Google acquired North, a Canadian, Amazon-backed company that makes smart glasses. Google said that the acquisition would help realize its vision of “ambient computing,” where ubiquitous connected devices work together. The company, formerly known as Thalmic Labs, rebranded in 2018 when it unveiled its Focals holographic smart glasses. See the product video of Focals below:
At Google I/O 2022, Google also gave a preview of new AR glasses (based on technologies from the North acquisition) that can live translate audio to text. See the preview video from Google I/O 2022 below:
With the advancements showcased in Project Astra, Google Glass may finally be poised for its iPhone moment. If Google and DeepMind can successfully address the issue of hallucinations and create a reliable foundational model for the assistant, we could have a powerful AI assistant at our fingertips and in our ears. The most natural way to interact with such an assistant would be through voice, making lightweight smart glasses equipped with a camera and microphone the perfect interface.
The same goes for the GPT-4o use cases with Be My Eyes: imagine running them on smart glasses instead of on a phone. The entire user experience would be much improved, and there are tremendous opportunities to make this AI model work on smart glasses.
Imagining the Possibilities with Smart Glasses
Picture yourself wearing smart glasses with a heads-up display (HUD) advanced enough to show you all the information you need. This could be the shakeup the world of personal computing needs, allowing smartphones to remain in our pockets while we interact with our digital lives through our glasses.
In the future, an AI assistant accessed through smart glasses shouldn't just answer your questions; it should also be able to operate all of your phone apps on your behalf. This would include:
Reading messages and managing appointment schedules
Getting real-time answers from search or AI
Playing songs and checking playlists from music apps like Spotify
Taking photos using the Camera app
Checking the weather using the Weather app
Ordering from food delivery apps like Uber Eats
Posting to social media apps like TikTok or Instagram
Responding in communication apps like Teams or Slack
By having a highly versatile, universal AI assistant with you at all times, you could handle every phone-related task without constantly juggling multiple devices.
To achieve this, OpenAI, Google, and Meta may need to create a new kind of multimodal foundation model that can understand both voice and typed commands. This model would serve as a subroutine that communicates with the main AI assistant and executes commands to control your phone apps. By residing on your phone, this model could address privacy concerns and eliminate the need for cloud processing.
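A rough sketch of what that routing layer might look like is shown below. Everything here is hypothetical: the intent schema, app names, and handlers are made up to illustrate the idea of a model that emits structured commands which an on-device dispatcher executes against local apps.

```python
# Hypothetical sketch of the "subroutine" idea: the assistant's model emits a
# structured intent, and a small on-device router maps it to phone-app actions.
# No real phone APIs are called here; the handlers just return strings.
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

@dataclass
class Intent:
    app: str              # e.g. "spotify", "camera", "weather"
    action: str           # e.g. "play", "capture", "forecast"
    args: Dict[str, str]  # free-form parameters extracted from the request

def play_song(args: Dict[str, str]) -> str:
    return f"Playing '{args.get('track', 'something')}' on Spotify"

def take_photo(args: Dict[str, str]) -> str:
    return "Captured a photo with the Camera app"

def get_weather(args: Dict[str, str]) -> str:
    return f"Fetching the forecast for {args.get('city', 'your location')}"

# Registry mapping (app, action) pairs to local handlers.
HANDLERS: Dict[Tuple[str, str], Callable[[Dict[str, str]], str]] = {
    ("spotify", "play"): play_song,
    ("camera", "capture"): take_photo,
    ("weather", "forecast"): get_weather,
}

def dispatch(intent: Intent) -> str:
    handler = HANDLERS.get((intent.app, intent.action))
    if handler is None:
        return f"Sorry, I can't handle {intent.app}/{intent.action} yet."
    return handler(intent.args)

if __name__ == "__main__":
    # In a real system this Intent would come from the multimodal model's
    # structured output after hearing "play my running playlist".
    print(dispatch(Intent(app="spotify", action="play", args={"track": "running playlist"})))
```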
Addressing Privacy Concerns
One of the most significant issues with the original Google Glass was its perceived invasion of privacy. However, much has changed since 2013, with the proliferation of cameras in public spaces and the widespread use of social media. Despite this, Google should prioritize privacy features if they plan to release the next generation of Google Glass.
With AI agents recording everything around you, even remembering where you left your glasses, the potential for hackers to exploit this data is significant, especially in corporate office settings. The assistant's ability to identify objects like glasses, apples, and even car license plates raises questions about how this data could be used.
By cross-referencing location data with object recognition, Google could potentially track your movements and activities in concerning ways. While doing AI processing locally on devices is preferable for privacy, many of Google's newly announced AI features rely on cloud processing, meaning your data is being sent to Google's servers.
For the next generation of Google Glass or other OpenAI-powered smart glasses to succeed, they will need to prioritize privacy as much as innovation. Here are some key steps they should take:
The smart glasses' frame could light up when taking photos or videos.
Recording audio or video could require holding the frame so people around you are fully aware of the recording.
The AI assistant should disable the HUD while driving, only allowing text and email reading and replies.
Clear disclosure - Provide plain language explanations of what data is collected, how it's used, and how it's protected. Don't hide behind legalese.
Opt-in consent - Make new AI features strictly opt-in. No automatic enrollment in data collection.
Decoupling - Allow users to enable/disable specific AI features without losing access to core product functionality.
Local processing - Whenever possible, do AI processing on-device to limit data sharing.
Quick fixes - If privacy issues are uncovered, pause the affected features, fix the issues promptly, and push updates to all users before re-enabling.
Companies may forbid the use of glasses on their premises to prevent industrial espionage.
By implementing these privacy features, Google and OpenAI can help ensure that the next generation of smart glasses is both useful and socially acceptable.
2024 will be ‘The year of AI glasses’
The glasses’ “ability to take live data ingested visually and translate that into actionable text or recommendations is going to be increasingly exciting,” says Henry Ajder, founder of the AI consultancy Latent Space. “It is something that people will use.” He is not alone: Computerworld says “2024 will be ‘The year of AI glasses.’”
Despite the challenges, the benefits of multimodal AI are clear, and companies that successfully implement this technology will be well-positioned to create innovative products, improve user experiences, and drive growth in the years to come. As Ajder notes, "Multimodal has clear benefits and advantages, but there's no sugarcoating the complexities it brings." Embracing these complexities and investing in the development of responsible, robust multimodal AI systems will be key to unlocking the full potential of this exciting new frontier.
Google's Project Astra has given us a glimpse into the future of smart glasses and AI assistants. If these companies can successfully address the challenges of creating a reliable and versatile AI model while prioritizing user privacy, the next generation of smart glasses from Google, OpenAI, or Meta could herald a new era of personal computing, transforming how we interact with our digital lives.