AI Phone: Smartest LLM Personal Assistants from Google, Samsung, and Apple
Updated: May 10 2024 19:53
Updated: Added some rumored AI features in iOS 18 from Apple - May 10 2024
Updated: Moved the AI Gadgets to another post - May 8 2024
In today's fast-paced world, our mobile phones have become our primary digital personal assistants, thanks to voice assistant bots like Alexa, Google Assistant, and Siri. While these bots offer some level of vocal interaction, they are merely scratching the surface when it comes to handling our personal and professional tasks and routines. A new ecosystem is emerging, encompassing key AI technologies from OpenAI, Apple, Google, and Samsung for automated discovery and knowledge management. These advancements point towards a future where LLM-powered AI Phones (or, as some call them, "AI Personal Assistants") will be able to handle a wide range of intelligent tasks seamlessly and autonomously.
Tech Companies Adding AI to Smartphones
Apple, Samsung, and Google are all leaning towards transactional LAMs (Large Action Models) with their AI Phone strategies. These tech giants will be adding substantial AI agent functionality to their smartphones in the coming months, similar to the devices recently launched by startups like Rabbit Technologies and Humane. Hopefully they will all learn the lessons from the early reviews of those devices.
In Apple's Q2 2024 earnings call, Tim Cook assured investors that Apple is going all-in on AI, highlighting its unique advantages in hardware, software, and services integration. Apple boasts a seamless ecosystem that could be a major asset in developing powerful AI features. Apple also has custom silicon chips with industry-leading neural engines specifically designed for AI tasks.
Cook also stressed that Apple treats user privacy as a core value, which could be a differentiator in the AI space. While he remained vague about specifics, he did mention a significant investment in AI research and development, exceeding $100 billion over the past five years. We can expect more details to emerge at WWDC24 in June, where major changes to iOS, macOS, and iPadOS with a focus on AI are expected to be announced.
Apple Poised to Revolutionize Voice Memos and Notes with AI Transcription in iOS 18
According to a report from AppleInsider, Apple is planning to introduce real-time transcription capabilities to its Voice Memos and Notes apps, potentially transforming the way users interact with their audio recordings and notes. These groundbreaking features promise to significantly boost efficiency and productivity for users across various domains, from students and professionals to journalists and creatives.
The Voice Memos app will be among the first to receive the AI-powered upgrade. Users can expect a running transcript of their audio recordings, similar to Apple's recent Live Voicemail feature. The transcriptions will take center stage in the app, replacing the traditional graphical representation of recorded audio.
Notes will also benefit from the transcription feature. A dedicated transcription button, represented by a speech bubble icon, will allow users to display a transcription of any audio recorded within the app. This seamless integration of audio recording and transcription capabilities is set to make Notes a true powerhouse for capturing and organizing information.
In addition to transcription, Apple is introducing AI-generated summarization, which will provide users with concise text summaries of the key points and action items from their recorded audio. This feature will work in tandem with the new in-app audio recording and real-time transcription options, enabling users to process large amounts of data quickly and easily.
The potential applications for this technology are vast. Students can record lectures and classes without relying on third-party tools, while professionals can efficiently capture and summarize conference calls, virtual business meetings, and seminars. Journalists can transcribe and summarize lengthy interviews, and creatives such as authors and screenwriters can record and review key ideas without the need to listen to entire recordings.
While Apple has invested significant effort in ensuring the accuracy of its transcription and summarization features, the company acknowledges that mistakes may occur. To address this, the original audio will be maintained alongside the transcript and AI-generated summary, ensuring that no source information is lost in the process.
Privacy is a top priority for Apple, and the company is committed to protecting user data. While certain AI features are expected to function entirely on-device, server-side processing may be required for advanced capabilities like audio transcription and summarization.
Apple Safari AI-Powered Features and UI Enhancements in iOS 18
In addition to Voice Memos & Notes, Apple's Safari browser is gearing up for a significant upgrade with the upcoming release of iOS 18. According to a report by AppleInsider, Safari will incorporate a range of AI-powered features and user interface enhancements to elevate the browsing experience for users.
One of the standout features coming to Safari is "Intelligent Search," a browser assistant tool that harnesses Apple's on-device AI technology. This "Safari Browsing Assistant" tool aims to identify key topics and phrases on webpages, providing users with concise summaries. By leveraging AI capabilities, Intelligent Search will enable users to quickly grasp the essence of a webpage's content without having to read through the entire article or post.
Another notable addition to Safari is the "Web Eraser" tool, which empowers users to effortlessly remove unwanted portions of webpages. This feature will prove particularly useful for eliminating distracting elements or irrelevant content, allowing users to focus on the information that matters most. The erasure is designed to be persistent, meaning that the removed sections will remain absent even when revisiting the site, unless the user chooses to revert the changes.
Safari's user interface is also set to undergo a revamp, with a new quick-access menu emerging from the address bar. This menu will consolidate various page tools, bringing over some functions that currently reside in the Share Sheet and placing them alongside the newly introduced tools. By centralizing these features, Apple aims to provide a more streamlined and intuitive browsing experience for users.
In addition to the enhancements in iOS 18, the iPadOS and macOS versions of Safari are expected to align further. This move towards greater consistency across platforms will ensure a seamless browsing experience for users, regardless of the device they are using. The unified approach will make it easier for users to navigate and utilize Safari's features, whether they are on their iPhone, iPad, or Mac.
Apple Set to Revolutionize Siri with On-Device Generative AI
According to a report by The New York Times, the new version of Siri will be powered by generative AI, marking a significant leap forward in the assistant's capabilities and user experience. For years, users have been clamoring for a more advanced and intuitive version of Siri, as the current iteration often struggles with understanding context and handling complex requests. Apple executives Craig Federighi and John Giannandrea recognized the need for change after extensively testing ChatGPT, a popular AI-powered chatbot. This realization prompted the company to prioritize the development of a new, AI-driven Siri.
The New York Times reports that Apple has labeled the generative AI project as a tentpole initiative, with the company deciding to give Siri "a brain transplant" in early 2023. This overhaul aims to make Siri more conversational, better at understanding context, and capable of providing more reasonable answers to user queries.
While the new Siri won't focus on open-ended requests like ChatGPT, it will use the generative AI foundations to enhance its ability to handle everyday tasks such as setting timers, creating reminders, and more. Additionally, the updated assistant will be able to summarize incoming text messages and notifications, making it easier for users to stay on top of their communications.
Apple is known for its commitment to user privacy, and the company plans to differentiate its AI offering by processing most requests entirely on-device, rather than relying on cloud-based processing like its competitors. This approach will help to protect user data and maintain a high level of privacy.
Despite the exciting prospects of a generative AI-powered Siri, there are still some uncertainties surrounding the implementation. It remains unclear whether all of Apple's devices will have the necessary processing power to run on-device large language models, or if the new Siri will be limited to devices with more powerful chips, such as the latest iPhone models or the recently announced M4 iPad Pro. This could potentially leave products like the HomePod or Apple Watch without access to the enhanced assistant.
To support the new AI-driven features, Apple is reportedly increasing the RAM in the chips that power the upcoming iPhone 16 models. The company is also investing in its AI server capacity by filling data centers with its own Apple Silicon chips. Additionally, there are ongoing negotiations with Google to license its Gemini backend for certain features, indicating that Apple may adopt a hybrid approach to AI processing.
Apple's Broader AI Initiative
The transcription and summarization features and the generative AI-powered Siri are just one aspect of Apple's broader AI initiative. By incorporating AI-powered summarization and transcription into its system applications, Apple aims to showcase the real-world advantages of AI technology in tackling everyday tasks and improving user efficiency. The company also seeks to position itself competitively against the growing number of third-party applications leveraging AI, such as Otter and Microsoft's OneNote.
The upcoming release of iOS 18, macOS 15, and iPadOS 18 marks a significant milestone in Apple's AI journey. With the introduction of real-time audio transcription and AI-powered summarization in Notes, Voice Memos, and other core applications, Apple is set to revolutionize the way users capture, organize, and process information. These features promise to boost efficiency and productivity across various domains, cementing Apple's position as a leader in AI-driven innovation. As we eagerly await the official unveiling at WWDC in June, it's clear that the future of note-taking and audio recording is about to be redefined by Apple's groundbreaking AI technology.
Apple Ferret-UI Multimodal LLMs
On the multimodal LLM front, Apple recently unveiled Ferret-UI, a multimodal large language model (MLLM) that can understand what's happening on your iPhone screen. The model can identify icon types, find specific pieces of text, and provide precise instructions for accomplishing specific tasks.
Ferret-UI is designed to automate our interactions with our phones and make them even easier. It can help with accessibility, app testing, and usability testing. For instance, when shown a picture of AirPods in the Apple Store and asked how one would purchase them, Ferret-UI correctly replied that one should tap on the 'Buy' button.
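To give a feel for the interaction pattern, here is a minimal Python sketch of asking a screen-understanding model a question about a screenshot. The query_mllm function and the prompt format are hypothetical placeholders; Apple has not published a public API for Ferret-UI.

```python
# Hypothetical sketch of querying a screen-understanding multimodal LLM.
# `query_mllm` is a placeholder for whatever inference call such a model exposes.
def ask_about_screen(screenshot_path: str, question: str, query_mllm) -> str:
    with open(screenshot_path, "rb") as f:
        image_bytes = f.read()
    return query_mllm(image=image_bytes, prompt=question)

# Example (hypothetical):
# ask_about_screen("airpods_store_page.png", "How would I purchase these AirPods?", query_mllm)
# -> "Tap the 'Buy' button."
```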
We will definitely hear more about Apple's AI Phone plans at its upcoming WWDC developer conference, running June 10 to 14 this year.
Samsung Galaxy AI on Galaxy S24
Samsung is also investing heavily in this area, having released the Galaxy AI-powered S24 series. Here are some standout features:
Generative Edit: This feature allows you to transform ordinary images into photographic masterpieces with just a few taps.
Chat Assist: This feature provides real-time tone suggestions to make your writing sound more professional or conversational.
Live Translate: This feature removes communication hurdles with near-real-time voice translations right through the Phone app.
Note Assist: This feature streamlines your note-taking and helps keep your notes organized.
Circle to Search: This feature allows you to circle an object of interest on your screen to get Google Search results.
Google Pixel 8 with On-Device Gemini Nano
Google bills the Pixel 8 Pro as the first smartphone with AI built in. It runs Gemini Nano on-device, Google's most efficient model built for on-device tasks. Here are some of its AI features:
Summarize in Recorder: This feature provides a summary of your recorded conversations, interviews, presentations, and more — even without a network connection.
Smart Reply in Gboard: This feature saves you time by suggesting high-quality responses with conversational awareness.
Audio Magic Eraser: This feature identifies background noise in your videos, allowing you to remove unwanted sounds.
Best Take: This feature lets you swap a subject's expression in a group photo with one taken from other similar shots.
The Concept AI Phone from Deutsche Telekom
I attended the MWC 2024 in Barcelona this year and came across this concept AI Phone, which is a collaboration between Deutsche Telekom, Qualcomm, and Brain.ai. This AI Phone is designed to operate with an app-free interface, powered by cloud-based AI. It’s a bold step towards a more intuitive and seamless user experience. It aims to replace the traditional app-based interface with an onboard AI that processes your requests through textual prompts. It generates its own UI in response to your needs, effectively reading your mind. The phone is equipped with a Snapdragon 8 Gen 3 processor, enhancing its AI capabilities, especially in image and video tasks.
"Book me a flight to the quarterfinals". The intelligent assistant can help perform tasks by voice command. The showcase uses concrete examples to demonstrate how an AI smartphone can make life easier when planning trips, shopping, creating video or editing photos. Deutsche Telekom's generative interface powered by Brain.ai makes it possible. Using AI, it takes over the functions of a wide range of apps and can carry out all daily tasks that would normally require several applications on the device. The concierge can be controlled effortlessly and intuitively via voice and text. Below is the full video demo of the concept AI phone:
Jon Abrahamson, Chief Product & Digital Officer at Deutsche Telekom, is convinced: "Artificial intelligence and Large Language Models (LLM) will soon be an integral part of mobile devices. We will use them to improve and simplify the lives of our customers. Our vision is a magenta concierge for an app-free smartphone. A real everyday companion that fulfills needs and simplifies digital life."
AutoDroid Generalized Task Automation
AutoDroid combines the reasoning capabilities of LLMs with app-specific knowledge to automate arbitrary tasks on smartphones. It aims to build an autonomous agent that can understand user commands in natural language and complete the specified tasks by intelligently interacting with the smartphone GUI.
The major approaches to mobile task automation can be classified into three categories: developer-based, demonstration-based, and learning-based techniques. Most existing commercial products (e.g., Siri and Google Assistant) take a developer-based approach, which requires significant development effort to support a new task. For example, to enable an automated task with Google Assistant, app developers need to identify the functionality they want to trigger, configure and implement the corresponding intent, and register the intent with the assistant.
When executing a task, the assistant uses natural language understanding (NLU) modules to map the user command to the intent, extract the intent parameters, and invoke the corresponding developer-defined function. The emergence of LLMs like GPT-3 and GPT-4 presents an opportunity to address these limitations and enable more generalized task automation.
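As a rough illustration of this developer-based flow, here is a minimal Python sketch of an assistant mapping a user command to a registered intent and invoking the developer-defined handler. The intent pattern, slots, and handler are hypothetical, and real assistants use trained NLU models rather than regular expressions.

```python
import re

# Developer-defined handler for one specific task (hypothetical example).
def order_coffee(size: str, drink: str) -> str:
    return f"Ordering a {size} {drink}."

# Intent registry: pattern -> handler. The named groups (size, drink) become handler arguments.
INTENTS = {
    r"order a (?P<size>small|medium|large) (?P<drink>latte|americano)": order_coffee,
}

def handle_command(command: str) -> str:
    for pattern, handler in INTENTS.items():
        match = re.search(pattern, command.lower())
        if match:
            # Extract intent parameters and invoke the developer-defined function.
            return handler(**match.groupdict())
    return "Sorry, no registered intent matches that request."

print(handle_command("Order a large latte"))  # -> "Ordering a large latte."
```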
GUI Representation
Challenge: LLMs are designed to process natural language, while task automation requires understanding and interacting with graphical user interfaces (GUIs).
Solution: AutoDroid converts the GUI states and actions into a structured HTML-like text format that LLMs can understand and reason about to make precise interaction decisions.
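As a rough sketch of what that conversion might look like (the element fields and tag mapping here are illustrative assumptions, not AutoDroid's exact format):

```python
# Sketch: render a simplified Android view hierarchy as HTML-like text so an LLM
# can reason over the elements and choose one to interact with (illustrative only).
def ui_to_html(elements: list[dict]) -> str:
    tag_map = {"Button": "button", "TextView": "p", "EditText": "input", "ImageView": "img"}
    lines = []
    for idx, el in enumerate(elements):
        tag = tag_map.get(el["class"], "div")
        lines.append(f'<{tag} id="{idx}" alt="{el.get("content_desc", "")}">{el.get("text", "")}</{tag}>')
    return "\n".join(lines)

screen = [
    {"class": "TextView", "text": "Alarm", "content_desc": ""},
    {"class": "Button", "text": "+", "content_desc": "Add alarm"},
]
print(ui_to_html(screen))
# <p id="0" alt="">Alarm</p>
# <button id="1" alt="Add alarm">+</button>
```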
Knowledge Integration
Challenge: LLMs lack domain-specific knowledge about mobile apps, which is crucial for navigating complex app states and completing tasks effectively.
Solution: AutoDroid performs automated app exploration to extract UI transition graphs and task completion knowledge. This app-specific knowledge is then integrated into the LLM prompts and memory to guide task execution.
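The following toy sketch shows the general idea; the UI Transition Graph structure and the way hints are phrased are assumptions for illustration, not the paper's actual data format.

```python
# Sketch: a toy UI Transition Graph (UTG) where nodes are UI states and edges record
# which action moves the app from one state to another (illustrative only).
UTG = {
    "AlarmList": {"tap 'Add alarm'": "EditAlarm"},
    "EditAlarm": {"tap 'Save'": "AlarmList", "tap 'Repeat'": "RepeatPicker"},
    "RepeatPicker": {"tap 'Done'": "EditAlarm"},
}

def knowledge_hints(task: str, utg: dict) -> str:
    """Flatten the UTG into short natural-language hints to inject into the LLM prompt."""
    hints = [f"From {src}, '{action}' leads to {dst}."
             for src, edges in utg.items() for action, dst in edges.items()]
    return f"Task: {task}\nKnown app behavior:\n" + "\n".join(hints)

print(knowledge_hints("Create a weekly alarm", UTG))
```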
Cost Optimization
Challenge: Querying LLMs is costly and compute-intensive, especially for complex multi-step tasks that may require many lengthy queries.
Solution: AutoDroid optimizes LLM queries by reducing and simplifying them based on the extracted app knowledge. It injects relevant information into prompts and matches UI traces to minimize unnecessary queries.
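One simplified way to picture this, assuming a cache of previously seen (task, UI state) pairs rather than the paper's exact mechanism:

```python
# Sketch: reuse remembered (task, UI state) -> action decisions so the agent only
# queries the LLM when no prior trace matches (illustrative cost optimization).
MEMORY: dict[tuple[str, str], str] = {}

def next_action(task: str, ui_state: str, query_llm) -> str:
    key = (task, ui_state)
    if key in MEMORY:
        return MEMORY[key]          # cache hit: no LLM query needed
    action = query_llm(f"Task: {task}\nUI:\n{ui_state}\nNext action?")
    MEMORY[key] = action            # remember the decision for future runs
    return action
```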
AutoDroid System Overview
Offline Stage:
App Exploration: AutoDroid automatically explores the target app's UI states and transitions, building a UI Transition Graph (UTG).
Task Synthesis: By analyzing the UTG with LLMs, AutoDroid extracts task completion knowledge and stores it in the app memory.
Online Stage:
Prompt Generation: When the user specifies a task, AutoDroid generates an optimized prompt using the current UI state, task description, and relevant app knowledge.
Privacy Filtering: Sensitive user information in the prompt is replaced to protect privacy.
LLM Querying: The filtered prompt is sent to the LLM to obtain the next action to perform.
Task Execution: The LLM's response is parsed into an executable action, which is verified for security and executed on the device. User confirmation is sought for potentially risky actions.
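Putting the online stage together, here is a minimal end-to-end sketch in Python. The prompt format, the privacy filter, the ACTION response syntax, and the list of risky action types are all assumptions for illustration, not AutoDroid's actual implementation.

```python
import re

def filter_privacy(prompt: str) -> str:
    """Replace obviously sensitive tokens (here, email addresses) before querying the LLM."""
    return re.sub(r"[\w.+-]+@[\w-]+\.\w+", "<EMAIL>", prompt)

def parse_action(response: str) -> dict:
    """Expect a response like 'ACTION: tap id=3' (format is an assumption)."""
    match = re.search(r"ACTION:\s*(\w+)\s+id=(\d+)", response)
    if not match:
        raise ValueError(f"Unparseable LLM response: {response!r}")
    return {"type": match.group(1), "target": int(match.group(2))}

def online_step(task, ui_html, knowledge, query_llm, execute, confirm):
    # 1. Prompt generation: current UI state + task description + injected app knowledge.
    prompt = (f"{knowledge}\nTask: {task}\nCurrent UI:\n{ui_html}\n"
              "Respond with ACTION: <type> id=<n>")
    # 2. Privacy filtering and 3. LLM querying.
    action = parse_action(query_llm(filter_privacy(prompt)))
    # 4. Task execution, with user confirmation for potentially risky actions.
    if action["type"] in {"delete", "send", "pay"} and not confirm(action):
        return None
    return execute(action)
```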
AutoDroid Benchmark Dataset
The AutoDroid authors created a benchmark dataset with 158 tasks across 13 common Android apps and evaluated the system with GPT-3.5, GPT-4, and open-source LLMs like Vicuna. AutoDroid achieved a 71.3% task completion rate with GPT-4, with 90.9% accuracy for individual actions. Compared to baselines using off-the-shelf LLMs, AutoDroid improved task completion rates by 36-40% and reduced LLM query costs by over 50%. Check out the paper AutoDroid: LLM-powered Task Automation in Android for more details.
The Future of AI Phone
LAMs in AI Phones are capable of more than just communication and response generation. They can analyze a user's preferences, habits, and past interactions to provide personalized recommendations for various activities. Whether it's suggesting restaurants, movies, books, or travel destinations, or offering personalized advice on health, fitness, or personal finance based on individual goals and preferences, LAMs are designed to cater to the unique needs of each user.
In addition, LAMs can integrate with smart home devices and IoT (Internet of Things) systems. They can control appliances, monitor power consumption, or enhance home security. They can respond to voice instructions, adjust devices' settings based on user preferences, and automate routine activities, enhancing convenience and comfort.
As we look towards the future, we can expect LAMs and LLMs to become more refined and more widely explored. While major tech players like Apple, Samsung, and Google are working on their own LLM-powered smart devices, companies like Rabbit are already pushing the boundaries of what is possible with the AI Phone. The hope is that these devices will evolve into truly useful and accurate AI Phones for people.
LAMs/LLMs have all the prerequisites to become one of the most powerful AI technologies. As they continue to improve their understanding of human intention and action execution, they will become increasingly effective at automating complex tasks within a variety of industries. This applies not only to routine administrative tasks but also to more complex decision-making and problem-solving processes. Looking ahead, the proliferation of AI in our daily lives will continue to grow. The future of AI is not just about smart devices, but about creating an ecosystem of interconnected AI-powered services that work together to make our lives easier, more efficient, and more enjoyable.