Unleashing the Power of Machine Learning on Apple Silicon with Foundation Models


Updated: June 11 2024 04:01


At the 2024 Worldwide Developers Conference, Apple introduced Apple Intelligence, a revolutionary personal intelligence system deeply integrated into iOS 18, iPadOS 18, and macOS Sequoia. This system is powered by a family of highly capable generative models, including a ~3 billion parameter on-device language model and a larger server-based language model available with Private Cloud Compute, running on Apple silicon servers.


Apple Intelligence: Baked-in OS Features

In Apple's latest releases, Apple Intelligence powers new features across the system and within apps. One such feature is Writing Tools, which helps users communicate more effectively by rewriting text for tone and clarity, proofreading for mistakes, and summarizing key points.


Another notable addition is Image Playground, allowing developers to effortlessly integrate image creation features into their apps without the need to train models or design safety guardrails.


ML-Powered APIs: Enhancing App Capabilities with Create ML


For developers looking to offer intelligent features in their apps, Apple provides a range of APIs and frameworks that let apps tap into machine learning without having to source or manage models directly. The Vision framework, for example, offers capabilities such as text extraction, face detection, body pose recognition, and more. With the introduction of a new Swift API and additional features like hand pose detection and aesthetic score requests, integrating visual understanding capabilities into apps has become even easier.
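
As a rough illustration, here is a minimal sketch of text extraction using Vision's long-standing VNRecognizeTextRequest API; the newer Swift API mentioned above follows a similar request-and-results pattern, and the function name below is purely illustrative.

```swift
import CoreGraphics
import Foundation
import Vision

// A rough sketch of text extraction with Vision's VNRecognizeTextRequest.
func recognizeText(in image: CGImage, completion: @escaping ([String]) -> Void) {
    let request = VNRecognizeTextRequest { request, _ in
        let observations = request.results as? [VNRecognizedTextObservation] ?? []
        // Keep the top candidate string from each detected text region.
        completion(observations.compactMap { $0.topCandidates(1).first?.string })
    }
    request.recognitionLevel = .accurate

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    DispatchQueue.global(qos: .userInitiated).async {
        try? handler.perform([request])
    }
}
```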



When model customization is required for specific use cases, Create ML is an excellent place to start. The Create ML app allows developers to customize the models powering Apple's frameworks using their own data. By choosing a template aligned with the desired task and providing training data, developers can train, evaluate, and iterate on models with just a few clicks. The underlying Create ML and Create ML Components frameworks also enable model training from within applications on Apple's platforms.
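
For instance, a minimal sketch of programmatic training with the Create ML framework on macOS might look like the following; the paths are placeholders, and the assumed layout is that each subfolder of the training directory is named after a class label and contains example images for that class.

```swift
import CreateML
import Foundation

// A rough sketch of programmatic training with the Create ML framework (macOS).
// All paths are placeholders.
let trainingDir = URL(fileURLWithPath: "/path/to/TrainingImages")
let testDir = URL(fileURLWithPath: "/path/to/TestImages")

do {
    // Train an image classifier from labeled folders of images.
    let classifier = try MLImageClassifier(trainingData: .labeledDirectories(at: trainingDir))

    // Evaluate on held-out data, then export a Core ML model for deployment.
    let metrics = classifier.evaluation(on: .labeledDirectories(at: testDir))
    print("Classification error: \(metrics.classificationError)")

    try classifier.write(to: URL(fileURLWithPath: "/path/to/ImageClassifier.mlmodel"))
} catch {
    print("Training failed: \(error)")
}
```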


Running Models On-Device: Advanced Use Cases

For more advanced use cases, such as running fine-tuned diffusion models or large language models downloaded from open-source communities, Apple provides a streamlined workflow for deploying models on its devices. The process involves three distinct phases: defining the model architecture and training the model, converting the model into Core ML format for deployment and optimization, and writing code to integrate with Apple frameworks for loading and executing the prepared model.



Core ML: The Gateway for Model Deployment

Core ML serves as the gateway for deploying models on Apple devices and is used by thousands of apps to enable amazing user experiences. It delivers the performance these experiences require while simplifying the development workflow with Xcode integration. Core ML automatically segments models across the CPU, GPU, and Neural Engine to maximize hardware utilization. With new features like the MLTensor type, key-value caches for efficient decoding of large language models, and support for functions to choose specific style adapters in image-generation models at runtime, Core ML continues to evolve and empower developers.
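
A minimal sketch of the integration step, loading and running a prepared model with Core ML, might look like this; "ImageClassifier" stands in for the Swift class Xcode generates from whatever .mlmodel you bundle, and the "image" input and "classLabel" output names are hypothetical.

```swift
import CoreML
import CoreVideo

// A rough sketch of loading and running a prepared Core ML model.
func classify(_ pixelBuffer: CVPixelBuffer) throws -> String {
    let config = MLModelConfiguration()
    // Let Core ML segment the model across the CPU, GPU, and Neural Engine.
    config.computeUnits = .all

    // "ImageClassifier" is the hypothetical Xcode-generated model class.
    let model = try ImageClassifier(configuration: config)
    let output = try model.prediction(image: pixelBuffer)
    return output.classLabel
}
```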


In scenarios where finer-grained control over machine learning task execution is necessary, developers can leverage Metal's MPS Graph and Accelerate's BNNS Graph API. MPS Graph enables the sequencing of ML tasks with other workloads, optimizing GPU utilization, while BNNS Graph provides strict latency and memory management controls for ML tasks running on the CPU. These frameworks form part of Core ML's foundation and are directly accessible to developers.
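
To give a flavor of MPS Graph, here is a minimal sketch that builds and runs a tiny compute graph on the GPU, adding two small vectors; a real ML workload would chain matrix multiplications, convolutions, and other operations in the same way. The values and shapes are illustrative only.

```swift
import Foundation
import Metal
import MetalPerformanceShadersGraph

// A rough sketch of MPSGraph: add two 4-element vectors on the GPU.
let mtlDevice = MTLCreateSystemDefaultDevice()!
let device = MPSGraphDevice(mtlDevice: mtlDevice)
let graph = MPSGraph()

let a = graph.placeholder(shape: [4], dataType: .float32, name: "a")
let b = graph.placeholder(shape: [4], dataType: .float32, name: "b")
let sum = graph.addition(a, b, name: "sum")

// Wrap host arrays as tensor data the graph can consume.
func tensorData(_ values: [Float]) -> MPSGraphTensorData {
    let data = values.withUnsafeBufferPointer { Data(buffer: $0) }
    return MPSGraphTensorData(device: device, data: data, shape: [4], dataType: .float32)
}

// Run synchronously and read back the result tensor.
let results = graph.run(
    feeds: [a: tensorData([1, 2, 3, 4]), b: tensorData([10, 20, 30, 40])],
    targetTensors: [sum],
    targetOperations: nil
)
print(results[sum]!) // tensor data holding [11, 22, 33, 44]
```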


Responsible AI Development: Apple's Core Values


Apple Intelligence is designed with the company's core values at every step and built on a foundation of groundbreaking privacy innovations. Apple has created a set of Responsible AI principles to guide the development of AI tools and the underlying models:

  • Empower users with intelligent tools
  • Represent users authentically
  • Design with care to avoid misuse or potential harm
  • Protect user privacy by not using private personal data or user interactions in model training

These principles are reflected throughout the architecture that enables Apple Intelligence: it connects features and tools with specialized models and scans inputs and outputs to provide each feature with the information it needs to function responsibly.

Modeling Overview: Pre-Training, Post-Training, and Optimization

Apple's foundation models are trained using the AXLearn framework, an open-source project released in 2023. The models are trained on licensed data and publicly available data collected by AppleBot, filtered to remove personally identifiable information and low-quality content.

For post-training, Apple uses a hybrid data strategy that incorporates both human-annotated and synthetic data, along with novel algorithms such as rejection sampling fine-tuning with a teacher committee and reinforcement learning from human feedback (RLHF) with mirror descent policy optimization and a leave-one-out advantage estimator.
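
As a rough sketch of the leave-one-out idea (not necessarily Apple's exact formulation): for each prompt x, k candidate responses y_1 ... y_k are sampled and scored by a reward model r, and each response's advantage is its reward minus the average reward of the other k - 1 samples, i.e. A(x, y_i) = r(x, y_i) - (1 / (k - 1)) * sum over j ≠ i of r(x, y_j). This baselines each sample against its peers without training a separate value function.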

To optimize the models for speed and efficiency, Apple employs techniques such as grouped-query attention, shared input and output vocabulary embedding tables, low-bit palettization, activation quantization, and embedding quantization. These optimizations enable impressive performance on iPhone 15 Pro: a time-to-first-token latency of about 0.6 milliseconds per prompt token and a generation rate of 30 tokens per second.


Apple's foundation models are fine-tuned for users' everyday activities using adapters, small neural network modules plugged into various layers of the pre-trained model. These adapters can be dynamically loaded, cached in memory, and swapped, allowing the foundation model to specialize itself on-the-fly for the task at hand while efficiently managing memory and guaranteeing the operating system's responsiveness.

Performance and Evaluation: Human Satisfaction, Safety, and Benchmarks


Apple focuses on human evaluation when benchmarking its models, as these results correlate strongly with the user experience in its products. The company conducts performance evaluations on both the feature-specific adapters and the foundation models.


In evaluating the summarization adapter, Apple found that their models with adapters generate better summaries than comparable models, with higher "good" response ratios and lower "poor" response ratios. The models also proved robust when faced with adversarial prompts, achieving lower violation rates than open-source and commercial models.


Apple's foundation models outperform competitors on various benchmarks, including the Instruction-Following Eval (IFEval) benchmark and internal summarization and composition benchmarks. Human graders also preferred Apple's models as safe and helpful over competitor models for adversarial prompts.




A New Era of Personal Intelligence

The introduction of Apple Intelligence, powered by the company's groundbreaking on-device and server foundation models, marks a new era of personal intelligence deeply integrated into iPhone, iPad, and Mac. These models, developed responsibly and guided by Apple's core values, enable powerful capabilities across language, images, actions, and personal context. As Apple continues to innovate and expand its family of generative models, users can look forward to an increasingly seamless and intelligent experience across their Apple products.

