Tutorial

On-Device AI in SwiftUI: Core ML and Create ML Tutorial (2026)

Apple has been quietly building the most powerful on-device AI stack in the industry. With Core ML, Create ML, and the new Foundation Models framework, you can run image classifiers, language models, and object detectors entirely on the user's iPhone, with zero cloud costs and total privacy.

Ahmed Gagan
14 min read

Key Takeaway

On-device AI means your ML models run entirely on the iPhone's Neural Engine. No server costs, no API keys, no latency, and no privacy concerns. Apple's stack now includes Core ML for inference, Create ML for training, and the Foundation Models framework for accessing Apple Intelligence's ~3B parameter language model directly from your SwiftUI app.

Why On-Device AI Is a Game Changer for iOS Developers

Most AI tutorials for iOS start with "call the OpenAI API." That works, and we have a full guide on building AI apps with ChatGPT and SwiftUI if that is what you need. But cloud AI comes with real tradeoffs: API costs that scale with users, latency that depends on network quality, and privacy concerns that can make or break certain app categories.

On-device AI flips all of that. The model runs on the iPhone's Neural Engine and Apple Silicon GPU. Inference is free. It works offline. And the user's data never leaves their device. For use cases like photo classification, document scanning, health data analysis, or journaling apps, on-device AI is not just a nice option. It is the right one.

Apple has invested heavily in making this accessible. The Neural Engine on the A17 Pro and M-series chips can handle 35 TOPS (trillion operations per second). An image classification model runs in under 10 milliseconds. Object detection takes less than 100ms on iPhone 12 and newer. And with the Foundation Models framework introduced alongside iOS 26, you can now run a ~3 billion parameter language model on-device for free.

Apple's ML Framework Stack: The Full Picture

Before we write any code, let me map out what each framework does. Apple's ML ecosystem has grown significantly, and the naming can be confusing if you are coming to it fresh.

| Framework | Purpose | When to Use |
| --- | --- | --- |
| Core ML | Run pre-trained ML models on-device. Handles inference across CPU, GPU, and Neural Engine. | Image classification, object detection, style transfer, NLP, any custom model |
| Create ML | Train ML models directly on your Mac with no code (or via Swift APIs). | Custom image classifiers, text classifiers, sound classifiers, object detectors |
| Foundation Models | Access Apple Intelligence's ~3B on-device language model via Swift. | Text summarization, entity extraction, content generation, structured data extraction |
| Vision | High-level computer vision APIs built on Core ML. | Face detection, text recognition (OCR), barcode scanning, body pose estimation |
| Natural Language | NLP APIs for text processing. | Sentiment analysis, named entity recognition, language detection, tokenization |
| Speech | On-device speech recognition. | Voice commands, transcription, dictation |
| Sound Analysis | Classify audio from microphone or files. | Music recognition, environmental sound detection, baby cry detection |
| coremltools (Python) | Convert models from PyTorch, TensorFlow, and ONNX to Core ML format. | Bringing models trained elsewhere into your iOS app |

The typical workflow looks like this: you either train a model using Create ML (no code needed for common tasks), convert an existing PyTorch/TensorFlow model using coremltools, or use Apple's pre-trained models from the Core ML model gallery. Then you load and run the model using Core ML in your SwiftUI app. For language tasks, the Foundation Models framework gives you a shortcut that skips all of that and lets you call Apple Intelligence directly.

Core ML: Running Models On-Device

Core ML is the engine that runs ML models on the iPhone. It automatically decides whether to use the CPU, GPU, or Neural Engine for each operation, optimizing for speed and power efficiency. You do not need to think about hardware scheduling. You hand Core ML a model and an input, and it returns a prediction.

Apple provides a collection of pre-trained models ready to drop into your app. These include MobileNetV2 for image classification (only 6.3 MB), YOLOv3 for real-time object detection, BERT for natural language understanding, and DeepLabV3 for semantic image segmentation. All of these are available in .mlmodel format and can be added to your Xcode project by simply dragging them in.

Here is how simple it is to run image classification in SwiftUI using a Core ML model:

import CoreML
import SwiftUI
import Vision

enum ClassificationError: Error {
    case invalidImage
    case noResults
}

func classifyImage(_ image: UIImage) async throws -> String {
    guard let ciImage = CIImage(image: image) else {
        throw ClassificationError.invalidImage
    }

    // MobileNetV2 is the class Xcode auto-generates from the dragged-in model
    let model = try MobileNetV2(configuration: MLModelConfiguration()).model
    let vnModel = try VNCoreMLModel(for: model)

    return try await withCheckedThrowingContinuation { continuation in
        let request = VNCoreMLRequest(model: vnModel) { request, error in
            if let error {
                continuation.resume(throwing: error)
                return
            }
            guard let results = request.results as? [VNClassificationObservation],
                  let top = results.first else {
                continuation.resume(throwing: ClassificationError.noResults)
                return
            }
            continuation.resume(returning: "\(top.identifier) (\(Int(top.confidence * 100))%)")
        }
        let handler = VNImageRequestHandler(ciImage: ciImage)
        do {
            try handler.perform([request])
        } catch {
            // Resume on failure so the continuation is never leaked
            continuation.resume(throwing: error)
        }
    }
}

That is the whole feature. Drag in a model file, write one function, and you have image classification running entirely on the user's device. No API key, no server, no network request.
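If you need to steer where inference runs, `MLModelConfiguration` lets you restrict the compute units Core ML may use. A minimal sketch (assuming the same auto-generated `MobileNetV2` class from above):

```swift
import CoreML

// Restrict Core ML to the CPU and Neural Engine, skipping the GPU.
// Useful when the GPU is busy rendering, or for predictable power use.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine
// Other options: .all (the default), .cpuOnly, .cpuAndGPU

let model = try MobileNetV2(configuration: config)
```

In most apps the default `.all` is the right choice; Core ML's scheduler is usually better at this decision than we are.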

Create ML: Train Your Own Models Without Writing Python

What if the pre-trained models do not fit your use case? Maybe you need to classify types of houseplants, detect specific product logos, or categorize customer support messages. That is where Create ML comes in.

Create ML is a macOS app (and a set of Swift APIs) that lets you train custom ML models directly on your Mac. No Python. No Jupyter notebooks. No GPU cluster. You provide labeled training data, pick a model type, click Train, and Create ML outputs a .mlmodel file that you drop into Xcode.

What You Can Train with Create ML

  • Image Classification: Label folders of images. Create ML learns to distinguish between categories. Works well with as few as 10-50 images per category thanks to transfer learning.
  • Object Detection: Annotate bounding boxes around objects in images. Create ML trains a model that can locate and identify objects in real time.
  • Text Classification: Provide labeled text samples (positive/negative reviews, spam/not-spam, categories). Create ML builds a text classifier.
  • Sound Classification: Label audio clips. Create ML trains a model to recognize sounds (coughs, dog barks, alarms, musical instruments).
  • Action Classification: Feed in labeled video clips of human actions. Create ML trains a model for gesture or activity recognition.
  • Style Transfer: Provide a style image and Create ML builds a model that applies that artistic style to any photo in real time.
  • Tabular Data: Regression and classification on structured data (CSV files). Good for price predictions, churn models, or scoring systems.

The training process leverages transfer learning, which means Create ML starts with a model that already understands general image or text features, and fine-tunes it on your specific data. This is why you can get good results with relatively small datasets. Training an image classifier with 200 images might take 5-10 minutes on an M-series Mac.

A Practical Example: Training a Plant Identifier

Say you are building a gardening app and you want to identify 20 types of houseplants. Here is the workflow:

  1. Create a folder for each plant type (Monstera, Pothos, Snake Plant, etc.).
  2. Add 30-50 photos per folder. Use a mix of angles, lighting, and backgrounds.
  3. Open Create ML, create a new Image Classification project, and drag in your data folder.
  4. Click Train. On an M2 MacBook Air, 1,000 images across 20 categories takes about 8 minutes.
  5. Review the accuracy metrics. Create ML shows you which categories are confused with each other.
  6. Export the .mlmodel file. It will be 5-15 MB depending on the base model.
  7. Drag it into Xcode. Xcode auto-generates a Swift class for the model.

That is it. No Python environment. No dependencies to manage. No cloud training costs. The resulting model runs on-device with sub-10ms inference time.
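The same training run can also be scripted with the CreateML framework from a macOS playground or command-line tool, which is handy once you want to retrain as your dataset grows. A sketch, where the directory paths are placeholders for your own labeled folders:

```swift
import CreateML
import Foundation

// Each subfolder name is a category label (Monstera, Pothos, Snake Plant, ...)
let trainingDir = URL(fileURLWithPath: "/Users/me/PlantPhotos/Training")
let testingDir = URL(fileURLWithPath: "/Users/me/PlantPhotos/Testing")

// Train an image classifier via transfer learning on the labeled folders
let classifier = try MLImageClassifier(
    trainingData: .labeledDirectories(at: trainingDir)
)

// Evaluate on a held-out set and inspect the error rate
let evaluation = classifier.evaluation(on: .labeledDirectories(at: testingDir))
print("Classification error: \(evaluation.classificationError)")

// Export the .mlmodel file for Xcode
try classifier.write(to: URL(fileURLWithPath: "/Users/me/PlantClassifier.mlmodel"))
```

This is the programmatic equivalent of the drag-and-drop workflow above, so you can wire it into a script that retrains whenever new photos land.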

Converting Models from PyTorch and TensorFlow

Sometimes you need a model that goes beyond what Create ML can train. Maybe you found a state-of-the-art object detection model on Hugging Face, or your team trained a custom PyTorch model for a specific task. Apple's coremltools Python package handles the conversion.

The Unified Conversion API supports PyTorch and TensorFlow models directly, so you no longer need to export to ONNX as an intermediate step (recent coremltools releases have deprecated the standalone ONNX converter, so models from other frameworks are generally exported to PyTorch or TensorFlow first). The conversion process transforms your model into an intermediate representation called MIL, then outputs a .mlmodel or .mlpackage file.

# Convert a PyTorch model to Core ML
import coremltools as ct
import torch

# Load the trained model and switch to inference mode
model = torch.load("plant_detector.pt")
model.eval()

# Trace the model with a representative input (1 RGB image, 224x224)
example_input = torch.randn(1, 3, 224, 224)
traced = torch.jit.trace(model, example_input)

# Convert to an ML Program and save as an .mlpackage
mlmodel = ct.convert(
    traced,
    inputs=[ct.ImageType(shape=(1, 3, 224, 224))],
    convert_to="mlprogram",
)
mlmodel.save("PlantDetector.mlpackage")

One important note: coremltools also supports model compression, which is critical for on-device deployment. You can quantize models from 32-bit to 16-bit or even 8-bit precision, reducing model size by 2-4x with minimal accuracy loss. For large language models and diffusion models, this compression is what makes them viable on mobile hardware.

The Foundation Models Framework: Apple Intelligence in Your App

This is the most exciting addition to Apple's ML stack. With iOS 26, Apple opened up the on-device language model that powers Apple Intelligence to third-party developers through the Foundation Models framework.

The model is approximately 3 billion parameters, runs entirely on-device, and is free to use. No API costs. No rate limits. No privacy concerns. It excels at text tasks like summarization, entity extraction, text understanding, content refinement, short dialog, and creative content generation.
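Because the model only ships on Apple Intelligence-capable devices, check availability before surfacing the feature. A minimal sketch of that check plus a plain-text session (the instructions string is just an example):

```swift
import FoundationModels

// Foundation Models is only present on Apple Intelligence-capable hardware,
// so gate the feature on the system model's availability.
switch SystemLanguageModel.default.availability {
case .available:
    let session = LanguageModelSession(
        instructions: "You summarize notes in two sentences or fewer."
    )
    let response = try await session.respond(
        to: "Summarize: picked up seeds, repotted the monstera, watered everything."
    )
    print(response.content)
case .unavailable(let reason):
    // Fall back to a non-AI experience (or a cloud call) and explain why.
    print("On-device model unavailable: \(reason)")
}
```

Handling the `.unavailable` branch gracefully matters: a meaningful share of your installed base will be on devices without Apple Intelligence for years.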

Guided Generation: Structured Output from Swift Types

The standout feature is guided generation. Instead of getting freeform text back from the model (which you then have to parse), you define a Swift struct with the @Generable macro, and the model outputs structured data that maps directly to your type. This is constrained decoding, and it is remarkably developer-friendly.

import FoundationModels

@Generable
struct RecipeSuggestion {
    var name: String
    var ingredients: [String]
    var estimatedMinutes: Int
    var difficulty: Difficulty

    @Generable
    enum Difficulty: String {
        case easy, medium, hard
    }
}

// Generate a structured recipe from a prompt
let session = LanguageModelSession()
let response = try await session.respond(
    to: "Suggest a quick weeknight dinner using chicken and rice",
    generating: RecipeSuggestion.self
)
let suggestion = response.content  // a fully typed RecipeSuggestion

The model returns a fully typed RecipeSuggestion with the name, ingredients list, time estimate, and difficulty level. No JSON parsing, no string manipulation, no prompt engineering to get the right format. You define the shape in Swift, and the model fills it in. This is a fundamentally different experience from working with a cloud API where you are constantly wrestling with output format.

When to Use Foundation Models vs. Cloud AI

The Foundation Models framework is excellent for many tasks, but it is not a replacement for GPT-4o or Claude in every scenario. Here is how to think about the tradeoff:

| Criteria | On-Device (Foundation Models / Core ML) | Cloud AI (OpenAI, Anthropic, etc.) |
| --- | --- | --- |
| Cost | Free. No per-request charges. | $0.15 - $10+ per million tokens depending on model. |
| Privacy | Data never leaves the device. Ideal for health, finance, journaling. | Data sent to third-party servers. Requires privacy policy disclosure. |
| Latency | Near-instant for small tasks. No network dependency. | 1-5 seconds typical. Depends on network and model size. |
| Offline Support | Works completely offline. | Requires internet connection. |
| Model Capability | Good for focused tasks. ~3B parameters limits complex reasoning. | State-of-the-art. GPT-4o and Claude handle nuanced, multi-step tasks. |
| Device Requirements | Requires Apple Silicon (iPhone 15 Pro+ for Foundation Models). | Works on any device with internet. |
| Best For | Classification, summarization, entity extraction, structured output, real-time vision. | Long-form generation, complex reasoning, image generation, multi-modal tasks. |

The smart approach is to use both. Run lightweight tasks on-device (summarize this note, classify this image, extract these fields) and call cloud APIs for the heavy lifting (generate a full article, analyze a complex document, create an image). This hybrid architecture gives you the best of both worlds: speed and privacy for common operations, and full capability when you need it.
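One way to structure that hybrid is a small service that prefers the on-device path and falls back to your cloud client. Everything here is hypothetical scaffolding: `CloudClient` and its `complete` method stand in for whatever API wrapper you already use.

```swift
import FoundationModels

// Hypothetical cloud client -- a placeholder for your OpenAI/Anthropic wrapper.
protocol CloudClient {
    func complete(_ prompt: String) async throws -> String
}

struct SummarizeService {
    let cloud: CloudClient

    func summarize(_ text: String) async throws -> String {
        // Prefer the free, private on-device model when it is available...
        if case .available = SystemLanguageModel.default.availability {
            let session = LanguageModelSession()
            return try await session.respond(to: "Summarize briefly: \(text)").content
        }
        // ...and fall back to the cloud on unsupported devices.
        return try await cloud.complete("Summarize briefly: \(text)")
    }
}
```

The same shape works per-feature: route the cheap, privacy-sensitive tasks on-device and reserve the cloud path for the heavy ones.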

Real-World Use Cases for On-Device AI

Let me walk through practical examples of how indie developers are using on-device AI in shipping apps right now. These are not theoretical concepts. They are features that real apps charge real money for.

1. Photo Organization and Tagging

Use a Core ML image classifier to automatically tag photos with categories (food, pets, landscape, people, documents). This runs in the background as photos are imported, costs nothing per operation, and works without internet. Combine it with the Vision framework's text recognition to extract text from photos of receipts, whiteboards, or documents.
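The text-extraction half needs no custom model at all; Vision's `VNRecognizeTextRequest` handles it. A minimal sketch:

```swift
import UIKit
import Vision

// Extract text from a photo (receipt, whiteboard, document) fully on-device.
func recognizeText(in image: UIImage) throws -> [String] {
    guard let cgImage = image.cgImage else { return [] }

    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate   // slower, but better for documents
    request.usesLanguageCorrection = true

    let handler = VNImageRequestHandler(cgImage: cgImage)
    try handler.perform([request])

    // Take the top candidate string from each detected text region
    return (request.results ?? []).compactMap {
        $0.topCandidates(1).first?.string
    }
}
```

For live camera scanning you would run the same request per frame with `.fast` recognition instead.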

2. Smart Journal Entries

Use the Foundation Models framework to summarize journal entries, extract mood indicators, and suggest tags. Because everything stays on-device, users can write about sensitive health topics, personal relationships, or financial situations without worrying about their data being sent to a server. Privacy is not a feature here. It is the entire selling point.

3. Real-Time Object Detection for Accessibility

Core ML with a YOLO model can identify objects in the camera feed at 30+ frames per second on modern iPhones. Accessibility apps use this to describe the user's surroundings in real time. The latency of a cloud API would make this unusable. On-device inference makes it seamless.

4. Fitness Form Analysis

The Vision framework's body pose estimation can track 19 body landmarks in real time. Combine this with a custom Core ML model trained on exercise form data, and you can build a personal trainer that analyzes squat depth, push-up alignment, or yoga poses. All processing happens on-device, so the user's workout video never leaves their iPhone.
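The landmarks come from `VNDetectHumanBodyPoseRequest`; its normalized joint positions are what you would feed into a custom form-analysis model. A sketch for a single frame:

```swift
import CoreGraphics
import Vision

// Detect body pose landmarks in one frame, fully on-device.
func detectPose(in cgImage: CGImage) throws
    -> [VNHumanBodyPoseObservation.JointName: CGPoint] {
    let request = VNDetectHumanBodyPoseRequest()
    let handler = VNImageRequestHandler(cgImage: cgImage)
    try handler.perform([request])

    guard let observation = request.results?.first else { return [:] }

    // Keep only confidently detected joints; locations are normalized (0...1)
    let points = try observation.recognizedPoints(.all)
    return points
        .filter { $0.value.confidence > 0.3 }
        .mapValues { $0.location }
}
```

In a real fitness app you would run this inside an `AVCaptureSession` delegate and track joint angles across frames rather than single snapshots.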

5. Natural Language Processing for Productivity

The Natural Language framework can perform sentiment analysis, named entity recognition, and language detection without any model download. For a task management app, you can automatically extract dates, people, and locations from natural language input ("Meet Sarah at the coffee shop on Friday" becomes a structured task with a date, contact, and location). The Foundation Models framework takes this further, handling more nuanced text understanding tasks.
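Entity extraction with `NLTagger` is only a few lines. A sketch that pulls people, places, and organizations out of a task string:

```swift
import NaturalLanguage

// Extract named entities (people, places, organizations) on-device.
func extractEntities(from text: String) -> [(String, NLTag)] {
    let tagger = NLTagger(tagSchemes: [.nameType])
    tagger.string = text

    var entities: [(String, NLTag)] = []
    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]

    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .nameType,
                         options: options) { tag, range in
        if let tag, [.personalName, .placeName, .organizationName].contains(tag) {
            entities.append((String(text[range]), tag))
        }
        return true  // keep enumerating
    }
    return entities
}

// e.g. extractEntities(from: "Meet Sarah at the coffee shop on Friday")
```

Dates need a different tool: `NSDataDetector` with the `.date` type handles "on Friday"-style phrases.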

6. Audio Classification

A baby monitor app that uses Sound Analysis to detect crying. A nature app that identifies bird calls. A music practice app that listens for specific instruments. All of these can be built with Create ML's sound classification and run in real time on-device.
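You can prototype this before training anything: Sound Analysis ships with a built-in classifier that recognizes roughly 300 everyday sounds, and a Create ML-trained model drops into the same API. A sketch wiring `SNClassifySoundRequest` to an audio file:

```swift
import SoundAnalysis

// Observer that receives classification results as the file is analyzed.
final class SoundObserver: NSObject, SNResultsObserving {
    func request(_ request: SNRequest, didProduce result: SNResult) {
        guard let result = result as? SNClassificationResult,
              let top = result.classifications.first else { return }
        print("\(top.identifier): \(Int(top.confidence * 100))%")
    }
}

func classifySounds(in fileURL: URL) throws {
    // Apple's built-in classifier; swap in SNClassifySoundRequest(mlModel:)
    // to use a custom Create ML sound model instead.
    let request = try SNClassifySoundRequest(classifierIdentifier: .version1)
    let analyzer = try SNAudioFileAnalyzer(url: fileURL)
    let observer = SoundObserver()
    try analyzer.add(request, withObserver: observer)
    analyzer.analyze()  // synchronous; run off the main thread in a real app
}
```

For a live baby monitor you would use `SNAudioStreamAnalyzer` fed from an `AVAudioEngine` tap instead of a file analyzer.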

Privacy: Why On-Device AI Matters More Than You Think

Apple's approach to AI privacy goes beyond just running models locally. With Apple Intelligence, even tasks that require more compute than the device can handle are processed through Private Cloud Compute (PCC), where data is never stored, never accessible to Apple, and the server code is verifiable by independent security researchers.

For developers, this creates a genuine competitive advantage. In 2026, Apple even routes third-party models (including Google's Gemini) through its own PCC hardware, so raw user context is never accessible to the third party. That means you can build apps that handle deeply personal data without the compliance overhead that cloud AI typically requires.

Health apps, finance apps, journaling apps, and therapy tools can all leverage AI features without collecting user data. That is not just a nice selling point for the App Store listing. It simplifies your GDPR compliance, removes the need for data processing agreements with AI providers, and eliminates an entire category of security risk.

Performance: What to Expect on Real Hardware

Let me share concrete performance numbers so you can plan your features with realistic expectations.

| Task | Model / Framework | iPhone 15 Pro | iPhone 16 Pro |
| --- | --- | --- | --- |
| Image Classification | MobileNetV2 | <5ms | <3ms |
| Object Detection | YOLOv3 | ~30ms (33 FPS) | ~20ms (50 FPS) |
| Text Recognition (OCR) | Vision framework | ~50ms per frame | ~30ms per frame |
| Sentiment Analysis | Natural Language | <1ms | <1ms |
| Body Pose Estimation | Vision framework | ~25ms (40 FPS) | ~15ms (66 FPS) |
| On-Device LLM (text generation) | Foundation Models (~3B) | ~15 tokens/sec | ~20 tokens/sec |

These numbers are fast enough for real-time applications. Image classification at 5ms means you can process every camera frame without the user noticing any delay. Object detection at 30+ FPS is smooth enough for augmented reality overlays. And the on-device LLM generates text faster than most people can read it.

Looking Ahead: Core AI at WWDC 2026

Apple is expected to announce a new framework called Core AI at WWDC 2026, positioned as the next evolution of Core ML. While details are still speculative, the direction is clear: Apple wants on-device AI to be as easy to use as UIKit or SwiftUI. The current stack (Core ML + Vision + Natural Language + Foundation Models) already gets you very far, and anything Apple announces will build on these foundations rather than replace them overnight.

If you are learning Core ML and Create ML today, you are not wasting your time. The concepts, workflows, and mental models will transfer directly to whatever Apple announces next. The best time to start building with on-device AI is now.

Getting Started: Your First On-Device AI Feature

Here is my recommended path for adding on-device AI to your SwiftUI app:

  1. Start with Apple's pre-trained models. Browse the Core ML model gallery and find one that fits your use case. Drag it into Xcode and experiment.
  2. Try Create ML for custom classification. Gather 30-50 images per category, open Create ML, and train your first custom model. The entire process takes under 30 minutes.
  3. Explore the Foundation Models framework. If your app involves any text processing (summarization, extraction, classification), the on-device LLM with guided generation is remarkably easy to integrate.
  4. Use the Vision and Natural Language frameworks. These provide high-level APIs for common tasks that do not require custom models at all.
  5. Consider a hybrid approach. Use on-device AI for real-time and privacy-sensitive tasks, and cloud AI for complex generation tasks.

Build Faster with The Swift Kit

The Swift Kit includes a pre-built AI module with both on-device and cloud AI integration. The cloud AI chat interface comes with a Flask proxy backend, streaming responses, and conversation management. Combined with on-device Core ML helpers, you get the best of both worlds out of the box. Check our full tech stack guide for how it all fits together.

The Bottom Line

On-device AI is not the future of iOS development. It is the present. Apple has quietly built the most powerful and most private ML stack available on any mobile platform. Core ML gives you fast inference. Create ML lets you train custom models without Python. The Foundation Models framework puts a 3-billion-parameter language model in your hands for free. And the Neural Engine on modern iPhones makes all of it run in real time.

If you are building an iOS app in 2026, learn these tools. The apps that win in the App Store are increasingly the ones that feel intelligent, that adapt to the user, that understand context. On-device AI is how you build those features without a cloud bill and without compromising your users' privacy.

For more on how AI fits into the broader indie iOS development picture, check out our 2026 tech stack guide and our collection of app ideas that leverage AI features.
