Apple has quietly shipped one of the most significant frameworks for iOS developers in years. With iOS 26, the Foundation Models framework gives you direct access to Apple's on-device ~3B-parameter large language model — the same one powering Apple Intelligence — right from your Swift code.
No API keys. No cloud costs. No internet required. And it's completely free.
Let's break down what this means and how to start building with it.
What Is the Foundation Models Framework?
The Foundation Models framework exposes Apple's on-device LLM to third-party developers. Unlike cloud-based models like ChatGPT or Claude that run on remote servers, Apple's model runs entirely on the user's device using Apple silicon (CPU, GPU, and Neural Engine).
This gives you:
- Privacy by default — all data stays on-device
- Zero latency from network calls — inference happens locally
- Offline support — works without internet
- Free inference — no per-token costs, no API billing
- Deep Swift integration — the API feels native, not bolted on
The model specializes in language understanding, structured output generation, and tool calling. It's not designed as a general-knowledge chatbot, but rather as an engine for building intelligent features tailored to your app.
Getting Started: Your First On-Device AI Feature
Here's how simple it is to generate a response from Apple's on-device LLM:
import FoundationModels
let session = LanguageModelSession()
let response = try await session.respond(to: "What's a good name for a travel app?")
print(response.content)
That's it. Three lines of code and you're running AI inference on-device.
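You can also steer the model for an entire session by passing instructions when you create it. Here's a minimal sketch using the same instructions-closure style as the tool-calling example later in this post; the instruction wording is just an example:
import FoundationModels

// Instructions set the model's role and ground rules for the whole session;
// each respond(to:) call then supplies the per-request prompt.
let session = LanguageModelSession {
    """
    You are a branding assistant. Keep suggestions short,
    memorable, and family-friendly.
    """
}

let response = try await session.respond(to: "What's a good name for a travel app?")
print(response.content)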
Streaming Responses
For a ChatGPT-like experience where text appears token by token:
let session = LanguageModelSession()
let stream = session.streamResponse(to: "Suggest 5 creative app names for a fitness tracker")
for try await partial in stream {
    // Each element is a snapshot of everything generated so far, not a delta.
    print(partial.content)
}
Because each partial result is a snapshot of the full response so far, you can hand it straight to your UI, which makes for a much smoother UX than waiting for the entire response to complete.
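Here's a minimal SwiftUI sketch of that pattern. The view and property names are illustrative, and it assumes the partial result exposes its text via .content as in the loop above:
import SwiftUI
import FoundationModels

struct NameIdeasView: View {
    @State private var suggestions = ""
    private let session = LanguageModelSession()

    var body: some View {
        ScrollView {
            Text(suggestions)
                .frame(maxWidth: .infinity, alignment: .leading)
                .padding()
        }
        .task {
            do {
                let stream = session.streamResponse(
                    to: "Suggest 5 creative app names for a fitness tracker"
                )
                for try await partial in stream {
                    suggestions = partial.content // overwrite with the latest snapshot
                }
            } catch {
                suggestions = "Generation failed: \(error.localizedDescription)"
            }
        }
    }
}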
Guided Generation: The Killer Feature
Here's where Foundation Models really shines compared to typical LLM APIs. Guided Generation lets you get structured, type-safe outputs directly as Swift types.
Instead of parsing messy JSON strings, you define your output structure using the @Generable macro:
import FoundationModels
@Generable
struct MovieRecommendation {
    let title: String

    @Guide(description: "A brief one-sentence summary")
    let summary: String

    @Guide(.anyOf(["PG", "PG-13", "R", "G"]))
    let rating: String
}
Then generate structured output:
let session = LanguageModelSession()
let movie: MovieRecommendation = try await session.respond(
    to: "Recommend an action movie from the 2020s",
    generating: MovieRecommendation.self
).content
print(movie.title) // e.g., "Top Gun: Maverick"
print(movie.rating) // "PG-13" — guaranteed to be one of the allowed values
The @Guide macro lets you:
- Add natural language descriptions to guide the model
- Constrain values to a specific set with .anyOf()
- Control array lengths with .count()
- Enforce string patterns with regex
This is constrained decoding built directly into the framework — the model is literally forced to produce valid output matching your Swift types at the token level. No more hoping the model returns valid JSON.
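As a rough sketch of how those guides combine in one type (the type and property names here are illustrative, not from Apple's docs):
import FoundationModels

@Generable
struct TripPlan {
    @Guide(description: "The destination city")
    let city: String

    // Constrain the array to exactly three suggestions
    @Guide(.count(3))
    let activities: [String]

    // Constrain the value to a fixed set, as with the movie rating above
    @Guide(.anyOf(["budget", "mid-range", "luxury"]))
    let priceTier: String
}

let session = LanguageModelSession()
let plan: TripPlan = try await session.respond(
    to: "Plan a weekend in Lisbon",
    generating: TripPlan.self
).content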
Tool Calling: Give the Model Superpowers
At roughly 3B parameters, the on-device model doesn't know everything. But with Tool Calling, you can extend its capabilities by giving it access to your app's data and APIs.
Here's an example — a health coach that reads HealthKit data:
import FoundationModels
struct BloodPressureTool: Tool {
    let name = "getBloodPressure"
    let description = "Fetches the user's latest blood pressure reading"

    func call(arguments: EmptyInput) async throws -> String {
        // Fetch the latest reading from HealthKit (hard-coded here for brevity)
        let systolic = 120
        let diastolic = 80
        return "Systolic: \(systolic) mmHg, Diastolic: \(diastolic) mmHg"
    }
}
let session = LanguageModelSession(tools: [BloodPressureTool()]) {
    """
    You're a health coach. Help users manage their health
    based on their blood pressure data.
    """
}
let response = try await session.respond(to: "How's my blood pressure looking?")
The model will automatically decide when to call your tool, fetch the data, and incorporate it into a natural language response. This is incredibly powerful for building context-aware features.
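Tools can also take arguments that the model fills in from the conversation. Here's a rough sketch following the same shape as the tool above; the tool, its argument names, and the weather data are hypothetical, and it assumes the arguments type is declared with @Generable as in Apple's tool-calling examples:
import FoundationModels

struct CityWeatherTool: Tool {
    let name = "getCityWeather"
    let description = "Fetches current weather conditions for a given city"

    @Generable
    struct Arguments {
        @Guide(description: "The city to look up")
        let city: String
    }

    func call(arguments: Arguments) async throws -> String {
        // Look the city up in your own data source; hard-coded for brevity.
        return "Weather in \(arguments.city): 22°C, clear skies"
    }
}

let session = LanguageModelSession(tools: [CityWeatherTool()]) {
    "You're a travel assistant. Use the weather tool when users ask about conditions."
}

let response = try await session.respond(to: "Should I pack a jacket for Lisbon?")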
Specialized Adapters: Content Tagging Out of the Box
Beyond the general-purpose model, Apple provides specialized adapters for specific tasks. The content tagging adapter is built-in and optimized for:
- Topic tag generation
- Entity extraction
- Topic detection
let taggingModel = SystemLanguageModel(useCase: .contentTagging)
let session = LanguageModelSession(model: taggingModel)
@Generable
struct Tags {
    let topics: [String]
}
let result: Tags = try await session.respond(
to: "Apple announced new MacBook Pro with M5 chip at their spring event",
generating: Tags.self
).content
print(result.topics) // ["Apple", "MacBook Pro", "M5", "Product Launch"]
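The same tagging session can pull out entities alongside topic tags. A rough sketch, with an illustrative type and property names:
@Generable
struct ContentAnalysis {
    @Guide(description: "Broad topic tags for the text")
    let topics: [String]

    @Guide(description: "Named entities mentioned in the text, such as people, products, or companies")
    let entities: [String]
}

let analysis: ContentAnalysis = try await session.respond(
    to: "Apple announced new MacBook Pro with M5 chip at their spring event",
    generating: ContentAnalysis.self
).content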
Custom Adapter Training: Teach the Model New Tricks
For advanced use cases, Apple provides a Python-based adapter training toolkit that lets you fine-tune the on-device model with your own data using LoRA (Low-Rank Adaptation).
When should you consider training a custom adapter?
- The model needs to become a subject-matter expert for your domain
- You need a specific output style, format, or policy
- Prompt engineering isn't achieving the required accuracy
- You want lower latency by reducing prompt length
Key things to know about adapters:
- Each adapter is ~160MB in storage
- Adapters are compatible with a single specific model version — you must retrain when Apple updates the base model
- Deploy via the Background Assets framework (don't bundle in your app)
- Requires a Mac with Apple silicon and 32GB+ of RAM, or a Linux machine with GPUs
- You need the Foundation Models Framework Adapter Entitlement for production deployment
Apple recommends exhausting prompt engineering and tool calling before jumping to adapter training. It's powerful but comes with ongoing maintenance.
Availability and Requirements
Before creating a session, always check availability:
let availability = SystemLanguageModel.default.availability
switch availability {
case .available:
    // Ready to use
    break
case .unavailable(.deviceNotEligible):
    // Device doesn't support Apple Intelligence
    break
case .unavailable(.appleIntelligenceNotEnabled):
    // User needs to enable Apple Intelligence in Settings
    break
case .unavailable(.modelNotReady):
    // Model is still downloading
    break
default:
    break
}
Requirements:
- iOS 26, iPadOS 26, macOS 26, or visionOS 26
- Apple Intelligence-compatible device (iPhone 15 Pro or later, M-series Macs/iPads)
- Apple Intelligence must be enabled in Settings
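When the model is unavailable, degrade gracefully rather than failing outright. Here's a minimal sketch, where summarize(_:) stands in for your AI-powered feature and basicSummary(of:) is a hypothetical non-AI fallback:
import FoundationModels

func summarize(_ text: String) async throws -> String {
    // Fall back to a non-AI code path when the model isn't usable on this device.
    guard case .available = SystemLanguageModel.default.availability else {
        return basicSummary(of: text)
    }

    let session = LanguageModelSession()
    let response = try await session.respond(to: "Summarize in two sentences: \(text)")
    return response.content
}

// Hypothetical non-AI fallback, e.g. the first couple of sentences.
func basicSummary(of text: String) -> String {
    text.split(separator: ".").prefix(2).joined(separator: ".") + "."
}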
Practical Use Cases
Here are some real-world features you can build today:
Smart Journaling: Auto-generate mood tags and summaries from journal entries using guided generation.
Recipe Parsing: Point the model at unstructured recipe text and extract structured ingredient lists and steps (sketched below).
Customer Support: Build an in-app assistant that uses tool calling to access order history and FAQs without any cloud dependency.
Content Moderation: Use the content tagging adapter to automatically classify user-generated content.
Personalized Learning: Generate quiz questions based on study material, all processed locally.
Workout Insights: Combine tool calling with HealthKit to generate natural language summaries of fitness data.
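The recipe-parsing idea, for instance, maps directly onto guided generation. A rough sketch, with illustrative type and property names and made-up input text:
import FoundationModels

@Generable
struct Recipe {
    let title: String

    @Guide(description: "Each ingredient with its quantity, e.g. '200g spaghetti'")
    let ingredients: [String]

    @Guide(description: "Preparation steps in order")
    let steps: [String]
}

// Any unstructured text your app captured (pasted, scanned, or shared).
let rawRecipeText = "Boil 200g spaghetti. Fry pancetta, mix with eggs and pecorino..."

let session = LanguageModelSession()
let recipe: Recipe = try await session.respond(
    to: "Extract the recipe from this text: \(rawRecipeText)",
    generating: Recipe.self
).content

print(recipe.ingredients)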
Limitations to Keep in Mind
The Foundation Models framework is powerful, but it's not a silver bullet:
- Context window is limited — the ~3B model can't handle massive prompts
- Not for general world knowledge — it's optimized for tasks, not trivia
- Apple explicitly warns against using it for code generation, math calculations, or factual Q&A
- Device-dependent — older devices can't run it, so always have a fallback
- Adapter retraining — every OS update with a new model version means retraining your adapters
What This Means for iOS Development
The Foundation Models framework represents a fundamental shift. For the first time, iOS developers have access to a production-quality LLM with:
- Zero marginal cost per inference
- Native Swift API that feels like any other Apple framework
- Type-safe outputs through guided generation
- Built-in privacy without any extra work
This isn't just another AI API wrapper. It's a deeply integrated, first-party framework that makes on-device intelligence a realistic feature for apps of all sizes — from indie side projects to enterprise applications.
If you haven't started experimenting with it yet, now's the time. The barrier to adding AI features to your iOS app has never been lower.
Requirements: Xcode 26 + iOS 26 SDK + Apple Intelligence-enabled device
Documentation: Foundation Models | Apple Developer Documentation