The Foundation Models Framework: Diving Into Apple Intelligence
Introduction
Imagine picking up your iPhone today and having it act like a smart assistant that runs entirely on your device. That’s now possible thanks to Apple’s new Foundation Models framework, which gives apps access to the on-device large language model (LLM) at the core of “Apple Intelligence.” In this episode, we explore how this on-device AI powers apps and features on our Apple hardware while keeping data private and working offline. We’ll walk through examples such as journaling prompts written by your own phone and travel itineraries generated in a tap, all thanks to the Foundation Models framework.
This episode might be a bit more technical than usual. That’s because the technology is new and, in some cases, not quite ready yet. It’s hard to describe personal use of features that haven’t shipped. That’s part of why this episode took a bit longer: in this podcast I try to explain how cool Apple features translate into real-world benefits.
Apple made one thing clear, though: developers can now “tap directly into the on-device, large language model at the core of Apple Intelligence”. In other words, Apple has packaged a 3-billion-parameter AI model into iOS, iPadOS, macOS, and visionOS. This model is the brain behind Apple Intelligence, and it’s available via a Swift framework called FoundationModels. You’d be right to think that features like personalized search suggestions and predictive keyboards were already part of a long machine learning streak at Apple. But what if I told you that even story-driven game dialogue can now be generated locally on your device, protecting your privacy and letting features like this work offline?
How It Works
At its core, the FoundationModels framework exposes a powerful AI model in our devices. Apple says it’s “several orders of magnitude bigger than any other models that are part of the operating system”. Like I just mentioned, the on-device model has 3 billion parameters. By comparison, the models behind OpenAI’s ChatGPT are reported to have on the order of hundreds of billions of parameters, so Apple’s is much smaller, but it’s carefully optimized for phones and laptops. Apple calls it a “device-scale model” designed for tasks like content generation, text summarization, extraction, classification, and natural language understanding.
Practically speaking, this means the model excels at things like summarizing articles, answering questions about your personal data, or filling out structured content: tasks where it can leverage the context of what’s already on your device. It is not meant for general world knowledge or heavy reasoning about current events. Apple notes the on-device model can’t handle the most complex tasks that cloud-based AIs can. In short, the FoundationModels AI is a compromise: smaller than a cloud LLM, but fast, private, and offline-capable. It uses the Neural Engine together with the CPU and GPU in Apple silicon, plus some smart engineering, to cram “an elephant into a keyhole,” as one engineer puts it.
Crucially, everything happens on-device. Your prompts, your personal notes, your conversations – all of it is processed locally. Apple’s engineers emphasize this point: “All of this runs on-device, so all data going into and out of the model stays private. That also means it can run offline!”. In practice, that means your travel itinerary app could craft a trip plan without sending your plans to the cloud, and your journaling app could analyze your entries without ever uploading them. This is a “privacy-first” architecture: your data doesn’t need to leave your device or touch Apple’s servers. There are also no hidden API fees or usage limits, which makes it “free of cost” to users, built into the OS.
A Few Examples
Let’s make this more tangible. How might you experience this on-device AI in daily use? For one, it enhances writing and note-taking tools. Imagine writing a diary entry or taking notes about your day. A journaling app like Stoic now taps into the FoundationModels framework to boost creativity. According to Apple, Stoic “generates hyperpersonal journaling prompts from recent entries,” and even provides summaries of past entries – all on-device. In practice, I might write about feeling tired, and the app could suggest, “It seems like you slept late last week; would you like a restful reflection prompt?” The key: my personal diary never leaves my phone, yet I get smart, context-aware suggestions.
Document apps see huge gains too. For example, Signeasy (a PDF editor) can “generate summaries, highlight key points, and support a conversational interface where users can ask document-specific questions”. That means you could feed a long contract into the app and ask, “What are the key deliverables?” and it will answer concisely, without sending your confidential document to the cloud. Likewise, note-taking apps (like Agenda with its new “Ask Agenda” assistant) can scan your notes and answer queries about them.
Even fitness and travel apps get smarter. A workout app called SmartGym uses it to summarize your exercise data and suggest next workouts. The task manager OmniFocus can generate packing lists for your upcoming trip. In educational apps, AI-driven explanations can pop up and give conversational answers about scientific terms. In video tools, a storyboard outline can be turned into a full teleprompter script. In each case, the content (exercise stats, lesson content, PDF text) is processed by the on-device model to generate new content or insights.
Many of these examples echo my personal use as well. Writing this episode is one example, so I’m excited that Apple’s AI can help with text generation and summarization. In my day-to-day, I might use it to summarize meeting notes or brainstorm email drafts. And yes, all of that could happen without an internet connection, unlike cloud chatbots. In iOS 26 and macOS 26, I expect to see new app updates that highlight “Apple Intelligence” features, and behind the scenes it’s this FoundationModels framework at work.
There’s also an app called SmartFruit that highlights another advantage of an on-device model: speed. It works like any other chatbot, within the limits of the smaller on-device model of course, but I recommend trying it out just to see how fast on-device models can be. And this is on your phone!
The Technology
A big challenge with language models is that they usually output plain text, which apps must then parse. Apple tackles this with Guided Generation, a developer-friendly innovation. In plain terms, developers can define exactly the structure of the response they want (using Swift types), and the framework ensures the AI’s output matches that structure. For instance, if an app needs a JSON-like itinerary object with days and locations, the developer marks up a Swift struct (a struct is used to store variables of different data types) with macros like @Generable and @Guide. Then the model will “generate an instance” of that type, instead of a free-form text blob. It’s hard to explain exactly what this means in a podcast, so if you’re interested, watch the “Deep dive into the Foundation Models framework” video on the Apple Developer website.
In practice, this means no more fragile text scraping. Apple explains that without Guided Generation, you’d have to prompt the AI to output JSON or CSV and then hack around its mistakes. Instead, the new API uses constrained decoding to guarantee the structure: together with the Swift macros, it means the model’s output can’t break the schema. The result is that apps immediately get back a rich Swift object they can use. For us as users, this means apps’ AI features are more reliable and robust.
Behind the scenes, Guided Generation makes the model focus on content rather than format. It also lets the system optimize the AI runtime. Apple’s documentation emphasizes that with Guided Generation, “your prompts can be simpler and focused on desired behavior instead of the format,” improving accuracy. In everyday terms, if I ask my travel app “Plan a one-day itinerary in Paris,” the app can treat the response as a full “Itinerary” object it knows how to display. The result feels smooth: a formatted list with dates and places that all line up in the UI.
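For the developers listening, here is a minimal sketch of what Guided Generation could look like for that Paris example. The `Itinerary` and `Stop` types, their fields, and the function name are my own illustration, not Apple sample code; the sketch assumes Xcode 26 and a device with Apple Intelligence enabled.

```swift
import FoundationModels

// Hypothetical types for illustration; names and fields are assumptions.
@Generable
struct Stop {
    @Guide(description: "Name of the place to visit")
    var place: String

    @Guide(description: "Suggested time of day, such as morning or afternoon")
    var timeOfDay: String
}

@Generable
struct Itinerary {
    @Guide(description: "A short title for the day trip")
    var title: String

    // .count(5) asks the framework to produce exactly five stops.
    @Guide(description: "Stops in visiting order", .count(5))
    var stops: [Stop]
}

func planParisDay() async throws -> Itinerary {
    let session = LanguageModelSession()
    // The prompt describes the desired behavior only; the output format
    // comes from the Itinerary type itself.
    let response = try await session.respond(
        to: "Plan a one-day itinerary in Paris.",
        generating: Itinerary.self
    )
    return response.content  // A typed Itinerary value, not a text blob.
}
```

The app never parses text here: it receives an `Itinerary` value it can hand straight to its UI.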
Another way Apple enhances the user experience is through streaming. When I ask a long question, the AI doesn’t make me wait for the final answer; instead, it can stream bits of the answer as they come. But rather than sending raw text tokens, FoundationModels streams “snapshots” – partial structured results. For example, if generating an itinerary with five stops, the app might display the first two stops while the model is still working on the rest. This makes the app feel fast and dynamic.
From a developer viewpoint, these snapshots correspond to partially filled-in Swift data structures. The framework handles merging token streams into these partial objects. The details get technical, but the user takeaway is that even offline, the AI’s responses can appear incrementally with smooth animations. Apple encourages clever UI tricks so that waiting for AI output becomes a delight, not a delay.
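A rough sketch of how an app might consume those snapshots is below. The `DayTrip` type is hypothetical, and I’m assuming the shipping iOS 26 SDK, where each streamed element exposes the partial value through `content`; earlier SDK seeds shaped this slightly differently, so treat it as an illustration rather than a reference.

```swift
import FoundationModels

// Hypothetical type for illustration.
@Generable
struct DayTrip {
    @Guide(description: "A short title for the day trip")
    var title: String

    @Guide(description: "Names of the stops, in visiting order", .count(5))
    var stops: [String]
}

func streamParisDay() async throws {
    let session = LanguageModelSession()
    let stream = session.streamResponse(
        to: "Plan a one-day itinerary in Paris with five stops.",
        generating: DayTrip.self
    )
    for try await snapshot in stream {
        // Assumption: snapshot.content is a partially generated DayTrip whose
        // properties are optional and fill in as generation progresses.
        let partial = snapshot.content
        print(partial.title ?? "(title pending)", partial.stops ?? [])
    }
}
```

In a real app the `print` call would be replaced by updating view state, so the first stops appear while the rest are still being generated.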
One of the most powerful features is Tool Calling. This lets the on-device AI model execute custom code (tools) defined by the app, mid-conversation. Think of it as the AI saying, “I need to check this out” and then running a function in your app. For example, if the user asks “What’s the weather in San Francisco?” your app could provide a WeatherTool. When the model decides it needs that info, it emits a call to that tool, the framework runs it (fetching real weather), and feeds the result back to the model. The final answer the model gives includes the tool’s output.
For you as the user, this means answers can include up-to-date, factual data. Even though the on-device model itself doesn’t know today’s weather or tomorrow’s calendar events, it can pull in current information via these tools. Apple’s documents explain that: “Tool calling enables an AI model to execute custom code within an app… [the model] can autonomously decide when to use external tools to retrieve information or perform actions”. In other words, the AI can orchestrate complex tasks by calling multiple tools in a session.
This system is why Apple says developers can connect the model to sources like MapKit, WeatherKit, or personal calendars, all privately. For example, a travel app’s AI could call mapping tools to get restaurant lists or hotel info in real time. A coding app could let the AI call code generators. A writer’s app could even call an “editor” tool to check grammar. Throughout, the model itself keeps running on your device. In short, any function of an app, such as a weather lookup, a database query, or an API call, can become a tool, and the AI model decides if and when to use it.
I suspect this is what Apple wants Siri to do in the future, but with more API-like calls over the internet instead of app functions.
The FoundationModels framework also supports stateful sessions, meaning the AI can remember previous exchanges. In practice, when your app starts a session with the model, all your prompts and the model’s responses are kept in a transcript. You can prompt it multiple times, and it will understand references. For example, if you’ve been talking about generating haikus and you say “do another one,” the model will know to write a new haiku. The session keeps that transcript for you, so developers don’t have to resend the entire conversation each time (the context window is still finite, so very long conversations eventually hit a limit). Apple’s own blog notes: “Multi-turn conversations maintain context. The model remembers what you discussed three prompts ago.”
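To make tool calling a bit more concrete, here is a hedged sketch of the weather example. The tool name, its description, and the canned reply are all placeholders I made up; a real app would call WeatherKit inside `call`. I’m also assuming the shipping iOS 26 SDK, where a tool can return a plain `String` (earlier seeds used a dedicated output type).

```swift
import FoundationModels

// Hypothetical tool for illustration; it returns placeholder text
// instead of fetching real weather data.
struct WeatherTool: Tool {
    let name = "getWeather"
    let description = "Look up the current weather for a city"

    @Generable
    struct Arguments {
        @Guide(description: "The city to look up")
        var city: String
    }

    func call(arguments: Arguments) async throws -> String {
        // Placeholder value, not real data. A real app would query WeatherKit here.
        "It is 18°C and foggy in \(arguments.city)."
    }
}

func askAboutWeather() async throws -> String {
    let session = LanguageModelSession(
        tools: [WeatherTool()],
        instructions: "Use the getWeather tool when the user asks about the weather."
    )
    // The model decides on its own whether to call the tool; the framework
    // runs it and feeds the result back before the final answer is produced.
    let response = try await session.respond(to: "What's the weather in San Francisco?")
    return response.content
}
```

Notice that the app never calls `WeatherTool` directly. It only registers the tool with the session, and the model chooses if and when to use it.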
Users will feel this as a more natural chat-like interaction. If I’m using an AI-powered assistant in an app, I can follow up on previous answers seamlessly. The developers can also set a “system instruction” at the start (like saying “Be a friendly travel guide”), which the model is trained to follow over user prompts (protecting against unwanted prompt hacks). All of this is optional, but it means the AI behaves more like a consistent assistant once you’ve initialized a session. It’s a bit like Apple added memory to Siri, but in each individual app.
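For developers, a session with a system instruction and a follow-up prompt could look roughly like this. The instruction text, prompts, and function name are just examples of mine.

```swift
import FoundationModels

func haikuConversation() async throws {
    // Instructions are set once, by the developer, when the session is created.
    let session = LanguageModelSession(
        instructions: "You are a friendly poet. Answer only with a haiku."
    )

    let first = try await session.respond(to: "Write a haiku about fishing.")
    print(first.content)

    // Same session, so the transcript already contains the first exchange
    // and the model can resolve what "another one" refers to.
    let second = try await session.respond(to: "Do another one about golf.")
    print(second.content)
}
```

Creating a new `LanguageModelSession` starts from a blank transcript, which is the simple way to reset the conversation.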
To summarize the technology side:
- Integrated on-device LLM: A big but phone-friendly AI.
- Swift-based API: Developers use a Swift framework (FoundationModels) that makes querying this AI easier.
- Guided generation and streaming: The system can return results in rich structured formats, and stream partial answers for responsive UIs.
- Tool calling: The AI can call out to custom app code (like checking weather or calendar) when needed.
- Stateful conversations: Apps create sessions so the AI “remembers” context and prior turns.
- Privacy & offline: Everything runs on the device.
Strengths and Weaknesses
It’s also important to remember the limits. First, there are device and OS requirements: FoundationModels only runs on Apple Intelligence–enabled hardware with the latest OS. If you’re on an older iPhone, these features won’t work (and Apple advises developers to provide fallback experiences in their apps). Developers only get access to “Apple’s on-device version of Apple Intelligence”, not the cloud servers. The system even provides an availability API so apps can disable or hide features if your device isn’t compatible.
Apple also built in error and safety guards. For example, if the model tries to do something disallowed (like generate unsafe content), the framework can throw guardrail errors. Apps should handle these gracefully. And because the model obeys instructions over prompts, developers are advised to write the instructions themselves and keep user input in the prompts, so users can’t alter the AI’s core behavior.
Finally, the on-device model isn’t magical. It has limited world knowledge and no internet access. If you ask about this week’s news, it simply won’t know. That’s by design: the focus is on your own data and context. But Apple knew about this limitation, which is why they built in those tools. By giving the model tools (like a “current date” tool or a “search the web” tool if allowed), apps can fetch real-time facts when needed. This way, even if the AI itself isn’t connected to the internet, it can still deliver up-to-date answers via the app’s code. My hunch is that this is what the Apple and Google collaboration is about.
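As a sketch of that availability check, an app might do something like the following before showing any AI feature. The function name and the status strings are my own; the cases are the ones I understand the framework to report.

```swift
import FoundationModels

// Returns a short, user-facing status for the on-device model.
func smartFeaturesStatus() -> String {
    switch SystemLanguageModel.default.availability {
    case .available:
        return "Ready"
    case .unavailable(.deviceNotEligible):
        return "This device does not support Apple Intelligence"
    case .unavailable(.appleIntelligenceNotEnabled):
        return "Apple Intelligence is turned off in Settings"
    case .unavailable(.modelNotReady):
        return "The model is not ready yet, for example because it is still downloading"
    case .unavailable:
        return "Unavailable for another reason"
    }
}
```

An app would typically hide its AI buttons, or show a non-AI fallback, for every case except `.available`.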
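Handling a guardrail error gracefully could look roughly like this. The function and instruction text are hypothetical; the error case is the one the framework uses for blocked content, as far as I know.

```swift
import FoundationModels

// Returns a summary, or nil when generation failed for any reason.
func summarize(_ entry: String) async -> String? {
    // Developer-written instructions; the user's text only goes into the prompt.
    let session = LanguageModelSession(
        instructions: "Summarize the user's journal entry in two sentences."
    )
    do {
        let response = try await session.respond(to: entry)
        return response.content
    } catch LanguageModelSession.GenerationError.guardrailViolation {
        // The input or the output tripped the safety guardrails.
        return nil
    } catch {
        // Any other failure, such as an overly long conversation.
        return nil
    }
}
```

A `nil` result lets the app quietly skip the summary instead of showing an error the user can’t act on.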
Why Apple’s Way Matters
For skeptics, the big question is: Why not just use the cloud? Apple’s answer is privacy and integration. All data stays on your device – a core Apple value. No worrying that your stuff is being mined by a remote server. And you get features even with poor connectivity. Also, it’s free to use (no subscription for each app), and Apple can tie it into system features seamlessly. For instance, Apple is integrating it into Xcode for developers, and even allowing ChatGPT in Xcode if you want – though the FoundationModels stuff is Apple’s own on-device tech.
Critics do note that Apple’s approach is cautious, pointing out that Apple’s ambitions seem “modest” compared to the big AI announcements of the past few years. The company acknowledges this model can’t replace full-scale cloud AIs for everything. But fans appreciate that Apple is playing to its strengths: hardware acceleration with custom chips, and privacy. Apple is betting that most consumers will value fast, offline AI assistants that keep their data safe, even if they’re not as all-knowing as OpenAI’s or Google’s.
As someone who lives in the Apple ecosystem, I find it interesting how differently each technology company approaches this. My day-to-day involves a lot of writing and summarization. Having an on-device AI help draft emails, summarize notes, or brainstorm ideas, all without compromising privacy, feels like a natural evolution. I can also see how this will encourage app makers to add clever new features. For example, I could imagine a creative writing app that chats with me to develop a story, or a photo gallery that auto-generates captions or albums based on content. And because it’s on-device, these features could keep improving with each OS update.
Of course, I will test the limits. I expect the AI will sometimes “hallucinate” or give outdated info, especially when a question goes beyond what it knows. But Apple’s tool calling and Guided Generation should help reduce those errors. I’ll also enjoy having no usage fees, no usage limits, and no data leaving my device. That freedom could spark a lot of innovation in apps.
Looking ahead, it’s just the beginning. Apple says it will continue improving its models and building new additions. As a user, I’ll be watching how iOS 26 and macOS 26 unfold. The promise is an iPhone or Mac that feels more helpful and creative, yet still lines up with Apple’s privacy ethos.
Apple Intelligence, through the Foundation Models framework, is the engine behind a new generation of intelligent app features on Apple devices. It’s different from cloud AI: it focuses on privacy, offline use, and tight integration, with developer tools for structured output, streaming, tool calling, and context. As we’ve seen, it’s already powering smarter journaling, smarter editing, smarter fitness coaches, and more. Whether you’re an Apple enthusiast or a healthy skeptic, the era of AI in every Apple app has clearly arrived. And if you desperately need to chat with your Google search bar... there’s still... an app for that.