The latest Gemini Nano with on-device ML Kit GenAI APIs (Fri, 22 Aug 2025)

Posted by Caren Chang – Developer Relations Engineer, Joanna (Qiong) Huang – Software Engineer, and Chengji Yan – Software Engineer

The latest version of Gemini Nano, our most powerful multi-modal on-device model, just launched on the Pixel 10 device series and is now accessible through the ML Kit GenAI APIs. Integrate capabilities such as summarization, proofreading, rewriting, and image description directly into your apps.

With GenAI APIs, we’re focused on giving you access to the latest version of Gemini Nano while providing consistent quality across devices and model upgrades. Here’s a sneak peek behind the scenes at some of the things we’ve done to achieve this.

Adapting GenAI APIs for the latest Gemini Nano

We want to make it as easy as possible for you to build AI-powered features using the most powerful models. To ensure GenAI APIs provide consistent quality across different model versions, we make many behind-the-scenes improvements, including rigorous evaluations and adapter training.

  1. Evaluation pipeline: For each supported language, we prepare an evaluation dataset. We then benchmark against it using a combination of LLM-based raters, statistical metrics, and human raters.
  2. Adapter training: With results from the evaluation pipeline, we then determine if we need to train feature-specific LoRA adapters to be deployed on top of the Gemini Nano base model. By shipping GenAI APIs with LoRA adapters, we ensure each API meets our quality bar regardless of the version of Gemini Nano running on a device.

The latest Gemini Nano performance

One area we’re excited about is how this updated version of Gemini Nano pushes performance even higher, especially prefix speed, the rate at which the model processes input.

For example, here are prefix-speed results from text-to-text and image-to-text benchmarks on the Pixel 9 Pro and Pixel 10 Pro.

Prefix speed (input processing rate):

    • Gemini nano-v2 on Pixel 9 Pro: 510 tokens/second for text-to-text; 510 tokens/second plus 0.8 seconds for image encoding for image-to-text
    • Gemini nano-v2* on Pixel 10 Pro: 610 tokens/second for text-to-text; 610 tokens/second plus 0.7 seconds for image encoding for image-to-text
    • Gemini nano-v3 on Pixel 10 Pro: 940 tokens/second for text-to-text; 940 tokens/second plus 0.6 seconds for image encoding for image-to-text

*Experimentation with Gemini nano-v2 on Pixel 10 Pro for benchmarking purposes. All Pixel 10 Pros launched with Gemini nano-v3.

The future of Gemini Nano with GenAI APIs

As we continue to improve the Gemini Nano model, the team is committed to using the same process to ensure consistent, high-quality results from GenAI APIs.

We hope this will significantly reduce the effort needed to integrate Gemini Nano into your Android apps while still allowing you to take full advantage of new versions and their improved capabilities.

Learn more about GenAI APIs

Start implementing GenAI APIs in your Android apps today with guidance from our official documentation and samples: GenAI API Catalog and ML Kit GenAI APIs quickstart samples.
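
To give a sense of what that integration looks like, here is a condensed Kotlin sketch of the Summarization flow, following the same SummarizerOptions / checkFeatureStatus / runInference pattern used in the quickstart samples; treat it as a sketch and see the samples for the full download- and error-handling code.

// Condensed sketch of the ML Kit GenAI Summarization flow. See the quickstart
// samples for the complete download-handling and error-handling code.
val summarizerOptions = SummarizerOptions.builder(context)
    .setInputType(InputType.ARTICLE)
    .setOutputType(OutputType.ONE_BULLET)
    .setLanguage(Language.ENGLISH)
    .build()
val summarizer = Summarization.getClient(summarizerOptions)

suspend fun summarize(article: String, onChunk: (String) -> Unit) {
    // Status is one of UNAVAILABLE, DOWNLOADABLE, DOWNLOADING, or AVAILABLE.
    val status = summarizer.checkFeatureStatus().await()
    if (status == FeatureStatus.AVAILABLE || status == FeatureStatus.DOWNLOADING) {
        val request = SummarizationRequest.builder(article).build()
        // Streaming response: each new chunk of the summary arrives as it is generated.
        summarizer.runInference(request) { newText -> onChunk(newText) }
    }
}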

Agentic AI takes Gemini in Android Studio to the next level (Mon, 23 Jun 2025)

Posted by Sandhya Mohan – Product Manager, and Jose Alcérreca – Developer Relations Engineer

Software development is undergoing a significant evolution, moving beyond reactive assistants to intelligent agents. These agents don’t just offer suggestions; they can create execution plans, utilize external tools, and make complex, multi-file changes. This results in a more capable AI that can iteratively solve challenging problems, fundamentally changing how developers work.

At Google I/O 2025, we offered a glimpse into our work on agentic AI in Android Studio, the integrated development environment (IDE) focused on Android development. We showcased that by combining agentic AI with the built-in portfolio of tools inside of Android Studio, the IDE is able to assist you in developing Android apps in ways that were never possible before. We are now incredibly excited to announce the next frontier in Android development with the availability of ‘Agent Mode’ for Gemini in Android Studio.

These features are available in the latest Android Studio Narwhal Feature Drop Canary release, and will be rolled out to business tier subscribers in the coming days. As with all new Android Studio features, we invite developers to provide feedback to direct our development efforts and ensure we are creating the tools you need to build better apps, faster.

Agent Mode

Gemini in Android Studio’s Agent Mode is a new experimental capability designed to handle complex development tasks that go beyond what you can experience by just chatting with Gemini.

With Agent Mode, you can describe a complex goal in natural language — from generating unit tests to complex refactors — and the agent formulates an execution plan that can span multiple files in your project and executes under your direction. Agent Mode uses a range of IDE tools for reading and modifying code, building the project, searching the codebase and more to help Gemini complete complex tasks from start to finish with minimal oversight from you.

To use Agent Mode, click Gemini in the sidebar, then select the Agent tab, and describe a task you’d like the agent to perform. Some examples of tasks you can try in Agent Mode include:

    • Build my project and fix any errors
    • Extract any hardcoded strings used across my project and migrate to strings.xml
    • Add support for dark mode to my application
    • Given an attached screenshot, implement a new screen in my application using Material 3

The agent then suggests edits and iteratively fixes bugs to complete tasks. You can review, accept, or reject the proposed changes along the way, and ask the agent to iterate on your feedback.
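
For instance, for the hardcoded-string task above, the agent’s proposed edit might look like the hypothetical before-and-after Compose snippet below (the string name and resource are invented for illustration; R refers to your project’s generated resources class).

import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.ui.res.stringResource

// Before the agent's edit: a hardcoded user-facing string.
@Composable
fun GreetingBefore() {
    Text(text = "Welcome back!")
}

// After the agent's edit: the string is moved to res/values/strings.xml as
// <string name="welcome_back">Welcome back!</string> and referenced here.
@Composable
fun GreetingAfter() {
    Text(text = stringResource(R.string.welcome_back))
}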

Gemini breaks tasks down into a plan with simple steps. It also shows the list of IDE tools it needs to complete each step.

While powerful, you are firmly in control, with the ability to review, refine and guide the agent’s output at every step. When the agent proposes code changes, you can choose to accept or reject them.

The Agent waits for the developer to approve or reject a change.

Additionally, you can enable “Auto-approve” if you are feeling lucky 😎 — especially useful when you want to iterate on ideas as rapidly as possible.

You can delegate routine, time-consuming work to the agent, freeing up your time for more creative, high-value work. Try out Agent Mode in the latest preview version of Android Studio – we look forward to seeing what you build! We are investing in building more agentic experiences for Gemini in Android Studio to make your development even more intuitive, so you can expect to see more agentic functionality over the next several releases.

Gemini is capable of understanding the context of your app

Supercharge Agent Mode with your Gemini API key

The Gemini API key prompt in Android Studio

The default Gemini model has a generous no-cost daily quota with a limited context window. However, you can now add your own Gemini API key to expand Agent Mode’s context window to a massive 1 million tokens with Gemini 2.5 Pro.

A larger context window lets you send more instructions, code and attachments to Gemini, leading to even higher quality responses. This is especially useful when working with agents, as the larger context provides Gemini 2.5 Pro with the ability to reason about complex or long-running tasks.

Add your API key in the Gemini settings

To enable this feature, get a Gemini API key from Google AI Studio: sign in and click the “Get API key” button. Then, back in Android Studio, go to File (Android Studio on macOS) > Settings > Tools > Gemini and enter your Gemini API key. Relaunch Gemini in Android Studio to get even better responses from Agent Mode.

Be sure to safeguard your Gemini API key, as additional charges apply for Gemini API usage associated with a personal API key. You can monitor your Gemini API key usage by navigating to AI Studio and selecting Get API key > Usage & Billing.

Note that business tier subscribers already get access to Gemini 2.5 Pro and the expanded context window automatically with their Gemini Code Assist license, so these developers will not see an API key option.

Model Context Protocol (MCP)

Gemini in Android Studio’s Agent Mode can now interact with external tools via the Model Context Protocol (MCP). This feature provides a standardized way for Agent Mode to use external tools, extending its knowledge and capabilities beyond the IDE.

There are many tools you can connect to the MCP Host in Android Studio. For example, you could integrate with the GitHub MCP server to create pull requests directly from Android Studio. Here are some additional use cases to consider.

In this initial release of MCP support in the IDE, you configure your MCP servers through an mcp.json file placed in the configuration directory of Android Studio, using the following format:

{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-memory"
      ]
    },
    "sequential-thinking": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-sequential-thinking"
      ]
    },
    "github": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-e",
        "GITHUB_PERSONAL_ACCESS_TOKEN",
        "ghcr.io/github/github-mcp-server"
      ],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "<YOUR_TOKEN>"
      }
    }
  }  
}
Example configuration with three MCP servers

For this initial release, we support interacting with external tools via the stdio transport as defined in the MCP specification. We plan to support the full suite of MCP features in upcoming Android Studio releases, including the Streamable HTTP transport, external context resources, and prompt templates.

For more information on how to use MCP in Studio, including the mcp.json configuration file format, please refer to the Android Studio MCP Host documentation.

By delegating routine tasks to Gemini through Agent Mode, you’ll be able to focus on more innovative and enjoyable aspects of app development. Download the latest preview version of Android Studio on the canary release channel today to try it out, and let us know how much faster app development is for you!

As always, your feedback is important to us – check known issues, report bugs, suggest improvements, and be part of our vibrant community on LinkedIn, Medium, YouTube, or X. Let’s build the future of Android apps together!

Top 3 things to know for AI on Android at Google I/O ‘25 (Mon, 16 Jun 2025)

Posted by Kateryna Semenova – Sr. Developer Relations Engineer

AI is reshaping how users interact with their favorite apps, opening new avenues for developers to create intelligent experiences. At Google I/O, we showcased how Android is making it easier than ever for you to build smart, personalized and creative apps. And we’re committed to providing you with the tools needed to innovate across the full development stack in this evolving landscape.

This year, we focused on making AI accessible across the spectrum, from on-device processing to cloud-powered capabilities. Here are the top 3 announcements you need to know for building with AI on Android from Google I/O ‘25:

#1 Leverage the efficiency of Gemini Nano for on-device AI experiences

For on-device AI, we announced a new set of ML Kit GenAI APIs powered by Gemini Nano, our most efficient and compact model, designed and optimized for running directly on mobile devices. These APIs provide high-level, easy integration for common tasks, including text summarization, proofreading, rewriting content in different styles, and generating image descriptions. Building on-device offers significant benefits such as local data processing and offline availability, at no additional cost for inference. To start integrating these solutions, explore the ML Kit GenAI documentation and the sample on GitHub, and watch the “Gemini Nano on Android: Building with on-device GenAI” talk.

#2 Seamlessly integrate on-device ML/AI with your own custom models

The Google AI Edge platform enables building and deploying a wide range of pretrained and custom models on edge devices and supports various frameworks like TensorFlow, PyTorch, Keras, and JAX, allowing for more customization in apps. The platform now also offers improved support for on-device hardware accelerators and a new AI Edge Portal service for broad coverage of on-device benchmarking and evaluation. If you are looking for GenAI language models on devices where Gemini Nano is not available, you can use other open models via the MediaPipe LLM Inference API.
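
As a rough illustration of using the MediaPipe LLM Inference API from Kotlin, here is a minimal sketch; the model path is an assumption (you must download or bundle a supported open model, such as a Gemma variant, yourself), and option names can vary between releases, so check the MediaPipe documentation.

import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Minimal sketch: run an open model on-device with the MediaPipe LLM Inference API.
// The model file path below is an assumption; point it at a model you have
// downloaded or bundled with your app.
fun generateOnDevice(context: Context, prompt: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/model.task")
        .setMaxTokens(512)
        .build()
    val llm = LlmInference.createFromOptions(context, options)
    return llm.generateResponse(prompt)
}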

Serving your own custom models on-device can pose challenges related to handling large model downloads and updates, impacting the user experience. To improve this, we’ve launched Play for On-Device AI in beta. This service is designed to help developers manage custom model downloads efficiently, ensuring the right model size and speed are delivered to each Android device precisely when needed.

For more information, watch the “Small language models with Google AI Edge” talk.

#3 Power your Android apps with Gemini Flash, Pro and Imagen using Firebase AI Logic

For more advanced generative AI use cases, such as complex reasoning tasks, analyzing large amounts of data, processing audio or video, or generating images, you can use larger models from the Gemini Flash and Gemini Pro families, and Imagen, running in the cloud. These models are well suited for scenarios requiring advanced capabilities or multimodal inputs and outputs. And since inference runs in the cloud, any Android device with an internet connection is supported. They are easy to integrate into your Android app by using Firebase AI Logic, which provides a simplified, secure way to access these capabilities without managing your own backend. Its SDK also includes support for conversational AI experiences using the Gemini Live API or generating custom contextual visual assets with Imagen. To learn more, check out our sample on GitHub and watch the “Enhance your Android app with Gemini Pro and Flash, and Imagen” session.
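
For example, a minimal Firebase AI Logic call from Kotlin might look like the sketch below; the model name and package layout are assumptions based on the current SDK, so verify them against the Firebase documentation.

import com.google.firebase.Firebase
import com.google.firebase.ai.ai
import com.google.firebase.ai.type.GenerativeBackend

// Minimal sketch: call a cloud-hosted Gemini model through Firebase AI Logic.
// The model name below is an assumption; use one enabled for your project.
suspend fun suggestCaption(photoDescription: String): String? {
    val model = Firebase.ai(backend = GenerativeBackend.googleAI())
        .generativeModel("gemini-2.5-flash")
    val response = model.generateContent(
        "Write a short, friendly caption for this photo: $photoDescription"
    )
    return response.text
}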

These powerful AI capabilities can also be brought to life in immersive Android XR experiences. You can find corresponding documentation, samples, and the technical session: “The future is now, with Compose and AI on Android XR”.

Figure 1: Firebase AI Logic integration architecture

Get inspired and start building with AI on Android today

We released a new open-source app, Androidify, to help developers build AI-driven Android experiences using Gemini APIs, ML Kit, Jetpack Compose, CameraX, Navigation 3, and adaptive design. Users can create a personalized Android bot with Gemini and Imagen via the Firebase AI Logic SDK. Additionally, it incorporates ML Kit pose detection to detect a person in the camera viewfinder. The full code sample is available on GitHub for exploration and inspiration. Discover additional AI examples in our Android AI Sample Catalog.

The original image and Androidifi-ed image

Choosing the right Gemini model depends on understanding your specific needs and the model’s capabilities, including modality, complexity, context window, offline capability, cost, and device reach. To explore these considerations further and see all our announcements in action, check out the AI on Android at I/O ‘25 playlist on YouTube and check out our documentation.

We are excited to see what you will build with the power of Gemini!

On-device GenAI APIs as part of ML Kit help you easily build with Gemini Nano (Fri, 30 May 2025)

Posted by Caren Chang – Developer Relations Engineer, Chengji Yan – Software Engineer, Taj Darra – Product Manager

We are excited to announce a set of on-device GenAI APIs, as part of ML Kit, to help you integrate Gemini Nano in your Android apps.

To start, we are releasing 4 new APIs:

    • Summarization: to summarize articles and conversations
    • Proofreading: to polish short text
    • Rewriting: to reword text in different styles
    • Image Description: to provide short descriptions for images

Key benefits of GenAI APIs

GenAI APIs are high-level APIs that allow for easy integration, similar to existing ML Kit APIs. This means you can expect quality results out of the box without extra effort for prompt engineering or fine-tuning for specific use cases.

GenAI APIs run on-device and thus provide the following benefits:

    • Input, inference, and output data is processed locally
    • Functionality remains the same without a reliable internet connection
    • No additional cost incurred for each API call

To prevent misuse, we also added safety protections at various layers, including base model training, safety-aware LoRA fine-tuning, input and output classifiers, and safety evaluations.

How GenAI APIs are built

There are 4 main components that make up each of the GenAI APIs.

  1. Gemini Nano is the base model, the foundation shared by all APIs.
  2. Small API-specific LoRA adapter models are trained and deployed on top of the base model to further improve the quality for each API.
  3. Optimized inference parameters (e.g. prompt, temperature, topK, batch size) are tuned for each API to guide the model in returning the best results.
  4. An evaluation pipeline ensures quality across various datasets and attributes. This pipeline combines LLM raters, statistical metrics, and human raters.

Together, these components make up the high-level GenAI APIs that simplify the effort needed to integrate Gemini Nano in your Android app.

Evaluating quality of GenAI APIs

For each API, we formulate a benchmark score based on the evaluation pipeline mentioned above. This score is based on attributes specific to a task. For example, when evaluating the summarization task, one of the attributes we look at is “grounding” (i.e., the factual consistency of the generated summary with the source content).

To provide out-of-box quality for GenAI APIs, we applied feature-specific fine-tuning on top of the Gemini Nano base model. This resulted in an increase in the benchmark score of each API, as shown below:

Benchmark scores for each use case in English (Gemini Nano base model → ML Kit GenAI API):

    • Summarization: 77.2 → 92.1
    • Proofreading: 84.3 → 90.2
    • Rewriting: 79.5 → 84.1
    • Image Description: 86.9 → 92.3

In addition, this is a quick reference of how the APIs perform on a Pixel 9 Pro:

    • Text-to-text: prefix speed (input processing rate) of 510 tokens/second; decode speed (output generation rate) of 11 tokens/second
    • Image-to-text: prefix speed of 510 tokens/second plus 0.8 seconds for image encoding; decode speed of 11 tokens/second
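
As a rough worked example with these numbers (assuming a 1,000-token article and a 50-token summary for illustration): input processing takes about 1,000 / 510 ≈ 2.0 seconds and generation takes about 50 / 11 ≈ 4.5 seconds, so an end-to-end summary arrives in roughly 6.5 seconds, plus 0.8 seconds of image encoding for image inputs.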

Sample usage

This is an example of implementing the GenAI Summarization API to get a one-bullet summary of an article:

val articleToSummarize = "We are excited to announce a set of on-device generative AI APIs..."

// Define task with desired input and output format
val summarizerOptions = SummarizerOptions.builder(context)
    .setInputType(InputType.ARTICLE)
    .setOutputType(OutputType.ONE_BULLET)
    .setLanguage(Language.ENGLISH)
    .build()
val summarizer = Summarization.getClient(summarizerOptions)

suspend fun prepareAndStartSummarization(context: Context) {
    // Check feature availability. Status will be one of the following: 
    // UNAVAILABLE, DOWNLOADABLE, DOWNLOADING, AVAILABLE
    val featureStatus = summarizer.checkFeatureStatus().await()

    if (featureStatus == FeatureStatus.DOWNLOADABLE) {
        // Download feature if necessary.
        // If downloadFeature is not called, the first inference request will 
        // also trigger the feature to be downloaded if it's not already
        // downloaded.
        summarizer.downloadFeature(object : DownloadCallback {
            override fun onDownloadStarted(bytesToDownload: Long) { }

            override fun onDownloadFailed(e: GenAiException) { }

            override fun onDownloadProgress(totalBytesDownloaded: Long) {}

            override fun onDownloadCompleted() {
                startSummarizationRequest(articleToSummarize, summarizer)
            }
        })    
    } else if (featureStatus == FeatureStatus.DOWNLOADING) {
        // Inference request will automatically run once feature is      
        // downloaded.
        // If Gemini Nano is already downloaded on the device, the   
        // feature-specific LoRA adapter model will be downloaded very  
        // quickly. However, if Gemini Nano is not already downloaded, 
        // the download process may take longer.
        startSummarizationRequest(articleToSummarize, summarizer)
    } else if (featureStatus == FeatureStatus.AVAILABLE) {
        startSummarizationRequest(articleToSummarize, summarizer)
    } 
}

fun startSummarizationRequest(text: String, summarizer: Summarizer) {
    // Create task request  
    val summarizationRequest = SummarizationRequest.builder(text).build()

    // Start summarization request with streaming response
    summarizer.runInference(summarizationRequest) { newText -> 
        // Show new text in UI
    }

    // You can also get a non-streaming response from the request
    // val summarizationResult = summarizer.runInference(summarizationRequest)
    // val summary = summarizationResult.get().summary
}

// Be sure to release the resource when no longer needed
// For example, on viewModel.onCleared() or activity.onDestroy()
summarizer.close()

For more examples of implementing the GenAI APIs, check out the official documentation and samples on GitHub.

Use cases

Here is some guidance on how to best use the current GenAI APIs:

For Summarization, consider:

    • Conversation messages or transcripts that involve 2 or more users
    • Articles or documents less than 4000 tokens (or about 3000 English words). Using the first few paragraphs for summarization is usually good enough to capture the most important information.

For Proofreading and Rewriting APIs, consider utilizing them during the content creation process for short content below 256 tokens to help with tasks such as:

    • Refining messages in a particular tone, such as more formal or more casual
    • Polishing personal notes for easier consumption later
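
As an illustration of the Rewriting flow, here is a minimal Kotlin sketch; the class and option names (RewriterOptions, Rewriting, RewritingRequest, OutputType.FRIENDLY) are assumptions modeled on the Summarization sample above, so confirm them against the GenAI API reference before using them.

// Sketch: reword a short message in a friendlier tone. Names are assumed by
// analogy with the Summarization API; verify against the API reference.
val rewriterOptions = RewriterOptions.builder(context)
    .setOutputType(RewriterOptions.OutputType.FRIENDLY)
    .setLanguage(RewriterOptions.Language.ENGLISH)
    .build()
val rewriter = Rewriting.getClient(rewriterOptions)

suspend fun rewriteMessage(message: String, onChunk: (String) -> Unit) {
    if (rewriter.checkFeatureStatus().await() == FeatureStatus.AVAILABLE) {
        val request = RewritingRequest.builder(message).build()
        // Streaming callback mirrors the Summarization sample above.
        rewriter.runInference(request) { newText -> onChunk(newText) }
    }
}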

For the Image Description API, consider it for:

    • Generating titles of images
    • Generating metadata for image search
    • Utilizing descriptions of images in use cases where the images themselves cannot be displayed, such as within a list of chat messages
    • Generating alternative text to help visually impaired users better understand content as a whole
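
For example, a minimal image-description sketch might look like this; the class names (ImageDescriberOptions, ImageDescription, ImageDescriptionRequest) are assumptions modeled on the other GenAI APIs, so check the official reference before relying on them.

import android.graphics.Bitmap

// Sketch: generate a short description (for example, alt text) for a bitmap.
// Class names are assumed by analogy with the other GenAI APIs.
val describerOptions = ImageDescriberOptions.builder(context).build()
val imageDescriber = ImageDescription.getClient(describerOptions)

suspend fun describeImage(bitmap: Bitmap, onChunk: (String) -> Unit) {
    if (imageDescriber.checkFeatureStatus().await() == FeatureStatus.AVAILABLE) {
        val request = ImageDescriptionRequest.builder(bitmap).build()
        // Streaming callback mirrors the Summarization sample above.
        imageDescriber.runInference(request) { newText -> onChunk(newText) }
    }
}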

GenAI API in production

Envision is an app that verbalizes the visual world to help people who are blind or have low vision lead more independent lives. A common use case in the app is for users to take a picture to have a document read out loud. Utilizing the GenAI Summarization API, Envision is now able to get a concise summary of a captured document. This significantly enhances the user experience by allowing them to quickly grasp the main points of documents and determine if a more detailed reading is desired, saving them time and effort.

The Envision app scanning a document (left) and the resulting summary highlighting the what, when, and where from the document (right)

Supported devices

GenAI APIs are available on Android devices using optimized MediaTek Dimensity, Qualcomm Snapdragon, and Google Tensor platforms through AICore. For a comprehensive list of devices that support GenAI APIs, refer to our official documentation.

Learn more

Start implementing GenAI APIs in your Android apps today with guidance from our official documentation and samples on GitHub: AI Catalog GenAI API Samples with Compose, ML Kit GenAI APIs Quickstart.
