ML Kit

How Automated Prompt Optimization Unlocks Quality Gains for ML Kit’s GenAI Prompt API

Seharakram60 — Sun, 01 Feb 2026 12:06:39 +0000

Posted by Chetan Tekur, PM at AI Innovation and Research, Chao Zhao, SWE at AI Innovation and Research, Paul Zhou, Prompt Quality Lead at GCP Cloud AI and Industry Solutions, and Caren Chang, Developer Relations Engineer at Android

To help you bring your ML Kit Prompt API use cases to production, we are excited to announce Automated Prompt Optimization (APO) on Vertex AI, now targeting on-device models. APO is a tool that automatically searches for the best-performing prompt for your use case.

The era of On-Device AI is no longer a promise—it is a production reality. With the release of Gemini Nano v3, we are placing unprecedented language understanding and multimodal capabilities directly into the palms of users. Through the Gemini Nano family of models, we have wide coverage of supported devices across the Android Ecosystem. But for developers building the next generation of intelligent apps, access to a powerful model is only step one. The real challenge lies in customization: How do you tailor a foundation model to expert-level performance for your specific use case without breaking the constraints of mobile hardware?

In the server-side world, larger LLMs tend to be highly capable and require less domain adaptation. Even when adaptation is needed, more advanced options such as LoRA (Low-Rank Adaptation) fine-tuning are feasible. However, the architecture of Android AICore prioritizes a shared, memory-efficient system model, which makes deploying custom LoRA adapters for every individual app challenging on these shared system services.

But there is an alternative path that can be equally impactful. By leveraging Automated Prompt Optimization (APO) on Vertex AI, developers can achieve quality approaching fine-tuning, all while working seamlessly within the native Android execution environment. By focusing on superior system instructions, APO enables developers to tailor model behavior with greater robustness and scalability than traditional fine-tuning solutions.

Note: Gemini Nano v3 is a quality-optimized version of the highly acclaimed Gemma 3n model. Any prompt optimizations made on the open-source Gemma 3n model apply to Gemini Nano v3 as well. On supported devices, ML Kit GenAI APIs use the nano-v3 model to maximize quality for Android developers.

Automated Prompt Optimization (APO)

APO treats the prompt not as static text, but as a programmable surface that can be optimized. It leverages server-side models (like Gemini Pro and Flash) to propose prompts, evaluate variations, and find the optimal one for your specific task. This process employs three specific technical mechanisms to maximize performance:

Automated Error Analysis: APO analyzes error patterns from training data to automatically identify specific weaknesses in the initial prompt.
Semantic Instruction Distillation: It analyzes large sets of training examples to distill the “true intent” of a task, creating instructions that more accurately reflect the real data distribution.
Parallel Candidate Testing: Instead of testing one idea at a time, APO generates and tests numerous prompt candidates in parallel to identify the global maximum for quality.
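
The candidate-testing idea can be illustrated with a minimal sketch. This is not the Vertex AI implementation: the `Example` and `Scored` types, the `accuracy` and `bestPrompt` functions, and the `toyModel` stand-in are all hypothetical names introduced here, and the candidates are scored one after another for readability, where a real optimizer would run them concurrently against an actual model.

```kotlin
// Hypothetical sketch: score prompt candidates on labeled examples and keep the best.
// `model` is any (prompt, input) -> output function; it is NOT an ML Kit or Vertex AI API.

data class Example(val input: String, val expected: String)

data class Scored(val prompt: String, val accuracy: Double)

// Fraction of examples for which the model output matches the expected label.
fun accuracy(
    prompt: String,
    examples: List<Example>,
    model: (String, String) -> String
): Double {
    if (examples.isEmpty()) return 0.0
    val correct = examples.count { example ->
        model(prompt, example.input).trim().equals(example.expected, ignoreCase = true)
    }
    return correct.toDouble() / examples.size
}

// Scores every candidate (sequentially here) and returns the highest-scoring one.
fun bestPrompt(
    candidates: List<String>,
    examples: List<Example>,
    model: (String, String) -> String
): Scored {
    require(candidates.isNotEmpty()) { "need at least one candidate prompt" }
    return candidates
        .map { Scored(it, accuracy(it, examples, model)) }
        .maxByOrNull { it.accuracy }!!
}

// Toy stand-in for a model: it only returns a bare label when the prompt asks for one word.
fun toyModel(prompt: String, input: String): String {
    val label = if (input.contains("stock", ignoreCase = true)) "finance" else "sports"
    return if (prompt.contains("one word")) label else "The topic is probably $label"
}

fun main() {
    val examples = listOf(
        Example("Stocks rallied after the earnings report", "finance"),
        Example("The team won the final in extra time", "sports")
    )
    val candidates = listOf("Classify the topic.", "Answer in one word.")
    val best = bestPrompt(candidates, examples, ::toyModel)
    println("${best.prompt} -> ${best.accuracy}")
}
```

With the toy model, only the prompt that constrains the output format produces answers that match the labels, which is the kind of difference an automated search is meant to surface.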

Why APO Can Approach Fine Tuning Quality

It is a common misconception that fine-tuning always yields better quality than prompting. For modern foundation models like Gemini Nano v3, prompt engineering can be impactful by itself:

Preserving general capabilities: Fine-tuning (PEFT/LoRA) forces a model’s weights to over-index on a specific distribution of data. This often leads to “catastrophic forgetting,” where the model gets better at your specific syntax but worse at general logic and safety. APO leaves the weights untouched, preserving the capabilities of the base model.
Instruction Following & Strategy Discovery: Gemini Nano v3 has been rigorously trained to follow complex system instructions. APO exploits this by finding the exact instruction structure that unlocks the model’s latent capabilities, often discovering strategies that might be hard for human engineers to find.

To validate this approach, we evaluated APO across diverse production workloads. Our validation has shown consistent 5-8% accuracy gains across various use cases. Across multiple deployed on-device features, APO provided significant quality lifts.

Use Case	Task Type	Task Description	Metric	APO Improvement
Topic classification	Text classification	Classify a news article into topics such as finance, sports, etc.	Accuracy	+5%
Intent classification	Text classification	Classify a customer service query into intents	Accuracy	+8.0%
Webpage translation	Text translation	Translate a webpage from English to a local language	BLEU	+8.57%

A Seamless, End-to-End Developer Workflow

Conclusion

The release of Automated Prompt Optimization (APO) marks a turning point for on-device generative AI. By bridging the gap between foundation models and expert-level performance, we are giving developers the tools to build more robust mobile applications. Whether you are just starting with Zero-Shot Optimization or scaling to production with Data-Driven refinement, the path to high-quality on-device intelligence is now clearer. Launch your on-device use cases to production today with ML Kit’s Prompt API and Vertex AI’s Automated Prompt Optimization.

The post How Automated Prompt Optimization Unlocks Quality Gains for ML Kit’s GenAI Prompt API appeared first on InShot Pro.

The latest Gemini Nano with on-device ML Kit GenAI APIs

Seharakram60 — Fri, 22 Aug 2025 16:00:00 +0000

Posted by Caren Chang – Developer Relations Engineer, Joanna (Qiong) Huang – Software Engineer, and Chengji Yan – Software Engineer

The latest version of Gemini Nano, our most powerful multi-modal on-device model, just launched on the Pixel 10 device series and is now accessible through the ML Kit GenAI APIs. Integrate capabilities such as summarization, proofreading, rewriting, and image description directly into your apps.

With GenAI APIs we’re focused on giving you access to the latest version of Gemini Nano while providing consistent quality across devices and model upgrades. Here’s a sneak peek behind the scenes at some of the things we’ve done to achieve this.

Adapting GenAI APIs for the latest Gemini Nano

We want to make it as easy as possible for you to build AI-powered features using the most powerful models. To ensure GenAI APIs provide consistent quality across different model versions, we make many behind-the-scenes improvements, including rigorous evals and adapter training.

Evaluation pipeline: For each supported language, we prepare an evaluation dataset. We then benchmark the evals through a combination of: LLM-based raters, statistical metrics and human raters.
Adapter training: With results from the evaluation pipeline, we then determine if we need to train feature-specific LoRA adapters to be deployed on top of the Gemini Nano base model. By shipping GenAI APIs with LoRA adapters, we ensure each API meets our quality bar regardless of the version of Gemini Nano running on a device.

The latest Gemini Nano performance

One area we’re excited about is how this updated version of Gemini Nano pushes performance even higher, especially prefix speed, that is, how fast the model processes input.

For example, here are results when running text-to-text and image-to-text benchmarks on a Pixel 10 Pro.

	Prefix Speed – Gemini nano-v2 on Pixel 9 Pro	Prefix Speed – Gemini nano-v2 on Pixel 10 Pro*	Prefix Speed – Gemini nano-v3 on Pixel 10 Pro
Text-to-text	510 tokens/second	610 tokens/second	940 tokens/second
Image-to-text	510 tokens/second + 0.8 seconds for image encoding	610 tokens/second + 0.7 seconds for image encoding	940 tokens/second + 0.6 seconds for image encoding

*Experimentation with Gemini nano-v2 on Pixel 10 Pro for benchmarking purposes. All Pixel 10 Pros launched with Gemini nano-v3.
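
To put the text-to-text figures in perspective, the short sketch below computes the relative speedups between the three configurations. The constants are copied from the table; the `speedup` helper and constant names are introduced here purely for illustration.

```kotlin
import java.util.Locale

// Prefix speeds in tokens/second, copied from the table above.
const val NANO_V2_PIXEL_9_PRO = 510.0
const val NANO_V2_PIXEL_10_PRO = 610.0
const val NANO_V3_PIXEL_10_PRO = 940.0

// Ratio of two throughputs: values above 1.0 mean `newer` processes input faster.
fun speedup(baseline: Double, newer: Double): Double {
    require(baseline > 0.0) { "baseline speed must be positive" }
    return newer / baseline
}

fun main() {
    val comparisons = listOf(
        "nano-v2: Pixel 9 Pro -> Pixel 10 Pro" to speedup(NANO_V2_PIXEL_9_PRO, NANO_V2_PIXEL_10_PRO),
        "Pixel 10 Pro: nano-v2 -> nano-v3" to speedup(NANO_V2_PIXEL_10_PRO, NANO_V3_PIXEL_10_PRO),
        "nano-v2 on Pixel 9 Pro -> nano-v3 on Pixel 10 Pro" to speedup(NANO_V2_PIXEL_9_PRO, NANO_V3_PIXEL_10_PRO)
    )
    for ((label, ratio) in comparisons) {
        println(String.format(Locale.US, "%s: %.2fx", label, ratio))
    }
}
```

Read this way, the table suggests roughly a 1.2x gain from the newer device with the same model, about 1.5x from the model update on the same device, and about 1.8x overall.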

The future of Gemini Nano with GenAI APIs

As we continue to improve the Gemini Nano model, the team is committed to using the same process to ensure consistent, high-quality results from GenAI APIs.

We hope this will significantly reduce the effort to integrate Gemini Nano in your Android apps while still allowing you to take full advantage of new versions and their improved capabilities.

Learn more about GenAI APIs

Start implementing GenAI APIs in your Android apps today with guidance from our official documentation and samples: GenAI API Catalog and ML Kit GenAI APIs quickstart samples.

The post The latest Gemini Nano with on-device ML Kit GenAI APIs appeared first on InShot Pro.

Top 3 things to know for AI on Android at Google I/O ‘25

Seharakram60 — Mon, 16 Jun 2025 16:00:00 +0000

Posted by Kateryna Semenova – Sr. Developer Relations Engineer

AI is reshaping how users interact with their favorite apps, opening new avenues for developers to create intelligent experiences. At Google I/O, we showcased how Android is making it easier than ever for you to build smart, personalized and creative apps. And we’re committed to providing you with the tools needed to innovate across the full development stack in this evolving landscape.

This year, we focused on making AI accessible across the spectrum, from on-device processing to cloud-powered capabilities. Here are the top 3 announcements you need to know for building with AI on Android from Google I/O ‘25:

#1 Leverage the efficiency of Gemini Nano for on-device AI experiences

For on-device AI, we announced a new set of ML Kit GenAI APIs powered by Gemini Nano, our most efficient and compact model designed and optimized for running directly on mobile devices. These APIs provide high-level, easy integration for common tasks including text summarization, proofreading, rewriting content in different styles, and generating image descriptions. Building on-device offers significant benefits such as local data processing and offline availability at no additional cost for inference. To start integrating these solutions, explore the ML Kit GenAI documentation and the sample on GitHub, and watch the “Gemini Nano on Android: Building with on-device GenAI” talk.

#2 Seamlessly integrate on-device ML/AI with your own custom models

The Google AI Edge platform enables building and deploying a wide range of pretrained and custom models on edge devices and supports various frameworks like TensorFlow, PyTorch, Keras, and Jax, allowing for more customization in apps. The platform now also offers improved support of on-device hardware accelerators and a new AI Edge Portal service for broad coverage of on-device benchmarking and evals. If you are looking for GenAI language models on devices where Gemini Nano is not available, you can use other open models via the MediaPipe LLM Inference API.

Serving your own custom models on-device can pose challenges related to handling large model downloads and updates, impacting the user experience. To improve this, we’ve launched Play for On-Device AI in beta. This service is designed to help developers manage custom model downloads efficiently, ensuring the right model size and speed are delivered to each Android device precisely when needed.

For more information, watch the “Small language models with Google AI Edge” talk.

#3 Power your Android apps with Gemini Flash, Pro and Imagen using Firebase AI Logic

For more advanced generative AI use cases, such as complex reasoning tasks, analyzing large amounts of data, processing audio or video, or generating images, you can use larger models from the Gemini Flash and Gemini Pro families, and Imagen running in the cloud. These models are well suited for scenarios requiring advanced capabilities or multimodal inputs and outputs. And since the AI inference runs in the cloud, any Android device with an internet connection is supported. They are easy to integrate into your Android app by using Firebase AI Logic, which provides a simplified, secure way to access these capabilities without managing your own backend. Its SDK also includes support for conversational AI experiences using the Gemini Live API or generating custom contextual visual assets with Imagen. To learn more, check out our sample on GitHub and watch the “Enhance your Android app with Gemini Pro and Flash, and Imagen” session.

These powerful AI capabilities can also be brought to life in immersive Android XR experiences. You can find corresponding documentation, samples and the technical session: “The future is now, with Compose and AI on Android XR”.

Figure 1: Firebase AI Logic integration architecture

Get inspired and start building with AI on Android today

We released a new open source app, Androidify, to help developers build AI-driven Android experiences using Gemini APIs, ML Kit, Jetpack Compose, CameraX, Navigation 3, and adaptive design. Users can create a personalized Android bot with Gemini and Imagen via the Firebase AI Logic SDK. Additionally, it incorporates ML Kit pose detection to detect a person in the camera viewfinder. The full code sample is available on GitHub for exploration and inspiration. Discover additional AI examples in our Android AI Sample Catalog.

The original image and Androidifi-ed image

Choosing the right Gemini model depends on understanding your specific needs and the model’s capabilities, including modality, complexity, context window, offline capability, cost, and device reach. To explore these considerations further and see all our announcements in action, check out the AI on Android at I/O ‘25 playlist on YouTube and check out our documentation.

We are excited to see what you will build with the power of Gemini!

The post Top 3 things to know for AI on Android at Google I/O ‘25 appeared first on InShot Pro.

On-device GenAI APIs as part of ML Kit help you easily build with Gemini Nano

Seharakram60 — Fri, 30 May 2025 12:00:23 +0000

Posted by Caren Chang – Developer Relations Engineer, Chengji Yan – Software Engineer, Taj Darra – Product Manager

We are excited to announce a set of on-device GenAI APIs, as part of ML Kit, to help you integrate Gemini Nano in your Android apps.

To start, we are releasing 4 new APIs:

Summarization: to summarize articles and conversations
Proofreading: to polish short text
Rewriting: to reword text in different styles
Image Description: to provide short descriptions for images

Key benefits of GenAI APIs

GenAI APIs are high-level APIs that allow for easy integration, similar to existing ML Kit APIs. This means you can expect quality results out of the box without extra effort for prompt engineering or fine-tuning for specific use cases.

GenAI APIs run on-device and thus provide the following benefits:

Input, inference, and output data is processed locally
Functionality remains the same without a reliable internet connection
No additional cost incurred for each API call

To prevent misuse, we also added safety protection in various layers, including base model training, safety-aware LoRA fine-tuning, input and output classifiers and safety evaluations.

How GenAI APIs are built

There are 4 main components that make up each of the GenAI APIs.

Gemini Nano is the base model, as the foundation shared by all APIs.
Small API-specific LoRA adapter models are trained and deployed on top of the base model to further improve the quality for each API.
Optimized inference parameters (e.g. prompt, temperature, topK, batch size) are tuned for each API to guide the model in returning the best results.
An evaluation pipeline ensures quality in various datasets and attributes. This pipeline consists of: LLM raters, statistical metrics and human raters.

Together, these components make up the high-level GenAI APIs that simplify the effort needed to integrate Gemini Nano in your Android app.

Evaluating quality of GenAI APIs

For each API, we formulate a benchmark score based on the evaluation pipeline mentioned above. This score is based on attributes specific to a task. For example, when evaluating the summarization task, one of the attributes we look at is “grounding” (i.e., factual consistency of the generated summary with the source content).

To provide out-of-the-box quality for GenAI APIs, we applied feature-specific fine-tuning on top of the Gemini Nano base model. This resulted in an increase in the benchmark score of each API as shown below:

Use case in English	Gemini Nano Base Model	ML Kit GenAI API
Summarization	77.2	92.1
Proofreading	84.3	90.2
Rewriting	79.5	84.1
Image Description	86.9	92.3
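
As a quick way to read the table, the sketch below computes the point gain per use case and sorts by it. The scores are copied from the table; the `BenchmarkRow` type and `benchmarkRows` list are names introduced here for illustration only.

```kotlin
import java.util.Locale

// One row of the benchmark table: scores for the base model and the ML Kit GenAI API.
data class BenchmarkRow(val useCase: String, val baseModel: Double, val genAiApi: Double) {
    // Absolute gain in benchmark points.
    val gain: Double get() = genAiApi - baseModel
}

// Values copied from the table above (use cases in English).
val benchmarkRows = listOf(
    BenchmarkRow("Summarization", 77.2, 92.1),
    BenchmarkRow("Proofreading", 84.3, 90.2),
    BenchmarkRow("Rewriting", 79.5, 84.1),
    BenchmarkRow("Image Description", 86.9, 92.3)
)

fun main() {
    // Largest gain first.
    for (row in benchmarkRows.sortedByDescending { it.gain }) {
        println(String.format(Locale.US, "%-18s +%.1f", row.useCase, row.gain))
    }
}
```

The gains range from 4.6 points for Rewriting to 14.9 points for Summarization.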

In addition, this is a quick reference of how the APIs perform on a Pixel 9 Pro:

	Prefix Speed (input processing rate)	Decode Speed (output generation rate)
Text-to-text	510 tokens/second	11 tokens/second
Image-to-text	510 tokens/second + 0.8 seconds for image encoding	11 tokens/second
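
These two rates can be combined into a rough, back-of-the-envelope latency estimate. The sketch below assumes total time is simply input processing plus output generation plus optional image encoding, and ignores any other overhead; the `estimateSeconds` function and the token counts in `main` are hypothetical.

```kotlin
import java.util.Locale

// Pixel 9 Pro figures copied from the table above.
const val PREFIX_TOKENS_PER_SECOND = 510.0
const val DECODE_TOKENS_PER_SECOND = 11.0
const val IMAGE_ENCODING_SECONDS = 0.8

// Rough estimate: time to process the input + time to generate the output
// (+ image encoding when an image is part of the request).
fun estimateSeconds(inputTokens: Int, outputTokens: Int, hasImage: Boolean = false): Double {
    require(inputTokens >= 0 && outputTokens >= 0) { "token counts must be non-negative" }
    val prefill = inputTokens / PREFIX_TOKENS_PER_SECOND
    val decode = outputTokens / DECODE_TOKENS_PER_SECOND
    val image = if (hasImage) IMAGE_ENCODING_SECONDS else 0.0
    return prefill + decode + image
}

fun main() {
    // Hypothetical request: 1000 input tokens, 50 output tokens.
    println(String.format(Locale.US, "text-to-text:  %.2f s", estimateSeconds(1000, 50)))        // 6.51 s
    println(String.format(Locale.US, "image-to-text: %.2f s", estimateSeconds(1000, 50, true)))  // 7.31 s
}
```

Under these assumptions, output generation dominates the total, so keeping responses short (for example, a one-bullet summary) matters more for perceived latency than trimming the input.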

Sample usage

This is an example of implementing the GenAI Summarization API to get a one-bullet summary of an article:

val articleToSummarize = "We are excited to announce a set of on-device generative AI APIs..."

// Define task with desired input and output format
val summarizerOptions = SummarizerOptions.builder(context)
    .setInputType(InputType.ARTICLE)
    .setOutputType(OutputType.ONE_BULLET)
    .setLanguage(Language.ENGLISH)
    .build()
val summarizer = Summarization.getClient(summarizerOptions)

suspend fun prepareAndStartSummarization(context: Context) {
    // Check feature availability. Status will be one of the following: 
    // UNAVAILABLE, DOWNLOADABLE, DOWNLOADING, AVAILABLE
    val featureStatus = summarizer.checkFeatureStatus().await()

    if (featureStatus == FeatureStatus.DOWNLOADABLE) {
        // Download feature if necessary.
        // If downloadFeature is not called, the first inference request will 
        // also trigger the feature to be downloaded if it's not already
        // downloaded.
        summarizer.downloadFeature(object : DownloadCallback {
            override fun onDownloadStarted(bytesToDownload: Long) { }

            override fun onDownloadFailed(e: GenAiException) { }

            override fun onDownloadProgress(totalBytesDownloaded: Long) {}

            override fun onDownloadCompleted() {
                startSummarizationRequest(articleToSummarize, summarizer)
            }
        })    
    } else if (featureStatus == FeatureStatus.DOWNLOADING) {
        // Inference request will automatically run once feature is      
        // downloaded.
        // If Gemini Nano is already downloaded on the device, the   
        // feature-specific LoRA adapter model will be downloaded very  
        // quickly. However, if Gemini Nano is not already downloaded, 
        // the download process may take longer.
        startSummarizationRequest(articleToSummarize, summarizer)
    } else if (featureStatus == FeatureStatus.AVAILABLE) {
        startSummarizationRequest(articleToSummarize, summarizer)
    } 
}

fun startSummarizationRequest(text: String, summarizer: Summarizer) {
    // Create task request  
    val summarizationRequest = SummarizationRequest.builder(text).build()

    // Start summarization request with streaming response
    summarizer.runInference(summarizationRequest) { newText -> 
        // Show new text in UI
    }

    // You can also get a non-streaming response from the request
    // val summarizationResult = summarizer.runInference(summarizationRequest)
    // val summary = summarizationResult.get().summary
}

// Be sure to release the resource when no longer needed
// For example, on viewModel.onCleared() or activity.onDestroy()
summarizer.close()

For more examples of implementing the GenAI APIs, check out the official documentation and samples on GitHub.

Use cases

Here is some guidance on how to best use the current GenAI APIs:

For Summarization, consider:

Conversation messages or transcripts that involve 2 or more users

Articles or documents less than 4000 tokens (or about 3000 English words). Using the first few paragraphs for summarization is usually good enough to capture the most important information.
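
One way to apply this guidance is to keep only the leading paragraphs that fit the budget before calling the API. The sketch below is a heuristic, not a real tokenizer: it assumes about 0.75 English words per token (derived from the "4000 tokens, or about 3000 English words" figure above), and `estimateTokens` and `leadingParagraphsWithinBudget` are hypothetical helper names.

```kotlin
import kotlin.math.ceil

// Assumption taken from the guidance above: 4000 tokens is about 3000 English words.
const val WORDS_PER_TOKEN = 0.75

// Rough token estimate based on whitespace-separated word count.
fun estimateTokens(text: String): Int {
    val words = text.trim().split(Regex("\\s+")).count { it.isNotEmpty() }
    return ceil(words / WORDS_PER_TOKEN).toInt()
}

// Keeps paragraphs from the start of the article until the next one would exceed the budget.
// Returns an empty string if even the first paragraph is over budget.
fun leadingParagraphsWithinBudget(article: String, maxTokens: Int = 4000): String {
    val paragraphs = article.split(Regex("\\n\\s*\\n")).map { it.trim() }.filter { it.isNotEmpty() }
    val kept = mutableListOf<String>()
    var used = 0
    for (paragraph in paragraphs) {
        val cost = estimateTokens(paragraph)
        if (used + cost > maxTokens) break
        kept += paragraph
        used += cost
    }
    return kept.joinToString("\n\n")
}

fun main() {
    val article = "First paragraph here.\n\nSecond paragraph follows.\n\nThird one is dropped."
    // With a tiny budget of 8 estimated tokens, only the first two paragraphs fit.
    println(leadingParagraphsWithinBudget(article, maxTokens = 8))
}
```

The result can then be passed as the text of the summarization request in place of the full article.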

For Proofreading and Rewriting APIs, consider utilizing them during the content creation process for short content below 256 tokens to help with tasks such as:

Refining messages in a particular tone, such as more formal or more casual

Polishing personal notes for easier consumption later

For the Image Description API, consider it for:

Generating titles of images

Generating metadata for image search

Utilizing descriptions of images in use cases where the images themselves cannot be displayed, such as within a list of chat messages

Generating alternative text to help visually impaired users better understand content as a whole

GenAI API in production

Envision is an app that verbalizes the visual world to help people who are blind or have low vision lead more independent lives. A common use case in the app is for users to take a picture to have a document read out loud. Utilizing the GenAI Summarization API, Envision is now able to get a concise summary of a captured document. This significantly enhances the user experience by allowing them to quickly grasp the main points of documents and determine if a more detailed reading is desired, saving them time and effort.

Supported devices

GenAI APIs are available on Android devices using optimized MediaTek Dimensity, Qualcomm Snapdragon, and Google Tensor platforms through AICore. For a comprehensive list of devices that support GenAI APIs, refer to our official documentation.

Learn more

Start implementing GenAI APIs in your Android apps today with guidance from our official documentation and samples on GitHub: AI Catalog GenAI API Samples with Compose, ML Kit GenAI APIs Quickstart.

The post On-device GenAI APIs as part of ML Kit help you easily build with Gemini Nano appeared first on InShot Pro.
