Build smarter apps with Gemini 3 Flash


Posted by Thomas Ezan, Senior Developer Relations Engineer



Today, we’re expanding the Gemini 3 model family with the release of Gemini 3 Flash, frontier intelligence built for speed at a fraction of the cost. You can start building with it immediately, as we’re officially launching Gemini 3 Flash on Firebase AI Logic. The preview model is available globally: you can securely access it directly from your app via the Gemini Developer API or the Vertex AI Gemini API using the Firebase AI Logic client SDKs. Gemini 3 Flash’s strong performance in reasoning, tool use, and multimodal understanding makes it ideal for developers building more complex video analysis, data extraction, and visual Q&A features.

Gemini 3, optimized for low latency

Gemini 3 is our most intelligent model family to date. With the launch of Gemini 3 Flash, we are making that intelligence more accessible for low-latency and cost-effective use cases. While Gemini 3 Pro is designed for complex reasoning, Gemini 3 Flash is engineered to be significantly faster and more cost-effective for your production apps.

Seamless integration with Firebase AI Logic

Just like the Pro model, Gemini 3 Flash is available in preview directly through the Firebase AI Logic SDK. This means you can integrate it into your Android app without any complex server-side setup.

Here is how to add it to your Kotlin code:


val model = Firebase.ai(backend = GenerativeBackend.googleAI())
    .generativeModel(
        modelName = "gemini-3-flash-preview")

Scale with Confidence

In addition, Firebase enables you to keep your growth secure and manageable with:

AI Monitoring

The Firebase AI monitoring dashboard gives you visibility into latency, success rates, and costs, allowing you to slice data by model name to see exactly how the model performs.

Server Prompt Templates

You can use server prompt templates to store your prompt and schema securely on Firebase servers instead of hardcoding them in your app binary. This capability ensures your sensitive prompts remain secure, prevents unauthorized prompt extraction, and allows for faster iteration without requiring app updates.

---
model: 'gemini-3-flash-preview'
input:
  schema:
    topic:
      type: 'string'
      minLength: 2
      maxLength: 40
    length:
      type: 'number'
      minimum: 1
      maximum: 200
    language:
      type: 'string'
---

{{role "system"}}
You're a storyteller that tells nice and joyful stories with happy endings.

{{role "user"}}
Create a story about {{topic}} with the length of {{length}} words in the {{language}} language.

Prompt template defined on the Firebase Console  

val generativeModel = Firebase.ai.templateGenerativeModel()
val response = generativeModel.generateContent("storyteller-v10",
    mapOf(
        "topic" to topic,
        "length" to length,
        "language" to language
    )
)
_output.value = response.text

Code snippet to access the prompt template

Gemini 3 Flash for AI development assistance in Android Studio

Gemini 3 Flash is also available for AI assistance in Android Studio. While Gemini 3 Pro Preview is our best model for coding and agentic experiences, Gemini 3 Flash is engineered for speed, and great for common development tasks and questions.

 
The new model is rolling out at no cost to developers using Gemini in Android Studio as the default model, starting today. For higher usage rate limits and longer sessions with Agent Mode, you can use an AI Studio API key to leverage the full capabilities of either Gemini 3 Flash or Gemini 3 Pro. We’re also rolling out Gemini 3 model family access with higher usage rate limits to developers who have Gemini Code Assist Standard or Enterprise licenses. Your IT administrator will need to enable access to preview models through the Google Cloud console.

Get Started Today

You can start experimenting with Gemini 3 Flash via Firebase AI Logic today. Learn more about it in the Android and Firebase documentation. Try out any of the new Gemini 3 models in Android Studio for development assistance, and let us know what you think! As always, you can follow us on LinkedIn, the blog, YouTube, and X.

Boost user engagement with AI Image Generation

Posted by Thomas Ezan, Senior Developer Relations Engineer and Mozart Louis, Developer Relations Engineer


  

Adding custom images to your app can significantly improve and personalize user experience and boost user engagement. This post explores two new capabilities for image generation with Firebase AI Logic: the specialized Imagen editing features, currently in preview, and the general availability of Gemini 2.5 Flash Image (a.k.a “Nano Banana”), designed for contextual or conversational image generation.

  

  

Boost user engagement with images generated via Firebase AI Logic

Image generation models can be used to create custom user profile avatars or to integrate personalized visual assets directly into key screen flows.

  

For example, Imagen offers new editing features (in developer preview). You can now draw a mask and utilize inpainting to generate pixels within the masked area. Additionally, outpainting is available to generate pixels outside the mask.
  

 

  

Imagen supports inpainting, letting you regenerate only a part of an image.

  

Alternatively, Gemini 2.5 Flash Image (a.k.a. Nano Banana) can use the extended world knowledge and reasoning capabilities of the Gemini models to generate contextually relevant images, which is ideal for creating dynamic illustrations that align with a user’s current in-app experience.

  

 Use Gemini 2.5 Flash Image to create dynamic illustrations contextually relevant to your app. 

  

Finally, the ability to conversationally and iteratively edit images allows users to edit a photo using natural language.

  

Use Gemini 2.5 Flash Image to edit a picture using natural language.

  

When starting to integrate AI into your application, it is important to learn about AI safety. In particular, assess your application’s safety risks, consider adjustments to mitigate them, perform safety testing appropriate to your use case, and solicit user feedback and monitor content.

  

Imagen or Gemini: The choice is yours 

The difference between Gemini 2.5 Flash Image (“Nano Banana”) and Imagen lies in their primary focus and advanced capabilities. Gemini 2.5 Flash Image, as an image model within the larger Gemini family, excels in conversational image editing, maintaining context and subject consistency across multiple iterations, and leveraging “world knowledge and reasoning” to create contextually relevant visuals or embed accurate visuals within long text sequences. 

  

Imagen is Google’s specialized image generation model, designed for greater creative control, specializing in highly photorealistic outputs, artistic detail, specific styles, and providing explicit controls for specifying the aspect ratio or format of the generated image.

  

Gemini 2.5 Flash Image (Nano Banana 🍌)

🌎 world knowledge and reasoning for more contextually relevant images

💬 edit images conversationally while maintaining context

📖 embed accurate visuals within long text sequences

Imagen

📐 specify the aspect ratio or format of generated images

🖌 support for mask-based editing (inpainting and outpainting)

🎚 greater control over details of the generated image (quality, artistic detail and specific styles)

Let’s see how to use them in your app.

Inpainting with Imagen 

A few months ago, we released new editing features for Imagen. Although Imagen is now production-ready for image generation, the editing features are still in developer preview.

  

Imagen’s editing features include inpainting and outpainting, both mask-based image editing capabilities. They allow users to modify specific areas of an image without regenerating the entire picture, meaning you can preserve the best parts of your image and only alter what you wish to change.

 

Use Imagen editing features to make precise, targeted changes in an image while preserving the integrity of the rest of the image

These changes are made while maintaining the core elements and overall integrity of the original image and modifying only the area in the mask.

To implement inpainting with Imagen, first initialize imagen-3.0-capability-001, the Imagen model that supports editing features:

// Copyright 2025 Google LLC.
// SPDX-License-Identifier: Apache-2.0
val editingModel =
        Firebase.ai(backend = GenerativeBackend.vertexAI()).imagenModel(
            "imagen-3.0-capability-001",
            generationConfig = ImagenGenerationConfig(
                numberOfImages = 1,
                aspectRatio = ImagenAspectRatio.SQUARE_1x1,
                imageFormat = ImagenImageFormat.jpeg(compressionQuality = 75),
            ),
        )

From there, define the inpainting function:


// Copyright 2025 Google LLC.
// SPDX-License-Identifier: Apache-2.0

val prompt = "remove the pancakes and make it an omelet instead"

suspend fun inpaintImageWithMask(
    sourceImage: Bitmap,
    maskImage: Bitmap,
    prompt: String,
    editSteps: Int = 50,
): Bitmap {
    val imageResponse = editingModel.editImage(
        referenceImages = listOf(
            ImagenRawImage(sourceImage.toImagenInlineImage()),
            ImagenRawMask(maskImage.toImagenInlineImage()),
        ),
        prompt = prompt,
        config = ImagenEditingConfig(
            editMode = ImagenEditMode.INPAINT_INSERTION,
            editSteps = editSteps,
        ),
    )
    return imageResponse.images.first().asBitmap()
}

You provide a sourceImage, a maskImage, a prompt describing the edit, and the number of edit steps to perform.
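
Since inpaintImageWithMask is a suspend function, it has to be called from a coroutine. Here is a minimal sketch of a call site from a ViewModel; sourceBitmap, maskBitmap, and the resultImage field on the UI state are hypothetical names used only for illustration.

// Hypothetical call site: sourceBitmap and maskBitmap are assumed to exist,
// for example captured from the user's drawing on a masking canvas.
viewModelScope.launch {
    val editedBitmap = inpaintImageWithMask(
        sourceImage = sourceBitmap,
        maskImage = maskBitmap,
        prompt = "remove the pancakes and make it an omelet instead",
        editSteps = 50,
    )
    // resultImage is a hypothetical field on the screen's UI state.
    _uiState.update { it.copy(resultImage = editedBitmap) }
}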

You can see it in action in the Imagen Editing Sample in the Android AI Sample catalog!

Imagen also supports outpainting, which lets the model generate pixels outside of the mask. You can also use Imagen’s image customization capabilities to change the style of a picture or update a subject in a picture. Read more about it in the Android developer documentation.
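
To give a rough idea of the shape of an outpainting call, here is a sketch that mirrors the inpainting function above. The ImagenEditMode.OUTPAINT value and the pre-padded source and mask bitmaps are assumptions made for illustration; check the Android developer documentation for the exact helpers to prepare the padded images.

// Sketch only: outpainting reuses the editImage pattern shown for inpainting.
// ImagenEditMode.OUTPAINT and the pre-padded bitmaps are assumptions here;
// see the Android developer documentation for the exact preparation helpers.
suspend fun outpaintImage(paddedSource: Bitmap, paddedMask: Bitmap, prompt: String): Bitmap {
    val imageResponse = editingModel.editImage(
        referenceImages = listOf(
            ImagenRawImage(paddedSource.toImagenInlineImage()),
            ImagenRawMask(paddedMask.toImagenInlineImage()),
        ),
        prompt = prompt,
        config = ImagenEditingConfig(editMode = ImagenEditMode.OUTPAINT),
    )
    return imageResponse.images.first().asBitmap()
}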

Conversational image generation with Gemini 2.5 Flash Image

One way to edit images with Gemini 2.5 Flash Image is to use the model’s multi-turn chat capabilities.

First, initialize the model:

// Copyright 2025 Google LLC.
// SPDX-License-Identifier: Apache-2.0

val model = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
    modelName = "gemini-2.5-flash-image",
    // Configure the model to respond with text and images (required)
    generationConfig = generationConfig {
        responseModalities = listOf(ResponseModality.TEXT,
        ResponseModality.IMAGE)
    }
)

To achieve a similar outcome to the mask-based Imagen method described above, we can utilize the chat API to initiate a conversation with Gemini 2.5 Flash Image.

// Copyright 2025 Google LLC.
// SPDX-License-Identifier: Apache-2.0

// Initialize the chat
val chat = model.startChat()


// Load a bitmap
val source = ImageDecoder.createSource(context.contentResolver, uri)
val bitmap = ImageDecoder.decodeBitmap(source)


// Create the initial prompt instructing the model to edit the image
val prompt = content {
    image(bitmap)
    text("remove the pancakes and add an omelet")
}

// To generate an initial response, send a user message with the image and text prompt
var response = chat.sendMessage(prompt)

// Inspect the returned image
var generatedImageAsBitmap = response
    .candidates.first().content.parts.filterIsInstance<ImagePart>().firstOrNull()?.image

// Follow up requests do not need to specify the image again
response = chat.sendMessage("Now, center the omelet in the pan")
generatedImageAsBitmap = response
    .candidates.first().content.parts.filterIsInstance<ImagePart>().firstOrNull()?.image

You can see it in action in the Gemini Image Chat sample in the Android AI Sample catalog and read more about it in the Android documentation.

Conclusion

Both Imagen and Gemini 2.5 Flash Image offer powerful capabilities, allowing you to select the ideal image generation model to personalize your app and boost user engagement, depending on your specific use case.


Androidify: How Androidify leverages Gemini, Firebase and ML Kit

Posted by Thomas Ezan – Developer Relations Engineer, Rebecca Franks – Developer Relations Engineer, and Avneet Singh – Product Manager

We’re bringing back Androidify later this year, this time powered by Google AI, so you can customize your very own Android bot and share your creativity with the world. Today, we’re releasing a new open source demo app for Androidify as a great example of how Google is using its Gemini AI models to enhance app experiences.

In this post, we’ll dive into how the Androidify app uses Gemini models and Imagen via the Firebase AI Logic SDK, and we’ll provide some insights learned along the way to help you incorporate Gemini and AI into your own projects. Read more about the Androidify demo app.

App flow

The overall app functions as follows, with various parts of it using Gemini and Firebase along the way:

flow chart demonstrating Androidify app flow

Gemini and image validation

To get started with Androidify, take a photo or choose an image on your device. The app needs to make sure that the image you upload is suitable for creating an avatar.

Gemini 2.5 Flash via Firebase helps with this by verifying that the image contains a person, that the person is in focus, and assessing image safety, including whether the image contains abusive content.

val jsonSchema = Schema.obj(
    properties = mapOf("success" to Schema.boolean(), "error" to Schema.string()),
    optionalProperties = listOf("error"),
)

val generativeModel = Firebase.ai(backend = GenerativeBackend.googleAI())
    .generativeModel(
        modelName = "gemini-2.5-flash-preview-04-17",
        generationConfig = generationConfig {
            responseMimeType = "application/json"
            responseSchema = jsonSchema
        },
        safetySettings = listOf(
            SafetySetting(HarmCategory.HARASSMENT, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.HATE_SPEECH, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.DANGEROUS_CONTENT, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.CIVIC_INTEGRITY, HarmBlockThreshold.LOW_AND_ABOVE),
        ),
    )

val response = generativeModel.generateContent(
    content {
        text("You are to analyze the provided image and determine if it is acceptable and appropriate based on specific criteria.... (more details see the full sample)")
        image(image)
    },
)

val jsonResponse = Json.parseToJsonElement(response.text!!)
val isSuccess = jsonResponse.jsonObject["success"]?.jsonPrimitive?.booleanOrNull == true
val error = jsonResponse.jsonObject["error"]?.jsonPrimitive?.content

In the snippet above, we’re leveraging structured output capabilities of the model by defining the schema of the response. We’re passing a Schema object via the responseSchema param in the generationConfig.

We want to validate that the image has enough information to generate a nice Android avatar. So we ask the model to return a JSON object with success = true/false and an optional error message explaining why the image doesn’t have enough information.

Structured output is a powerful feature enabling a smoother integration of LLMs into your app by controlling the format of their output, similar to an API response.
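
Because the response shape is fixed by the schema, you can also map it onto a small data class instead of reading raw JSON fields, which keeps the parsing code close to a typed API response. This is a minimal sketch using kotlinx.serialization; the ImageValidationResult class name is only illustrative.

import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json

// Mirrors the response schema defined above; the class name is illustrative.
@Serializable
data class ImageValidationResult(
    val success: Boolean,
    val error: String? = null,
)

// ignoreUnknownKeys keeps parsing tolerant if the model ever adds extra fields.
private val json = Json { ignoreUnknownKeys = true }

fun parseValidation(rawJson: String): ImageValidationResult =
    json.decodeFromString<ImageValidationResult>(rawJson)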

Image captioning with Gemini Flash

Once it’s established that the image contains sufficient information to generate an Android avatar, it is captioned using Gemini 2.5 Flash with structured output.

val jsonSchema = Schema.obj(
    properties = mapOf(
        "success" to Schema.boolean(),
        "user_description" to Schema.string(),
    ),
    optionalProperties = listOf("user_description"),
)
val generativeModel = createGenerativeTextModel(jsonSchema)

val prompt = "You are to create a VERY detailed description of the main person in the given image. This description will be translated into a prompt for a generative image model..."

val response = generativeModel.generateContent(
    content {
        text(prompt)
        image(image)
    },
)

val jsonResponse = Json.parseToJsonElement(response.text!!)
val isSuccess = jsonResponse.jsonObject["success"]?.jsonPrimitive?.booleanOrNull == true

val userDescription = jsonResponse.jsonObject["user_description"]?.jsonPrimitive?.content

The other option in the app is to start with a text prompt. You can enter details about your accessories, hairstyle, and clothing, and let Imagen be a bit more creative.

Android generation via Imagen

We’ll use this detailed description of your image to enrich the prompt used for image generation. We’ll add extra details around what we would like to generate and include the bot color selection as part of this too, including the skin tone selected by the user.

val imagenPrompt = "A 3D rendered cartoonish Android mascot in a photorealistic style, the pose is relaxed and straightforward, facing directly forward [...] The bot looks as follows $userDescription [...]"

We then call the Imagen model to create the bot. Using this new prompt, we create a model and call generateImages:

// We supply our own fine-tuned model here, but you can use "imagen-3.0-generate-002"
val generativeModel = Firebase.ai(backend = GenerativeBackend.googleAI()).imagenModel(
    "imagen-3.0-generate-002",
    safetySettings = ImagenSafetySettings(
        ImagenSafetyFilterLevel.BLOCK_LOW_AND_ABOVE,
        personFilterLevel = ImagenPersonFilterLevel.ALLOW_ALL,
    ),
)

val response = generativeModel.generateImages(imagenPrompt)

val image = response.images.first().asBitmap()

And that’s it! The Imagen model generates a bitmap that we can display on the user’s screen.
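
If the UI is built with Jetpack Compose, displaying that bitmap is straightforward. The composable below is a minimal sketch with an illustrative name; it simply converts the Android Bitmap to an ImageBitmap and renders it.

import android.graphics.Bitmap
import androidx.compose.foundation.Image
import androidx.compose.runtime.Composable
import androidx.compose.ui.graphics.asImageBitmap

// Minimal sketch: render the generated bot bitmap in a Compose UI.
@Composable
fun GeneratedBotImage(bot: Bitmap) {
    Image(
        bitmap = bot.asImageBitmap(),
        contentDescription = "Generated Android bot",
    )
}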

Finetuning the Imagen model

The Imagen 3 model was fine-tuned using Low-Rank Adaptation (LoRA). LoRA is a fine-tuning technique designed to reduce the computational burden of training large models. Instead of updating the entire model, LoRA adds smaller, trainable “adapters” that adjust the model’s behavior. We ran a fine-tuning pipeline on the generally available Imagen 3 model with Android bot assets in different color combinations, plus additional assets for enhanced cuteness and fun. We generated text captions for the training images, and the image-text pairs were used to fine-tune the model effectively.
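
As a side note on the technique itself (this is the standard LoRA formulation, not something specific to this pipeline): each adapted weight matrix is expressed as the frozen original plus a low-rank update, W' = W0 + BA, where B is a d×r matrix, A is an r×k matrix, and the rank r is much smaller than d and k. Only the small matrices B and A are trained, which is what keeps the fine-tuning comparatively cheap.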

The current sample app uses a standard Imagen model, so the results may look a bit different from the visuals in this post. However, the app using the fine-tuned model and a custom version of Firebase AI Logic SDK was demoed at Google I/O. This app will be released later this year and we are also planning on adding support for fine-tuned models to Firebase AI Logic SDK later in the year.

moving image of Androidify app demo turning a selfie image of a bearded man wearing a black tshirt and sunglasses, with a blue back pack into a green 3D bearded droid wearing a black tshirt and sunglasses with a blue backpack

The original image… and Androidifi-ed image

ML Kit

The app also uses the ML Kit Pose Detection SDK to detect a person in the camera view, which triggers the capture button and adds visual indicators.

To do this, we add the SDK to the app, and use PoseDetection.getClient(). Then, using the poseDetector, we look at the detectedLandmarks that are in the streaming image coming from the Camera, and we set the _uiState.detectedPose to true if a nose and shoulders are visible:

private suspend fun runPoseDetection() {
    PoseDetection.getClient(
        PoseDetectorOptions.Builder()
            .setDetectorMode(PoseDetectorOptions.STREAM_MODE)
            .build(),
    ).use { poseDetector ->
        // Since image analysis is processed by ML Kit asynchronously in its own thread pool,
        // we can run this directly from the calling coroutine scope instead of pushing this
        // work to a background dispatcher.
        cameraImageAnalysisUseCase.analyze { imageProxy ->
            imageProxy.image?.let { image ->
                val poseDetected = poseDetector.detectPersonInFrame(image, imageProxy.imageInfo)
                _uiState.update { it.copy(detectedPose = poseDetected) }
            }
        }
    }
}

private suspend fun PoseDetector.detectPersonInFrame(
    image: Image,
    imageInfo: ImageInfo,
): Boolean {
    val results = process(InputImage.fromMediaImage(image, imageInfo.rotationDegrees)).await()
    val landmarkResults = results.allPoseLandmarks
    val detectedLandmarks = mutableListOf<Int>()
    for (landmark in landmarkResults) {
        if (landmark.inFrameLikelihood > 0.7) {
            detectedLandmarks.add(landmark.landmarkType)
        }
    }

    return detectedLandmarks.containsAll(
        listOf(PoseLandmark.NOSE, PoseLandmark.LEFT_SHOULDER, PoseLandmark.RIGHT_SHOULDER),
    )
}

moving image showing the camera shutter button activating when an orange droid figurine is held in the camera frame

The camera shutter button is activated when a person (or a bot!) enters the frame.

Get started with AI on Android

The Androidify app makes extensive use of Gemini 2.5 Flash to validate the image and to generate the detailed description used for image generation. It also leverages a specifically fine-tuned Imagen 3 model to generate images of Android bots. Gemini and Imagen models are easily integrated into the app via the Firebase AI Logic SDK. In addition, the ML Kit Pose Detection SDK controls the capture button, enabling it only when a person is present in front of the camera.

To get started with AI on Android, go to the Gemini and Imagen documentation for Android.

Explore this announcement and all Google I/O 2025 updates on io.google starting May 22.
