

Posted by Kristina Simakova, Engineering Manager



Media3 1.9.0 – What’s new?

Media3 1.9.0 is out! Besides the usual bug fixes and performance improvements, the latest release also contains four new or largely rewritten modules:

  • media3-inspector – Extract metadata and frames outside of playback

  • media3-ui-compose-material3 – Build a basic Material3 Compose Media UI in just a few steps

  • media3-cast – Automatically handle transitions between Cast and local playbacks

  • media3-decoder-av1 – Consistent AV1 playback with the rewritten extension decoder based on the dav1d library

We also added caching and memory management improvements to PreloadManager, and provided several new ExoPlayer, Transformer and MediaSession simplifications. 

This release also gives you the first experimental access to CompositionPlayer to preview media edits.  


Read on to find out more, and as always please check out the full release notes for a comprehensive overview of changes in this release.

Extract metadata and frames outside of playback

There are many cases where you want to inspect media without starting playback. For example, you might want to detect which formats a file contains, determine its duration, or retrieve thumbnails.

The new media3-inspector module combines all utilities to inspect media without playback in one place:

  • MetadataRetriever to read duration, format and static metadata from a MediaItem.

  • FrameExtractor to get frames or thumbnails from an item. 

  • MediaExtractorCompat as a direct replacement for the Android platform MediaExtractor class, to get detailed information about samples in the file.

MetadataRetriever and FrameExtractor follow a simple AutoCloseable pattern. Have a look at our new guide pages for more details.

suspend fun extractThumbnail(context: Context, mediaItem: MediaItem) {
  FrameExtractor.Builder(context, mediaItem).build().use { frameExtractor ->
    val thumbnail = frameExtractor.getThumbnail().await()
  }
}

Build a basic Material3 Compose Media UI in just a few steps

In previous releases we started providing connector code between Compose UI elements and your Player instance. With Media3 1.9.0, we added a new module media3-ui-compose-material3 with fully-styled Material3 buttons and content elements. They allow you to build a media UI in just a few steps, while providing all the flexibility to customize style. If you prefer to build your own UI style, you can use the building blocks that take care of all the update and connection logic, so you only need to concentrate on designing the UI element. Please check out our extended guide pages for the Compose UI modules.


We are also still working on even more Compose components, like a prebuilt seek bar, a complete out-of-the-box replacement for PlayerView, as well as subtitle and ad integration.

@Composable
fun SimplePlayerUI(player: Player, modifier: Modifier = Modifier) {
  Column(modifier) {
    ContentFrame(player)  // Video surface and shutter logic
    Row(Modifier.align(Alignment.CenterHorizontally)) {
      SeekBackButton(player)   // Simple controls
      PlayPauseButton(player)
      SeekForwardButton(player)
    }
  }
}

Simple Compose player UI with out-of-the-box elements

Automatically handle transitions between Cast and local playbacks

The CastPlayer in the media3-cast module has been rewritten to automatically handle transitions between local playback (for example with ExoPlayer) and remote Cast playback.

When you set up your MediaSession, simply build a CastPlayer around your ExoPlayer, add a MediaRouteButton to your UI, and you’re done!

// MediaSession setup with CastPlayer
val exoPlayer = ExoPlayer.Builder(context).build()
val castPlayer = CastPlayer.Builder(context).setLocalPlayer(exoPlayer).build()
val session = MediaSession.Builder(context, castPlayer).build()

// MediaRouteButton in UI
@Composable fun UIWithMediaRouteButton() {
  MediaRouteButton()
}

New CastPlayer integration in Media3 session demo app

Consistent AV1 playback with the rewritten extension based on dav1d

The 1.9.0 release contains a completely rewritten AV1 extension module based on the popular dav1d library.

As with all extension decoder modules, please note that it requires building from source to bundle the relevant native code correctly. Bundling a decoder provides consistency and format support across all devices, but because it runs the decoding in your process, it’s best suited for content you can trust. 

Integrate caching and memory management into PreloadManager

We made our PreloadManager even better as well. It already enabled you to preload media into memory outside of playback and then seamlessly hand it over to a player when needed. Although pretty performant, you still had to be careful not to exceed memory limits by accidentally preloading too much. So with Media3 1.9.0, we added two features that make this a lot easier and more stable:


  1. Caching support – When defining how far to preload, you can now choose PreloadStatus.specifiedRangeCached(0, 5000) as a target state for preloaded items. This will add the specified range to your cache on disk instead of loading the data to memory. With this, you can provide a much larger range of items for preloading as the ones further away from the current item no longer need to occupy memory. Note that this requires setting a Cache in DefaultPreloadManager.Builder.

  2. Automatic memory management – We also updated our LoadControl interface to better handle the preload case, so you are now able to set an explicit upper memory limit for all preloaded items in memory. It’s 144 MB by default, and you can configure the limit in DefaultLoadControl.Builder. The DefaultPreloadManager automatically stops preloading once the limit is reached, and releases the memory of lower-priority items if required. (A sketch combining both features follows this list.)
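
To make this concrete, here is a minimal sketch of a target-status callback that combines the two behaviors, using the PreloadStatus factory methods named above. This is an illustration rather than the documented setup: the tier distances are arbitrary, and it assumes a DefaultPreloadManager built with a disk Cache as described in point 1.

import androidx.media3.common.C
import androidx.media3.exoplayer.DefaultPreloadManager.PreloadStatus
import kotlin.math.abs

// Sketch: keep the neighbors of the current item in memory, push a wider
// window of items to the disk cache, and skip everything further away.
class TieredPreloadStatusControl(var currentIndex: Int = C.INDEX_UNSET) :
    TargetPreloadStatusControl<Int, PreloadStatus> {

  override fun getTargetPreloadStatus(index: Int): PreloadStatus? {
    val distance = abs(index - currentIndex)
    return when {
      distance <= 1 -> PreloadStatus.specifiedRangeLoaded(5000L)    // in memory
      distance <= 10 -> PreloadStatus.specifiedRangeCached(0, 5000) // on disk
      else -> null                                                  // don't preload
    }
  }
}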

Rely on new simplified default behaviors in ExoPlayer

As always, we added lots of incremental improvements to ExoPlayer as well. To name just a few:

  • Mute and unmute – We already had a setVolume method, but have now added the convenience mute and unmute methods to easily restore the previous volume without keeping track of it yourself.

  • Stuck player detection – In some rare cases the player can get stuck in a buffering or playing state without making any progress, for example, due to codec issues or misconfigurations. Your users will be annoyed, but you never see these issues in your analytics! To make this more obvious, the player now reports a StuckPlayerException when it detects a stuck state.

  • Wakelock by default – The wake lock management was previously opt-in, resulting in hard to find edge cases where playback progress can be delayed a lot when running in the background. Now this feature is opt-out, so you don’t have to worry about it and can also remove all manual wake lock handling around playback.

  • Simplified setting for CC button logic – Changing TrackSelectionParameters to say “turn subtitles on/off” was surprisingly hard to get right, so we added a simple boolean selectTextByDefault option for this use case (a short sketch follows this list).
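
As a short sketch of these conveniences: the mute and unmute calls are named above, while the TrackSelectionParameters setter name below is an assumption based on the described selectTextByDefault option.

// Mute for an interstitial, then restore the previous volume automatically.
player.mute()
player.unmute()

// Turn subtitles on by default; setter name assumed from the option above.
player.trackSelectionParameters =
    player.trackSelectionParameters
        .buildUpon()
        .setSelectTextByDefault(true)
        .build()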

Simplify your media button preferences in MediaSession

Until now, choosing which buttons should show up in the media notification drawer on Android Auto or Wear OS required defining custom commands and buttons, even if you simply wanted to trigger a standard player method.

Media3 1.9.0 has new functionality to make this a lot simpler – you can now define your media button preferences with a standard player command, requiring no custom command handling at all.

session.setMediaButtonPreferences(listOf(
    CommandButton.Builder(CommandButton.ICON_FAST_FORWARD) // choose an icon
      .setDisplayName(R.string.skip_forward)
      .setPlayerCommand(Player.COMMAND_SEEK_FORWARD) // choose an action 
      .build()
))

Media button preferences with fast forward button

CompositionPlayer for real-time preview

The 1.9.0 release introduces CompositionPlayer under a new @ExperimentalApi annotation. The annotation indicates that it is available for experimentation, but is still under development. 

CompositionPlayer is a new component in the Media3 editing APIs designed for real-time preview of media edits. Built upon the familiar Media3 Player interface, CompositionPlayer allows users to see their changes in action before committing to the export process. It uses the same Composition object that you would pass to Transformer for exporting, streamlining the editing workflow by unifying the data model for preview and export.
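
A rough sketch of that unified workflow is below. Since CompositionPlayer is experimental, treat the builder and setter names as assumptions based on the description above, not a settled API surface.

// Hypothetical preview-then-export flow with one shared Composition.
val composition = Composition.Builder(listOf(videoSequence)).build()

// Preview the edits in real time (experimental API).
val compositionPlayer = CompositionPlayer.Builder(context).build()
compositionPlayer.setComposition(composition)
compositionPlayer.prepare()
compositionPlayer.play()

// Later, export the very same Composition with Transformer.
transformer.start(composition, outputFilePath)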

We encourage you to start using CompositionPlayer and share your feedback, and keep an eye out for forthcoming posts and updates to the documentation for more details.

InAppMuxer as the default muxer in Transformer

Transformer now uses InAppMp4Muxer as the default muxer for writing media container files. Internally, InAppMp4Muxer depends on the Media3 Muxer module, providing consistent behaviour across all API versions. 

Note that while Transformer no longer uses the Android platform’s MediaMuxer by default, you can still provide FrameworkMuxer.Factory via setMuxerFactory if your use case requires it.
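
For example, here is a minimal sketch of opting back in to the platform muxer. Only setMuxerFactory and FrameworkMuxer.Factory are named above; the no-argument constructor is an assumption.

// Keep writing output with the Android platform MediaMuxer.
val transformer = Transformer.Builder(context)
    .setMuxerFactory(FrameworkMuxer.Factory())
    .build()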

New speed adjustment APIs

The 1.9.0 release simplifies the speed adjustment APIs for media editing. We’ve introduced new methods directly on EditedMediaItem.Builder to control speed, making the API more intuitive. You can now change the speed of a clip by calling setSpeed(SpeedProvider provider) on the EditedMediaItem.Builder:

val speedProvider = object : SpeedProvider {
    // Play the whole clip at a constant 2x speed (illustrative value).
    override fun getSpeed(presentationTimeUs: Long): Float {
        return 2f
    }

    override fun getNextSpeedChangeTimeUs(timeUs: Long): Long {
        return C.TIME_UNSET // constant speed: no further speed changes
    }
}

val speedEffectItem = EditedMediaItem.Builder(mediaItem)
    .setSpeed(speedProvider)
    .build()


This new approach replaces the previous method of using Effects#createExperimentalSpeedChangingEffects(), which we’ve deprecated and will remove in a future release.

Introducing track types for EditedMediaItemSequence

In the 1.9.0 release, EditedMediaItemSequence requires specifying desired output track types during sequence creation. This change ensures track handling is more explicit and robust across the entire Composition.

This is done via a new EditedMediaItemSequence.Builder constructor that accepts a set of track types (e.g., C.TRACK_TYPE_AUDIO, C.TRACK_TYPE_VIDEO). 

To simplify creation, we’ve added new static convenience methods:

  • EditedMediaItemSequence.withAudioFrom(List<EditedMediaItem>)

  • EditedMediaItemSequence.withVideoFrom(List<EditedMediaItem>)

  • EditedMediaItemSequence.withAudioAndVideoFrom(List<EditedMediaItem>)

We encourage you to migrate to the new constructor or the convenience methods for clearer and more reliable sequence definitions.

Example of creating a video-only sequence:

val videoOnlySequence =
    EditedMediaItemSequence.Builder(setOf(C.TRACK_TYPE_VIDEO))
        .addItem(editedMediaItem)
        .build()
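
For comparison, the same kind of sequence built with one of the convenience methods listed above:

val audioVideoSequence =
    EditedMediaItemSequence.withAudioAndVideoFrom(listOf(editedMediaItem))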


Please get in touch via the Media3 issue Tracker if you run into any bugs, or if you have questions or feature requests. We look forward to hearing from you!

Elevating media playback: Introducing preloading with Media3 – Part 1


Posted by Mayuri Khinvasara Khabya – Developer Relations Engineer (LinkedIn and X)

In today’s media-centric apps, delivering a smooth, uninterrupted playback experience is key to a delightful user experience. Users expect their videos to start instantly and play seamlessly without pauses.

The core challenge is latency. Traditionally, a video player only starts its work—connecting, downloading, parsing, buffering—after the user has chosen an item for playback. This reactive approach is too slow for today’s short-form video context. The solution is to be proactive: anticipate what the user will watch next and get the content ready ahead of time. This is the essence of preloading.

The key benefits of preloading include:

    • 🚀 Faster Playback Start: Videos are already ready to go, leading to quicker transitions between items and a more immediate start.
    • 📉 Reduced Buffering: By proactively loading data, playback is far less likely to stall, for example due to network hiccups.
    • ✨ Smoother User Experience: The combination of faster starts and less buffering creates a more fluid, seamless experience for users to enjoy.

In this three-part series, we’ll introduce and deep dive into Media3’s powerful utilities for (pre)loading components.

    • In Part 1, we’ll cover the foundations: understanding the different preloading strategies available in Media3, enabling PreloadConfiguration, and setting up the DefaultPreloadManager so your app can preload items. By the end of this post, you should be able to preload and play media items with your configured ranking and duration.
    • In Part 2, we’ll get into more advanced topics of DefaultPreloadManager: using listeners for analytics, exploring production-ready best practices like the sliding window pattern and custom shared components of DefaultPreloadManager and ExoPlayer.
    • In Part 3, we’ll dive deep into disk caching with DefaultPreloadManager.

Preloading to the rescue! 🦸‍♀️

The core idea behind preloading is simple: load media content before you need it. By the time a user swipes to the next video, the first segments of the video are already downloaded and available, ready for immediate playback.

Think of it like a restaurant. A busy kitchen doesn’t wait for an order to start chopping onions. 🧅 They do their prep work in advance. Preloading is the prep work for your video player.

When enabled, preloading can help minimize join latency when a user skips to the next item before the playback buffer reaches the next item. The first period of the next window is prepared and video, audio and text samples are buffered. The preloaded period is later queued into the player with buffered samples immediately available and ready to be fed to the codec for rendering.

In Media3 there are two primary APIs for preloading, each suited for different use cases. Choosing the right API is the first step.

1. Preloading playlist items with PreloadConfiguration

This is the simple approach, useful for linear, sequential media like playlists where the playback order is predictable (like a series of episodes). You give the player the full list of media items using ExoPlayer’s playlist APIs and set the PreloadConfiguration for the player; it then automatically preloads the next items in the sequence as configured. This API attempts to optimize join latency when a user skips to the next item before the playback buffer reaches it.

Preloading is only started when no media is being loaded for the ongoing playback, which prevents it from competing for bandwidth with the primary playback.

If you’re still not sure whether you need preloading, this API is a great low-lift option to try it out!

player.preloadConfiguration =
    PreloadConfiguration(/* targetPreloadDurationUs= */ 5_000_000L)

With the PreloadConfiguration above, the player tries to preload five seconds of media for the next item in the playlist.

Once opted in, playlist preloading can be turned off again by assigning PreloadConfiguration.DEFAULT:

player.preloadConfiguration = PreloadConfiguration.DEFAULT

2. Preloading dynamic lists with PreloadManager

For dynamic UIs like vertical feeds or carousels, where the “next” item is determined by user interaction, the PreloadManager API is appropriate. This is a new powerful, standalone component within the Media3 ExoPlayer library specifically designed to proactively preload. It manages a collection of potential MediaSources, prioritizing them based on proximity to the user’s current position and offers granular control over what to preload, suitable for complex scenarios like dynamic feeds of short form videos.

Setting Up Your PreloadManager

The DefaultPreloadManager is the canonical implementation for PreloadManager.

The builder of DefaultPreloadManager can build both the DefaultPreloadManager and any ExoPlayer instances that will play its preloaded content. To create a DefaultPreloadManager, you will need to pass a TargetPreloadStatusControl, which the preload manager can query to find out how much to load for an item. We will explain and define an example of TargetPreloadStatusControl in the section below.

val preloadManagerBuilder =
    DefaultPreloadManager.Builder(context, targetPreloadStatusControl)
val preloadManager = preloadManagerBuilder.build()

// Build ExoPlayer with DefaultPreloadManager.Builder
val player = preloadManagerBuilder.buildExoPlayer()

You must use the same builder for both the ExoPlayer and the DefaultPreloadManager; this ensures that the components under the hood are correctly shared.

And that’s it! You now have a manager ready to receive instructions.

Configuring Duration and Ranking with TargetPreloadStatusControl

What if you want to preload, say, 10 seconds of video? You provide the position of each media item in the carousel, and the DefaultPreloadManager prioritizes loading items based on how close each one is to the item the user is currently playing.

If you want to control how much of each item to preload, you can do so via the DefaultPreloadManager.PreloadStatus you return.

For example,

    • Item ‘A’ is the highest priority: load 5 seconds of video.
    • Item ‘B’ is medium priority: load 3 seconds of video.
    • Item ‘C’ is lower priority: only select tracks.
    • Item ‘D’ is even lower priority: just prepare the source.
    • Any other items are far away: don’t preload anything.

This granular control can help you optimize resource utilization, which is recommended for seamless playback.

import androidx.media3.exoplayer.DefaultPreloadManager.PreloadStatus
import kotlin.math.abs

class MyTargetPreloadStatusControl(
    // The app is responsible for updating this based on UI state
    var currentPlayingIndex: Int = C.INDEX_UNSET
) : TargetPreloadStatusControl<Int, PreloadStatus> {

    override fun getTargetPreloadStatus(index: Int): PreloadStatus? {
        val distance = index - currentPlayingIndex

        // Adjacent item (next): load 5000ms from the default start position
        if (distance == 1) {
            return PreloadStatus.specifiedRangeLoaded(5000L)
        }

        // Adjacent item (previous): load 3000ms from the default start position
        if (distance == -1) {
            return PreloadStatus.specifiedRangeLoaded(3000L)
        }

        // Items two positions away: just select tracks
        if (abs(distance) == 2) {
            return PreloadStatus.TRACKS_SELECTED
        }

        // Items up to four positions away: just prepare the source
        if (abs(distance) <= 4) {
            return PreloadStatus.SOURCE_PREPARED
        }

        // All other items are too far away: don't preload
        return null
    }
}

Tip: PreloadManager can keep both the previous and next items preloaded, whereas the PreloadConfiguration will only look ahead to the next items.

Managing Preloading Items

With your manager created, you can start telling it what to work on. As your user scrolls through a feed, you’ll identify the upcoming videos and add them to the manager. The interaction with the PreloadManager is a state-driven conversation between your UI and the preloading engine.

1. Add Media Items

As you populate your feed, you must inform the manager of the media it needs to track. To start, you could add the entire list you want to preload; subsequently, you can add single items as required. You have full control over what is in the preloading list, which means you are also responsible for managing what is added to and removed from the manager.

val initialMediaItems = pullMediaItemsFromService(/* count= */ 20)
for (index in initialMediaItems.indices) {
    preloadManager.add(initialMediaItems[index], index)
}

The manager will now start fetching data for these MediaItems in the background.

After adding, tell the manager to re-evaluate its list whenever something changes, such as adding or removing an item, or the user switching to play a new item:

preloadManager.invalidate()

2. Retrieve and Play an Item

Here comes the main playback logic. When the user decides to play that video, you don’t need to create a new MediaSource. Instead, you ask the PreloadManager for the one it has already prepared. You can retrieve the MediaSource from the Preload Manager using the MediaItem.

If the item retrieved from the PreloadManager is null, the mediaItem has not been preloaded or added to the PreloadManager yet, so you should set the mediaItem on the player directly.

// When a media item is about to display on the screen
val mediaSource = preloadManager.getMediaSource(mediaItem)
if (mediaSource != null) {
  player.setMediaSource(mediaSource)
} else {
  // If mediaSource is null, that mediaItem hasn't been added yet.
  // So, send it directly to the player.
  player.setMediaItem(mediaItem)
}
player.prepare()
// When the media item is displaying at the center of the screen
player.play()

By preparing the MediaSource retrieved from the PreloadManager, you seamlessly transition from preloading to playback, using the data that’s already in memory. This is what makes the start time faster.

3. Keep the current index in sync with the UI

Since our feed/list can be dynamic, it’s important to notify the PreloadManager of the current playing index so that it can always prioritize the items nearest to it for preloading.

preloadManager.setCurrentPlayingIndex(currentIndex)
// Need to call invalidate() to update the priorities
preloadManager.invalidate()

4. Remove an Item

To keep the manager efficient, you should remove items it no longer needs to track, such as items that are far away from the user’s current position.

// When an item is too far from the current playing index
preloadManager.remove(mediaItem)

If you need to clear all items at once, you can call preloadManager.reset().

5. Release the Manager

When you no longer need the PreloadManager (e.g., when your UI is destroyed), you must release it to free up its resources. A good place to do this is where you’re already releasing your Player’s resources. It’s recommended to release the manager before the player as the player can continue to play if you don’t need any more preloading.

// In your Activity's onDestroy() or Composable's onDispose
preloadManager.release()

Demo time

Check it live in action 👍

In the demo below, the right side uses PreloadManager and shows faster load times, whereas the left side shows the existing experience. You can also view the code sample for the demo. (Bonus: it also displays startup latency for every video.)

Jetpack Media3 API for fast loading of short videos [PreloadManager]

What’s Next?

And that’s a wrap for Part 1! You now have the tools to build a dynamic preloading system. You can either use PreloadConfiguration to preload the next item of a playlist in ExoPlayer or set up a DefaultPreloadManager, add and remove items on the fly, configure the target preload status, and correctly retrieve the preloaded content for playback.

In Part 2, we’ll go deeper on the DefaultPreloadManager. We’ll explore how to listen for preloading events, discuss best practices like using a sliding window to avoid memory issues, and peek under the hood at custom shared components of ExoPlayer and DefaultPreloadManager.

Do you have any feedback to share? We are eager to hear from you.

Stay tuned, and go make your app faster! 🚀

Media3 1.8.0 – What’s new?


Posted by Toni Heidenreich – Engineering Manager

This release includes several bug fixes, performance improvements, and new features. Read on to find out more, and as always please check out the full release notes for a comprehensive overview of changes in this release.

Scrubbing in ExoPlayer

This release introduces a scrubbing mode in ExoPlayer, designed to optimize performance for frequent, user-driven seeks, like dragging a seek bar handle. You can enable it with ExoPlayer.setScrubbingModeEnabled(true). We’ve also integrated this into PlayerControlView in the UI module, where it can be enabled either with time_bar_scrubbing_enabled="true" in XML or with the setTimeBarScrubbingEnabled(boolean) method. Media3 1.8.0 contains the first batch of scrubbing improvements, with more to come in 1.9.0!
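
A plausible way to wire this up in code, using only the two methods named above (the drag-start/drag-end toggling is an assumption about intended usage):

// Toggle scrubbing mode around a seek bar drag gesture.
exoPlayer.setScrubbingModeEnabled(true)   // when the drag starts
exoPlayer.seekTo(newPositionMs)           // called repeatedly while dragging
exoPlayer.setScrubbingModeEnabled(false)  // when the drag ends

// Or, when using PlayerControlView from the UI module:
playerControlView.setTimeBarScrubbingEnabled(true)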


Repeated seeking while scrubbing with scrubbing mode OFF


Repeated seeking while scrubbing with scrubbing mode ON

Live streaming ads with HLS interstitials

Extending the initial support for VOD in Media3 1.6.0, HlsInterstitialsAdsLoader now supports live streams and asset lists for all your server-guided ad insertion (SGAI) needs. The Google Ads Manager team explains how SGAI works. Follow our documentation for how to integrate HLS interstitials into your app.


HLS interstitials processing flow

Duration retrieval without playback

MetadataRetriever has been significantly updated – it’s now using an AutoCloseable pattern and lets you retrieve the duration of media items without playback. This means Media3 now offers the full functionality of the Android platform MediaMetadataRetriever but without having to worry about device specific quirks and cross-process communication (some parts like frame extraction are still experimental, but we’ll integrate them properly in the future).

try {
  MetadataRetriever.Builder(context, mediaItem).build().use {
     val trackInfo = it.retrieveTrackGroups().await()
     val duration = it.retrieveDurationUs().await()
  }
} catch (e: IOException) {
  handleFailure(e)
}

Partial downloads, XR audio routing and more efficient playback

There were several other improvements and bug fixes across ExoPlayer and playback related components. To name just a few:

    • Downloader implementations now support partial downloads, with a new PreCacheHelper to organize manual caching of single items. This will be integrated into ExoPlayer’s DefaultPreloadManager in Media3 1.9.0 for an even more seamless caching and preloading experience.
    • When created with a Context with a virtual device ID, ExoPlayer now automatically routes the audio to the virtual XR device for that ID.
    • We enabled more efficient interactions with Android’s MediaCodec, for example skipping buffers that are not needed earlier in the pipeline.

Playback resumption in demo app and better notification defaults

The MediaSession module has a few changes and improvements for notification handling. It now keeps notifications around for longer by default, for example when playback is paused, stopped, or failed, so that users have more time to resume playback in your app. Notifications for live streams (in particular with DVR windows) also became more useful: the confusing DVR window duration and progress have been removed from the notification.

The media session demo app now also supports playback resumption to showcase how the feature can be integrated into your app! It allows the user to resume playback long after your app has been terminated and even after reboot.


Media resumption notification after device reboot

Faster trim operations with edit list support

We are continuing to add optimizations for faster trim operations to Transformer APIs. In the new 1.8.0 release, we introduced support for trimming using MP4 edit lists. Call experimentalSetMp4EditListTrimEnabled(true) to make trim-only edits significantly faster.

val transformer = Transformer.Builder(requireContext())
        .addListener(transformerListener)
        .experimentalSetMp4EditListTrimEnabled(true)
        .build()

A standard trimming operation often requires a full re-transcode of the video, even for a simple trim. This meant decoding and re-encoding the entire file, which is a time-consuming and resource-intensive process. With MP4 edit list support, Transformer can now perform trim-only edits much more efficiently. Instead of re-encoding, it leverages the existing encoded samples and defines a “pre-roll” within the edit list. This pre-roll essentially tells the player where to start playback within an existing encoded sample, effectively skipping the unwanted beginning portion.
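
For context, a trim-only edit is typically expressed through a MediaItem ClippingConfiguration. Here is a sketch of how such an export might look with the optimization enabled; the URI, trim window, and output path names are illustrative:

// Export only the 2s–7s window of the source. With MP4 edit list trimming
// enabled, Transformer can transmux instead of re-encoding the video.
val trimmedItem = MediaItem.Builder()
    .setUri(inputUri)
    .setClippingConfiguration(
        MediaItem.ClippingConfiguration.Builder()
            .setStartPositionMs(2_000)
            .setEndPositionMs(7_000)
            .build()
    )
    .build()

val transformer = Transformer.Builder(context)
    .experimentalSetMp4EditListTrimEnabled(true)
    .build()
transformer.start(EditedMediaItem.Builder(trimmedItem).build(), outputFilePath)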

The following diagram illustrates how this works:


Processing overview for faster trim optimizations

As illustrated above, each file contains encoded samples and each sample begins with a keyframe. The red line indicates the intended clip point in the original file, allowing us to safely discard the first two samples. The major difference in this approach lies in how we handle the third encoded sample. Instead of running a transcoding operation, we transmux this sample and define a pre-roll for the video start position. This significantly accelerates the export operation; however, this optimization is only applicable if no other effects are applied. Player implementations may also ignore the pre-roll component of the final video and play from the start of the encoded sample.

Chipset specific optimizations with CodecDbLite

CodecDB Lite optimizes two elements of encoder configuration on a chipset-by-chipset basis: codec selection and B-frames. Depending on the chipset, these parameters can have either a positive or an adverse impact on video quality. CodecDB Lite leverages benchmark data collected on production devices to recommend a configuration that achieves the maximum user-perceived quality for the developer’s target bitrate. By enabling CodecDB Lite, developers can leverage advanced video codecs and features without worrying about whether or not they work on a given device.

To use CodecDbLite, simply call setEnableCodecDbLite(true) when building the encoder factory:

val transformer =
    Transformer.Builder()
        .setEncoderFactory(
            DefaultEncoderFactory.Builder()
                .setEnableCodecDbLite(true)
                .build()
        )
        .build()

New Composition demo

The Composition Demo app has been refreshed, and is now built entirely with Kotlin and Compose to showcase advanced multi-asset editing capabilities in Media3. Our team is actively extending the APIs, and future releases will introduce more advanced editing features, such as transitions between media items and other more advanced video compositing settings.

Adaptive-first: Editing flows can get complicated, so it helps to take advantage of as much screen real estate as possible. With the adaptive layouts provided by Jetpack Compose, such as the supporting pane layout, we can dynamically adapt the UI based on the device’s screen size.

New Composition demo app

Multi-asset video compositor: We’ve added a custom video compositor that demonstrates how to arrange input media items into different layouts, such as a 2×2 grid or a picture-in-picture overlay. These compositor settings are applied to the Composition, and can be used both with CompositionPlayer for preview and Transformer for export.


Picture-in-picture video overlay in the Composition demo app

Get started with Media3 1.8.0

Please get in touch via the Media3 issue Tracker if you run into any bugs, or if you have questions or feature requests. We look forward to hearing from you!

What is HDR?


Posted by John Reck – Software Engineer

For Android developers, delivering exceptional visual experiences is a continuous goal. High Dynamic Range (HDR) unlocks new possibilities, offering the potential for more vibrant and immersive content. Technologies like UltraHDR on Android are particularly compelling, providing the benefits of HDR displays while maintaining crucial backwards compatibility with SDR displays. On Android you can use HDR for both video and images.

Over the years, the term HDR has been used to signify a number of related, but ultimately distinct, visual fidelity features. Users encounter it in the context of camera features (exposure fusion), or as a marketing term for TVs and monitors (“HDR capable”). This conflates distinct features like wider color gamuts, increased bit depth, or enhanced contrast with HDR itself.

From an Android Graphics perspective, HDR primarily signifies higher peak brightness capability that extends beyond the conventional Standard Dynamic Range. Other perceived benefits often derive from standards such as HDR10 or Dolby Vision which also include the usage of wider color spaces, higher bit depths, and specific transfer functions.

In this article, we’ll establish the foundational color principles, then address common myths, clarify HDR’s role in the rendering pipeline, and examine how Android’s display technologies and APIs enable HDR experience.

The components of color

Understanding HDR begins with defining the three primary components that form the displayed volume of color: bit depth, transfer function, and color gamut. These describe the precision, scaling, and range of the color volume, respectively.

While a color model defines the format for encoding pixel values (e.g., RGB, YUV, HSL, CMYK, XYZ), RGB is typically assumed in a graphics context. The combination of a color model, a color gamut, and a transfer function constitutes color space. Examples include sRGB, Display P3, Adobe RGB, BT.2020, or BT.2020 HLG. Numerous combinations of color gamut and transfer function are possible, leading to a variety of color spaces.


Components of color

Bit Depth

Bit depth defines the precision of color representation. A higher bit depth allows for finer gradation between color values. In modern graphics, bit depth typically refers to bits per channel (e.g., an 8-bit image uses 8 bits for each red, green, blue, and optionally alpha channel).

Crucially, bit depth does not determine the overall range of colors (minimum and maximum values) an image can represent; this is set by the color gamut and, in HDR, the transfer function. Instead, increasing bit depth provides more discrete steps within that defined range, resulting in smoother transitions and reduced visual artifacts such as banding in gradients.

5-bit vs. 8-bit gradient: the 5-bit gradient shows distinct, visible steps between color values, while the 8-bit gradient transitions smoothly.
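
To put numbers on the step counts, each extra bit doubles the number of representable values per channel (2^bits):

// Discrete values a channel can encode at a given bit depth: 2^bits.
fun stepsPerChannel(bits: Int): Int = 1 shl bits

fun main() {
    println(stepsPerChannel(5))  // 32   values – coarse, visible banding
    println(stepsPerChannel(8))  // 256  values – the common default
    println(stepsPerChannel(10)) // 1024 values – 4x finer than 8-bit
}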

Although 8-bit is one of the most common formats in widespread usage, it’s not the only option. RAW images can be captured at 10, 12, 14, or 16 bits. PNG supports 16 bits. Games frequently use 16-bit floating point (FP16) instead of integer formats for intermediate render buffers. Modern GPU APIs like Vulkan even support 64-bit RGBA formats in both integer and floating point varieties, providing up to 256 bits per pixel.

Transfer Function

A transfer function defines the mathematical relationship between a pixel’s stored numerical value and its final displayed luminance or color. In other words, the transfer function describes how to interpret the increments in values between the minimum and maximum. This function is essential because the human visual system’s response to light intensity is non-linear: we are more sensitive to changes in luminance at low light levels than at high light levels. A linear mapping from stored values to display luminance would therefore not use the available bits efficiently; there would be more precision than necessary in the brighter region and too little in the darker region, relative to what is perceptible. The transfer function compensates for this non-linearity by adjusting the luminance values to match the human visual response.

While some transfer functions are linear, most employ complex curves or piecewise functions to optimize image quality for specific displays or viewing conditions. sRGB, Gamma 2.2, HLG, and PQ are common examples, each prioritizing bit allocation differently across the luminance range.
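
As a concrete instance of the non-linearity, a pure gamma 2.2 curve maps a stored value v in [0, 1] to a relative luminance of v^2.2, so stored mid-gray lands well below half of peak luminance:

import kotlin.math.pow

// Pure gamma 2.2: stored value (0..1) -> relative displayed luminance (0..1).
fun gamma22ToLinear(stored: Double): Double = stored.pow(2.2)

fun main() {
    println(gamma22ToLinear(0.5)) // ~0.218: mid-gray displays at ~22% of peak
    println(gamma22ToLinear(0.1)) // ~0.006: many code values cover the darks
}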

Color Gamut

Color gamut refers to the entire range of colors that a particular color space or device can accurately reproduce. It is typically a subset of the visible color spectrum, which encompasses all the colors that the human eye can perceive. Each color space (e.g., sRGB, Display P3, BT2020) defines its own unique gamut, establishing the boundaries for color representation.

A wider gamut signifies that the color space can display a greater variety of colors, leading to richer and more vibrant images. However, simply having a larger gamut doesn’t always guarantee better color accuracy or a more vibrant result. The device or medium used to display the colors must also be capable of reproducing the full range of the gamut. When a display encounters colors outside its reproducible gamut, the typical handling method is clipping. This is to ensure that in-gamut colors are properly preserved for accuracy, as otherwise attempts to scale the color gamut may produce unpleasant results, particularly in regions in which human vision is particularly sensitive like skin tones.

HDR myths and realities

With the basic color principles established, it’s now time to evaluate some of the common claims about HDR and how they apply in a general graphics context.

Claim: HDR offers more vibrant colors

This claim comes from HDR video typically using the BT2020 color space, which is indeed a wide color volume. However, there are several problems with this claim as a blanket statement.

The first is that images and graphics have been able to use wider color gamuts, such as Display P3 or Adobe RGB, for quite a long time now. This is not a unique advancement coupled to HDR. In JPEGs, for example, this is defined by the ICC profile, which dates back to the early 1990s, although widespread adoption of ICC profile handling is somewhat more recent. Similarly, on the graphics rendering side, the usage of wider color spaces is fully decoupled from whether or not HDR is being used.

The second is that not all HDR videos even use such a wide gamut. Although HDR10 specifies the usage of BT2020, other HDR formats have since been created that do not.

The biggest issue, though, is one of capturing and displaying. Just because the format allows for the color gamut of BT2020 does not mean that the entire gamut is actually usable in practice. For example current Dolby Vision mastering guidelines only require a 99% coverage of the P3 gamut. This means that even for high-end professional content, it’s not expected that the authoring of content beyond that of Display P3 is possible. Similarly, the vast majority of consumer displays today are only capable of displaying either sRGB or Display P3 color gamuts. Given that the typical recommendation of out-of-gamut colors is to clip them, this means that even though HDR10 allows for up to BT2020 gamut, the widest gamut in practice is still going to be P3.

Thus this claim should really be considered something offered by HDR video profiles when compared to SDR video profiles specifically, although SDR videos could use wider gamuts if desired without using an HDR profile.

Claim: HDR offers more contrast / better black detail

One of the benefits of HDR sometimes claimed is dark blacks (e.g. Dolby Vision Demo #3 – Core Universe – 4K HDR or “Dark scenes come alive with darker darks” ) or more detail in the dark regions. This is even reflected in BT.2390: “HDR also allows for lower black levels than traditional SDR, which was typically in the range between 0.1 and 1.0 cd/m2 for cathode ray tubes (CRTs) and is now in the range of 0.1 cd/m2 for most standard SDR liquid crystal displays (LCDs).” However, in reality no display attempts to show anything but SDR black as the blackest black the display is physically capable of. Thus there is no difference between HDR or SDR in terms of how dark it can reach – both bottom out at the same dark level on the same display.

As for contrast ratio, as that is the ratio between the brightest white and the darkest black, it is overwhelmingly influenced by how dark a display can get. With the prevalence of OLED displays, particularly in the mobile space, both SDR and HDR have the same contrast ratio as a result, as they both have essentially perfect black levels giving them infinite contrast ratios.

The PQ transfer function does allocate more bits to the dark region, so in theory it can convey better black detail. However, this is a unique aspect of PQ rather than a feature of HDR. HLG is increasingly the more common HDR format as it is preferred by mobile cameras as well as several high end cameras. And while PQ may contain this detail, that doesn’t mean the HDR display can necessarily display it anyway, as discussed in Display Realities.

Claim: HDR offers higher bit depth

This claim comes from HDR10 and some, but not all, Dolby Vision profiles using 10 or 12-bits for the video stream. Similar to more vibrant colors, this is really just an aspect of particular video profiles rather than something HDR itself inherently provides or is coupled to HDR. The usage of 10-bits or more is otherwise not uncommon in imaging, particularly in the higher end photography world, with RAW and TIFF image formats capable of having 10, 12, 14, or 16-bits. Similarly, PNG supports 16-bits, although that is rarely used.

Claim: HDR offers higher peak brightness

This then, is all that HDR really is. But what does “higher peak brightness” really mean? After all, SDR displays have been pushing ever increasing brightness levels before HDR was significant, particularly for sunlight viewing. And even without that, what is the difference between “HDR” and just “SDR with the brightness slider cranked up”? The answer is that we define “HDR” as having a brightness range bigger than SDR, and we think of SDR as being the range driven by autobrightness to be comfortably readable in the current ambient conditions. Thus we define HDR in terms of things like “HDR headroom” or “HDR/SDR ratio” to indicate it’s a floating region relative to SDR. This makes brightness policies easier to reason about. However, it does complicate the interaction with traditional HDR such as that used in video, specifically HLG and PQ content.

PQ/HLG transfer functions

PQ and HLG represent the two most common approaches to HDR in video content. They are two transfer functions that embody different concepts of what “HDR” is. PQ, published as SMPTE ST 2084:2014, is defined in terms of absolute nits on the display: it encodes from 0 to 10,000 nits and expects to be mastered for a particular reference viewing environment. HLG takes a different approach, instead opting for a typical gamma curve for part of the range before switching to logarithmic for the brighter portion. This has a claimed nominal peak brightness of 1000 nits in the reference environment, although it is not defined in absolute luminance terms like PQ is.

Industry-wide specifications have recently formalized the brightness range of both PQ- and HLG-encoded content in relation to SDR. ITU-R BT.2408-8 defines the reference white level for graphics to be 203 nits. ISO/TS 22028-5 and ISO/PRF 21496-1 have followed suit; 21496-1 in particular defines HDR headroom in terms of nominal peak luminance, relative to a diffuse white luminance of 203 nits.
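
One common way to express that headroom is in photographic stops above the 203-nit diffuse white, which makes the “floating region relative to SDR” idea easy to compute:

import kotlin.math.log2

// HDR headroom in stops, relative to the 203-nit diffuse white reference.
fun headroomStops(peakNits: Double, diffuseWhiteNits: Double = 203.0): Double =
    log2(peakNits / diffuseWhiteNits)

fun main() {
    println(headroomStops(203.0))  // 0.0 – no headroom, pure SDR
    println(headroomStops(812.0))  // 2.0 – two stops of headroom
    println(headroomStops(1624.0)) // 3.0 – three stops of headroom
}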

The realities of modern displays, discussed below, as well as typical viewing environments, mean that traditional HDR video is almost never displayed as intended. A display’s HDR headroom may evaporate under bright viewing conditions, demanding on-demand tonemapping into SDR. Traditional HDR video encodes a fixed headroom, while modern displays employ a dynamic headroom, resulting in vast differences in video quality even on the same display.

Display Realities

So far most of the discussion around HDR has been from the perspective of the content. However, users consume content on a display, which has its own capabilities and more importantly limits. A high-end mobile display is likely to have characteristics such as gamma 2.2, P3 gamut, and a peak brightness of around 2000 nits. If we then consider something like HDR10 there are mismatches in bit usage prioritization:

    • PQ’s increased bit allocation at the lower ranges ends up being wasted
    • The usage of BT2020 ends up spending bits on parts of a gamut that will never be displayed
    • Encoding up to 10,000 nits of brightness is similarly headroom that’s not utilized

These mismatches are not inherently a problem, but they mean that as 10-bit displays become more common, the existing 10-bit HDR video profiles are unable to take full advantage of the display’s capabilities. Thus HDR video profiles are in the position of simultaneously being forward looking while already being unable to maximize a current 10-bit display’s capabilities. This is where technology such as Ultra HDR, or gainmaps in general, provides a compelling alternative. Despite sometimes using an 8-bit base image, because the gain layer that transforms it to HDR is specialized to the content and its particular range needs, it is more efficient with its bit usage, leading to results that still look stunning. And as that base image is upgraded to 10-bit with newer image formats such as AVIF, the effective bit usage is even better than that of typical HDR video codecs. Thus these approaches do not represent evolutionary stepping stones to “true HDR”, but rather are an improvement on HDR, in addition to having better backwards compatibility. Similarly, the Android UI toolkit’s usage of the extendedRangeBrightness API still primarily happens in 8-bit space. Because the rendering is tailored to the specific display and current conditions, it is still possible to have a good HDR experience despite the usage of RGBA_8888.

Unlocking HDR on Android: Next steps

High Dynamic Range (HDR) offers advancement in visual fidelity for Android developers, moving beyond the traditional constraints of Standard Dynamic Range (SDR) by enabling higher peak brightness.

By understanding the core components of color – bit depth, transfer function, and color gamut – and debunking common myths, developers can leverage technologies like Ultra HDR to deliver truly immersive experiences that are both visually stunning and backward compatible.

In our next article, we’ll delve into the nuances of HDR and user intent, exploring how to optimize your content for diverse display capabilities and viewing environments.

Building delightful Android camera and media experiences


Posted by Donovan McMurray, Mayuri Khinvasara Khabya, Mozart Louis, and Nevin Mital – Developer Relations Engineers

Hello Android Developers!

We are the Android Developer Relations Camera & Media team, and we’re excited to bring you something a little different today. Over the past several months, we’ve been hard at work writing sample code and building demos that showcase how to take advantage of all the great potential Android offers for building delightful user experiences.

Some of these efforts are available for you to explore now, and some you’ll see later throughout the year, but for this blog post we thought we’d share some of the learnings we gathered while going through this exercise.

Grab your favorite Android plush or rubber duck, and read on to see what we’ve been up to!

Future-proof your app with Jetpack

Nevin Mital

One of our focuses for the past several years has been improving the developer tools available for video editing on Android. This led to the creation of the Jetpack Media3 Transformer APIs, which offer solutions for both single-asset and multi-asset video editing preview and export. Today, I’d like to focus on the Composition demo app, a sample app that showcases some of the multi-asset editing experiences that Transformer enables.

I started by adding a custom video compositor to demonstrate how you can arrange input video sequences into different layouts for your final composition, such as a 2×2 grid or a picture-in-picture overlay. You can customize this by implementing a VideoCompositorSettings and overriding the getOverlaySettings method. This object can then be set when building your Composition with setVideoCompositorSettings.

Here is an example for the 2×2 grid layout:

object : VideoCompositorSettings {
  ...

  override fun getOverlaySettings(inputId: Int, presentationTimeUs: Long): OverlaySettings {
    return when (inputId) {
      0 -> { // First sequence is placed in the top left
        StaticOverlaySettings.Builder()
          .setScale(0.5f, 0.5f)
          .setOverlayFrameAnchor(0f, 0f) // Middle of overlay
          .setBackgroundFrameAnchor(-0.5f, 0.5f) // Top-left section of background
          .build()
      }

      1 -> { // Second sequence is placed in the top right
        StaticOverlaySettings.Builder()
          .setScale(0.5f, 0.5f)
          .setOverlayFrameAnchor(0f, 0f) // Middle of overlay
          .setBackgroundFrameAnchor(0.5f, 0.5f) // Top-right section of background
          .build()
      }

      2 -> { // Third sequence is placed in the bottom left
        StaticOverlaySettings.Builder()
          .setScale(0.5f, 0.5f)
          .setOverlayFrameAnchor(0f, 0f) // Middle of overlay
          .setBackgroundFrameAnchor(-0.5f, -0.5f) // Bottom-left section of background
          .build()
      }

      3 -> { // Fourth sequence is placed in the bottom right
        StaticOverlaySettings.Builder()
          .setScale(0.5f, 0.5f)
          .setOverlayFrameAnchor(0f, 0f) // Middle of overlay
          .setBackgroundFrameAnchor(0.5f, -0.5f) // Bottom-right section of background
          .build()
      }

      else -> {
        StaticOverlaySettings.Builder().build()
      }
    }
  }
}

Since getOverlaySettings also provides a presentation time, we can even animate the layout, such as in this picture-in-picture example:

moving image of picture in picture on a mobile device

Next, I spent some time migrating the Composition demo app to use Jetpack Compose. With complicated editing flows, it can help to take advantage of as much screen space as is available, so I decided to use the supporting pane adaptive layout. This way, the user can fine-tune their video creation on the preview screen, and export options are only shown at the same time on a larger display. Below, you can see how the UI dynamically adapts to the screen size on a foldable device, when switching from the outer screen to the inner screen and vice versa.


moving image of supporting pane adaptive layout

What’s great is that by using Jetpack Media3 and Jetpack Compose, these features also carry over seamlessly to other devices and form factors, such as the new Android XR platform. Right out-of-the-box, I was able to run the demo app in Home Space with the 2D UI I already had. And with some small updates, I was even able to adapt the UI specifically for XR with features such as multiple panels, and to take further advantage of the extra space, an Orbiter with playback controls for the editing preview.

moving image of sequential composition preview in Android XR

Orbiter(
  position = OrbiterEdge.Bottom,
  offset = EdgeOffset.inner(offset = MaterialTheme.spacing.standard),
  alignment = Alignment.CenterHorizontally,
  shape = SpatialRoundedCornerShape(CornerSize(28.dp))
) {
  Row (horizontalArrangement = Arrangement.spacedBy(MaterialTheme.spacing.mini)) {
    // Playback control for rewinding by 10 seconds
    FilledTonalIconButton({ viewModel.seekBack(10_000L) }) {
      Icon(
        painter = painterResource(id = R.drawable.rewind_10),
        contentDescription = "Rewind by 10 seconds"
      )
    }
    // Playback control for play/pause
    FilledTonalIconButton({ viewModel.togglePlay() }) {
      Icon(
        painter = painterResource(id = R.drawable.rounded_play_pause_24),
        contentDescription = 
            if(viewModel.compositionPlayer.isPlaying) {
                "Pause preview playback"
            } else {
                "Resume preview playback"
            }
      )
    }
    // Playback control for forwarding by 10 seconds
    FilledTonalIconButton({ viewModel.seekForward(10_000L) }) {
      Icon(
        painter = painterResource(id = R.drawable.forward_10),
        contentDescription = "Forward by 10 seconds"
      )
    }
  }
}

Jetpack libraries unlock premium functionality incrementally

Donovan McMurray

Not only do our Jetpack libraries have you covered by working consistently across existing and future devices, but they also open the doors to advanced functionality and custom behaviors to support all types of app experiences. In a nutshell, our Jetpack libraries aim to make the common case very accessible and easy, with hooks for adding more custom features later.

We’ve worked with many teams that have switched to a Jetpack library, built the basics, added their critical custom features, and actually saved developer time over their estimates. Let’s take a look at CameraX and how this incremental development can supercharge your process.

// Set up CameraX app with preview and image capture.
// Note: setting the resolution selector is optional, and if not set,
// then a default 4:3 ratio will be used.
val aspectRatioStrategy = AspectRatioStrategy(
  AspectRatio.RATIO_16_9, AspectRatioStrategy.FALLBACK_RULE_NONE)
val resolutionSelector = ResolutionSelector.Builder()
  .setAspectRatioStrategy(aspectRatioStrategy)
  .build()

private val previewUseCase = Preview.Builder()
  .setResolutionSelector(resolutionSelector)
  .build()
private val imageCaptureUseCase = ImageCapture.Builder()
  .setResolutionSelector(resolutionSelector)
  .setCaptureMode(ImageCapture.CAPTURE_MODE_MINIMIZE_LATENCY)
  .build()

val useCaseGroupBuilder = UseCaseGroup.Builder()
  .addUseCase(previewUseCase)
  .addUseCase(imageCaptureUseCase)

cameraProvider.unbindAll()

camera = cameraProvider.bindToLifecycle(
  this,  // lifecycleOwner
  CameraSelector.DEFAULT_BACK_CAMERA,
  useCaseGroupBuilder.build(),
)

After setting up the basic structure for CameraX, you can create a simple UI with a camera preview and a shutter button. You can use the CameraXViewfinder composable, which displays a Preview stream from a CameraX SurfaceRequest.

// Create preview
Box(
  Modifier
    .background(Color.Black)
    .fillMaxSize(),
  contentAlignment = Alignment.Center,
) {
  surfaceRequest?.let { request ->
    CameraXViewfinder(
      modifier = Modifier.fillMaxSize(),
      implementationMode = ImplementationMode.EXTERNAL,
      surfaceRequest = request,
    )
  }
  Button(
    onClick = onPhotoCapture,
    shape = CircleShape,
    colors = ButtonDefaults.buttonColors(containerColor = Color.White),
    modifier = Modifier
      .height(75.dp)
      .width(75.dp),
  )
}

fun onPhotoCapture() {
  // Not shown: defining the ImageCapture.OutputFileOptions for
  // your saved images
  imageCaptureUseCase.takePicture(
    outputOptions,
    ContextCompat.getMainExecutor(context),
    object : ImageCapture.OnImageSavedCallback {
      override fun onError(exc: ImageCaptureException) {
        val msg = "Photo capture failed."
        Toast.makeText(context, msg, Toast.LENGTH_SHORT).show()
      }

      override fun onImageSaved(output: ImageCapture.OutputFileResults) {
        val savedUri = output.savedUri
        if (savedUri != null) {
          // Do something with the savedUri if needed
        } else {
          val msg = "Photo capture failed."
          Toast.makeText(context, msg, Toast.LENGTH_SHORT).show()
        }
      }
    },
  )
}

You’re already on track for a solid camera experience, but what if you wanted to add some extra features for your users? Adding filters and effects is easy with CameraX’s Media3 effect integration, which is one of the new features introduced in CameraX 1.4.0.

Here’s how simple it is to add a black and white filter from Media3’s built-in effects.

val media3Effect = Media3Effect(
  application,
  PREVIEW or IMAGE_CAPTURE,
  ContextCompat.getMainExecutor(application),
  {},
)
media3Effect.setEffects(listOf(RgbFilter.createGrayscaleFilter()))
useCaseGroupBuilder.addEffect(media3Effect)

The Media3Effect object takes a Context, a bitwise representation of the use case constants for targeted UseCases, an Executor, and an error listener. Then you set the list of effects you want to apply. Finally, you add the effect to the useCaseGroupBuilder we defined earlier.

moving image of the camera app before and after the grayscale filter is applied

(Left) Our camera app with no filter applied. (Right) Our camera app after the createGrayscaleFilter was added.

There are many other built-in effects you can add, too! See the Media3 Effect documentation for more options, like brightness, color lookup tables (LUTs), contrast, blur, and many other effects.
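For instance, several built-in effects can be chained in a single setEffects call; the parameter values below are purely illustrative:

media3Effect.setEffects(
  listOf(
    Brightness(0.2f),                   // brighten the frame slightly
    Contrast(0.3f),                     // then boost contrast
    RgbFilter.createGrayscaleFilter(),  // finally convert to grayscale
  )
)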

To take your effects to yet another level, it’s also possible to define your own effects by implementing the GlEffect interface, which acts as a factory of GlShaderPrograms. You can override BaseGlShaderProgram’s drawFrame() method to build a custom effect of your own. A minimal implementation should tell your graphics library to use its shader program, bind the shader program’s vertex attributes and uniforms, and issue a drawing command.
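To make that concrete, here is a minimal sketch of a color-inverting effect. It assumes the GlProgram and GlUtil helpers from media3-common, and the class names and shader strings are illustrative rather than taken from a sample app:

// A sketch of a custom effect that inverts the frame's colors.
class InvertEffect : GlEffect {
  override fun toGlShaderProgram(context: Context, useHdr: Boolean): GlShaderProgram =
    InvertShaderProgram(useHdr)
}

private class InvertShaderProgram(useHdr: Boolean) :
  BaseGlShaderProgram(useHdr, /* texturePoolCapacity= */ 1) {

  private val glProgram =
    GlProgram(VERTEX_SHADER, FRAGMENT_SHADER).apply {
      // A quad covering the full frame to draw the output onto.
      setBufferAttribute(
        "aFramePosition",
        GlUtil.getNormalizedCoordinateBounds(),
        GlUtil.HOMOGENEOUS_COORDINATE_VECTOR_SIZE,
      )
    }

  // The output frame keeps the input frame's size.
  override fun configure(inputWidth: Int, inputHeight: Int): Size =
    Size(inputWidth, inputHeight)

  override fun drawFrame(inputTexId: Int, presentationTimeUs: Long) {
    // Use the shader program, bind its attributes and uniforms, then draw.
    glProgram.use()
    glProgram.setSamplerTexIdUniform("uTexSampler", inputTexId, /* texUnitIndex= */ 0)
    glProgram.bindAttributesAndUniforms()
    GLES20.glDrawArrays(GLES20.GL_TRIANGLE_STRIP, /* first= */ 0, /* count= */ 4)
  }

  private companion object {
    const val VERTEX_SHADER = """
      attribute vec4 aFramePosition;
      varying vec2 vTexCoords;
      void main() {
        gl_Position = aFramePosition;
        vTexCoords = aFramePosition.xy * 0.5 + 0.5;
      }
    """

    const val FRAGMENT_SHADER = """
      precision mediump float;
      uniform sampler2D uTexSampler;
      varying vec2 vTexCoords;
      void main() {
        vec4 color = texture2D(uTexSampler, vTexCoords);
        gl_FragColor = vec4(1.0 - color.rgb, color.a);
      }
    """
  }
}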

Jetpack libraries meet you where you are and scale with your app’s needs. Whether you want a simple, fast-to-implement, and reliable implementation, or custom functionality that helps the critical user journeys in your app stand out from the rest, Jetpack has you covered!

Jetpack offers a foundation for innovative AI Features

Mayuri Khinvasara Khabya

Just as Donovan demonstrated with CameraX for capture, Jetpack Media3 provides a reliable, customizable, and feature-rich solution for playback with ExoPlayer. The AI Samples app builds on this foundation to delight users with helpful and enriching AI-driven additions.

In today’s rapidly evolving digital landscape, users expect more from their media applications. Simply playing videos is no longer enough. Developers are constantly seeking ways to enhance user experiences and provide deeper engagement. Leveraging the power of Artificial Intelligence (AI), particularly when built upon robust media frameworks like Media3, offers exciting opportunities. Let’s take a look at some of the ways we can transform the way users interact with video content:

    • Empowering Video Understanding: The core idea is to use AI, specifically multimodal models like the Gemini Flash and Pro models, to analyze video content and extract meaningful information. This goes beyond simply playing a video; it’s about understanding what’s in the video and making that information readily accessible to the user.
    • Actionable Insights: The goal is to transform raw video into summaries, insights, and interactive experiences. This allows users to quickly grasp the content of a video and find specific information they need or learn something new!
    • Accessibility and Engagement: AI helps make videos more accessible by providing features like summaries, translations, and descriptions. It also aims to increase user engagement through interactive features.

A Glimpse into AI-Powered Video Journeys

The following example demonstrates potential video journeys enhanced by artificial intelligence. This sample integrates several components, such as ExoPlayer and Transformer from Media3; the Firebase SDK (leveraging Vertex AI on Android); and Jetpack Compose, ViewModel, and StateFlow. The code will be available soon on GitHub.

moving images of examples of AI-powered video journeys

(Left) Video summarization. (Right) Thumbnail timestamps and HDR frame extraction.

There are two experiences in particular that I’d like to highlight:

    • HDR Thumbnails: AI can help identify key moments in the video that could make for good thumbnails. With those timestamps, you can use the new ExperimentalFrameExtractor API from Media3 to extract HDR thumbnails from videos, providing richer visual previews (see the sketch after this list).
    • Text-to-Speech: AI can be used to convert textual information derived from the video into spoken audio, enhancing accessibility. On Android, you can also choose to play audio in different languages and dialects, enhancing personalization for a wider audience.
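Here is a rough sketch of the HDR thumbnail flow. The extractHdrThumbnail helper is hypothetical, and ExperimentalFrameExtractor is, as the name says, an experimental media3-transformer API whose shape may change between releases:

// A sketch only: extract an HDR frame at a timestamp suggested by the model.
suspend fun extractHdrThumbnail(
  context: Context,
  mediaItem: MediaItem,
  timestampMs: Long,
): Bitmap {
  val frameExtractor = ExperimentalFrameExtractor(
    context,
    ExperimentalFrameExtractor.Configuration.Builder()
      .setExtractHdrFrames(true)
      .build(),
  )
  frameExtractor.setMediaItem(mediaItem, /* effects= */ listOf())
  try {
    // await() from kotlinx-coroutines-guava suspends on the ListenableFuture
    // returned by getFrame() and yields a frame holding the decoded Bitmap.
    return frameExtractor.getFrame(timestampMs).await().bitmap
  } finally {
    frameExtractor.release()
  }
}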

Using the right AI solution

Currently, only cloud models support video inputs, so we went ahead with a cloud-based solution. Integrating Firebase in our sample empowers the app to:

    • Generate real-time, concise video summaries automatically.
    • Produce comprehensive content metadata, including chapter markers and relevant hashtags.
    • Facilitate seamless multilingual content translation.

So how do you actually interact with a video and work with Gemini to process it? First, send your video as an input parameter to your prompt:

val prompt =
    "Summarize this video in the form of top 3-4 takeaways only. Write in the form of bullet points. Don't assume if you don't know"

val generativeModel = Firebase.vertexAI.generativeModel("gemini-2.0-flash")
_outputText.value = OutputTextState.Loading

viewModelScope.launch(Dispatchers.IO) {
    try {
        val requestContent = content {
            fileData(videoSource.toString(), "video/mp4")
            text(prompt)
        }
        val outputStringBuilder = StringBuilder()

        // Emit partial results as they stream in so the UI updates live.
        generativeModel.generateContentStream(requestContent).collect { response ->
            outputStringBuilder.append(response.text)
            _outputText.value = OutputTextState.Success(outputStringBuilder.toString())
        }
    } catch (error: Exception) {
        _outputText.value = OutputTextState.Error(error.localizedMessage ?: "Unknown error")
    }
}

Notice there are two key components here:

    • FileData: This component integrates a video into the query.
    • Prompt: This tells the model what specific assistance the user needs in relation to the provided video.

Of course, you can fine-tune your prompt to your requirements and shape the responses accordingly.

In conclusion, by harnessing the capabilities of Jetpack Media3 and integrating AI solutions like Gemini through Firebase, you can significantly elevate video experiences on Android. This combination enables advanced features like video summaries, enriched metadata, and seamless multilingual translations, ultimately enhancing accessibility and engagement for users. As these technologies continue to evolve, the potential for creating even more dynamic and intelligent video applications is vast.

Go above-and-beyond with specialized APIs

Mozart Louis

Android 16 introduces the new audio PCM offload mode, which can reduce the power consumption of audio playback in your app, leading to longer playback time and increased user engagement. Eliminating battery anxiety greatly enhances the user experience.

Oboe is Android’s premier audio API, which developers can use to create high-performance, low-latency audio apps. Android 16 and the Android NDK add a new feature called native PCM offload playback.

Offload playback helps save battery life when playing audio. It works by sending a large chunk of audio to a special part of the device’s hardware (a DSP). This allows the CPU of the device to go into a low-power state while the DSP handles playing the sound. This works with uncompressed audio (like PCM) and compressed audio (like MP3 or AAC), where the DSP also takes care of decoding.

This can result in significant power savings while playing back audio and is perfect for applications that play audio in the background or while the screen is off (think audiobooks, podcasts, music, etc.).

We created the sample app PowerPlay to demonstrate how to implement these features using the latest NDK version, C++ and Jetpack Compose.

Here are the most important parts!

The first order of business is to ensure the device supports audio offload for the stream attributes you need. In the example below, we check whether the device supports audio offload for a stereo float PCM stream with a sample rate of 48,000 Hz.

val format = AudioFormat.Builder()
    .setEncoding(AudioFormat.ENCODING_PCM_FLOAT)
    .setSampleRate(48000)
    .setChannelMask(AudioFormat.CHANNEL_OUT_STEREO)
    .build()

val attributes = AudioAttributes.Builder()
    .setContentType(AudioAttributes.CONTENT_TYPE_MUSIC)
    .setUsage(AudioAttributes.USAGE_MEDIA)
    .build()

// isOffloadedPlaybackSupported is available from API level 29 (Android 10).
val isOffloadSupported =
    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.Q) {
        AudioManager.isOffloadedPlaybackSupported(format, attributes)
    } else {
        false
    }

// Pass the result to the native player, which selects the performance mode.
player.initializeAudio(isOffloadSupported)

Once we know the device supports audio offload, we can confidently set the Oboe audio stream’s performance mode to the new option, PerformanceMode::POWER_SAVING_OFFLOADED.

void Player::initializeAudio(bool isOffloadSupported) {
    // Create an audio stream.
    oboe::AudioStreamBuilder builder;
    builder.setChannelCount(mChannelCount);
    builder.setDataCallback(mDataCallback);
    builder.setFormat(oboe::AudioFormat::Float);
    builder.setSampleRate(48000);
    builder.setErrorCallback(mErrorCallback);
    builder.setPresentationCallback(mPresentationCallback);

    if (isOffloadSupported) {
        builder.setPerformanceMode(oboe::PerformanceMode::POWER_SAVING_OFFLOADED);
        builder.setFramesPerDataCallback(128);  // Set a low frame buffer amount.
    } else {
        builder.setPerformanceMode(oboe::PerformanceMode::LowLatency);
    }
    builder.setSharingMode(oboe::SharingMode::Exclusive);
    builder.setSampleRateConversionQuality(oboe::SampleRateConversionQuality::Medium);
    oboe::Result result = builder.openStream(mAudioStream);
}

Now when audio is played back, it will be offloaded to the DSP, saving power during playback.

There is more to this feature, which will be covered in a future blog post fully detailing all of the new APIs that help you optimize your audio playback experience!

What’s next

Of course, we were only able to share the tip of the iceberg with you here, so to dive deeper into the samples, check out the following links:

Hopefully these examples have inspired you to explore what new and fascinating experiences you can build on Android. Tune in to our session at Google I/O in a couple weeks to learn even more about use-cases supported by solutions like Jetpack CameraX and Jetpack Media3!
