A Ravnur Media Services (RMS) asset stores media files in an associated storage account. The Tracks API lets you upload subtitle and caption files to an asset and register them as text tracks. The RMS streaming endpoint communicates track information to players via DASH manifests and HLS playlists.
Overview
Text tracks are subtitle and caption files associated with a media asset. They provide text alternatives and translations of spoken audio content, synchronized with video playback. Using the Tracks API, you can add, update, and remove text tracks to support multi-language content and accessibility requirements.
Tracks API capabilities
- Get track information - retrieve all text tracks in an asset via Tracks - List.
- Add text tracks - upload the file to the output asset, then call Tracks - Create or Update to register it by file name.
- Remove text tracks - use Tracks - Delete to remove a track from the asset.
- Update track properties - modify a track's display name and HLS settings using Tracks - Update.
- Update track data - use Tracks - Update Track Data to sync changes to the streaming endpoint after manually editing the .ism manifest in storage.
- Edit track files - download the track file, make corrections or translations, and upload it back to the asset. See How to edit a text track in the Azure portal.
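As a rough sketch, the operations above could map to HTTP requests as shown below. The path layout, HTTP methods, and api-version value are assumptions modeled on the Azure Media Services REST API, and the base URL and account path are hypothetical placeholders; verify all of them against your RMS deployment.

```python
from urllib.parse import quote

# Hypothetical values: replace with the details of your own RMS deployment.
BASE_URL = "https://rms.example.com"
ACCOUNT_PATH = (
    "/subscriptions/SUBSCRIPTION_ID/resourceGroups/RESOURCE_GROUP"
    "/providers/Microsoft.Media/mediaServices/ACCOUNT_NAME"
)
API_VERSION = "2022-08-01"  # assumed API version

# Operation name -> (HTTP method, path suffix after the track name).
# A suffix of None means the operation targets the whole track collection.
TRACK_OPERATIONS = {
    "List": ("GET", None),
    "Create or Update": ("PUT", ""),
    "Update": ("PATCH", ""),
    "Delete": ("DELETE", ""),
    "Update Track Data": ("POST", "/updateTrackData"),
}


def track_request(operation, asset_name, track_name=None):
    """Return (method, url) for a Tracks API operation."""
    method, suffix = TRACK_OPERATIONS[operation]
    url = f"{BASE_URL}{ACCOUNT_PATH}/assets/{quote(asset_name, safe='')}/tracks"
    if suffix is not None:
        if not track_name:
            raise ValueError(f"{operation} requires a track name")
        url += f"/{quote(track_name, safe='')}{suffix}"
    return method, f"{url}?api-version={API_VERSION}"
```

For example, `track_request("List", "my-asset")` yields a GET request against the asset's tracks collection, while the other operations require a track name.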
Text tracks
Text tracks provide subtitles, closed captions, and other text-based content synchronized with your video.
Supported text track formats
- WebVTT (.vtt) - Web Video Text Tracks format, the standard for HTML5 video subtitles and captions. This format is most widely supported across different players.
- TTML (.ttml) - Timed Text Markup Language, commonly used in broadcast and streaming environments. Note: TTML files should follow the IMSC1 profile (e.g., include ttp:profile), otherwise playback compatibility may vary.
- SRT (.srt) - SubRip subtitle format, widely supported and easy to create.
Text track types
- Subtitles - transcriptions or translations of spoken dialogue for viewers who can hear the audio but prefer or need text, for example because they do not understand the spoken language.
- Closed captions - complete transcription of dialogue and sound effects for viewers who are deaf or hard of hearing.
- Descriptions - additional descriptive text providing context beyond spoken dialogue.
Generating text tracks with Microsoft AI Video Indexer
Microsoft AI Video Indexer can automatically generate captions using the AudioAnalyzer preset in a transform job. Note that the preset must be applied to the source asset, not the streaming output asset.
The generated VTT file is placed in the transform's output asset. To use it as a text track on your streaming asset:
- Run a transform job with the AudioAnalyzer preset on the source asset.
- Download the generated VTT file from the transform's output asset.
- Upload the VTT file to the streaming asset - the one containing the .ism manifest.
- Register it as a text track using Tracks - Create or Update.
Video Indexer transcribes the audio content, generates timestamps aligned with the speech, supports multiple languages, and produces WebVTT output.
Text track metadata
When adding text tracks, specify the following properties in the request body:
| Name | Required | Type | Description |
|---|---|---|---|
| properties.track[] | | | |
| @odata.type | True | string | Identifies the track type. Must be #Microsoft.Media.TextTrack. |
| fileName | True | string | The name of the file already uploaded to the asset's storage container. Cannot be changed after the track is created. |
| displayName | | string | Human-readable label shown to users (e.g., "English Subtitles", "Spanish Closed Captions"). |
| languageCode | | string | RFC5646 language code (e.g., en for English, es for Spanish, fr for French). Defaults to en-US if not specified. |
| hlsSettings | | HlsSettings | The HLS-specific setting for the text track, represented by the default parameter. |
| playerVisibility | | string | When set to "Visible", the text track is included in the DASH manifest or HLS playlist when requested by a client. When set to "Hidden", the track is not available to the client. Default is "Visible". |
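A minimal sketch of assembling a request body from these properties follows. The nesting under properties.track is inferred from the property names in the table, and the shape of hlsSettings (a single default flag) is an assumption based on its description; the helper name `build_text_track_body` is hypothetical.

```python
import json


def build_text_track_body(file_name, display_name=None, language_code=None,
                          hls_default=None, player_visibility=None):
    """Build a request body for Tracks - Create or Update.

    Optional properties are omitted when not supplied, so that the
    service-side defaults described in the table apply.
    """
    if not file_name:
        raise ValueError("fileName is required")
    if player_visibility not in (None, "Visible", "Hidden"):
        raise ValueError("playerVisibility must be 'Visible' or 'Hidden'")

    track = {
        "@odata.type": "#Microsoft.Media.TextTrack",
        "fileName": file_name,
    }
    if display_name is not None:
        track["displayName"] = display_name
    if language_code is not None:
        track["languageCode"] = language_code
    if hls_default is not None:
        # Assumed shape: HlsSettings carrying only the "default" flag.
        track["hlsSettings"] = {"default": bool(hls_default)}
    if player_visibility is not None:
        track["playerVisibility"] = player_visibility
    return {"properties": {"track": track}}


body = build_text_track_body(
    "movie_(en-US).vtt",
    display_name="English Subtitles",
    language_code="en-US",
)
print(json.dumps(body, indent=2))
```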
Managing text tracks
Adding text tracks
- Create or obtain a properly formatted subtitle or caption file (VTT, TTML, or SRT).
- Upload the file to the output asset via RMS Console (see below) or Azure Portal.
- Call Tracks - Create or Update to register it.
- Optionally, specify languageCode (en is the default) and displayName to name the track in the player selector.
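The registration step could be prepared with the Python standard library roughly as follows. This sketch assumes that Tracks - Create or Update is a PUT request with a JSON body and bearer-token authorization; the URL, the token, and the helper name `build_register_request` are placeholders, not confirmed RMS details. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_register_request(url, body, token):
    """Prepare (but do not send) a Tracks - Create or Update request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",  # assumed HTTP method for Create or Update
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


# The file must already be uploaded to the output asset before this call.
request = build_register_request(
    "https://rms.example.com/PLACEHOLDER/assets/my-asset/tracks/en",  # hypothetical URL
    {
        "properties": {
            "track": {
                "@odata.type": "#Microsoft.Media.TextTrack",
                "fileName": "movie_(en-US).vtt",
                "languageCode": "en-US",
                "displayName": "English Subtitles",
            }
        }
    },
    token="ACCESS_TOKEN",  # placeholder
)
```

Passing the prepared request to `urllib.request.urlopen` would send it.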
Editing text tracks
To edit the content of a track file, use the Azure portal. See How to edit a text track in the Azure portal.
Removing text tracks
Use the Tracks - Delete API call to delete unwanted text tracks from the asset, which also removes them from player manifests. Alternatively, remove them via the RMS Console or Azure Portal.
Uploading the track file via RMS Console
- In the Assets tab, select + Upload asset.
- Select your track file.
- Clear the generated Asset name and enter the name of the output asset you want to add the track to, exactly as it appears. The form does not confirm whether the asset already exists - if the name doesn't match, a new asset is created instead.
- Select Upload asset.
Important: Upload the file to the output asset, not the source asset. Track files added to a source asset are not available for playback.
Player integration
When a player requests content from the RMS streaming endpoint:
- HLS playlists - include #EXT-X-MEDIA tags describing available subtitle tracks.
- Player selection - video players parse the manifest and present available tracks to users through their interface.
- Runtime switching - users can switch between text tracks during playback without interrupting the video.
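To illustrate what a player does with the #EXT-X-MEDIA tags mentioned above, here is a small sketch that extracts subtitle renditions from an HLS playlist. The sample playlist is made up for illustration and is not actual RMS output; the attribute names follow the HLS specification.

```python
import re

# Matches KEY=VALUE pairs where VALUE is either quoted or a bare token.
_ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')
_TAG = "#EXT-X-MEDIA:"


def list_subtitle_tracks(playlist_text):
    """Return the attributes of every TYPE=SUBTITLES rendition in a playlist."""
    tracks = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if not line.startswith(_TAG):
            continue
        attrs = {key: value.strip('"')
                 for key, value in _ATTR_RE.findall(line[len(_TAG):])}
        if attrs.get("TYPE") == "SUBTITLES":
            tracks.append(attrs)
    return tracks


# Illustrative playlist only; real playlists from the endpoint will differ.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English Subtitles",LANGUAGE="en-US",DEFAULT=YES,AUTOSELECT=YES,URI="subs_en.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",NAME="English",LANGUAGE="en",URI="audio_en.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=1280000,SUBTITLES="subs",AUDIO="audio"
video.m3u8
"""

for track in list_subtitle_tracks(SAMPLE_PLAYLIST):
    print(track["NAME"], track["LANGUAGE"], track["DEFAULT"])
```

A check like this can also help when testing whether a newly registered track has reached the playlist.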
Accessibility considerations
Properly configured text tracks are essential for accessibility:
- WCAG compliance - captions and audio descriptions help meet Web Content Accessibility Guidelines.
- Legal requirements - many jurisdictions require captions for specific content types (broadcast, educational, government).
- Language attributes - correctly specified language codes enable assistive technologies to function properly.
- Default settings - consider setting accessibility tracks as the default for users who may benefit from them.
- Display names - track display names are passed to players via manifest metadata. Some players may show the language code instead of the display name. This is player-specific behavior and cannot be controlled through the API.
Best practices
- Provide multiple text track options - include both subtitles (for translation) and closed captions (for accessibility) when possible.
- Use standard language codes - always specify RFC5646 codes to ensure proper player behavior and user device compatibility.
- Create descriptive display names - use clear, user-friendly labels that help viewers understand each track's purpose. Note that some players may display the language code instead - this is player-specific behavior outside of API control.
- Set appropriate defaults - if your primary audience is English-speaking, set the English track as the default rather than having no track selected.
- Maintain track quality - ensure captions are accurate, synchronized, and properly formatted.
- Test player behavior - verify that tracks appear correctly in different players (web, mobile, smart TV) and that switching works smoothly.
- Organize track metadata consistently - use a consistent naming scheme for fileName (for example, assetname_(en-US).vtt) and a consistent labeling scheme for displayName, which is what surfaces to users in the player captions menu. Applying these conventions across all assets makes bulk management significantly easier.
- Leverage automated tools - use Microsoft AI Video Indexer with the AudioAnalyzer preset to generate initial captions, then refine them for accuracy.
- Plan for localization - structure your track management workflow to accommodate future language additions.
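One way to enforce a consistent naming scheme is to generate and parse file names with small helpers. The sketch below follows the assetname_(en-US).vtt pattern from the list above; the helper names and the display-name format are hypothetical conventions, not part of the API.

```python
import re

# Pattern for names such as "intro_(en-US).vtt".
_NAME_RE = re.compile(
    r"^(?P<asset>.+)_\((?P<lang>[A-Za-z0-9-]+)\)\.(?P<ext>vtt|ttml|srt)$"
)


def track_file_name(asset_name, language_code, extension="vtt"):
    """Build a track file name following the assetname_(lang).ext pattern."""
    return f"{asset_name}_({language_code}).{extension}"


def parse_track_file_name(file_name):
    """Split a conforming file name into its parts, or return None."""
    match = _NAME_RE.match(file_name)
    return match.groupdict() if match else None


def track_display_name(language_name, kind="Subtitles"):
    """Build a label such as "English Subtitles" for the player menu."""
    return f"{language_name} {kind}"


name = track_file_name("intro", "es")
print(name, parse_track_file_name(name), track_display_name("Spanish", "Closed Captions"))
```

Because the language code is recoverable from the file name, adding a language later only requires generating one more name per asset.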
Limitations and considerations
- Track addition timing - text tracks can only be added to assets that have been processed through a transform and contain a streaming manifest (.ism). Tracks cannot be added to unprocessed or raw assets.
- Format requirements - text tracks must be in supported formats (VTT, TTML, SRT).
- Manifest updates - track changes made through the API are reflected in the streaming manifest automatically. Allow up to 2 hours for CDN propagation after any changes. If you have manually edited the .ism manifest in storage, call Tracks - Update Track Data to sync the changes.
- Storage impact - each additional track increases storage costs; plan accordingly for large content libraries.