Internet-Draft media container March 2024
Zanaty, et al. Expires 5 September 2024
Workgroup:
Network Working Group
Internet-Draft:
draft-mzanaty-moq-loc-03
Published:
Intended Status:
Informational
Expires:
Authors:
M. Zanaty
Cisco
S. Nandakumar
Cisco
P. Thatcher
Microsoft

Low Overhead Media Container

Abstract

This specification describes a low-overhead media container format for encoded and encrypted audio and video media data, to be used primarily for interactive Media over QUIC transport (MOQ). It also defines the LOC Streaming Format for the MOQ Common Catalog format, which publishers use to announce and describe their LOC tracks and subscribers use to consume them.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 5 September 2024.

1. Introduction

This specification describes a low-overhead media container format for encoded and encrypted audio and video media data, to be used primarily for interactive Media over QUIC transport (MOQT) [MoQTransport]. It also defines the LOC Streaming Format for the MOQ Common Catalog format [MoQCatalog], which publishers use to announce and describe their LOC tracks and subscribers use to consume them.

"Low-overhead" refers to minimal extra encapsulation as well as minimal application overhead when interfacing with WebCodecs [WebCodecs].

The container format description is specified for all audio and video codecs defined in the WebCodecs Codec Registry [WEBCODECS-CODEC-REGISTRY]. The audio and video payload bitstream is identical to the "internal data" inside an EncodedAudioChunk and EncodedVideoChunk, respectively, specified in the registry.

In addition to the media payloads, critical metadata is also specified for audio and video payloads. (Note: Align with MOQT terminology of either "metadata" or "header".)

A primary motivation is to align with media formats used in WebCodecs to minimize extra encapsulation and application overhead when interfacing with WebCodecs. Other container formats like CMAF or RTP would require more application overhead for format conversions, as well as larger encapsulation overhead, which may burden some use cases such as low-bitrate audio.

This specification can also be used by applications outside the context of WebCodecs or a web browser. While the media payloads are defined by referring to the "internal data" of an EncodedAudioChunk or EncodedVideoChunk in the WebCodecs Codec Registry, this "internal data" is the elementary bitstream format of codecs without any encapsulation. Referring to the WebCodecs Codec Registry avoids duplicating it in an identical IANA registry.

1.1. Requirements Notation and Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2. Payload Format

The WebCodecs Codec Registry defines the contents of an EncodedAudioChunk and EncodedVideoChunk for the audio and video codec formats in the registry. The "internal data" in these chunks is used directly in this specification as the "LOC Payload" bitstream. This "internal data" is the elementary bitstream format of each codec without any encapsulation.

For video formats with multiple bitstream formats in the WebCodecs Registry, such as H.264/AVC or H.265/HEVC, the LOC Payload uses the "canonical" format ("avcc" or "hevc", not "annexB") with the following additions:

  • Parameter sets are sent in the bitstream before key frames.

  • 4 byte lengths are sent before each NAL Unit.

  • No start codes or emulation prevention are used in the bitstream.

  • No additional codec configuration information ("extradata") is needed.
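
As a rough illustration of the length-prefixed framing described above, the following Python sketch packs NAL units with 4-byte lengths and splits them back out. Big-endian byte order is an assumption carried over from common "avcc"-style framing; this document does not state it. The function names and the sample NAL bytes in the usage note are hypothetical.

```python
import struct


def pack_nal_units(nal_units):
    """Prefix each NAL unit with a 4-byte length (big-endian is assumed)."""
    out = bytearray()
    for nal in nal_units:
        out += struct.pack(">I", len(nal))
        out += nal
    return bytes(out)


def unpack_nal_units(payload):
    """Split a length-prefixed payload back into its NAL units."""
    nal_units = []
    offset = 0
    while offset < len(payload):
        if offset + 4 > len(payload):
            raise ValueError("truncated NAL length prefix")
        (length,) = struct.unpack_from(">I", payload, offset)
        offset += 4
        if offset + length > len(payload):
            raise ValueError("truncated NAL unit")
        nal_units.append(payload[offset:offset + length])
        offset += length
    return nal_units
```

For a key frame, a sender following the list above would pass the parameter sets first, for example `pack_nal_units([sps, pps, idr_slice])`, where the three byte strings are placeholders for real NAL units.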

2.1. MOQ Object Mapping

An application object when transported as a [MoQTransport] object is composed of a MOQ Object Header and its Payload. Media objects encoded using the container format defined in this specification populate the MOQ Object Payload with a LOC Header and LOC Payload as shown below.

The LOC Payload is the "internal data" of an EncodedAudioChunk or EncodedVideoChunk.

+--------------+----------+-----------+
|  MOQ Object  |  LOC     |  LOC      |
|  Header      |  Header  |  Payload  |
+--------------+----------+-----------+
               <---------------------->
                  MOQ Object Payload

                  MOQ Object with LOC Container

2.2. LOC Header Metadata

The LOC Header carries metadata for the corresponding LOC Payload. This metadata provides necessary information for intermediaries such as media switches to perform their media switching decisions when the payload is inaccessible due to encryption.

Section 2.2.4 provides a framework for registering new LOC Header fields that aren't defined by this specification.

2.2.1. Common Header Data

The following metadata MUST be captured for each media frame.

Sequence Number: A sequentially increasing variable-length integer that is incremented for each encoded media frame. It may be replaced by the Object Sequence from the MOQ Object Header when a MOQ Object carries exactly one frame.

Capture Timestamp in Microseconds: Captures the wall-clock time of the encoded media frame in a 64-bit unsigned integer.
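
The two common fields can be sketched in Python as follows. Several details here are assumptions rather than part of this specification: the "variable length integer" is taken to be the QUIC variable-length integer encoding from RFC 9000, the timestamp is written big-endian, and the sequence number is placed before the timestamp. The function names are hypothetical.

```python
import struct
import time


def encode_varint(value):
    """Encode an integer as a QUIC-style variable-length integer (assumed)."""
    if value < 0:
        raise ValueError("varint must be non-negative")
    if value < 1 << 6:
        return value.to_bytes(1, "big")
    if value < 1 << 14:
        return (value | (0b01 << 14)).to_bytes(2, "big")
    if value < 1 << 30:
        return (value | (0b10 << 30)).to_bytes(4, "big")
    if value < 1 << 62:
        return (value | (0b11 << 62)).to_bytes(8, "big")
    raise ValueError("value too large for a varint")


def now_us():
    """Wall-clock time in microseconds since the Unix epoch."""
    return time.time_ns() // 1000


def pack_common_header(sequence_number, capture_timestamp_us):
    """Sequence number as a varint, then a 64-bit unsigned timestamp.

    The field order and byte order are illustrative assumptions.
    """
    return encode_varint(sequence_number) + struct.pack(">Q", capture_timestamp_us)
```

A capture pipeline might call `pack_common_header(seq, now_us())` once per encoded frame, incrementing `seq` each time.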

2.2.2. Video Header Data

Flags for frames which are independent, discardable, or base layer sync points, as well as temporal and spatial layer identification, as described in [Framemarking].

2.2.3. Audio Header Data

Audio Level: Captures the magnitude of the audio level of the corresponding audio frame encoded in 7 bits as defined in section 3 of [RFC6464].
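
As an illustration of the 7-bit value, RFC 6464 expresses the level in -dBov, with values 0 to 127 representing 0 to -127 dBov. The Python sketch below estimates a level from 16-bit PCM samples and maps it to that range. Treating 32768 as the overload point, and reporting empty or all-zero input as -127 dBov, are assumptions of this sketch; the function names are hypothetical.

```python
import math


def dbov_from_pcm16(samples):
    """Estimate the RMS level of 16-bit PCM samples in dBov.

    Assumes an overload point of 32768; silence is reported as -127.0.
    """
    if not samples:
        return -127.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return -127.0
    return 20.0 * math.log10(rms / 32768.0)


def encode_audio_level(dbov):
    """Map a level in dBov to the 7-bit -dBov value, clamped to 0..127."""
    level = round(-dbov)
    return max(0, min(127, level))
```

With this mapping, louder frames produce smaller values: a full-scale square wave yields 0, and silence yields 127.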

2.2.4. Header Data Registration

This section details the procedures to register header data fields that might be useful for a particular class of media applications.

Registering a given metadata field requires the following attributes to be specified.

Shortname: Short name for the metadata. (Not sent on the wire.)

Description: Detailed description for the metadata. (Not sent on the wire.)

ID: Identifier assigned by the registry. (varint)

Length: Length of metadata value in bytes. (varint)

Value: Value of metadata. (length bytes)

New metadata in the LOC Header is registered under the "Specification Required" policy.
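
The ID, Length, and Value attributes suggest a type-length-value layout on the wire. The Python sketch below shows one way such fields could be written and read back. It assumes that "varint" means the QUIC variable-length integer of RFC 9000 and that fields are simply concatenated; neither is stated by this specification. The function names and the field IDs in the usage note are hypothetical.

```python
def encode_varint(value):
    """QUIC-style varint (assumed encoding): 2-bit length prefix, big-endian."""
    if value < 0:
        raise ValueError("varint must be non-negative")
    for prefix, size in enumerate((1, 2, 4, 8)):
        if value < 1 << (8 * size - 2):
            return (value | (prefix << (8 * size - 2))).to_bytes(size, "big")
    raise ValueError("value too large for a varint")


def decode_varint(buf, offset=0):
    """Return (value, next_offset) for the varint starting at offset."""
    if offset >= len(buf):
        raise ValueError("truncated varint")
    size = 1 << (buf[offset] >> 6)
    if offset + size > len(buf):
        raise ValueError("truncated varint")
    raw = int.from_bytes(buf[offset:offset + size], "big")
    return raw & ((1 << (8 * size - 2)) - 1), offset + size


def encode_field(field_id, value):
    """Serialize one header field as ID, Length, Value."""
    return encode_varint(field_id) + encode_varint(len(value)) + value


def parse_fields(buf):
    """Parse concatenated fields into a list of (id, value) pairs."""
    fields = []
    offset = 0
    while offset < len(buf):
        field_id, offset = decode_varint(buf, offset)
        length, offset = decode_varint(buf, offset)
        if offset + length > len(buf):
            raise ValueError("truncated field value")
        fields.append((field_id, bytes(buf[offset:offset + length])))
        offset += length
    return fields
```

For example, `encode_field(1, b"\x2a") + encode_field(300, b"abc")` produces two fields that `parse_fields` returns as `[(1, b"\x2a"), (300, b"abc")]`. Because each field carries its own length, a receiver can skip IDs it does not recognize.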

3. Catalog

A catalog is a MOQT Object that provides information about tracks from a given publisher. A catalog is used by subscribers for consuming tracks and by publishers to advertise and describe the tracks. The content of a catalog is opaque to the relays and may be end to end encrypted. A catalog describes the details of tracks such as Track IDs and corresponding media configuration details, for example, audio/video codec details.

The LOC Streaming Format uses the MoQ Common Catalog Format [MoQCatalog] to describe the content being produced by a publisher.

Per Sect 5.1 of [MoQCatalog], this document registers an entry in the "MoQ Streaming Format Type" table, with the type value 2, the name "LOC Streaming Format", and the RFC XXX.

Every LOC catalog track MUST declare a streaming format type (See Sect 3.2.1 of [MoQCatalog]) value of 2.

Every LOC catalog track MUST declare a streaming format version (See Sect 3.2.1 of [MoQCatalog]) value of 1, which is the version described in this document.

Every LOC catalog track MUST declare a packaging type (See Sect 3.2.9 of [MoQCatalog]) of "loc".
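
The three declarations above can be checked mechanically once a catalog has been parsed. The Python sketch below assumes the catalog is a JSON object that has already been loaded into a dict, and that the relevant fields are named "streamingFormat", "streamingFormatVersion", "packaging", "tracks", and "commonTrackFields". Those names are assumptions about [MoQCatalog] and should be checked against that document; the function name is hypothetical.

```python
def check_loc_catalog(catalog):
    """Return a list of problems; an empty list means the checks passed.

    Field names are assumed, not taken from this specification.
    """
    problems = []
    if catalog.get("streamingFormat") != 2:
        problems.append("streaming format type must be 2")
    # Accept the version as either a number or a string.
    if str(catalog.get("streamingFormatVersion")) != "1":
        problems.append("streaming format version must be 1")
    common = catalog.get("commonTrackFields", {})
    for track in catalog.get("tracks", []):
        # A per-track value takes precedence over a shared default.
        packaging = track.get("packaging", common.get("packaging"))
        if packaging != "loc":
            problems.append(f"track {track.get('name')!r}: packaging must be 'loc'")
    return problems
```

A subscriber could run such a check before subscribing to any of the listed tracks and ignore catalogs that report problems.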

The catalog track MUST have a track name of "catalog". A catalog object MAY be independent of other catalog objects or it MAY represent a delta update of a prior catalog object. The first catalog object published within a new group MUST be independent. A catalog object SHOULD be published only when the availability of tracks changes.

Each catalog update MUST be mapped to a discrete moq-transport object.

3.1. Catalog Fields

The MOQ Common Catalog defines the required base fields and optional extensions.

3.1.1. Optional Extensions for Video

The LOC Streaming Format allows the following optional extensions for video media.

  • temporalId: Identifies the temporal layer/sub-layer encoded, starting with 0 for the base layer, and increasing with higher temporal fidelity.

  • spatialId: Identifies the spatial and quality layer encoded, starting with 0 for the base layer, and increasing with higher fidelity.

  • depends: Identifies track dependencies for a given track, usually for video media with scalable layers in separate tracks.

  • renderGroup: Identifies a group of time-aligned tracks which should be rendered simultaneously.

  • selectionParams: Selection parameters for media quality, fidelity, etc.; see next section.

3.1.2. Selection Parameters for Video

Each video track can have the following associated Selection Parameters.

3.1.3. Optional Extensions for Audio

The LOC Streaming Format allows the following optional extensions for audio media.

  • renderGroup: Identifies a group of time-aligned tracks which should be rendered simultaneously.

  • selectionParams: Selection parameters for media quality, fidelity, etc.; see next section.

3.1.4. Selection Parameters for Audio

Each audio track can have the following associated Selection Parameters.

3.2. Catalog Examples

See section 3.4 of the MOQ Common Catalog [MoQCatalog].

4. Payload Encryption

When end to end encryption is supported, the encoded payload is encrypted with symmetric keys derived from key establishment mechanisms, such as [MOQ-MLS], and the payload itself is protected using mechanisms defined in [SecureObjects].

5. Container Serialization

The wire encoding of the payload conforming to this specification is a set of length delimited values as shown below.

Each Bytes value is the output of the AEAD operation that encrypts the Payload, with the header data as the additional data input.

+---------+-------+---------+-------+
| Payload | Bytes | Payload | Bytes |
| Len (0) |  (0)  | Len (1) |  (1)  | ...
+---------+-------+---------+-------+
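
A receiver could walk this length-delimited layout as in the Python sketch below. The encoding of Payload Len is not stated in this section; the sketch assumes a QUIC variable-length integer (RFC 9000). It treats each Bytes value as opaque AEAD output and does not attempt decryption. The function names are hypothetical.

```python
def decode_varint(buf, offset):
    """Return (value, next_offset) for a QUIC-style varint (assumed encoding)."""
    if offset >= len(buf):
        raise ValueError("truncated length")
    size = 1 << (buf[offset] >> 6)
    if offset + size > len(buf):
        raise ValueError("truncated length")
    raw = int.from_bytes(buf[offset:offset + size], "big")
    return raw & ((1 << (8 * size - 2)) - 1), offset + size


def split_container(buf):
    """Split a buffer of (Payload Len, Bytes) pairs into the Bytes values."""
    chunks = []
    offset = 0
    while offset < len(buf):
        length, offset = decode_varint(buf, offset)
        if offset + length > len(buf):
            raise ValueError("truncated payload bytes")
        chunks.append(bytes(buf[offset:offset + length]))
        offset += length
    return chunks
```

Each returned chunk would then be passed, together with the header data as additional data, to the AEAD open operation of the mechanism in use.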

6. Security Considerations

TODO

7. IANA Considerations

A new IANA registry for LOC Header Metadata is defined and populated with the information in Section 2.2.4. New metadata registrations follow the "Specification Required" policy.

This document creates a new entry in the "MoQ Streaming Format" Registry (see [MoQTransport] Sect 8). The type value is 0x002, the name is "LOC Streaming Format" and the RFC is XXX.

8. Normative References

[MoQTransport]
Curley, L., Pugin, K., Nandakumar, S., Vasiliev, V., and I. Swett, "Media over QUIC Transport", Work in Progress, Internet-Draft, draft-ietf-moq-transport-02, <https://datatracker.ietf.org/doc/html/draft-ietf-moq-transport-02>.
[MoQCatalog]
Nandakumar, S., Law, W., and M. Zanaty, "Common Catalog Format for moq-transport", Work in Progress, Internet-Draft, draft-wilaw-moq-catalogformat-02, <https://datatracker.ietf.org/doc/html/draft-wilaw-moq-catalogformat-02>.
[Framemarking]
Zanaty, M., Berger, E., and S. Nandakumar, "Video Frame Marking RTP Header Extension", Work in Progress, Internet-Draft, draft-ietf-avtext-framemarking-15, <https://datatracker.ietf.org/doc/html/draft-ietf-avtext-framemarking-15>.
[SecureObjects]
"Secure Objects for Media over QUIC", n.d., <https://suhashere.github.io/moq-secure-objects/#go.draft-jennings-moq-secure-objects.html>.
[MOQ-MLS]
"Secure Group Key Agreement with MLS over MoQ", n.d., <https://suhashere.github.io/moq-e2ee-mls/draft-jennings-moq-e2ee-mls.html>.
[WebCodecs]
"WebCodecs", <https://www.w3.org/TR/webcodecs/>.
[WEBCODECS-CODEC-REGISTRY]
"WebCodecs Codec Registry", <https://www.w3.org/TR/webcodecs-codec-registry/>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/rfc/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/rfc/rfc8174>.
[RFC6464]
Lennox, J., Ed., Ivov, E., and E. Marocco, "A Real-time Transport Protocol (RTP) Header Extension for Client-to-Mixer Audio Level Indication", RFC 6464, DOI 10.17487/RFC6464, <https://www.rfc-editor.org/rfc/rfc6464>.
[RFC5646]
Phillips, A., Ed. and M. Davis, Ed., "Tags for Identifying Languages", BCP 47, RFC 5646, DOI 10.17487/RFC5646, <https://www.rfc-editor.org/rfc/rfc5646>.

Appendix A. Acknowledgements

Thanks to Cullen Jennings for suggestions and review.

Authors' Addresses

Mo Zanaty
Cisco
Suhas Nandakumar
Cisco
Peter Thatcher
Microsoft