ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO
ISO/IEC JTC1/SC29/WG11/N5231
Shanghai, October 2002
Title: MPEG-21 Overview v.5
Source: Requirements Group
Editors: Jan Bormans, Keith Hill
Status: Approved
2 MPEG-21 Multimedia Framework
5.1 ISO/IEC TR 21000-1: MPEG-21 Multimedia Framework Part 1: Vision, Technologies and Strategy
5.2 MPEG-21 Part 2 – Digital Item Declaration
5.3 MPEG-21 Part 3 - Digital Item Identification
5.4 MPEG-21 Part 4 – Intellectual Property Management and Protection (IPMP)
5.5 MPEG-21 Part 5 – Rights Expression Language
5.6 MPEG-21 Part 6 – Rights Data Dictionary
5.7 MPEG-21 Part 7 - Digital Item Adaptation
5.8 MPEG-21 Part 8 – Reference Software
5.9 MPEG-21 Part 9 – File Format
6 Proposals and Recommendations for Further Work
6.1 Persistent Association of Identification and Description with Digital Items
6.4 Timetable for MPEG-21 Standardisation
The appetite for consuming content and the accessibility of information continue to increase at a rapid pace. Access devices, with a large set of differing terminal and network capabilities, continue to evolve, having a growing impact on people’s lives. Additionally, these access devices possess the functionality to be used in different locations and environments: anywhere and at any time. Their users, however, are currently not given tools to deal efficiently with all the intricacies of this new multimedia usage context.
Solutions with advanced multimedia functionality are becoming increasingly important as individuals are producing more and more digital media, not only for professional use but also for their personal use. All these “content providers” have many of the same concerns: management of content, re-purposing content based on consumer and device capabilities, protection of rights, protection from unauthorised access/modification, protection of privacy of providers and consumers, etc.
Such developments are pushing the boundaries of existing business models for trading physical goods and require new models for distributing and trading digital content electronically. For example, it is becoming increasingly difficult for legitimate users of content to identify and interpret the different intellectual property rights that are associated with the elements of multimedia content. Additionally, some users freely exchange content with disregard for the rights associated with it, and rights holders are powerless to prevent them. The boundaries between the delivery of audio (music and spoken word), accompanying artwork (graphics), text (lyrics), video (visual) and synthetic spaces are becoming increasingly blurred. New solutions are required for the access, delivery, management and protection processes of these different content types in an integrated and harmonized way, to be implemented in a manner that is entirely transparent to the many different users of multimedia services.
The need for technological solutions to these challenges is motivating the MPEG-21 Multimedia Framework initiative, which aims to enable the transparent and augmented use of multimedia resources across a wide range of networks and devices.
For a detailed examination and description of the requirements for the MPEG-21 multimedia framework, readers are advised to refer to the MPEG-21 Technical Report, “Vision, Technologies and Strategy”[1].
Based on the above observations, MPEG-21 aims to define a normative open framework for multimedia delivery and consumption for use by all the players in the delivery and consumption chain. This open framework will provide content creators, producers, distributors and service providers with equal opportunities in the MPEG-21 enabled open market. It will also benefit content consumers by providing them with access to a large variety of content in an interoperable manner.
MPEG-21 is based on two essential concepts: the definition of a fundamental unit of distribution and transaction (the Digital Item) and the concept of Users interacting with Digital Items. The Digital Items can be considered the “what” of the Multimedia Framework (e.g., a video collection, a music album) and the Users can be considered the “who” of the Multimedia Framework.
The goal of MPEG-21 can thus be rephrased as: defining the technology needed to support Users in exchanging, accessing, consuming, trading and otherwise manipulating Digital Items in an efficient, transparent and interoperable way.
During the MPEG-21 standardization process, Calls for Proposals based upon requirements have been and continue to be issued by MPEG. Eventually the responses to the calls result in different parts of the MPEG-21 standard (i.e. ISO/IEC 21000-N) after intensive discussion, consultation and harmonization efforts between MPEG experts, representatives of industry and other standards bodies.
MPEG-21 identifies and defines the mechanisms and elements needed to support the multimedia delivery chain as described above as well as the relationships between and the operations supported by them. Within the parts of MPEG-21, these elements are elaborated by defining the syntax and semantics of their characteristics, such as interfaces to the elements.
The Technical Report sets out the User requirements in the multimedia framework. A User is any entity that interacts in the MPEG-21 environment or makes use of a Digital Item. Such Users include individuals, consumers, communities, organisations, corporations, consortia, governments and other standards bodies and initiatives around the world. Users are identified specifically by their relationship to another User for a certain interaction. From a purely technical perspective, MPEG-21 makes no distinction between a “content provider” and a “consumer”—both are Users. A single entity may use content in many ways (publish, deliver, consume, etc.), and so all parties interacting within MPEG-21 are categorised as Users equally. However, a User may assume specific or even unique rights and responsibilities according to their interaction with other Users within MPEG-21.
At its most basic level, MPEG-21 provides a framework in which one User interacts with another User and the object of that interaction is a Digital Item, commonly called content. Some such interactions are creating content, providing content, archiving content, rating content, enhancing and delivering content, aggregating content, delivering content, syndicating content, retail selling of content, consuming content, subscribing to content, regulating content, facilitating transactions that occur from any of the above, and regulating transactions that occur from any of the above. Any of these are “uses” of MPEG-21, and the parties involved are Users.
Within any system (such as MPEG-21) that proposes to facilitate a wide range of actions involving “Digital Items”, there is a need for a very precise description for defining exactly what constitutes such an “item”. Clearly there are many kinds of content, and probably just as many possible ways of describing it to reflect its context of use. This presents a strong challenge to lay out a powerful and flexible model for Digital Items which can accommodate the myriad forms that content can take (and the new forms it will assume in the future). Such a model is only truly useful if it yields a format that can be used to represent any Digital Items defined within the model unambiguously and communicate them, and information about them, successfully. The Digital Item Declaration specification (part 2 of ISO/IEC 21000) provides such flexibility for representing Digital Items.
An Example:
Consider a simple “web page” as a Digital Item. A web page typically consists of an HTML document with embedded “links” to (or dependencies on) various image files (e.g., JPEGs and GIFs), and possibly some layout information (e.g., Style Sheets). In this simple case, it is a straightforward exercise to inspect the HTML document and deduce that this Digital Item consists of the HTML document itself, plus all of the other resources upon which it depends.
Now let’s modify the example to assume that the “web page” contains some custom scripted logic (e.g., JavaScript, etc.) to determine the preferred language of the viewer (among some predefined set of choices) and to either build/display the page in that language, or to revert to a default choice if the preferred translation is not available.
The key point in this modified example is that the presence of the language logic clouds the question of exactly what constitutes this Digital Item now and how this can be unambiguously determined.
The first problem is one of actually determining all of the dependencies. The addition of the scripting code changes the declarative “links” of the simple web page into links that can be (in the general case) determined only by running the embedded script on a specific platform. This could still work as a method of deducing the structure of the Digital Item, assuming that the author intended each translated “version” of the web page to be a separate and distinct Digital Item.
This assumption highlights the second problem: it is ambiguous whether the author actually intends each translation of the page to be a standalone Digital Item, or whether the intention is for the Digital Item to consist of the page with the language choice left unresolved. If the latter is the case, it is impossible to deduce the exact set of resources that make up this Digital Item, which leads back to the first problem.
The problem stated above is addressed by the Digital Item Declaration. A Digital Item Declaration (DID) is a document that specifies the makeup, structure and organisation of a Digital Item. Part 2 of MPEG-21 contains the DID Specification.
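To make the web-page example concrete, the following Python sketch uses the standard library's `xml.etree.ElementTree` to emit a minimal declaration in the spirit of the DID XML representation. It produces one Item that holds an exclusive Choice of language, with one Component per translation, each guarded by a Condition, plus a shared image that is always included. The element names (`DIDL`, `Item`, `Choice`, `Selection`, `Condition`, `Component`, `Resource`) follow the Digital Item Declaration model. The attribute names (`choice_id`, `select_id`, `require`, `minSelections`, `maxSelections`, `mimeType`, `ref`), the file names and the omission of namespaces are simplifying assumptions, so the output is an illustration rather than schema-valid DIDL.

```python
import xml.etree.ElementTree as ET


def declare_multilingual_page(languages):
    """Build a simplified, namespace-free DID for a page with one translation per language.

    Assumption: attribute names and file names are illustrative only.
    """
    didl = ET.Element("DIDL")
    item = ET.SubElement(didl, "Item")

    # The language decision becomes an explicit, exclusive Choice
    # instead of logic hidden inside a script.
    choice = ET.SubElement(item, "Choice", choice_id="LANGUAGE",
                           minSelections="1", maxSelections="1")
    for lang in languages:
        ET.SubElement(choice, "Selection", select_id=lang)

    # Each translation is a Component included only when its Selection is chosen.
    for lang in languages:
        component = ET.SubElement(item, "Component")
        ET.SubElement(component, "Condition", require=lang)
        ET.SubElement(component, "Resource", mimeType="text/html",
                      ref=f"page_{lang.lower()}.html")  # hypothetical file names

    # Resources shared by all translations carry no Condition and always belong to the Item.
    shared = ET.SubElement(item, "Component")
    ET.SubElement(shared, "Resource", mimeType="image/jpeg", ref="logo.jpg")
    return didl


didl = declare_multilingual_page(["EN", "FR", "DE"])
print(ET.tostring(didl, encoding="unicode"))
```

A declaration of this shape answers both questions raised by the example. The full set of resources is listed explicitly, and the unresolved language choice becomes part of the Item instead of something that can only be discovered by running a script.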
MPEG-21 has established a work plan for future standardisation. Work has already started on nine parts of the standard within the Multimedia Framework (note that the Technical Report is part 1 of the MPEG-21 Standard). These are elaborated in the subsections below.
In addition to these specifications, MPEG maintains a document containing the consolidated requirements for MPEG-21[2]. This document will continue to evolve during the development of the various parts of MPEG-21 to reflect new requirements and changes to existing requirements.
A Technical Report, formally approved in September 2001, describes the multimedia framework and its architectural elements together with the functional requirements for their specification.
The title “Vision, Technologies and Strategy” has been chosen to reflect the fundamental purpose of the Technical Report. This is to:
The purpose of the Digital Item Declaration (DID) specification is to describe a set of abstract terms and concepts to form a useful model for defining Digital Items. Within this model, a Digital Item is the digital representation of “a work”, and as such, it is the thing that is acted upon (managed, described, exchanged, collected, etc.) within the model. The goal of this model is to be as flexible and general as possible, while providing for the “hooks” that enable higher level functionality. This, in turn, will allow the model to serve as a key foundation in the building of higher level models in other MPEG-21 elements (such as Identification & Description or IPMP). This model specifically does not define a language in and of itself. Instead, the model helps to provide a common set of abstract concepts and terms that can be used to define such a scheme, or to perform mappings between existing schemes capable of Digital Item Declaration, for comparison purposes.
The DID technology is described in three normative sections:
Model: The Digital Item Declaration Model describes a set of abstract terms and concepts to form a useful model for defining Digital Items. Within this model, a Digital Item is the digital representation of “a work”, and as such, it is the thing that is acted upon (managed, described, exchanged, collected, etc.) within the model.
Representation: Normative description of the syntax and semantics of each of the Digital Item Declaration elements, as represented in XML. This section also contains some non-normative examples for illustrative purposes.
Schema: Normative XML schema comprising the entire grammar of the Digital Item Declaration representation in XML.
The following sections describe the semantic “meaning” of the principal elements of the Digital Item Declaration Model. Please note that in the descriptions below, the defined elements in italics are intended to be unambiguous terms within this model.
A container is a structure that allows items and/or containers to be grouped. These groupings of items and/or containers can be used to form logical packages (for transport or exchange) or logical shelves (for organization). Descriptors allow for the “labeling” of containers with information that is appropriate for the purpose of the grouping (e.g. delivery instructions for a package, or category information for a shelf).
It should be noted that a container itself is not an item; containers are groupings of items and/or containers.
An item is a grouping of sub-items and/or components that are bound to relevant descriptors. Descriptors contain information about the item, as a representation of a work. Items may contain choices, which allow them to be customized or configured. Items may be conditional (on predicates asserted by selections defined in the choices). An item that contains no sub-items can be considered an entity -- a logically indivisible work. An item that does contain sub-items can be considered a compilation -- a work composed of potentially independent sub-parts. Items may also contain annotations to their sub-parts.
The relationship between items and Digital Items (as defined in ISO/IEC 21000-1:2001, MPEG-21 Vision, Technologies and Strategy) could be stated as follows: items are declarative representations of Digital Items.
A component is the binding of a resource to all of its relevant descriptors. These descriptors are information related to all or part of the specific resource instance. Such descriptors will typically contain control or structural information about the resource (such as bit rate, character set, start points or encryption information) but not information describing the “content” within.
It should be noted that a component itself is not an item; components are building blocks of items.
An anchor binds descriptors to a fragment, which corresponds to a specific location or range within a resource.
A descriptor associates information with the enclosing element. This information may be a component (such as a thumbnail of an image, or a text component), or a textual statement.
A condition describes the enclosing element as being optional, and links it to the selection(s) that affect its inclusion. Multiple predicates within a condition are combined as a conjunction (an AND relationship). Any predicate can be negated within a condition. Multiple conditions associated with a given element are combined as a disjunction (an OR relationship) when determining whether to include the element.
A choice describes a set of related selections that can affect the configuration of an item. The selections within a choice are either exclusive (choose exactly one) or inclusive (choose any number, including all or none).
A selection describes a specific decision that will affect one or more conditions somewhere within an item. If the selection is chosen, its predicate becomes true; if it is not chosen, its predicate becomes false; if it is left unresolved, its predicate is undecided.
An annotation describes a set of information about another identified element of the model without altering or adding to that element. The information can take the form of assertions, descriptors, and anchors.
An assertion defines a full or partially configured state of a choice by asserting true, false or undecided values for some number of predicates associated with the selections for that choice.
A resource is an individually identifiable asset such as a video or audio clip, an image, or a textual asset. A resource may also potentially be a physical object. All resources must be locatable via an unambiguous address.
A fragment unambiguously designates a specific point or range within a resource. Fragments may be specific to the resource type.
A statement is a literal textual value that contains information, but not an asset. Examples of likely statements include descriptive, control, revision tracking or identifying information.
A predicate is an unambiguously identifiable Declaration that can be true, false or undecided.
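The combination rules for conditions and choices defined above can be sketched in Python. The model states that predicates may be true, false or undecided, that predicates within a condition are ANDed (each optionally negated), and that multiple conditions on one element are ORed. It does not spell out how an undecided predicate propagates through AND and OR, so this sketch assumes Kleene's three-valued logic, with `None` standing for undecided. All function and type names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Three-valued predicate: True, False, or None for "undecided".
Tri = Optional[bool]
# A condition is a list of (selection_id, negated) predicates joined by AND.
CondSpec = List[Tuple[str, bool]]


def tri_not(v: Tri) -> Tri:
    return None if v is None else not v


def tri_and(values: List[Tri]) -> Tri:
    # Any false predicate makes the conjunction false; otherwise any undecided one leaves it undecided.
    if any(v is False for v in values):
        return False
    if any(v is None for v in values):
        return None
    return True


def tri_or(values: List[Tri]) -> Tri:
    # Any true condition makes the disjunction true; otherwise any undecided one leaves it undecided.
    if any(v is True for v in values):
        return True
    if any(v is None for v in values):
        return None
    return False


def is_included(conditions: List[CondSpec], state: Dict[str, Tri]) -> Tri:
    """Evaluate an element's conditions against the current selection state.

    An element without conditions is not optional, so it is always included.
    Selections absent from `state` are treated as unresolved (undecided).
    """
    if not conditions:
        return True
    return tri_or([
        tri_and([tri_not(state.get(sel)) if negated else state.get(sel)
                 for sel, negated in cond])
        for cond in conditions
    ])


def choice_is_valid(num_chosen: int, num_selections: int, exclusive: bool) -> bool:
    # Exclusive: exactly one selection; inclusive: any number, including none or all.
    return num_chosen == 1 if exclusive else 0 <= num_chosen <= num_selections


state = {"EN": False, "FR": True}  # "DE" left unresolved
print(is_included([[("FR", False)]], state))                   # True
print(is_included([[("EN", False)], [("DE", False)]], state))  # None
```

Keeping "undecided" distinct from "false" lets a consumer tell the difference between content that is excluded and content that simply awaits a decision. That distinction matters when an Item is exchanged with its choices still open.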
Figure 1 is an example showing the most important elements within this model, how they are related, and as such, the hierarchical structure of the Digital Item Declaration Model.
Figure 1 – Relationship of the principal elements within the Digital Item Declaration Model
The scope of the Digital Item Identification (DII) specification includes:
The DII specification does not specify new identification systems for content elements for which identification and description schemes already exist and are in use (e.g., ISO/IEC 21000-3 does not attempt to replace the ISRC (as defined in ISO 3901) for sound recordings but allows ISRCs to be used within MPEG-21).
Identifiers covered by this specification can be associated with Digital Items by including them in a specific place in the Digital Item Declaration: the Statement element. Examples of likely Statements include descriptive, control, revision tracking and/or identifying information.
Figure 2 below shows this relationship. The shaded boxes are the subject of the DII specification, while the bold boxes are defined in the DID specification:
Figure 2 – Relationship between Digital Item Declaration and Digital Item Identification
Several elements within a Digital Item Declaration can have zero, one or more Descriptors (as specified in part 2). Each Descriptor may contain one Statement, which can contain one identifier relating to the parent element of the Statement. In Figure 2 above, the two statements shown are used to identify a Component (left-hand side of the diagram) and an Item (right-hand side of the diagram).
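The placement rule can be walked with a short Python sketch. It builds a tiny declaration mirroring Figure 2, with an Item and a Component that each carry an identifying Descriptor, and collects identifiers for the element that holds each Descriptor. It also checks that each identifier at least has a URI scheme. The `Identifier` tag name, the absence of namespaces and the example URN values are assumptions made for illustration, not the normative DII syntax.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical, namespace-free declaration: an Item (album) containing a Component (track).
# The URN values are invented for illustration.
DID = """
<DIDL>
  <Item>
    <Descriptor><Statement><Identifier>urn:example:album:0001</Identifier></Statement></Descriptor>
    <Component>
      <Descriptor><Statement><Identifier>urn:example:track:0001-01</Identifier></Statement></Descriptor>
      <Resource mimeType="audio/mpeg" ref="track01.mp3"/>
    </Component>
  </Item>
</DIDL>
"""


def identified_elements(root):
    """Return (element tag, identifier) pairs in document order."""
    pairs = []
    for element in root.iter():
        for descriptor in element.findall("Descriptor"):
            ident = descriptor.find("Statement/Identifier")
            if ident is None or not ident.text:
                continue
            uri = ident.text.strip()
            # DII identifiers are URIs, so at minimum a scheme is expected.
            if not urlparse(uri).scheme:
                raise ValueError(f"identifier is not a URI: {uri!r}")
            pairs.append((element.tag, uri))
    return pairs


print(identified_elements(ET.fromstring(DID.strip())))
```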
Digital Items and their parts within the MPEG-21 Framework are identified by encapsulating Uniform Resource Identifiers into the Identification DS. A Uniform Resource Identifier (URI) is a compact string of characters for identifying an abstract or physical resource, where a resource is defined as "anything that has identity".
The requirement that an MPEG-21 Digital Item Identifier be a URI is also consistent with the statement that the MPEG-21 identifier may be a Uniform Resource Locator (URL). The term URL refers to the subset of URIs in use today as pointers to information on the Internet; it allows for long-term to short-term persistence depending on the business case.
ISO/IEC 21000-3 allows any identifier in the form of a URI to be used as an identifier for Digital Items (and parts thereof). The specification also provides the ability to register identification systems through the process of a Registration Authority. Requirements for this Registration Authority are available in an Annex to the specification, and ISO is in the process of appointing this Registration Authority. The figure below shows how a music album and its parts can be identified through DII.
Figure 3 – Example: Metadata and Identifiers within an MPEG-21 Music Album
In some cases, it may be necessary to use an automated resolution[3] system to retrieve the Digital Item (or parts thereof) or information related to a Digital Item from a server (e.g., in the case of an interactive on-line content delivery system). An example of such a resolution system can be found in an informative annex to the specification.
As different Users of MPEG-21 may have different schemes to describe "their" content, MPEG-21 DII needs to allow such schemes to be distinguished from one another. MPEG-21 DII utilises the XML mechanism of namespaces to do this.
Different parts of MPEG-21 will define different types of Digital Item. For example, Digital Item Adaptation (DIA) defines a "Context Digital Item" (XDI) in addition to the "Content Digital Item" (CDI). While CDIs contain resources such as MP3 files or MPEG-2 Video streams, XDIs contain information on the context in which a CDI will be used (more information on XDIs can be found in the section on DIA below).
DII provides a mechanism that allows an MPEG-21 Terminal to distinguish between these different Digital Item Types: a URI is placed inside a Type tag that is the sole child element of a Statement; this Statement shall appear as a child element of a Descriptor, which in turn shall appear as a child element of an Item. The syntax of the tag will be defined by subsequent parts of MPEG-21. If no such Type tag is present, the Digital Item is deemed to be a Content Digital Item.
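The Type rule can be expressed as a small Python function. It looks for an Item-level Descriptor whose Statement has a Type element as its sole child and returns the URI inside it. If no such tag exists, it falls back to Content Digital Item. Because the syntax of the Type tag is left to subsequent parts of MPEG-21, the tag name `Type`, the namespace-free markup, the sample context URI and the `CONTENT_DI` sentinel are all placeholders.

```python
import xml.etree.ElementTree as ET

CONTENT_DI = "content"  # placeholder meaning "no Type tag present: Content Digital Item"


def digital_item_type(item: ET.Element) -> str:
    """Return the Digital Item type URI declared on an Item, or CONTENT_DI by default."""
    # Only Descriptors that are direct children of the Item are considered.
    for descriptor in item.findall("Descriptor"):
        statement = descriptor.find("Statement")
        if statement is None:
            continue
        children = list(statement)
        # The Type tag must be the sole child element of the Statement.
        if len(children) == 1 and children[0].tag == "Type" and children[0].text:
            return children[0].text.strip()
    return CONTENT_DI


xdi = ET.fromstring(
    "<Item><Descriptor><Statement><Type>urn:example:dia:context</Type>"
    "</Statement></Descriptor></Item>"
)
cdi = ET.fromstring("<Item><Component/></Item>")
print(digital_item_type(xdi))  # urn:example:dia:context
print(digital_item_type(cdi))  # content
```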
The 4th part of MPEG-21 will define an interoperable framework for Intellectual Property Management and Protection (IPMP). Fairly soon after MPEG-4, with its IPMP hooks, became an International Standard, concerns were voiced within MPEG that many similar devices and players might be built by different manufacturers, all MPEG-4 compliant, but many of them unable to interwork. This is why MPEG decided to start a new project on more interoperable IPMP systems and tools. The project includes standardized ways of retrieving IPMP tools from remote locations and of exchanging messages between IPMP tools and between these tools and the terminal. It also addresses authentication of IPMP tools, and has provisions for integrating Rights Expressions according to the Rights Data Dictionary and the Rights Expression Language.
Efforts are currently ongoing to define the requirements for the management and protection of intellectual property in the various parts of the MPEG-21 standard currently under development.
Following an extensive requirements gathering process, which started in January 2001, MPEG issued a Call for Proposals during its July meeting in Sydney for a Rights Data Dictionary and a Rights Expression Language. Responses to this Call were processed during the December meeting in Pattaya and the evaluation process established an approach for going forward with the development of a specification, expected to be an International Standard in late 2003.
A Rights Expression Language is seen as a machine-readable language that can declare rights and permissions using the terms as defined in the Rights Data Dictionary.
The REL is intended to provide flexible, interoperable mechanisms to support transparent and augmented use of digital resources in publishing, distributing, and consuming of digital movies, digital music, electronic books, broadcasting, interactive games, computer software and other creations in digital form, in a way that protects digital content and honours the rights, conditions, and fees specified for digital contents. It is also intended to support specification of access and use controls for digital content in cases where financial exchange is not part of the terms of use, and to support exchange of sensitive or private digital content.
The Rights Expression Language is also intended to provide a flexible interoperable mechanism to ensure personal data is processed in accordance with individual rights and to meet the requirement for Users to be able to express their rights and interests in a way that addresses issues of privacy and use of personal data.
A standard Rights Expression Language should be able to support guaranteed end-to-end interoperability, consistency and reliability between different systems and services. To do so, it must offer richness and extensibility in declaring rights, conditions and obligations, ease and persistence in identifying and associating these with digital contents, and flexibility in supporting multiple usage/business models.
MPEG REL adopts a simple and extensible data model for many of its key concepts and elements.
The MPEG REL data model for a rights expression consists of four basic entities and the relationship among those entities. This basic relationship is defined by the MPEG REL assertion “grant”. Structurally, an MPEG REL grant consists of the following:
Figure 4 – The REL Data Model
A principal encapsulates the identification of principals to whom rights are granted. Each principal identifies exactly one party. In contrast, a set of principals, such as the universe of everyone, is not a principal.
A principal denotes the party that it identifies by information unique to that individual. Usefully, this is information that has some associated authentication mechanism by which the principal can prove its identity. The Principal type supports the following identification technologies:
A right is the "verb" that a principal can be granted to exercise against some resource under some condition. Typically, a right specifies an action (or activity) or a class of actions that a principal may perform on or using the associated resource.
MPEG REL provides a right element to encapsulate information about rights and provides a set of commonly used, specific rights, notably rights relating to other rights, such as issue, revoke and obtain. Extensions to MPEG REL could define rights appropriate to using specific types of resource. For instance, the MPEG REL content extension defines rights appropriate to using digital works (e.g., play and print).
A resource is the "object" to which a principal can be granted a right. A resource can be a digital work (such as an e-book, an audio or video file, or an image), a service (such as an email service, or B2B transaction service), or even a piece of information that can be owned by a principal (such as a name or an email address).
MPEG REL provides mechanisms to encapsulate the information necessary to identify and use a particular resource or resources that match a certain pattern. The latter allows identification of a collection of resources with some common characteristics. Extensions to MPEG REL could define resources appropriate to specific business models and technical applications.
A condition specifies the terms, conditions and obligations under which rights can be exercised. A simple condition is a time interval within which a right can be exercised. A slightly more complicated condition requires the existence of a valid, prerequisite right that has been issued to some principal. Using this mechanism, the eligibility to exercise one right can become dependent on the eligibility to exercise other rights.
MPEG REL defines a condition element to encapsulate information about conditions and some very basic conditions. Extensions to MPEG REL could define conditions appropriate to specific distribution and usage models. For instance, the MPEG REL content extension defines conditions appropriate to using digital works (e.g., watermark, destination, and renderer).
The entities in the MPEG REL data model (“principal”, “right”, “resource” and “condition”) can correspond to (but are not necessarily equivalent to) “user” (including “terminal”), “right”, “digital item” and “condition” in the MPEG-21 terminology.
Since MPEG REL is defined using the XML Schema recommendation from W3C, its element model follows the standard one that relates its elements to other classes of elements. For example, the “grant” element is related to its child elements, “principal”, “right”, “resource” and “condition”.
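The grant structure (a principal, a right, a resource and a condition) can be modelled with Python dataclasses. The sketch below checks a request against a set of grants and implements the two example conditions described above: a validity time interval, and a prerequisite right that the same principal must also hold. All class, field and identifier names are hypothetical. Real MPEG REL licences are XML documents, and issuance, signatures and principal authentication are not modelled here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Condition:
    not_before: Optional[datetime] = None  # start of validity interval (inclusive)
    not_after: Optional[datetime] = None   # end of validity interval (inclusive)
    # (right, resource) that the same principal must also be able to exercise.
    prerequisite: Optional[Tuple[str, str]] = None


@dataclass(frozen=True)
class Grant:
    principal: str  # identifies exactly one party
    right: str      # the "verb", e.g. "play"
    resource: str   # the "object", e.g. a Digital Item identifier
    condition: Condition = field(default_factory=Condition)


def is_permitted(grants: List[Grant], principal: str, right: str,
                 resource: str, when: datetime, depth: int = 0) -> bool:
    """True if some grant lets `principal` exercise `right` on `resource` at time `when`."""
    if depth > 10:  # crude guard against cyclic prerequisites
        return False
    for g in grants:
        if (g.principal, g.right, g.resource) != (principal, right, resource):
            continue
        c = g.condition
        if c.not_before is not None and when < c.not_before:
            continue
        if c.not_after is not None and when > c.not_after:
            continue
        if c.prerequisite is not None:
            pre_right, pre_resource = c.prerequisite
            if not is_permitted(grants, principal, pre_right, pre_resource, when, depth + 1):
                continue
        return True
    return False


alice = "urn:example:principal:alice"
album = "urn:example:di:album-0001"
service = "urn:example:service:music-club"
grants = [
    Grant(alice, "play", album, Condition(not_before=datetime(2003, 1, 1),
                                          not_after=datetime(2003, 12, 31),
                                          prerequisite=("subscribe", service))),
    Grant(alice, "subscribe", service),
]
print(is_permitted(grants, alice, "play", album, datetime(2003, 6, 1)))  # True
print(is_permitted(grants, alice, "play", album, datetime(2004, 6, 1)))  # False
```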
Following the evaluation of submissions in response to a Call for Proposals, the specification of a Rights Data Dictionary (RDD) began in December 2001. The working draft was refined at the following three meetings and a Committee Draft was published in July 2002. The following points summarise the scope of this specification:
The Rights Data Dictionary (RDD) comprises a set of clear, consistent, structured, integrated and uniquely identified Terms to support the MPEG-21 Rights Expression Language.
The structure of the dictionary is specified, along with a methodology for creating the dictionary. The means by which further Terms may be defined is also explained.
The Dictionary is a prescriptive Dictionary, in the sense that it defines a single meaning for a Term represented by a particular RDD name (or Headword), but it is also inclusive in that it recognizes the prescription of other Headwords and definitions by other Authorities and incorporates them through mappings. The RDD also supports the circumstance that the same name may have different meanings under different Authorities. The RDD specification has audit provisions so that additions, amendments and deletions to Terms and their attributes can be tracked.
RDD recognises legal definitions as and only as Terms from other Authorities that can be mapped into the RDD. Therefore Terms that are directly authorized by RDD neither define nor prescribe intellectual property rights or other legal entities.
As well as providing definitions of Terms for use in the REL, the RDD specification is designed to support the mapping and transformation of metadata from the terminology of one namespace (or Authority) into that of another namespace (or Authority) in an automated or partially-automated way, with the minimum ambiguity or loss of semantic integrity.
The dictionary is based on a logical model, the Context Model, which is the basis of the dictionary ontology. The model is described in detail in the specification. It is based on the use of verbs which are contextualised so that a dictionary created with it can be as extensible and granular as required.
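A minimal Python sketch can show the mapping idea from the points above. Headwords from other Authorities are mapped onto uniquely identified RDD Terms. As a result, the same name can resolve to different Terms depending on the Authority, and a Headword from one Authority can be translated into another Authority's vocabulary through the shared Term. The Authority names, Headwords and Term identifiers are invented. The one-to-one equivalence assumed here is a simplification of whatever relationships real RDD mappings carry.

```python
from typing import Dict, Optional, Tuple

# (authority, headword) -> RDD Term identifier. All values are hypothetical.
MAPPINGS: Dict[Tuple[str, str], str] = {
    ("RDD", "Play"): "rdd:term:0001",
    ("AuthorityA", "play"): "rdd:term:0001",    # same meaning as the RDD Term
    ("AuthorityB", "play"): "rdd:term:0002",    # same name, different meaning
    ("AuthorityB", "perform"): "rdd:term:0001",
}


def to_rdd(authority: str, headword: str) -> Optional[str]:
    """Resolve an Authority's Headword to the RDD Term it is mapped to, if any."""
    return MAPPINGS.get((authority, headword))


def translate(headword: str, source: str, target: str) -> Optional[str]:
    """Map a Headword from one Authority into another via the shared RDD Term."""
    term = to_rdd(source, headword)
    if term is None:
        return None
    for (authority, word), t in MAPPINGS.items():
        if authority == target and t == term:
            return word
    return None  # no equivalent in the target Authority: report it rather than guess


print(translate("play", "AuthorityA", "AuthorityB"))  # perform
print(to_rdd("AuthorityB", "play"))                   # rdd:term:0002
```

Returning `None` when no equivalent exists reflects the aim of minimal loss of semantic integrity. A missing mapping is surfaced to the caller rather than replaced by a similar-sounding Headword.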
The goal of the Terminals and Networks key element is to achieve interoperable transparent access to (distributed) advanced multimedia content by shielding users from network and terminal installation, management and implementation issues. This will enable the provision of network and terminal resources on demand to form user communities where multimedia content can be created and shared, always with the agreed/contracted quality, reliability and flexibility, allowing the multimedia applications to connect diverse sets of Users, such that the quality of the user experience will be guaranteed.
Towards this goal, the adaptation of Digital Items is required. This concept is illustrated in Figure 5. As shown in this conceptual architecture, a Digital Item is subject to a resource adaptation engine, as well as a descriptor adaptation engine, which together produce the adapted Digital Item.
It is important to emphasise that the adaptation engines themselves are non-normative tools of Digital Item Adaptation. However, descriptions and format-independent mechanisms that provide support for Digital Item Adaptation in terms of resource adaptation, descriptor adaptation, and/or Quality of Service management are within the scope of the requirements.
Figure 5 – Concept of Digital Item Adaptation
In May 2002, a number of responses to the Call for Proposals on MPEG-21 Digital Item Adaptation were received. Based on the evaluation of these proposals, a Working Draft has been produced. The specific items targeted for standardization are outlined below.
User Characteristics: Description tools that specify the characteristics of a User, including preferences for particular media resources, preferences regarding the presentation of media resources, and the mobility characteristics of a User. Additionally, description tools to support the accessibility of Digital Items to various users, including those with audio-visual impairments, are being considered.
Terminal Capabilities: Description tools that specify the capability of terminals, including media resource encoding and decoding capability, hardware, software and system-related specifications, as well as communication protocols that are supported by the terminal.
Network Characteristics: Description tools that specify the capabilities and conditions of a network, including bandwidth utilization, delay and error characteristics.
Natural Environment Characteristics: Description tools that specify the location and time of a User in a given environment, as well as audio-visual characteristics of the natural environment, which may include auditory noise levels and illumination properties.
Resource Adaptability: Tools to assist with the adaptation of resources, including the generic adaptation of binary resources, and with metadata adaptation. Additionally, tools that assist in making resource-complexity trade-offs, and in associating descriptions with resource characteristics for Quality of Service, are targeted.
Session Mobility: Tools that specify how to transfer the state of Digital Items from one User to another. Specifically, the capture, transfer and reconstruction of state information will be specified.
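The following sketch, under stated assumptions, shows how Terminal Capabilities and Network Characteristics descriptions might drive the choice of a resource variant, and how Session Mobility can be modelled as the capture, transfer and reconstruction of state. All class names, fields, codec labels and the JSON message format are hypothetical; User and Natural Environment Characteristics are omitted for brevity, and nothing here follows the description tools of the DIA Working Draft.

```python
import json
from dataclasses import dataclass, asdict


# Hypothetical usage-environment descriptions (illustrative fields only).
@dataclass
class TerminalCapabilities:
    decoders: tuple          # codecs the terminal can decode, e.g. ("mp4v",)
    screen_width: int        # pixels


@dataclass
class NetworkCharacteristics:
    bandwidth_kbps: int


@dataclass
class Variant:
    codec: str
    width: int
    bitrate_kbps: int


def select_variant(variants, terminal, network):
    """Pick the highest-bitrate variant the terminal and network can handle,
    or None if no variant fits."""
    usable = [v for v in variants
              if v.codec in terminal.decoders
              and v.width <= terminal.screen_width
              and v.bitrate_kbps <= network.bandwidth_kbps]
    return max(usable, key=lambda v: v.bitrate_kbps, default=None)


def capture_state(item_id: str, position_s: float, variant: Variant) -> str:
    # Capture: serialise the consumption state of a Digital Item into a
    # message that can be transferred to another User.
    return json.dumps({"item": item_id, "position_s": position_s,
                       "variant": asdict(variant)})


def reconstruct_state(message: str) -> tuple:
    # Reconstruct: rebuild the state on the receiving side.
    state = json.loads(message)
    return state["item"], state["position_s"], Variant(**state["variant"])
```

As a usage example, a terminal that decodes only `"mp4v"` at a width of 176 pixels on a 100 kbit/s link would be given `Variant("mp4v", 176, 64)` out of a set that also contains a 352-pixel, 384 kbit/s variant and an `"h263"` variant. The "transfer" step is left as plain string passing, since the transport is outside the scope of this sketch.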
The part of MPEG-21 that has most recently been identified as a candidate for specification is Reference Software. Reference software will form the first of what is envisaged to be a number of systems-related specifications in MPEG-21. Other candidates for specification are likely to include a binary representation of the Digital Item Declaration and an MPEG-21 file format.
The development of the Reference Software will be based on the requirements that have been defined for an architecture for processing Digital Items.
An MPEG-21 Digital Item can be a complex collection of information. Both still and dynamic media (e.g. images and movies) can be included, as well as Digital Item information, metadata, layout information, and so on. It can include both textual data (e.g. XML) and binary data (e.g. an MPEG-4 presentation or a still picture). For this reason, the MPEG-21 file format will inherit several concepts from MP4 to make ‘multi-purpose’ files possible. A dual-purpose MP4 and MP21 file, for example, would play just the MPEG-4 data on an MP4 player, and would play the MPEG-21 data on an MP21 player.
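The dual-purpose behaviour described above relies on each player recognising which data it understands. A minimal sketch, assuming the box layout used by MP4 files (a 32-bit size, a four-character type and, in the 'ftyp' box, a major brand, a minor version and a list of compatible brands), shows how a reader could inspect the brands a file declares. The brand codes 'mp41', 'mp42' and 'mp21' are assumptions for illustration, since the MPEG-21 file format was still at Working Draft stage; the function names are hypothetical.

```python
import struct


def read_ftyp(data: bytes):
    """Return (major_brand, compatible_brands) from a leading 'ftyp' box.

    Assumes the file starts with an 'ftyp' box that uses a 32-bit size field.
    """
    size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp" or size < 16 or size > len(data):
        raise ValueError("no usable ftyp box at start of file")
    major = data[8:12].decode("ascii")
    # Bytes 12..16 hold the minor version; compatible brands fill the rest,
    # four bytes each.
    brands = [data[i:i + 4].decode("ascii") for i in range(16, size - 3, 4)]
    return major, brands


def players_for(data: bytes, supported=("mp41", "mp42", "mp21")):
    """List which of the (assumed) brands the file declares compatibility with."""
    major, brands = read_ftyp(data)
    declared = {major, *brands}
    return sorted(b for b in supported if b in declared)
```

For a header built with `struct.pack(">I4s4sI4s4s", 24, b"ftyp", b"mp21", 0, b"mp21", b"mp41")`, `players_for` reports both `"mp21"` and `"mp41"`, so an MP4 player and an MP21 player would each find data they can present.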
Requirements have been established for the file format, and work on the Working Draft (WD) has begun.
The following recommendations for WG11 standardisation activities with respect to the MPEG-21 multimedia framework are proposed:
As a logical extension to the ongoing specification of the Digital Item Declaration and Digital Item Identification, MPEG intends to consider the requirements for the persistent association of identification and description with content. MPEG experts wish to define the functional requirements for the persistent association of identification and description with content and how this interacts with the rest of the MPEG-21 architecture.
The term persistent association is used to categorise all the techniques for managing identification and description with content[4]. This includes the carriage of identifiers within different content file and transport formats, both in file headers and embedded into the content as a watermark. It also encompasses the ability to protect identifiers associated with content against unauthorised removal and modification.
The Technical Report documents the following high-level requirements for persistent association of identification and description with Digital Items:
A framework that supports Digital Item identification and description shall make it possible to persistently associate identifiers and descriptors with media resources. This includes the possibility that the association of media resources with identifiers and/or descriptors may need to be authenticated;
The environment for the storage of identifiers and descriptions associated with Digital Items shall fulfil the following requirements in a standardised way:
It shall be possible for descriptors to contain binary and/or textual information (e.g. HTML, AAC, JPEG);
It shall be possible to associate descriptors with those elements within a hierarchical Digital Item that contain Resources;
It shall be possible to store, within the Digital Item, a reference to descriptive metadata regardless of its location;
A framework that supports Digital Item identification and description shall allow for locating Digital Items from their descriptions and vice versa. Note that this does not necessarily imply that they are bundled together;
A framework that supports Digital Item identification and description shall provide an efficient resolution system for related Digital Items, such as different versions, different manifestations of the same Digital Item, different names of the same Digital Item (e.g. aliases, nicknames), their elements, etc.;
A framework that supports Digital Item identification and description should provide, provide for, support, adopt, reference or integrate mechanisms to define levels of access to descriptions within the rights expressions, such as the discovery of usage rules[5].
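Several of the requirements above (authenticated association, locating Digital Items from descriptions and vice versa, and resolution of related Digital Items) can be illustrated with a small in-memory registry. This is a sketch only: the `AssociationRegistry` class and its methods are hypothetical, and authentication uses an HMAC-SHA-256 tag over the identifier and a digest of the resource, a technique chosen here for illustration rather than one specified by MPEG.

```python
import hashlib
import hmac
from collections import defaultdict


class AssociationRegistry:
    """Toy registry binding identifiers to resources and descriptions."""

    def __init__(self, key: bytes):
        self._key = key
        self._tags = {}                       # identifier -> HMAC tag
        self._descriptions = defaultdict(set)  # identifier -> descriptions
        self._items = defaultdict(set)         # description -> identifiers
        self._related = defaultdict(set)       # identifier -> related ids

    def _tag(self, identifier: str, resource: bytes) -> str:
        # Bind the identifier to a digest of the resource with a keyed MAC.
        digest = hashlib.sha256(resource).hexdigest()
        msg = f"{identifier}|{digest}".encode()
        return hmac.new(self._key, msg, hashlib.sha256).hexdigest()

    def associate(self, identifier: str, resource: bytes, description: str):
        self._tags[identifier] = self._tag(identifier, resource)
        self._descriptions[identifier].add(description)
        self._items[description].add(identifier)

    def verify(self, identifier: str, resource: bytes) -> bool:
        """Check that the resource is still the one bound to the identifier."""
        tag = self._tags.get(identifier)
        if tag is None:
            return False
        return hmac.compare_digest(tag, self._tag(identifier, resource))

    def descriptions_of(self, identifier: str) -> set:
        # Locate descriptions from a Digital Item identifier.
        return set(self._descriptions.get(identifier, ()))

    def items_described_by(self, description: str) -> set:
        # Locate Digital Items from a description.
        return set(self._items.get(description, ()))

    def link(self, a: str, b: str):
        # Record that two identifiers name related Digital Items
        # (e.g. versions, manifestations, aliases).
        self._related[a].add(b)
        self._related[b].add(a)

    def resolve(self, identifier: str) -> set:
        """Return every Digital Item transitively related to identifier."""
        seen, stack = set(), [identifier]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self._related.get(current, ()))
        seen.discard(identifier)
        return seen
```

In use, `verify` returns `False` as soon as the resource bytes differ from those originally associated, and `resolve` follows alias and version links transitively, so an identifier for a song would also resolve to a remix linked only through another version.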
Subsequent to the completion of the Technical Report, a new activity called Digital Item Adaptation[6] has commenced (see section 5.7). A high-level requirement for persistent association related to Digital Item Adaptation is as follows:
Digital Item Adaptation has been identified as one essential aspect of Terminals and Networks that will provide tools to support resource adaptation, descriptor (‘metadata’) adaptation, and Quality of Service management. As part of this work item, a description of usage environments, including terminal and network characteristics, as well as information describing user preferences is required. A requirement exists for the persistent association of such descriptions to Digital Items and their Resources.
While MPEG has identified the need for such persistent association of identification and description, the requirements are not yet well enough understood to decide what MPEG might consider necessary to standardise. Hence, MPEG is now asking interested parties and experts to submit requirements for this technology to MPEG, and invites these parties and experts to take part in the work.
MPEG seeks these inputs by its 61st meeting, in July 2002. They will be used by MPEG to plan future work on assessing the ability of existing specifications (of both MPEG and others) to meet these requirements, and on planning future specifications. Further timing will be decided when the requirements are better understood.
The goal of the ‘Content Representation’ item is to provide, adopt or integrate content representation technologies able to efficiently represent MPEG-21 content in a scalable and error-resilient way. The content representation of the media resources shall be synchronisable and multiplexable, and shall allow interaction.
The XML encoding defined in Part 1 of the MPEG-7 specification will be extended to fulfil this requirement. The call for contributions for these extensions is defined in N4715 under the item "Binary representation of MPEG-21 Digital Item Declaration".
This item should allow the Multimedia Framework to optimally use existing and ongoing developments of media coders in MPEG.
MPEG-21 Event Reporting should standardise metrics and interfaces for the performance of all reportable events in MPEG-21, and provide a means of capturing and containing these metrics and interfaces that refer to identified Digital Items, environments, processes, transactions and Users.
Such metrics and interfaces will enable Users to understand precisely the performance of all reportable events within the framework. “Event Reporting” must provide Users with a means of acting on specific interactions, as well as enabling a vast set of out-of-scope processes, frameworks and models to interoperate with MPEG-21.
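A minimal sketch of what an event report might contain, assuming a simple record that refers to an identified Digital Item, a User and a usage-environment snapshot, together with measured metrics. The `EventReport` and `EventLog` names, their fields and the JSON export are hypothetical; processes and transactions are not modelled, and no standardised metric set is implied.

```python
import json
import time
from dataclasses import dataclass, asdict, field


# Hypothetical Event Report record; field names are illustrative only.
@dataclass
class EventReport:
    event: str            # e.g. "play", "adapt", "transfer"
    digital_item: str     # identifier of the Digital Item involved
    user: str             # identifier of the User who triggered the event
    environment: dict     # snapshot of the usage environment
    metrics: dict         # measured values, e.g. {"duration_s": 30}
    timestamp: float = field(default_factory=time.time)


class EventLog:
    """Collects event reports and answers simple queries over them."""

    def __init__(self):
        self.reports = []

    def record(self, report: EventReport):
        self.reports.append(report)

    def for_item(self, item_id: str):
        return [r for r in self.reports if r.digital_item == item_id]

    def total(self, item_id: str, metric: str) -> float:
        # Sum one metric over all reports about a Digital Item;
        # reports lacking the metric contribute 0.
        return sum(r.metrics.get(metric, 0) for r in self.for_item(item_id))

    def export(self) -> str:
        # Serialise all reports so that out-of-scope systems
        # (e.g. billing or analytics) can consume them.
        return json.dumps([asdict(r) for r in self.reports])
```

Exporting to a neutral serialisation reflects the aim stated above of letting processes outside MPEG-21 interoperate with the framework without depending on its internal data structures.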
The following table sets out the current timetable for MPEG-21 standardisation:
Part |
Title |
CfP |
WD |
CD PDAM PDTR |
FCD FPDAM |
FDIS FDAM DTR DCOR |
IS AMD TR COR |
MPEG-21 |
|||||||
1 |
Vision, Technologies and Strategy |
Published |
|||||
2 |
Digital Item Declaration |
02/12 |
|||||
3 |
Digital Item Identification |
02/12 |
|||||
4 |
Intellectual Property Management and Protection |
03/03 |
03/10 |
03/12 |
04/07 |
04/09 |
|
5 |
Rights Expression Language |
02/07 |
02/12 |
03/07 |
03/09 |
||
6 |
Rights Data Dictionary |
02/07 |
02/12 |
03/07 |
03/09 |
||
7 |
Digital Item Adaptation |
02/05 |
02/12 |
03/03 |
03/07 |
03/09 |
|
8 |
Reference Software |
02/12 |
03/07 |
03/10 |
04/03 |
04/07 |
|
9 |
File Format |
02/07 |
02/12 |
03/03 |
03/07 |
03/09 |
Also see: http://www.itscj.ipsj.or.jp/sc29/29w42911.htm#MPEG-21
[1] ISO/IEC TR 21000-1:2001(E) Part 1: Vision, Technologies and Strategy, freely downloadable from http://www.iso.ch/iso/en/ittf/PubliclyAvailableStandards
[2] The current version of the MPEG-21 Requirements document can be found at http://mpeg.telecomitalialab.com/working_documents.htm
[3] The act of submitting an identifier to a network service and receiving in return one or more pieces of some information (which includes resources, descriptions, another identifier, Digital Item, etc.) related to the identifier
[4] The term ‘content’ is used by many industries with various meanings. In the current specifications of MPEG-21, the term ‘content’ has therefore been replaced by ‘Resource’ (for a definition, see section 5.2.11).
[5] More information can be found in the REL and RDD Working Draft specifications (which will become Parts 5 and 6, respectively), which are attached to this Call for Requirements.
[6] Digital Item Adaptation is the subject of a Call for Proposals, which is attached to this CfR for information.