20 MPEG System — Video, Audio, and Data Multiplexing

In this chapter, we present the methods and standards that specify how to multiplex and synchronize MPEG-coded video, audio, and other data into a single bitstream or multiple bitstreams for storage and transmission.

20.1 INTRODUCTION

ISO/IEC MPEG has completed work on the ISO/IEC 11172 and 13818 standards, known as MPEG-1 and MPEG-2, respectively, which deal with the coding of digital audio and video signals. Currently, ISO/IEC is working on ISO/IEC 14496, known as MPEG-4, which is an object-based generic coding standard for multimedia applications. As mentioned in the previous chapters, the MPEG-1, 2, and 4 standards are designed as generic standards and as such are suitable for use in a wide range of audiovisual applications. The coding part of the standards converts the digital visual, audio, and data signals to compressed formats that are represented as binary bits. The task of the MPEG system is to multiplex and synchronize the coded audio, video, and data into a single bitstream or multiple bitstreams. In other words, the compressed digital video, audio, and data are all first represented in binary formats, which are referred to as bitstreams, and the function of the system is then to mix the video, audio, and data bitstreams together. For this purpose, several issues have to be addressed by the system part of the standard:

· Distinguishing different data, such as audio, video, or other data;
· Allocating bandwidth during muxing;
· Reallocating or decoding the different data during demuxing;
· Protecting the bitstreams in error-prone media and detecting the errors;
· Dynamically multiplexing several bitstreams.

Additional requirements for the system include extensibility issues, such as:

· New service extensions should be possible;
· Existing decoders should recognize and ignore data they cannot understand;
· The syntax should have extension capacity.
It should also be noted that all system-timing signals are included in the bitstream. This is a major difference between digital systems and traditional analog systems, in which the timing signals are transmitted separately. In this chapter, we will introduce the concept of systems and give detailed explanations of existing standards such as MPEG-2. However, we will not go through the standards page by page to explain the syntax; instead, we will pay more attention to the core parts of the standard and to the parts that tend to cause confusion during implementation. One of the key issues is system timing. For MPEG-4, we will present the current status of the system part of the standard.

© 2000 by CRC Press LLC

FIGURE 20.1 Simplified overview of system layer scope. (From ISO/IEC 13818-1, 1996. With permission.)

20.2 MPEG-2 SYSTEM

The MPEG-2 system standard is also referred to as ITU-T Rec. H.222.0/ISO/IEC 13818-1 (ISO/IEC, 1996). The ISO document gives a very detailed description of this standard. A simplified overview of this system is shown in Figure 20.1. The MPEG-2 system coding is specified in two forms: the transport stream and the program stream. Each is optimized for a different set of applications. The audio and video data are first encoded by an audio encoder and a video encoder, respectively. The coded data are the compressed bitstreams, which follow the syntax rules specified by the video-coding standard 13818-2 and the audio-coding standard 13818-3. The compressed audio and video bitstreams are then packetized into packetized elementary streams (PES). The video PES and audio PES are multiplexed by the system coding into a transport stream or a program stream, according to the requirements of the application. The system coding provides a coding syntax that is necessary and sufficient to synchronize the decoding and presentation of the video and audio information; at the same time, it has to ensure that the data buffers in the decoders neither overflow nor underflow.
Of course, buffer regulation is also handled by the buffer control or rate control mechanism in the encoder. The video, audio, and data information are multiplexed according to the system syntax by inserting time stamps for decoding, presenting, and delivering the coded audio, video, and other data. It should be noted that both the program stream and the transport stream use packet-oriented multiplexing.

Before we explain these streams, we first give a set of parameter definitions used in the system documents. Then, we describe the overall picture of the basic multiplexing approach for single video and audio elementary streams.

20.2.1 MAJOR TECHNICAL DEFINITIONS IN THE MPEG-2 SYSTEM DOCUMENT

In this section, the technical definitions that are often used in the system document are provided. First, the major packet- and stream-related definitions are given.

Access unit: A coded representation of a presentation unit. In the case of audio, an access unit is the coded representation of an audio frame. In the case of video, an access unit includes all the coded data for a picture, and any stuffing that follows it, up to but not including the start of the next access unit. In other words, the access unit begins with the first byte of the first start code. For the last picture before a sequence end code, all bytes between the last byte of the coded picture and the sequence end code belong to the access unit.

DSM-CC: Digital storage media command and control.

Elementary stream (ES): A generic term for one of the coded video, coded audio, or other coded bitstreams in PES packets. One elementary stream is carried in a sequence of PES packets with one and only one stream identification. This implies that one elementary stream can only carry the same type of data, such as audio or video.

Packet: A packet consists of a header followed by a number of contiguous bytes from an elementary data stream.
Packet identification (PID): A unique integer value used to associate elementary streams of a program in a single- or multiprogram transport stream. It is a 13-bit field, which indicates the type of data stored in the packet payload.

PES packet: The data structure used to carry elementary stream data. It contains a PES packet header followed by a PES packet payload.

PES stream: A PES stream consists of PES packets, all of whose payloads consist of data from a single elementary stream, and all of which have the same stream identification. Specific semantic constraints apply.

PES packet header: The leading fields in the PES packet up to and not including the PES packet data byte fields. Its function will be explained in the section on syntax description.

System target decoder (STD): A hypothetical reference model of a decoding process used to describe the semantics of the MPEG-2 system-multiplexed bitstream.

Program-specific information (PSI): PSI includes normative data that are used by decoders for demultiplexing programs in the transport stream. One case of PSI, the nonmandatory network information table, is privately defined.

System header: The leading fields of program stream packets.

Transport stream packet header: The leading fields of transport stream packets.

The following definitions are related to timing information:

Time stamp: A term that indicates the time of a specific action such as the arrival of a byte or the presentation of a presentation unit.

System clock reference (SCR): A time stamp in the program stream from which decoder timing is derived.

Elementary stream clock reference (ESCR): A time stamp in the PES stream from which decoders of the PES stream may derive timing information.

Decoding time stamp (DTS): A time stamp that may be present in a PES packet header, used to indicate the time when an access unit is decoded in the system target decoder.
Program clock reference (PCR): A time stamp in the transport stream from which decoder timing is derived.

Presentation time stamp (PTS): A time stamp that may be present in the PES packet header, used to indicate the time when a presentation unit is presented in the system target decoder.

20.2.2 TRANSPORT STREAMS

The transport stream is a stream definition designed for communicating or storing one or more programs of coded video, audio, and other kinds of data in lossy or noisy environments where significant errors may occur. A transport stream combines one or more programs with one or more time bases into a single stream. However, there are some difficulties in constructing and delivering a transport stream that contains multiple programs with independent time bases such that the overall bit rate is variable. As in other standards, the transport stream may be constructed by any method that results in a valid stream. In other words, the standards specify only the system coding syntax. In this way, all compliant decoders can decode bitstreams generated according to the standard syntax. However, the standard does not specify how the encoder generates the bitstreams. It is possible to generate transport streams containing one or more programs from elementary coded data streams, from program streams, or from other transport streams, which may themselves contain one or more programs.

FIGURE 20.2 Example of transport demultiplexing and decoding. (From ISO/IEC 13818-1, 1996. With permission.)

An important feature of a transport stream is that it is designed to make the following operations possible with minimum effort. These operations, several of which are transcoding operations, include the following:

· Retrieve the coded data from one program within the transport stream, decode it, and present the decoded results. In this operation, the transport stream is directly demultiplexed and decoded.
The data in the transport stream are constructed in two layers: a system layer and a compression layer. The system decoder decodes the transport stream and demultiplexes it into the compressed video and audio streams, which are further decoded into video and audio data by the video decoder and the audio decoder, respectively. It should be noted that nonaudio/video data are also allowed. The functions of the transport decoder include demultiplexing, depacketization, and other functions such as error detection, which will be explained in detail later. This procedure is shown in Figure 20.2.
· Extract the transport stream packets from one program within the transport stream and produce as output a new transport stream that contains only that one program. This operation can be seen as system-layer transcoding that converts a transport stream containing multiple programs to a transport stream containing only a single program. In this case, the remultiplexing operation may need to correct the PCR values to account for changes in the PCR locations in the bitstream.
· Extract the transport stream packets of one or more programs from one or more transport streams and produce as output a new transport stream. This is another kind of transcoding, which moves selected programs from one transport stream to a different one.
· Extract the contents of one program from the transport stream and produce as output a program stream. This transcoding converts a program in the transport stream to a program stream for certain applications.
· Convert a program stream to a transport stream that can be used in a lossy communication environment.

To answer the question of how to define the transport stream so that the above transcoding operations become simpler and more efficient, we begin by describing the technical details of the system specification in the following section.
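As a rough illustration of the second operation in the list above, the following Python sketch keeps only the transport packets whose PID belongs to a chosen set. It is a minimal sketch, not a complete remultiplexer: it assumes the 188-byte packet size, the 0x47 sync byte, and the PID bit positions of ISO/IEC 13818-1, and it performs neither the PCR correction mentioned above nor any rewriting of the PSI. The names `make_packet`, `iter_packets`, `packet_pid`, and `extract_pids` are hypothetical helpers introduced for this example.

```python
from typing import Iterator, Set

TS_PACKET_SIZE = 188  # bytes per transport stream packet (ISO/IEC 13818-1)
SYNC_BYTE = 0x47      # first byte of every transport stream packet


def make_packet(pid: int, fill: int = 0xFF) -> bytes:
    """Hypothetical helper: build a payload-only packet for the demo below."""
    # Byte 3 = 0x10: not scrambled, payload only, continuity counter 0.
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + bytes([fill]) * (TS_PACKET_SIZE - len(header))


def iter_packets(stream: bytes) -> Iterator[bytes]:
    """Split a byte string into 188-byte packets, checking the sync byte."""
    if len(stream) % TS_PACKET_SIZE != 0:
        raise ValueError("stream length is not a multiple of 188 bytes")
    for offset in range(0, len(stream), TS_PACKET_SIZE):
        packet = stream[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            raise ValueError(f"lost sync at byte offset {offset}")
        yield packet


def packet_pid(packet: bytes) -> int:
    """Return the 13-bit PID: low 5 bits of byte 1, then all of byte 2."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


def extract_pids(stream: bytes, wanted: Set[int]) -> bytes:
    """Keep only the packets whose PID is in `wanted`, in original order."""
    return b"".join(p for p in iter_packets(stream) if packet_pid(p) in wanted)


# Demo: a toy multiplex carrying three PIDs; keep two of them.
multiplex = b"".join(
    make_packet(pid) for pid in (0x100, 0x200, 0x101, 0x200, 0x100)
)
single = extract_pids(multiplex, {0x100, 0x101})
kept = [packet_pid(p) for p in iter_packets(single)]
```

Because every packet carries its own PID, the selection needs no knowledge of the payload, which is one reason this kind of system-layer transcoding is cheap compared with re-encoding.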
20.2.2.1 Structure of Transport Streams

As described earlier, the task of the transport stream coding layer is to allow one or more programs to be combined into a single stream. Data from each elementary stream are multiplexed together with timing information, which is used for synchronization and presentation of the elementary stream during decoding. Therefore, the transport stream consists of one or more programs, each made up of audio, video, and data elementary stream access units. The transport stream structure is a layered structure. All the bits in the transport stream are packetized into transport packets. The size of a transport packet is chosen to be 188 bytes, of which 4 bytes are used as the transport stream packet header. In the first layer, the header of the transport packet indicates whether or not the transport packet has an adaptation field. If there is no adaptation field, the transport payload may consist of only PES packets or of both PES packets and PSI packets. Figure 20.3 illustrates the case of a transport stream containing PES packets only. If the transport stream carries both PES and PSI packets, then the structure of the transport stream is as shown in Figure 20.4. If the transport stream packet header indicates that the transport stream packet includes an adaptation field, then the structure is as shown in Figure 20.5. In Figure 20.5, the appearance of the optional fields depends on the flag settings. The function of the adaptation field will be explained in the syntax section.

FIGURE 20.3 Structure of transport stream containing only PES packets. (From ISO/IEC 13818-1, 1996. With permission.)

FIGURE 20.4 Structure of transport stream containing both PES packets and PSI packets.

Before we go ahead, however, we should briefly explain the size of the transport stream packet. More specifically, why is a packet size of 188 bytes chosen? Actually, there are several reasons.
First, the transport packet size needs to be large enough that the overhead due to the transport headers is not too significant. Second, the size should not be so large that the packet-based error correction code becomes inefficient. Finally, the size of 188 bytes is also compatible with ATM: a 53-byte ATM cell carries a 48-byte payload, of which 47 bytes remain for data when one byte is used by the adaptation layer, so one transport stream packet fits exactly into four ATM cells. The size of 188 bytes is therefore not a theoretical optimum but a practical compromise.
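The packet layout described in this section can be made concrete with a short Python sketch that reads the 4-byte transport packet header. The packet size, the header size, and the 13-bit PID come from the text; the sync byte value and the exact bit positions are taken from ISO/IEC 13818-1 rather than from this chapter, and the names `TsHeader` and `parse_ts_header` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass

TS_PACKET_SIZE = 188  # bytes per transport packet (from the text)
TS_HEADER_SIZE = 4    # bytes of fixed packet header (from the text)
SYNC_BYTE = 0x47      # sync byte value defined in ISO/IEC 13818-1
ATM_DATA_BYTES = 47   # data bytes per ATM cell assumed in the text


@dataclass
class TsHeader:
    pid: int                    # 13-bit packet identification
    payload_unit_start: bool    # set when a new PES packet or PSI section starts
    has_adaptation_field: bool  # adaptation field follows the header
    has_payload: bool           # payload bytes are present
    continuity_counter: int     # 4-bit counter, per PID


def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the fixed 4-byte header of one 188-byte transport packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a transport packet must be exactly 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    # Two-bit adaptation field control: bit 1 = adaptation field, bit 0 = payload.
    adaptation_field_control = (b3 >> 4) & 0x3
    return TsHeader(
        pid=((b1 & 0x1F) << 8) | b2,
        payload_unit_start=bool(b1 & 0x40),
        has_adaptation_field=bool(adaptation_field_control & 0x2),
        has_payload=bool(adaptation_field_control & 0x1),
        continuity_counter=b3 & 0x0F,
    )


# Example packet: PID 0x100, payload unit start set, adaptation field and
# payload both present, continuity counter 0, followed by 184 zero bytes.
example = bytes([0x47, 0x41, 0x00, 0x30]) + bytes(TS_PACKET_SIZE - TS_HEADER_SIZE)
header = parse_ts_header(example)

# Size arithmetic from the text: 4 header bytes leave 184 bytes for the
# adaptation field and payload, and four 47-byte ATM payloads make one packet.
bytes_after_header = TS_PACKET_SIZE - TS_HEADER_SIZE
atm_cells_per_packet = TS_PACKET_SIZE // ATM_DATA_BYTES
```

With these numbers the fixed header costs 4/188, or roughly 2%, of the transport bit rate, which illustrates the first reason given above for not choosing a smaller packet.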