Chapter 3
Computer Architectures for Multimedia and Video Analysis

Edmundo Sáez, University of Cordoba, Spain
José González-Mora, University of Malaga, Spain
Nicolás Guil, University of Malaga, Spain
José I. Benavides, University of Cordoba, Spain
Emilio L. Zapata, University of Malaga, Spain

Contents
3.1 Introduction
3.2 The Computing Architectures
    3.2.1 Multimedia Extensions
    3.2.2 Graphics Processing Units
3.3 Temporal Video Segmentation
    3.3.1 Temporal Video Segmentation Algorithm
3.4 Applying Multimedia Extensions to Temporal Video Segmentation
    3.4.1 Sum of Products
    3.4.2 Calculation of W1D(·)
    3.4.3 Calculation of W2D(·)
    3.4.4 1-D Convolution
    3.4.5 Gradients Magnitude
    3.4.6 Arctangent
3.5 Performance of Multimedia Extensions
3.6 Object Tracking in Video
3.7 Mapping the Tracking Algorithm in GPUs
3.8 GPU Performance with the Tracking Algorithm
3.9 Conclusions
3.10 Acknowledgments
References

Multimedia processing involves applications with highly demanding computational requirements. New capabilities have been included in modern processors to cope with these requirements, and specific architectures have been designed to increase the performance of different multimedia applications. Thus, multimedia extensions were added to general purpose processors to exploit the single-instruction multiple-data (SIMD) parallelism that appears in signal processing applications. Moreover, new architectures, such as Graphics Processing Units (GPUs), are being successfully applied to general purpose computation (GPGPU) and, specifically, to image and video processing. In this chapter we discuss the impact of multimedia extensions and GPUs on multimedia applications and illustrate this discussion with the study of two concrete applications involving video analysis.

3.1 Introduction

The growing importance of multimedia applications is greatly influencing current computer architectures. Thus, new extensions have been included in general purpose processors in order to exploit the subword parallelism (SIMD paradigm) that appears in some of these applications [1], [2]. Additionally, the capability of new processor designs, such as VLIW and SMT, to exploit the coarse- and fine-grain parallelism present in multimedia processing is being studied [3]. Also, specific architectures have been developed to speed up programs that process multimedia content; in particular, implementations based on associative processors [4] and FPGAs [5] have been proposed.
Graphics Processing Units (GPUs) were initially devoted to managing 3-D objects, providing a powerful design with plenty of functional units able to apply geometric transformations and projections to a large number of spatial points and pixels describing the 3-D scene. Nowadays, the two processors included in GPUs, named vertex and pixel processors, can be programmed using a high-level language [6]. This allows applications other than graphics, such as video decoding [37], to use the capabilities of those processing units.

Several specific benchmarks have been developed to measure the impact of architectural improvements on multimedia processing. For instance, UCLA MediaBench is a suite of multimedia applications and data sets designed to represent the workload of emerging multimedia and communications systems [7]. The Berkeley multimedia workload [8] updates the previous benchmark by adding new applications (most notably MP3 audio) and modifying several data sets.

Based on the existing benchmarks, multimedia applications can be classified into the following three groups:

- Coding/decoding algorithms. Audio and/or video signals are digitized in order to obtain better quality and low bit rates. Standardization efforts have driven the development of specific standards for video transmission, such as MPEG-1, MPEG-2, and MPEG-4, and also for image compression, such as JPEG. Signal coding and decoding are basic tools to generate multimedia content. This fact, in addition to their highly demanding computational requirements, has made these applications the most commonly used for testing the multimedia capabilities of a computer architecture.

- Media content management and analysis. The huge amount of digital content that is produced and stored daily increases the need for powerful manipulation tools. Thus, editing tools are used to cut and paste different pieces of sequences according to the audiovisual language. Simple editing effects can be applied in the compressed domain. However, more complex effects must be generated in the uncompressed domain, requiring decoding and encoding stages. As a result, an important challenge in this field is the automatic indexing of multimedia content. It is based on the application of computational techniques to the digital audiovisual information in order to analyze the content and extract descriptors. These descriptors are annotated in a database for indexing purposes, allowing the implementation of content-based retrieval techniques. Description of multimedia content is a wide area; currently, MPEG-7 is a standard that proposes both a set of multimedia descriptors and the relationships between them. It should be noted that the complexity of the analysis to be applied to the video content is related to the semantic level of the descriptors to be extracted. Thus, temporal video segmentation involves the comparison of low-level frame information to locate editing effects and camera movements, whereas object identification and tracking need more complex algorithms to achieve the required robustness.

- 3-D graphics. Real-time 3-D applications have become common in multimedia systems. The use of more realistic scenarios increases the number of polygons needed to describe the objects, so the rendering process demands more computational resources.
Current multimedia benchmarks mainly focus on the computation of specific kernels or functions. However, we consider that mapping a complete application to a concrete computing architecture can help to extract interesting conclusions about the limits of the architecture and to evaluate the impact of possible improvements.

Thus, in this chapter, we present two different applications involving video analysis: temporal video segmentation and object tracking. In the first application, the multimedia extensions implemented in general purpose processors are used to increase the performance of a temporal video segmentation algorithm. Our study extracts the computational kernels that appear in the algorithm and analyzes how to map these kernels to the available SIMD instructions. Final speedup values consider the whole application, that is, both optimized and non-optimized kernels. In the second application, we show the capabilities of GPUs for general purpose processing and illustrate the performance that this architecture can achieve by implementing an application for object tracking in video. The reported performance is compared with that of general purpose processors, showing that the achieved speedup depends on the problem size.

The rest of the chapter is organized as follows. In the next section, basic concepts about the two computing architectures are presented. Section 3.3 introduces the temporal video segmentation algorithm, including the analysis of its main computational kernels. Section 3.4 shows the optimization of the computational kernels in the proposed segmentation algorithm, and Section 3.5 analyzes the performance achieved by using multimedia extensions. The following three sections illustrate the use of the GPU for object tracking in video: Section 3.6 introduces the techniques for object tracking using a second-order approach, Section 3.7 studies the mapping of this algorithm to the GPU, and Section 3.8 compares the performance of the GPU with that of general purpose processors. Finally, in Section 3.9, the main conclusions are summarized.

3.2 The Computing Architectures

Two different computing architectures have been proposed to implement multimedia applications. Their basic concepts are presented below.

3.2.1 Multimedia Extensions

Multimedia extensions were proposed to deal with the demanding requirements of modern multimedia applications. They are based on the SIMD computation model and exploit the subword parallelism present in signal processing applications. Thus, arithmetic functions that operate on long words (e.g., 128 bits) can be subdivided to perform parallel computation on shorter words, as shown in Figure 3.1.

The major manufacturers have proposed their own sets of multimedia extensions. AMD uses the 3DNow! technology in its processors, the PowerPC processors from Motorola include the AltiVec extensions, and Sun Microsystems introduced the VIS extensions in its UltraSPARC processors. Nevertheless, the best-known extensions are those from Intel. In 1997, Intel introduced the first version of its multimedia extensions, the MMX technology. It consisted of 57 new instructions used to perform SIMD calculations on up to 8 integer operands.
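As a concrete illustration of this subword-parallel model, the following minimal C sketch performs the vertical operation of Figure 3.1 using Intel's SSE/SSE2 intrinsics (the extensions discussed next): each lane of a 128-bit register is combined with the corresponding lane of another register by a single instruction. This is our own sketch, not code from the chapter; it assumes an x86 processor with SSE2 and a compiler exposing the <emmintrin.h> intrinsics (e.g., gcc -msse2). The lane values are illustrative.

```c
/* Sketch of the vertical SIMD operation of Figure 3.1 using SSE/SSE2
 * intrinsics. Assumes an x86 CPU with SSE2 support. */
#include <emmintrin.h>
#include <stdio.h>

int main(void)
{
    /* Four single-precision lanes packed into one 128-bit XMM register. */
    __m128 x = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);     /* X4 X3 X2 X1 */
    __m128 y = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f); /* Y4 Y3 Y2 Y1 */

    /* One instruction computes Xi op Yi for all four lanes at once. */
    __m128 d = _mm_add_ps(x, y);

    float out[4];
    _mm_storeu_ps(out, d);
    printf("%.1f %.1f %.1f %.1f\n", out[0], out[1], out[2], out[3]);

    /* The same 128-bit register can instead be viewed as eight 16-bit
     * subwords: a single SSE2 instruction then performs eight integer
     * additions in parallel. */
    __m128i a = _mm_set1_epi16(7);
    __m128i b = _mm_set1_epi16(3);
    __m128i s = _mm_add_epi16(a, b);

    short r[8];
    _mm_storeu_si128((__m128i *)r, s);
    printf("first 16-bit lane: %d\n", r[0]);
    return 0;
}
```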
The FPU registers were reused as MMX registers; hence, combining floating-point and multimedia code was not advisable. The Pentium III processor introduced the SSE extensions, with 70 new instructions, some of which perform SIMD calculations on four single-precision floating-point operands. Here, the set of registers, known as XMM registers, is independent of the FPU registers. Later, the Pentium 4 introduced the SSE2 extensions, containing 144 new instructions; the main innovation was the ability to work with up to two double-precision floating-point operands. The most recent proposal from Intel is the SSE3 extensions, which include instructions to perform horizontal and asymmetric computations, instead of the vertical computations shown in Figure 3.1. A comparison of the performance of different sets of multimedia extensions is given in [23].

[Figure 3.1: Typical SIMD operation using multimedia extensions. Each lane Xi of the destination register is combined with the corresponding lane Yi of the source register by the same operation, producing Xi op Yi in every lane simultaneously.]

3.2.2 Graphics Processing Units

General purpose programming of Graphics Processing Units has become a topic of considerable interest in recent years due to the spectacular evolution of this kind of hardware, driven by the requirements of leisure and multimedia applications. Nowadays it is common to find this kind of processor in almost every personal computer and, on many occasions, its computing capability (in number of operations per second) is higher than that offered by the CPU, at a reasonable cost.

Initial GPU designs consisted of a specific fixed pipeline for graphics rendering. However, the increasing demand for customized graphics effects to fit different application requirements motivated a fast evolution of its programming possibilities in subsequent generations. The desirable characteristics of GPUs include a high memory bandwidth and a larger number of floating-point units than CPUs. These facts make these platforms interesting candidates as efficient coprocessors for computationally intensive applications on domestic platforms and workstations. The computing capabilities of GPUs as intrinsically parallel systems have been shown in many recent works, including numerical computing applications such as [35], [36].

Nowadays, the GPU is a deeply segmented architecture. The number of processing stages and their sequence differ greatly from one manufacturer to another, and even between different models. However, a common high-level structure can be distinguished, such as the one illustrated in Figure 3.2. The initial stages are related to the management of the input vertex attributes defining the graphic objects fed to the graphics card, while the final stages focus on color processing of the pixels to be shown on the screen. In this design, the vertex processors and fragment processors (darker in the figure) are the programmable stages of the pipeline.
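To make the fragment-processing model concrete, the following C sketch emulates on the CPU how a per-pixel "kernel" is applied independently to every output fragment. It is a conceptual sketch under stated assumptions, not code from the chapter: the brightness-scaling kernel and the image size are invented for illustration only. On a GPU, such a kernel would be written as a fragment program; rasterizing a full-screen quad generates one fragment per output pixel, and the hardware schedules the kernel invocations in parallel across the fragment processors of the pipeline in Figure 3.2.

```c
/* Conceptual CPU emulation of the data-parallel fragment-processing model.
 * The kernel and dimensions are illustrative assumptions, not taken from
 * the chapter's tracking application. */
#include <stdio.h>
#include <stdlib.h>

/* Per-fragment computation: it depends only on its own input, so every
 * output pixel can be computed independently (no inter-fragment state),
 * which is what makes the GPU mapping straightforward. */
static float fragment_kernel(float input_pixel, float gain)
{
    float v = input_pixel * gain;
    return (v > 1.0f) ? 1.0f : v;   /* clamp to [0, 1], as a fragment unit would */
}

int main(void)
{
    const int width = 640, height = 480;   /* assumed image size */
    float *in  = malloc(sizeof(float) * width * height);
    float *out = malloc(sizeof(float) * width * height);
    if (!in || !out) return 1;

    for (int i = 0; i < width * height; ++i)
        in[i] = (float)(i % 256) / 255.0f;  /* synthetic input "texture" */

    /* On the GPU this double loop disappears: the rasterizer generates one
     * fragment per (x, y) position and the hardware distributes the kernel
     * invocations across the fragment processors. */
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            out[y * width + x] = fragment_kernel(in[y * width + x], 1.5f);

    printf("sample output pixel: %f\n", out[0]);
    free(in);
    free(out);
    return 0;
}
```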