Chapter 12
Design and Implementation of a Grid Computing Environment for Remote Sensing

Massimo Cafaro, Euromediterranean Center for Climate Change & University of Salento, Italy
Italo Epicoco, Euromediterranean Center for Climate Change & University of Salento, Italy
Gianvito Quarta, Institute of Atmospheric Sciences and Climate, National Research Council, Italy
Sandro Fiore, Euromediterranean Center for Climate Change & University of Salento, Italy
Giovanni Aloisio, Euromediterranean Center for Climate Change & University of Salento, Italy

Contents
12.1 Introduction
12.2 Grid Computing Environments
12.3 Common Components of GCE
    12.3.1 Security
    12.3.2 Job Management
    12.3.3 Data Management
    12.3.4 Information Services
12.4 The Design and Implementation of GCEs for Remote Sensing
    12.4.1 System Overview
    12.4.2 Knowledge Container
    12.4.3 Distributed Data Management System
    12.4.4 Workflow Management System
    12.4.5 Resource Broker
12.5 The Implementation of GCE, Best Practices
    12.5.1 Front-End Interface
    12.5.2 Information and Data Management
    12.5.3 Use Case
12.6 Comparison Between a GCE for Remote Sensing versus the Classical Approach
12.7 Conclusions
References

High-Performance Computing in Remote Sensing
© 2008 by Taylor & Francis Group, LLC

This chapter presents an overview of a Grid Computing Environment designed for remote sensing. Combining recent grid computing technologies, concepts related to problem-solving environments, and high-performance computing, we show how a dynamic Earth Observation system can be designed and implemented, with the goal of managing the huge quantities of data coming from space missions and of processing and delivering them on demand to final users.

12.1 Introduction

The term remote sensing was first used in the United States in the 1950s by Evelyn Pruitt of the U.S. Office of Naval Research, and is now commonly used to describe the science of identifying, observing, and measuring an object without coming into direct contact with it. This process involves the detection and measurement of radiation at different wavelengths, reflected or emitted from distant objects or materials, by which they may be identified and categorized by class, type, substance, and spatial distribution. Remote sensing systems are thus made of:

• sensors mounted on an aircraft or a spacecraft that gather information from the Earth's surface;
• acquisition and archiving facilities that store the acquired raw data;
• computing resources that process the raw data and store them as images in distributed databases;
• on-line systems provided by space agencies for the distribution of the images to final users.

Information can be acquired by means of so-called passive sensors, which detect the radiation emitted by the sun and the radiation spontaneously emitted by the ground. It should be noted that passive sensors do not work during the night and their efficiency is strongly influenced by atmospheric conditions.
Active sensors instead detect the backscattered radiation emitted by a radar installed on board the spacecraft and, unlike passive sensors, can be used for monitoring both day and night, whatever the weather conditions. The Synthetic Aperture Radar (SAR) is such an active sensor, and it is widely used in remote sensing missions to obtain high-resolution Earth images.

What we are interested in are the system components belonging to the ground segment and devoted to the archiving, processing, and delivering of remote sensing images to the final user. The scenario is thus characterized by big distributed archives in which information is stored as raw data, that is, in the original format acquired on board the spacecraft, or as images derived from processing of the raw data. As a matter of fact, further post-processing is usually required to generate standard products to be delivered to final users.

To access the huge quantity of remote sensing data stored in the archives, some user interfaces have been developed. An example of a Web interface is the Intelligent Satellite Data Information System (ISIS) at the German Remote Sensing Data Center (DFD). This interface provides catalogue search, visualization of digital quick-looks, and electronic order placement. It is interesting to remark that links to external Earth Observation Systems (EOS) archives are also provided. Using the ISIS interface, the user can search for information through a clickable map of the world to set the geographic region of interest and other parameters like the campaign, the selected data center, the time range, and the processing level. The ISIS interface is similar to that provided by the European Space Agency Earthnet On Line.
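
To make the search parameters listed above concrete, the following is a minimal sketch of a lookup against a static catalogue. It is not the ISIS implementation: the record layout, the field names, the bounding box that stands in for the clickable map, and the sample entries are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CatalogueEntry:
    """One previously processed product (hypothetical schema)."""
    product_id: str
    campaign: str
    data_center: str
    acquired: date
    processing_level: str
    lat: float
    lon: float


@dataclass(frozen=True)
class Query:
    """Region of interest (as a bounding box), time range, and optional filters."""
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    start: date
    end: date
    campaign: Optional[str] = None
    data_center: Optional[str] = None
    processing_level: Optional[str] = None


def search(catalogue, query):
    """Return the stored entries that satisfy every filter set in the query."""
    def matches(entry):
        return (
            query.lat_min <= entry.lat <= query.lat_max
            and query.lon_min <= entry.lon <= query.lon_max
            and query.start <= entry.acquired <= query.end
            and query.campaign in (None, entry.campaign)
            and query.data_center in (None, entry.data_center)
            and query.processing_level in (None, entry.processing_level)
        )
    return [entry for entry in catalogue if matches(entry)]


# Illustrative sample data only; identifiers and values are made up.
CATALOGUE = [
    CatalogueEntry("P-001", "campaign-A", "center-1", date(2001, 5, 3), "L1", 40.3, 18.2),
    CatalogueEntry("P-002", "campaign-A", "center-2", date(2002, 7, 9), "L2", 41.0, 17.5),
    CatalogueEntry("P-003", "campaign-B", "center-1", date(2001, 6, 1), "L1", -10.0, 30.0),
]

query = Query(39.0, 42.0, 17.0, 19.0, date(2001, 1, 1), date(2001, 12, 31),
              processing_level="L1")
hits = search(CATALOGUE, query)
```

Note that such a search can only return products that already exist in the catalogue, and the user has to supply every low-level parameter explicitly.
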
These on-line systems are in our opinion good examples of Web interfaces to static Earth Observation Systems: the user is only allowed to access a static catalogue (i.e., only images previously processed and stored in the database can be retrieved) and no on-demand processing is permitted. Moreover, only a limited number of post-processing facilities are provided (no true real-time services) and the level of transparency in data access is very low, i.e., the user must know in advance where the data are stored, how to access them, etc. Through a nice Web interface, the user is well guided on a clickable map of the world, where she can select the region of interest, the data source, the acquisition time range, and the data centre, but she needs to know too many details and no high-level queries are allowed, because the system lacks the intelligence needed to translate a high-level user request into low-level actions on the system. The level of integration of multi-source data is also very low, so these systems are not capable of fusing multiple data sources together to infer new knowledge. For these reasons, although EOS data are so useful, the number of real users is very limited compared to the big investments of the international space agencies.

In this chapter, we present an overview of a Grid Computing Environment designed for remote sensing. Combining recent grid computing technologies, concepts related to problem solving environments, and high-performance computing, we show how a dynamic Earth Observation system can be designed and implemented, with the goal of managing the huge quantities of data coming from space missions and of processing and delivering them on demand to final users.
The rest of the chapter is organized as follows. Section 12.2 introduces Grid Computing Environments (GCEs), and Section 12.3 describes common grid components of GCEs. We describe the design of our GCE for remote sensing, highlighting implementation details, in Section 12.4 and report on implementation best practices in Section 12.5. We discuss the GCE approach and compare it against classic approaches in Section 12.6. Finally, we draw our conclusions in Section 12.7.

12.2 Grid Computing Environments

Due to the rapid evolution of grid computing technologies with respect to both concepts and implementations, people increasingly think about grids by clearly separating the user and the system sides. Indeed, a useful distinction in a grid system is made by considering separately how users access computing services and how these services interface with back-end resources. Usually, the user side is called the Grid Computing Environment whereas the underlying distributed system is called the Core Grid. A GCE therefore embraces the tools provided by a grid system that allow users to access grid resources and applications; it includes graphical interfaces (for authentication/authorization, job composition and submission, file management, job monitoring, resource management and monitoring, etc.), run-time support, result visualization, knowledge sharing modules, and so on [1].

Often, GCEs are implemented as Web applications (so-called grid portals) that provide users with a friendly interface to grid applications and thus to a set of resources and applications. Usually, grid portals reside on the top level of a multi-tier application development stack; the major reasons for using the Web as a transfer protocol are ubiquity, portability, reliability, and trust.
It is a comfortable, low-tech delivery system that is available at the lab or at home, or from a laptop in a hotel room: the only requirement that users must satisfy to access a Web-based GCE is the availability of a Web browser.

GCEs have also been called grid-based Problem Solving Environments (PSEs) in order to stress the natural tendency of these systems to provide services and functionalities to solve users' problems. Nevertheless, GCEs and PSEs are often confused and misunderstood: a PSE represents a complete, integrated environment for composing, compiling, and executing applications belonging to a specific area (or areas). It includes advanced features in order to assist users in creating and solving a problem on a specific platform. With this in mind, a useful distinction between GCEs and PSEs is related to their focus: while a GCE is meant to control a grid environment that can potentially be composed of thousands of resources, a PSE is specialized to one (or more) application domains and, thanks to its specialization, it can provide high-level services (such as assistance) to users in composing their applications. A PSE may access virtual libraries, knowledge containers, and sophisticated control systems related to execution and result visualization. PSEs are able to provide rapid prototyping systems and a high scientific production rate without users having to worry about hardware or software details.

In this chapter we argue that the design of a grid environment for remote sensing must take into account all of the aspects related to the specific application domain: not just the nature of the data involved in the processing (with regard to format heterogeneity, file size, etc.) and the kind of applications really used by scientists, but especially, and more than anything else, the users. Consequently, we will describe the GCE design process and its specialization to this application field.

Focusing now on general GCE features, we can describe them from the functional point of view and with regard to the technologies used for the implementation. With regard to the former, we define a GCE as a grid subsystem able to satisfy at least the following two requirements [2]:

• programming the users' side of the grid by means of best-suited programming technologies;
• controlling users' interactions in order to implement functionalities such as authentication, job composition, and submission.

These two requirements of a generic GCE can be analyzed in depth taking into account the functionalities that must be provided: user authentication, data management, job management, and resource management. Indeed, a computing environment in which the resources are geographically distributed and owned by different organizations, as in a grid system, requires proper authentication, authorization, and accounting to address the key aspects related to access and resource usage control. Moreover, we refer to all of these issues using the general term security, recalling that confidentiality and integrity of data must also be provided. Job submission and monitoring, and file transfers are two other services essential to any computing environment in which users can perform some kind of processing. Finally, in grid environments, the number of involved resources can rapidly change, and thus an efficient mechanism to manage them is required in order to react rapidly to changes in the environment due to such dynamic behavior. It is worth noting here that a resource can be a computing node, software, an instrument, and so on; consequently, the heterogeneity of these resources must be properly considered. All of these aspects are addressed by the core grid services available as grid middleware, but again, the mechanisms provided by the middleware itself are not immediately accessible to end users.
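
The point about dynamic, heterogeneous resources can be illustrated with a small sketch. The heartbeat-and-lease scheme used here is an assumption chosen for illustration, not a mechanism described in this chapter or taken from any specific grid middleware; all class and resource names are hypothetical, and time is passed in explicitly so that the behavior is easy to follow.

```python
from dataclasses import dataclass
from enum import Enum


class ResourceKind(Enum):
    """Kinds of resources mentioned in the text."""
    COMPUTE_NODE = "compute node"
    SOFTWARE = "software"
    INSTRUMENT = "instrument"


@dataclass
class Resource:
    name: str
    kind: ResourceKind
    last_seen: float  # time of the latest heartbeat, in seconds


class ResourceRegistry:
    """Tracks resources that periodically announce themselves (assumed scheme)."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self._resources = {}

    def heartbeat(self, name, kind, now):
        """Register a resource or renew its lease."""
        self._resources[name] = Resource(name, kind, now)

    def available(self, now, kind=None):
        """Drop resources whose lease expired, then list the remaining names."""
        self._resources = {
            name: res for name, res in self._resources.items()
            if now - res.last_seen <= self.lease_seconds
        }
        return sorted(
            res.name for res in self._resources.values()
            if kind is None or res.kind is kind
        )


registry = ResourceRegistry(lease_seconds=60)
registry.heartbeat("node-a", ResourceKind.COMPUTE_NODE, now=0)
registry.heartbeat("sar-processor", ResourceKind.SOFTWARE, now=30)

early = registry.available(now=50)  # both leases are still valid
late = registry.available(now=70)   # node-a has been silent for more than 60 s
```

A resource that stops announcing itself simply disappears from the listing, which is one simple way to react to changes without any central reconfiguration.
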
A GCE must implement, using the underlying service functionalities, a set of higher-level interfaces.

Let us consider now the second point of view related to the characterization of GCEs: technologies and programming languages. While the GCE functionalities can be rapidly summarized, the ways to implement a GCE are more varied and require some care. Indeed, GCEs usually exploit the core grid services provided by the underlying software modules (grid middleware). For instance, the Globus Toolkit [3] is the most widely used grid middleware, but it does not provide direct support for building GCEs: it makes available to developers a set of modules that address all of the aspects involved in the development of grid systems. The modules provided can be accessed directly through their clients and servers, but, besides offering limited functionality, this approach is rather inconvenient for scientists who are not computer experts, since it requires learning the gory details of the Globus software. Instead, scientists would rather employ their time computing and producing useful scientific results. Nevertheless, GCEs can be built using the Globus APIs with the mission to hide all of the low-level details in order to provide a friendly interface.

With respect to the programming languages and technologies suited for GCE development, we note here that many different GCE implementations exist, based on C/C++, Java, and related technologies (JavaBeans, Java Server Pages, Portlets, CORBA, XML, Web services, etc.). Often, several architectural layers have been defined in order to decouple raw resources and low-level services from user interfaces (see Figure 12.1). Examples of toolkits for the development of layered GCEs include the Grid Portal Development Toolkit [4] (GPDK, a suite of JavaBeans components suitable for Java-based GCEs) and the Java [5], CORBA [6], Python [7], and Perl [8] Commodity Grid interfaces to the Globus Toolkit (COG kits) [9].
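
The layering idea, a friendly interface on top and low-level services underneath, can be sketched as a thin facade. This is only an illustration under stated assumptions: the `Middleware` protocol, the `InMemoryMiddleware` stand-in, and the `GridPortal` class are hypothetical names, and none of them corresponds to the actual Globus Toolkit, GPDK, or CoG kit APIs.

```python
from typing import List, Protocol


class Middleware(Protocol):
    """Low-level services the portal relies on (hypothetical interface)."""

    def authenticate(self, user: str, credential: str) -> bool: ...
    def submit(self, executable: str, arguments: List[str]) -> str: ...
    def status(self, job_id: str) -> str: ...


class InMemoryMiddleware:
    """Stand-in for real grid middleware, for demonstration only."""

    def __init__(self, credentials):
        self._credentials = credentials
        self._jobs = {}

    def authenticate(self, user, credential):
        return self._credentials.get(user) == credential

    def submit(self, executable, arguments):
        job_id = f"job-{len(self._jobs) + 1}"
        self._jobs[job_id] = "PENDING"
        return job_id

    def status(self, job_id):
        return self._jobs[job_id]


class GridPortal:
    """User-facing layer: exposes a few simple calls and hides the middleware."""

    def __init__(self, middleware: Middleware):
        self._middleware = middleware
        self._user = None

    def login(self, user, credential):
        if not self._middleware.authenticate(user, credential):
            raise PermissionError("authentication failed")
        self._user = user

    def run(self, application, *inputs):
        if self._user is None:
            raise PermissionError("login required before submitting jobs")
        return self._middleware.submit(application, list(inputs))

    def job_status(self, job_id):
        return self._middleware.status(job_id)


portal = GridPortal(InMemoryMiddleware({"alice": "secret"}))
portal.login("alice", "secret")
job_id = portal.run("sar_focus", "raw.dat")  # application and file names are made up
```

Because the portal depends only on the small `Middleware` interface, the lower layer can be replaced (for example by a wrapper around a real toolkit) without touching the user-facing code.
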
In general, distributed object technology is often used due to the distributed nature of both the software and hardware objects.

It is clear that GCEs can be classified with regard to the technologies and languages used. In this case, considerations about the performances and their relation with the programming language can be drawn in order to support the design phase [10].

We conclude this section by introducing OGSA and WSRF. Open Grid Services Architecture (OGSA) and Web Services Resource Framework (WSRF) specifications ...