30 Distributed object-based Grid computing environments

Tomasz Haupt¹ and Marlon E. Pierce²
¹Mississippi State University, Starkville, Mississippi, United States
²Indiana University, Bloomington, Indiana, United States

Grid Computing – Making the Global Infrastructure a Reality. Edited by F. Berman, A. Hey and G. Fox. © 2003 John Wiley & Sons, Ltd. ISBN: 0-470-85319-0

30.1 INTRODUCTION

Computational Grid technologies hold the promise of providing global scale distributed computing for scientific applications. The goal of projects such as Globus [1], Legion [2], Condor [3], and others is to provide some portion of the infrastructure needed to support ubiquitous, geographically distributed computing [4, 5]. These metacomputing tools provide such services as high-throughput computing, single login to resources distributed across multiple organizations, and common Application Programming Interfaces (APIs) and protocols for information, job submission, and security services across multiple organizations. This collection of services forms the backbone of what is popularly known as the computational Grid, or just the Grid.

The service-oriented architecture of the Grid, with its complex client tools and programming interfaces, is difficult to use for the application developers and end users. The perception of complexity of the Grid environment comes from the fact that often Grid services address issues at levels that are too low for the application developers (in terms of API and protocol stacks). Consequently, there are not many Grid-enabled applications, and in general, the Grid adoption rate among the end users is low.

By way of contrast, industry has undertaken enormous efforts to develop easy user interfaces that hide the complexity of underlying systems.
Through Web portals the user has access to a wide variety of services such as weather forecasts, stock market quotes and on-line trading, calendars, e-mail, auctions, air travel reservations and ticket purchasing, and many others yet to be imagined. It is the simplicity of the user interface, which hides all implementation details from the user, that has contributed to the unprecedented success of the idea of a Web browser.

Grid computing environments (GCEs) such as computational Web portals are an extension of this idea. GCEs are used for aggregating, managing, and delivering Grid services to end users, hiding the complexity of those services behind user-friendly interfaces. A computational Web portal takes advantage of the technologies and standards developed for Internet computing such as HTTP, HTML, XML, CGI, Java, CORBA [6, 7], and Enterprise JavaBeans (EJB) [8], using them to provide browser-based access to High Performance Computing (HPC) systems (both on the Grid and off). A further potential advantage of these environments is that they may be merged with more mainstream Internet technologies, such as information delivery and archiving and collaboration.

Besides simply providing a good user interface, computing portals designed around distributed object technologies provide the concept of persistent state to the Grid. The Grid infrastructure is implemented as a bag of services. Each service performs a particular transaction following a client-server model. Each transaction is either stateless or supports only a conversational state. This model closely resembles the HTTP-based Web transaction model: the user makes a request by pointing the Web browser to a particular URL, and a Web server responds with the corresponding, possibly dynamically generated, HTML page. However, the very early Web developers found this model too restrictive.
Nowadays, most Web servers utilize object- or component-oriented technologies, such as EJB or CORBA, for session management, multistep transaction processing, persistence, user profiles, enterprise-wide access to resources (including databases), and the incorporation of third-party services. There is a remarkable similarity between the current capabilities of the Web servers (the Web technologies), augmented with Application Servers (the Object and Component Technologies), and the required functionality of a Grid Computing Environment.

This paper provides an overview of Gateway and the Mississippi Computational Web Portal (MCWP). These projects are being developed separately at Indiana University and Mississippi State University, respectively, but they share a common design heritage. The key features of both MCWP and Gateway are the use of XML for describing portal metadata and the use of distributed object technologies in the control tier.

30.2 DEPLOYMENT AND USE OF COMPUTING PORTALS

In order to make concrete the discussion presented in the introduction, we describe below our deployed portals. These provide short case studies on the types of portal users and the services that they require.

30.2.1 DMEFS: an application of the Mississippi Computational Web Portal

The Distributed Marine Environment Forecast System (DMEFS) [9] is a project of the Mississippi State team that is funded by the Office of Naval Research. DMEFS's goal is to provide an open framework to simulate the littoral environments across many temporal and spatial scales in order to accelerate the evolution of timely and accurate forecasting. DMEFS is expected to provide a means for substantially reducing the time to develop, prototype, test, validate, and transition simulation models to operation, as well as to support a genuine, synergistic collaboration among the scientists, the software engineers, and the operational users.
In other words, the resulting system must provide an environment for model development, including model coupling, model validation and data analysis, routine runs of a suite of forecasts, and decision support.

Such a system has several classes of users. The model developers are expected to be computer savvy domain specialists. On the other hand, operational users who routinely run the simulations to produce daily forecasts have only a limited knowledge of how the simulations actually work, while decision-support users are typically interested only in accessing the end results. The first type of users typically benefits from services such as archiving and data pedigree as well as support for testing and validation. The second type of users benefits from an environment that simplifies the complicated task of setting up and running the simulations, while the third type needs ways of obtaining and organizing results.

DMEFS is in its initial deployment phase at the Naval Oceanographic Office Major Shared Resource Center (MSRC). In the next phase, DMEFS will develop and integrate metadata-driven access to heterogeneous, distributed data sources (databases, data servers, scientific instruments). It will also provide support for data quality assessment, data assimilation, and model validation.

30.2.2 Gateway support for commodity codes

The Gateway computational Web portal is deployed at the Army Research Laboratory MSRC, with additional deployment approved for the Aeronautical Systems Center MSRC. Gateway's initial focus has been on simplifying access to commercial codes for novice HPC users. These users are assumed to understand the preprocessing and postprocessing tools of their codes on their desktop PC or workstation but not to be familiar with common HPC tasks such as queue script writing and job submission and management.
Problems using HPC systems are often aggravated by the use of different queuing systems between and even within the same center, poor access for remote users caused by slow network speeds at peak hours, changing locations for executables, and licensing issues for commercial codes. Gateway attempts to hide or manage as many of these details as possible, while providing a browser front end that encapsulates sets of commands into relatively few portal actions. Currently, Gateway supports job creation, submission, monitoring, and archiving for ANSYS, ZNS, and Fluent, with additional support planned for CTH. Gateway interfaces to these codes are currently being tested by early users.

Because Gateway must deal with applications with restricted source codes, we wrap these codes in generic Java proxy objects that are described in XML. The interfaces for the invocation of these services likewise are expressed in XML, and we are in the process of converting our legacy service description to the Web service standard Web Services Description Language (WSDL) [10].

Gateway also provides secure file transfer, job monitoring, and job management through a Web browser interface. These are currently integrated with the application interfaces but have proven popular on their own and so will be provided as stand-alone services in the future.

Future plans for Gateway include integration with the Interdisciplinary Computing Environment (ICE) [11], which provides visualization tools and support for light code coupling through a common data format. Gateway will support secure remote job creation and management for ICE-enabled codes, as well as secure, remote, sharable visualization services.

30.3 COMPUTING PORTAL SERVICES

One may build computational environments such as the ones above out of a common set of core services.
We list the following as the base set of abstract service definitions, which may be (but are not necessarily) implemented more or less directly with typical Grid technologies in the portal middle tier.

1. Security: Allow access only to authenticated users, give them access only to authorized areas, and keep all communications private.
2. Information resources: Inform the user about available codes and machines.
3. Queue script generation: On the basis of the user's choice of code and host, create a script to run the job for the appropriate queuing system.
4. Job submission: Through a proxy process, submit the job with the selected resources for the user.
5. Job monitoring: Inform the user of the status of his submitted jobs, and more generally provide events that allow loosely coupled applications to be staged.
6. File transfer and management: Allow the user to transfer files between his desktop computer and a remote system and to transfer files between remote systems.

Going beyond the initial core services above, both MCWP and Gateway have identified, and have implemented or are in the process of implementing, the following GCE-specific services.

1. Metadata-driven resource allocation and monitoring: While indispensable for acquiring adequate resources for an application, allocation of remote resources adds to the complexity of all user tasks. To simplify this chore, one requires a persistent and platform-independent way to express computational tasks. This can be achieved by the introduction of application metadata. This user service combines standard authentication, information, resource allocation, and file transfer Grid services with GCE services: metadata discovery, retrieval and processing, metadata-driven Resource Specification Language (RSL) (or batch script) generation, resource brokerage, access to remote file systems and data servers, logging, and persistence.
2. Task composition or workflow specification and management: This user service automates mundane user tasks such as data preprocessing and postprocessing, file transfers, format conversions, scheduling, and so on. It replaces the nonportable 'spaghetti' shell scripts currently widely used. It requires task composition tools capable of describing the workflow in a platform-independent way, since some parts of the workflow may be performed on remote systems. The workflow is built hierarchically from reusable modules (applications), and it supports different mechanisms for triggering execution of modules: from static sequences with branches to data flow to event-driven systems. The workflow manager combines information, resource brokers, events, resource allocation and monitoring, file transfer, and logging services.
3. Metadata-driven, real-time data access service: Certain simulation types perform assimilation of observational data or analyze experimental data in real time. These data are available from many different sources in a variety of formats. Built on top of the metadata, file transfer and persistence services, this user service closely interacts with the resource allocation and monitoring or workflow management services.
4. User space, persistency, and pedigree service: This user service provides support for reuse and sharing of applications and their configuration, as well as for preserving the pedigree of all jobs submitted by the user. The pedigree information allows the user to reproduce any previous result on the one hand and to localize the product of any completed job on the other. It collects data generated by other services, in particular, by the resource allocation and workflow manager.

30.4 GRID PORTAL ARCHITECTURE

A computational Web portal is implemented as a multitier system composed of clients running on the users' desktops or laptops, portal servers providing user level services (i.e.
portal middleware), and backend servers providing access to the computing resources.

30.4.1 The user interface

The user interacts with the portal through a Web browser, a client application, or both. The central idea of both the Gateway and the MCWP user interfaces is to allow users to organize their work into problem contexts, which are then subdivided into session contexts in Gateway terminology, or projects and tasks using MCWP terms. Problems (or projects) are identified by a descriptive name handle provided by the user, with sessions automatically created and time-stamped to give them unique names. Within a particular session (or task), the user chooses applications to run and selects computing resources to use. This interface organization is mapped to components in the portal middleware (user space, persistency, and pedigree services) described below. In both cases, the Web browser-based user interface is developed using JavaServer Pages (JSP), which allow us to dynamically generate Web content and interface easily with our Java-based middleware.

The Gateway user interface provides three tracks: code selection, problem archive, and administration. The code selection track allows the user to start a new problem, make an initial request for resources, and submit the job request to the selected host's queuing system. The problem archive allows the user to revisit and edit old problem sessions so that he/she can submit his/her job to a different machine, use a different input file, and ...
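
The organization of work into problem contexts and time-stamped session contexts, as described in Section 30.4.1, can be illustrated with a small sketch. This is not the Gateway or MCWP implementation (both portals use Java-based middleware and JSP); Python is used here only for brevity. The class names, the timestamp format, the numeric suffix used to resolve name collisions, and the example handle and host name are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class SessionContext:
    """One session (Gateway) or task (MCWP) within a problem."""
    name: str                          # unique, time-stamped name
    application: Optional[str] = None  # code chosen by the user
    host: Optional[str] = None         # computing resource chosen by the user


@dataclass
class ProblemContext:
    """One problem (Gateway) or project (MCWP), named by the user."""
    handle: str
    sessions: Dict[str, SessionContext] = field(default_factory=dict)

    def new_session(self, now: Optional[datetime] = None) -> SessionContext:
        """Create a session whose name is derived from a timestamp.

        The name format and the collision suffix are assumptions; the
        source only states that sessions are time-stamped to be unique.
        """
        now = now or datetime.now(timezone.utc)
        base = now.strftime("%Y%m%d-%H%M%S")
        name = base
        suffix = 1
        while name in self.sessions:   # two sessions in the same second
            suffix += 1
            name = f"{base}-{suffix}"
        session = SessionContext(name=name)
        self.sessions[name] = session
        return session


if __name__ == "__main__":
    problem = ProblemContext(handle="wing-flutter")   # hypothetical handle
    session = problem.new_session()
    session.application = "ANSYS"                     # one of the supported codes
    session.host = "hpc1.example.org"                 # hypothetical host
    print(problem.handle, session.name, session.application, session.host)
```

Keeping sessions in a dictionary keyed by their generated names mirrors the idea of a problem archive: an old session can be looked up by name and resubmitted with a different host or input file.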