GriPhyN Project Description: Version 1.0, August 1, 2000
Note: The full project description is available from the original proposal.
Project Summary
The GriPhyN (Grid Physics Network) project brings together an outstanding team of information technology (IT) researchers and experimental physicists to provide the IT advances required to enable Petabyte-scale, data-intensive science in the 21st century. Driving the project are unprecedented requirements for geographically dispersed extraction of complex scientific information from very large collections of measured data. To meet these requirements, which arise initially from the four physics experiments involved in this project but will also be fundamental to science and commerce in the 21st century, the GriPhyN team will pursue IT advances centered on the creation of Petascale Virtual Data Grids (PVDGs) that meet the data-intensive computational needs of a diverse community of thousands of scientists spread across the globe.
Our team is composed of seven IT research groups and members of four NSF-funded frontier physics experiments. We believe that only an integrated research effort will provide the coordination and tight feedback from prototypes and tests that will enable both communities to meet their goals. The four physics experiments are about to enter a new era of exploration of the fundamental forces of nature and the structure of the universe. The CMS and ATLAS experiments at the Large Hadron Collider will search for the origins of mass and probe matter at the smallest length scales; LIGO (Laser Interferometer Gravitational-wave Observatory) will detect the gravitational waves of pulsars, supernovae and in-spiraling binary stars; and SDSS (Sloan Digital Sky Survey) will carry out an automated sky survey enabling systematic studies of stars, galaxies, nebulae, and large-scale structure.
The data analysis for these experiments presents enormous IT challenges. Communities of thousands of scientists, distributed globally and served by networks of varying bandwidths, need to extract small signals from enormous backgrounds via computationally demanding analyses of datasets that will grow from the 100-Terabyte to the 100-Petabyte scale over the next decade. The computing and storage resources required will be distributed, for both technical and strategic reasons, across national centers, regional centers, university computing centers, and individual desktops. The scale of this task far outpaces our current ability to manage and process data in a distributed environment, requiring fundamental advances in many areas of computer science.
To meet these challenges, GriPhyN will pursue an aggressive program of fundamental IT research focused on realizing the concept of Virtual Data. Virtual Data encompasses the definition and delivery to a large community of a (potentially unlimited) virtual space of data products derived from experimental data. In this virtual data space, a request can be satisfied by direct access, by computation, or by a combination of the two, with local and global resource management, policy, and security constraints determining the strategy used. Realizing the Virtual Data concept requires IT advances in three major areas, which GriPhyN will target:
Virtual data technologies. Advances are required in information models and in new methods of cataloging, characterizing, validating, and archiving software components to implement virtual data manipulations.
Policy-driven request planning and scheduling of networked data and computational resources. We require mechanisms for representing and enforcing both local and global policy constraints and new policy-aware resource discovery techniques.
Management of transactions and task execution across national-scale and worldwide virtual organizations. New mechanisms are needed to meet user requirements for performance, reliability, and cost. Agent computing will be important in allowing the grid to balance user requirements against grid throughput while remaining fault tolerant.
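The Virtual Data idea described above, in which a request is satisfied either by direct access or by computation, can be illustrated with a minimal sketch. This is not GriPhyN software: the class `VirtualDataCatalog`, its `request` method, and the product names are hypothetical, and the sketch deliberately omits the policy, security, and resource-management constraints that would determine the real strategy. It also assumes the recipes contain no circular dependencies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class VirtualDataCatalog:
    """Toy catalog (hypothetical): products are either stored or derivable."""
    # Products that already exist and can be accessed directly.
    materialized: Dict[str, object] = field(default_factory=dict)
    # Product name -> (transformation, names of the input products).
    recipes: Dict[str, Tuple[Callable[..., object], List[str]]] = field(
        default_factory=dict
    )
    # Record of how each request was satisfied, for inspection.
    log: List[str] = field(default_factory=list)

    def request(self, name: str) -> object:
        # Direct access: the product has already been materialized.
        if name in self.materialized:
            self.log.append(f"fetch {name}")
            return self.materialized[name]
        if name not in self.recipes:
            raise KeyError(f"no data and no recipe for {name!r}")
        # Computation: resolve the inputs recursively, then apply the recipe.
        transform, inputs = self.recipes[name]
        args = [self.request(dep) for dep in inputs]
        self.log.append(f"compute {name}")
        value = transform(*args)
        # Keep the result so that later requests become direct accesses.
        self.materialized[name] = value
        return value


catalog = VirtualDataCatalog()
catalog.materialized["raw_events"] = [3, 8, 1, 9, 4]
catalog.recipes["selected"] = (lambda ev: [e for e in ev if e > 3], ["raw_events"])
catalog.recipes["count"] = (len, ["selected"])

first = catalog.request("count")   # derived through two computations
second = catalog.request("count")  # served directly from the stored result
```

In this sketch the first request for `count` triggers the derivation chain, while the second is a plain lookup. A real planner would choose between fetching and recomputing according to cost and policy rather than always preferring stored results.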
GriPhyN is primarily focused on achieving the fundamental IT advances required to create PVDGs, but will also work synergistically on creating PVDG software systems for community use, and applying PVDG technologies to enable distributed, collaborative analysis of data. In the process, a new generation of interdisciplinary scientists with expertise in this critical area will be educated. These goals are being pursued by an exceptional team that includes computer scientists with substantial expertise in key technology areas as well as members of the four experiments.
To apply these advances to the experimental data analysis problems, GriPhyN will package them in a multi-faceted, domain-independent Virtual Data Toolkit, and use this toolkit to prototype the PVDGs and support the CMS, ATLAS, LIGO, and SDSS analysis tasks. This combination of IT advances, toolkit development, and PVDG development will deliver new data analysis capabilities to the entire research community, to educators, and to students, enabling revolutionary discoveries in both fundamental computer science and physics disciplines. The challenges addressed by this program are not unique to physics; they are also encountered in biology (e.g., the Human Genome Project), medicine (e.g., the Human Brain Project), environmental science (e.g., the Earth Observing System), and many other areas. GriPhyN's results and resources could thus drive future scientific advances in these disciplines.
Note: The full project description (22 pages) is available in PDF and MS Word format.
Senior Personnel
Argonne National Laboratory
Veronika Nefedova
Lawrence E. Price
Valerie Taylor
Steven Tuecke
California Institute of Technology
Julian J. Bunn
Takako Hickey
Albert Lazzarini
Harvey B. Newman
Roy D. Williams
Fermilab
Stephen M. Kent
Harvard University
John E. Huth
Indiana University
Randall Bramley
Dennis Gannon
Robert W. Gardner
Johns Hopkins University
Alexander Sandor Szalay
Lawrence Berkeley National Laboratory
Arie Shoshani
Northwestern University
Jennifer Schopf
San Diego Supercomputer Center
Reagan W. Moore
Stanford Linear Accelerator Center
Richard P. Mount
University of Florida
Sanguthevar Rajasekaran
University of Illinois at Chicago
Thomas A. DeFanti
University of California, Berkeley
Michael J. Franklin
University of California, San Diego
Keith A. Marzullo
University of Pennsylvania
Robert Hollebeek
University of Southern California
Ann Chervenak
Carl Kesselman
University of Texas at Brownsville
Joseph Romano
University of Wisconsin, Madison
Andrea Arpaci-Dusseau
Remzi Arpaci-Dusseau
Miron Livny
University of Wisconsin, Milwaukee
Bruce Allen