07/30/2006   
 
GriPhyN Project Description: Version 1.0, August 1, 2000
Principal Investigators: Paul Avery, University of Florida
avery@phys.ufl.edu

Ian Foster, University of Chicago and Argonne National Laboratory
foster@cs.uchicago.edu
Note: The full project description is available from the original proposal.
Project Summary
The GriPhyN (Grid Physics Network) project brings together an outstanding team of information technology (IT) researchers and experimental physicists to provide the IT advances required to enable Petabyte-scale, data-intensive science in the 21st century. Driving the project are unprecedented requirements for geographically dispersed extraction of complex scientific information from very large collections of measured data. To meet these requirements, which arise initially from the four physics experiments involved in this project but will also be fundamental to science and commerce in the 21st century, the GriPhyN team will pursue IT advances centered on the creation of Petascale Virtual Data Grids (PVDG) that meet the data-intensive computational needs of a diverse community of thousands of scientists spread across the globe.

Our team is composed of seven IT research groups and members of four NSF-funded frontier physics experiments. We believe that only an integrated research effort will provide the coordination and tight feedback from prototypes and tests that will enable both communities to meet their goals. The four physics experiments are about to enter a new era of exploration of the fundamental forces of nature and the structure of the universe. The CMS and ATLAS experiments at the Large Hadron Collider will search for the origins of mass and probe matter at the smallest length scales; LIGO (Laser Interferometer Gravitational-wave Observatory) will detect the gravitational waves of pulsars, supernovae and in-spiraling binary stars; and SDSS (Sloan Digital Sky Survey) will carry out an automated sky survey enabling systematic studies of stars, galaxies, nebulae, and large-scale structure.

The data analysis for these experiments presents enormous IT challenges. Communities of thousands of scientists, distributed globally and served by networks of varying bandwidths, need to extract small signals from enormous backgrounds via computationally demanding analyses of datasets that will grow from the 100 Terabyte to the 100 Petabyte scale over the next decade. The computing and storage resources required will be distributed, for both technical and strategic reasons, across national centers, regional centers, university computing centers, and individual desktops. The scale of this task far outpaces our current ability to manage and process data in a distributed environment, requiring fundamental advances in many areas of computer science.

To meet these challenges, GriPhyN will pursue an aggressive program of fundamental IT research focused on realizing the concept of Virtual Data. Virtual Data encompasses the definition and delivery to a large community of a (potentially unlimited) virtual space of data products derived from experimental data. In this virtual data space, requests can be satisfied via direct access and/or computation, with local and global resource management, policy, and security constraints determining the strategy used. Realizing the Virtual Data concept requires IT advances in three major areas:

Virtual data technologies. Advances are required in information models and in new methods of cataloging, characterizing, validating, and archiving software components to implement virtual data manipulations.

Policy-driven request planning and scheduling of networked data and computational resources. We require mechanisms for representing and enforcing both local and global policy constraints and new policy-aware resource discovery techniques.

Management of transactions and task execution across national-scale and worldwide virtual organizations. New mechanisms are needed to meet user requirements for performance, reliability, and cost. Agent computing will be important to permit the grid to balance user requirements and grid throughput, with fault tolerance.
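
As a purely illustrative aid, the Virtual Data idea described above (a request satisfied either by direct access to an existing copy or by computation from other data products) can be sketched in a few lines of Python. This is a toy model, not GriPhyN software: the names VirtualDataCatalog, Derivation, and request, the allow_compute flag standing in for a policy constraint, and the sample data products are all hypothetical and do not come from the project description.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Derivation:
    """Hypothetical recipe: how one data product is computed from others."""
    inputs: List[str]
    transform: Callable[..., object]


@dataclass
class VirtualDataCatalog:
    """Toy catalog holding materialized products and recipes for derivable ones."""
    materialized: Dict[str, object] = field(default_factory=dict)
    derivations: Dict[str, Derivation] = field(default_factory=dict)
    log: List[Tuple[str, str]] = field(default_factory=list)

    def request(self, name: str, allow_compute: bool = True) -> object:
        # Strategy 1: direct access to a copy that already exists.
        if name in self.materialized:
            self.log.append((name, "access"))
            return self.materialized[name]
        # Strategy 2: computation, if a recipe is known and policy permits it.
        if name not in self.derivations:
            raise KeyError(f"no copy or derivation for {name!r}")
        if not allow_compute:
            raise PermissionError(f"policy forbids computing {name!r}")
        recipe = self.derivations[name]
        args = [self.request(dep, allow_compute) for dep in recipe.inputs]
        result = recipe.transform(*args)
        self.materialized[name] = result  # keep the copy for later requests
        self.log.append((name, "compute"))
        return result


# Made-up example: one stored product and two derived ones.
catalog = VirtualDataCatalog()
catalog.materialized["raw_events"] = [3, 8, 1, 9, 4]
catalog.derivations["selected"] = Derivation(
    ["raw_events"], lambda events: [e for e in events if e > 3])
catalog.derivations["count"] = Derivation(["selected"], len)

first = catalog.request("count")   # derived on demand
second = catalog.request("count")  # served from the stored copy
```

In this sketch the first request for "count" triggers computation of both derived products, while the second is answered by direct access. A real system would also have to weigh resource management, policy, and security constraints when choosing a strategy, which the single allow_compute flag only hints at.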

GriPhyN is primarily focused on achieving the fundamental IT advances required to create PVDGs, but will also work synergistically on creating PVDG software systems for community use, and applying PVDG technologies to enable distributed, collaborative analysis of data. In the process, a new generation of interdisciplinary scientists with expertise in this critical area will be educated. These goals are being pursued by an exceptional team that includes computer scientists with substantial expertise in key technology areas as well as members of the four experiments.

In order to apply these advances to the experimental data analysis problems, GriPhyN will package them in a multi-faceted, domain-independent Virtual Data Toolkit, and use this toolkit to prototype the PVDGs and support the CMS, ATLAS, LIGO, and SDSS analysis tasks. This combination of IT advances, toolkit development, and PVDG development will deliver new data analysis capabilities to the entire research community, to educators, and to students, enabling revolutionary discoveries in both fundamental computer science and physics disciplines. The challenges addressed by this program are not unique to physics, but are also encountered in biology (e.g., the Human Genome Project), medicine (e.g., the Human Brain Project), environmental science (e.g., the Earth Observing System), and many other areas. GriPhyN's results and resources thus could drive future scientific advances in these disciplines.

Note: The full project description (22 pages) is available in PDF and MS Word formats.

Senior Personnel

Argonne National Laboratory
Veronika Nefedova
Lawrence E. Price
Valerie Taylor
Steven Tuecke

California Institute of Technology
Julian J. Bunn
Takako Hickey
Albert Lazzarini
Harvey B. Newman
Roy D. Williams

Fermilab
Stephen M. Kent

Harvard University
John E. Huth

Indiana University
Randall Bramley
Dennis Gannon
Robert W. Gardner

Johns Hopkins University
Alexander Sandor Szalay

Lawrence Berkeley National Laboratory
Arie Shoshani

Northwestern University
Jennifer Schopf

San Diego Supercomputer Center
Reagan W. Moore

Stanford Linear Accelerator Center
Richard P. Mount

University of Florida
Sanguthevar Rajasekaran

University of Illinois at Chicago
Thomas A. DeFanti

University of California, Berkeley
Michael J. Franklin

University of California, San Diego
Keith A. Marzullo

University of Pennsylvania
Robert Hollebeek

University of Southern California
Ann Chervenak
Carl Kesselman

University of Texas at Brownsville
Joseph Romano

University of Wisconsin, Madison
Andrea Arpaci-Dusseau
Remzi Arpaci-Dusseau
Miron Livny

University of Wisconsin, Milwaukee
Bruce Allen

Supported by the National Science Foundation