Part 1: Introduction
The GriPhyN (Grid Physics Network) collaboration is a team of experimental physicists and information technology (IT) researchers who plan to implement the first Petabyte-scale computational environments for data-intensive science in the 21st century. Driving the project are unprecedented requirements for geographically dispersed extraction of complex scientific information from very large collections of measured data. These requirements arise initially from the four physics experiments involved in this project, but they will also be fundamental to science and commerce in the 21st century. To meet them, GriPhyN will deploy computational environments called Petascale Virtual Data Grids (PVDGs) that meet the data-intensive computational needs of a diverse community of thousands of scientists spread across the globe.
Our team is composed of IT research groups and members of four NSF-funded frontier physics experiments. Our integrated research effort provides the coordination and tight feedback from prototypes and tests that will enable both communities to meet their goals. The four physics experiments are about to enter a new era of exploration of the fundamental forces of nature and the structure of the universe. The CMS and ATLAS experiments at the Large Hadron Collider (LHC) at CERN will search for the origins of mass and probe matter at the smallest length scales; LIGO (Laser Interferometer Gravitational-wave Observatory) will detect gravitational waves from pulsars, supernovae, and in-spiraling binary stars; and SDSS (Sloan Digital Sky Survey) will carry out an automated sky survey enabling systematic studies of stars, galaxies, nebulae, and large-scale structure.
The data analysis for these experiments presents enormous IT challenges. Communities of thousands of scientists, distributed globally and served by networks of varying bandwidths, need to extract small signals from enormous backgrounds via computationally demanding analyses of datasets that will grow from the 100 Terabyte to the 100 Petabyte scale over the next decade. The computing and storage resources required will be distributed, for both technical and strategic reasons, across national centers, regional centers, university computing centers, and individual desktops. The scale of this task far outpaces our current ability to manage and process data in a distributed environment. The GriPhyN collaboration proposes to carry out the necessary computer science and validate the concepts through a series of staged deployments, ultimately resulting in a set of production Data Grids.