SBGrid Consortium
The SBGrid Consortium [1] is an innovative global research computing group financially supported by participating research laboratories and operated out of Harvard Medical School. SBGrid provides the global structural biology community with support for research computing. Members of the SBGrid Consortium fund SBGrid’s ongoing operations through an annual membership fee. The resulting organization is a user-supported and user-directed community resource.
SBGrid’s primary service is the collection, deployment and maintenance of a comprehensive set of software and computational tools that are useful in structural biology research. As of 2015, SBGrid curates a collection of 300 structural biology applications for installation on computers in SBGrid laboratories around the world. The SBGrid software library acts as a scientific "app store" that allows users to access a wide range of up-to-date applications without having to download, compile, configure, maintain or update software.
SBGrid also develops a specialized research computing infrastructure for structural biologists in the Boston area, and develops specialized cloud and web-based software, including the recently released AppCiter application, and the pilot SBGrid Data Bank System.
SBGrid maintains the SBGridTV YouTube channel, which houses a collection of data processing software tutorials, and organizes structural biology workshops, including the 2014 International Workshop on Data Processing in Crystallography. Seventeen workshop lectures are posted on the YouTube channel. Members also benefit from access to SBGrid-supported high performance computing (HPC) resources and training opportunities.
SBGrid Background
SBGrid was first created by Piotr Sliz as an in-house effort to support and maintain a few dozen X-ray crystallography applications in the laboratory of Stephen C. Harrison and the late Don Craig Wiley, then at Harvard University and Boston Children’s Hospital. After adding support for additional labs, SBGrid began charging user fees in 2002 to recover operational costs. It also expanded software support to include electron microscopy (EM), nuclear magnetic resonance (NMR) and other structural biology techniques. In response to user requests for Macintosh support, SBGrid recompiled most of its applications to run on the OS X platform in 2004. By 2006, the SBGrid Consortium included 37 laboratories at 14 different institutions.
SBGrid’s user-oriented community began to solidify in 2008 with its first user meeting: Quo Vadis Structural Biology (“Where is structural biology heading?”). The meeting attracted approximately 300 participants and incorporated a structural biology symposium and three workshops: scientific programming with Python; molecular visualization with Maya; and OS X programming. SBGrid held subsequent meetings in Boston (2009, 2013, 2014). In 2010, SBGrid established a Virtual Organization (SBGrid VO) within the Open Science Grid (OSG) and deployed a grid computing portal; in 2011, it hosted the Open Science Grid All-Hands Meeting at Harvard Medical School. SBGrid has become one of the top OSG users outside of high-energy physics and uses approximately 5,000,000 CPU hours per year.
In 2012, SBGrid launched a webinar program featuring software tutorials from a different developer each month. Recordings are publicly available on the SBGridTV YouTube channel. SBGrid team members have also published a guide to software licensing,[2] an editorial that advocates for better disclosure of source code,[3] and recommendations for optimizing peer review of software source code.[4]
By 2014, SBGrid had 245 member laboratories around the world.
SBGrid Membership
Laboratories interested in joining SBGrid may request a membership packet or apply through the SBGrid Consortium Registration process. SBGrid has developed an end-user licensing agreement (EULA) in cooperation with the Harvard University Office of Technology Development (OTD) to formalize its relationships with Consortium laboratories.
During the registration process, an SBGrid associate advises new labs on the hardware and computing requirements for deploying SBGrid support onsite. Once the hardware is in place, most new member laboratories are fully operational with SBGrid within two weeks of joining.
SBGrid Software Services for Members
The SBGrid team installs and maintains its collection of structural biology applications on Linux and OS X computers in member laboratories, including laptops. A few commercial applications are also supported, including Geneious for cloning and bioinformatics, incentive builds for PyMOL, and for North American labs, the Schrödinger Small-Molecule Drug Discovery Suite. Members access a complete execution environment that includes the suite of structural biology applications preconfigured to run without any additional settings.
SBGrid monitors all software websites for updates and installs major software upgrades on a monthly basis. The SBGrid team also recompiles existing software for newer releases of supported operating systems and responds to user bug reports and new software requests.
Training for SBGrid Members
SBGrid hosts monthly live webinars that feature tutorials by contributing developers and offer members the opportunity to ask the developer questions directly. This collection of tutorials is also published on the SBGridTV YouTube channel.
Resources for SBGrid Members
The SBGrid technical team offers guidance to new members in setting up an adequate computing infrastructure. Members also benefit from access to a number of other specialized computing resources, including:
- The SBGrid Data Bank (SBGrid-DB) archival system. SBGrid launched a prototype of the SBGrid-DB archival system in 2015. The SBGrid-DB system includes a web portal, a Digital Object Identifier (DOI) registration system, and a basic data replication framework. The system has been populated with datasets from 43 structural biology laboratories, and is currently undergoing optimization for increased scalability. The system will evolve to support features of the Harvard Dataverse, an open source Research Data Management System (RDMS) that provides a leading solution for data publication. Dataverse archives provide individual universities and scientific publishers with a data preservation solution.
- The Wide-Search Molecular Replacement (WSMR) computing portal,[5] a service for determining crystallographic phase using the Phaser program.[6]
- The Deformable Elastic Network (DEN) portal,[7] a service for refining low-resolution electron density data.
- A dedicated server to host the SHARP application.[8]
- A Discovery Server for Small Molecule Docking Computations with Schrödinger Glide, which is available to SBGrid members in North America. A library of 400,000 compounds, available from the ICCB-Longwood Screening Facility, has been preprocessed with Schrödinger’s LigPrep and can be incorporated in the virtual screening workflow.
- European WeNMR Grid Certificates, which SBGrid provides to the North American structural biology community. WeNMR is a grid-based platform that integrates and streamlines the computational approaches necessary for nuclear magnetic resonance (NMR) and small-angle X-ray scattering (SAXS) data analysis and structural modeling.
- XSEDE, a virtual cyberinfrastructure in the U.S. supported by the National Science Foundation, which provides access to high performance computing (HPC) by combining resources from several HPC sites.
- SBGrid operates an Open Science Grid (OSG) Virtual Organization and uses OSG opportunistic resources to support WSMR and DEN workflows. The OSG is a US-based, NSF-supported multi-disciplinary partnership that federates local, regional, community, and national cyberinfrastructures to meet researchers' high throughput computing needs.
SBGrid Resources for Software Developers
SBGrid provides developers of SBGrid-supported applications with access to the SBGrid build-test computing network at Harvard Medical School for building and testing software on a range of operating systems.
References
- Morin, A. "Cutting edge: Collaboration gets the most out of software." eLife 2013;2:e01456.
- Morin, A. "A quick guide to software licensing for the scientist-programmer." PLOS Computational Biology, July 2012.
- Morin, A. "Shining Light into Black Boxes." Science 336:159-60 (2012).
- Morin, A. "Optimizing Peer Review of Software Code." Science 341:236-237 (2013).
- Stokes-Rees, I. "Compute and data management strategies for grid deployment of high throughput protein structure studies." 3rd IEEE Workshop on Many-Task Computing on Grids and Supercomputers (2010).
- McCoy, AJ. "Phaser crystallographic software." J Appl Cryst 40:658–74 (2007).
- O’Donovan, DJ. "A grid-enabled web service for low-resolution crystal structure refinement." Acta Crystallographica D68:261-267 (2012).
- Bricogne, G. "Generation, representation and flow of phase information in structure determination: recent developments in and around SHARP 2.0." Acta Crystallographica D59:2023-30 (2003).