High-performance Integrated Virtual Environment
The High-performance Integrated Virtual Environment (HIVE) is a distributed computing environment used for biological research, including analysis of next-generation sequencing (NGS) data, postmarket data, adverse events, and metagenomic data.[1]
Infrastructure
HIVE is a massively parallel distributed computing environment in which the distributed storage library and the distributed computational powerhouse are linked seamlessly.[2] Because storage and the metadata database are maintained on the same network, the system is both robust and flexible.[3] The distributed storage layer is the key software component for file and archive management and is the backbone of the deposition pipeline. The data deposition back-end allows automatic uploads and downloads of external datasets into HIVE data repositories. The metadata database can hold specific information about the extremely large files ingested into the system (big data) as well as metadata about computations run on the system. Because this metadata is associated with each computation, it stores the parameters of every computation in the system, which eliminates manual record keeping. The details of a computational pipeline can therefore be retrieved easily later in order to validate or replicate experiments.
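The idea of attaching parameters to each computation so that a run can be replicated later can be illustrated with a minimal sketch. This is not HIVE's schema or API: the names `ComputationRecord`, `MetadataStore`, `record_run`, and `replay_spec`, and the example tool and parameter names, are all hypothetical, and an in-memory dictionary stands in for the metadata database.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ComputationRecord:
    """Hypothetical metadata record for one computation (not HIVE's schema)."""
    computation_id: int
    tool: str
    parameters: Dict[str, Any]
    input_object_ids: List[int] = field(default_factory=list)


class MetadataStore:
    """Toy in-memory stand-in for a metadata database."""

    def __init__(self) -> None:
        self._records: Dict[int, ComputationRecord] = {}
        self._next_id = 1

    def record_run(self, tool: str, parameters: Dict[str, Any],
                   input_object_ids: List[int]) -> int:
        """Store the parameters of a run at submission time and return its id."""
        record = ComputationRecord(
            self._next_id, tool, dict(parameters), list(input_object_ids)
        )
        self._records[record.computation_id] = record
        self._next_id += 1
        return record.computation_id

    def replay_spec(self, computation_id: int) -> Dict[str, Any]:
        """Return everything needed to rerun a past computation."""
        record = self._records[computation_id]
        return {
            "tool": record.tool,
            "parameters": dict(record.parameters),
            "input_object_ids": list(record.input_object_ids),
        }


# Assumed example values; "aligner" and its parameters are illustrative only.
store = MetadataStore()
run_id = store.record_run(
    "aligner", {"seed_length": 11, "max_mismatches": 2}, [101, 102]
)
spec = store.replay_spec(run_id)
```

Because the record is written when the run is submitted, the replay specification does not depend on anyone having kept notes by hand.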
What differentiates HIVE from other object-oriented databases is that it implements a set of unified APIs to search, view, and manipulate data of all types. The system also provides a highly secure hierarchical access control and permission system, which allows data access privileges to be set at fine granularity without creating a multiplicity of rules in the security subsystem. The security model, designed for sensitive data, provides comprehensive control and auditing functionality in compliance with HIVE's designation as a FISMA Moderate system.[4]
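One common way a hierarchical permission model avoids a multiplicity of rules is inheritance: a privilege granted on a parent applies to everything beneath it. The sketch below illustrates that general technique only. It does not describe HIVE's actual security subsystem, and the `Node` class, its methods, and the user and object names are hypothetical.

```python
from typing import Dict, Optional, Set


class Node:
    """Hypothetical node in a hierarchy of data objects (project > dataset > file)."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent
        # Permissions granted directly on this node: user -> set of actions.
        self.grants: Dict[str, Set[str]] = {}

    def grant(self, user: str, action: str) -> None:
        """Record a single rule on this node."""
        self.grants.setdefault(user, set()).add(action)

    def is_allowed(self, user: str, action: str) -> bool:
        """Walk up toward the root; a grant on any ancestor applies here."""
        node: Optional["Node"] = self
        while node is not None:
            if action in node.grants.get(user, set()):
                return True
            node = node.parent
        return False


project = Node("project")
dataset = Node("dataset", parent=project)
reads = Node("reads.fastq", parent=dataset)

# Two rules cover the whole subtree instead of one rule per object.
project.grant("alice", "read")
dataset.grant("bob", "write")
```

Here `reads.is_allowed("alice", "read")` is true through the grant on `project`, while `project.is_allowed("bob", "write")` is false because the grant on `dataset` does not propagate upward.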
Public Presentations
- Dr. Vahan Simonyan and Dr. Raja Mazumder gave a presentation at NIH Frontiers in Data Science[5] on HIVE acting as a bridge between research and regulatory analytics.[6][7]
- HIVE was additionally discussed in FedScoop.[8]
- HIVE was also covered in the Bio-IT World article "Inside the HIVE, the FDA's Multi-Omics Compute Architecture".[9]
References
- Simonyan, Vahan; Mazumder, Raja (2014). "High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis". Genes 5 (4): 957–81. doi:10.3390/genes5040957. PMC 4276921. PMID 25271953.
- https://hive.biochemistry.gwu.edu/help/HIVEWhitePaper_12_16_2014.pdf
- https://hive.biochemistry.gwu.edu/help/HIVEInfrastructuresUK.pdf
- Wilson, C. A.; Simonyan, V. (2014). "FDA's Activities Supporting Regulatory Application of 'Next Gen' Sequencing Technologies". PDA Journal of Pharmaceutical Science and Technology 68 (6): 626–30. doi:10.5731/pdajpst.2014.01024. PMID 25475637.
- https://datascience.nih.gov/community/datascience-at-nih/frontiers
- http://videocast.nih.gov/summary.asp?Live=18299&bhcp=1
- https://datascience.nih.gov/community/datascience-at-nih/frontiers#title4
- http://fedscoop.com/fdas-examines-nextgen-sequencing-tool
- http://www.bio-itworld.com/2014/10/22/inside-hive-fdas-multi-omics-compute-architecture.html
External links
- The public version of HIVE is at https://hive.biochemistry.gwu.edu/dna.cgi?cmd=about