Nirvana (software)
Developer(s) | General Atomics |
---|---|
Initial release | August 8, 2003 |
Stable release | 4.3.05 / February 6, 2014 |
Preview release | 5.0 / January 15, 2016 |
Development status | Current |
Written in | C |
Operating system | Linux, Microsoft Windows, OS X, Solaris |
Platform | x86-64, POWER8, SPARC |
Type | Metadata and data management software |
License | Proprietary commercial software |
Website | www |
Nirvana is metadata, data placement and data management software that lets organizations manage unstructured data on multiple storage devices located anywhere in the world. It is used to orchestrate global, data-intensive workflows and to search for and locate data regardless of where it is stored or when it was created. Nirvana does this by capturing system and user-defined metadata, which enables detailed search and drives policies that control data movement and protection. Nirvana also maintains data provenance, audit trails, security and access control. It can reduce storage costs by identifying data that can be moved to lower-cost storage and data that no longer needs to be stored.
History
Nirvana is the result of research started in 1995 at the San Diego Supercomputer Center (SDSC) (which was founded by, and at the time run by, General Atomics[1]), in response to a DARPA-sponsored project for a Massive Data Analysis System.[2] Led by General Atomics computational plasma physicist Dr. Reagan Moore, development continued through the cooperative efforts of General Atomics and the SDSC on the Storage Resource Broker (SRB), with the support of the National Science Foundation (NSF). SRB 1.1 was delivered in 1998,[3] demonstrating a logical distributed file system with a single Global Namespace across geographically distributed storage systems.
In 2003, General Atomics turned over operation of the SDSC to the University of California San Diego (UCSD), and Dr. Moore became a full-time professor there, establishing the Data Intensive Computing Environments (DICE) Center and continuing development of SRB. In that same year, General Atomics acquired the exclusive license to develop a commercial version of SRB, calling it Nirvana.[4] The DICE team ended development of SRB in 2006 and started a rule-oriented data management project called iRODS[5] for open-source distribution. Dr. Moore and his DICE team relocated to the University of North Carolina at Chapel Hill, where they continue to develop iRODS. General Atomics continued development of Nirvana at its San Diego headquarters, focusing on capabilities to serve government and commercial users, including high scalability, fail-over, performance, implementation, maintenance and support.
In 2009, General Atomics won a data management contract with the US Department of Defense (DOD).[6] The requirements of this contract led General Atomics to improve Nirvana’s performance, scalability, security and ease of use. A major deliverable involved integrating Nirvana with Oracle Corporation's SAM-QFS filesystem to provide a policy-based Hierarchical Storage Management (HSM) system with near real-time event synchronization. General Atomics also announced that digital marketing firm infoGROUP deployed Nirvana to create a Global Namespace across three of infoGROUP’s computer operations centers in the Omaha area.[7]
In 2012, General Atomics released Nirvana version 4.3.[8]
In 2014, General Atomics changed the Nirvana business model from a fee-for-service model based on large government contracts to a standard commercial software model.
In 2015, General Atomics initiated a strategic relationship with Pixit Media/ArcaStream in the United Kingdom, integrating Nirvana with Pixit Media and ArcaStream’s products.[9]
Architecture and Operation
Nirvana is client-server software composed of Location Agents that reside on, or access, Storage Resources. A Storage Resource can be a network-attached storage (NAS) system, object storage system or cloud storage service. Nirvana catalogs the location of the files and objects in these storage resources into its Metadata Catalog (MCAT) and tags the files with storage system metadata (Owner, File Name, File Size and Creation, Change, Modification and Access Timestamps) and additional user-defined, domain-specific metadata. System and user-defined metadata can be used to search for a file or object (or groups of files and objects), to control access to those files and objects, and to move them from one storage resource to another. The MCAT creates a single Global Namespace across all Storage Resources connected to it, so users and administrators can search for, access, and move data across heterogeneous storage systems from multiple vendors in geographically dispersed data centers. The MCAT is connected to and interacts with a relational database management system to support its operation. Multiple MCATs can be deployed for horizontal scale-out and failover. Various Clients can interact with Nirvana, including the supplied Web browser and Java-based GUI Clients, a Command Line Interface, a native Windows virtual network drive interface, and user-developed applications via supplied APIs.
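The idea of a single namespace over several storage resources can be illustrated with a minimal Python sketch. This is not Nirvana's API or the MCAT schema: the class names, field names, resource names and paths below are all hypothetical, chosen only to show how a logical path can stay fixed while the physical location changes.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One record in a toy metadata catalog (hypothetical schema)."""
    resource: str                                    # storage resource holding the data
    physical_path: str                               # location on that resource
    system_meta: dict = field(default_factory=dict)  # owner, size, timestamps
    user_meta: dict = field(default_factory=dict)    # domain-specific tags


class ToyCatalog:
    """Maps logical paths in one namespace to physical locations."""

    def __init__(self):
        self._entries = {}

    def register(self, logical_path, entry):
        self._entries[logical_path] = entry

    def locate(self, logical_path):
        entry = self._entries[logical_path]
        return entry.resource, entry.physical_path

    def relocate(self, logical_path, new_resource, new_physical_path):
        # The logical path is unchanged; only the physical location moves.
        entry = self._entries[logical_path]
        entry.resource = new_resource
        entry.physical_path = new_physical_path


catalog = ToyCatalog()
catalog.register(
    "/projects/fusion/run42.dat",
    CatalogEntry(
        "nas-sandiego",
        "/vol1/a/run42.dat",
        system_meta={"owner": "alice", "size": 1024},
        user_meta={"Project": "fusion"},
    ),
)
before = catalog.locate("/projects/fusion/run42.dat")
catalog.relocate("/projects/fusion/run42.dat", "object-store-eu", "bucket/run42.dat")
after = catalog.locate("/projects/fusion/run42.dat")
print(before, after)
```

Clients in this sketch only ever use the logical path, which is why data can be moved between resources without users having to learn a new location.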
Nirvana operation is controlled by three daemons: Metadata, Sync and ILM. The Metadata Daemon can extract metadata automatically from an instrument creating data, from within the file's actual data using predefined and customizable templates and metadata parsing policies, or from user input captured via the GUI or Command Line Interface. The Sync Daemon, running in the background, detects when files are added to, or deleted from, the underlying Storage Resource filesystems. When the Sync Daemon observes filesystem changes, the changes are registered and updated in the MCAT. The ILM Daemon routinely queries the MCAT and executes actions including migration, replication, or backup on a specified schedule. For example, an administrator can set a policy to free up space on an expensive primary storage system by migrating data to distributed retention locations based on criteria such as storage consumption watermarks (percent full), all data associated with a specific project, or data that hasn't been accessed in over one year. User-defined metadata attributes (e.g. Project, Principal investigator, Data source, Location, Temperature, etc.) can also be used to select data to move. Nirvana ILM policy execution occurs behind the scenes, transparent to end-users and applications.
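The sync and policy steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Nirvana's implementation: the function names, the record layout, the 80% watermark and the choice to treat the watermark as a gate in front of the other criteria are all hypothetical.

```python
from datetime import datetime, timedelta


def reconcile(filesystem_paths, catalog_paths):
    """Return (added, removed) paths, as a background sync pass might."""
    fs, cat = set(filesystem_paths), set(catalog_paths)
    return sorted(fs - cat), sorted(cat - fs)


def select_for_migration(records, now, used_fraction, high_watermark=0.80,
                         max_idle=timedelta(days=365), project=None):
    """Pick paths to migrate off primary storage (hypothetical policy)."""
    if used_fraction < high_watermark:
        return []  # storage is below the watermark, so nothing is moved
    selected = []
    for rec in records:
        idle = now - rec["last_access"] > max_idle
        in_project = project is not None and rec["user_meta"].get("Project") == project
        if idle or in_project:
            selected.append(rec["path"])
    return selected


now = datetime(2016, 1, 1)
records = [
    {"path": "/a", "last_access": datetime(2014, 6, 1), "user_meta": {"Project": "active"}},
    {"path": "/b", "last_access": datetime(2015, 12, 1), "user_meta": {"Project": "closed"}},
    {"path": "/c", "last_access": datetime(2015, 11, 1), "user_meta": {"Project": "active"}},
]

added, removed = reconcile(["/a", "/b", "/d"], ["/a", "/b", "/c"])
to_move = select_for_migration(records, now, used_fraction=0.9, project="closed")
nothing = select_for_migration(records, now, used_fraction=0.5, project="closed")
print(added, removed, to_move, nothing)
```

A real scheduler would run such a selection periodically and hand the result to a migration step; here the functions only compute which paths qualify.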
Use Cases
Data Aware Cloud Storage Gateway
Nirvana's ILM functionality can be used as a Cloud Storage Gateway, where data stored locally, on premises, can be moved to popular cloud storage services based on Nirvana's various metadata attributes and policies. In 2015, General Atomics and ArcaStream announced a Cloud Storage Appliance that uses IBM's GPFS for on-premises storage and integrates with cloud storage providers Amazon S3 and Google Cloud Storage.[10]
Advanced Search
Nirvana can be used to run search queries that find data of interest using both system and user-defined metadata. Queries are entered either in the Command Line Interface or through the Web browser client.
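Because the catalog is backed by a relational database, a search that mixes system and user-defined metadata can be pictured as a join. The sketch below uses Python's standard `sqlite3` module with an in-memory database. The table layout, column names and sample rows are assumptions made for illustration and do not reflect Nirvana's actual schema or query syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT, owner TEXT, size INTEGER);
    CREATE TABLE user_meta (file_id INTEGER, attr TEXT, value TEXT);
""")
conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?)", [
    (1, "/lab/scan001.img", "alice", 5000000),
    (2, "/lab/scan002.img", "bob", 200),
    (3, "/lab/notes.txt", "alice", 900),
])
conn.executemany("INSERT INTO user_meta VALUES (?, ?, ?)", [
    (1, "Project", "tokamak"),
    (2, "Project", "tokamak"),
    (3, "Project", "admin"),
])


def search(conn, owner, attr, value, min_size=0):
    """Find paths matching system metadata (owner, size) and one user tag."""
    rows = conn.execute(
        """
        SELECT f.path
        FROM files AS f
        JOIN user_meta AS m ON m.file_id = f.id
        WHERE f.owner = ? AND f.size >= ? AND m.attr = ? AND m.value = ?
        ORDER BY f.path
        """,
        (owner, min_size, attr, value),
    ).fetchall()
    return [path for (path,) in rows]


hits = search(conn, owner="alice", attr="Project", value="tokamak", min_size=1000)
print(hits)
```

Storing user-defined attributes as rows rather than columns is one common way to let users add new attribute names without changing the schema; whether Nirvana does this is not stated in the source.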
Virtual Collections
Nirvana can automate the grouping and distribution of data files into a virtual collection based on user-defined logical rules. For example, user-defined metadata can be used to identify data files that need to be transferred between collaborators by domain-specific attributes (experiment, study, project, etc.).
Data Provenance
In many fields, it is helpful to know the provenance and processing pipeline used to produce derived results. Nirvana tracks data within workflows, through all transformations, analyses, and interpretations. With Nirvana, data can be shared and used with verified provenance of the conditions under which it was generated – so results are reproducible and analyzable for defects.
Audit
Nirvana can be used to audit every transaction on a data file within a workflow. An audit trail can be stored containing information such as the date of the transaction, the success or error code, the user performing the transaction, the type of transaction, and notes. Audit trails can be queried and filtered.
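As a rough illustration of what querying such a trail involves, the following Python sketch stores audit records with the fields listed above and filters them. The record type, field names, status codes and sample entries are hypothetical and are not taken from Nirvana.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AuditRecord:
    """One audited transaction (hypothetical field names)."""
    when: date
    user: str
    action: str
    status: str  # "ok" or an error code
    path: str
    note: str = ""


def query_audit(trail, user=None, action=None, failed_only=False):
    """Return records matching every filter that was supplied."""
    result = []
    for rec in trail:
        if user is not None and rec.user != user:
            continue
        if action is not None and rec.action != action:
            continue
        if failed_only and rec.status == "ok":
            continue
        result.append(rec)
    return result


trail = [
    AuditRecord(date(2016, 1, 4), "alice", "read", "ok", "/lab/scan001.img"),
    AuditRecord(date(2016, 1, 5), "bob", "delete", "EPERM", "/lab/scan001.img",
                "permission denied"),
    AuditRecord(date(2016, 1, 6), "alice", "replicate", "ok", "/lab/scan002.img"),
]

failures = query_audit(trail, failed_only=True)
alice_actions = [rec.action for rec in query_audit(trail, user="alice")]
print(failures, alice_actions)
```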
Security and Access Control
Nirvana can be used to control access to data in two ways: by setting up access control lists by user, group, etc., using user-defined metadata attributes (Project, Study, etc.), and by setting access privilege levels, where users assigned higher levels can see more information than users assigned lower levels. Nirvana supports single sign-on and access by integrating with the Lightweight Directory Access Protocol (LDAP) and Active Directory, using Challenge-response authentication, Grid Security Infrastructure (GSI), and Kerberos. Data can only be viewed and modified by users authorized to do so.
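The combination of access control lists and privilege levels can be sketched as a single check. This Python example is a simplified model under assumptions: the `User` and `Item` types, the rule that both conditions must hold, and the sample names are hypothetical, and authentication (LDAP, Kerberos, etc.) is not modeled at all.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    groups: set = field(default_factory=set)
    level: int = 0  # higher levels may see more


@dataclass
class Item:
    path: str
    allowed: set = field(default_factory=set)  # user or group names on the ACL
    min_level: int = 0                         # minimum privilege level to view


def can_read(user, item):
    """Allow access only if the user is on the ACL and has a sufficient level."""
    principals = {user.name} | user.groups
    on_acl = bool(principals & item.allowed)
    return on_acl and user.level >= item.min_level


alice = User("alice", {"tokamak"}, level=2)
bob = User("bob", {"tokamak"}, level=1)
carol = User("carol", {"finance"}, level=3)
scan = Item("/lab/scan001.img", allowed={"tokamak"}, min_level=2)

decisions = {u.name: can_read(u, scan) for u in (alice, bob, carol)}
print(decisions)
```

In this model a high privilege level alone does not grant access: a user must also appear on the list, directly or through a group.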
File System Analysis
Nirvana can be used to analyze the makeup of a shared filesystem to determine what type of data is being stored, how much space it takes up, when it was last accessed, and who stored it. With this information, storage administrators can determine the most appropriate type of storage system to use and when to move unused data to lower-cost archive storage. In one example, Nirvana's analysis of data stored on an expensive enterprise NAS storage system showed that most data hadn't been accessed in over two years. The analysis further showed that most files were very small, and that over half the storage was consumed by just two users. Using this data, the organization replaced its enterprise storage system with less expensive object storage to better manage the many small, seldom-accessed files.[11]
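The kind of summary described here (space per user, share of stale data, share of small files) can be computed from basic file metadata. The sketch below is a hypothetical illustration in Python, not Nirvana's analysis tool: the function name, the two-year staleness window, the 4096-byte "small file" threshold and the sample records are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta


def analyze(records, now, stale_after=timedelta(days=730), small_bytes=4096):
    """Summarize (owner, size_in_bytes, last_access) records."""
    bytes_by_owner = Counter()
    stale = small = 0
    for owner, size, last_access in records:
        bytes_by_owner[owner] += size
        if now - last_access > stale_after:
            stale += 1
        if size < small_bytes:
            small += 1
    n = len(records)
    return {
        "bytes_by_owner": dict(bytes_by_owner),
        "stale_fraction": stale / n if n else 0.0,
        "small_fraction": small / n if n else 0.0,
        "top_owners": [owner for owner, _ in bytes_by_owner.most_common(2)],
    }


now = datetime(2016, 1, 1)
records = [
    ("alice", 100, datetime(2013, 1, 1)),
    ("alice", 9000, datetime(2015, 6, 1)),
    ("bob", 200, datetime(2012, 5, 5)),
    ("carol", 50, datetime(2011, 1, 1)),
]
report = analyze(records, now)
print(report)
```

On a live filesystem the records could be gathered with `os.walk` and `os.stat`; fixed sample data is used here so the result does not depend on the machine running it.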
References
- ↑ "SDSC Timeline" (PDF). Retrieved 25 January 2016.
- ↑ "MDAS - Massive Data Analysis System". Retrieved 25 January 2016.
- ↑ Baru, Chaitanya; Moore, Reagan; Rajasekar, Arcot; Wan, Michael. "The SDSC storage resource broker". CASCON First Decade High Impact Papers: 189–200. doi:10.1145/1925805.1925816. (Reprint from November 30 – December 3, 1998)
- ↑ "General Atomics Acquires Exclusive License from UCSD for Commercialization of Unique Data Management Software". Retrieved 25 January 2016.
- ↑ "IRODS (integrated Rule-Oriented Data System)". www.irods.org. Retrieved 17 March 2016.
- ↑ "General Atomics Wins $22.5 Million DoD Contract for Storage Lifecycle Management (SLM) across Six High Performance Computing Sites". Retrieved 25 January 2016.
- ↑ "infoGROUP® Architects Innovative Global Namespace with Nirvana® SRB® 2008". Retrieved 25 January 2016.
- ↑ "Nirvana SRB 2012 R3® Is Enhanced With Significant Caching Performance, Synchronization and Database Migration Improvements". Retrieved 25 January 2016.
- ↑ "ArcaStream and General Atomics Introduce World’s First Data-Aware Cloud Storage Gateway". Retrieved 25 January 2016.
- ↑ "ArcaStream and General Atomics Introduce World’s First Data-Aware Cloud Storage Gateway". Retrieved 25 January 2016.
- ↑ "Storage Data Analysis with Nirvana SRB Presented for 2014 IEEE MSST Conference Santa Clara, CA June 2-6 2014" (PDF).