Pivotal Greenplum Database
Original author(s) | Greenplum |
---|---|
Developer(s) | Pivotal Software |
Stable release | 4.3.5.2 / June 2015 |
Operating system | Red Hat Enterprise Linux 64-bit 5.x and 6.x, SuSE Linux Enterprise Server 64-bit 10 SP4, 11 SP1, and 11 SP2, Oracle Unbreakable Linux 64-bit 5.5, CentOS 64-bit 5.x and 6.x[1] |
License | Apache |
Pivotal Greenplum Database is database software developed by Pivotal Software, Inc. (Pivotal). It was originally developed by Greenplum, which was acquired by EMC Corporation in July 2010[2] and spun out into Pivotal Software in 2013.[3]
System Overview
Pivotal Greenplum Database is an MPP (massively parallel processing) database[4] built on open source PostgreSQL.[5] The system consists of a master node, a standby master node, and segment nodes. All of the data resides on the segment nodes, and the catalog information is stored on the master nodes. Each segment node runs one or more segments, which are modified PostgreSQL database instances, each assigned a content identifier. For each table, the data is divided among the segment nodes based on the distribution key columns specified by the user in the DDL statement. For each segment content identifier there is both a primary segment and a mirror segment, which do not run on the same physical host. When a SQL query enters the master node, it is parsed, optimized, and dispatched to all of the segments, which execute the query plan and either return the requested data or insert the result of the query into a database table.
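The distribution and dispatch steps described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not Greenplum's implementation: the segment count, the function names (`segment_for`, `distribute`, `scatter_gather_sum`), and the use of CRC32 as the hash are all hypothetical choices made for illustration.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for the example


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution-key value to a segment content identifier."""
    # CRC32 is a stand-in hash picked because it is deterministic;
    # it is an assumption, not the hash function Greenplum uses.
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_column, num_segments=NUM_SEGMENTS):
    """Place each row on the segment selected by its distribution key."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key_column], num_segments)].append(row)
    return segments


def scatter_gather_sum(segments, column):
    """Mimic the master: each segment computes a partial sum over its own
    rows, and the master combines the partial results."""
    partials = [sum(row[column] for row in rows) for rows in segments.values()]
    return sum(partials)


# Eight rows distributed on customer_id, then aggregated across segments.
rows = [{"customer_id": i, "amount": i * 10} for i in range(1, 9)]
segments = distribute(rows, "customer_id")
total = scatter_gather_sum(segments, "amount")
```

Because the same key always hashes to the same segment, rows with equal distribution keys are co-located, and the aggregate is the same regardless of how the rows happen to be spread across segments.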
Bulk loading and unloading are also supported directly to and from the segment nodes, bypassing the master nodes. The segments can read and write external data on ETL nodes, in flat files, or in HDFS file systems residing outside of the Greenplum cluster. Greenplum is known for fast parallel data loading and unloading as well as fast internal data transfer for operations such as CTAS (Create Table As Select).
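As a rough illustration of why loading through the segments scales, the following Python sketch has each simulated segment parse its own share of an external file concurrently, with no coordinator handling row data. It is a simplified model, not Greenplum's loading mechanism: the round-robin chunking, the thread pool, and the names `split_into_chunks`, `load_chunk`, and `parallel_load` are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, num_segments):
    """Round-robin the external file's lines into one chunk per segment."""
    return [lines[i::num_segments] for i in range(num_segments)]


def load_chunk(segment_id, chunk):
    """Work done by one simulated segment: parse only its own chunk."""
    parsed = [tuple(line.split(",")) for line in chunk]
    return segment_id, parsed


def parallel_load(lines, num_segments=4):
    """Load all chunks concurrently; returns {segment_id: parsed_rows}."""
    chunks = split_into_chunks(lines, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        results = pool.map(load_chunk, range(num_segments), chunks)
        return dict(results)


# Ten comma-separated lines standing in for an external flat file.
external_lines = [f"{i},item-{i}" for i in range(10)]
loaded = parallel_load(external_lines)
```

In this model the coordinator only hands out chunk assignments; all parsing happens in the per-segment workers, which is the property the paragraph above attributes to bypassing the master nodes.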
Greenplum supports the ACID principles of transaction management for concurrent data access and modification, which allows it to serve as a system-of-record database, but it is optimized for analytical workloads rather than OLTP workloads. SQL, following the SQL:2003 standard, is the interface to the data in Greenplum. User-defined functions can be written in languages such as Python, R, Perl, Java, C, or PL/pgSQL and called from within a SQL query.
Open Source
In February 2015, Pivotal Software announced its intention to open source the Pivotal Greenplum Database, along with the other components of Pivotal's Big Data Suite, by the end of 2015.[6] Once the product is open sourced under the PostgreSQL license, there will be an open source version and an enterprise distribution provided by Pivotal Software. The enterprise distribution will contain some features not in the open source version, targeted at large enterprise customers.
Competition
The primary competitors of Pivotal Greenplum Database are the other MPP database systems provided by major industry vendors, such as Teradata, Amazon Redshift, and IBM Netezza. Additional competition comes from MammothDB,[7] a challenger offering performance gains; column-oriented databases such as HP Vertica; data warehousing vendors with non-MPP architectures, such as Oracle Exadata and IBM DB2; and Hadoop distributions such as Cloudera and Hortonworks.
References
- "Supported Platforms". June 1, 2015.
- "Company Product Page". July 4, 2015.
- "EMC and VMware create Pivotal". March 13, 2013.
- "EMC To Acquire Greenplum". July 6, 2010.
- "Greenplum Updates Open-Source Based Database". February 22, 2008.
- "Pivotal Open Source Announcement". February 17, 2015.
- "MammothDB home page". December 23, 2015.