Pivotal Greenplum Database

Not to be confused with Greenplum, the company acquired by EMC Corporation.
Greenplum Database
Original author(s): Greenplum
Developer(s): Pivotal Software
Stable release: 4.3.5.2 / June 2015
Operating system: Red Hat Enterprise Linux 64-bit 5.x and 6.x, SUSE Linux Enterprise Server 64-bit 10 SP4, 11 SP1, and 11 SP2, Oracle Unbreakable Linux 64-bit 5.5, CentOS 64-bit 5.x and 6.x[1]
License: Apache

Pivotal Greenplum Database is database software developed by Pivotal Software, Inc. (Pivotal). It was originally developed by Greenplum, which was acquired by EMC Corporation in July 2010[2] and spun out into Pivotal Software in 2013.[3]

System Overview

Pivotal Greenplum Database is an MPP (massively parallel processing) database[4] built on open source PostgreSQL.[5] The system consists of a master node, a standby master node, and segment nodes. All user data resides on the segment nodes, while catalog information is stored on the master nodes. Each segment node runs one or more segments, which are modified PostgreSQL database instances, and each segment is assigned a content identifier. The rows of each table are divided among the segments according to the distribution key columns that the user specifies in the table's DDL statement. Each content identifier has both a primary segment and a mirror segment, and the two never run on the same physical host. When a SQL query reaches the master node, it is parsed and optimized, and the resulting query plan is dispatched to all of the segments, which execute it and either return the requested data or insert the result into a database table.
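The data layout described above can be illustrated with a minimal Python sketch, assuming a hypothetical cluster of four segments. Rows are hashed on their distribution key columns to choose a content identifier, each segment aggregates only its own rows while the master combines the partial results, and each mirror is placed on a different host from its primary. The function names, the host names, the round-robin mirror placement, and the use of `zlib.crc32` as the hash are all illustrative assumptions, not Greenplum's actual hash function or mirroring algorithm.

```python
from __future__ import annotations

import zlib
from dataclasses import dataclass, field

NUM_SEGMENTS = 4  # hypothetical cluster with content identifiers 0..3


def segment_for(row: dict, dist_key: tuple[str, ...]) -> int:
    """Pick a content identifier by hashing the row's distribution key columns.

    zlib.crc32 is a stand-in; Greenplum uses its own internal hash function.
    """
    key_bytes = "|".join(str(row[col]) for col in dist_key).encode("utf-8")
    return zlib.crc32(key_bytes) % NUM_SEGMENTS


@dataclass
class Segment:
    content_id: int
    rows: list[dict] = field(default_factory=list)

    def local_sum(self, column: str) -> int:
        # Each segment aggregates only its own slice of the table.
        return sum(r[column] for r in self.rows)


def distribute(rows: list[dict], dist_key: tuple[str, ...]) -> list[Segment]:
    segments = [Segment(cid) for cid in range(NUM_SEGMENTS)]
    for row in rows:
        segments[segment_for(row, dist_key)].rows.append(row)
    return segments


def master_sum(segments: list[Segment], column: str) -> int:
    # The master dispatches the plan to every segment, then combines partial results.
    return sum(seg.local_sum(column) for seg in segments)


def place_mirrors(num_segments: int, hosts: list[str]) -> dict[int, tuple[str, str]]:
    """Hypothetical placement: each mirror lives on the host after its primary."""
    if len(hosts) < 2:
        raise ValueError("primary and mirror need two different hosts")
    placement = {}
    for cid in range(num_segments):
        primary = hosts[cid % len(hosts)]
        mirror = hosts[(cid + 1) % len(hosts)]
        placement[cid] = (primary, mirror)
    return placement


if __name__ == "__main__":
    orders = [{"order_id": i, "amount": i * 10} for i in range(1, 101)]
    segments = distribute(orders, dist_key=("order_id",))
    print([len(s.rows) for s in segments])  # rows stored on each segment
    print(master_sum(segments, "amount"))  # → 50500
    print(place_mirrors(NUM_SEGMENTS, ["sdw1", "sdw2", "sdw3"]))
```

Because equal key values always hash to the same segment, a key column with many distinct values tends to spread rows evenly, so no single segment becomes the bottleneck for a parallel query.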

Bulk loading and unloading are also supported directly to and from the segment nodes, bypassing the master node, and the segments can read and write external data on ETL nodes, in flat files, or in HDFS file systems outside the Greenplum cluster. Because many segments transfer data at once, Greenplum is known for fast parallel data loading and unloading, as well as fast internal data transfer for operations such as CTAS (CREATE TABLE AS SELECT).
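A minimal Python sketch can show why loading that bypasses the master scales with the number of segments. It assumes a hypothetical four-segment cluster and a CSV file held in memory: the file is split into line-aligned chunks, each simulated segment parses its chunk concurrently in a thread, and every row is routed to the segment that owns its distribution key. Unloading is the reverse, with each segment writing its own output. The function names, the chunking scheme, and `zlib.crc32` as the hash are assumptions for illustration; Greenplum's actual external-table machinery works differently in detail.

```python
from __future__ import annotations

import csv
import io
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SEGMENTS = 4  # hypothetical cluster size


def owner(key: str) -> int:
    # Stand-in for Greenplum's internal distribution hash.
    return zlib.crc32(key.encode("utf-8")) % NUM_SEGMENTS


def split_lines(text: str, parts: int) -> list[list[str]]:
    # Give each reader a contiguous, line-aligned share of the external file.
    lines = text.splitlines()
    size = -(-len(lines) // parts)  # ceiling division
    return [lines[i * size:(i + 1) * size] for i in range(parts)]


def read_chunk(chunk: list[str], key_col: int) -> list[tuple[int, list[str]]]:
    # Each segment parses its own chunk and tags every row with its owning segment.
    return [(owner(row[key_col]), row) for row in csv.reader(chunk)]


def parallel_load(text: str, key_col: int = 0) -> list[list[list[str]]]:
    storage: list[list[list[str]]] = [[] for _ in range(NUM_SEGMENTS)]
    chunks = split_lines(text, NUM_SEGMENTS)
    with ThreadPoolExecutor(max_workers=NUM_SEGMENTS) as pool:
        for routed in pool.map(lambda c: read_chunk(c, key_col), chunks):
            # In this single-process sketch, this loop stands in for the
            # segment-to-segment interconnect; no master process is involved.
            for seg, row in routed:
                storage[seg].append(row)
    return storage


def parallel_unload(storage: list[list[list[str]]]) -> list[str]:
    # Unloading mirrors loading: every segment writes its own output file.
    def write(rows: list[list[str]]) -> str:
        buf = io.StringIO()
        csv.writer(buf).writerows(rows)
        return buf.getvalue()

    with ThreadPoolExecutor(max_workers=NUM_SEGMENTS) as pool:
        return list(pool.map(write, storage))


if __name__ == "__main__":
    external_file = "\n".join(f"{i},item{i}" for i in range(1, 51))
    storage = parallel_load(external_file)
    print([len(rows) for rows in storage])  # rows landed on each segment
    print(len(parallel_unload(storage)))  # → 4 (one output file per segment)
```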

Greenplum supports ACID transactions for concurrent data access and modification, which allows it to serve as a system-of-record database, although it is optimized for analytical workloads rather than OLTP workloads. SQL, following the SQL:2003 standard, is the interface to the data in Greenplum. User-defined functions can be written in languages such as Python, R, Perl, Java, C, or PL/pgSQL and called from within a SQL query.

Open Source

In February 2015, Pivotal Software announced its intention to open source Pivotal Greenplum Database, along with the other components of Pivotal's Big Data Suite, by the end of 2015.[6] Once it is open sourced under the PostgreSQL license, there will be both an open source version and an enterprise distribution provided by Pivotal Software. The enterprise distribution will contain some features, targeted at large enterprise customers, that are not in the open source version.

Competition

The primary competitors of Pivotal Greenplum Database are the other MPP database systems from major industry vendors, such as Teradata, Amazon Redshift, and IBM Netezza. Additional competition comes from MammothDB,[7] a challenger offering performance gains; column-oriented databases such as HP Vertica; data warehousing vendors with non-MPP architectures, such as Oracle Exadata and IBM DB2; and Hadoop distributions such as Cloudera and Hortonworks.

References

This article is issued from Wikipedia - version of the Wednesday, April 13, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.