MemSQL

MemSQL
Developer(s) MemSQL Inc.
Stable release 5.0 / March 30, 2016 (2016-03-30)[1]
Written in C++
Operating system Linux
Type RDBMS
License Closed Source
Website www.memsql.com

MemSQL is a distributed, in-memory, SQL database management system.[2] It is a relational database management system (RDBMS) which complies with the properties of atomicity, consistency, isolation, durability (ACID). It converts Structured Query Language (SQL) into machine code, via automatic programming, termed code generation.[3] MemSQL has both a free Community Edition, and an Enterprise Edition available for trial and production use. It is being developed by MemSQL Inc., that was founded in 2011 and is a graduate of the Y Combinator startup program. MemSQL Inc., has raised more than $85 million to date from a variety of investors including Accel Partners, Khosla Ventures, Data Collective, First Round Capital, and IA Ventures. Early angel investors include include Paul Buchheit, Max Levchin, Aaron Levie, and Ashton Kutcher.[4] On April 23, 2013, MemSQL launched its first generally available version of the database to the public.[5]

Core technology

MemSQL combines lock-free data structures and a just-in-time compilation (JIT) to process highly volatile workloads. More specifically, MemSQL implements lock-free hash tables and lock-free skip lists in memory for fast random access to data. SQL queries sent to the MemSQL server are converted into byte code and compiled through LLVM into machine code.[6] Queries are then stripped of their parameters and the query template is stored as a shared object which is subsequently matched against incoming queries to the system. Executing pre-compiled query plans removes interpretation along hot code paths, providing highly efficient code paths that minimize the number of central processing unit (CPU) instructions required to process SQL statements.

MemSQL is wire-compatible with MySQL.[7] This means that applications can connect to MemSQL through MySQL clients and drivers, as well as standard Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC) connectors.[8]

In addition to MySQL syntax and functionality, MemSQL can also store columns in JSON format, and supports Geospatial datatypes and operations.

Row and Column Table Formats

MemSQL can store database tables either as rowstores or columnstores. The format used is determined by the user at DDL time (i.e. when the table is created). Data for all rowstore tables is stored completely in-memory, with snapshots and transaction logs persisted to disk. Data for all columnstore tables is stored on-disk, using a rowstore-like structure to handle incoming inserts into the columnstore.

Rowstore and columnstore tables differ in more than just the storage medium used. Rowstores, as the name implies, store information in row format, which is the traditional data format used by RDBMS systems. Rowstores are optimized for singleton or small insert, update or delete queries and are most closely associated with OLTP (transactional) use cases. As an example, data from web clicks on a retail website is best stored in row format, since data is inserted as singleton inserts each corresponding to a web click. Columnstores store information in columnar format. Columnstores are optimized for complex select queries, typically associated with OLAP (analytics) use cases. As an example, a large clinical data set for data analysis is best stored in columnar format, since queries run against it will typically be ad-hoc queries where aggregates are computed over large numbers of similar data items.

Distributed architecture

A MemSQL database is a distributed database implemented with aggregators and leaf nodes.[3] MemSQL binaries used for aggregator and leaf nodes are exactly the same, with the only difference being the user identifying the node as an aggregator or leaf. An aggregator is responsible for receiving SQL queries, breaking them up across leaf nodes, and aggregating results back to the client. A leaf node stores MemSQL data and processes queries from the aggregator. All communication between aggregators and leaf nodes is done over the network through SQL syntax. MemSQL uses hash partitioning to distribute data uniformly across the number of leaf nodes.[9]

Durability

MemSQL durability is slightly different for its in-memory rowstore and an on-disk columnstore.

Durability for the in-memory rowstore is guaranteed by implementing a write-ahead log and snapshots, similar to checkpoints. With default settings, as soon as a transaction is acknowledged in memory, the database will asynchronously write the transaction to disk as fast as the disk allows.[10]

The on-disk columnstore is actually fronted by an in-memory rowstore-like structure (skiplist). This structure has the same durability guarantees as the MemSQL rowstore. Apart from that, the columnstore is durable since its all data is stored on disk.

Replication

A MemSQL cluster can be configured in "High Availability" mode, where every data partition is automatically created with master and slave versions on two separate leaf nodes. In High Availability mode, aggregators send transactions to the master partitions, which then send logs to the slave partitions. In the event of an unexpected master failure, the slave partitions take over as master partitions in a fully online operation.

MemSQL Ops

MemSQL ships with an installation, management and monitoring tool called MemSQL Ops. When installing MemSQL, Ops can be used to set up the distributed MemSQL database across machines, and provide metrics about the running system. MemSQL Ops has both a web user interface and command line interface.

Apache Spark Integration

Starting with MemSQL 4.1, launched in September 2015, MemSQL gives users the ability to install Apache Spark as part of the MemSQL cluster, and use Spark as an ETL tool to import data into MemSQL.[11] Apache Spark is installed and set up interactively using MemSQL Ops. Ops users can then define the extract, transform, and load phases of their data pipeline to import data into MemSQL. Management and monitoring of running data pipelines can be done within the Ops UI.

Version history

References

  1. http://docs.memsql.com/v5.0/docs/release-notes
  2. "MemSQL". Retrieved 2012-10-07.
  3. 1 2 http://highscalability.com/blog/2012/8/14/memsql-architecture-the-fast-mvcc-inmem-lockfree-codegen-and.html (article sponsored by MemSQL)
  4. Mishra, Pankaj (2014). "MemSQL Raises $35M Series B Led By Accel Partners" (published 2014-01-22). Retrieved 2014-01-22.
  5. Frenkiel, Eric (2013). "MemSQL ships 2.0. Scales in-memory database across hundreds of nodes, thousands of cores." (published 2013-04-23). Retrieved 2013-04-23.
  6. http://docs.memsql.com/docs/code-generation
  7. http://docs.memsql.com/docs/how-memsql-works
  8. http://www.i-programmer.info/news/84-database/4397-memsql-80000-queries-per-second.html
  9. http://www.dbms2.com/2012/06/18/introduction-to-memsql/
  10. http://developers.memsql.com/docs/1c/faq.html#durability
  11. http://www.reuters.com/article/idUSnMKWDzK5la+1e8+MKW20150924 (press release)
  12. http://vimeo.com/44087431
  13. http://www.dbms2.com/2013/04/23/memsql-scales-out/
  14. http://developers.memsql.com/docs/2.0/2.0releasenotes.html
  15. http://developers.memsql.com/docs/latest/release_notes/2.5releasenotes.html
  16. http://www.memsql.com/whats-new/
  17. http://developers.memsql.com/docs/latest/release_notes/3.1releasenotes.html
  18. http://developers.memsql.com/docs/latest/release_notes/3.2releasenotes.html
  19. http://docs.memsql.com/latest/release_notes/4.0releasenotes.html
  20. http://docs.memsql.com/latest/release_notes/4.1releasenotes/
  21. http://docs.memsql.com/v5.0/docs/release-notes

External links

This article is issued from Wikipedia - version of the Saturday, April 23, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.