Distributed shared memory

"DGAS" redirects here. For the DGA awards, see Directors Guild of America Award.

In computer science, distributed shared memory (DSM) is a form of memory architecture where the (physically separate) memories can be addressed as one (logically shared) address space. Here, the term "shared" does not mean that there is a single centralized memory but "shared" essentially means that the address space is shared (same physical address on two processors refers to the same location in memory).^[1]^:201 Distributed global address space (DGAS), is a similar term for a wide class of software and hardware implementations, in which each node of a cluster has access to shared memory in addition to each node's non-shared private memory.

A distributed-memory system (often called a multicomputer) consist of multiple independent processing nodes with local memory modules which is connected by a general interconnection network. Software DSM systems can be implemented in an operating system, or as a programming library and can be thought of as extensions of the underlying virtual memory architecture. When implemented in the operating system, such systems are transparent to the developer; which means that the underlying distributed memory is completely hidden from the users. In contrast, software DSM systems implemented at the library or language level are not transparent and developers usually have to program differently. However, these systems offer a more portable approach to DSM system implementation. A distributed shared memory system implements the shared-memory model on a physically distributed memory system.

Software DSM implementation

There are three ways of implementing a software distributed shared memory:

page based approach using the system’s virtual memory;
shared variable approach using some routines to access shared variables;
object based approach ideally accessing shared data through object-oriented discipline.

Message passing vs. DSM

Message passing	Distributed shared memory
Variables have to be marshalled	Variables are shared directly
Cost of communication is obvious	Cost of communication is invisible
Processes are protected by having private address space	Processes could cause error by altering data
Processes should execute at the same time	Executing the processes may happen with non-overlapping lifetimes

Software DSM systems also have the flexibility to organize the shared memory region in different ways. The page based approach organizes shared memory into pages of fixed size. In contrast, the object based approach organizes the shared memory region as an abstract space for storing shareable objects of variable sizes. Another commonly seen implementation uses a tuple space, in which the unit of sharing is a tuple.

Shared memory architecture may involve separating memory into shared parts distributed amongst nodes and main memory; or distributing all memory between nodes. A coherence protocol, chosen in accordance with a consistency model, maintains memory coherence.

Abstract view of DSM

Advantages of DSM

System scalable
Hides the message passing
Can handle complex and large data bases without replication or sending the data to processes
DSM is usually cheaper than using multiprocessor system
No memory access bottleneck, as no single bus
DSM provides large virtual memory space
DSM programs portable as they use common DSM programming interface
Shields programmer from sending or receive primitives
DSM can (possibly) improve performance by speeding up data access.

Issues in implementing DSM software

Data is replicated or cached
Reduce delays
Semantics for concurrent access must be clearly specified
DSM is controlled by memory management software, operating system, language run-time system
Locating remote data
Granularity: how much data is transferred in a single operation?

Disadvantages of DSM

Could cause a performance penalty
Should provide for protection against simultaneous access to shared data such as lock
Performance of irregular problems could be difficult

Methods of achieving DSM

There are usually two methods of achieving distributed shared memory:

hardware, such as cache coherence circuits and network interfaces;
software.

Software

We can use this method in different ways such as modifying the operating system kernel.

Consistency models

Memory system tries to behave based on certain rules in the system, which is called system's consistency model.

Memory coherence

Memory coherence is one of the most important abilities into the system. It will make sure that a system executes memory operation correctly. Suppose we have n processes and Mi memory operations for each process i, and that all the operations are executed sequentially. We can conclude that (M1 + M2 + … + Mn)!/(M1! M2!… Mn!) are possible interleavings of the operations. The problem here is which of these interleavings are correct and which ones are wrong. The memory coherence defines which interleavings are permitted. Traditionally, read operation returns the value written by the most recent write operation which is ambiguous with replicas and concurrent accesses.

Strict consistency

In strict consistency each operation is stamped with a global wall-clock time.

Rules:

Each read gets the latest written value.
All operations at one CPU are executed in order of their time stamps.

Multiprocessors maintain cache coherence by broadcasting writes to all processors, which can then either update or invalidate their local caches.

Since software DSM could not implement the atomic broadcasts needed to preserve strict consistency, other consistency models are needed.

Sequential consistency

Sequential consistency is specified as follows: if all operations of the processors were executed in sequential order, the result of any execution would be the same and each processor's operation appears in this sequence in the local program order. Interleaving of the operations which are coming from different processors is possible, but processors should meet the same interleaving. Sequential consistency is slightly weaker model than strict consistency. The most important difference between them is the sequential consistency does not assume real-time.

Rules: there exists a total ordering of operation such that

Each machine’s own ops appear in order.
All machines see results according to total order. For example: reads meet the most recent writes.In other words, the instructions of all processes are interleaved in some sequential order.

For independent processes, sequential consistency presents no problem. For critical sections, there is a possibility of race conditions. Users who wish to enforce a certain order of execution could use synchronization mechanisms, exactly the same way in a shared memory processor.

Slow memory consistency

Slow memory is one of the weakest consistency model which still are used for interprocess communication. In this consistency model all the processors must agree on the order of observed writes to each location by a single processor. Moreover, local writes should be visible instantly.

Replication in DSM

Replication of shared data in general reduced network traffic, promotes increased parallelism, fewer page faults, and is more efficient than non-replicated implementations.

Main problem: preserving consistency when multiple copies are present.

Consistency in structured DSM

Object-based or structured DSM may use more efficient consistency because it is easier to specify what is going to be shared. Users can identify points in the program where the data is consistent. They only share designated objects or variables. If shared data accesses happen only inside critical sections, while a process enters into a critical section, the DSM only needs to ensure that variables are consistent.

Two consistency models

Release consistency: when a process exits a critical section, new values of the variables are propagated to all sites.
Entry consistency: when a process enters a critical section, it will automatically update the values of the shared variables.
View-based Consistency: it is a variant of Entry Consistency, except the shared variables of a critical section are automatically detected by the system. An implementation of view-based consistency is VODCA which has comparable performance to MPI on cluster computers.

Examples of such systems include:

References

↑ Patterson, David A.; Hennessy, John L. (2006). Computer Architecture: A Quantitative Approach (4th ed.). Burlington, Massachusetts: Morgan Kaufmann. ISBN 978-01-2370490-0.

External links

Distributed Shared Cache
Memory coherence in shared virtual memory systems by Kai Li, Paul Hudak published in ACM Transactions on Computer Systems, Volume 7 Issue 4, Nov. 1989

Parallel computing

General	Distributed computing Cloud computing High-performance computing

Levels	Bit Instruction Task Data Memory

Multithreading	Temporal Simultaneous Preemptive Cooperative

Theory	PRAM model Analysis of parallel algorithms Amdahl's law Gustafson's law Cost efficiency Karp–Flatt metric Slowdown Speedup

Elements	Process Thread Fiber Instruction window

Coordination	Multiprocessing Memory coherency Cache coherency Cache invalidation Barrier Synchronization Application checkpointing

Programming	Models Implicit parallelism Explicit parallelism Concurrency Non-blocking algorithm

Hardware	Flynn's taxonomy SISD SIMD MISD MIMD Pipelined processor Superscalar processor Vector processor Multiprocessor symmetric asymmetric Memory shared distributed distributed shared UMA NUMA COMA Massively parallel computer Computer cluster Grid computer

APIs	POSIX Threads OpenMP OpenCL OpenHMPP OpenACC MPI PVM UPC TBB Boost.Thread Global Arrays Ateji PX Charm++ Cilk Coarray Fortran CUDA Dryad C++ AMP PLINQ TPL

Problems	Embarrassingly parallel Software lockout Scalability Race condition Deadlock Livelock Starvation Deterministic algorithm Parallel slowdown

Category: parallel computing Media related to parallel computing at Wikimedia Commons

This article is issued from Wikipedia - version of the Tuesday, January 05, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.