Data Organization for Low Power
Introduction
Memory power consumption has become one of the major concerns for embedded systems. The steady shrinking of the integration scale and the large number of devices packed on a single chip, coupled with high operating frequencies, have led to unacceptable levels of power dissipation. Data organization is an effective way to lower power consumption while meeting the increasing demands of applications.[1]
Motivation
Power optimization in high-memory-density electronic systems has become one of the major challenges for devices such as mobile phones, embedded systems, and wireless devices. As the number of cores on a single chip grows, the power consumed by these devices also increases. Studies of the power-consumption distribution in smartphones and data centers have shown that the memory subsystem consumes around 40% of the total power. In server systems, studies reveal that the memory consumes around 1.5 times the power of the cores.[2]
Memory data organization of low energy address bus
System-level buses, such as off-chip buses or long on-chip buses between IP blocks, are often major sources of energy consumption because of their large load capacitance. Experimental results have shown that bus activity for memory accesses can be reduced by about 50% by reorganizing the data. Consider compiling the following C code:
    int A[4][4], B[4][4];
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            B[i][j] = A[j][i];   /* transpose: A is read column by column */
        }
    }
Most existing C compilers place a multidimensional array in row-major order, that is, row by row; this layout is shown in the "unoptimized" column of the table below. As a result, running this code produces no sequential memory accesses: the elements of A are read column by column, so successive reads of A are a full row apart in memory. If the compiler instead places the data so that the number of sequential accesses is maximized, as in the "optimized" column of the table below, each loop iteration touches consecutive memory locations. Such redistribution of data by the compiler can reduce the energy consumed by memory accesses by a significant amount.[3]
| unoptimized | optimized |
|---|---|
| A[0][0] | A[0][0] |
| A[0][1] | B[0][0] |
| A[0][2] | A[1][0] |
| A[0][3] | B[0][1] |
| A[1][0] | A[2][0] |
| A[1][1] | B[0][2] |
| A[1][2] | A[3][0] |
| ... | B[0][3] |
| ... | A[0][1] |
| B[0][0] | B[1][0] |
| B[0][1] | A[1][1] |
| B[0][2] | B[1][1] |
| B[0][3] | ... |
| B[1][0] | ... |
| ... | ... |
| ... | A[3][3] |
| B[3][3] | B[3][3] |
Data Structure Transformations
This method involves source code transformations that modify the data structures in the source code, introduce new data structures, or change the access modes and access paths. The following techniques can be used to perform such transformations.
- Array declaration sorting: The basic idea is to reorder the local array declarations so that the most frequently accessed arrays are placed at the top of the stack, where the frequently used memory locations can be reached with direct addressing. Applying this transformation requires either a static estimate or a dynamic profile of the access frequency of each local array; the declarations are then reordered to place the most frequently accessed arrays first.
- Array scope modification (local to global): Local variables are stored on the program's stack, while global variables are stored in data memory. This transformation converts local arrays into global arrays so that they reside in data memory instead of on the stack. The location of a global array is known at compile time, whereas the location of a local array depends on the value of the stack pointer when the subprogram is called. Consequently, global arrays can be accessed with offset addressing using a constant offset of 0, while local arrays, excluding the first, are accessed with non-zero constant offsets; moving them to global scope thus yields an energy reduction.
- Array resizing (temporary array insertion): In this method, the elements accessed most frequently are identified via profiling or static analysis. A copy of these elements is stored in a small temporary array that can be accessed without data-cache misses. This yields a significant reduction in system energy, mainly from the reduction in data-cache misses, although the transformation can also cause a significant performance penalty.[1]
Using Scratchpad Memory
On-chip caches use static RAM that consumes power in the range of 25% to 50% of the total chip power and occupies about 50% of the total chip area. A scratchpad memory occupies less area than an on-chip cache, which typically reduces the energy consumption of the memory unit, because less area implies less total switched capacitance. Current embedded processors, particularly in the area of multimedia applications and graphics controllers, have on-chip scratchpad memories. In cache-based systems, the mapping of program elements is done at run time, whereas in scratchpad-based systems it is done either by the user or automatically by the compiler using a suitable allocation algorithm.[4]
References
- 1 2 "The Impact of Source Code Transformations on Software Power and Energy Consumption". CiteSeerX 10.1.1.97.6254.
- ↑ Panda, P.R.; Patel, V.; Shah, P.; Sharma, N.; Srinivasan, V.; Sarma, D. (3–7 January 2015). Power Optimization Techniques for DDR3 SDRAM. 28th International Conference on VLSI Design (VLSID), 2015. IEEE. pp. 310–315. doi:10.1109/VLSID.2015.59.
- ↑ "Power Optimization Techniques for DDR3 SDRAM". http://www.ics.uci.edu/~dutt/pubs/j41-hiroyuki-ieice-e87-c4.pdf
- ↑ "Scratchpad Memory: A Design Alternative for Cache On-chip Memory in Embedded Systems" (PDF).