15.4 Memory Locality
The key to understanding efficiency of matrix and vector representations is memory locality and reference passing versus copying.
CPUs on computers bring in memory in blocks through layers of caches. Fetching from memory is much slower than performing arithmetic operations. The only way to make container operations fast is to respect memory locality and access elements that are close together in memory sequentially in the program.
Matrices are stored internally in column-major order. That is, an \(M \times N\) matrix stores its elements in the order \[ (1,1), (2, 1), \ldots, (M, 1), (1, 2), \ldots, (M, 2), \ldots, (1, N), \ldots, (M, N). \]
This means that it’s much more efficient to write loops over matrices column by column, as in the following example.
matrix[M, N] a; ... for (n in 1:N) for (m in 1:M) ... do something with a[m, n] ...
It also follows that pulling a row out of a matrix is not memory local, as it has to stride over the whole sequence of values. It also requires a copy operation into a new data structure as it is not stored internally as a unit in a matrix. For sequential access to row vectors in a matrix, it is much better to use an array of row vectors, as in the following example.
row_vector[N] a[M]; ... for (m in 1:M) ... do something with row vector a[m] ...
Even if what is done involves a function call, the row vector
a[m] will not have to be copied.
Arrays are stored internally following their data structure. That means a two dimensional array is stored in row-major order. Thus it is efficient to pull out a “row” of a two-dimensional array.
real a[M, N]; ... for (m in 1:M) ... do something with a[m] ...
A difference with matrices is that the entries
a[m] in the two
dimensional array are not necessarily adjacent in memory, so there are
no guarantees on iterating over all the elements in a two-dimensional
array will provide memory locality across the “rows.”