Consider the loop
for i = 0:I-1, j = 0:J-1
A[i+1,j+1] = A[i+1,j] + A[i,j+1]
This loop is not trivial to parallelize because it requires serial execution of both the
j loops. The store into
A[i+1,j+1] will be loaded on the next iteration of the
j loop by
A[i+1,j], and also by the next iteration of the
i loop by
A[i,j+1]. We define the store into
A[i+1,j+1] as the source, and the subsequent loads from this memory location as the targets which must hold the stored value.
If the relationships between sources and targets across loop iterations are cast as a dependency graph with coordinates corresponding to the iteration space indices, we can model and transform those relationships using dependence polyhedra, allowing application of optimization techniques from linear programming and operations research.
In general, we define polyhedra using a system of inequalities.
where inequalities involving vectors apply elementwise, bold capital letters indicate matrices, and the vector arrow is used to indicate column vectors.
For dependence polyhedra, we define one per dependence. The aim is to describe how both and relate in the source and target through a system of inequalities. We use the subscript indicate the source (parent) and subscript the target (child). With this notation, the store into and subsequent load from imposes two constraints: and . Encoding this information, as well as the loop bounds, yields the following system of inequalities, which we can interpret as a polyhedra that gives the space of all dependent iterations (note that we can disprove dependence by proving a depdendence polyehedra is empty, which we will demonstrate in a future blog post).
Pruning a few redundant bounds, we get
As seen in the orthogonalize_loops post, we like to define loop schedules as affine functions of the indices. We'll discuss valid schedules and their interpretation in a subsequent post, but for now, suffice it to say that we need to prove things about the differences in schedules at certain levels of the loop nest. The source iteration must occur before the target iteration, which means that we generally need to prove one level of the loop (i.e., one row of the scheduling matrix) either equals, or iterates, before another. Letting and be the affine schedules for level for the target and source respectively, such that
and all other schedules are defined similarly, with the superscript indicating transposition.
Then we may wish to constrain ourselves to solutions that satisfy
for some .
Is there some way we can find constraints on allowed values of and ? Enter Farkas' lemma!
According to the affine form of Farkas' lemma given in Uday's thesis, an affine function is non-negative everywhere in a polyhedron iff it is a non-negative linear combination of the faces:
That is, if some exists such that the equality holds, then is non-negative everywhere inside the polyhedron.
To apply that to our problem of dependence, we let
and then use Fourier-Motzkin elimination to remove the .
Returning to the first dependence of our motivating example, we have
We can now produce the following system of equations by matching induction variable coefficients:
and imposing the inequalities .
We can perform a couple rounds of Gaussian elimination to simplify this to:
before proceding with Fourier-Motzkin elimination to eliminate the Fourier-Motzkin multipliers (the s).
To first eliminate , we have the following inequalities including :
Similarly, eliminating , , , and , we have:
Finally, eliminating and :
We realize we can just drop them. Thus, our final equations are
Thus, the coefficient for the target must be at least as great as those from the source. Additionally, if the offsets are equal, then must exceed our desired .
Repeating this procedure for the other dependency would similarly yield
Note that to parallelize a hyperplane, we need for all unsatisfied dependencies to be 0. One approach is to let to satisfy both dependencies in the outer loop. Then we can freely parallelize the inner loop.
Reference: Effective Automatic Parallelization and Locality Optimization using the Polyehdral Model by Uday Bondhugula.