Dependence Polyhedra

2022-03-28


Ready for some math dump without explanation or context? All right, let's go!!

Any errors can be assumed to be due to a pedagogical focus rather than lack of rigor or a mistake on our part. But please file an issue or PR to let us know anyway.

We have this loop:

for i = 1:N, j = 1:i
    A[i,j] = foo(A[i,i])
end
$$
1 \le i_0 \le N\\
1 \le j_0 \le i_0\\
1 \le i_1 \le N\\
1 \le j_1 \le i_1\\
i_0 = i_1\\
j_0 = i_1
$$

which simplifies to

$$
i_0 \le N\\
1 \le j_1\\
j_0 \le i_0\\
j_1 \le i_1\\
i_0 = i_1\\
j_0 = i_1
$$

Given two memory references, you take the polyhedra of their iteration spaces, combine them into a single higher-dimensional space, and then add equalities matching the indices that lead to identical memory addresses in the accessed arrays. These equalities can restrict the space.
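If you'd like something concrete to poke at, here is a toy brute-force version of the same idea for the loop above. This is purely illustrative (LoopModels works symbolically rather than by enumeration, and N = 4 is just a small cap for the example): we enumerate the combined iteration space, keep the points where the store A[i0,j0] and the load A[i1,i1] touch the same address, and check that every surviving point satisfies the constraint system above.

# Toy brute-force enumeration of the dependence polyhedron for
#     A[i,j] = foo(A[i,i])
# Not LoopModels' approach; just a sanity check on the constraint system.
N = 4
deps = [(i0, j0, i1, j1) for i0 in 1:N for j0 in 1:i0
                         for i1 in 1:N for j1 in 1:i1
                         if (i0, j0) == (i1, i1)]   # same memory address
@assert !isempty(deps)  # the polyhedron is non-empty for this loop
# Every surviving point satisfies i0 == i1 == j0 and 1 <= j1 <= i1 <= N.
@assert all(i0 == i1 == j0 && 1 <= j1 <= i1 <= N for (i0, j0, i1, j1) in deps)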

It's straightforward/algorithmic to derive this sort of thing from a program (de-linearization of indices aside). However, when we have two accesses to the same memory address that are not both reads, we must also know which of the two came first. If we have two reads, this may still be of interest, since minimizing the distance between the reads can increase cache reuse, but switching their order does not change correctness or the program's results, so we don't add any constraints based on pairs of reads.

How do we figure out which one came first?

Well, sometimes neither is first, because there isn't actually a dependence at all. E.g.

for i = 1:N, j = 1:i-1
    A[i,j] = foo(A[j,i])
end

the associated dependence polyhedron is empty. Proof is left as an exercise for the reader. Note this is one of LoopModel's test cases, so if you get stuck, check out our C++ code, as its pristine readability will elucidate all.
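If you'd like to check your exercise answer against ours (or just skip the C++), a sketch: matching the store $A[i_0, j_0]$ against the load $A[j_1, i_1]$ requires $i_0 = j_1$ and $j_0 = i_1$, while the loop bounds give $j_0 \le i_0 - 1$ and $j_1 \le i_1 - 1$. Chaining these,

$$
j_0 < i_0 = j_1 < i_1 = j_0,
$$

i.e. $j_0 < j_0$, which is impossible, so the polyhedron has no points.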

But in non-empty cases, we use Farkas' lemma. There might be more straightforward approaches to getting the direction, but we'll need to apply it anyway, so making use of it for this proof adds little marginal computational cost.

So what is Farkas' lemma? Read the other blog posts.
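(For anyone who skipped those: the affine form we rely on says, roughly, that an affine function $\psi$ satisfies $\psi(\mathbf{x}) \ge 0$ for every $\mathbf{x}$ in a non-empty polyhedron $\{\mathbf{x} : A\mathbf{x} + \mathbf{b} \ge \mathbf{0}\}$ if and only if it can be written as a non-negative affine combination of the polyhedron's constraints:

$$
\psi(\mathbf{x}) = \lambda_0 + \boldsymbol{\lambda}^\top(A\mathbf{x} + \mathbf{b}), \quad \lambda_0 \ge 0,\ \boldsymbol{\lambda} \ge \mathbf{0}.
$$

That's all we need here; see the other posts for the details.)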

So, to find out whether the load or the store came first, we use Farkas' lemma to produce two contradictory sets of constraints (one for each possible order), and then find out which of the two sets is satisfied by the original schedule of the code as written.

By telling Farkas that 1 (i.e., the load) happens before 0 (i.e., the store), Farkas gives us these two inequalities:

$$
-\alpha_{i_0} - \alpha_{j_0} + \alpha_{i_1} + \alpha_{j_1} \le 0\\
-\alpha_{i_0} - \alpha_{j_0} + \alpha_{i_1} \le 0
$$

Then, telling Farkas the opposite (that the store happens before the load), it instead returns these two inequalities:

$$
\alpha_{i_0} + \alpha_{j_0} - \alpha_{i_1} - \alpha_{j_1} \le 0\\
\alpha_{i_0} + \alpha_{j_0} - \alpha_{i_1} \le 0
$$

So, how do we check which is violated?

These are constraints on valid schedules. The schedule of the code as written is the identity plus some offsets that aren't important right now. For more information, read the blog post about schedules that will be written later, as my schedule allows.

The important bit here is that we iterate over rows of this identity matrix (intermixed with those constants indicating loop orders of imperfect nests). For the first row, we have $\alpha_{i_0} = \alpha_{i_1} = 1$ and $\alpha_{j_0} = \alpha_{j_1} = 0$.
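Concretely, ignoring the constant offsets, the two coefficient rows we plug in below are

$$
(\alpha_{i_0}, \alpha_{j_0}, \alpha_{i_1}, \alpha_{j_1}) = (1, 0, 1, 0) \quad\text{and}\quad (\alpha_{i_0}, \alpha_{j_0}, \alpha_{i_1}, \alpha_{j_1}) = (0, 1, 0, 1).
$$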

Plugging that in, we get for our first set of inequalities:

$$
-1 - 0 + 1 + 0 = 0 \le 0\\
-1 - 0 + 1 = 0 \le 0
$$

and for our second

$$
1 + 0 - 1 - 0 = 0 \le 0\\
1 + 0 - 1 = 0 \le 0
$$

Now, for the second row of the identity matrix, the first set gives:

$$
-0 - 1 + 0 + 1 = 0 \le 0\\
-0 - 1 + 0 = -1 \le 0
$$

and the second:

$$
0 + 1 - 0 - 1 = 0 \le 0\\
0 + 1 - 0 = 1 \le 0
$$

Meaning this last constraint is violated by the original schedule, so the store cannot precede the load; we have thus proved that the load precedes the store.
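The same check in a few lines of Julia, as a sketch (this is not LoopModels' code; the matrices are just the two constraint sets hand-copied from above, with columns ordered $(\alpha_{i_0}, \alpha_{j_0}, \alpha_{i_1}, \alpha_{j_1})$):

# Each row is one Farkas inequality, written as coefficients c with c'α <= 0.
load_first  = [-1 -1  1  1;    # -α_i0 - α_j0 + α_i1 + α_j1 <= 0
               -1 -1  1  0]    # -α_i0 - α_j0 + α_i1        <= 0
store_first = -load_first      # the opposite hypothesis flips every sign

# Rows of the original (identity) schedule: the i loop, then the j loop.
for (name, C) in (("load first", load_first), ("store first", store_first))
    ok = all(all(C * α .<= 0) for α in ([1, 0, 1, 0], [0, 1, 0, 1]))
    println(name, " consistent with the original schedule: ", ok)
end

This prints true for "load first" and false for "store first", matching the walk-through above.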

Now, let's consider repeated memory accesses across time.

for i = 0:I-1, j = 0:J-1, k = 0:K-1
    A[i+j, j+k, i-k] = foo(A[i+j, j+k, i-k])
end

We can represent the mapping of loop induction variables to array memory accesses using linear algebra by specifying a matrix $\textbf{R}$ where

$$
\textbf{R} \begin{bmatrix}i\\j\\k\end{bmatrix} =
\begin{bmatrix}1&1&0\\0&1&1\\1&0&-1\end{bmatrix}
\begin{bmatrix}i\\j\\k\end{bmatrix} =
\begin{bmatrix}i+j\\j+k\\i-k\end{bmatrix}
$$

ModelingToolkit is built to help with exactly these sorts of problems (little known fact: it also happens to have a bit of differential equation handling code).

julia> using ModelingToolkit

julia> A = [1  1  0
            0  1  1
            1  0 -1];

julia> ModelingToolkit.nullspace(A)
3×1 Matrix{Int64}:
 -1
  1
 -1
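A quick sanity check in plain Julia (the matrix is hand-copied from above and called R here to match the prose): multiplying it by this null vector gives zero, so stepping the induction variables by (-1, 1, -1) leaves every index expression, and therefore the accessed address, unchanged.

julia> R = [1  1  0
            0  1  1
            1  0 -1];

julia> R * [-1, 1, -1]
3-element Vector{Int64}:
 0
 0
 0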

so, $[-1, 1, -1]$ (i.e., the combination $-i + j - k$) is a "time direction" representing repeated accesses to the same memory. Now, we wonder: when holding the memory address constant, i.e. holding both $i+j$ and $j+k$ constant (which also fixes $i-k$), in what direction does $-i + j - k$ change? We need to know that to place constraints indicating whether one time is before or after the other.

We can solve this by applying the same approach as above. We first get the direction of the dependence when time is equal across both accesses. Then, we pick one of the two directions, and see whether the constraints assuming the reverse direction are violated by the original schedule. If they are, the direction we picked was correct; otherwise, the time vector goes in the opposite direction.
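Concretely (our reading, writing $s$ for the store's iteration variables and $l$ for the load's): "time is equal across both" adds the equality of the nullspace coordinate

$$
-i_s + j_s - k_s = -i_l + j_l - k_l
$$

on top of the address-matching equalities, which appears to be where the equality constraints in the forward-direction output below come from; the reverse case corresponds to the two times differing by one step along the nullspace vector, which shifts the right-hand sides by constants.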

Equality constraint simplification can remove any clear correspondence between constraints and time dimensions, so the results are not necessarily easily interpretable. But the following list of citations suggests readers love lots of mathy output regardless of interpretability or context, so I'm presenting the results anyway:

For the forward direction, the dependence polyhedron (subscripts $s$ and $l$ mark the store and load iteration variables):

$$
-i_l - 2k_l + i_s + 2k_s = 0\\
-j_l - k_l + j_s + k_s = 0\\
-3k_l + 3k_s = 0
$$

Schedule Constraints:
-v_1 + v_4 <= 0
-v_2 + v_5 <= 0
-v_0 + v_3 <= 0
-v_6 == 0

Bounding Constraints:
-v_7 <= 0
-v_8 <= 0
-v_9 <= 0
-v_10 <= 0
v_2 - v_5 - v_10 <= 0
v_0 - v_3 - v_8 <= 0
v_1 - v_4 - v_9 <= 0
-v_6 == 0


Reverse:
Dependence Poly x -> y:
v_0 <=  ( M - 1 )
-v_0 <= 0
v_1 <=  ( N - 1 )
-v_1 <= 0
v_2 <=  ( O - 1 )
-v_2 <= 0
v_3 <=  ( M - 1 )
-v_3 <= 0
v_4 <=  ( N - 1 )
-v_4 <= 0
v_5 <=  ( O - 1 )
-v_5 <= 0
-v_0 - 2v_2 + v_3 + 2v_5 == -1
-v_1 - v_2 + v_4 + v_5 == 0
-3v_2 + 3v_5 == -1

Schedule Constraints:
v_2 - v_5 <= 0
v_0 - v_3 <= 0
v_1 - v_4 <= 0
-v_6 == 0

Bounding Constraints:
-v_7 <= 0
-v_8 <= 0
-v_9 <= 0
-v_10 <= 0
-v_0 + v_3 - v_8 <= 0
-v_1 + v_4 - v_9 <= 0
-v_2 + v_5 - v_10 <= 0
-v_6 == 0