2. Overview of Lift

2.1. Patterns

2.1.1. Algorithmic Patterns

Map: \((a \rightarrow{} b) \rightarrow{} [a]_n \rightarrow{} [b]_n\)
\(map\ f\ [x_1, \ldots{}, x_n]=[f\ x_1, \ldots{}, f\ x_n]\)

The map pattern

Zip: \([a]_n \rightarrow{} [b]_n \rightarrow{} [(a, b)]_n\)
\(zip\ [x_1, \ldots{}, x_n]\ [y_1, \ldots{}, y_n]=[(x_1, y_1), \ldots{}, (x_n, y_n)]\)

The zip pattern

Reduce: \(((a, a) \rightarrow{} a) \rightarrow{} a \rightarrow{} [a]_n \rightarrow{} [a]_1\)
\(reduce\ (+)\ 0\ [x_1, \ldots{}, x_n]=[0+x_1+\ldots{}+x_n]\)

The reduce pattern

ReduceSeq: \(((a, b) \rightarrow{} a) \rightarrow{} a \rightarrow{} [b]_n \rightarrow{} [a]_1\)
\(reduceSeq\ (+)\ 0\ [x_1, x_2, \ldots{}, x_n]=[(\ldots{}((0+x_1)+x_2)+\ldots{}+x_n)]\)

The reduceSeq pattern
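
The following plain Scala sketch mirrors the semantics of Map, Zip, Reduce and ReduceSeq given above. It is only an illustration on ordinary collections; the names are not the Lift implementation or API.

    object AlgorithmicPatternsSketch {
      def map[A, B](f: A => B)(xs: Vector[A]): Vector[B] = xs.map(f)

      def zip[A, B](xs: Vector[A], ys: Vector[B]): Vector[(A, B)] = xs.zip(ys)

      // reduce assumes an associative operator f with neutral element z and
      // returns a one-element array, as in the Reduce pattern above.
      def reduce[A](f: (A, A) => A, z: A)(xs: Vector[A]): Vector[A] =
        Vector(xs.foldLeft(z)(f))

      // reduceSeq fixes a left-to-right evaluation order, so the accumulator
      // type may differ from the element type and f need not be associative.
      def reduceSeq[A, B](f: (A, B) => A, z: A)(xs: Vector[B]): Vector[A] =
        Vector(xs.foldLeft(z)(f))
    }

For example, reduce[Int](_ + _, 0)(Vector(1, 2, 3)) evaluates to Vector(6); reduceSeq computes the same result for associative operators but additionally fixes the evaluation order.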

Split: \(n \rightarrow{} [a]_{m} \rightarrow{} [[a]_{n}]_{{}^{m}\!/\!_{n}}\)
\(split\ n\ [x_1, \ldots{}, x_m]=[[x_1, \ldots{}, x_n], \ldots{}, [x_{m-n+1}, \ldots{}, x_m]]\)

The split pattern

Join: \([[a]_n]_m \rightarrow{} [a]_{n\times{}m}\)
\(join\ [[x_1, \ldots{}, x_n], \ldots{}, [x_{(n\times{}m)-n+1}, \ldots{}, x_{n\times{}m}]]=[x_1, \ldots{}, x_{n\times{}m}]\)

The join pattern

Iterate: \(n \rightarrow{} (m \rightarrow{} [a]_{k\times{}m} \rightarrow{} [a]_m) \rightarrow{} [a]_{k^n\times{}l} \rightarrow{} [a]_l\)
\(iterate\ n\ f\ xs=iterate\ (n-1)\ f\ (f\ xs)\)
\(iterate\ 0\ f\ xs=xs\)

The iterate pattern

Reorder: \(((Int, Type) \rightarrow{} Int) \rightarrow{} [a]_n \rightarrow{} [a]_n\)
\(reorder\ idx\ xs=ys\), where \(ys\) contains the elements of \(xs\) rearranged according to the index function \(idx\)

The reorder pattern

Transpose: \([[a]_m]_n \rightarrow{} [[a]_n]_m\)
\(transpose\ [[x_{1,1}, \ldots{}, x_{1,m}], \ldots{}, [x_{n,1}, \ldots{}, x_{n,m}]]=[[x_{1,1}, \ldots{}, x_{n,1}], \ldots{}, [x_{1,m}, \ldots{}, x_{n,m}]]\)

The Transpose pattern. Added for convenience; it can be implemented using Join, Reorder and Split.
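
As a plain Scala sketch (illustrative names, not the Lift API), the reshaping patterns above correspond closely to standard collection operations:

    object ReshapingPatternsSketch {
      // split n: [a]_m -> [[a]_n]_{m/n}; assumes n divides m evenly
      def split[A](n: Int)(xs: Vector[A]): Vector[Vector[A]] =
        xs.grouped(n).toVector

      // join: [[a]_n]_m -> [a]_{n*m}
      def join[A](xss: Vector[Vector[A]]): Vector[A] = xss.flatten

      // iterate n f xs applies f to the input n times
      def iterate[A](n: Int, f: Vector[A] => Vector[A])(xs: Vector[A]): Vector[A] =
        if (n == 0) xs else iterate(n - 1, f)(f(xs))

      // transpose: [[a]_m]_n -> [[a]_n]_m
      def transpose[A](xss: Vector[Vector[A]]): Vector[Vector[A]] =
        xss.transpose
    }

For example, join(split(2)(Vector(1, 2, 3, 4))) returns the original Vector(1, 2, 3, 4), reflecting that Join undoes Split.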

2.1.2. OpenCL Patterns

MapGlb: \((a \rightarrow{} b) \rightarrow{} [a]_n \rightarrow{} [b]_n\)
\(mapGlb=map\)

The map pattern for mapping work onto global threads.

MapWrg: \((a \rightarrow{} b) \rightarrow{} [a]_n \rightarrow{} [b]_n\)
\(mapWrg=map\)

The map pattern for mapping work onto work-groups (groups of threads).

MapLcl: \((a \rightarrow{} b) \rightarrow{} [a]_n \rightarrow{} [b]_n\)
\(mapLcl=map\)

The map pattern for mapping work onto local threads.

MapSeq: \((a \rightarrow{} b) \rightarrow{} [a]_n \rightarrow{} [b]_n\)
\(mapSeq=map\)

The map pattern for mapping work sequentially.
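
Executed sequentially, each of these map variants computes exactly the same result as map; they differ only in how the work is distributed over OpenCL threads and work-groups. The hedged Scala sketch below illustrates a common composition, where the hypothetical parameter localSize controls how many elements each work-group handles:

    // Corresponds roughly to join o mapWrg(mapLcl(f)) o split(localSize):
    // each chunk is assigned to one work-group and each element within a
    // chunk to one local thread; run sequentially it is just map(f).
    def mapWrgMapLclSketch[A, B](f: A => B, localSize: Int)(xs: Vector[A]): Vector[B] =
      xs.grouped(localSize)            // split(localSize)
        .toVector
        .map(chunk => chunk.map(f))    // mapWrg(mapLcl(f))
        .flatten                       // join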

PartRed: \(((a,a) \rightarrow{} a) \rightarrow{} a \rightarrow{} m \rightarrow{} [a]_{m\times{}n} \rightarrow{} [a]_m\)

The PartRed pattern. Performs a partial reduction to size m.
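
A plain Scala sketch of one possible partial reduction (illustrative only, assuming f is associative with neutral element z and that m divides the input length):

    // Reduce each of the m chunks independently, producing an array of size m.
    def partRedSketch[A](f: (A, A) => A, z: A, m: Int)(xs: Vector[A]): Vector[A] = {
      val chunkSize = xs.length / m
      xs.grouped(chunkSize).map(_.foldLeft(z)(f)).toVector
    }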

toGlobal: \((a \rightarrow{} b) \rightarrow{} (a \rightarrow{} b)\)

The toGlobal pattern. Wraps a function so that its result is written to global memory.

toLocal: \((a \rightarrow{} b) \rightarrow{} (a \rightarrow{} b)\)

The toLocal pattern. Wraps a function so that its result is written to local memory.

toPrivate: \((a \rightarrow{} b) \rightarrow{} (a \rightarrow{} b)\)

The toPrivate pattern. Wraps a function so that its result is written to private memory.

asVector: \(m \rightarrow{} [a]_{m\times{}n} \rightarrow{} [\langle{}a\rangle{}_m]_n\)

The asVector pattern. Reinterprets an array of \(m\times{}n\) scalar values as an array of \(n\) vectors of width \(m\).

asScalar: \([\langle{}a\rangle{}_m]_n \rightarrow{} [a]_{m\times{}n}\)

The asScalar pattern. The inverse of asVector: reinterprets an array of vectors as a flat array of scalar values.

Vectorize: \(m \rightarrow{} (a \rightarrow{} b) \rightarrow{} (\langle{}a\rangle{}_m \rightarrow{} \langle{}b\rangle{}_m)\)

The Vectorize pattern. Lifts a function on scalar values to a function on vectors of width \(m\).
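
In a plain Scala sketch (OpenCL vector types are modelled here as nested Vectors; the names are illustrative, not the Lift API), the three vectorisation patterns look as follows:

    // asVector m: reinterpret m*n scalars as n vectors of width m
    def asVectorSketch[A](m: Int)(xs: Vector[A]): Vector[Vector[A]] =
      xs.grouped(m).toVector

    // asScalar: flatten n vectors of width m back into m*n scalars
    def asScalarSketch[A](xss: Vector[Vector[A]]): Vector[A] = xss.flatten

    // vectorize m f: lift a scalar function to work element-wise on width-m vectors
    def vectorizeSketch[A, B](m: Int, f: A => B): Vector[A] => Vector[B] =
      (v: Vector[A]) => v.map(f)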

2.2. Rewrite Rules

2.2.1. Algorithmic Rewrite Rules

TODO: distinguish axioms from theorems

map(f) o map(g) == map(f o g)

reduceSeq(z, f) o mapSeq(g) == reduceSeq(z, (acc,x) -> f(acc, g(x)) )

reduce(z, f) == reduce(z, f) o partialReduce(z, f)

partialReduce(z, f) == join o map(partialReduce(z, f)) o split | if f is associative and z is the neutral element (i.e. f(z,x) = x)
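
These equalities can be checked on concrete data with ordinary Scala collections. The snippet below is only an illustration of the rules above, not of how the compiler applies them:

    val xs = Vector(1, 2, 3, 4, 5, 6, 7, 8)
    val f  = (x: Int) => x + 1
    val g  = (x: Int) => x * 2

    // map(f) o map(g) == map(f o g)
    assert(xs.map(g).map(f) == xs.map(f compose g))

    // reduceSeq(z, f) o mapSeq(g) == reduceSeq(z, (acc, x) -> f(acc, g(x)))
    assert(xs.map(g).foldLeft(0)(_ + _) == xs.foldLeft(0)((acc, x) => acc + g(x)))

    // reduce(z, f) == reduce(z, f) o partialReduce(z, f), with the partial
    // reduction expressed as join o map(reduce(z, f)) o split(4)
    val partial = xs.grouped(4).map(_.foldLeft(0)(_ + _)).toVector
    assert(xs.foldLeft(0)(_ + _) == partial.foldLeft(0)(_ + _))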

New rules (possibly unimplemented/unverified)

reduce(z, map(f) o zip) == map( (e) -> (reduce(e._0, f) (e._1) )) o zip(z) o T

map(map(f) o T) o slide(step,size) == map(map(f) o slide(step,size)) o T

2.2.2. OpenCL Rewrite Rules

New rules (possibly unimplemented/unverified)

mapSeq(f) o slide(step,size) == slideSeqPlus(step,size,f)
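
slide and slideSeqPlus are not defined in this section; the Scala sketch below only illustrates the intent of the rule under the assumption that slide(step, size) produces the sliding windows of the input. The unfused left-hand side materialises all windows before mapping f over them, while the fused right-hand side consumes each window as soon as it is produced:

    // Unfused: build every window, then map f over the array of windows.
    def slideThenMapSeq[A, B](step: Int, size: Int, f: Vector[A] => B)(xs: Vector[A]): Vector[B] =
      xs.sliding(size, step).map(w => f(w)).toVector

    // Fused sketch: generate and immediately consume each window
    // (assumes (xs.length - size) is a multiple of step, so only full
    // windows are produced, matching the unfused version above).
    def slideSeqSketch[A, B](step: Int, size: Int, f: Vector[A] => B)(xs: Vector[A]): Vector[B] =
      Iterator.range(0, xs.length - size + 1, step)
        .map(i => f(xs.slice(i, i + size)))
        .toVector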

2.3. Compilation Flow

Compiling a program from the Lift language to OpenCL involves the following steps [1]:

  1. Type Analysis
  2. Memory Allocation
  3. Multi-Dimensional Array Accesses
  4. Barrier Elimination
  5. OpenCL Code Generation
[1] Lift: A Functional Data-Parallel IR for High-Performance GPU Code Generation: http://www.lift-project.org/papers/steuwer17LiftIR.pdf
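
As a rough sketch only (every pass name and type below is invented for illustration and does not match the compiler's actual classes), the flow can be pictured as a pipeline of passes over the IR:

    // Illustrative pipeline; all names here are hypothetical.
    case class LiftProgram()        // input expression in the Lift IR
    case class TypedProgram()       // IR annotated with types and array sizes
    case class AllocatedProgram()   // IR with memory assigned to address spaces
    case class OpenCLKernel(source: String)

    def inferTypes(p: LiftProgram): TypedProgram = ???                    // 1. Type Analysis
    def allocateMemory(p: TypedProgram): AllocatedProgram = ???           // 2. Memory Allocation
    def computeArrayAccesses(p: AllocatedProgram): AllocatedProgram = ??? // 3. Array Accesses
    def eliminateBarriers(p: AllocatedProgram): AllocatedProgram = ???    // 4. Barrier Elimination
    def generateCode(p: AllocatedProgram): OpenCLKernel = ???             // 5. Code Generation

    def compile(p: LiftProgram): OpenCLKernel =
      generateCode(eliminateBarriers(computeArrayAccesses(allocateMemory(inferTypes(p)))))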

2.4. Inferring OpenCL Thread Counts

To automatically determine how many threads to launch for a rewritten program (or simply as a convenience), the following simple algorithm is used.

The algorithm tries to eliminate as many of the loops resulting from the different Map patterns as possible by choosing thread counts that match the sizes of the arrays being mapped over. For example, a MapGlb over an array of length N gets N global threads in that dimension (and the local size is left undefined). The number of work-groups for MapWrg and the number of local threads for MapLcl are determined in the same way. If there are several MapLcl in the same dimension, the most common thread count is chosen; e.g. if three MapLcl in dimension 0 map over arrays of lengths 32, 16 and 32, then 32 is chosen. If a dimension contains no MapGlb and no MapWrg/MapLcl pair, a thread count of 1 is chosen for that dimension, since no parallelism is present there. A sketch of this selection is given below.
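
A minimal Scala sketch of this selection heuristic (the function name and input representation are assumptions; the real implementation inspects the Map patterns in the IR):

    // Given the lengths of the arrays mapped over by the parallel Map
    // patterns in one dimension, pick the most common length, or fall
    // back to 1 when the dimension contains no such pattern.
    def inferThreadCount(mappedLengths: Seq[Int]): Int =
      if (mappedLengths.isEmpty) 1
      else mappedLengths
        .groupBy(identity)
        .maxBy { case (_, occurrences) => occurrences.size }
        ._1

    // Three MapLcl in dimension 0 over lengths 32, 16 and 32 -> 32 local threads.
    assert(inferThreadCount(Seq(32, 16, 32)) == 32)
    // No parallel map in a dimension -> a single thread.
    assert(inferThreadCount(Seq.empty) == 1)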