3. Developing Lift

3.1. Testing

There exists a large set of tests. These are regularly run by a continous integration (CI) server as part of the software development process.

A few tests with long execution time are not run by default, but these tests are still run by the CI server for pull requests.

Every single commit into the master branch must pass all tests.

3.1.1. Running tests

The test suite can either be run from the command line or via the IDE.

  • To run the default set of tests from the command line execute the following command from the Lift root directory:

    > sbt test
    
  • To run the tests in the IntelliJ IDEA IDE, right click on the src/test folder and select Run 'All Tests'.

More verbose output can be enabled by setting the LIFT_VERBOSE environment variable.

NB: if you encounter a lift.arithmetic.NotEvaluableException the compiler will not print the stack trace which can be annoying for debugging. This is because this exception implements ControlThrowable (which itself implements NoStackTrace) for performance reasons. If you want to temporarily enable the stack trace, change ControlThrowable to Throwable in the definition of the exception and also change val to def in the companion object otherwise you will always get the same stack trace. Now you should have a more helpful exception.

3.1.2. Tests with long execution time

Some tests for the rewriting system can take several minutes to run and are therefore disabled by default. They can be included by setting the LIFT_LONG_TESTS environment variable and rerunning the tests using sbt test or the IDE. Tests are marked as a long running one, by calling the opencl.executor.LongTestEnabled() function in the test method or the entire class.

3.1.3. Ignoring tests

Tests related to issues which are not yet resolved are marked with the @Ignore annotation. This annotation should be used only as an exception.

3.1.4. Tests for particular architectures

Some tests use hardware features only available on specific architectures or trigger bugs on some architectures. The methods in org.junit.Assume are used to ignore these based on some condition.

For example:

  • The device might not be able to use double precision;
    See for example test opencl.generator.TestMisc.testDouble.
  • The test uses warp sizes specific to NVIDIA GPUs;
    See for example test opencl.generator.TestReduce.NVIDIA_C.
  • The compiler for some architectures generates wrong code;
    See for example test opencl.generator.TestReduce.NVIDIA_B.

3.2. Guidelines for Contributing

Although Lift is a primarily research focused project, treating it as the complex software project it is, and therefore maintaining a number of software development practices is in all our interests. With these practices, we aim to interrupt fast iteration of design and code as little as possible, while at the same time maintain sufficient stability (of APIs etc) to allow other users to develop features independently.

This section describes our core guidelines for working with this project, how new features and changes should be developed, and added to the core codebase.

3.2.1. Development workflow

This project operates along using fairly standard branch, pull request, and merge git methodology. In essence, all changes to the project should be made in branches of the project, which will be merged back into the master branch after acceptance via pull request. This process should be lightweight enough that creating new features requires as little effort as possible, but restrictive enough that large breaking changes are rare, and occur predictably enough that other users can comment on, and adapt to them.

As a general guideline we prefer short living branches which have a clear purpose and are merged back quickly into master. This style should be – whenever possible – favored over long living branches as explained below.

Basic workflow overview

The basic workflow for adding a single small feature (e.g. a standalone benchmark, without any compiler modifications) to the lift project is as follows:

  1. Create a feature branch from the master branch, i.e.:

    git checkout master && git checkout -b feature_branch
    
  2. Make changes within the feature branch.

  3. Make sure all tests pass locally, and on the continuous integration server.

  4. Open a pull request on bitbucket [1] , from the feature branch (feature_branch) to master.

  5. Wait for comments, and if positive, merge it in.

Bitbucket has a very nice tutorial [2] explaining the workflow in more detail.

Links

[1]https://www.atlassian.com/git/tutorials/making-a-pull-request
[2]https://www.atlassian.com/git/tutorials/comparing-workflows/feature-branch-workflow

Motivation for avoiding long living development branches

Unfortunately, it is rare that we have the luxury of producing one small standalone feature/piece of work, which can be easily described in a single branch, and developed separately. More often than not, we have a separate larger project (e.g. a paper that we’re writing) that has multiple separate interacting components that we wish to develop. In this case, although we might want to develop on a single branch, and push all the changes once we’re done, that can be frustrating for other users, as the branch may contain a large number of breaking changes that are difficult or time consuming to merge and fix.

We therefore advise that where possible, all changes should be broken out into separate feature branches as soon as possible, and pull requests to master submitted. For example, say I have a branch big_paper, which is up to date with master (i.e. is no commits behind master, and several ahead), and a feature that could be separately added to the master branch. In this instance, we would prefer that the changes that make up such a feature are added to a separate branch from big_paper, and a pull request submitted to master.

The alternative, submitting a large pull request at the end of a project, is unfavorable, as it produces a lot of work for other developers, and often involves lots of complex changes. In these instances, we would request that at the end of a large project, a developer splits any new additions into separate feature branches, which then all have independent pull requests opened. This creates more work on the behalf of the feature developer, yes, but allows maintainers to more selectively and carefully accept or delay features that might cause issues to the project.

Pull requests

All changes and additions to the Lift project must be done in the form of a pull request from a branch or fork. In addition, we are fairly opinionated with regards to what makes a “good” pull request, but also flexible enough to recognise that as this is a research project, a “good” pull request won’t always be possible to produce.

As a general guideline, we don’t want pull requests that are “too big”. What this means, exactly, is fairly hazy, but in general a pull request should:

  1. Add a single feature, or change, to the project (features include the associated tests and minor changes that they might require). This could be a change in how we generate code for a parallel pattern, the addition of a new pattern, a benchmark (without any new changes to the compiler - they should have been separate pull requests previously), or a new pattern.
  2. Be small enough, or contain little enough code, that a reviewer can read and understand it within an hour. This is an arbitrary deadline, that shouldn’t be taken seriously, but which sets the tone for the amount of code that is “too much”.

Of course, in cases where a number of features, or changes, are highly coupled, then it is inevitable that they will all be submitted together. In this case, simply be prepared for the pull request to take longer to be approved.

Breaking changes

In some cases, a pull request will contain code that fundamentally changes API, or some other structure of the code that may potentially impact the work of other developers. In this case, our aim is to minimise this impact, by notifying other developers, and delaying the merge until we are sure that it is necessary. As a developer you can also help by reducing the size of the pull request as much as possible to mitigate any potential breakages. Of course, this is not always possible and we do recognise the importance of breaking code to move forward.

3.3. How to …

3.3.1. Read me first

These guides try to clarify how and where things are done in Lift, what files you have to modify if you want to extend the language, what class needs to be extended, etc.

At each step during this process, you should look at the existing stuff, find some code which looks close to what you want to implement and take this as a starting point.

Good luck ;)

3.3.2. … add a new OpenCL pattern

Define an AST Node

Each pattern is a case class extending ir.ast.Pattern. It has to be defined in the opencl.ir.pattern package. Take a look at the FPattern and isGenerable traits because you may want to implement them.

When extending Pattern you should also provide the number of arguments your pattern expects.

The FPattern trait is used when pattern matching to catch all patterns that contain a Lambda inside them.

After defining you pattern, the first thing you have to do is override the checkType method which:

  1. Checks that type of your pattern’s arguments match its expectations.
  2. Checks the types of any nested functions based on the pattern’s input type.
  3. Returns the type of your pattern’s output, determined by the types its inputs or the nested functions.

The MapSeq pattern can be implemented like this:

/**
 * Sequential map
 *
 * @param f a function to be applied to every element of the input array
 */
case class MapSeq(f: Lambda1) extends Pattern(arity=1)
                              with FPattern
                              with isGenerable {

  override def checkType(argType: Type, ...): Type =
    argType match {
      case ArrayType(ty, len) => {
        // f has type: `argType -> a`
        f.params.head.t = argType
        // map(f) has type `[argType],,n,, -> [a],,n,,`
        ArrayType(TypeChecker.check(f), len)
      }
      case _ => // throw an exception
    }
}

Visiting the Pattern

Some operations need to “visit” all nodes of an expression (which may contain your pattern). You should take a look at the visit* and replace methods in ir/ast/Expr.scala to ensure you pattern is visited correctly.

Note: if you implement FPattern, it may work out of the box. See how ReduceWhileSeq is handled as a typical example if you have to do extra work.

Code Generation

Then you have to implement the code generation for your pattern. Look at the OpenCLGenerator.generate method in the opencl.generator package. It takes a type-checked lambda (e.g. an expression that can contain an instance of your pattern) and returns a string (the OpenCL code).

Here are the different steps of this process.

OpenCL Address Spaces

OpenCL has a hierarchy of different address spaces: global memory, local memory and private memory. You need to specify where to store the output of your pattern. If you extend FPattern, it might work out of the box without changing anything. But you should take a look at InferOpenCLAddressSpace.setAddressSpaceFunCall, see how the other patterns are handled and probably add a case for your pattern.

Ranges and Counts

If your pattern generates a for loop in the OpenCL code, it should contain a “loop variable” in its definition. The loop variable determines where the pattern accesses its input array in every iteration of the generated loop.

Should your pattern contain a “loop variable”, you should add a case in the RangesAndCounts.apply method. See how MapSeq is handled for example.

Memory Allocation

Only user functions allocate memory. You have to tell the nested user functions how much memory they will need if they are nested in your pattern.

It is done in OpenCLMemoryAllocator.alloc. More specifically, you have to add a case in allocFunCall.

All these functions take an argument inMem which is the memory, OpenCL buffer(s), that the first user function in the pattern will read from.

Also, some other classes are called during this process. You may need to edit OpenCLMemory as well.

Unrolling loops

ShouldUnroll isn’t really an optimisation at this point. We have to unroll loops when we use private memory as we flatten all private memory arrays into variables and can’t therefore index into them using variables. We represent the private arrays as variables to try and force the compiler to store the arrays in registers instead of spilling into global memory. That means instead of a definition int a[2]; we have int a_0; int a_1; int a_2; and for reading/writing a[i] when i is 1 we need to emit a_1 instead.

Barrier elimination

BarrierElimination is an optimisation consisting in removing unnecessary barriers in the OpenCL generated code. Don’t look at that in the first place.

Debugging

The if (Verbose()) { ... } part of the generate function prints some helpful information, like types, that is displayed if you set the LIFT_VERBOSE environment variable. It will work out of the box if you have correctly extended Expr’s methods (see “Visiting your pattern above)

The Views

The views are used to determine the locations in arrays where user functions in any expression read from and write to.

Add support for your pattern in InputView.buildViewFunCall and OutputView.buildViewFunCall. This means “explaining” how your pattern modifies the reading/writing locations of nested functions.

Some Plumbing

The definition of tuple types and user functions should work out of the box.

Generating the Kernel

Once all of the above passes have been implemented, you are able to generate OpenCLAST nodes. This is done in generateKernel but you probably do not have to edit this function and should directly look at the private generate method of OpenCLGenerator. Add a case for your pattern.

It is probably a good idea to take a look at the classes defined in OpenCLAST.scala and at the utility functions like generateForLoop defined at the end of OpenCLGenerator.scala.

Testing

You have to check that your pattern works as expected. For that add a test class in the test folder in the opencl.generator package with some tests.

For example, for MapSeq, you could have:

object TestMapSeq {
  @BeforeClass def before(): Unit = {
    Executor.loadLibrary()
    println("Initialize the executor")
    Executor.init()
  }

  @AfterClass def after(): Unit = {
    println("Shutdown the executor")
    Executor.shutdown()
  }
}

class TestMapSeq {
  @Test def simpleMap(): Unit = {
    val size = 1024
    val input = Array.fill(size)(util.Random.nextInt)
    val N = SizeVar("N")

    val add2 = UserFun("add2", "x", "return x+2;", Int, Int)

    val kernel = fun(
      ArrayType(Int, N),
      array => MapSeq(add2) $ array
    )

    val (output: Array[Int], _) = Execute(size)(kernel, input)

    assertArrayEquals(input.map(_ + 2), output)
  }
}

Useful tips

  • Use the debugger to compare what you have in your pattern and in an already existing one at different points in the compilation process.
  • Look at the generated OpenCL code. To see it, enable the verbose output by setting the LIFT_VERBOSE environment variable to 1.
  • Try to have something that compiles as soon as possible even if works only in some specific situations. It is easier to start from a simpler version of your pattern and then extend it.

3.3.3. … switch branch

To switch/create/delete branches, you can either use IntelliJ or the command line using the following command:

  • Create a new branch at the current commit (and switch to it):

    > git checkout -b BRANCHNAME
    
  • Create a new branch configured to follow a remote branch:

    > git checkout -b BRANCHNAME origin/REMOTE_BRANCHNAME
    
  • Switch to an existing branch:

    > git checkout BRANCHNAME
    

    NB: If BRANCHNAME does not exist locally, git will try the following:

    git checkout -b BRANCHNAME origin/BRANCHNAME
    
  • Delete a branch:

    git branch -d BRANCHNAME
    

NB: you can only create and delete local branches this way.

Try to always stay up-to-date with bitbucket using the fetch command before any switching to get the remote information about the repository. This can be done directly in IntelliJ or via the command line:

> git fetch

When switching from one branch to another, it is important to ensure that the ArithExpr submodule is being updated. Since IntelliJ does not support well submodules, you should type on the command line:

> git submodule update

After having switched branches. Note that:

> ./updateSubmodules.sh

Will do the same but will show more detailed information.