External Dependencies

One key goal for BLT is to simplify the use of external dependencies when building your libraries and executables.

To accomplish this, BLT provides a DEPENDS_ON option for the blt_add_library() and blt_add_executable() macros that supports both CMake targets and external dependencies registered with the blt_register_library() macro.

The blt_register_library() macro allows you to collect all of the information needed for an external dependency under a single name: include directories, libraries, compile flags, link flags, defines, etc. You can also suppress any warnings originating from the dependency's headers by setting the TREAT_INCLUDES_AS_SYSTEM argument.

For example, to find and register the external dependency axom as a BLT registered library, you can simply use:

# FindAxom.cmake takes in AXOM_DIR, the path to an installed Axom build,
# and sets the variables AXOM_INCLUDES and AXOM_LIBRARIES
include(FindAxom.cmake)
blt_register_library(NAME      axom
                     TREAT_INCLUDES_AS_SYSTEM ON
                     DEFINES   HAVE_AXOM=1
                     INCLUDES  ${AXOM_INCLUDES}
                     LIBRARIES ${AXOM_LIBRARIES})

Then axom is available to be used in the DEPENDS_ON list in the following blt_add_executable() or blt_add_library() calls.
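For example, a hypothetical executable (the name my_axom_example is only an illustration) can then pull in everything Axom needs with a single entry in DEPENDS_ON:

blt_add_executable( NAME       my_axom_example
                    SOURCES    my_axom_example.cpp
                    DEPENDS_ON axom)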

This is especially helpful for external libraries that are not built with CMake and don't provide CMake-friendly imported targets. Our ultimate goal is to use blt_register_library() to import all external dependencies as first-class imported CMake targets so we can take full advantage of CMake's dependency lattice.

MPI, CUDA, and OpenMP are all registered via blt_register_library(). You can see how in blt/thirdparty_builtin/CMakeLists.txt.

BLT also supports using blt_register_library() to provide additional options for existing CMake targets. The implementation does not modify the properties of the existing targets; it simply exposes these options through BLT's DEPENDS_ON support.
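As a sketch (the target name foo and its define are hypothetical), you register the existing target under its own name and attach the extra options there:

# foo is an existing CMake target created elsewhere in the build;
# registering it under the same name layers extra options on top of it
blt_register_library(NAME    foo
                     DEFINES HAVE_FOO=1)

Any target that lists foo in DEPENDS_ON will then link against foo and also pick up the HAVE_FOO=1 define.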

blt_register_library

A macro to register external libraries and dependencies with BLT. The named target can be added to the DEPENDS_ON argument of other BLT macros, like blt_add_library() and blt_add_executable().

You have already seen one use of DEPENDS_ON for a BLT registered dependency in test_1: gtest

blt_add_executable( NAME       test_1
                    SOURCES    test_1.cpp 
                    DEPENDS_ON calc_pi gtest)

gtest is the name for the Google Test dependency in BLT registered via blt_register_library(). Even though Google Test is built-in and uses CMake, blt_register_library() allows us to easily set defines needed by all dependent targets.

MPI Example

Our next example, test_2, builds and tests the calc_pi_mpi library, which uses MPI to parallelize the calculation over the integration intervals.

To enable MPI, we set ENABLE_MPI, MPI_C_COMPILER, and MPI_CXX_COMPILER in our host config file. Here is a snippet with these settings for LLNL’s Surface Cluster:

set(ENABLE_MPI ON CACHE BOOL "")

set(MPI_C_COMPILER       "/usr/local/tools/mvapich2-gnu-2.0/bin/mpicc"  CACHE PATH "")
set(MPI_CXX_COMPILER     "/usr/local/tools/mvapich2-gnu-2.0/bin/mpicxx" CACHE PATH "")
set(MPI_Fortran_COMPILER "/usr/local/tools/mvapich2-gnu-2.0/bin/mpif90" CACHE PATH "")

Here, you can see how calc_pi_mpi and test_2 use DEPENDS_ON:

    blt_add_library( NAME       calc_pi_mpi
                     HEADERS    calc_pi_mpi.hpp calc_pi_mpi_exports.h
                     SOURCES    calc_pi_mpi.cpp 
                     DEPENDS_ON mpi)

    if(WIN32 AND BUILD_SHARED_LIBS)
        target_compile_definitions(calc_pi_mpi PUBLIC WIN32_SHARED_LIBS)
    endif()

    blt_add_executable( NAME       test_2
                        SOURCES    test_2.cpp 
                        DEPENDS_ON calc_pi calc_pi_mpi gtest)

For MPI unit tests, you also need to specify the number of MPI tasks to launch. We use the NUM_MPI_TASKS argument to the blt_add_test() macro.

    blt_add_test( NAME          test_2 
                  COMMAND       test_2
                  NUM_MPI_TASKS 2) # number of mpi tasks to use

As mentioned in Unit Testing, Google Test provides a default main() driver that executes all unit tests defined in the source. To test MPI code, we need to create a main() that initializes and finalizes MPI in addition to Google Test. test_2.cpp provides an example driver for MPI with Google Test.

// main driver that allows using MPI with Google Test
#include "gtest/gtest.h"
#include <mpi.h>

int main(int argc, char * argv[])
{
    int result = 0;

    ::testing::InitGoogleTest(&argc, argv);

    MPI_Init(&argc, &argv);

    result = RUN_ALL_TESTS();

    MPI_Finalize();

    return result;
}

Note

While we have tried to ensure that BLT chooses the correct setup information for MPI, there are several niche cases where the default behavior is insufficient. The following override variables are available:

  • BLT_MPI_COMPILE_FLAGS
  • BLT_MPI_INCLUDES
  • BLT_MPI_LIBRARIES
  • BLT_MPI_LINK_FLAGS

BLT also has the variable ENABLE_FIND_MPI; turning it off disables all of CMake's FindMPI logic and uses the MPI compiler wrappers directly, provided you set them as the default compilers.
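As a sketch (all paths below are placeholders for your system), a host-config can either override individual pieces of the MPI setup or bypass FindMPI entirely:

# Option 1: override pieces of what FindMPI would otherwise report
set(BLT_MPI_INCLUDES  "/path/to/mpi/include" CACHE PATH "")
set(BLT_MPI_LIBRARIES "/path/to/mpi/lib/libmpi.so" CACHE PATH "")

# Option 2: use the MPI compiler wrappers as the default compilers
# and skip CMake's FindMPI logic altogether
set(CMAKE_C_COMPILER   "/path/to/mpicc"  CACHE PATH "")
set(CMAKE_CXX_COMPILER "/path/to/mpicxx" CACHE PATH "")
set(ENABLE_FIND_MPI OFF CACHE BOOL "")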

CUDA Example

Finally, test_3 builds and tests the calc_pi_cuda library, which uses CUDA to parallelize the calculation over the integration intervals.

To enable CUDA, we set ENABLE_CUDA, CMAKE_CUDA_COMPILER, and CUDA_TOOLKIT_ROOT_DIR in our host config file. Also, before the CUDA language is enabled in CMake, you need to set CMAKE_CUDA_HOST_COMPILER (CMake 3.9+) or CUDA_HOST_COMPILER (earlier versions). If you do not call enable_language(CUDA) yourself, BLT sets the appropriate host compiler variable for you and enables the CUDA language.

Here is a snippet with these settings for LLNL’s Surface Cluster:

set(ENABLE_CUDA ON CACHE BOOL "")
set(CUDA_TOOLKIT_ROOT_DIR "/opt/cudatoolkit-8.0" CACHE PATH "")
set(CMAKE_CUDA_COMPILER "/opt/cudatoolkit-8.0/bin/nvcc" CACHE PATH "")
set(CMAKE_CUDA_HOST_COMPILER "${CMAKE_CXX_COMPILER}" CACHE PATH "")
set(CUDA_SEPARABLE_COMPILATION ON CACHE BOOL "")

Here, you can see how calc_pi_cuda and test_3 use DEPENDS_ON:

    # avoid warnings about sm_20 deprecated
    set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-arch=sm_30)
	
    blt_add_library( NAME       calc_pi_cuda
                     HEADERS    calc_pi_cuda.hpp calc_pi_cuda_exports.h
                     SOURCES    calc_pi_cuda.cpp 
                     DEPENDS_ON cuda)

    if(WIN32 AND BUILD_SHARED_LIBS)
        target_compile_definitions(calc_pi_cuda PUBLIC WIN32_SHARED_LIBS)
    endif()



    blt_add_executable( NAME       test_3
                        SOURCES    test_3.cpp 
                        DEPENDS_ON calc_pi calc_pi_cuda gtest cuda_runtime)

    blt_add_test( NAME    test_3
                  COMMAND test_3)

The cuda dependency for calc_pi_cuda is a little special: along with adding the normal CUDA library and headers to your library or executable, it also tells BLT that the target's C/C++/CUDA source files need to be compiled via nvcc or cuda-clang. If that is not a requirement, you can use the cuda_runtime dependency instead, which also adds the CUDA runtime library and headers but does not compile each source file with nvcc.
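For example, a hypothetical host-only utility library that only calls the CUDA runtime API (and has no device code of its own) could depend on cuda_runtime instead, so its sources are compiled with the regular host compiler:

    blt_add_library( NAME       cuda_host_utils
                     SOURCES    cuda_host_utils.cpp
                     DEPENDS_ON cuda_runtime)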

Some other useful CUDA flags are:

# Enable separable compilation of all CUDA files for a given target or all following targets
set(CUDA_SEPARABLE_COMPILATION ON CACHE BOOL "")
set(CUDA_ARCH "sm_60" CACHE STRING "")
set(CMAKE_CUDA_FLAGS "-restrict -arch ${CUDA_ARCH} -std=c++11" CACHE STRING "")
set(CMAKE_CUDA_LINK_FLAGS "-Xlinker -rpath -Xlinker /path/to/mpi" CACHE STRING "")
# Needed when you have CUDA decorations exposed in libraries
set(CUDA_LINK_WITH_NVCC ON CACHE BOOL "")

OpenMP

To enable OpenMP, set ENABLE_OPENMP in your host-config file or before loading SetupBLT.cmake. Once OpenMP is enabled, simply add openmp to your library or executable's DEPENDS_ON list.
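For example, in your host-config file:

set(ENABLE_OPENMP ON CACHE BOOL "")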

Here is an example of how to add an OpenMP enabled executable:

    blt_add_executable(NAME blt_openmp_smoke 
                       SOURCES blt_openmp_smoke.cpp 
                       OUTPUT_DIR ${TEST_OUTPUT_DIRECTORY}
                       DEPENDS_ON openmp
                       FOLDER blt/tests )

Note

While we have tried to ensure that BLT chooses the correct compile and link flags for OpenMP, there are several niche cases where the default options are insufficient. For example, linking with NVCC requires linking the OpenMP libraries in directly instead of relying on the compile and link flags returned by CMake's FindOpenMP package. An example of this is in host-configs/llnl/blueos_3_ppc64le_ib_p9/clang@upstream_link_with_nvcc.cmake. We provide two variables to override BLT's OpenMP flag logic:

  • BLT_OPENMP_COMPILE_FLAGS
  • BLT_OPENMP_LINK_FLAGS
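For example, a host-config that links OpenMP code with nvcc might pass the OpenMP runtime library directly; this is only a sketch, and the library path is a placeholder for your toolchain:

# Link the OpenMP runtime directly rather than relying on FindOpenMP's flags
set(BLT_OPENMP_LINK_FLAGS "-L/path/to/openmp/lib -lomp" CACHE STRING "")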

Here is an example of how to add an OpenMP enabled test that sets the amount of threads used:

    blt_add_test(NAME            blt_openmp_smoke
                 COMMAND         blt_openmp_smoke
                 NUM_OMP_THREADS 4)

Example Host-configs

Here are the full example host-config files for LLNL's Surface, Ray, and Quartz clusters. The Surface and Quartz configs use gcc 4.9.3, while the Ray config uses clang and nvcc.

llnl-surface-chaos_5_x86_64_ib-gcc@4.9.3.cmake

llnl/blueos_3_ppc64le_ib_p9/clang@upstream_nvcc_xlf.cmake

llnl/toss_3_x86_64_ib/gcc@4.9.3.cmake

Note

Quartz does not have GPUs, so CUDA is not enabled in the Quartz host-config.

Here is a full example host-config file for an OSX laptop, using a set of dependencies built with spack.

darwin/elcapitan-x86_64/naples-clang@7.3.0.cmake

Building and testing on Surface

Here is how you can use the host-config file to configure a build of the calc_pi project with MPI and CUDA enabled on Surface:

# load new cmake b/c default on surface is too old
ml cmake/3.9.2
# create build dir
mkdir build
cd build
# configure using host-config
cmake -C ../../host-configs/other/llnl-surface-chaos_5_x86_64_ib-gcc@4.9.3.cmake  \
      -DBLT_SOURCE_DIR=../../../../blt  ..

After building (make), you can run make test on a batch node (where the GPUs reside) to run the unit tests that are using MPI and CUDA:

bash-4.1$ salloc -A <valid bank>
bash-4.1$ make
bash-4.1$ make test

Running tests...
Test project blt/docs/tutorial/calc_pi/build
    Start 1: test_1
1/8 Test #1: test_1 ...........................   Passed    0.01 sec
    Start 2: test_2
2/8 Test #2: test_2 ...........................   Passed    2.79 sec
    Start 3: test_3
3/8 Test #3: test_3 ...........................   Passed    0.54 sec
    Start 4: blt_gtest_smoke
4/8 Test #4: blt_gtest_smoke ..................   Passed    0.01 sec
    Start 5: blt_fruit_smoke
5/8 Test #5: blt_fruit_smoke ..................   Passed    0.01 sec
    Start 6: blt_mpi_smoke
6/8 Test #6: blt_mpi_smoke ....................   Passed    2.82 sec
    Start 7: blt_cuda_smoke
7/8 Test #7: blt_cuda_smoke ...................   Passed    0.48 sec
    Start 8: blt_cuda_runtime_smoke
8/8 Test #8: blt_cuda_runtime_smoke ...........   Passed    0.11 sec

100% tests passed, 0 tests failed out of 8

Total Test time (real) =   6.80 sec

Building and testing on Ray

Here is how you can use the host-config file to configure a build of the calc_pi project with MPI and CUDA enabled on the blue_os Ray cluster:

# load new cmake b/c default on ray is too old
ml cmake
# create build dir
mkdir build
cd build
# configure using host-config
cmake -C ../../host-configs/llnl/blueos_3_ppc64le_ib_p9/clang@upstream_nvcc_xlf.cmake \
      -DBLT_SOURCE_DIR=../../../../blt  ..

And here is how to build and test the code on Ray:

bash-4.2$ lalloc 1 -G <valid group>
bash-4.2$ make
bash-4.2$ make test

Running tests...
Test project projects/blt/docs/tutorial/calc_pi/build
    Start 1: test_1
1/7 Test #1: test_1 ...........................   Passed    0.01 sec
    Start 2: test_2
2/7 Test #2: test_2 ...........................   Passed    1.24 sec
    Start 3: test_3
3/7 Test #3: test_3 ...........................   Passed    0.17 sec
    Start 4: blt_gtest_smoke
4/7 Test #4: blt_gtest_smoke ..................   Passed    0.01 sec
    Start 5: blt_mpi_smoke
5/7 Test #5: blt_mpi_smoke ....................   Passed    0.82 sec
    Start 6: blt_cuda_smoke
6/7 Test #6: blt_cuda_smoke ...................   Passed    0.15 sec
    Start 7: blt_cuda_runtime_smoke
7/7 Test #7: blt_cuda_runtime_smoke ...........   Passed    0.04 sec

100% tests passed, 0 tests failed out of 7

Total Test time (real) =   2.47 sec