Common HPC Dependencies

BLT creates named targets for the dependencies that most HPC projects need, such as MPI, CUDA, HIP, and OpenMP. BLT also helps its users get these dependencies to interoperate within the same library or executable.
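
As a minimal sketch of combining dependencies, a single target can list more than one of these named targets in DEPENDS_ON. The target and file names below (hybrid_solver and its sources) are hypothetical, and the example assumes both ENABLE_MPI and ENABLE_OPENMP are ON so that the two targets exist:

```cmake
# Hypothetical library whose sources use both MPI and OpenMP.
# Assumes ENABLE_MPI and ENABLE_OPENMP are ON in the host-config.
blt_add_library( NAME       hybrid_solver
                 HEADERS    hybrid_solver.hpp
                 SOURCES    hybrid_solver.cpp
                 DEPENDS_ON blt::mpi blt::openmp)
```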

As previously mentioned in Adding Tests, BLT also provides bundled versions of GoogleTest, GoogleMock, GoogleBenchmark, and FRUIT. Not only is the source for these included, but BLT also provides named CMake targets for them.

BLT’s blt::mpi, blt::cuda, blt::cuda_runtime, blt::hip, blt::hip_runtime, and blt::openmp targets are all defined via the blt_import_library macro. This creates a true CMake imported target that is inherited properly through CMake’s dependency graph.
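
The same macro can be used for your own third-party dependencies. The following is only a sketch: the library name foo, the FOO_DIR variable, the paths, and the uses_foo executable are all placeholders, not part of BLT or the tutorial project:

```cmake
# Hypothetical third-party library; the name and paths are placeholders.
set(FOO_DIR "/path/to/foo" CACHE PATH "")

blt_import_library( NAME      foo
                    INCLUDES  ${FOO_DIR}/include
                    LIBRARIES ${FOO_DIR}/lib/libfoo.a
                    TREAT_INCLUDES_AS_SYSTEM ON)

# Consumers pick up the include directories and libraries through DEPENDS_ON.
blt_add_executable( NAME       uses_foo
                    SOURCES    uses_foo.cpp
                    DEPENDS_ON foo)
```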

Note

BLT also supports exporting its third-party targets via the BLT_EXPORT_THIRDPARTY option. See Exporting Targets for more information.

You have already seen one use of DEPENDS_ON for a BLT dependency, gtest, in test_1:

    blt_add_executable( NAME       test_1
                        SOURCES    test_1.cpp 
                        DEPENDS_ON calc_pi gtest)

MPI

Our next example, test_2, builds and tests the calc_pi_mpi library, which uses MPI to parallelize the calculation over the integration intervals.

To enable MPI, we set ENABLE_MPI, MPI_C_COMPILER, and MPI_CXX_COMPILER in our host config file. Here is a snippet with these settings for LLNL’s Lassen Cluster:

set(ENABLE_MPI ON CACHE BOOL "")

set(MPI_HOME             "/usr/tce/packages/mvapich2/mvapich2-2.3.6-${GCC_VERSION}" CACHE PATH "")

set(MPI_C_COMPILER       "${MPI_HOME}/bin/mpicc" CACHE PATH "")
set(MPI_CXX_COMPILER     "${MPI_HOME}/bin/mpicxx" CACHE PATH "")
set(MPI_Fortran_COMPILER "${MPI_HOME}/bin/mpif90" CACHE PATH "")

set(MPIEXEC              "/usr/bin/srun" CACHE PATH "")
set(MPIEXEC_NUMPROC_FLAG "-n" CACHE PATH "")

Here, you can see how calc_pi_mpi and test_2 use DEPENDS_ON:

        blt_add_library( NAME       calc_pi_mpi
                         HEADERS    calc_pi_mpi.hpp calc_pi_mpi_exports.h
                         SOURCES    calc_pi_mpi.cpp 
                         DEPENDS_ON blt::mpi)

        if(WIN32 AND BUILD_SHARED_LIBS)
            target_compile_definitions(calc_pi_mpi PUBLIC WIN32_SHARED_LIBS)
        endif()

        blt_add_executable( NAME       test_2
                            SOURCES    test_2.cpp 
                            DEPENDS_ON calc_pi calc_pi_mpi gtest)

For MPI unit tests, you also need to specify the number of MPI tasks to launch. To do so, pass the NUM_MPI_TASKS argument to the blt_add_test macro.

        blt_add_test( NAME          test_2 
                      COMMAND       test_2
                      NUM_MPI_TASKS 2) # number of mpi tasks to use

As mentioned in Adding Tests, GoogleTest provides a default main() driver that will execute all unit tests defined in the source. To test MPI code, we need to create a main() that initializes and finalizes MPI in addition to GoogleTest. test_2.cpp provides an example driver for MPI with GoogleTest.

// main driver that allows using mpi w/ GoogleTest
int main(int argc, char * argv[])
{
    int result = 0;

    ::testing::InitGoogleTest(&argc, argv);

    MPI_Init(&argc, &argv);

    result = RUN_ALL_TESTS();

    MPI_Finalize();

    return result;
}

Note

While we have tried to ensure that BLT chooses the correct setup information for MPI, there are several niche cases where the default behavior is insufficient. For these cases, BLT provides the following override variables:

  • BLT_MPI_COMPILE_FLAGS

  • BLT_MPI_INCLUDES

  • BLT_MPI_LIBRARIES

  • BLT_MPI_LINK_FLAGS
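
As a rough illustration, these overrides could be set in a host-config like any other cache variable. Every path and flag below is a placeholder chosen for the example, not a value BLT requires:

```cmake
# Placeholder paths and flags; substitute the values for your MPI installation.
set(ENABLE_MPI ON CACHE BOOL "")

set(BLT_MPI_INCLUDES      "/path/to/mpi/include" CACHE STRING "")
set(BLT_MPI_LIBRARIES     "/path/to/mpi/lib/libmpi.so" CACHE STRING "")
set(BLT_MPI_COMPILE_FLAGS "-pthread" CACHE STRING "")
set(BLT_MPI_LINK_FLAGS    "-Wl,-rpath,/path/to/mpi/lib" CACHE STRING "")
```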

BLT also has the variable ENABLE_FIND_MPI. Setting it to OFF turns off all of CMake’s FindMPI logic, so that BLT uses the MPI compiler wrappers directly when you provide them as the default compilers.
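
A host-config using this approach might look roughly like the sketch below. It assumes that ENABLE_FIND_MPI is set to OFF to skip FindMPI, and the MPI_HOME prefix is a placeholder:

```cmake
# Placeholder MPI install prefix.
set(MPI_HOME "/path/to/mpi" CACHE PATH "")

# Use the MPI wrappers as the project's default compilers...
set(CMAKE_C_COMPILER   "${MPI_HOME}/bin/mpicc"  CACHE PATH "")
set(CMAKE_CXX_COMPILER "${MPI_HOME}/bin/mpicxx" CACHE PATH "")

# ...and skip CMake's FindMPI logic.
set(ENABLE_MPI      ON  CACHE BOOL "")
set(ENABLE_FIND_MPI OFF CACHE BOOL "")
```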

CUDA

Finally, test_3 builds and tests the calc_pi_cuda library, which uses CUDA to parallelize the calculation over the integration intervals.

To enable CUDA, we set ENABLE_CUDA, CMAKE_CUDA_COMPILER, CMAKE_CUDA_ARCHITECTURES, and CUDA_TOOLKIT_ROOT_DIR in our host config file. Also, if you enable the CUDA language in CMake yourself, you need to set CMAKE_CUDA_HOST_COMPILER (CMake 3.9+) or CUDA_HOST_COMPILER (earlier versions) beforehand. If you do not call enable_language(CUDA), BLT will set the appropriate host compiler variable for you and enable the CUDA language.

Note

The BLT_CXX_STD variable is useful for setting the C++ and CUDA language standards to the same level. For example, c++17 sets both to C++17.
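
In a host-config this is a one-line cache entry. The snippet below is a sketch that uses the c++17 value from the example above:

```cmake
# Request C++17 for both the C++ and CUDA compilers.
set(BLT_CXX_STD "c++17" CACHE STRING "")
```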

Here is a snippet with these settings for LLNL’s Lassen Cluster:

set(ENABLE_CUDA ON CACHE BOOL "")

set(CUDA_TOOLKIT_ROOT_DIR "/usr/tce/packages/cuda/cuda-11.2.0" CACHE PATH "")
set(CMAKE_CUDA_COMPILER "${CUDA_TOOLKIT_ROOT_DIR}/bin/nvcc" CACHE PATH "")
set(CMAKE_CUDA_HOST_COMPILER "${CMAKE_CXX_COMPILER}" CACHE PATH "")

set(CMAKE_CUDA_ARCHITECTURES "70" CACHE STRING "")
set(CMAKE_CUDA_FLAGS "-restrict --expt-extended-lambda -G" CACHE STRING "")

set(CUDA_SEPARABLE_COMPILATION ON CACHE BOOL "" )

Here, you can see how calc_pi_cuda and test_3 use DEPENDS_ON:

        
        blt_add_library( NAME       calc_pi_cuda
                         HEADERS    calc_pi_cuda.hpp calc_pi_cuda_exports.h
                         SOURCES    calc_pi_cuda.cpp 
                         DEPENDS_ON blt::cuda)

        if(WIN32 AND BUILD_SHARED_LIBS)
            target_compile_definitions(calc_pi_cuda PUBLIC WIN32_SHARED_LIBS)
        endif()

        blt_add_executable( NAME       test_3
                            SOURCES    test_3.cpp 
                            DEPENDS_ON calc_pi calc_pi_cuda gtest)

        blt_add_test( NAME    test_3
                      COMMAND test_3)

The blt::cuda dependency for calc_pi_cuda is a little special. Along with adding the normal CUDA library and headers to your library or executable, it also tells BLT that this target’s C/C++/CUDA source files need to be compiled via nvcc or cuda-clang. If this is not a requirement, you can use the blt::cuda_runtime dependency instead, which also adds the CUDA runtime library and headers but does not compile each source file with nvcc.
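
To illustrate the difference, here is a sketch of a target that only needs the CUDA runtime. The gpu_query library and its files are hypothetical. The assumption is that its sources are plain C++ that call runtime API functions such as cudaGetDeviceCount and contain no device code:

```cmake
# Hypothetical library: plain C++ sources that only call the CUDA runtime API,
# so they can be built by the host compiler rather than nvcc.
blt_add_library( NAME       gpu_query
                 HEADERS    gpu_query.hpp
                 SOURCES    gpu_query.cpp
                 DEPENDS_ON blt::cuda_runtime)
```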

In addition to the settings shown above, one other variable is useful when building GoogleTest-based tests with CUDA:

# nvcc does not like gtest's 'pthreads' flag
set(gtest_disable_pthreads ON CACHE BOOL "")

OpenMP

To enable OpenMP, set ENABLE_OPENMP in your host-config file or before loading SetupBLT.cmake. Once OpenMP is enabled, add blt::openmp to your library or executable’s DEPENDS_ON list.

Here is an example of how to add an OpenMP-enabled executable:
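
For instance, a host-config could turn OpenMP on with a single cache entry, written in the same style as the MPI and CUDA settings earlier in this section:

```cmake
# Enable BLT's OpenMP support so the blt::openmp target is created.
set(ENABLE_OPENMP ON CACHE BOOL "")
```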

    blt_add_executable(NAME blt_openmp_smoke 
                       SOURCES blt_openmp_smoke.cpp 
                       OUTPUT_DIR ${TEST_OUTPUT_DIRECTORY}
                       DEPENDS_ON blt::openmp
                       FOLDER blt/tests )

Here is an example of how to add an OpenMP-enabled test that sets the number of threads used:

    blt_add_test(NAME            blt_openmp_smoke
                 COMMAND         blt_openmp_smoke
                 NUM_OMP_THREADS 4)

HIP

BLT’s AMD HIP support is very similar to its CUDA support, with one caveat: our HIP support was implemented before CMake had full HIP language support and therefore requires that the HIP compilers be set as the main compilers. This will change soon.

Important Setup Variables

  • ENABLE_HIP : Enables HIP support in BLT

  • HIP_ROOT_DIR : Root directory for HIP installation

  • CMAKE_HIP_ARCHITECTURES : GPU architecture to use when generating HIP/ROCm code

BLT Targets

  • blt::hip : Adds include directories, hip runtime libraries, and compiles source with hipcc

  • blt::hip_runtime : Adds include directories and hip runtime libraries

Note

The BLT_CXX_STD variable is useful for setting the C++ and HIP language standards to the same level. For example, c++17 sets both to C++17.

The following two code snippets show an example of a basic host-config with HIP enabled for the toss_4_x86_64_ib_cray platform:

set(_compiler_root "/opt/rocm-5.6.0/llvm")

set(CMAKE_C_COMPILER "${_compiler_root}/bin/amdclang" CACHE PATH "")

set(CMAKE_CXX_COMPILER "${_compiler_root}/bin/amdclang++" CACHE PATH "")

set(CMAKE_Fortran_COMPILER "${_compiler_root}/bin/amdflang" CACHE PATH "")

set(CMAKE_Fortran_FLAGS "-Mfreeform" CACHE STRING "")

set(ENABLE_FORTRAN ON CACHE BOOL "")

set(_rocm_root "/opt/rocm-5.6.0")

set(ENABLE_HIP ON CACHE BOOL "")

set(HIP_ROOT_DIR "${_rocm_root}/hip" CACHE STRING "")

set(CMAKE_HIP_ARCHITECTURES "gfx90a" CACHE STRING "")

set(CMAKE_EXE_LINKER_FLAGS "-Wl,--disable-new-dtags -L${_rocm_root}/hip/../llvm/lib -L${_rocm_root}/hip/lib -Wl,-rpath,${_rocm_root}/hip/../llvm/lib:${_rocm_root}/hip/lib -lpgmath -lflang -lflangrti -lompstub -lamdhip64  -L${_rocm_root}/hip/../lib64 -Wl,-rpath,${_rocm_root}/hip/../lib64  -L${_rocm_root}/hip/../lib -Wl,-rpath,${_rocm_root}/hip/../lib -lamd_comgr -lhsa-runtime64 " CACHE STRING "")

Here is an example of using the BLT HIP target to create an executable:

    blt_add_executable(NAME blt_hip_smoke
                       SOURCES blt_hip_smoke.cpp
                       OUTPUT_DIR ${TEST_OUTPUT_DIRECTORY}
                       DEPENDS_ON blt::hip
                       FOLDER blt/tests )
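
For comparison, the sketch below shows the blt::hip_runtime target on a hypothetical library (hip_device_info and its files are placeholders) whose sources are assumed to contain only host code calling the HIP runtime API. It also registers the blt_hip_smoke executable as a test with blt_add_test, mirroring the earlier test examples:

```cmake
# Hypothetical library: host-only C++ sources that call the HIP runtime API.
blt_add_library( NAME       hip_device_info
                 HEADERS    hip_device_info.hpp
                 SOURCES    hip_device_info.cpp
                 DEPENDS_ON blt::hip_runtime)

# Register the HIP smoke executable as a test.
blt_add_test( NAME    blt_hip_smoke
              COMMAND blt_hip_smoke)
```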