MPI as API: Using UCC’s NCCL Backend for MPI’s Allreduce

Environment Setup
Enabling UCC in OpenMPI
Enabling NCCL in UCC (Team Layer Selection)
All The Variables
Results
1. Plain OpenMPI
2. OpenMPI with UCC
3. OpenMPI with UCC+NCCL
Scaling Plots
Average Latency
Bus Bandwidth
Comparing MPI, UCC, UCC+NCCL
Comparing UCC+NCCL, NCCL
Summary
Technical Details

This post