JSC Accelerating Devices Lab

Various notes from the Accelerating Devices Lab (X-Dev) of Jülich Supercomputing Centre

MPI as API: Using UCC’s NCCL Backend for MPI’s Allreduce


Contents: Environment Setup; Enabling UCC in OpenMPI; Enabling NCCL in UCC (Team Layer Selection); All The Variables; Results; 1. Plain OpenMPI; 2. OpenMPI with UCC; 3. OpenMPI with UCC+NCCL; Scaling Plots; Average Latency; Bus Bandwidth; Comparing MPI, UCC, UCC+NCCL; Comparing UCC+NCCL, NCCL; Summary; Technical Details

ISC23 Project Poster: OpenGPT-X – Training Large Language Models on HPC Systems


Poster publication: http://hdl.handle.net/2128/34532
The ISC High Performance Conference 2023 was held in Hamburg, Germany, from 21 May to 25 May. At the conference, we presented a project poster on the OpenGPT-X project, outlining the progress and initial exploration results. The poster was even featured in HPCWire’s May 24 recap of ISC within the AI segment!

GPU Vendor/Programming Model Compatibility Table


For a recent talk at DKRZ in the scope of the natESM project, I created a table summarizing the current state of using each programming model on the GPUs of each vendor, for C++ and Fortran. Since it led to quite a discussion in the session, I made a standalone version of it with some updates and elaborations here and there.

Talk: Introduction to HPC


TL;DR: I gave an HPC intro talk. Slides are below. In MAELSTROM, we connect three areas of science: 🌍Weather and climate simulation with 🤖Machine Learning methods and workflows using 📈HPC techniques and resources. A few days ago, halfway into the project, we held a boot camp at JSC to teach this Venn diagram to a group of students. Some were ML experts but had never used an HPC system.

Poster: OpenGPT-X - Training Large Language Models on HPC Systems


Poster publication: http://hdl.handle.net/2128/32006
The 14th JLESC workshop (JLESC: Joint Laboratory for Extreme-Scale Computing) was hosted by the National Center for Supercomputing Applications (NCSA) in Urbana, Illinois, from 28 September to 30 September. We had the opportunity to present the OpenGPT-X project in the form of a poster.

DOIng it Right! (DOIs for This Blog)


This blog is an experiment. We want to share bits and pieces of our work: the reports we write, the presentations we hold, the little discoveries we make, even some first, water-testing investigations, and all the rest. It’s a documentation of what we do. Little bits of science, collected in the open, and sometimes even not that little.

First Benchmarks with AMD Instinct MI250 GPUs at JSC


A few months ago, we extended the JURECA Evaluation Platform at JSC by two nodes with AMD Instinct MI250 GPUs (four GPUs each). The nodes are Gigabyte G262-ZO0 servers, each with two AMD EPYC 7443 processors (dual-socket, 24 cores per socket, SMT-2) and four MI250 GPUs (128 GB memory each).
Contents: OSU Bandwidth Micro-Benchmark; A100 Comparison; GPU STREAM Variant; Data Size Scan; Threads and Data Sizes

OOPS Version 1 Release


A few days ago, OPTIMA announced the release of deliverable 3.5, to which I contributed. This deliverable is one of a set of five deliverables in work package 3. But first, let’s talk about OPTIMA. OPTIMA is an EU-funded project whose goal is to prove that several HPC applications can take advantage of future, highly heterogeneous, FPGA-populated HPC systems.