(Group picture from the 5th EasyBuild User Meeting in Barcelona, Jan'20)
EasyBuild is a software build and installation framework that allows you to manage (scientific) software on High Performance Computing (HPC) systems in an efficient way.
The EasyBuild User Meeting is an open and highly interactive event that provides a great opportunity to meet fellow EasyBuild enthusiasts, discuss related topics, and learn about new aspects of the tool. It is intended for people who are already familiar with EasyBuild, ranging from occasional users to EasyBuild core developers and experts, and topics will be less introductory in nature than during the EasyBuild hackathons/workshops/tutorials that have been organized in the past. The program includes presentations by both EasyBuild users and developers, as well as talks about open source projects relevant to the EasyBuild community.
The 7th EasyBuild User Meeting will be held online, during the week of Mon-Fri 24-28 Jan 2022.
Attendance is free of charge. This is an open meeting; anybody interested is welcome to join.
All presentations will be given live at the time listed in the program via Zoom sessions, and live streaming will be available via the EasyBuild YouTube channel. We intend to record all sessions, and will make the recordings available shortly after the live presentations.
Attendees will be able to join Zoom sessions for interactive discussions with the speakers.
Note that only registered attendees will have access to the Zoom sessions!
Besides raising questions or comments in the Zoom session, you can also submit questions via the #eum channel in the EasyBuild Slack. Comments on YouTube will be disabled for the live streaming events.
If you are not logged in to the EasyBuild Slack yet, you can request an invitation to join via https://easybuild.io/join-slack.
The 7th EasyBuild User Meeting consists of several sessions spread over the week of 24-28 Jan 2022. Please note that all times are in Coordinated Universal Time (UTC)!
We intentionally left ample time between talks to allow for Q&A, interactive discussions, switching between speakers, and breaks.
Quick overview of agenda + practical guidelines for attending EUM'22.
LUMI (https://lumi-supercomputer.eu) is one of the EuroHPC JU pre-exascale systems. The supercomputer is hosted by a consortium of ten European countries, led by Finland, and is located in CSC's data centre in Kajaani, Finland. LUMI is an HPE-Cray EX system fully based on AMD processor technology. Most of the compute power comes from 2,560 compute nodes with four AMD MI250X GPUs each, but the cluster also contains over 1,500 regular compute nodes, a partition for data analysis and visualization, and a small cloud partition for containerized services. Storage consists of five Lustre file systems (one flash-based and four with conventional hard disks) and an object storage system. The cluster should have been fully built by the end of 2021, but due to the global component shortage the hardware installation is now expected to be finished by the end of April 2022.
The main programming environment on LUMI is the HPE-Cray Programming Environment with GNU, Cray and AMD compilers. We are building our software stack on top of this, using Lmod as the module environment and EasyBuild as the primary tool for software installation. A key element of LUMI is that it is run by a small central team, as it is really meant to be a joint effort with local support teams in the consortium countries. Hence the software stack and EasyBuild setup are designed to empower the users and the local support teams, and to make it easy to build on top of the centrally installed software stack in a way that is as transparent as possible to the user.
In this presentation we will give a short introduction to LUMI and to the way the support organization is set up, as LUMI is an interesting experiment in jointly operating a supercomputer service across ten countries. Next, we will tell how we got to where we are today with EasyBuild on LUMI. This is still an unfinished story, as we are awaiting the finalization of the HPE-Cray programming environment for AMD GPUs before we can fully assess what further development work is needed in EasyBuild to also install GPU software.
Let's look back at what was changed in EasyBuild in the last year, how we are doing right now, what we are currently working on, which challenges are ahead, and the enhancements and changes in EasyBuild we envision for the future.
In addition, the highlights of the last EasyBuild User Survey will be covered in this talk.
[Tue 25 Jan 2022 - 15:00 UTC]
EasyBuild site presentations (recording @ YouTube)
The CernVM-FileSystem (CVMFS) is ideal for transporting Singularity/Apptainer "sandbox" style (unpacked) containers. It moves metadata operations to the client in the same performance-enhancing way as monolithic SIF containers, while providing automatic updates and enabling Singularity/Apptainer to run entirely unprivileged.
CERN and OSG each host shared CVMFS repositories containing automatically downloaded containers, which is especially efficient for storage and network because CVMFS deduplicates repeated files. A recent CVMFS feature makes publishing layered Docker containers efficient as well: the bulk of the files shared within a large project's software base can avoid the overhead of reprocessing.
I will discuss my plans for incorporating that new feature in my cvmfs-user-pub package which is designed to quickly make end user software available worldwide to distributed batch processing. I will also discuss the challenges of using CVMFS on HPC systems and the applicability of using my cvmfsexec package to run CVMFS entirely as an end user without assistance from system administrators.
Singularity, an open-source containerization platform built for high performance computing use cases and utilized by HPC sites all over the world, was recently moved into the Linux Foundation and renamed to "Apptainer" (https://apptainer.org).
This presentation will focus on exploring what this change means for Apptainer, including what the current state of the project is, what the priorities for the project in the near-term are, and what the roadmap for the future of the project looks like.
Recent updates to Spack include a major rework of its dependency solving algorithm, developer environments, and improvements to binary packaging. With the recent addition of --reuse, binary caches are far simpler to configure and use, and Spack's oldest issue, too many rebuilds, is resolved.
This talk will cover recent developments and two major planned features for 2022: compilers as dependencies and a public binary cache. We'll talk about how we've built out Spack's package build CI to handle every pull request, how we manage security, and how we expect to start providing binaries for core packages and compilers.
Spack and EasyBuild promise to make software installation more accessible, if not easy. This seems perfect to set up the dependencies for our own research software development. The question is: how easy is it to get started with these systems?
I look at Spack and EasyBuild as package managers without any previous knowledge of either tool. The goal is to install the dependencies and toolchain for my C++ library MetaCG: GCC 9, Clang and LLVM 10, Extra-P, cxxopts, nlohmann json, and several Python packages.
Finally, creating a Spack package file and an EasyBuild easyconfig for MetaCG is the ultimate achievement. The talk will reflect on what I enjoyed about the package managers, what obstacles I ran into, and which issues I encountered in my own software.
Kubernetes' agility, versatility, and resource scaling make it a great choice for shared data science platforms in many organizations. However, data scientists often need to work with lots of different libraries, languages, and applications, often with multiple versions. Conventional approaches, with a legion of tailored images or a huge 20GB golden container image, don't match the reality of production.
In this presentation, we'll see how you can leverage EasyBuild with Open Data Hub, Red Hat's open source data science project, to solve the challenges of synchronously managing multiple containers of different types, making scientific libraries, languages and packages dynamically available in a simple way.
EasyBuild greatly simplifies the installation of several MPI implementations with a single command. However, the MPI stack is tightly bound to the network hardware and directly interacts with kernel drivers and system libraries. Projects such as OpenUCX and OpenFabrics aim at providing a unified driver to offload the network transport from the MPI implementation. These higher-level drivers bring enhanced flexibility at runtime to MPI applications, as they can switch the network interface on-the-fly. But the dependencies with hardware drivers and the host system are not pushed off the table, just pushed down a larger library stack.
This problem is very hard to solve automagically in EasyBuild due to the many factors at play that lie outside its control (hardware configuration, build system, resource manager). Nonetheless, EasyBuild already provides tools to make custom modifications to easyconfigs programmatically (e.g. hooks), which can be used to seamlessly build and deploy an MPI stack fully adapted to your system. This talk will describe strategies to automatically handle the MPI stack with OpenMPI and Intel MPI on TCP and InfiniBand networks, as well as with NVIDIA GPUs. The selected configurations reflect what we use in our heterogeneous tier-2 HPC clusters at Vrije Universiteit Brussel.
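As a minimal sketch of the hooks mechanism mentioned above: EasyBuild lets you supply a Python file of hook functions via eb --hooks, and parse_hook is one of the hook names in its API. The specific configure options below are hypothetical examples, not a recommended site setup.

```python
# mpi_hooks.py - illustrative EasyBuild hooks file,
# used via: eb --hooks=mpi_hooks.py OpenMPI-4.1.1-GCC-11.2.0.eb
# The options tweaked below are hypothetical; adapt them to your network stack.

def parse_hook(ec, *args, **kwargs):
    """Called right after each easyconfig is parsed; 'ec' exposes
    easyconfig parameters via dict-style access."""
    if ec.name == 'OpenMPI':
        # example: force UCX support, disable the legacy verbs transport
        ec['configopts'] += ' --with-ucx --without-verbs'
```

The same pattern extends to other hooks (e.g. pre-configure or module-write hooks) to adapt an entire MPI stack without forking the easyconfig files themselves.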
The European Environment for Scientific Software Installations (EESSI, pronounced as "easy") is a collaborative project between different partners in the HPC community, whose common goal is to build a shared stack of scientific software installations for HPC systems and beyond, including laptops, personal workstations and cloud infrastructure.
In this talk, we will outline how to get started with EESSI from different angles.
We'll explain different ways of accessing EESSI, along with the requirements and tradeoffs, so you're ready to hit the ground running once EESSI is ready for production use!
Fast and accurate high-resolution weather forecasting has become crucial to limit the impact of weather events, which are growing in frequency and intensity, on human activities and populations.
The timely simulation of large weather models requires computational resources that were typically attainable only on large HPC clusters. The availability of HPC-specific solutions on Azure allows any weather researcher to access parallel computing at scale whenever needed.
The Weather Research and Forecasting (WRF) model is one of the most widely adopted open-source HPC applications for weather forecasting. The complexity of building the software stack required to run WRF also represents a significant barrier for researchers.
In this talk we will explore how the weather forecasting community can leverage EESSI on Azure HBv3-series VMs, optimized for HPC applications, to rapidly run WRF weather simulations at scale.
Modules, also called "Environment Modules", turned 30 in 2021 and is still there to help users dynamically manage their environment. After 9 feature releases in the 4.x development cycle, Modules 5.0 was released last September. This talk will detail the prominent features introduced during the last years and what is coming next in 2022.
Nuclear fusion, a promising but difficult future energy source, is actively being researched with the huge international megaproject ITER due to become operational soon (this decade). This talk will feature a brief introduction to nuclear fusion (no physics knowledge required). It will discuss some of the methods used for modelling and simulating fusion plasmas, some of the challenges faced, and new approaches being worked on (with an obvious bias towards the speaker's own research topic).
ReFrame is a powerful framework for writing system regression tests and benchmarks, specifically targeted to HPC systems. The goal of the framework is to abstract away the complexity of the interactions with the system, separating the logic of a test from the low-level details, which pertain to the system configuration and setup. This allows users to write portable tests in a declarative way that describes only the test's functionality.
In this talk we will present the advancements in the framework since the last year and present a roadmap for the upcoming months. We will also touch on the topic of reusability of tests across sites.
Lmod is the modern Environment Module tool: sysadmins define packages, and users choose which package, and which version of that package, to load. This part of the talk will include a brief overview of Lmod, followed by new features such as module overview, updates to sh_to_modulefile, sourcing shell scripts inside modulefiles with source_sh(), and other recent changes to Lmod.
XALT is a tool to take a census of which programs and libraries are run on your cluster. This talk will briefly cover what XALT does and how it does it, and will include some "war" stories of XALT usage. Finally, I'll cover recent changes to XALT, including better container support, a pre-ingestion filter, detection of more compilers (Rust, Chapel, etc.), and better debugging support.
Research Software Engineers (RSEs) have been around for decades - we are graduate students and postdocs that fell in love with writing software, or research support staff that strayed from the traditional academic path and are committed to supporting research. However, we have been hidden. It's only been in the last decade that we have started to find one another, and form a strong community that offers to elevate and champion our role.
In this talk, I will tell the story of the RSE movement, starting in the UK and spreading internationally into what is now a global movement. We will focus on events from the past, different avenues for learning or getting involved, and hopes for the future. Whether or not you are a research software engineer, you are part of this story and of the journey toward high quality software for sustainable, reproducible research.
[Fri 28 Jan 2022 - 14:00 UTC]
EasyBuild site presentations (recording @ YouTube)
Proteins are essential biomolecules in all living organisms, performing various tasks from digestion to muscle contraction to immune responses. A protein consists of amino acids chained together, and is often represented as a one-dimensional sequence. However, protein function is often determined by the 3-D structure into which these chains fold, rather than by the sequence itself. Therefore, protein structure prediction from sequence information has been a long-standing problem in biology.
In 2021, DeepMind released AlphaFold (v2), with which they won the international CASP14 structure prediction competition in an overwhelming manner. Their deep-learning based solution was seen by several leading domain experts as a once-in-a-lifetime breakthrough with a seemingly endless range of possible applications.
Given the high computational cost of deep learning algorithms, specialized hardware and software are required, resulting in very high demand for availability and support on HPC systems. This presentation gives an outline of protein folding, the impact of the AlphaFold release, and the AlphaFold software on HPC systems.
One major goal of the European Environment for Scientific Software Installations (EESSI) collaborative project is to let people propose software installations for inclusion in EESSI, much like contributions are accepted in EasyBuild through pull requests.
The setup of EESSI, which provides optimized software installations for a diverse range of CPU microarchitectures (Intel, AMD, Arm, POWER, and perhaps later also RISC-V), as well as the need to ensure a secure workflow, makes this challenging. Other aspects of EESSI make it feasible: strictly controlling the build environment through containers, distributing software via CernVM-FS, supporting different Linux distributions through a compatibility layer built with Gentoo Prefix, and leveraging EasyBuild + Lmod for software installations and ReFrame for software testing.
In this talk, we will outline our projected approach to this, which involves implementing a GitHub App (in Python) that serves as a bot to help with processing pull requests to the easystack file that defines the EESSI software layer.
The bot's tasks include testing software installations in the EESSI environment on all supported CPU targets, (re-)running the tests for that software in different contexts (operating systems, platforms, etc.), and eventually ingesting the software installations into EESSI, under the supervision of humans that review the incoming contributions and assess the results produced by the bot.
Although our primary focus is to automate the contribution workflow in EESSI as much as possible, we believe our GitHub App could also be useful beyond the EESSI project.
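To make the bot's workflow concrete, the core event-dispatch logic of such a GitHub App could be sketched as follows. This is an illustrative assumption, not the actual EESSI bot code: the function name, the action strings, and the "bot: ingest" comment convention are hypothetical, while the event names and payload fields are those GitHub delivers in its webhook payloads.

```python
# Sketch of how a GitHub App bot might map incoming webhook events to actions.
# Hypothetical dispatch logic; only the GitHub event/payload structure is real.

def handle_event(event_type, payload):
    """Return the bot action for a GitHub webhook event, or 'ignore'."""
    if event_type == 'pull_request' and payload.get('action') in ('opened', 'synchronize'):
        # a new or updated PR against the easystack file:
        # schedule builds on all supported CPU targets
        return 'build:pr-%d' % payload['pull_request']['number']
    if event_type == 'issue_comment' and \
            payload.get('comment', {}).get('body', '').strip() == 'bot: ingest':
        # a human reviewer approved the results:
        # ingest the software installations into EESSI
        return 'ingest'
    return 'ignore'
```

The human-in-the-loop step is the key design choice: builds run automatically on every pull request update, but ingestion into the CernVM-FS repository only happens after an explicit approval.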
With the Frontier and El Capitan systems on the horizon, ExaFLOP-sized systems have now become a reality. These systems are and will be based on heterogeneous, accelerated platforms using AMD EPYC(tm) Processors as well as AMD Instinct(tm) Accelerators.
In the first part of this talk, we will review the node-level design of the AMD Instinct(tm) MI200 Series GPUs. We will cover key features of the GPU, such as the AMD Infinity Fabric(tm), and review key performance metrics. We will then briefly describe the GPU's micro-architecture, how it is composed of compute units, and what their capabilities are.
In the second part, we will discuss the AMD ROCm(tm) platform software stack that accompanies the AMD Instinct Accelerators and show how it can be used to program for the AMD Processors and the AMD Instinct GPUs. We will provide insight into the individual components of the ROCm platform, build options, as well as containerization of GPU workloads. We will allow ample time to ask questions and interact with the presenters.
Over the last three decades, we have witnessed a transition from closed software ecosystems as the foundation for HPC, enterprise, and business to open source software ecosystems based on Linux: from Arduino in the IoT space, to Android in the mobile space, to Linux in HPC and cloud-based systems, with various open source software projects built on top.
However, when examining hardware, current commercial off-the-shelf solutions are closed hardware ecosystems that only enable integration at the peripheral (PCIe) level. The combination of current technology trends, the slowing of Moore's Law, and cost-prohibitive silicon manufacturing inhibits significant power-performance gains from traditional closed ecosystems, especially in HPC, where technology is pushed to the extreme. This new regime forces systems to be much more specialized to achieve the power-performance profiles required for a supercomputer. In the past, HPC has led the way forward, defining the bleeding edge of technology. HPC can do this again with open hardware, as it did in software by adopting Linux and open source in general. This is not only a technology imperative, but one born out of current geopolitics. Given this technology and geopolitical backdrop, we describe how Europe can exploit its resources, targeting research and development for technological independence.
In this talk, we first describe what RISC-V is. Next, using RISC-V as an instrument, we provide a vision for the future and present a collection of current research and innovation projects, infrastructure, and the community that are building the foundation for that future. This is a new opportunity for Europe to lead the way to an HPC future that is Wide Open!
In case of questions, please contact firstname.lastname@example.org.