Introduction
The ever-changing nature of High-Performance Computing (HPC) has always compelled the HPC community to invest in continual efforts to train new and existing practitioners. Historically, these efforts were tailored to a typical group of users who, owing to their background, possessed a certain set of programming skills. However, as HPC has become more diverse in terms of hardware, software, and user backgrounds, the traditional training approaches have become insufficient to address the training needs of our community. An increasingly complicated HPC landscape makes the development and delivery of new training materials challenging. During training delivery, educators need to address the knowledge gaps resulting from the diverse backgrounds of their learners. It is not uncommon for an attendee to come with a specific learning objective related to their work tasks and to have no interest in core HPC knowledge.
In this article, the term HPC practitioner describes anyone involved in providing or using HPC systems, e.g. a user who runs an application on an HPC resource, a developer for HPC systems, or an administrator. At the same time, we define the term “HPC” inclusively, so that it captures parallel computing and cluster computing, e.g. High Throughput Computing (HTC) or Multi-Task Computing (MTC), as these too suffer from a lack of knowledge with regard to performance issues.
How should we develop training for users who often come from disciplines that have not traditionally used HPC resources, and who are only interested in learning a particular set of skills? How can we satisfy their training needs if we don’t really understand what these are? HPC centres struggle to identify and overcome the gaps in users’ knowledge, while users struggle to identify the skills required to perform their tasks.
The goal of the HPC Certification Forum (HPCCF) is to clearly categorize, define, and examine the competencies expected from proficient HPC practitioners. As the central authority, the forum curates and maintains the certification program. The program consists of three parts: competencies defined in a modular and easily expandable skill tree, an examination process to verify that practitioners possess those skills, and the certification demonstrating their knowledge. Although the forum is not involved in the development of any training materials or tools, it supports the ecosystem around the competencies.
The ultimate goal of the forum is to offer a free, globally acknowledged certification program that will make HPC education and training more transparent and quantifiable for training providers, and easier to navigate for practitioners. This article highlights relevant aspects of our activities; more details can be found in \cite{Kunkel_2020}.
A Community-Led Forum
The forum is organized around several key roles: the general chair, a publicity chair, and curators for the skill tree, topics, and examinations. While the steering board leads the effort, members of the community are expected to contribute to it, and anyone is free to benefit from it. Active members can gain nomination and voting rights via an annual steering board election. Decision making is lightweight at the moment: while we have defined roles for steering board members that include final authority in the event it is needed, thus far we have made decisions democratically without the need to rely on this formal mechanism. In practice, any contribution is either accepted as it stands or discussed and modified until it is accepted.
The forum uses Slack for its monthly meetings and organizes two face-to-face meetings per year (one at ISC-HPC and the other at the annual SC). GitHub and the Forum’s webpage are used to coordinate the effort and publish information.
All software used by the forum is open source and freely available, so that everyone can participate. The forum aims to provide an ecosystem of tools around the certification specification (including the skill tree and the examination framework). These tools cover, for example, branding of teaching materials, referring and cross-linking to the competency definitions, and compiling curricula. In particular, we hope to catalogue and reference the existing content of third parties to allow practitioners to browse the skills and navigate to relevant open and commercial teaching material.
Note that there is currently no direct funding for the effort, but we support all proposals and efforts through which members associate their work with the forum. For example, some contributions regarding HPC IO are expected from the ESiWACE project. Ultimately, we believe that the sustainability of the effort depends upon the recognition of its importance and the voluntary contribution of institutions and individuals.
Categorization of Competencies
The forum groups a well-defined set of competencies into a skill. Each skill is identified by a set of learning outcomes and relevant metadata that clearly specify what a practitioner should be able to do in order to be said to possess that skill. The skills are organized in a tree structure, from a coarse-grained representation (the tree branches) to a fine-grained representation (the tree leaves). At the leaf level, each skill is orthogonal to the other skills, and its narrow scope is intentional: a leaf skill can be taught in a session ranging from a 1.5-hour lecture to a 4-hour workshop. Skills may cover technology-specific knowledge, such as the skill “USE1.1-B Command Line Interface” for Linux basics or the skill “K4.2-B SLURM Workload manager” that describes how a cluster manages user jobs. We believe this granularity allows practitioners to select the skills relevant to their circumstances, and allows lecturers to prepare modular training sessions with well-defined content while still achieving comparable training outcomes for practitioners with a wide range of backgrounds. Cross-linking between skills belonging to different branches is allowed; it enables the reuse of skill definitions and eases navigation of the tree.
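To make the structure described above concrete, the following minimal Python sketch models a skill tree with learning outcomes, leaf skills, and cross-links. It is an illustration only and not the forum's actual data format or tooling: the class and field names (`Skill`, `learning_outcomes`, `see_also`), the branch nodes `USE` and `K`, the placeholder learning outcomes, and the cross-link from the SLURM skill to the command line skill are all assumptions made for this example. Only the two leaf skill identifiers and titles are taken from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Skill:
    """One node of a skill tree (all field names are illustrative assumptions)."""

    skill_id: str
    title: str
    learning_outcomes: list[str] = field(default_factory=list)
    children: list[Skill] = field(default_factory=list)
    see_also: list[str] = field(default_factory=list)  # cross-links, by skill id

    def is_leaf(self) -> bool:
        return not self.children

    def walk(self) -> Iterator[Skill]:
        """Yield this skill and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()

    def leaves(self) -> list[Skill]:
        """Return the fine-grained (leaf) skills below this node."""
        return [s for s in self.walk() if s.is_leaf()]

    def find(self, skill_id: str) -> Skill | None:
        """Look up a skill by id anywhere in this subtree."""
        return next((s for s in self.walk() if s.skill_id == skill_id), None)


def resolve_links(root: Skill, skill: Skill) -> list[Skill]:
    """Follow a skill's cross-links, skipping ids that do not exist in the tree."""
    found = (root.find(skill_id) for skill_id in skill.see_also)
    return [s for s in found if s is not None]


# Leaf skills named in the text; the outcomes are placeholders, not real definitions.
cli = Skill("USE1.1-B", "Command Line Interface",
            learning_outcomes=["(placeholder learning outcome)"])
slurm = Skill("K4.2-B", "SLURM Workload manager",
              learning_outcomes=["(placeholder learning outcome)"],
              see_also=["USE1.1-B"])  # hypothetical cross-link between branches

# Hypothetical branch nodes, used only to give the example some depth.
root = Skill("ROOT", "Skill tree (placeholder root)", children=[
    Skill("USE", "(placeholder branch)", children=[cli]),
    Skill("K", "(placeholder branch)", children=[slurm]),
])

for leaf in root.leaves():
    print(leaf.skill_id, leaf.title)
```

Storing cross-links as identifiers rather than object references keeps each skill definition self-contained, which matches the idea of reusing a definition from several branches; resolving them on demand with `resolve_links(root, slurm)` is one simple design choice among many.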