RWTH High Performance Computing (HPC)
Full system maintenance.
Due to various necessary maintenance tasks, the entire CLAIX HPC system will be unavailable.
The initial phase of the maintenance should last until 12:00, after which the filesystems and login nodes should be available again.
Jobs will not run until all maintenance work is completed.
We also aim to upgrade the Slurm scheduler during the downtime.
Please note the following:
- User access to the HPC system through login nodes, HPC JupyterHub, or any other connection will not be possible during the initial part of the maintenance.
- No Slurm jobs will be able to run during the entire maintenance.
- Before the maintenance, Slurm will only start jobs that are guaranteed to finish before the start of the maintenance; any running jobs must finish by then or might be terminated.
- Nodes might therefore remain empty in the lead-up to the maintenance, as Slurm clears the nodes of user jobs.
- Waiting times before and after the maintenance might be higher than usual, as nodes are emptied beforehand and the queue of waiting jobs grows afterwards.
- Files in your personal or project directories will not be available during the initial part of the maintenance.
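The scheduling rule above can be pictured with a minimal sketch. It assumes that the scheduler compares a job's requested time limit against the start of the maintenance window; the function name and all dates below are hypothetical and chosen for illustration only.

```python
from datetime import datetime, timedelta


def fits_before_maintenance(now, time_limit, maintenance_start):
    """Return True if a job started at `now` with the requested
    `time_limit` is certain to end by `maintenance_start`."""
    return now + time_limit <= maintenance_start


# Hypothetical values, not the actual maintenance schedule.
maintenance_start = datetime(2030, 1, 1, 8, 0)
now = datetime(2030, 1, 1, 2, 0)

short_job = timedelta(hours=4)   # would end at 06:00, before the window
long_job = timedelta(hours=12)   # would end at 14:00, inside the window

print(fits_before_maintenance(now, short_job, maintenance_start))  # True
print(fits_before_maintenance(now, long_job, maintenance_start))   # False
```

Under this assumption, a job that requests a shorter time limit (for example via the `--time` option of `sbatch`) may still start before the maintenance, whereas a job with a long time limit would stay pending until the maintenance is over.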
The first part of the maintenance will take longer than expected. At the moment, we cannot estimate when the first part of the maintenance work will be finished.
The network maintenance is still ongoing.
Unfortunately, the maintenance work was delayed due to circumstances beyond our control. This will delay the HPC system's availability by a few hours.
Due to issues during the network maintenance and a short-term failure of the storage backend of an infrastructure server required for the maintenance, the maintenance tasks were delayed. As a result, some tasks had to be postponed. The cluster is operational again.