The whole cluster needs to be updated with a new kernel so that user namespaces can be re-enabled; see https://maintenance.itc.rwth-aachen.de/ticket/status/messages/14/show_ticket/8929 for details. At the same time, the InfiniBand stack will be updated for better performance and stability. During this maintenance, the dialog systems and the batch system will not be available. The dialog systems are expected to reopen in the early morning. We do not expect the maintenance to last the whole day and plan to reopen the cluster earlier.
Due to technical problems, we will have to postpone the maintenance to 23.04.2024 07:00.
Over the past weeks, we have been migrating all HPCWORK data to a new filesystem. During this maintenance, we will perform the final migration step. HPCWORK will not be available during this maintenance.
Due to technical problems, we will have to postpone the maintenance (and the final Lustre migration step) to 23.04.2024 07:00.
We are currently observing recurring performance degradation on HPCWORK directories, which may be partly worsened by the ongoing migration process leading up to the filesystem migration on April 17th. The problems cannot be traced back to a single cause but are being actively investigated.
Due to technical problems, we will have to postpone the maintenance (and the final Lustre migration step) to 23.04.2024 07:00.
Due to an open security issue, we are required to temporarily disable so-called user namespaces on the cluster. This feature is mainly used by containerization software such as Apptainer, and disabling it affects how these containers are set up internally. The changes are effective immediately. Most users should not experience any interruptions. If you do experience problems, please contact us as usual via servicedesk@itc.rwth-aachen.de with a precise description of the features you are using and how you start your containers. We will reactivate user namespaces as soon as we can install the necessary fixes for the aforementioned vulnerability.
A kernel update addressing the issue has been released upstream and will be available on the compute cluster soon. Once the update is installed, user namespaces can be enabled again.
During the stated time, Claix-2023 will not be available due to a benchmark run for the Top500 list [1]. Batch jobs that cannot finish before the start of this downtime, or that are scheduled during this period, will be kept in the queue and started after the cluster resumes operation. [1] https://www.top500.org
The nodes are available again.
There are currently longer waiting times in the ML partition as the final steps of the acceptance process are still being carried out.
Waiting times should have improved by now.
The RegApp will be updated on 2024-04-03. During the update window, the service may be unavailable for short periods. Active sessions should not be affected.