Compute Cluster - Migration from lustre18 to lustre22

Tuesday 23.04.2024 07:00 - unknown

Over the past weeks, we have been migrating all HPCWORK data to a new filesystem. In this maintenance we will perform the final migration step. HPCWORK will not be available during this maintenance.

Wed 10.04.2024 11:26

Updates

Due to technical problems, we will have to postpone the maintenance (and the final lustre migration step) to 23.04.2024 07:00.

Tue 16.04.2024 16:23

Compute Cluster - Performance Problems on HPCWORK

Monday 08.04.2024 11:00 - Wednesday 24.04.2024 17:00

We are currently seeing recurring performance degradations on HPCWORK directories, which may be partly worsened by the ongoing migration process leading up to the filesystem migration on April 17th. The problems cannot be traced back to a single cause but are being actively investigated.

Fri 12.04.2024 11:35

Updates

Due to technical problems, we will have to postpone the maintenance (and the final lustre migration step) to 23.04.2024 07:00.

Tue 16.04.2024 16:21

Compute Cluster - Deactivation of User Namespaces

Wednesday 27.03.2024 08:15 - unknown

Due to an open security issue, we are required to temporarily disable so-called user namespaces on the cluster. This feature is mainly used by containerization software such as Apptainer, and disabling it affects how Apptainer containers are set up internally. The changes are effective immediately. Most users should not experience any interruptions. If you experience any problems, please contact us as usual via servicedesk@itc.rwth-aachen.de with a precise description of the features you are using and how you start your containers. We will reactivate user namespaces as soon as we can install the necessary fixes for the aforementioned vulnerability.

Wed 27.03.2024 08:14

Updates

A kernel update addressing the issue has been released upstream and will soon be available on the compute cluster. Once the update is installed, user namespaces can be enabled again.

Thu 04.04.2024 11:11
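
For users who want to check the state of user namespaces on a node themselves, the following is a minimal sketch. It assumes the feature is controlled through the standard Linux sysctl user.max_user_namespaces (exposed as /proc/sys/user/max_user_namespaces); the announcement does not say which mechanism is used on the cluster, so treat the result as a hint only. The function names are illustrative.

```python
from pathlib import Path
from typing import Optional

# Assumption: the limit is set via the standard sysctl user.max_user_namespaces.
# A value of 0 means no user namespaces can be created.
MAX_USER_NS = Path("/proc/sys/user/max_user_namespaces")


def userns_enabled_from_value(raw: str) -> bool:
    """Interpret the sysctl value: any positive integer allows user namespaces."""
    return int(raw.strip()) > 0


def userns_enabled(path: Path = MAX_USER_NS) -> Optional[bool]:
    """Return True/False if the sysctl is readable, or None if it cannot be determined."""
    try:
        return userns_enabled_from_value(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    state = userns_enabled()
    print({True: "enabled", False: "disabled", None: "unknown"}[state])
```

If the script reports "disabled" or "unknown" and your containers fail to start, including that output in the message to the service desk may help narrow down the problem.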

Compute Cluster - System Maintenance

Tuesday 23.04.2024 07:00 - Wednesday 24.04.2024 12:00

The whole cluster needs to be updated with a new kernel so that user namespaces can be re-enabled (see https://maintenance.itc.rwth-aachen.de/ticket/status/messages/14/show_ticket/8929). At the same time, the InfiniBand stack will be updated for better performance and stability. During this maintenance, the dialog systems and the batch system will not be available. The dialog systems are expected to reopen in the early morning. We do not expect the maintenance to last the whole day and anticipate reopening the cluster earlier.

Wed 10.04.2024 11:22

Updates

Due to technical problems, we will have to postpone the maintenance to 23.04.2024 07:00.

Tue 16.04.2024 16:22

Unfortunately, unexpected complications have arisen during the maintenance, so it will have to be extended until midday tomorrow. We will endeavor to complete the work by then. We apologize for any inconvenience this may cause.

Tue 23.04.2024 16:27

Compute Cluster - HPC JupyterHub update

Tuesday 23.04.2024 07:00 - Wednesday 24.04.2024 12:00

During the Claix HPC System Maintenance, the HPC JupyterHub will be updated to a newer version. This will improve Claix 2023 support and include mandatory security updates. The whole cluster needs to be updated with a new kernel.

Tue 23.04.2024 07:03

Updates

The migration was successfully completed.

Wed 24.04.2024 13:40

Compute Cluster - Top500 - Benchmark

Thursday 11.04.2024 17:00 - Friday 12.04.2024 09:10

During the stated time, Claix-2023 will not be available due to a benchmark run for the Top500 list [1]. Batch jobs that cannot finish before the start of this downtime, or that are scheduled during this time period, will be kept in the queue and started after the cluster resumes operation. [1] https://www.top500.org

Thu 11.04.2024 17:09

Updates

The nodes are available again.

Fri 12.04.2024 09:27
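
The queueing rule from the Top500 announcement (jobs that cannot finish before the downtime stay in the queue) can be illustrated with a small sketch. It assumes the decision is based on the job's requested walltime; how the batch system actually evaluates this is not stated in the announcement. The function name and the submission time are hypothetical.

```python
from datetime import datetime, timedelta


def can_start_before_downtime(now: datetime, walltime: timedelta,
                              downtime_start: datetime) -> bool:
    """A job may start only if its full requested walltime ends by the downtime start."""
    return now + walltime <= downtime_start


# Downtime start taken from the announcement; the current time is hypothetical.
downtime_start = datetime(2024, 4, 11, 17, 0)
now = datetime(2024, 4, 11, 9, 0)

print(can_start_before_downtime(now, timedelta(hours=6), downtime_start))   # True
print(can_start_before_downtime(now, timedelta(hours=12), downtime_start))  # False
```

Under this assumption, requesting a shorter walltime shortly before a downtime increases the chance that a job still starts instead of waiting until the cluster reopens.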