
Rechner-Cluster - Slurm hiccup possible

Thursday 19.12.2024 15:50 - Thursday 19.12.2024 16:00

We are migrating the Slurm controller to a new host. Short timeouts may occur. We will try to minimize these as much as possible.

Thu 19.12.2024 09:53

Updates

The first attempt was not successful; we are back on the old master. We are analyzing the problems that occurred and will try again later.

Thu 19.12.2024 10:01

We are making another attempt.

Thu 19.12.2024 15:52

Rechner-Cluster - Issues regarding Availability

Monday 02.12.2024 11:15 - Wednesday 18.12.2024 16:00

There may currently be login problems with various login nodes. We are working on a solution.

Mon 02.12.2024 11:58

Updates

The observed issues affect the batch service as well. Consequently, many batch jobs may have failed.

Mon 02.12.2024 12:18

The observed problems can be traced back to power issues. We cannot rule out that further systems may have to be shut down temporarily in a controlled manner as part of the problem resolution. However, we hope the issues can be resolved without any additional measures.

Mon 02.12.2024 14:31

The cluster can be accessed again; cf. ticket https://maintenance.itc.rwth-aachen.de/ticket/status/messages/14/show_ticket/9521 for further details. Several nodes, however, are still unavailable due to the consequences of the aforementioned issues. We are currently working on resolving them.

Wed 04.12.2024 12:57

Rechner-Cluster - InfiniBand issues leading to unreachable nodes

Friday 13.12.2024 21:30 - Monday 16.12.2024 10:57

Due to InfiniBand problems that are still being analyzed, many nodes, including the whole GPU cluster, are not reachable at the moment. We are working together with the manufacturer to solve the problems.

Mon 16.12.2024 08:04

Updates

The problem has been fixed.

Mon 16.12.2024 10:57

Rechner-Cluster - Single sign-on and MFA malfunction

Wednesday 11.12.2024 08:00 - Wednesday 11.12.2024 09:30

At the moment, single sign-on and multi-factor authentication are sporadically disrupted. We are already working on a solution and ask for your patience.

Wed 11.12.2024 08:12

Rechner-Cluster - Nodes drained due to filesystem issues

Sunday 08.12.2024 06:45 - Sunday 08.12.2024 17:15

Dear users, on Sunday, 8.12.2024, at 06:53 AM the home filesystems of most nodes went offline. This may have crashed some jobs, and no new jobs can start during the downtime. We are actively working on the issue.

Sun 08.12.2024 16:22

Updates

Most nodes are coming back online. Apologies for the trouble. We expect most nodes to be usable by 18:00.

Sun 08.12.2024 16:56

Rechner-Cluster - Lost connection to the $HOME and $WORK filesystems

Thursday 05.12.2024 22:00 - Friday 06.12.2024 10:00

Due to a problem in our network, some nodes lost their connection to the $HOME and $WORK filesystems. This included the login23-1 and login23-2 nodes. The issue has now been resolved.

Fri 06.12.2024 14:00

Rechner-Cluster - Emergency Shutdown of CLAIX-2023 Due to Failed Cooling

Monday 02.12.2024 15:15 - Wednesday 04.12.2024 06:00

CLAIX-2023 was shut down as an emergency measure to prevent damage to the hardware. Due to severe power issues, the cooling facilities failed and could no longer provide sufficient heat dissipation. The cluster will be operational again once the underlying issues have been resolved.

Technical Explanation

All MPI and GPU nodes are liquid-cooled. Both CDUs (Cooling Distribution Units) failed, so that heat dissipation was no longer possible.

Mon 02.12.2024 15:20

Updates

Both CDUs are active again and cooling has been restored. The cluster will be booted again for damage analysis only. The batch service remains suspended until all issues are resolved and all power safety checks have passed.

Tue 03.12.2024 10:34

The cooling system is now fully operational again. Additionally, we have implemented further measures to enhance stability in the future. The queues were reopened last night; however, we are currently conducting a detailed investigation into some specific nodes regarding their cooling and performance. Once these investigations are complete, the affected nodes will be made available through the batch system again.

Wed 04.12.2024 10:41

Rechner-Cluster - New file server for home and work directories

Friday 22.11.2024 12:00 - Tuesday 26.11.2024 13:45

We are putting a new file server for the home and work directories into operation. For this purpose, we will carry out system maintenance in order to perform the final synchronisation of all data over the weekend.

Wed 20.11.2024 09:32

Updates

The maintenance needs to be extended.

Mon 25.11.2024 13:32

Due to some issues preventing a normal batch service, the maintenance had to be extended.

Tue 26.11.2024 13:33

Rechner-Cluster - Limited Usability of CLAIX-2023

Thursday 21.11.2024 09:45 - Thursday 21.11.2024 17:15

Due to ongoing external issues in the RWTH Aachen network, access to and usability of the compute cluster are limited at the moment. The responsible network department is currently working on a solution.

Thu 21.11.2024 12:47

Updates

The issues could not be resolved so far and may persist throughout tomorrow as well.

Thu 21.11.2024 16:00

The issues have been resolved.

Fri 22.11.2024 06:52

Rechner-Cluster - Power disruption

Friday 15.11.2024 17:00 - Saturday 16.11.2024 14:00

At 17:00, there was a brief interruption of the power supply in the Aachen area. Power is available again; however, most of the compute nodes went down as a consequence. It is currently unclear when the service can be resumed. At the moment, critical services are receiving special attention and are being restored where required.

Fri 15.11.2024 18:43

Updates

After restoring critical operational infrastructure services, the HPC service has been resumed. However, a large portion of the GPU nodes is unavailable due to the impact of the blackout. We are working on resolving the problems but cannot yet predict when and whether these nodes will be available again; until further notice, they remain unavailable.

Fri 15.11.2024 21:06

The majority of the ML systems (GPUs) were brought back up today and returned to batch operation.

Sat 16.11.2024 14:04

Rechner-Cluster - Scheduler Hiccup

Thursday 14.11.2024 10:45 - Thursday 14.11.2024 10:55

Our Slurm workload manager crashed for an unknown reason. Functionality was restored quickly. Further investigations are ongoing.

Thu 14.11.2024 10:59

Rechner-Cluster - GPU Malfunction on GPU Login Node

Tuesday 12.11.2024 09:15 - Tuesday 12.11.2024 10:35

Currently, a GPU of the GPU login node login23-g-1 is showing an issue. The node is unavailable until the issue is resolved.

Tue 12.11.2024 09:29

Updates

The issues have been resolved.

Tue 12.11.2024 10:36

Rechner-Cluster - Login malfunction

Wednesday 23.10.2024 17:00 - Thursday 24.10.2024 08:40

It is currently not possible to log in to the login23-* frontends. There is a problem with two-factor authentication.

Thu 24.10.2024 08:09

Rechner-Cluster - Top500 run for GPU nodes

Friday 27.09.2024 08:00 - Friday 27.09.2024 11:00

We are performing a new Top500 run for the ML partition of CLAIX23. The GPU nodes will not be available during that run. Other nodes and login23-g-1 might also be unavailable: i23m[0027-0030],r23m[0023-0026,0095-0098,0171-0174],n23i0001

Thu 26.09.2024 15:03

Rechner-Cluster - Expired certificate

Monday 23.09.2024 08:00 - Monday 23.09.2024 09:03

Due to the expired certificate for idm.rwth-aachen.de, IdM applications and applications connected via RWTH Single Sign-On cannot be accessed. - When accessing IdM applications, a warning about an insecure connection is displayed. - When accessing applications via RWTH Single Sign-On, a message about missing authorisations is displayed. We are working at full speed on a solution.

Mon 23.09.2024 08:11

Updates

The certificate has been updated and the applications can be accessed again. Please clear your browser cache before accessing the pages again.

Mon 23.09.2024 09:06

Rechner-Cluster - RegApp malfunction -> no login to the cluster possible

Monday 09.09.2024 10:45 - Monday 09.09.2024 11:30

Unfortunately, the RegApp malfunctioned during the stated period, so it was not possible to log in on the cluster frontends. Existing connections were not affected. The problem has been resolved.

Mon 09.09.2024 15:40

Rechner-Cluster - Maintenance of copy23-2

Monday 09.09.2024 08:00 - Monday 09.09.2024 09:00

The copy23-2 data transfer system will be unavailable for maintenance.

Mon 02.09.2024 09:01

Updates

The maintenance is completed.

Mon 09.09.2024 09:00

Rechner-Cluster - Firmware Update of InfiniBand Gateways

Thursday 05.09.2024 15:00 - Friday 06.09.2024 12:15

The firmware of the InfiniBand gateways will be updated. The firmware update will be performed in the background and should not cause any interruption of service.

Thu 05.09.2024 15:31

Updates

The updates are completed.

Fri 06.09.2024 13:19

Rechner-Cluster - Hosts of RWTH Aachen University partly not reachable from other providers' networks

Saturday 24.08.2024 20:15 - Sunday 25.08.2024 21:00

Due to a DNS disruption, the name servers of various providers are currently not returning IP addresses for hosts under *.rwth-aachen.de. As a workaround, you can configure alternative DNS servers in your connection settings, e.g. the Level3 name servers (4.2.2.2 and 4.2.2.1) or those of Comodo (8.26.56.26 and 8.20.247.20). It may also be possible to reach the RWTH VPN server; in that case, please use the VPN.

Sun 25.08.2024 10:34

Updates

Instructions for configuring an alternative DNS server under Windows can be found via the following links: https://www.ionos.de/digitalguide/server/konfiguration/windows-11-dns-aendern/ https://www.netzwelt.de/galerie/25894-dns-einstellungen-windows-10-11-aendern.html Alternatively, you can use the VPN. If you cannot reach the VPN server, you can edit the Windows hosts file according to the following instructions so that the server vpn.rwth-aachen.de can be reached. To do this, add the following entry: 134.130.5.231 vpn.rwth-aachen.de https://www.windows-faq.de/2022/10/04/windows-11-hosts-datei-bearbeiten/
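
To quickly check whether one of the alternative resolvers already serves the RWTH hosts, you can query it directly instead of changing your system settings first. The following is a minimal, unofficial sketch in Python; it assumes the third-party package dnspython is installed, and vpn.rwth-aachen.de is just an example hostname:

import dns.resolver  # third-party package "dnspython" (pip install dnspython)

# Query the resolvers mentioned above directly, bypassing the system DNS.
for server in ["4.2.2.2", "4.2.2.1", "8.26.56.26", "8.20.247.20"]:
    resolver = dns.resolver.Resolver(configure=False)  # ignore the system resolver config
    resolver.nameservers = [server]
    try:
        answer = resolver.resolve("vpn.rwth-aachen.de", "A")
        print(server, "->", [record.to_text() for record in answer])
    except Exception as exc:  # e.g. NXDOMAIN or a timeout
        print(server, "-> lookup failed:", exc)

If one of the resolvers prints an IP address, configuring it as described above (or adding the printed address to the hosts file) should restore access.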

Sun 25.08.2024 13:20

The hosts of RWTH Aachen University can now be reached again from outside the RWTH network.

Sun 25.08.2024 21:10

Individual users may have experienced problems even after the fault was fixed on 25.08. at 9 pm. All follow-up work was completed on 26.08. at 9 am, so there should be no further problems.

Mon 26.08.2024 15:25

Rechner-Cluster - MPI/CPU Jobs Failed to start overnight

Monday 19.08.2024 17:15 - Tuesday 20.08.2024 08:15

Many nodes suffered an issue after our updates on 19.08.2024, resulting in jobs failing on the CPU partitions. If your job failed to start or failed on startup, please consider requeuing it. The following jobs were identified as possibly affected by the issue: 48399558,48468084,48470374,48470676,48473716,48473739,48473807,48473831, 48475599,48475607_0,48475607_1,48475607_2,48475607_3,48475607_4,48475607_5, 48475607_6,48475607_7,48475607_8,48475607_9,48475607_10,48475607_11,48475607_12, 48475607_13,48475607_14,48475607_15,48475607_16,48475607_17,48475607_18,48475607_19, 48476753,48482255,48485168,48486404,48488874_5,48488874_6,48488874_7,48488874_8, 48488874_9,48488874_10,48488874_11,48488875_9,48488875_10,48488875_11,48489133_1, 48489133_2,48489133_3,48489133_4,48489133_5,48489133_6,48489133_7,48489133_8,48489133_9, 48489133_10,48489154_0,48489154_1,48489154_2,48489154_3,48489154_4,48489154_5,48489154_6,48489154_7, 48489154_8,48489154_9,48489154_10,48489154_11,48489154_12,48489154_13,48489154_14,48489154_15, 48489154_16,48489154_17,48489154_18,48489154_19,48489154_20,48489154_21,48489154_22,48489154_23, 48489154_24,48489154_25,48489154_26,48489154_27,48489154_28,48489154_29,48489154_30,48489154_31, 48489154_32,48489154_33,48489154_34,48489154_35,48489154_36,48489154_37,48489154_38,48489154_39, 48489154_40,48489154_41,48489154_42,48489154_43,48489154_44,48489154_45,48489154_46,48489154_47, 48489154_100,48489154_101,48489154_102,48489154_103,48489154_104,48489154_105,48489154_106,48489154_107, 48489154_108,48489154_109,48489154_110,48489154_111,48489154_112,48489154_113,48489154_114,48489154_115, 48489154_116,48489154_117,48489154_118,48489154_119,48489154_120,48489154_121,48489154_122,48489154_123, 48489154_124,48489154_125,48489154_126,48489154_127,48489154_128,48489154_129,48489154_130,48489154_131, 48489154_132,48489154_133,48489154_134,48489154_135,48489154_136,48489154_137,48489154_138,48489154_139, 48489154_140,48489154_141,48489154_142,48489154_143,48489154_144,48489154_145,48489154_146,48489154_147, 48489154_148,48489154_149,48489154_150,48489154_151,48489154_152,48489154_153,48489154_154,48489154_155, 48489154_156,48489154_157,48489154_158,48489154_159,48489154_160,48489154_161,48489154_162,48489154_163, 48489154_164,48489154_165,48489154_166,48489154_167,48489154_168,48489154_169,48489154_170,48489154_171, 48489154_172,48489154_173,48489154_174,48489154_175,48489154_176,48489154_177,48489154_178,48489154_179, 48489154_180,48489154_181,48489154_182,48489154_183,48489154_184,48489154_185,48489154_186,48489154_187, 48489154_188,48489154_189,48489154_190,48489154_191,48489154_192,48489154_193,48489154_194,48489154_195, 48489618_1,48489618_2,48489618_3,48489618_4,48489618_5,48489618_6,48489618_7,48489618_8,48489618_9,48489618_10, 48489776,48489806_6,48489806_55,48489806_69,48489806_98,48489842,48489843,48489844,48489845,48489882_1,48489882_2, 48489882_3,48489882_4,48489882_5,48489882_6,48489882_7,48489882_8,48489882_9,48489882_10,48494481,48494490,48494752, 48494753,48494754,48494755,48494756,48494757,48494758,48494759,48494760
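
If several of your jobs are on this list, requeuing them one by one is tedious. Below is a minimal, unofficial sketch that requeues a list of job IDs from Python by calling Slurm's scontrol; it assumes scontrol is on the PATH (as on the cluster frontends), and the IDs shown are merely examples taken from the list above:

import subprocess

# Example subset of affected job IDs; replace with your own failed jobs.
affected = ["48399558", "48468084", "48475607_0"]

for job_id in affected:
    # "scontrol requeue" puts an already-submitted job back into the queue.
    result = subprocess.run(
        ["scontrol", "requeue", job_id],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        print(f"requeue of {job_id} failed: {result.stderr.strip()}")

Note that requeuing only makes sense for jobs that can safely restart from the beginning or from a checkpoint.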

Tue 20.08.2024 11:34

Rechner-Cluster - Maintenance

Monday 19.08.2024 07:00 - Monday 19.08.2024 16:00

Due to updates to our compute nodes, the HPC system will be unavailable for maintenance. The login nodes will be available at noon without interruptions, but the batch queue for jobs won't be usable during the maintenance work. As soon as the maintenance work has been completed, batch operation will be enabled again. These jobs should be requeued if necessary: 48271714,48271729,48271731,48463405,48463406,48463407,48466930, 48466932,48468086,48468087,48468088,48468089,48468090,48468091, 48468104,48468105,48468108,48468622,48469133,48469262,48469404, 48469708,48469734,48469740,48469754,48469929,48470011,48470017, 48470032,48470042,48470045,48474641,48474666,48475362,48489829, 48489831,48489833_2,48489838

Fri 09.08.2024 11:01

Rechner-Cluster - Old HPCJupyterHub GPU profiles might run slower on the new c23g nodes.

Friday 24.05.2024 11:00 - Friday 09.08.2024 13:46

Please migrate your notebooks to work with the newer c23 GPU profiles! -- The migration of the GPU profiles to CLAIX 2023 and the new c23g nodes has caused the old Python packages to use non-optimal settings on the new GPUs. Redeployment of these old profiles is necessary and will take some time.

Fri 24.05.2024 11:15

Rechner-Cluster - MPI jobs may crash

Tuesday 16.07.2024 16:12 - Thursday 01.08.2024 09:15

Since the cluster maintenance, random MPI job crashes have been observed. We are currently investigating the issue and are working on a solution.

Mon 22.07.2024 09:37

Updates

We have identified the issue and are currently testing workarounds with the affected users.

Wed 24.07.2024 12:41

After successful tests with affected users, we have rolled out a workaround that automatically prevents this issue for our IntelMPI installations. We advise users to remove any custom workarounds from their job scripts to ensure compatibility with future changes.

Thu 01.08.2024 10:28

Rechner-Cluster - c23i Partition is DOWN for the HPC JupyterHub

Thursday 18.07.2024 15:15 - Monday 29.07.2024 10:14

The c23i partition is down due to unforeseen behaviour of our monitoring system, which automatically marks the only node in the partition as down. A solution is not yet known and will be investigated. The HPC JupyterHub will not be able to use the partition until the issue is resolved.

Thu 18.07.2024 15:29

Rechner-Cluster - Temporary Deactivation of User Namespaces

Monday 08.07.2024 14:15 - Thursday 18.07.2024 13:00

Due to a security vulnerability in the Linux kernel, user namespaces are temporarily deactivated. Once the kernel has been updated, user namespaces can be used again.

Mon 08.07.2024 14:32

Updates

User namespaces are available again.

Thu 18.07.2024 13:00

Rechner-Cluster - Quotas on HPCWORK may not work correctly

Thursday 27.06.2024 14:30 - Thursday 18.07.2024 12:30

The quota system on HPCWORK may not work correctly. A "Disk quota exceeded" error may occur when trying to create files, although the r_quota command reports that enough quota should be available. The supplier of the filesystem has been informed and is working on a solution.

Thu 27.06.2024 14:40

Updates

File quotas for all hpcwork directories were increased to one million.

Thu 18.07.2024 12:39

Rechner-Cluster - Reconfiguration of File Systems and Kernel Update

Monday 15.07.2024 07:00 - Tuesday 16.07.2024 16:11

During the maintenance, $HPCWORK will be reconfigured so that the CLAIX23 nodes can access it via RDMA over InfiniBand instead of over Ethernet. At the same time, the kernel will be updated. After the kernel update, the previously deactivated user namespaces will be reactivated.

Wed 10.07.2024 09:43

Updates

The maintenance had to be extended for final filesystem tasks.

Mon 15.07.2024 15:24

Due to unforeseen problems, the maintenance has to be extended until tomorrow, 16.07.2024, 18:00. We do not expect the filesystem manufacturer's work to take that long and expect to open the cluster earlier.

Mon 15.07.2024 17:24

The maintenance was completed successfully. Once again, sorry for the long delay.

Tue 16.07.2024 16:12

Rechner-Cluster - HPCJupyterHub down due to update to 5.0.0

Wednesday 26.06.2024 15:00 - Thursday 27.06.2024 16:00

The HPC JupyterHub is down after a failed update to 5.0.0 and will stay down until the update is complete. Update: The HPC JupyterHub could not be updated to 5.0.0 and remains at version 4.1.5.

Wed 26.06.2024 15:04

Rechner-Cluster - FastX web servers on login18-x-1 and login18-x-2 stopped

Wednesday 15.05.2024 14:00 - Thursday 27.06.2024 14:26

The FastX web servers on login18-x-1 and login18-x-2 have been stopped, i.e. the addresses https://login18-x-1.hpc.itc.rwth-aachen.de:3300 and https://login18-x-2.hpc.itc.rwth-aachen.de:3300 are not available anymore. Please use login23-x-1 or login23-x-2 instead.

Wed 15.05.2024 14:38

Updates

login18-x-1 and login18-x-2 have been decommissioned.

Thu 27.06.2024 14:29

Rechner-Cluster - Maintenance

Wednesday 26.06.2024 08:00 - Wednesday 26.06.2024 16:00

Due to maintenance work on the water cooling system, CLAIX23 must run empty during the specified period. As soon as the maintenance work has been completed, batch operation will be enabled again. The dialog systems are not affected by the maintenance work.

Wed 12.06.2024 07:48

Updates

Additionally, between 10 and 11 o'clock, there will be maintenance of the RegApp. During this time, new logins will not be possible; existing connections will not be disturbed.

Tue 25.06.2024 14:04

Rechner-Cluster - Upgrade to Rocky Linux 8.10

Thursday 13.06.2024 11:15 - Wednesday 26.06.2024 16:00

Since Rocky 8.9 has reached end of life, the MPI nodes of CLAIX23 must be upgraded to Rocky 8.10. The upgrade is performed in the background during production to minimize the downtime of the cluster. However, during the upgrade, free nodes will be drained selectively and will not be available for job submission until the upgrade is completed. Please keep in mind that the installed library versions will likely change during the update. Thus, performance and application behaviour may vary compared to earlier runs.

Thu 13.06.2024 11:49

Updates

Starting now, all new jobs will be scheduled to Rocky 8.10 nodes. The remaining nodes that still need to be updated are unavailable for job submission. These nodes will be upgraded as soon as possible after their jobs' completion.

Fri 14.06.2024 18:22

The update of the frontend and batch nodes is completed. Remaining nodes (i.e. integrated hosting and service nodes) will be updated during the cluster maintenance scheduled for 2024-06-26.

Thu 20.06.2024 08:49

Rechner-Cluster - Update of Frontend Nodes

Wednesday 26.06.2024 08:00 - Wednesday 26.06.2024 10:00

The dialog nodes (i.e. login23-1/2/3/4, login23-x-1/2) will be updated to Rocky 8.10 today within the weekly reboot. The upgrade of copy23-1/2 will follow.

Mon 17.06.2024 05:08

Updates

The copy frontend nodes (copy23-1, copy23-2) will be updated to Rocky Linux 8.10 during the cluster maintenance on 2024-06-26.

Mon 24.06.2024 09:13

The update of the remaining frontend nodes is completed.

Wed 26.06.2024 11:12

Rechner-Cluster - Error on user/project management

Tuesday 25.06.2024 10:00 - Tuesday 25.06.2024 17:00

Due to technical problems, it is not possible to create/change/delete HPC accounts or projects. We are working on that issue.

Tue 25.06.2024 16:11

Rechner-Cluster - Error on user/project management

Thursday 20.06.2024 10:00 - Monday 24.06.2024 10:32

Due to technical problems, it is not possible to create/change/delete HPC accounts or projects. We are working on that issue.

Thu 20.06.2024 12:09

Updates

The issue has been resolved.

Mon 24.06.2024 10:33

Rechner-Cluster - Project management

Wednesday 29.05.2024 15:30 - Wednesday 12.06.2024 16:30

During this period, no RWTH-S, THESIS, LECTURE or WestAI projects can be granted. We apologize for the inconvenience.

Wed 29.05.2024 15:42

Rechner-Cluster - RegApp Maintenance

Wednesday 12.06.2024 09:00 - Wednesday 12.06.2024 10:00

Due to maintenance of the RegApp Identity Provider, it is not possible to establish new connections to the cluster during the specified period. Existing connections and batch operation are not affected by the maintenance.

Technical Explanation

The RegApp is being moved to a new server with a new operating system.

Tue 04.06.2024 14:28

Rechner-Cluster - Deactivation of User Namespaces

Wednesday 27.03.2024 08:15 - Monday 29.04.2024 18:00

Due to an open security issue, we are required to disable the feature of so-called user namespaces on the cluster. This feature is mainly used by containerization software and affects the way Apptainer containers will behave. The changes are effective immediately. Most users should not experience any interruptions. If you experience any problems, please contact us as usual via servicedesk@itc.rwth-aachen.de with a precise description of the features you are using. We will reactivate user namespaces as soon as we can install the necessary fixes for the aforementioned vulnerability.

Wed 27.03.2024 08:14

Updates

A kernel update addressing the issue has been released upstream and will be available on the compute cluster soon. Once the update is installed, user namespaces can be enabled again.

Thu 04.04.2024 11:11

We are planning to re-enable user namespaces on April 29th after some final adjustments.

Wed 24.04.2024 17:22

Rechner-Cluster - Performance Problems on HPCWORK

Monday 08.04.2024 11:00 - Wednesday 24.04.2024 17:00

We are currently registering recurring performance degradation on HPCWORK directories, which might be partly worsened by the ongoing migration process leading up to the filesystem migration on April 17th. The problems cannot be traced back to a single cause but are being actively investigated.

Fri 12.04.2024 11:35

Updates

Due to technical problems, we will have to postpone the maintenance (and the final Lustre migration step) to 23.04.2024 07:00.

Tue 16.04.2024 16:21

Rechner-Cluster - System Maintenance

Tuesday 23.04.2024 07:00 - Wednesday 24.04.2024 12:00

The whole cluster needs to be updated with a new kernel so that user namespaces can be re-enabled (cf. https://maintenance.itc.rwth-aachen.de/ticket/status/messages/14/show_ticket/8929). At the same time, the InfiniBand stack will be updated for better performance and stability. During this maintenance, the dialog systems and the batch system will not be available. The dialog systems are expected to be reopened in the early morning. We do not believe that the maintenance will last the whole day and expect the cluster to open earlier.

Wed 10.04.2024 11:22

Updates

Due to technical problems, we will have to postpone the maintenance to 23.04.2024 07:00.

Tue 16.04.2024 16:22

Unfortunately, unplanned complications have arisen during the maintenance, so it will have to be extended until midday tomorrow. We will endeavor to complete the work by then. We apologize for any inconvenience this may cause.

Tue 23.04.2024 16:27

Rechner-Cluster - Migration from lustre18 to lustre22

Tuesday 23.04.2024 07:00 - Wednesday 24.04.2024 12:00

Over the last weeks, we have been migrating all HPCWORK data to a new filesystem. In this maintenance we will perform the final migration step. HPCWORK will not be available during this maintenance.

Wed 10.04.2024 11:26

Updates

Due to technical problems, we will have to postpone the maintenance (and the final Lustre migration step) to 23.04.2024 07:00.

Tue 16.04.2024 16:23

Rechner-Cluster - HPC JupyterHub update

Tuesday 23.04.2024 07:00 - Wednesday 24.04.2024 12:00

During the CLAIX HPC system maintenance, the HPC JupyterHub will be updated to a newer version. This will improve CLAIX 2023 support and include mandatory security updates. The whole cluster needs to be updated with a new kernel.

Tue 23.04.2024 07:03

Updates

The migration was successfully completed.

Wed 24.04.2024 13:40

Rechner-Cluster - Top500 - Benchmark

Thursday 11.04.2024 17:00 - Friday 12.04.2024 09:10

During the stated time, CLAIX-2023 will not be available due to a benchmark run for the Top500 list [1]. Batch jobs which cannot finish before the start of this downtime or which are scheduled during this time period will be kept in the queue and started after the cluster resumes operation. [1] https://www.top500.org

Thu 11.04.2024 17:09

Updates

The nodes are now available again.

Fri 12.04.2024 09:27

Rechner-Cluster - Longer waiting times in the ML partition

Wednesday 03.04.2024 16:00 - Thursday 11.04.2024 13:11

There are currently longer waiting times in the ML partition, as the final steps of the acceptance process are still being carried out.

Thu 04.04.2024 10:09

Updates

Waiting times should be shorter now.

Thu 11.04.2024 13:11

Rechner-Cluster - RegApp Service Update

Wednesday 03.04.2024 14:00 - Wednesday 03.04.2024 14:30

The RegApp will be updated on 2024-04-03. During the update window, the service will be unavailable for short time intervals. Active sessions should not be affected.

Wed 27.03.2024 13:59

Rechner-Cluster - Problems with submitting jobs

Wednesday 03.04.2024 12:00 - Wednesday 03.04.2024 14:03

There are currently problems when submitting jobs. We are working on fixing the problems and apologize for the inconvenience.

Wed 03.04.2024 12:36

Updates

The problem is solved now.

Wed 03.04.2024 14:03

Rechner-Cluster - Deactivation of User Namespaces

Friday 12.01.2024 10:30 - Thursday 08.02.2024 08:00

Due to an open security issue, we are required to disable the feature of so-called user namespaces on the cluster. This feature is mainly used by containerization software and affects the way Apptainer containers will behave. The changes are effective immediately. Most users should not experience any interruptions. If you experience any problems, please contact us as usual via servicedesk@itc.rwth-aachen.de with a precise description of the features you are using. We will reactivate user namespaces as soon as we can install the necessary fixes for the aforementioned vulnerability. Update: We have installed a bugfix release for the affected software component and enabled user namespaces again.

Fri 12.01.2024 10:43

Rechner-Cluster - hpcwork directory is empty

Monday 29.01.2024 10:15 - Monday 29.01.2024 11:34

At the moment, no data is shown on /hpcwork. The responsible department has been informed and is working on a solution.

Mon 29.01.2024 10:26

Updates

The problem has been solved.

Mon 29.01.2024 11:34

Rechner-Cluster - Scheduled Reboot of CLAIX18 Copy Nodes

Monday 29.01.2024 06:00 - Monday 29.01.2024 07:15

Both CLAIX18 copy nodes will be rebooted on Monday, January 29th, 6:00 am (CET) due to a scheduled kernel upgrade. The systems will be temporarily unavailable and cannot be used until the kernel update is finished.

Fri 26.01.2024 17:15

Rechner-Cluster - Network problems

Friday 19.01.2024 19:45 - Saturday 20.01.2024 09:30

Due to network problems, the use of the cluster may have been impaired during the stated period.

Mon 22.01.2024 07:45

Rechner-Cluster - Two-factor authentication is again mandatory on login18-4

Monday 09.10.2023 11:00 - Monday 15.01.2024 10:30

For logging in to login18-4.hpc.itc.rwth-aachen.de, it is again mandatory to use two-factor authentication. For details, see https://help.itc.rwth-aachen.de/service/rhr4fjjutttf/article/475152f6390f448fa0904d02280d292d/

Mon 09.10.2023 11:28

Rechner-Cluster - Connection to the Windows cluster not possible

Friday 29.12.2023 14:45 - Monday 01.01.2024 00:00

At the moment, it is not possible to connect to the Windows cluster. The responsible colleagues have been informed and are working on a solution.

Fri 29.12.2023 14:55

Updates

The error has been resolved. You can connect to the Windows cluster again.

Wed 03.01.2024 11:46

Rechner-Cluster - jupyterhub.hpc.itc.rwth-aachen.de DNS temporarily out of service

Thursday 14.12.2023 15:30 - Thursday 14.12.2023 15:55

The DNS entry for jupyterhub.hpc.itc.rwth-aachen.de is temporarily out of service for 20 minutes. Problems accessing the HPC JupyterHub might arise from this failure. Please wait until the system comes back online.

Thu 14.12.2023 15:33

Rechner-Cluster - DGX-2 Node nd20-02 unavailable

Monday 27.11.2023 00:00 - Tuesday 12.12.2023 08:00

The DGX-2 node nd20-02 is expected to be unavailable all day on Monday (27.11.) and Tuesday (28.11.). We will be updating the operating system to Rocky 8 in the specified time.

Tue 21.11.2023 12:21

Updates

The node needs to be reinstalled and cannot be used until further notice.

Tue 28.11.2023 12:51

The update of the system was successful.

Tue 12.12.2023 07:59

Rechner-Cluster - Maintenance of the HPC user management

Tuesday 05.12.2023 10:00 - Tuesday 05.12.2023 12:00

Due to maintenance measures, the creation of HPC accounts is delayed. Password changes are not possible.

Tue 05.12.2023 09:55

Rechner-Cluster - login18-x-2 malfunctioning

Monday 27.11.2023 12:45 - Tuesday 28.11.2023 14:40

login18-x-2 is defective and therefore currently not available.

Tue 28.11.2023 12:50

Updates

The system is OK again.

Tue 28.11.2023 14:40

Rechner-Cluster - System Maintenance & Upgrade to Rocky 8.9

Monday 27.11.2023 08:00 - Monday 27.11.2023 14:00

The complete cluster will not be available from 8 am to 12 noon due to system maintenance. Within the maintenance, the HPC cluster will be upgraded to Rocky 8.9.

Fri 17.11.2023 08:00

Updates

Due to technical problems, we have to postpone the maintenance to next Monday.

Tue 21.11.2023 11:54

Due to technical problems, we have to prolong the maintenance.

Mon 27.11.2023 11:34

The maintenance was finished successfully.

Mon 27.11.2023 14:57

Rechner-Cluster - Login problems with the RegApp

Monday 23.10.2023 12:00 - Thursday 26.10.2023 12:00

Currently, some users receive an error message after logging into the RegApp application. We are already working on a solution.

Wed 25.10.2023 13:00

Rechner-Cluster - Interruption of batch operation

Tuesday 17.10.2023 07:30 - Tuesday 17.10.2023 10:50

Maintenance work on the air conditioning system of the machine hall will take place on 17.10. For this reason, batch operation must be stopped during the specified period and the cluster must run empty. After the maintenance work, batch operation will be restarted automatically.

Mon 18.09.2023 14:56

Updates

The maintenance is completed. Jobs are being scheduled and executed again.

Tue 17.10.2023 10:53

Rechner-Cluster - Interruption of HPC Service due to Network Maintenance

Tuesday 17.10.2023 09:00 - Tuesday 17.10.2023 10:15

Due to network maintenance in the IT Center building SW23, the HPC service will be temporarily suspended. During the maintenance, the cluster (including all frontend nodes) will not be available.

Tue 17.10.2023 08:15

Updates

The network maintenance is completed. Until all services of the cluster are restored, the HPC service will remain suspended.

Tue 17.10.2023 09:18

The cluster is reachable again.

Tue 17.10.2023 10:48

Rechner-Cluster - Temporary Shutdown of Lustre18 & Reboot of Frontend Nodes

Tuesday 17.10.2023 07:30 - Tuesday 17.10.2023 10:10

Lustre18 will be temporarily shut down during the maintenance. The frontend nodes will have to be rebooted.

Tue 17.10.2023 07:38

Rechner-Cluster - gnome-terminal cannot be started

Tuesday 30.05.2023 13:15 - Tuesday 10.10.2023 13:25

Currently, the program gnome-terminal cannot be started directly on the HPC dialog systems. We are still trying to find out what the problem is. Please use another terminal program like xterm, mate-terminal or xfce-terminal instead. gnome-terminal may also be set as the default terminal application in your desktop environment; in this case, nothing happens when you press the terminal icon. You would then have to configure another terminal program as the default application as well: MATE: System - Preferences - Preferred Applications - System - Terminal Emulator; XFCE: Applications - Settings - Default Applications - Utilities - Terminal Emulator

Tue 30.05.2023 13:37

Rechner-Cluster - Update of the Nvidia DGX system

Wednesday 04.10.2023 07:00 - Monday 09.10.2023 18:00

One of the DGX-2 systems (nd20-01) will be temporarily unavailable due to scheduled maintenance. We will be updating the system to Rocky Linux 8.8. Update: Due to unforeseen problems, the maintenance has to be extended until Monday. We apologize for the inconvenience.

Thu 28.09.2023 16:12

Rechner-Cluster - Reboot of copy18-1 and copy18-2

Monday 02.10.2023 06:00 - Monday 02.10.2023 06:30

The two systems copy18-1 and copy18-2 will be rebooted for maintenance reasons.

Thu 28.09.2023 13:05

Rechner-Cluster - Login - Node: login18-2

Tuesday 26.09.2023 07:00 - Tuesday 26.09.2023 15:00

Login node login18-2 will not be available on Tuesday 26.09. from 7 a.m. to 3 p.m. Work is being carried out to improve network stability.

Mon 25.09.2023 09:50

Rechner-Cluster - HPC services may be disrupted

Wednesday 13.09.2023 18:00 - Thursday 14.09.2023 12:00

HPC services may currently be disrupted, e.g. it may not be possible to log in to our dialog nodes, to start JupyterLab notebooks or to submit batch jobs. We are working on fixing the issue.

Thu 14.09.2023 09:52

Updates

The problems are solved.

Thu 14.09.2023 14:27

Rechner-Cluster - Login - Node: login18-4

Wednesday 30.08.2023 06:00 - Wednesday 30.08.2023 10:15

Login node login18-4 will not be available on Wednesday 30.08. from 6 a.m. to 3 p.m. Work is being carried out to improve network stability.

Tue 29.08.2023 15:19

Rechner-Cluster - Login - Nodes: login18-2, login18-g-2, login18-3

Tuesday 29.08.2023 06:00 - Tuesday 29.08.2023 15:00

Login nodes login18-2, login18-g-2 and login18-3 will not be available on Tuesday 29.08. from 6 a.m. to 3 p.m. Work is being carried out to improve network stability.

Tue 29.08.2023 06:36

Rechner-Cluster - Login - Nodes: login18-x-1, login18-g-1, login18-2

Monday 28.08.2023 06:00 - Monday 28.08.2023 14:00

Login nodes login18-x-1, login18-g-1 and login18-2 will not be available on Monday 28.08. from 6 a.m. to 2 p.m. Work is being carried out to improve network stability.

Mon 28.08.2023 07:47

Rechner-Cluster - Slurm - Jobs not completing

Monday 17.07.2023 17:00 - Wednesday 26.07.2023 12:00

We currently strongly recommend using IntelMPI instead of OpenMPI, because OpenMPI jobs crash non-deterministically or remain in a "completing" state and do not finish successfully.

Tue 18.07.2023 12:39

Updates

We have identified the root of the issue and are currently working on reverting the batch nodes to a working configuration. This might lead to slightly prolonged waiting times for new jobs. We will update this incident message as soon as all batch nodes are finished with the procedure.

Thu 20.07.2023 13:41

The affected batch nodes are fully operational again.

Wed 26.07.2023 19:15

Rechner-Cluster - Maintenance for the RWTH JARDS online submission system

Tuesday 25.07.2023 07:00 - Tuesday 25.07.2023 17:00

The JARDS online submission system for filing applications for RWTH computing projects will be unavailable on 25.07.2023 between 7:00 and 17:00.

Mon 24.07.2023 11:09

Rechner-Cluster - Complete maintenance of the cluster

Monday 17.07.2023 07:00 - Monday 17.07.2023 17:00

During the maintenance, the current operating system Rocky Linux 8.7 will be updated to Rocky Linux 8.8. The frontends will also be updated, so you will not be able to log in to the cluster or access your data. There is one exception, however: the MFA test machine login18-4 will remain reachable, but you can only log in there with a second factor [1]. At times, $HPCWORK will not be reachable there either, since the Lustre filesystem will also undergo maintenance. We do not expect that you will have to recompile your software or change your job scripts. Your jobs should therefore start normally after the maintenance work has finished.

Tue 11.07.2023 09:42

Rechner-Cluster - Windows Frontends not available

Wednesday 14.06.2023 11:00 - Wednesday 14.06.2023 12:00

The Windows dialog systems (cluster-win.rz.rwth-aachen.de) will not be available due to a necessary relocation of the server hardware.

Mon 12.06.2023 15:56

Rechner-Cluster - Login problems with login18-x-1

Friday 19.05.2023 14:00 - Thursday 25.05.2023 14:45

Several users are currently experiencing difficulties logging in to the login18-x-1 frontend. We are investigating the problem. In the meantime, please use login18-x-2 instead.

Fri 19.05.2023 14:06

Updates

Due to network problems, login on login18-x-1 has been deactivated until further notice.

Mon 22.05.2023 14:31

The error has been fixed.

Thu 25.05.2023 14:49

Rechner-Cluster - Login - Nodes: login18-x-1, login18-g-1, login18-2

Monday 22.05.2023 08:00 - Monday 22.05.2023 11:45

Login nodes login18-x-1, login18-g-1 and login18-2 will not be available on Monday 22.05. from 8 a.m. to 2 p.m. Work is being carried out to improve network stability.

Fri 19.05.2023 14:08

Rechner-Cluster - Data Transfer Node copy18-2

Tuesday 16.05.2023 09:45 - Wednesday 17.05.2023 15:00

Data transfer node copy18-2 will not be available from Tuesday 16.5., 9:45 a.m., to Wednesday 17.5., 3:00 p.m. Work will be done to improve network redundancy.

Mon 08.05.2023 15:55

Rechner-Cluster - Maintenance of the whole cluster

Monday 15.05.2023 13:00 - Monday 15.05.2023 14:00

There will be maintenance of our ACI tenant, which results in a network interruption for all our VMs, including LDAP, Kerberos, cvmfs, etc. Thus, it is not possible to log in during the maintenance, and we expect that already logged-in users could face problems as well. We do not know exactly how running jobs (if there are any) will be affected, but we hope that they can run through as expected.

Fri 05.05.2023 12:42

Updates

We will have to postpone the maintenance. A new timeslot still needs to be found, so please take the changed time as a preliminary one.

Tue 09.05.2023 10:16

The maintenance will take place on Monday 15.05.2023 from 13:00 to 14:00.

Thu 11.05.2023 07:26

Rechner-Cluster - Migration from CentOS 7 Linux to Rocky 8 Linux

Wednesday 08.03.2023 07:30 - Tuesday 02.05.2023 17:00

During the given period, the cluster will be migrated from CentOS 7 to Rocky 8. From week to week, more systems will be reinstalled with Rocky 8 and made available in batch mode. Due to this changeover, there may be longer waiting times in batch mode. More information can be found on the following page: https://help.itc.rwth-aachen.de/service/rhr4fjjutttf/article/c3735af4173543b9b14a3f645a553e8a/

Tue 28.03.2023 10:18

Rechner-Cluster - Incident JARDS online submission system

Friday 21.04.2023 09:20 - Friday 21.04.2023 10:00

Technical Explanation

Access was disrupted due to an unplanned restart of many virtual machines.

Fri 21.04.2023 10:59

Rechner-Cluster - Data transfer node copy18-2 unavailable

Thursday 13.04.2023 10:00 - Thursday 13.04.2023 12:10

Data transfer node copy18-2 will not be available on Thursday 13.04. between 10:00 and 15:00. Work is being carried out to improve network redundancy.

Thu 06.04.2023 09:47

Rechner-Cluster - Linux migration on some dialog systems

Wednesday 08.03.2023 07:00 - Wednesday 08.03.2023 12:00

In this maintenance we will switch the operating system from CentOS 7 to Rocky 8 on the following dialog systems: login18-2.hpc.itc.rwth-aachen.de login18-3.hpc.itc.rwth-aachen.de login18-x-2.hpc.itc.rwth-aachen.de login18-g-2.hpc.itc.rwth-aachen.de copy18-2.hpc.itc.rwth-aachen.de More background information concerning this change will be provided on the rz-cluster mailing list.

Wed 01.03.2023 08:15

Rechner-Cluster - Maintenance of copy18-2

Tuesday 07.03.2023 12:30 - Wednesday 08.03.2023 08:00

The system will not be available during the maintenance period. Please use copy18-1 instead.

Tue 07.03.2023 12:38

Rechner-Cluster - Maintenance of copy18-2, login18-2, login18-3 and login18-g-2

Monday 06.03.2023 06:00 - Tuesday 07.03.2023 13:00

The dialog systems copy18-2, login18-2, login18-3 and login18-g-2 will not be available during the maintenance period. Please use one of the other dialog systems instead.

Fri 03.03.2023 14:11

Rechner-Cluster - Maintenance of the HPC user management

Wednesday 22.02.2023 12:00 - Wednesday 22.02.2023 15:15

During the maintenance, no new passwords can be set for the HPC service. New HPC accounts will only actually be created once the maintenance is over.

Wed 22.02.2023 09:59

Updates

Unfortunately, the maintenance has to be extended.

Wed 22.02.2023 13:54

The maintenance is completed.

Wed 22.02.2023 15:12

Rechner-Cluster - No access to home directories via SMB

Monday 13.02.2023 08:45 - Monday 13.02.2023 12:30

Access to the home directories of the RWTH Compute Cluster from Windows clients is currently not possible due to maintenance.

Mon 13.02.2023 08:52

Updates

Access is working again.

Wed 15.02.2023 15:15

Rechner-Cluster - System Maintenance

Monday 13.02.2023 07:00 - Monday 13.02.2023 08:45

To improve the backup capability of the HOME filesystem, we need to restructure the home directories. During this time, login to the frontend nodes is disabled, no Slurm jobs are executed, and you cannot check the status of your jobs.

Mon 06.02.2023 09:17

Updates

The system maintenance is finished.

Mon 13.02.2023 08:53

Rechner-Cluster - Starting FastX web sessions on login18-x-1 fails

Thursday 02.02.2023 16:00 - Monday 13.02.2023 07:00

Currently, starting a web desktop session on http://login18-x-1.hpc.itc.rwth-aachen.de:3000/ may fail. Please log in on http://login18-x-2.hpc.itc.rwth-aachen.de:3000/ instead, or use the FastX desktop client; see https://help.itc.rwth-aachen.de/service/rhr4fjjutttf/article/25f576374f984c888bb2a01487fef193/

Thu 02.02.2023 16:16

Rechner-Cluster - JARDS maintenance

Thursday 09.02.2023 08:00 - Thursday 09.02.2023 09:00

The JARDS online submission system for NHR projects (NHR large, NHR normal, Prep) will not be available on 09.02.23 between 8 and 9 o'clock.

Wed 08.02.2023 10:24

Rechner-Cluster - JARDS maintenance

Monday 06.02.2023 07:00 - Monday 06.02.2023 09:00

The JARDS online submission system for NHR projects (NHR large, NHR normal, Prep) will not be available on 06.02.23 between 7 and 9 o'clock.

Tue 17.01.2023 13:44

Rechner-Cluster - JARDS maintenance

Monday 30.01.2023 07:00 - Monday 30.01.2023 09:00

Please note that the JARDS online submission system for filing applications for RWTH computing projects (rwth small, rwth thesis and rwth lecture) will not be available on 30.01.23 between 7 and 9 o'clock.

Mon 23.01.2023 07:51

Rechner-Cluster - JARDS maintenance

Thursday 26.01.2023 07:00 - Thursday 26.01.2023 08:00

Please note that the JARDS online submission system for filing applications for NHR projects (NHR large, NHR normal, Prep) and RWTH computing projects (rwth small, rwth thesis and rwth lecture) will not be available on 26.01.23 between 7 and 8 o'clock.

Tue 17.01.2023 13:41