A brief maintenance of the RegApp will take place during the stated time. During this period, logging in to the RegApp is not possible.
We are migrating the Slurm controller to a new host. Short timeouts may occur. We will try to keep them to a minimum.
The first attempt was not successful; we are back on the old master. We are analyzing the problems that occurred and will try again later.
We are making another attempt.
There may currently be login problems with various login nodes. We are working on a solution.
The observed issues affect the batch service as well. Consequently, many batch jobs may have failed.
The observed problems can be traced back to power issues. We cannot rule out that further systems may have to be temporarily shut down in a controlled manner in order to resolve the problem. However, we hope the issues can be resolved without any additional measures.
The cluster can be accessed again; cf. ticket https://maintenance.itc.rwth-aachen.de/ticket/status/messages/14/show_ticket/9521 for further details. Several nodes, however, are still unavailable as a consequence of the aforementioned issues. We are currently working on resolving them.
Due to InfiniBand problems that are still being analyzed, many nodes, including the whole GPU cluster, are not reachable at the moment. We are working with the manufacturer to solve the problems.
The problem has been fixed.