Compute Cluster

You can find more information about the service in our documentation portal.

Batch System Controller Unavailable

Partial Outage
Wednesday 12/10/2025 04:30 PM - Wednesday 12/10/2025 06:40 PM

We are experiencing issues with the Slurm batch controller, which cause all Slurm commands (sbatch, squeue, etc.) to time out. Our team is investigating and working on a quick solution. Until then, job submission will not be possible. Running jobs may continue as usual but can exit with a failure if their connection to the controller fails. Results may still be valid.

10.12.2025 17:00
Updates
We have identified the problem and are still working on a solution. For the time being, job submission is disabled to keep the controller in a stable state for all running jobs.
10.12.2025 17:45
The batch system is fully operational again. Please review all of your jobs that ran during the time frame above and resubmit them in case of failures. We apologize for the inconvenience.
10.12.2025 19:16
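
To review the jobs that ran during the time frame above, something like the following sketch may help. It assumes Python 3 and the standard `sacct` command on a login node. The helper names (`sacct_command`, `failed_jobs`, `report`) and the set of states treated as unproblematic are illustrative assumptions, not part of the cluster's tooling. The time window is taken from this notice.

```python
import subprocess

# Incident window from the notice above, in a time format accepted by sacct.
WINDOW_START = "2025-12-10T16:30:00"
WINDOW_END = "2025-12-10T18:40:00"


def sacct_command(start=WINDOW_START, end=WINDOW_END):
    """Build a sacct call listing your own jobs that overlap the window."""
    return [
        "sacct",
        "--starttime", start,
        "--endtime", end,
        "--allocations",   # one line per job, no individual job steps
        "--noheader",
        "--parsable2",     # '|'-separated fields, no trailing delimiter
        "--format", "JobID,JobName,State,ExitCode",
    ]


def failed_jobs(sacct_output):
    """Return (job_id, name, state, exit_code) for jobs that need a closer look."""
    # Assumption: these states need no action; everything else is reported.
    ok_states = {"COMPLETED", "RUNNING", "PENDING"}
    failed = []
    for line in sacct_output.splitlines():
        if not line.strip():
            continue
        # Split from both ends so a '|' inside the job name does not break parsing.
        job_id, rest = line.split("|", 1)
        name, state, exit_code = rest.rsplit("|", 2)
        # The state can carry a suffix, e.g. "CANCELLED by 1234".
        if state.split()[0] not in ok_states:
            failed.append((job_id, name, state, exit_code))
    return failed


def report():
    """Run sacct and print the jobs that did not finish cleanly."""
    result = subprocess.run(
        sacct_command(), capture_output=True, text=True, check=True
    )
    for job in failed_jobs(result.stdout):
        print(*job)
```

Calling `report()` on a login node prints one line per affected job. A job listed as COMPLETED may still deserve a look at its output files, since the notice states only that results might still be valid.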

Maintenance of login23-g-1 due to GPU errors

Partial Maintenance
Wednesday 12/10/2025 08:00 AM - Wednesday 12/10/2025 03:25 PM

Due to repeated issues with the GPUs on the GPU dialog node login23-g-1, the node will undergo short-notice maintenance on 2025-12-10 and cannot be used until further notice.
A technician will perform an on-site hardware diagnosis and component modifications which require the node to be shut down.

Please consider using an interactive GPU job or JupyterHub with the interactive node n23i0001 for short GPU computations.

01.12.2025 09:50
Updates
The maintenance tasks are finished for today.
10.12.2025 15:30

Slurm downtime

Partial Maintenance
Monday 12/01/2025 07:00 AM - Monday 12/01/2025 09:09 AM

On Monday morning we will restart our Slurm database. Slurm commands will be unavailable during the planned downtime.
Job submission will not be possible during this time.

28.11.2025 13:12
Updates
The work is complete. Any jobs that failed to submit should be resubmitted.
01.12.2025 09:10