Addition to Troubleshooting
Problem: A worker has status "draining" and jobs no longer start on it.
- Restart the worker (either a software reboot or a physical power cycle).
- Execute: sudo scontrol update nodename=<worker_id> state=idle
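The two steps above can be scripted roughly as follows. This is a minimal sketch, not an official tool: the `DRY_RUN` switch and the extra `sinfo -R` check (which prints the reason Slurm recorded for draining) are additions for illustration. It assumes `sinfo` and `scontrol` are on the PATH and that you have sudo rights. The restart itself is still done by hand beforehand.

```shell
#!/usr/bin/env bash
# Sketch: put a drained Slurm worker back into service after it has been restarted.
# Assumptions: sinfo/scontrol are on PATH and the caller has sudo rights.
# DRY_RUN=1 (default) only prints the commands; run with DRY_RUN=0 to execute them.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Show the reason Slurm recorded when it drained the node.
show_reason() {
  run sinfo -R --nodes="$1"
}

# Return the node to the idle state so it accepts jobs again.
undrain_node() {
  run sudo scontrol update "nodename=$1" state=idle
}

main() {
  local node="$1"
  show_reason "$node"
  # The restart (reboot or power cycle) is assumed to have been done manually already.
  undrain_node "$node"
}

# Only act when a node name is given, e.g. ./undrain.sh <worker_id>
if [[ $# -gt 0 ]]; then
  main "$@"
fi
```

Usage, assuming the sketch is saved as `undrain.sh` (a hypothetical name): `./undrain.sh <worker_id>` only prints the commands, and `DRY_RUN=0 ./undrain.sh <worker_id>` actually runs them. Dry run is the default so that a typo in the node name is visible before anything changes.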
Problem: A worker has status "draining" with the reason "SlurmSpoolDir is full".
- SSH into the worker.
- Execute: docker system prune -af
- Check what is in /var/spool/slurmd and remove whatever is no longer needed.
- Set the worker back to status idle (same scontrol command as above).
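A possible sketch of the same cleanup, meant to be run on the worker after SSH-ing onto it. Assumptions: Docker is installed and your user can reach the Docker daemon, the spool directory is /var/spool/slurmd, GNU `du` is available, you have sudo rights, and the `DRY_RUN`/`SPOOL_DIR` variables and the `clean`/`undrain` subcommands are hypothetical names chosen for illustration. It deliberately deletes nothing inside the spool directory, because deciding what can be removed there is a manual step.

```shell
#!/usr/bin/env bash
# Sketch: free space on a worker drained with reason "SlurmSpoolDir is full".
# Run on the worker itself. Assumptions: Docker is usable by the caller, the spool
# dir is /var/spool/slurmd, GNU du is available, and the caller has sudo rights.
# DRY_RUN=1 (default) only prints the commands; run with DRY_RUN=0 to execute them.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
SPOOL_DIR="${SPOOL_DIR:-/var/spool/slurmd}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Remove stopped containers, unused networks, all unused images and build cache.
prune_docker() {
  run docker system prune -af
}

# Show free space and per-entry usage so you can decide by hand what to delete.
inspect_spool() {
  run df -h "$SPOOL_DIR"
  run sudo du -h --max-depth=1 "$SPOOL_DIR"
}

# Return the node to the idle state once enough space has been freed.
undrain_node() {
  run sudo scontrol update "nodename=$1" state=idle
}

case "${1:-}" in
  clean)   prune_docker; inspect_spool ;;
  undrain) undrain_node "${2:?usage: $0 undrain <worker_id>}" ;;
  "")      ;;  # no arguments: only define the functions
  *)       echo "usage: $0 clean | undrain <worker_id>" >&2; exit 1 ;;
esac
```

Usage, assuming the sketch is saved as `spool-cleanup.sh`: run `DRY_RUN=0 ./spool-cleanup.sh clean`, review the `du` output and clean up the spool directory by hand, then run `DRY_RUN=0 ./spool-cleanup.sh undrain <worker_id>`. Keeping `clean` and `undrain` separate leaves room for the manual review step in between. Note that `-a` removes all unused images, not only dangling ones, so they will be pulled again by the next job that needs them.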