Hi,
First of all, thank you so much for releasing cisTEM; it looks very promising.
I am trying to set cisTEM up with our queuing system, SLURM. In the documentation (https://cistem.org/documentation#tab-1-15) there is some information on how to do this.
However, the described approach seems to submit every single process as an individual SLURM job, so multiple nodes are fired up to run only a single process each.
This seems suboptimal, and on top of that it adds significant load on the SLURM head node.
To make better use of resources, one could create a special partition in SLURM (e.g. cisTEM) that enables job sharing across all CPU threads (e.g. "SHARED=YES:72" if each machine in the partition has 72 CPU threads). The cisTEM job command would then look something like this: "srun -n 1 --share -p cisTEM -o /dev/null /cisTEM_bin_directory/$command".
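A minimal sketch of the per-copy setup described above, assuming a partition named `cisTEM` already exists and that cisTEM substitutes the program name plus its arguments for `$command`. The `slurm.conf` line in the comment, the node names, and the helper names `cistem_copy_cmd`/`cistem_copy_cmds` are hypothetical. Note that newer SLURM releases call the partition option `OverSubscribe` instead of `Shared`, and the `srun` flag `--oversubscribe` instead of `--share`.

```shell
#!/bin/sh
# Assumed partition definition in slurm.conf (node names are hypothetical):
#   PartitionName=cisTEM Nodes=node[01-04] Shared=YES:72
#
# CISTEM_BIN is a hypothetical variable pointing at the cisTEM binary directory.
CISTEM_BIN="${CISTEM_BIN:-/cisTEM_bin_directory}"

# Print the srun line for ONE copy of a cisTEM program.
# "$*" stands in for whatever cisTEM substitutes for $command.
cistem_copy_cmd() {
    printf 'srun -n 1 --share -p cisTEM -o /dev/null %s/%s\n' "$CISTEM_BIN" "$*"
}

# Starting N copies this way means N separate srun calls, i.e. N queue entries.
cistem_copy_cmds() {
    n="$1"; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        cistem_copy_cmd "$@"
        i=$((i + 1))
    done
}
```

For example, `cistem_copy_cmds 72 refine3d` prints 72 separate one-task srun lines, which is exactly the queue noise the post is concerned about.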
This would work to some degree, but it would still generate a lot of noise in the queue and a lot of cross-talk if other cisTEM SLURM jobs are started at the same time.
At the moment we run cisTEM through "srun.x11", which lets us run GUI jobs on an allocated SLURM node. The only downside is that users need to close cisTEM and exit the terminal for the SLURM allocation to terminate, so a reasonable wall time needs to be set.
I know cisTEM is not OpenMPI-compatible, but it would be nice if it were possible to submit a single srun/sbatch job to one node that uses all the threads on that node.
In other words, "No. Copies #" would be replaced by "-n #". Does that make sense?
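srun's `-n` option already starts that many tasks of the same command within one job step, and `-N 1` restricts the step to a single node, so the idea might look roughly like the hypothetical sketch below. `single_node_cmd` and `CISTEM_BIN` are made-up names, and it is an untested assumption that cisTEM workers would tolerate being launched as identical tasks of one step rather than by separate run-profile calls.

```shell
#!/bin/sh
# Hypothetical: ONE srun call that starts n_copies tasks on a single node,
# i.e. "No. Copies" mapped onto srun's -n option.
CISTEM_BIN="${CISTEM_BIN:-/cisTEM_bin_directory}"

single_node_cmd() {
    n_copies="$1"; shift
    # -N 1 keeps every task on one node; -n sets how many copies are started.
    printf 'srun -N 1 -n %s -p cisTEM -o /dev/null %s/%s\n' "$n_copies" "$CISTEM_BIN" "$*"
}
```

Compared with one srun call per copy, this produces a single queue entry per node; each task in the step receives its own `SLURM_PROCID`, which is how SLURM distinguishes the copies.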
I was hoping that anyone running cisTEM on SLURM could comment on my thoughts, especially if you have a better solution for running cisTEM through SLURM.
Cheers,
Jesper
Hi Jesper,
I don't know if you've seen it, but there is some information from another user (Craig Yoshioka) with a SLURM cluster here:
https://cistem.org/frequently-asked-questions#tab-1-3
In a future version we will incorporate a more flexible run profile system, which will allow you to specify the number of jobs that each run profile contributes, and add a $number_of_jobs variable. This way, you could even use mpirun to launch the jobs.
Cheers,
Tim
Thank you so much Tim,
This works excellently.
Sorry, I did not see the FAQ section on SLURM.
Cheers,
Jesper