Dear all, I am trying to run cisTEM on our HPC cluster. These are my settings in the run profile:
Manager Command:
ssh -f login01.hpc.rockefeller.internal 'nohup /rugpfs/fs0/ruit/scratch/sbgrid/programs/x86_64-linux/cistem/1.0.0-beta/bin/$command'
Gui Address & Controller Address: Automatic
Command:
sbatch --export=c='$command' /store01/home/jbarandun/cistem/slurm.sh
No. Copies: 2, Delay 100
slurm.sh:
#!/bin/bash
##
## specify queue
##SBATCH -p normal
## run time
#SBATCH -t 202:00:00
## number of nodes
#SBATCH -N 3
## number of cores
#SBATCH -n 72
## error and output files
#SBATCH -o cistem.out
#SBATCH -e cistem.err
## Job Name
#SBATCH -J cistem
$c
The jobs are submitted to the scheduler, run for about 30 s, then crash without an error message. This is the output from cisTEM:
Res. limit for class #0 = 20.00
Res. limit for class #1 = 20.00
Res. limit for class #2 = 20.00
Launching Job...
(ssh -f login01.hpc.rockefeller.internal 'nohup /rugpfs/fs0/ruit/scratch/sbgrid/programs/x86_64-linux/cistem/1.0.0-beta/bin/cisTEM_job_control xx.xx.xx.xx,xx.xx.xx.xx,xx.xx.xx.xx 3004 6433352023334462')
Job Control : Executing 'sbatch --export=c='refine3d xx.xx.xx.xx,xx.xx.xx.xx,xx.xx.xx.xx 3005 6433352023334462' /store01/home/jbarandun/cistem/slurm.sh&' 2 times.
cistem.out is empty; cistem.err contains just one line:
Usage: refine3d [controller_address] [controller_port] [job_code]
I have already tried different numbers of nodes and copies.
Any idea what the problem could be?
Thanks a lot for the help,
Best
Jonas
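A minimal debugging sketch, assuming the job script receives the command in the environment variable c as in slurm.sh above: printing the variable and its argument count just before the `$c` line would show whether refine3d is actually getting all three arguments. The function name log_command is hypothetical, and the sample call uses placeholder addresses rather than real values.

```shell
#!/bin/bash
# Hypothetical helper: print a command string and how many arguments follow
# the program name. In the job script it could be called as: log_command "$c"
log_command() {
    echo "received command: $1"
    # Word-split the string the same way the unquoted $c line would.
    set -f            # disable globbing while splitting
    set -- $1
    set +f
    if [ "$#" -gt 0 ]; then
        echo "argument count: $(( $# - 1 ))"
    else
        echo "argument count: 0"
    fi
}

# Sample call with placeholder addresses, port, and job code.
log_command "refine3d 10.0.0.1,10.0.0.2,10.0.0.3 3005 1234"
```

With the output going to cistem.out, an argument count below 3 would match the usage message in cistem.err.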
In the GUI, does it print xx.xx.xx for the addresses? That would imply that it cannot find the IP address of the machine.
Tim
No, I blanked out the IP addresses at the request of our IT department. Sorry for not mentioning this.
Hmm, the error suggests that the program is not being run with all 3 arguments.
I am not very familiar with running Slurm jobs. Have you seen the following pages:
https://cistem.org/frequently-asked-questions#tab-1-3
and
https://cistem.org/documentation#tab-1-15
Tim
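One possible explanation, offered as an assumption rather than a confirmed diagnosis: sbatch's --export option takes a comma-separated list of variables, so the commas inside the address list may end the value of c early and leave refine3d with a single argument. The sketch below only imitates that splitting in plain shell (it does not call sbatch). The helper name first_export_field and all addresses, ports, and job codes are placeholders.

```shell
#!/bin/bash
# Hypothetical helper: return the text before the first comma, imitating how
# a comma-separated NAME=value list would be cut into separate entries.
first_export_field() {
    printf '%s\n' "${1%%,*}"
}

# Placeholder addresses, port, and job code (not real values).
export_arg="c=refine3d 10.0.0.1,10.0.0.2,10.0.0.3 3005 1234"

# Keep only the first entry, then strip the leading "c=" to get the value.
entry=$(first_export_field "$export_arg")
value=${entry#c=}

echo "c would be set to: $value"
```

If this is the cause, a possible workaround (untested here) is to pass the command as script arguments instead of through --export, for example `sbatch /store01/home/jbarandun/cistem/slurm.sh $command` in the run profile, with `"$@"` in place of `$c` at the end of slurm.sh.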