cistem on cluster (slurm, sbatch)

JBarandun
Dear all, I am trying to run cistem on our HPC. These are my settings in the run profile:

Manager Command:

ssh -f login01.hpc.rockefeller.internal 'nohup /rugpfs/fs0/ruit/scratch/sbgrid/programs/x86_64-linux/cistem/1.0.0-beta/bin/$command'

Gui Address & Controller Address: Automatic

Command:

sbatch --export=c='$command' /store01/home/jbarandun/cistem/slurm.sh

No. Copies: 2, Delay 100

slurm.sh:

#!/bin/bash
##
## specify queue
##SBATCH -p normal
## run time
#SBATCH -t 202:00:00
## number of nodes
#SBATCH -N 3
## number of cores
#SBATCH -n 72
## error and output files
#SBATCH -o cistem.out
#SBATCH -e cistem.err
## Job Name
#SBATCH -J cistem
$c

The jobs are submitted to the scheduler, run for about 30 s, then crash without an error message in the GUI. This is the output from cistem:

Res. limit for class #0 = 20.00
Res. limit for class #1 = 20.00
Res. limit for class #2 = 20.00
Launching Job...
(ssh -f login01.hpc.rockefeller.internal 'nohup /rugpfs/fs0/ruit/scratch/sbgrid/programs/x86_64-linux/cistem/1.0.0-beta/bin/cisTEM_job_control xx.xx.xx.xx,xx.xx.xx.xx,xx.xx.xx.xx 3004 6433352023334462')
Job Control : Executing 'sbatch --export=c='refine3d xx.xx.xx.xx,xx.xx.xx.xx,xx.xx.xx.xx 3005 6433352023334462' /store01/home/jbarandun/cistem/slurm.sh&' 2 times.

cistem.out is empty, cistem.err just contains one line:

Usage: refine3d [controller_address] [controller_port] [job_code]

I have already tried different numbers of nodes and different values for No. Copies.

Any idea what the problem could be? 

Thanks a lot for the help,

Best

Jonas

timgrant
In the GUI, does it print xx.xx.xx.xx for the addresses? That would imply that it cannot find the IP address of the machine.

Tim

JBarandun
No, I blanked out the IP addresses at the request of our IT department. Sorry for not mentioning this.

timgrant
Hmm, the error suggests that the program is not being run with all 3 arguments.
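One possible cause, offered as a guess rather than a confirmed diagnosis: sbatch reads the value of --export as a comma-separated list of variables, and the controller address argument is itself a comma-separated list of IPs. If that is what is happening, everything after the first comma would be cut off from $c. The snippet below only imitates that splitting in plain shell so you can see the effect without submitting a job. The addresses are made up, and the variable names cmd, export_list, first_item and c_value are just for illustration.

```shell
#!/bin/bash
# Hypothetical stand-in for the command cisTEM substitutes for $command.
# The IP addresses here are invented for the example.
cmd='refine3d 10.0.0.1,10.0.0.2,10.0.0.3 3005 6433352023334462'

# What is handed to sbatch after --export= in the run profile.
export_list="c=$cmd"

# Imitate a comma-separated parse: keep only the text before the first comma.
first_item=${export_list%%,*}

# Strip the leading "c=" to get the value the job script would see as $c.
c_value=${first_item#c=}
echo "c would be: $c_value"

# Count the words in that value (program name plus its arguments).
set -- $c_value
echo "words in c: $#"
```

With these made-up values, c comes out as "refine3d 10.0.0.1", which is the program name plus a single argument instead of three, and that would match the usage message in cistem.err. If this is the cause, one thing that might be worth trying (I have not tested it with cisTEM) is to pass the command as arguments after the script name, i.e. sbatch /store01/home/jbarandun/cistem/slurm.sh $command, and to end slurm.sh with "$@" in place of $c.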

I am not very familiar with running Slurm jobs. Have you seen the following pages?

https://cistem.org/frequently-asked-questions#tab-1-3

and

https://cistem.org/documentation#tab-1-15

Tim
