cistem on cluster (slurm, sbatch)

JBarandun

Dear all, I am trying to run cistem on our HPC. These are my settings in the run profile:

Manager Command:

ssh -f login01.hpc.rockefeller.internal 'nohup /rugpfs/fs0/ruit/scratch/sbgrid/programs/x86_64-linux/cistem/1.0.0-beta/bin/$command'

Gui Address & Controller Address: Automatic

Command:

sbatch --export=c='$command' /store01/home/jbarandun/cistem/slurm.sh

No. Copies: 2, Delay 100

slurm.sh:

#!/bin/bash
##
## specify queue
##SBATCH -p normal
## run time
#SBATCH -t 202:00:00
## number of nodes
#SBATCH -N 3
## number of cores
#SBATCH -n 72
## error and output files
#SBATCH -o cistem.out
#SBATCH -e cistem.err
## Job Name
#SBATCH -J cistem
$c

The jobs are submitted to the scheduler, run for about 30 seconds, and then crash without an error message. This is the output from cistem:

Res. limit for class #0 = 20.00
Res. limit for class #1 = 20.00
Res. limit for class #2 = 20.00
Launching Job...
(ssh -f login01.hpc.rockefeller.internal 'nohup /rugpfs/fs0/ruit/scratch/sbgrid/programs/x86_64-linux/cistem/1.0.0-beta/bin/cisTEM_job_control xx.xx.xx.xx,xx.xx.xx.xx,xx.xx.xx.xx 3004 6433352023334462')
Job Control : Executing 'sbatch --export=c='refine3d xx.xx.xx.xx,xx.xx.xx.xx,xx.xx.xx.xx 3005 6433352023334462' /store01/home/jbarandun/cistem/slurm.sh&' 2 times.

cistem.out is empty, cistem.err just contains one line:

Usage: refine3d [controller_address] [controller_port] [job_code]

I have already tried different numbers of nodes and different numbers of copies.

Any idea what the problem could be? 

Thanks a lot for the help,

Best

Jonas

 

Mon, 07/30/2018 - 20:20

timgrant

In the GUI, does it actually print xx.xx.xx for the addresses? That would imply that it cannot find the IP address of the machine.

Tim
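
One further possibility, offered as an assumption rather than a confirmed diagnosis: sbatch's --export option takes a comma-separated list of variables, so the commas between the controller addresses may cut the value of c short. In that case refine3d would receive one argument instead of three, which would match the usage message in cistem.err. The sketch below only mimics that splitting in plain bash. It does not call sbatch, and the addresses are made-up placeholders for the masked ones in the log.

```shell
#!/bin/bash
# Placeholder addresses (hypothetical); the real ones appear as xx.xx.xx.xx in the log.
export_arg='c=refine3d 10.0.0.1,10.0.0.2,10.0.0.3 3005 6433352023334462'

# Assumption: the --export list is split on every comma.
IFS=',' read -r -a parts <<< "$export_arg"

# Only the first piece is still an assignment to c.
c="${parts[0]#c=}"
echo "c would be: $c"

# Count the arguments refine3d would receive (its usage message lists three).
set -- $c
shift
nargs=$#
echo "arguments after the program name: $nargs"
```

If this is the cause, one thing to try would be passing the command as script arguments instead (sbatch /store01/home/jbarandun/cistem/slurm.sh $command in the run profile, with "$@" in place of $c in slurm.sh), since sbatch forwards anything after the script name to the script. This has not been tested with cistem.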