refinement hangs after 2 iterations

sstagg
Hi all,

Sorry to trouble you with this. I have a refinement job that hangs about halfway through the third iteration. I'm using cisTEM on a 24-core machine that mounts a GPFS file server. I have ~400,000 particles with a box size of 384. I successfully ran 2D classification on these data, but when I use autorefine, it gets through iterations 1 and 2 with no problem and the reconstruction gets to 4.8 Å. Then, in the third iteration, most of the refine3d jobs seem to run to completion, but one just pauses. Here is a snippet of the STDOUT:

Number of global search views = 792 (best_parameters to keep = 20)

Average sigma noise = 13.497212, average LogP = 32.733292
Average ShiftX = 2.902153, average ShiftY = 1.301473
Sigma ShiftX = 16.399071, sigma ShiftY = 15.808596
Number of particles to refine = 14519

   100% [=================] done! (0h:49m31s)                      
     0% [                              ] 0h:40m17s                 
Number of global search views = 792 (best_parameters to keep = 20)

Average sigma noise = 13.497212, average LogP = 32.733292
Average ShiftX = 2.902153, average ShiftY = 1.301473
Sigma ShiftX = 16.399071, sigma ShiftY = 15.808596
Number of particles to refine = 14519

    33% [==========                    ] 8h:14m51s                   

Here is a list of the processes that are running:

$ ps -l -u sstagg
F S   UID    PID   PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
0 S   700  10111      1  0  80   0 - 38068 ep_pol pts/3    00:00:06 refine3d
0 S   700  19549 127345  0  80   0 - 21631 poll_s pts/3    00:00:00 ssh
0 S   700  22762 126765  0  80   0 - 240941 poll_s pts/4   00:03:13 cisTEM
0 S   700  31651  22762  0  80   0 - 49743 ep_pol pts/4    00:00:01 cisTEM_job_cont
0 S   700  31654      1  0  80   0 - 38068 ep_pol pts/4    00:00:07 refine3d
0 R   700  37134 126765  0  80   0 - 40340 -      pts/4    00:00:00 ps
5 S   700 126353 126349  0  80   0 - 45847 poll_s ?        00:07:02 sshd
0 S   700 126354 126353  0  80   0 - 34789 sigsus pts/4    00:00:00 tcsh
4 S   700 126765 126354  0  80   0 - 33167 sigsus pts/4    00:00:00 tcsh
5 S   700 127236 127234  0  80   0 - 45343 poll_s ?        00:11:37 sshd
0 S   700 127237 127236  0  80   0 - 34754 sigsus pts/3    00:00:00 tcsh
4 S   700 127345 127237  0  80   0 - 33166 sigsus pts/3    00:00:03 tcsh
0 S   700 136405      1  0  80   0 - 38068 ep_pol pts/3    00:00:07 refine3d
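
To pull just the `refine3d` rows out of a listing like this one, here is a minimal Python sketch. The function names `parse_ps_long` and `summarize` are made up for illustration, and the sample rows are copied from the listing above; the sketch assumes the column layout of `ps -l` shown here.

```python
# Sample rows copied from the `ps -l` listing in this post (header plus a few rows).
PS_OUTPUT = """\
F S   UID    PID   PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
0 S   700  10111      1  0  80   0 - 38068 ep_pol pts/3    00:00:06 refine3d
0 S   700  22762 126765  0  80   0 - 240941 poll_s pts/4   00:03:13 cisTEM
0 S   700  31651  22762  0  80   0 - 49743 ep_pol pts/4    00:00:01 cisTEM_job_cont
0 S   700  31654      1  0  80   0 - 38068 ep_pol pts/4    00:00:07 refine3d
0 S   700 136405      1  0  80   0 - 38068 ep_pol pts/3    00:00:07 refine3d
"""


def parse_ps_long(text):
    """Turn `ps -l` text into a list of dicts keyed by the header names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        # Limit the split so a CMD containing spaces stays in one field.
        fields = ln.split(None, len(header) - 1)
        rows.append(dict(zip(header, fields)))
    return rows


def summarize(rows, command="refine3d"):
    """Return (PID, PPID, state, wait channel, CPU time) for one command name."""
    return [
        (int(r["PID"]), int(r["PPID"]), r["S"], r["WCHAN"], r["TIME"])
        for r in rows
        if r["CMD"].strip() == command
    ]


if __name__ == "__main__":
    for pid, ppid, state, wchan, cpu in summarize(parse_ps_long(PS_OUTPUT)):
        print(f"PID {pid}: parent={ppid} state={state} wchan={wchan} cpu={cpu}")
```

On the rows above, this reports three `refine3d` processes, all in state `S` (sleeping), all with parent PID 1, and each with only 6 to 7 seconds of accumulated CPU time.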

I have tried switching between a Lustre partition and a GPFS partition, and it makes no difference. I tried renormalizing the particles in Relion to see if a particular particle had something weird with its values, but that made no difference either. I'm afraid I'm stuck. Do y'all have any insights?

 

Thanks,

Scott

timgrant
Hi Scott,

Can you try turning off automasking, and see if that fixes the issue?

Cheers,

Tim

sstagg
OK. I did that and it got stuck in the same spot.

timgrant
Hi Scott,

Hmm, and it is always more or less in the same place, with 1 left running?  How much memory does the machine have?  Are you running 49 processes?

Tim

sstagg
Yes, always more or less in the same place, with 1 (I think) running. The cisTEM setup says 25 copies, but the machine has 24 cores. The machine has 256 GB of memory.
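
As a rough sanity check on the memory question, here is a small Python sketch of the arithmetic. It assumes each process holds at least one single-precision (4-byte) volume the size of the box; cisTEM's actual per-process memory use is not stated in this thread, so treat the result as a lower bound only. The function names are made up for illustration.

```python
def volume_bytes(box_size, bytes_per_voxel=4):
    """Bytes needed for one cubic volume (assumption: 4-byte floats)."""
    return box_size ** 3 * bytes_per_voxel


def total_gib(box_size, n_processes, volumes_per_process=1):
    """Total GiB if every process holds `volumes_per_process` such volumes."""
    total = volume_bytes(box_size) * volumes_per_process * n_processes
    return total / 2 ** 30


if __name__ == "__main__":
    box = 384        # box size from the original post
    copies = 25      # number of copies from the cisTEM setup
    print(f"one volume: {volume_bytes(box) / 2 ** 20:.0f} MiB")
    print(f"{copies} copies: {total_gib(box, copies):.2f} GiB")
```

Under that assumption, one 384-voxel cube is 216 MiB and 25 copies come to about 5.27 GiB, well under 256 GB. Real usage will be higher, since each process also needs working buffers and particle images.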

timgrant
Hmm, I cannot think what is causing this. Would sharing the refinement package with me be an option, so that I can try to get to the bottom of it?

Thanks,

Tim

sstagg
I would be OK with that.

sstagg
How should I get it to you?
