Hi all,
Sorry to trouble you with this. I have a refinement job that hangs about halfway through the third iteration. I'm using cisTEM on a 24-core machine that mounts a GPFS file server. I have ~400,000 particles with a box size of 384. I successfully ran 2D classification on these data, but when I use autorefine, it gets through iterations 1 and 2 with no problem and the reconstruction reaches 4.8 Å. Then, in the third iteration, most of the refine3d jobs seem to run to completion but one just pauses. Here is a snippet of the STDOUT:
Number of global search views = 792 (best_parameters to keep = 20)
Average sigma noise = 13.497212, average LogP = 32.733292
Average ShiftX = 2.902153, average ShiftY = 1.301473
Sigma ShiftX = 16.399071, sigma ShiftY = 15.808596
Number of particles to refine = 14519
100% [=================] done! (0h:49m31s)
0% [ ] 0h:40m17s
Number of global search views = 792 (best_parameters to keep = 20)
Average sigma noise = 13.497212, average LogP = 32.733292
Average ShiftX = 2.902153, average ShiftY = 1.301473
Sigma ShiftX = 16.399071, sigma ShiftY = 15.808596
Number of particles to refine = 14519
33% [========== ] 8h:14m51s
Here is a list of the processes that are running:
$ ps -l -u sstagg
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 S 700 10111 1 0 80 0 - 38068 ep_pol pts/3 00:00:06 refine3d
0 S 700 19549 127345 0 80 0 - 21631 poll_s pts/3 00:00:00 ssh
0 S 700 22762 126765 0 80 0 - 240941 poll_s pts/4 00:03:13 cisTEM
0 S 700 31651 22762 0 80 0 - 49743 ep_pol pts/4 00:00:01 cisTEM_job_cont
0 S 700 31654 1 0 80 0 - 38068 ep_pol pts/4 00:00:07 refine3d
0 R 700 37134 126765 0 80 0 - 40340 - pts/4 00:00:00 ps
5 S 700 126353 126349 0 80 0 - 45847 poll_s ? 00:07:02 sshd
0 S 700 126354 126353 0 80 0 - 34789 sigsus pts/4 00:00:00 tcsh
4 S 700 126765 126354 0 80 0 - 33167 sigsus pts/4 00:00:00 tcsh
5 S 700 127236 127234 0 80 0 - 45343 poll_s ? 00:11:37 sshd
0 S 700 127237 127236 0 80 0 - 34754 sigsus pts/3 00:00:00 tcsh
4 S 700 127345 127237 0 80 0 - 33166 sigsus pts/3 00:00:03 tcsh
0 S 700 136405 1 0 80 0 - 38068 ep_pol pts/3 00:00:07 refine3d
I have tried switching between a Lustre partition and a GPFS partition, and it makes no difference. I also tried renormalizing the particles in Relion in case a particular particle had something weird in its values, but that made no difference either. I'm afraid I'm stuck. Do y'all have any insights?
Thanks,
Scott
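For anyone who wants to check a similar hang, the process listing above can be filtered down to just the refine3d workers with a few lines of Python. This is only a sketch: the function names `parse_ps_long` and `find_workers` are made up for illustration, and the sample text is a subset of the rows copied from the `ps -l` output in the post.

```python
def parse_ps_long(ps_text):
    """Parse `ps -l` output into a list of dicts keyed by the header names."""
    lines = [ln for ln in ps_text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # CMD is the last column and may contain spaces, so cap the split.
        fields = line.split(None, len(header) - 1)
        if len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


def find_workers(ps_text, command="refine3d"):
    """Return the rows whose CMD column matches `command` exactly."""
    return [row for row in parse_ps_long(ps_text) if row["CMD"] == command]


# Subset of the listing from the post (hypothetical variable name).
PS_OUTPUT = """
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 S 700 10111 1 0 80 0 - 38068 ep_pol pts/3 00:00:06 refine3d
0 S 700 22762 126765 0 80 0 - 240941 poll_s pts/4 00:03:13 cisTEM
0 S 700 31651 22762 0 80 0 - 49743 ep_pol pts/4 00:00:01 cisTEM_job_cont
0 S 700 31654 1 0 80 0 - 38068 ep_pol pts/4 00:00:07 refine3d
0 R 700 37134 126765 0 80 0 - 40340 - pts/4 00:00:00 ps
0 S 700 136405 1 0 80 0 - 38068 ep_pol pts/3 00:00:07 refine3d
"""

for row in find_workers(PS_OUTPUT):
    print(row["PID"], row["PPID"], row["S"], row["WCHAN"], row["TIME"])
```

On the listing in the post this picks out three refine3d processes (PIDs 10111, 31654 and 136405). All three are in state S with PPID 1, a wait channel of ep_pol, and only a few seconds of CPU time. That pattern looks more like workers sleeping while they wait for something than workers still computing, though the listing alone cannot say what they are waiting for.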
Hi Scott,
Can you try turning off automasking, and see if that fixes the issue?
Cheers,
Tim
OK. I did that and it got stuck in the same spot.
Hi Scott,
Hmm, and it is always more or less in the same place, with 1 left running? How much memory does the machine have? Are you running 49 processes?
Tim
Yes, always more or less in the same place, with 1 (I think) still running. The cisTEM setup says 25 copies, but the machine has 24 cores. The machine has 256 GB of memory.
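To put the numbers above in context, here is a rough back-of-the-envelope calculation. It rests on assumptions the thread does not confirm: that particles are stored as 32-bit floats (4 bytes per pixel), that "256G" means 256 GiB, and that memory is split evenly across the 25 copies. The helper names are hypothetical.

```python
GIB = 2**30  # bytes per GiB


def stack_size_bytes(n_particles, box_size, bytes_per_pixel=4):
    """Size of a square-boxed particle stack; 4 bytes/pixel is an assumption."""
    return n_particles * box_size**2 * bytes_per_pixel


def memory_per_process_gib(total_gib, n_processes):
    """Even share of RAM per process; real usage will not be this uniform."""
    return total_gib / n_processes


stack_gib = stack_size_bytes(400_000, 384) / GIB
share_gib = memory_per_process_gib(256, 25)

print(f"particle stack: {stack_gib:.1f} GiB")
print(f"memory per process: {share_gib:.2f} GiB")
```

Under those assumptions the full stack comes to roughly 220 GiB and each of the 25 copies has about 10 GiB available. Whether refine3d ever holds a large fraction of the stack in memory is not something this thread establishes, so treat this as a sanity check on scale and not as a diagnosis.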
Hmm, I cannot think what is causing this. Is sharing the refinement package with me an option - so that I can try and get to the bottom of it?
Thanks,
Tim
I would be OK with that.
How should I get it to you?