Unable to run jobs - processes exit with status 255 and no visible errors

bmcgough
I am using the pre-compiled binary bundle.

I am able to run the GUI, and some actions like displaying images after import.

When I try to run a movie align action, the job control starts up and I see this in my terminal:

Running...

 JOB CONTROL: Trying to connect to XXX.XXX.XXX.XXX:3000 (timeout = 4 sec) ...
 JOB CONTROL: Succeeded - Connection established!

We read 9196 bytes
Connection Timer Fired

It appears to then start 3 unblur processes (I set my process number to 4, so maybe the job controller is included in that count?).

No processes are ever shown as connected in the UI (0/4), and eventually everything stops. If I run the job controller and the unblur processes by hand, they output nothing and exit with code 255.
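For anyone repeating the by-hand check, here is a minimal Python sketch that runs a command and reports its exit status together with anything written to stdout or stderr. The helper name `run_and_report` is made up for illustration, and the command in the example is only a stand-in that exits with 255; substitute the launch line you are testing. On Linux, 255 is also what you see when a program returns -1, so it generally indicates a generic failure rather than one specific error.

```python
import subprocess
import sys


def run_and_report(cmd, timeout=30):
    """Run cmd (a list of strings) and return (exit_status, stdout, stderr)."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # Stand-in command that simply exits with 255; replace it with the
    # real job control or unblur command line you want to inspect.
    status, out, err = run_and_report(
        [sys.executable, "-c", "import sys; sys.exit(255)"]
    )
    print(f"exit status: {status}")
    print(f"stdout: {out!r}")
    print(f"stderr: {err!r}")
```

Capturing stderr separately makes it easier to confirm that the process really prints nothing before it exits.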

I am running this on an Ubuntu 14.04 server and displaying back to XQuartz on my laptop. Eventually I would like to be able to use slurm to run the jobs, but I'm just trying locally with default configuration (except number of processes).

Help?

Note about installing:

I initially tried with the source code bundle, as we use EasyBuild here. There was an existing EasyBuild easyconfig for cisTEM, and it compiles with no errors. It appears to be missing at least 'display' and perhaps other important pieces. I also notice a lot of forum posts that are solved by using the pre-compiled binary bundle. Perhaps the source code bundle should be noted as not intended for use?

The pre-compiled binary bundle appears to have a dependency on wxWidgets. I didn't see this stated anywhere. Usually when I use a pre-compiled binary, it has been statically linked so it has no system dependencies.

Thank you!

timgrant
Hi, 

wxWidgets is statically linked into the binaries, so you do not need it as a dependency. For the GUI program, you will need some X and GDK libraries.

This looks like one of two things is going on: either the unblur binary cannot be found, or the IP address was detected incorrectly. If you created the project using this version of cisTEM and never moved the binaries, then the run profile should be pointing to the correct place. Can you try explicitly specifying the IP address for the GUI and controller (click Specify in the top right of the run profiles panel, and don't forget to click Save)? If you are running locally, you should be able to set it to 127.0.0.1.
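Both possibilities can be checked outside the GUI. The following is a rough Python sketch, not part of cisTEM: `binary_is_runnable` and `can_connect` are hypothetical helper names, and the address and port are just the example values that appear in this thread.

```python
import os
import socket


def binary_is_runnable(path):
    """Return True if path is an existing regular file with execute permission."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def can_connect(host, port, timeout=4.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    # Example values only; use the address and port shown in your own log.
    print(can_connect("127.0.0.1", 3000))
```

The connection test only succeeds while something is actually listening, so it is most useful when run while a job is being launched.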

Thanks,

Tim

bmcgough
wxWidget load errors (separating two issues)

Thank you for replying. I'm splitting my two issues apart as they are separate and I should not have posted them together.

If I do not have access to the wxWidgets libraries, then when I try to run a CTF action, for example, I get the following error:

bin/cisTEM_job_control: error while loading shared libraries: libwx_baseu-3.1.so.0: cannot open shared object file: No such file or directory

This is with the precompiled binary package.

If I load a wxWidgets environment module along with the same cisTEM binaries, I do not get this error.
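A quick way to see which shared libraries a given binary fails to resolve is to run `ldd` on it and look for "not found" entries. Below is a small Python sketch of that check; the helper names are invented for illustration, and it assumes a Linux system where `ldd` is installed.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd output marks as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=>" in line and "not found" in line:
            # The library name is the text before the '=>' arrow.
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path):
    """Run ldd on path and return its list of unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libraries(result.stdout)


# Example call (the path is whatever binary your run profile launches):
#   check_binary("bin/cisTEM_job_control")
```

Running it once with and once without the environment module loaded shows whether the binary in question depends on a dynamically linked wxWidgets.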

timgrant
Hi,

Your run profile must be pointing to a version of cisTEM that you compiled, rather than the version from the web page. When you first create a project, it creates a default run profile that points to the cisTEM you are currently running. If you started the project with a version you compiled, then the profile will point there, and launched jobs will run that version even if you are now using the statically linked cisTEM GUI binary. If you edit your run profile to point to the static cisTEM, this error should go away.

Thanks,

Tim

bmcgough
Job control processes not running

Thank you again for replying. I hope you can help me figure out what is going on.

I specified 127.0.0.1 as both the GUI and controller address, since this is all running on one host (for now).

The cisTEM_job_control process starts successfully and appears to be listening on 127.0.0.1:3000. I get this message in the cisTEM action window:

Approx. memory for each process is  0.28 GB.
If running on a single machine, that machine will need at least  1.11 GB of memory.
If you do not have enough memory available you will have to use a run profile with fewer processes.
Launching Job...
(/app/easybuild/software/cisTEM/1.0.0-beta-foss-2016b/bin/cisTEM_job_control 127.0.0.1 3000 5752880042126038)
Job Control : Executing '/app/easybuild/software/cisTEM/1.0.0-beta-foss-2016b/bin/unblur 127.0.0.1 3001 5752880042126038&' 4 times.

Initially I see a cisTEM_job_control process listening on port 3001 with connections from the other unblur processes, then an unblur process is listening on port 3001, then everything shuts down except the main cisTEM process.

In the terminal I used to launch cisTEM, I see the same output as in my first message. Again, 0/4 processes connect, and I cannot find any other error messages.
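The launch lines in the action window show exactly which binaries the run profile starts, so they are worth comparing against the installation you intend to use. Here is a rough Python sketch of that comparison. The function names are hypothetical, the regular expression assumes the log format shown in this post, and the expected directory in the example call is a made-up placeholder.

```python
import os
import re


def launched_binaries(log_text):
    """Return absolute command paths that follow '(' or a single quote in log_text."""
    return re.findall(r"[('](/[^\s')]+)", log_text)


def outside_install(paths, expected_bin_dir):
    """Return the paths that are not located directly in expected_bin_dir."""
    expected = os.path.normpath(expected_bin_dir)
    return [p for p in paths if os.path.dirname(os.path.normpath(p)) != expected]


# Example call with a placeholder directory for the downloaded binaries:
#   paths = launched_binaries(action_window_text)
#   print(outside_install(paths, "/opt/cistem/bin"))
```

Any path that is reported points at a different installation than the expected one, which means the run profile commands need editing.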

timgrant
Hi,

Can you double-check that this happens with the static binary from the web page? You may need to change your run profile to point there, as I mentioned in my reply above. If it does still happen, are there any errors that appear in the terminal?

Cheers,

Tim

bmcgough
That was it!

Thank you! I should have deleted my project and started from complete scratch. I'll be able to advise our users to do that as well. I did have an old version I compiled and it was saved in both the job control command and job run command in the run profile. I was able to change it as you suggested and can now run jobs successfully.

Thanks again!
