Hi Axel,
The reason this step is slow is that cisTEM checks every single file to find out its dimensions and check a few basic things.
Are you importing TIFF files? With TIFF, this takes a long time because cisTEM has to read every slice to confirm that they all have the same dimensions (the TIFF format allows every image in the file to have its own properties), which means a lot of disk reads. CPU usage is very low because the program is just waiting for the disk.
With MRC files, there is less checking required, since it is sufficient to just read the header of the file to find out what we need. If you find that import is slow with MRC movies, you're probably just seeing your disk's limitations.
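To illustrate why the MRC case is cheap, here is a minimal Python sketch (not cisTEM's actual code) that gets the dimensions from the first 16 bytes of the file. In the MRC format, the header starts with four 32-bit integers: nx, ny, nz and the data mode. The function name `read_mrc_dimensions` is made up for this example, and the sketch assumes a little-endian file.

```python
import struct


def read_mrc_dimensions(path):
    """Return (nx, ny, nz, mode) read from the start of an MRC header.

    Hypothetical helper for illustration only. Assumes little-endian data.
    """
    with open(path, "rb") as f:
        raw = f.read(16)  # one small read, regardless of how big the movie is
    if len(raw) < 16:
        raise ValueError("file is too short to contain an MRC header")
    # Header words 1-4: columns, rows, sections, data mode (all int32)
    nx, ny, nz, mode = struct.unpack("<4i", raw)
    return nx, ny, nz, mode
```

For a movie, nz is the number of frames, so a single read at the start of the file is enough to learn both the frame size and the frame count.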
Perhaps in a future version of cisTEM we can just assume that all images in a TIFF file are the same, but even then I believe that we would need to traverse the file to find out how many images are present - this is something we developers should probably revisit.
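As a rough illustration of that traversal (again a sketch, not what cisTEM does internally): a classic TIFF file stores each image's metadata in an image file directory (IFD), and each IFD ends with the offset of the next one. The file has no single frame-count field, so counting images means hopping from IFD to IFD through the whole file. The function name `count_tiff_pages` is made up for this example, and BigTIFF files are not handled.

```python
import struct


def count_tiff_pages(path):
    """Count images in a classic TIFF by walking its chain of IFDs.

    Hypothetical helper for illustration only. BigTIFF is not supported.
    """
    with open(path, "rb") as f:
        header = f.read(8)
        if len(header) < 8:
            raise ValueError("file is too short to be a TIFF")
        if header[:2] == b"II":
            endian = "<"  # little-endian
        elif header[:2] == b"MM":
            endian = ">"  # big-endian
        else:
            raise ValueError("not a TIFF file")
        magic, offset = struct.unpack(endian + "HI", header[2:8])
        if magic != 42:
            raise ValueError("not a classic TIFF (BigTIFF is not handled)")

        pages = 0
        while offset != 0:
            # Each hop is a seek plus small reads somewhere in the file
            f.seek(offset)
            (n_entries,) = struct.unpack(endian + "H", f.read(2))
            f.seek(n_entries * 12, 1)  # skip the 12-byte directory entries
            (offset,) = struct.unpack(endian + "I", f.read(4))
            pages += 1
        return pages
```

Every iteration of the loop is a small read at a different position in the file, which is the kind of access pattern that tends to be slow on storage tuned for large sequential reads.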
In the meantime, the only thing you can do is work on a faster disk and/or make sure you only import things once.
Hope this helps, if only to understand the problem.
Cheers
Alexis
A disk that is optimized for large reads is a killer for this kind of workload. Our network disk has similar speeds to this; however, reading the same movies from a local SSD takes maybe 50 seconds.
Tim
Yep, they're tiff files.
Thanks for letting me know.
Axel