Job does not terminate/continue after convergence in an image
Posted: Sat May 12, 2012 4:15 pm
by ckande
I wonder if someone has experienced such a problem.
My job neither terminates nor continues after one image has reached convergence. Here are the brief specifications of what I am using.
VTST: Latest available on the website
VASP: 5.2.12
OpenMPI: 1.5.4
Queuing: PBS Torque
Compiler: Ifort, 12.1.1.256 Build 20111011
When I submit a job with, say, 3 images to the queueing system and 01/stdout (../vasp.out) shows that it has reached convergence, the whole job just seems to freeze. Everything in 02 and 03 stops just after the electronic convergence and just before the F= ... E0= ... line is written to stdout. The job does not get killed, and it does not go on to full convergence on all images. I have to kill the job manually, move the CONTCARs to POSCARs, and just hope that all images reach convergence at the same time. As you can see, that is a big pain. I wonder if someone has experienced this before and if there is something that I am doing wrong.
Thanks in advance.
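For anyone stuck with the same manual restart, the "move the CONTCARs to POSCARs" step can be scripted. This is only a minimal sketch, assuming the usual NEB layout of two-digit image directories (00, 01, 02, ...) under the run directory. The function name and the POSCAR.bak backup file are made up for this example and are not part of VASP or the VTST scripts.

```python
import shutil
from pathlib import Path


def restart_from_contcars(run_dir="."):
    """Copy CONTCAR over POSCAR in each two-digit image directory.

    Returns the names of the directories that were updated.
    """
    updated = []
    for image_dir in sorted(Path(run_dir).glob("[0-9][0-9]")):
        if not image_dir.is_dir():
            continue
        contcar = image_dir / "CONTCAR"
        poscar = image_dir / "POSCAR"
        # Skip directories without a CONTCAR (typically the fixed endpoints)
        # and images whose CONTCAR is empty because no ionic step finished.
        if not contcar.is_file() or contcar.stat().st_size == 0:
            continue
        if poscar.is_file():
            # Hypothetical backup name, so the old geometry is not lost.
            shutil.copy2(poscar, image_dir / "POSCAR.bak")
        shutil.copy2(contcar, poscar)
        updated.append(image_dir.name)
    return updated
```

Calling restart_from_contcars(".") from the run directory after killing the job returns the list of images that were updated, so it is easy to see whether any image was skipped.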
Re: Job does not terminate/continue after convergence in a i
Posted: Sat May 12, 2012 8:29 pm
by graeme
All images must reach your specified electronic convergence criterion (EDIFF) in order for them to take an ionic step. If some images take longer, all the others must wait. If this is the issue that you are seeing, it is normal. If I misunderstood, let me know.
Re: Job does not terminate/continue after convergence in a i
Posted: Sat May 12, 2012 9:00 pm
by ckande
Dear Graeme,
Thanks for the quick replies.
You did misunderstand. After one image has reached convergence, nothing happens with the other images. The calculations for the other images are NOT going forward to the required convergence, which, as you say, would be the normal behavior.
I have a calculation running right now. I will try to post a detailed query later.
Regards.
Re: Job does not terminate/continue after convergence in a i
Posted: Sat May 12, 2012 9:43 pm
by graeme
Ok. Does the calculation stop when the climbing image reaches convergence? Otherwise, I am at a loss to explain this.
Re: Job does not terminate/continue after convergence in a i
Posted: Sun May 13, 2012 2:24 pm
by ckande
Please find attached the output from a hung job.
The job hung once image 05 reached convergence. There was no further output from any of the other images (01, 02, 03, 04, 06, 07), and the job did not terminate either. I had to kill the job myself.
I have also attached the time stamps of all the files. As you can see, once 05/stdout had reached convergence, images {01,02,03,04,06,07} completed the electronic convergence and then hung just before writing the line F= ...
Please let me know if you have any clues.
Thanks in advance.
[attachment=0]attach.txt[/attachment]
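The time-stamp comparison described above can be repeated on any run with a few lines of Python. This is a rough sketch, assuming each image directory has a two-digit name and writes to a file called stdout (an image whose output goes elsewhere, such as ../vasp.out, is simply skipped). The function name and its arguments are made up for this example.

```python
import time
from pathlib import Path


def stdout_ages(run_dir=".", filename="stdout", now=None):
    """Return {image_name: seconds since <image>/<filename> was last modified}."""
    now = time.time() if now is None else now
    ages = {}
    for image_dir in sorted(Path(run_dir).glob("[0-9][0-9]")):
        out = image_dir / filename
        # Images without this output file are left out of the result.
        if out.is_file():
            ages[image_dir.name] = now - out.stat().st_mtime
    return ages
```

If every image reports an age far longer than a typical ionic step takes while the job is still listed as running in the queue, that is consistent with the hang described in this thread. It is not proof of it, since a slow electronic step would look the same.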
Re: Job does not terminate/continue after convergence in a i
Posted: Wed May 16, 2012 8:57 am
by ckande
Just a small update. I observe similar behavior on another cluster with a completely different setup. On that cluster the compiler is pgf90, with VASP 5.2.12 and OpenMPI 1.4.4.
Maybe something is wrong with my parameters in the INCAR file. Please find the INCAR below. Please let me know if I am doing something wrong.
I will try to use VASP 4 and see if that will work. Will update later on that.
ENCUT = 400.000000
ENAUG = 645.000000
SIGMA = 0.050000
POTIM = 0.100000
EDIFF = 1.00e-05
PREC = Accurate
ISMEAR = 1
NSW = 120
IBRION = 2
LREAL = Auto
IMAGES = 4
SPRING = -5
LCLIMB = .TRUE.
LWAVE = .FALSE.
LCHARG = .FALSE.
NPAR = 2
Re: Job does not terminate/continue after convergence in a i
Posted: Thu May 24, 2012 12:57 am
by zhangqingfan
Hi,
I run into the same problem when I run NEB on the cluster. It looks like the problem only occurs with version vasp.5.2. Can someone solve this problem?
thx
Re: Job does not terminate/continue after convergence in a i
Posted: Fri May 25, 2012 1:03 am
by graeme
I don't know for sure, but I see that you are running the built-in CG optimizer (IBRION=2), which does not work with the NEB. Try running again with IBRION=3 or 1 (or any of our IOPT optimizers). I don't exactly know the logic of the built-in CG in VASP and how it interacts with the NEB, but an image may decide to quit on its own without consulting the other images.
Alternatively (and if the first suggestion doesn't work), you can send me a .tar.gz file of the calculation so I can try to reproduce the error here, or look more closely at the output.
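A quick pre-submission check for this particular combination might look like the sketch below. It assumes a simple INCAR made of TAG = value statements, with # or ! starting a comment and ; separating statements on one line. The two function names are made up for this example, and the check covers only the IBRION=2 plus IMAGES case discussed here, nothing else.

```python
def parse_incar(text):
    """Parse 'TAG = value' statements into a dict keyed by upper-case tag."""
    tags = {}
    for line in text.splitlines():
        # Drop trailing comments before looking for statements.
        line = line.split("#", 1)[0].split("!", 1)[0]
        for statement in line.split(";"):
            if "=" in statement:
                key, value = statement.split("=", 1)
                tags[key.strip().upper()] = value.strip()
    return tags


def neb_optimizer_warning(tags):
    """Return a warning if IBRION=2 is combined with IMAGES, else None."""
    ibrion = tags.get("IBRION", "").split()
    if "IMAGES" in tags and ibrion and ibrion[0] == "2":
        return ("IBRION=2 (built-in CG) is set together with IMAGES; "
                "try IBRION=1 or 3 instead.")
    return None
```

Running neb_optimizer_warning(parse_incar(open("INCAR").read())) on the INCAR posted earlier in the thread would return the warning, because that file sets both IMAGES = 4 and IBRION = 2.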
Re: Job does not terminate/continue after convergence in a i
Posted: Fri Jun 08, 2012 8:31 am
by ckande
Changing IBRION to 1 or 3 solves the problem. Thanks a lot Graeme. :)