NEB Job Stuck at 84th Iteration Without Error - Assistance Needed
Moderator: moderators
-
- Posts: 20
- Joined: Thu Mar 14, 2024 3:26 am
NEB Job Stuck at 84th Iteration Without Error - Assistance Needed
Hello Dr. Henkelman,
I’m running NEB calculations using VASP for interstitial diffusion, but my job seems to be stuck at the 84th iteration. There are no error messages or issues in the output files, and the job is still running, but it’s not progressing. I am using 2 nodes with a total of 60 cores. Another job using the same resources with the climbing method ran perfectly fine, so I’m unsure why this one is stuck.
Additionally, another job for vacancy diffusion got stuck at the 869th iteration, with the climbing method turned on. The strange thing is, I previously studied vacancy diffusion with the same INCAR parameters and the same number of nodes and cores, and that job ran perfectly fine. However, something seems to be going wrong with this particular job.
Here are the relevant details from my INCAR file:
Global Parameters
ISTART = 0
ISPIN = 2 # Spin-polarized DFT
ICHARG = 2 # Self-consistent, GGA/LDA band structure
LREAL = A # Projection operators: automatic
ENCUT = 300 # Cut-off energy (eV)
PREC = Normal # Precision level
Electronic Relaxation
ISMEAR = 0 # Gaussian smearing
SIGMA = 0.05 # Smearing value (eV)
NELMIN = 5 # Minimum SCF steps
EDIFF = 1.0E-05 # SCF energy convergence (eV)
EDIFFG = -0.05 # Force convergence
Ionic Relaxation
NSW = 1800 # Max ionic steps
ISIF = 2 # Stress/relaxation
ICHAIN = 0 # NEB flag (VTST)
IMAGES = 5 # Number of intermediate images
SPRINGS = -5 # Spring constant
LCLIMB = .FALSE. # No climbing method
IOPT = 3 # Optimization flag
IBRION = 3 # NEB optimization method
POTIM = 0 # Step size
ALGO = Normal
NCORE = 10
Any insights on why the jobs are stuck or how to address this issue would be greatly appreciated!
Thanks in advance!
Re: NEB Job Stuck at 84th Iteration Without Error - Assistance Needed
If you post your calculation, I will take a look at it.
-
- Posts: 20
- Joined: Thu Mar 14, 2024 3:26 am
Re: NEB Job Stuck at 84th Iteration Without Error - Assistance Needed
Can I email it to you? I need to maintain the confidentiality of my data.
Thank you!
Re: NEB Job Stuck at 84th Iteration Without Error - Assistance Needed
You can get it to me in any way that works for you. If the files are too large for email, I'll provide an option.
-
- Posts: 20
- Joined: Thu Mar 14, 2024 3:26 am
Re: NEB Job Stuck at 84th Iteration Without Error - Assistance Needed
Thank you so much for the reply. I have already emailed it to you; however, I can also share it through the other option you provided.
-
- Posts: 20
- Joined: Thu Mar 14, 2024 3:26 am
Re: NEB Job Stuck at 84th Iteration Without Error - Assistance Needed
I can only upload one file through the link you provided (90086.tar.gz, Interstitial Diffusion), which is 456 MB in size. The other file is 4.26 GB, and I believe I cannot upload it because of the storage limit (maximum of 500 MB). However, I have already emailed that file (90232.tar.gz, Vacancy Diffusion) to you from my try0933@utulsa.edu account, shared via a OneDrive link.
Re: NEB Job Stuck at 84th Iteration Without Error - Assistance Needed
Just for others: you can remove all CHG* / WAV* files before uploading; these files are the largest and do not help with debugging NEB / dimer problems.
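For anyone who wants to script that cleanup, here is a minimal Python sketch. The function names and the dry-run default are my own choices for illustration, not part of any VTST tool; the only thing taken from the advice above is the CHG* / WAV* pattern. Run it on a copy of the calculation directory, since deleted files are not recoverable.

```python
from pathlib import Path


def find_large_vasp_files(root, patterns=("CHG*", "WAV*")):
    """Return the sorted files under root whose names match the glob patterns."""
    root = Path(root)
    matches = set()
    for pattern in patterns:
        # rglob searches root and every subdirectory (e.g. the image folders).
        matches.update(p for p in root.rglob(pattern) if p.is_file())
    return sorted(matches)


def strip_large_vasp_files(root, dry_run=True):
    """List the CHG*/WAV* files and their total size; delete them unless dry_run."""
    paths = find_large_vasp_files(root)
    total_bytes = sum(p.stat().st_size for p in paths)
    if not dry_run:
        for p in paths:
            p.unlink()
    return paths, total_bytes
```

Calling `strip_large_vasp_files("copy_of_run")` only reports what would go; pass `dry_run=False` once the list looks right. The directory name `copy_of_run` is a placeholder.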
-
- Posts: 20
- Joined: Thu Mar 14, 2024 3:26 am
Re: NEB Job Stuck at 84th Iteration Without Error - Assistance Needed
Just did it. I have uploaded the other one as well. Thank you so much for your help :)
Re: NEB Job Stuck at 84th Iteration Without Error - Assistance Needed
I don't see any problems with your calculations, but I also don't see the stdout (what it prints to the screen, or to a file that you specify in your submit script). My guess is that the calculation is running out of its allocated time and is being killed by the scheduler.
You can always continue these calculations. We have the 'vfin.pl subdir' which will move the important files to subdir and leave it ready to restart.
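If vfin.pl is not at hand, the manual route is to use each image's CONTCAR as its new POSCAR. The sketch below is not what vfin.pl does internally; it is a hedged illustration that assumes the usual NEB layout of two-digit image directories (00, 01, ...) and keeps the old POSCAR under the hypothetical name POSCAR.prev.

```python
import shutil
from pathlib import Path


def continue_from_contcar(run_dir):
    """Copy each non-empty CONTCAR over POSCAR; return the image dirs updated."""
    updated = []
    for image_dir in sorted(Path(run_dir).iterdir()):
        name = image_dir.name
        # Assumption: image directories are named with exactly two digits.
        if not (image_dir.is_dir() and name.isdigit() and len(name) == 2):
            continue
        contcar = image_dir / "CONTCAR"
        # Endpoint directories normally have no CONTCAR, and an image that
        # never completed an ionic step may have an empty one: skip both.
        if not contcar.is_file() or contcar.stat().st_size == 0:
            continue
        poscar = image_dir / "POSCAR"
        if poscar.exists():
            shutil.copy2(poscar, image_dir / "POSCAR.prev")  # keep a backup
        shutil.copy2(contcar, poscar)
        updated.append(name)
    return updated
```

The returned list makes it easy to check that every intermediate image was picked up before resubmitting.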
-
- Posts: 20
- Joined: Thu Mar 14, 2024 3:26 am
Re: NEB Job Stuck at 84th Iteration Without Error - Assistance Needed
Can I upload those stdout files for your review?
-
- Posts: 20
- Joined: Thu Mar 14, 2024 3:26 am
Re: NEB Job Stuck at 84th Iteration Without Error - Assistance Needed
The issue is that the job is still running, but it is not updating the stdout file. I have just uploaded the stderr and stdout files to the link you provided.
Re: NEB Job Stuck at 84th Iteration Without Error - Assistance Needed
OK, I see what you mean, and it is puzzling. It's as if VASP has quit internally but not terminated. It is at the end of a set of electronic iterations, so it's not stalling at a random place. My advice is to continue the calculation when you see this. I will also think about this some more; I just don't have a good idea about what is going on at the moment.
-
- Posts: 20
- Joined: Thu Mar 14, 2024 3:26 am
Re: NEB Job Stuck at 84th Iteration Without Error - Assistance Needed
Indeed, it’s puzzling. I’m just wondering, since it’s been three days and the job is still running without any new updates, should I just let it continue or do something else? Thank you for the help :)
Re: NEB Job Stuck at 84th Iteration Without Error - Assistance Needed
I would kill it and restart (continue). My guess is that the MPI communicator is not passing along the kill message to the process. I don't have a solution. If you want to dig into this, I would look for things like 'stop processes on exit' in the queue settings. I do think this is unrelated to our VTST code, and hopefully the jobs are finishing properly, just not exiting.
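One way to notice this situation sooner than three days in is to check how long ago the output files were last written. This is only a sketch of that idea: the function names are made up for illustration, the file list and the idle threshold are up to you, and it says nothing about why the job stopped writing.

```python
import os
import time


def seconds_since_update(path, now=None):
    """Seconds elapsed since the file at path was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def looks_stalled(paths, max_idle_seconds, now=None):
    """True if every existing file in paths has been idle longer than the limit."""
    existing = [p for p in paths if os.path.exists(p)]
    if not existing:
        return False  # nothing to judge from, so do not raise an alarm
    return all(seconds_since_update(p, now) > max_idle_seconds for p in existing)
```

For example, `looks_stalled(["stdout.txt", "01/OUTCAR"], 6 * 3600)` would flag a job whose stdout and first-image OUTCAR have both been untouched for six hours. Both file names and the six-hour limit are placeholders; pick a limit well above the longest ionic step you have seen.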
-
- Posts: 20
- Joined: Thu Mar 14, 2024 3:26 am
Re: NEB Job Stuck at 84th Iteration Without Error - Assistance Needed
Dr. Henkelman,
I restarted the calculation by copying the CONTCAR of all images (from the 869th iteration) as the new POSCAR files for each image and the job successfully finished in about an hour. However, this is what I got in the neb.dat file:
0 0.000000 0.000000 0.000000 0
1 2.256204 0.075494 -0.173653 1
2 4.512633 0.346478 0.039494 2
3 6.770738 0.242105 -0.061570 3
4 9.028678 0.481803 -0.004409 4
5 13.806147 0.217415 -0.239191 5
6 18.588887 0.487806 0.000000 6
I can see that the forces are converged, and the first transition state is observed at image 4. However, I noticed that the final structure (image 6) has a higher energy than the transition state, which seems unusual. Is it possible for the final structure to have a higher energy than the transition state in NEB calculations, or could this indicate some other issue?
Thank you for your guidance! I really appreciate all your help :)
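For readers who want to check numbers like these without doing it by eye, here is a small Python sketch. It assumes the first three columns of neb.dat are the image index, the distance along the path, and the energy relative to image 0; that reading matches the values above but is stated here as an assumption. The function names are illustrative only.

```python
def parse_neb_dat(text):
    """Return (image, distance, energy) tuples from neb.dat-style text."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        rows.append((int(fields[0]), float(fields[1]), float(fields[2])))
    return rows


def summarize_band(rows):
    """Compare the highest intermediate image with the final endpoint."""
    interior = rows[1:-1]
    top = max(interior, key=lambda r: r[2])
    local_maxima = [
        rows[i][0]
        for i in range(1, len(rows) - 1)
        if rows[i][2] > rows[i - 1][2] and rows[i][2] > rows[i + 1][2]
    ]
    return {
        "highest_interior_image": top[0],
        "highest_interior_energy": top[2],
        "final_energy": rows[-1][2],
        "final_above_highest_interior": rows[-1][2] > top[2],
        "local_maxima": local_maxima,
    }
```

Applied to the table above, this reports image 4 as the highest intermediate image at 0.481803, the final image at 0.487806 (about 0.006 higher), and local maxima at images 2 and 4.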