Optimization of VASP

Post by **sangamesh** » Sat Nov 03, 2007 6:26 am

Hi all VASP users,

I have VASP results from a high-end Intel machine, built with the Intel compiler and the GotoBLAS library.

I'm now benchmarking VASP on an AMD64 dual-core, dual-processor machine.
I have to position the AMD machine + PathScale compiler against the Intel machine + Intel compiler.

The AMD cluster has three such nodes. Right now I'm testing serial runs only.

Version of VASP: 4.6.21
Compiler used: PathScale 3.0
BLAS Library: ATLAS

Number of iterations set in the input file (INCAR, NSW=30): 30

On the Intel machine the job took 1 hour to finish, but on the AMD machine it took 3 hours.

The VASP-related options (like -DNGX...) used in the two makefiles are the same.

The outputs on the two machines are not identical.
On the AMD machine, 25 iterations are performed.
On the Intel machine, only 15.

The calculation results are also different.

I don't know the science behind VASP, and I don't understand the results.

My questions:
If we run the same input file twice, are the outputs 'exactly' identical?
Is the problem the compiler (PathScale) or the BLAS library (ATLAS)?
The number of iterations in the initial stage of execution is not the same as on the Intel machine. Why is this?

Has anybody run VASP on AMD with the PathScale compiler? Were any optimizations performed?

If anyone has a makefile for VASP for AMD + PathScale compiler, please send it to me.

Thanks for your guidance

regards,
Sangamesh

Post by **graeme** » Tue Nov 06, 2007 5:37 am

You might have more success asking on the general VASP forum at
http://cms.mpi.univie.ac.at/vasp-forum/
instead of this one, which is specific to transition state theory calculations.

My guess, however, is that the difference between the calculations comes from high-level compiler optimizations, the BLAS/LAPACK or FFT libraries, or the number of cores the run is parallelized over. You could try to isolate the cause by making the two calculations as similar as possible: for example, use the GotoBLAS library on the AMD machine, run a serial calculation on both machines, or use the Intel compiler on the AMD machine.
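
As a rough illustration of why a different compiler or math library can change the numbers at all, here is a small Python sketch (not VASP code; the function names and the tolerance are made up for the example). Floating-point addition is not associative, so two builds that add the same values in a different order can end up with different results, and an iterative calculation that stops on a convergence threshold may then take a different number of steps. The sketch also shows comparing two results within a tolerance instead of expecting them to match digit for digit.

```python
import math

def sum_forward(values):
    """Add the values left to right, as one build might."""
    total = 0.0
    for v in values:
        total += v
    return total

def sum_reversed(values):
    """Add the same values right to left, as another build might."""
    total = 0.0
    for v in reversed(values):
        total += v
    return total

def agree_within(x, y, rel_tol=1e-9):
    """Hypothetical helper: treat two results as equal if they match
    to a relative tolerance rather than bit for bit."""
    return math.isclose(x, y, rel_tol=rel_tol)

# Values chosen so that rounding depends on the order of addition.
values = [1.0e16, 1.0, -1.0e16, 1.0]

a = sum_forward(values)
b = sum_reversed(values)
exact = math.fsum(values)  # correctly rounded sum, for reference

print("forward :", a)
print("reversed:", b)
print("fsum    :", exact)
print("agree within tolerance:", agree_within(a, b))
```

This example is deliberately extreme; in a real calculation the differences are usually much smaller, but they can still change how many iterations a run needs, so comparing the final results between the two machines within a tolerance is more meaningful than expecting identical output.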

Good luck!